Web Scraping: Part 2
In the last post I gave an introduction to web scraping (WebScraping : Part 1) and showed how to get started with scraping in Python. In this post I will show how to scrape Twitter data (without the API) and extract the following information about a user:
- Followers Count
- Following Count
- Tweets Count
Let's get started.
To extract the required information from the page source, we first locate the tag that holds it using the browser's developer tools.
This is a sample of the HTML that we need to extract.
We can identify it by its class, ProfileNav-value.
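As a minimal sketch of selecting elements by that class, the snippet below parses a small, hypothetical piece of markup with Python's standard-library html.parser. The SAMPLE_HTML string, its numbers, and the ClassCollector helper are assumptions made for illustration, not the page's actual source. A parsing library such as BeautifulSoup could do the same lookup with find_all.

```python
from html.parser import HTMLParser

# Hypothetical sample of the profile navigation markup (values are made up).
SAMPLE_HTML = """
<ul class="ProfileNav-list">
  <li><span class="ProfileNav-value" data-count="1200">1,200</span></li>
  <li><span class="ProfileNav-value" data-count="350">350</span></li>
  <li><span class="ProfileNav-value" data-count="9800">9,800</span></li>
  <li><span class="ProfileNav-value" data-count="42">42</span></li>
</ul>
"""


class ClassCollector(HTMLParser):
    """Collect the attributes of every tag that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.elements = []  # one attribute dict per matching tag

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated class names.
        if self.css_class in (attr_map.get("class") or "").split():
            self.elements.append(attr_map)


collector = ClassCollector("ProfileNav-value")
collector.feed(SAMPLE_HTML)
elements = collector.elements
print(len(elements))  # number of matching elements found in the sample
```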
Now we have a list of elements that contain the data, but we only need the first three items of the list (Tweets, Following, Followers).
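One way to keep only the first three items and turn them into a dictionary is sketched here. Each extracted element is represented as a plain dict of its attributes, which is an assumption for illustration (with a parsing library, the attribute lookup would use that library's own element objects). The numbers and the label names are made up.

```python
# Stand-in for the extracted elements: one attribute dict per element.
# The numbers are made up for illustration.
elements = [
    {"class": "ProfileNav-value", "data-count": "1200"},
    {"class": "ProfileNav-value", "data-count": "350"},
    {"class": "ProfileNav-value", "data-count": "9800"},
    {"class": "ProfileNav-value", "data-count": "42"},
]

# Order in which the first three items appear on the page.
labels = ["tweets", "following", "followers"]

profile = {}
# zip stops at the shorter sequence, so only the first three elements are used.
for label, element in zip(labels, elements):
    profile[label] = int(element["data-count"])

print(profile)
```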
In the above code, we go through the extracted elements and read the value of each one's “data-count” attribute. At the end, we are left with a dictionary that contains the required data.
We can clean up the code and make it reusable by wrapping it in a function.
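A possible shape for such a function is sketched below, using only the standard library. The function names, the URL pattern, and the User-Agent header are assumptions for illustration. The sketch also assumes the page serves the ProfileNav-value markup described above. Parsing is kept separate from downloading so that the parsing step can be tried on a saved HTML string.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _CountParser(HTMLParser):
    """Collect data-count values from tags with class ProfileNav-value."""

    def __init__(self):
        super().__init__()
        self.counts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "ProfileNav-value" in classes and attr_map.get("data-count"):
            self.counts.append(int(attr_map["data-count"]))


def parse_profile_counts(html):
    """Return the tweets, following and followers counts found in html."""
    parser = _CountParser()
    parser.feed(html)
    labels = ["tweets", "following", "followers"]
    # zip keeps only the first three counts, in page order.
    return dict(zip(labels, parser.counts))


def get_profile_counts(username):
    """Download a profile page and parse it (the URL pattern is an assumption)."""
    request = Request(
        "https://twitter.com/" + username,
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_profile_counts(html)
```

With this in place, a call such as get_profile_counts("some_user") (a placeholder username) would return a dictionary with the three counts, provided the request succeeds and the markup matches.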