Web scraping is a powerful way to access otherwise unavailable data, but the landscape of tools can be overwhelming. We have used scraping to create datasets on many topics, such as healthcare, rent, social media, and Google prices, and want to share what we have learned with you.
To that end, we have created a "How to scrape this?" flowchart. If you have some experience in web scraping but still wonder which tool is best for a given scenario, or what the best way to approach a scraping project is, then this session is for you. You'll learn how to prepare for scraping and how to choose a good strategy based on the use case, taking robustness, cost, and maintainability into account. We will share our workflow and real examples, and discuss the fit of different tools, from browser plugins, HTTP scraping, and browser automation to paid services and other helpful tools.
To follow along, you should have some experience in scraping and, ideally, in Python. No special tools are required.
Stephanie Jauss is a data reporter at the German public broadcaster SWR. She studied Computer Science and Media in Stuttgart as well as Investigative Journalism in Gothenburg.