Name: Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems
Start: 2026-05-29T14:00:00+0200
End: 2026-05-29T15:15:00+0200

Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems

Friday May 29, 2026 2:00pm - 3:15pm CEST

3.02

Scraped data can often be the backbone of an investigation, but some websites are more difficult to scrape than others. This session will cover how to approach tricky sites, including coping with CAPTCHAs, IP blocking, and browser fingerprinting. We'll cover how to work out what might be preventing you from scraping a site, and what options you have to proceed, with their pros, cons, and costs.

This is an advanced session aimed at people who already have experience of writing code to scrape websites and want to move up to the next level: participants will leave with an understanding of how to deal with hard-to-scrape websites, plus the tradeoffs of different approaches. No tools are required to follow along, just a web browser.

Slides: docs.google

Speakers

Max Harlow

Bloomberg News

Max Harlow is a data reporter at Bloomberg News. He also runs Journocoders, a community group for journalists to develop technical skills for use in their reporting.

Data skills, Workshop

Dataharvest 2026 - the European Investigative Journalism Conference
