Friday May 29, 2026 11:30am - 12:45pm CEST
Scraping data from the Internet has become a key skill for many data-driven investigations and reporting projects. Building custom web scrapers used to require solid coding skills, but in two recent environmental investigations supported by the Pulitzer Center, we used large language models (LLMs) like ChatGPT, Google Gemini, or Claude to help us build scrapers for online content with little coding experience. This hands-on workshop will teach you how to inspect a website and choose a scraping strategy. It will then demonstrate, step by step, how to build the web scrapers used in those investigations. The LLM prompts will be shared, and participants can follow along to create their first custom web scraper.

After attending, you will understand how websites are structured for scraping and be able to use LLMs to build basic web scrapers.
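
For orientation, here is a minimal sketch of what a "basic web scraper" can look like in Python, the language Google Colab runs. It is not one of the scrapers from the investigations. The sample HTML, the `LinkCollector` class, and the `scrape_links` function are hypothetical names made up for illustration, and the page is an inline string so the example runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical sample page (an assumption for illustration).
# A real scraper would download the HTML from a website instead.
SAMPLE_HTML = """
<html><body>
  <h1>Permits</h1>
  <ul>
    <li><a href="/permit/1">Permit A</a></li>
    <li><a href="/permit/2">Permit B</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_links(html):
    """Return a list of (text, href) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for text, href in scrape_links(SAMPLE_HTML):
        print(text, href)
```

The sketch uses only the standard library so it runs in a fresh Colab notebook. Inspecting a real page first, for example with the browser's developer tools, shows which tags and attributes hold the data you want to collect.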

Participants should bring their own laptops, register for a free account with any of the main LLM services (e.g. ChatGPT, Google Gemini, Claude), and have a free Google Colab account at colab.research.google.com.

No coding skills are required, but basic familiarity with LLMs is recommended.
Speakers
Kuang Keng Kuek Ser

Senior Editor for Rainforest Investigations, Pulitzer Center
Kuang Keng Kuek Ser is the Senior Editor for Rainforest Investigations at the Pulitzer Center, a non-profit organization based in Washington, DC that supports independent journalists globally. He supports and mentors three fellowships investigating issues related to tropical rainforest...
Anastasiia Morozova

Data and investigative journalist, Onet.pl/Ringier Axel Springer
I’m a data and investigative journalist with a background in tracking Russian influence, disinformation operations and sanctions evasion in Europe. I’m especially interested in projects where I can combine data analysis and visual storytelling to expose hidden networks or financial...
3.02
