Venue: 3.04
Saturday, May 30
 

9:30am CEST

Scraping with a browser emulator
Saturday May 30, 2026 9:30am - 10:45am CEST
You need to harvest data from a website, but there's no download button. It's time to scrape! There are many options, but one of the most consistently effective is launching an automated browser. You tell the browser where to go, what to click, and when to ingest the content. To follow along, participants should have some knowledge of coding in any language.

Participants will come away from this class knowing the basics of Web scraping with a browser emulator. To follow along, participants should install RStudio (https://posit.co/download/rstudio-desktop/), create a new project, download the file selenium-server-standalone-3.5.3.jar into the project directory, and download the appropriate Chrome binary into that directory (https://googlechromelabs.github.io/chrome-for-testing/last-known-good-versions-with-downloads.json).
Speakers
Robert Gebeloff

Reporter, New York Times
Robert Gebeloff has worked as a data projects reporter for The New York Times since 2008 and has taught data journalism for many years in newsrooms and at conferences. He was co-winner of the George Polk Award in 2015 and was a Pulitzer Prize finalist in both 2015 and 2016 for projects...
Simon Wörpel

Director of Technology, Data and Research Center – DARC


11:15am CEST

Choosing the right web scraping strategy
Saturday May 30, 2026 11:15am - 12:30pm CEST
Web scraping is a powerful way to access otherwise unavailable data, but it’s becoming more complex as websites deploy defenses like CAPTCHAs and anti-bot systems. At SWR Data Lab, we’ve tackled this across investigations ranging from Google price comparisons to healthcare platforms and social media scraping, each requiring a different approach. In this session, we share a practical decision framework for choosing the right scraping strategy based on robustness, cost, and maintainability.

We will present a decision framework for selecting a scraping strategy, based on what we have learned. Rather than promoting a single tool, we focus on choosing the right approach for your use case, weighing robustness, cost, and maintainability in a newsroom context. Using real examples, we walk through our workflow, from analyzing sites with dev tools to choosing between HTTP scraping, browser automation, and advanced tools, along with best practices and guidance on when paid services are worth it.

To follow along, you should have some experience in scraping and, ideally, Python. Participants will be able to extend their toolkit, make smarter choices in their scraping workflow, and handle real-world obstacles efficiently. No special tools are required to follow along.
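One step in the workflow described above, checking a site before choosing between HTTP scraping and browser automation, might look roughly like the sketch below. This is an illustration only, not the speakers' framework: the function `suggest_strategy`, the class `CellCollector`, and both sample pages are hypothetical. It assumes the simple rule of thumb that if a value visible in the browser already appears in the raw HTML, plain HTTP scraping may be enough, and otherwise the page probably builds its content with JavaScript.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def suggest_strategy(raw_html, expected_text):
    """Hypothetical heuristic: is the value we saw in the browser
    already present in the raw HTML the server sends?"""
    parser = CellCollector()
    parser.feed(raw_html)
    cells = [cell.strip() for cell in parser.cells]
    if expected_text in cells:
        return "http"      # static HTML: a plain HTTP request may suffice
    return "browser"       # likely rendered by JavaScript: automate a browser


# Hypothetical sample pages standing in for real server responses.
STATIC_PAGE = "<table><tr><td>Milk</td><td>1.29</td></tr></table>"
DYNAMIC_PAGE = "<div id='app'></div><script src='bundle.js'></script>"

print(suggest_strategy(STATIC_PAGE, "1.29"))
print(suggest_strategy(DYNAMIC_PAGE, "1.29"))
```

In practice the raw HTML would come from an HTTP request, and anti-bot defenses, cost, and maintainability would also weigh on the choice; the sketch covers only the static-versus-dynamic question.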
Speakers
Stephanie Jauss

SWR Data Lab
Stephanie Jauss is a data reporter at the German public broadcaster SWR. She studied Computer Science and Media in Stuttgart as well as Investigative Journalism in Gothenburg.
Verena Steinacher

Data Engineer, pub.tech

1:45pm CEST

Turning raw data into reliable sources: Python for journalists
Saturday May 30, 2026 1:45pm - 3:00pm CEST
Have you ever tried to investigate how much groceries or rent in your city really impact people’s budgets? Journalists don’t always get all the data in one place. Often, we find it in ads, public announcements, or different sources, then clean, structure, and track it over time, compare it with other datasets, or monitor changes to uncover trends.

This hands-on workshop teaches journalists how to clean, transform, and structure real newsroom data using Python. Participants will learn practical techniques to handle messy data, including changing data types, filtering by values or dates, splitting columns, labeling and recoding, calculating averages and percentages, and extracting quantities from text fields. The session also covers tasks specific to regional datasets, such as converting scripts from Cyrillic to Latin.

With these skills, journalists can analyze grocery prices and compare them with income data or calculate meal costs to report on rising food prices, examine traffic accident data near schools, or track public officials’ gifts and benefits. By the end of the workshop, participants will have concrete tools and workflows to turn raw data into reliable sources ready for investigation and reporting.

To follow along, participants should have some experience with Python basics and working with datasets. After attending this session, participants will be able to turn messy data into clean, reliable sources, compare thousands of entries, and extract insights for investigative stories using Python. Participants should have Python installed on their own computers to follow along. This tutorial can also be accessed via Google Colab, where most of the steps are similar, though Python installed locally is the recommended option for a smoother experience.
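A few of the cleaning steps listed in this description (changing data types, filtering by date, extracting quantities from text, calculating an average, and converting Cyrillic to Latin) could be sketched as follows. This is not the workshop's material: the sample rows, the function names, and the choice of the Serbian alphabet are all assumptions made for illustration, and the sketch uses only the Python standard library where a workshop might well use a data-frame library instead.

```python
import re
from datetime import date

# Serbian Cyrillic to Latin, lowercase only (assumed alphabet);
# uppercase letters are handled in to_latin().
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

# A number followed by a unit, e.g. "500g" or "1,5 l".
QUANTITY = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|ml|l)\b")

# Hypothetical raw rows, as they might come out of a scraped price list.
RAW_ROWS = [
    {"item": "Млеко 1l", "price": "150,00", "seen": "2026-01-15"},
    {"item": "Хлеб 500g", "price": "80,50", "seen": "2026-02-03"},
    {"item": "Jogurt 180g", "price": "60,00", "seen": "2025-12-20"},
]


def to_latin(text):
    """Transliterate Cyrillic letters, leaving other characters as they are."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)


def clean_row(row):
    """Convert text fields to proper types and pull the quantity out of the name."""
    name = to_latin(row["item"])
    match = QUANTITY.search(name)
    return {
        "item": name,
        "price": float(row["price"].replace(",", ".")),  # decimal comma to float
        "seen": date.fromisoformat(row["seen"]),         # text to date
        "quantity": float(match.group(1).replace(",", ".")) if match else None,
        "unit": match.group(2) if match else None,
    }


cleaned = [clean_row(row) for row in RAW_ROWS]
recent = [row for row in cleaned if row["seen"] >= date(2026, 1, 1)]  # filter by date
average_price = sum(row["price"] for row in recent) / len(recent)

print([row["item"] for row in recent], average_price)
```

Keeping the transliteration table and the quantity pattern as named constants makes it easy to extend them when new units or letters turn up in the data.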
Speakers
Teodora Curcic

BBC
Teodora Ćurčić is an investigative and data journalist from Serbia with over seven years of experience reporting on corruption, political finance, gender-based violence, and social justice. She spent most of her career at the award-winning Center for Investigative Journalism of...
Verena Steinacher

Data Engineer, pub.tech

5:15pm CEST

"The Mechelen Connection" (Escape Room)
Saturday May 30, 2026 5:15pm - 5:45pm CEST
There's a (genuine) story hiding in plain sight. Using a mixture of OSINT skills and clues hidden in the room, you will have 30 minutes to get to the story. First come, first served, max 3-4 teams per session competing to get the exit code and get out!
Speakers
Jonathan Stoneman

Arena for Journalism in Europe
Former BBC journalist turned data journalist, trainer, and consultant. Works with Arena as Lead Trainer, Arena Academy.

6:00pm CEST

"The Mechelen Connection" (Escape Room - 2nd session)
Saturday May 30, 2026 6:00pm - 6:30pm CEST
There's a (genuine) story hiding in plain sight. Using a mixture of OSINT skills and clues hidden in the room, you will have 30 minutes to get to the story. First come, first served, max 3-4 teams per session competing to get the exit code and get out!
Speakers
Jonathan Stoneman

Arena for Journalism in Europe
Former BBC journalist turned data journalist, trainer, and consultant. Works with Arena as Lead Trainer, Arena Academy.
 