Name: Turning raw data into reliable sources: Python for journalists
Start: 2026-05-30T13:45:00+0200
End: 2026-05-30T15:00:00+0200

Turning raw data into reliable sources: Python for journalists

Saturday May 30, 2026 1:45pm - 3:00pm CEST

3.04

Have you ever tried to investigate how much groceries or rent in your city really impact people’s budgets? Journalists rarely get all the data in one place. Often, we find it in ads, public announcements, or other scattered sources. We then clean and structure it, track it over time, compare it with other datasets, and monitor changes to uncover trends.

This hands-on workshop teaches journalists how to clean, transform, and structure real newsroom data using Python. Participants will learn practical techniques to handle messy data, including changing data types, filtering by values or dates, splitting columns, labeling and recoding, calculating averages and percentages, and extracting quantities from text fields. The session also covers tasks specific to regional datasets, such as converting text from Cyrillic to Latin script.
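As a rough illustration of the techniques listed above, here is a minimal sketch that uses only the Python standard library. The sample rows, the field names, the `CATEGORIES` mapping, and the helper functions `to_latin` and `clean_row` are all hypothetical and invented for this example; the workshop materials may use different data and tools (for instance a dataframe library).

```python
import re
from datetime import date
from statistics import mean

# Hypothetical scraped rows (invented for illustration); every value is text.
raw_rows = [
    {"item": "Млеко 1 l", "price": "139,99", "seen": "2026-03-02", "shop": "Shop A - Beograd"},
    {"item": "Хлеб 500 g", "price": "64,50", "seen": "2026-03-09", "shop": "Shop B - Novi Sad"},
    {"item": "Млеко 1 l", "price": "149,99", "seen": "2026-04-06", "shop": "Shop A - Beograd"},
]

# Serbian Cyrillic to Latin (lowercase keys); some letters map to two characters.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

# Hypothetical recoding table: first word of the item name -> category label.
CATEGORIES = {"Mleko": "dairy", "Hleb": "bakery"}


def to_latin(text):
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    out = []
    for ch in text:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)  # not a Serbian Cyrillic letter: keep as-is
        elif ch.isupper():
            out.append(lat.capitalize())  # title case only; all-caps words need extra care
        else:
            out.append(lat)
    return "".join(out)


def clean_row(row):
    """Convert types, split columns, recode, and extract a quantity from text."""
    item = to_latin(row["item"])
    # Extract a quantity and unit such as "500 g" or "1,5 l" from the item name.
    qty = re.search(r"(\d+(?:[.,]\d+)?)\s*(kg|g|ml|l)\b", item)
    # Split one "shop - city" column into two.
    shop, city = row["shop"].split(" - ", 1)
    return {
        "item": item,
        "category": CATEGORIES.get(item.split()[0], "other"),
        "price": float(row["price"].replace(",", ".")),  # decimal comma -> float
        "seen": date.fromisoformat(row["seen"]),          # text -> date
        "shop": shop,
        "city": city,
        "quantity": float(qty.group(1).replace(",", ".")) if qty else None,
        "unit": qty.group(2) if qty else None,
    }


rows = [clean_row(r) for r in raw_rows]

# Filter by value and by date, then order the result chronologically.
milk = sorted(
    (r for r in rows if r["category"] == "dairy" and r["seen"] >= date(2026, 3, 1)),
    key=lambda r: r["seen"],
)

# Average price and percentage change between the first and last observation.
avg_milk = mean(r["price"] for r in milk)
pct_change = (milk[-1]["price"] - milk[0]["price"]) / milk[0]["price"] * 100
print(f"Average milk price: {avg_milk:.2f}, change: {pct_change:.1f}%")
```

Keeping each cleaning step inside one small function makes it easier to rerun the same pipeline when new rows arrive, which matters when a dataset is tracked over time.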

With these skills, journalists can analyze grocery prices and compare them with income data or calculate meal costs to report on rising food prices, examine traffic accident data near schools, or track public officials’ gifts and benefits. By the end of the workshop, participants will have concrete tools and workflows to turn raw data into reliable sources ready for investigation and reporting.
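The comparisons mentioned above often come down to simple arithmetic once the data is clean. The sketch below shows one possible way to express a grocery basket as a share of income and to price a single meal. All figures, ingredient names, and the function `share_of_income` are hypothetical and chosen only for illustration; they are not real statistics or part of the workshop materials.

```python
# Hypothetical monthly figures in local currency (invented, not real statistics).
basket_cost = {"2025": 31500.0, "2026": 34650.0}
median_income = {"2025": 90000.0, "2026": 94500.0}


def share_of_income(cost, income):
    """Return cost as a percentage of income."""
    return cost / income * 100


# Share of income spent on the basket, per year.
shares = {
    year: share_of_income(basket_cost[year], median_income[year])
    for year in basket_cost
}

# Difference between the two years, in percentage points.
change_pp = shares["2026"] - shares["2025"]

# Meal cost: price per kilogram times the quantity the recipe uses.
price_per_kg = {"pasta": 180.0, "tomato": 240.0, "cheese": 1200.0}
recipe_kg = {"pasta": 0.125, "tomato": 0.2, "cheese": 0.03}
meal_cost = sum(price_per_kg[name] * recipe_kg[name] for name in recipe_kg)

for year, share in shares.items():
    print(f"{year}: basket takes {share:.1f}% of income")
print(f"Change: {change_pp:.2f} percentage points; one meal costs {meal_cost:.2f}")
```

Reporting the change in percentage points rather than as a percentage of a percentage tends to be clearer for readers.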

To follow along, participants should have some experience with Python basics and working with datasets, and should have Python installed on their own computers. The tutorial can also be followed in Google Colab, where most of the steps are similar, though a local Python installation is recommended for a smoother experience. After attending this session, participants will be able to turn messy data into clean, reliable sources, compare thousands of entries, and extract insights for investigative stories using Python.

Materials: https://github.com/teodoracurcic/dh2026-data-cleaning

Speakers

Teodora Curcic

BBC

Teodora Ćurčić is an investigative and data journalist from Serbia with over seven years of experience reporting on corruption, political finance, gender-based violence, and social justice. She spent most of her career at the award-winning Center for Investigative Journalism of...

Verena Steinacher

Data Engineer, pub.tech

Data skills, Workshop

Dataharvest 2026 - the European Investigative Journalism Conference