Dataharvest 2026 - the European Investigative Journalism Conference: Full Schedule


9:30am CEST

Sunday May 31, 2026 9:30am - 10:45am CEST

Has Google search stayed the same since 1996? No. Google does change over time, but so slowly that most people do not notice. This session gives you an update on recent changes (those of the last four to five years), points to workarounds where necessary, and shows you what is genuinely new and useful. Towards the end, it offers some advanced Google dorks for immediate journalistic use, and aims to inspire you to build your own dorks and to combine LLMs with Google searches.
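Google dorks are search queries built from advanced operators. As a rough illustration only, the sketch below assembles a query from a few long-standing operators (exact-phrase quotes, `site:`, `filetype:`, `intitle:` and `-` exclusion) and turns it into a search URL. The helper names `build_dork` and `search_url` and the example domain are hypothetical, and because operator support changes over time, each operator is worth testing in the browser before relying on it.

```python
from urllib.parse import quote_plus


def build_dork(phrase=None, site=None, filetype=None, intitle=None, exclude=()):
    """Assemble a Google query string from common operators (hypothetical helper)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')  # exact-phrase match
    if site:
        parts.append(f"site:{site}")  # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file format
    if intitle:
        parts.append(f"intitle:{intitle}")  # word must appear in the page title
    parts.extend(f"-{word}" for word in exclude)  # drop results containing these words
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for a standard Google search link."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_dork(phrase="annual report", site="europa.eu", filetype="pdf", exclude=["draft"])
print(query)  # "annual report" site:europa.eu filetype:pdf -draft
print(search_url(query))
```

Building queries in code like this makes it easy to run the same dork across many domains or file types, or to have an LLM propose phrases that are then slotted into a fixed operator template.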

To follow along, participants should have used Google operators before. After attending the session, you will have up-to-date knowledge of Google's web search and other tools for journalistic use.

A Google account can be useful, but is not a must-have.

Speakers

Marcus Lindemann

managing editor, autoren(werk) GmbH & Co.KG

Marcus Lindemann is a lecturer in research, television journalism and media law, as well as managing director of the TV production company autoren(werk). For 25 years, he has been producing magazine features and documentaries for public service broadcasters, particularly on economic...

3.05

Data skills, Workshop

11:15am CEST

A map for every reader: how to generate hundreds of images for multiple audiences or partners using QGIS and Python

Sunday May 31, 2026 11:15am - 12:30pm CEST

3.02

The BBC Shared Data Unit wanted to generate a map image for each authority in the UK showing the state of flood defences in that area, so they turned to the built-in Python functionality of the mapping tool QGIS.

In this session, you will learn how to generate and export dozens of maps in QGIS, each centred on a different point, and how AI can help speed up the process.
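The core of such a batch job is a loop that computes a map extent centred on each location, plus a matching output filename. The sketch below shows only that loop, in plain Python so it runs outside QGIS; the authority names and coordinates are made up, and the 40 km by 30 km frame is an arbitrary assumption. Inside QGIS you would typically turn each extent into a `QgsRectangle`, apply it to a layout's map item and export the layout (for example with `QgsLayoutExporter`); check the PyQGIS documentation for the exact calls in your QGIS version.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Axis-aligned bounding box in the layer's projected units (assumed metres)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def extent_around(x: float, y: float, width: float, height: float) -> Extent:
    """Return a width-by-height box centred on the point (x, y)."""
    return Extent(x - width / 2, y - height / 2, x + width / 2, y + height / 2)


def slugify(name: str) -> str:
    """Turn an area name into a simple, filesystem-friendly filename stem."""
    return "".join(c if c.isalnum() else "_" for c in name.lower()).strip("_")


# Hypothetical authorities with made-up projected centroid coordinates
authorities = [
    ("Authority A", 530000.0, 180000.0),
    ("Authority B", 410000.0, 290000.0),
]

jobs = []
for name, x, y in authorities:
    extent = extent_around(x, y, width=40000, height=30000)
    filename = f"{slugify(name)}.png"
    # In QGIS, this is where the extent would be applied to the layout's
    # map item and the layout exported to `filename`.
    jobs.append((filename, extent))

for filename, extent in jobs:
    print(filename, extent)
```

Keeping the frame size fixed and only moving its centre means every exported map shares the same scale, which makes the images comparable when they are handed to different partners.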

To follow along, participants should have some basic knowledge of QGIS and be comfortable using Python or vibe coding.

After attending this session, participants should be able to understand how Python works in QGIS and to use AI to help generate, understand, and adapt code. Participants should have QGIS and Python installed on their computer (qgis.org/download and python.org/downloads) and a free account with an AI tool such as ChatGPT, Gemini, or Claude.

Materials: https://paulbradshaw.github.io/QGIS_param/

Speakers

Paul Bradshaw

Journalist and Academic, BBC/Birmingham City University

Paul Bradshaw runs the MA in Data Journalism at Birmingham City University and also works as a consulting data journalist with the BBC Shared Data Unit. A journalist, writer and trainer, he has worked with news organisations including The Guardian, Telegraph, Mirror, Der Tagesspiegel...

Ioanna Petsiou

Data Journalist, Freelancer

Ioanna Petsiou is an investigative data journalist working across data analysis, satellite imagery, and mapping to uncover and explain complex stories. She is particularly drawn to environmental reporting and to building clear, reproducible ways of working with data that others can...

Data skills, Workshop

11:15am CEST

Text embeddings: navigating text in high dimensions

Sunday May 31, 2026 11:15am - 12:30pm CEST

1.16

Keyword search is great when you know what you're looking for. But what about the structure you don't know is there? The topics, the patterns, the outliers?

This workshop introduces a workflow for making large text collections navigable. The core idea is simple: a machine learning model reads your text and converts each piece into a list of numbers — an embedding — where texts with similar meaning get similar numbers. Once everything is numbers, you can do maths on meaning.
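One common way to measure how close two embeddings are is cosine similarity, which compares the direction of two vectors rather than their length. The workshop may use a different measure; this is only an illustration of what "maths on meaning" can look like, for embedding vectors $\mathbf{a}$ and $\mathbf{b}$ with $n$ dimensions:

$$
\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\;\sqrt{\sum_{i=1}^{n} b_i^2}}
$$

A value close to 1 means the two vectors point in nearly the same direction, which for a good embedding model suggests similar meaning; lower values mean less similar. For example, with the toy vectors $\mathbf{a} = (1, 0)$ and $\mathbf{b} = (1, 1)$, $\cos(\mathbf{a},\mathbf{b}) = 1 / (1 \cdot \sqrt{2}) \approx 0.71$.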

This lets you do a couple of things, for example:

Semantically search your corpus for documents with similar meaning to your query
Project it into two dimensions with UMAP, so you can plot your documents on a scatter chart and see where they cluster and where they don't
Find natural groupings with HDBSCAN, a clustering algorithm that discovers groups, and flags documents that don't fit anywhere
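The semantic-search step can be sketched in plain Python: score every document against a query vector by cosine similarity and keep the best matches. The three-dimensional vectors below are invented for illustration; in practice they would come from an embedding model, have hundreds of dimensions, and the arithmetic would usually be done with a library such as NumPy. The UMAP and HDBSCAN steps are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid dividing by zero for an all-zero vector
    return dot / (norm_a * norm_b)


# Toy "embeddings" (made up; a real model would produce these from the text)
corpus = {
    "flood defences fail in coastal town": [0.9, 0.1, 0.0],
    "river barrier repairs delayed": [0.8, 0.3, 0.1],
    "football club signs new striker": [0.0, 0.2, 0.9],
}


def search(query_vec, corpus, top_k=2):
    """Return the top_k document texts ranked by similarity to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:top_k]]


query_vec = [0.9, 0.15, 0.0]  # pretend embedding of a query such as "flooding"
results = search(query_vec, corpus)
print(results)
```

Note that neither matching headline contains the word "flooding": the ranking comes entirely from the vectors, which is what distinguishes semantic search from keyword search.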

The session is hands-on and code-based. You don't need prior experience with machine learning; we'll explain what's happening at each step. By the end, you'll understand how to take a pile of text and turn it into a visual map you can explore.

What to bring: a Google account for Colab. We'll provide example data to work with. You're welcome to bring your own, as long as it's in a text format.

Materials: https://resolveworks.github.io/dataharvest2026/#/

Speakers

Johan Schujit

Data Engineer, Resolve.

I'm a data engineer responsible for EveryPolitician and PoliLoom at OpenSanctions. I'm a self-taught hacker with a stubborn belief that good data should be open and technology should serve the public interest. Previously at Follow the Money.

Ada Homolova

Coordinator of the data skills track, Arena for Journalism in Europe

A freelance data journalist with over 10 years of experience in data and investigative journalism, cross-border reporting, and teaching. She has worked with both small and large newsrooms across Europe, including Correctiv, Follow The Money, OCCRP, and Lost in Europe. Her heart beats...

Data skills, Workshop
