Type: Data skills
Saturday, May 30
 

3:30pm CEST

Hack your CMS (and the rest of the web!): Tampermonkey 101
Saturday May 30, 2026 3:30pm - 4:00pm CEST
Tampermonkey is an age-old browser extension that allows you to inject scripts and stylesheets into any web page, turning the web into your personal playground. We'll look at how to customize your CMS with DIY features, add "Download all" buttons to paginated websites, automate tedious processes like filling out forms, and redesign websites however you'd like. Best of all, Tampermonkey scripts are saveable and shareable, allowing you to give other members of your newsroom superpowers without the hassle of distributing extensions or asking them to run Python scripts. To follow along, participants should be able to install extensions in their web browser of choice.
Speakers

Jonathan Soma

Knight Chair in Data Journalism, Columbia University
Jonathan Soma is the Knight Chair in Data Journalism at Columbia University, where he serves as Director of the Data Journalism MS program and the Lede Program, an intensive data journalism summer course. His lectures cover everything from basic Python and data analysis to interactive...
3.05

3:30pm CEST

Make a publication-ready static map with QGIS
Saturday May 30, 2026 3:30pm - 4:00pm CEST
In this demo, participants will learn how to create a static map in QGIS that is ready for publication. The session will cover setting map dimensions, selecting a basemap, adding geospatial data, and incorporating key design elements such as text annotations, a north arrow, a scale bar, an inset map, and images. Participants will also learn how to export the finished map as a JPG.

Download and install QGIS on your laptop before the session and confirm that it opens properly. MacBook users who run into security warnings when opening QGIS can follow the workaround here.
Speakers

Kuang Keng Kuek Ser

Senior Editor for Rainforest Investigations, Pulitzer Center
Kuang Keng Kuek Ser is the Senior Editor for Rainforest Investigations at the Pulitzer Center, a non-profit organization based in Washington, DC that supports independent journalists globally. He supports and mentors three fellowships investigating issues related to tropical rainforest...
2.03

3:30pm CEST

One template, many stories: Parameterized reports with Quarto
Saturday May 30, 2026 3:30pm - 4:00pm CEST
Learn how to build reusable report templates in Quarto that generate multiple outputs (PDF, HTML, Word documents) from a single source document. By defining parameters — such as a region, time period, or data source — you can produce dozens or even hundreds of tailored reports without duplicating code or copy-pasting results.

This is especially useful for cross-border investigations, where partners share a common dataset, but each team needs a report focused on its own country. Build the analysis once, then render a customized version for each partner with only their slice of the data.
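To make the "build once, render per partner" idea concrete, here is a minimal Python sketch that loops over a list of countries and calls the Quarto command line once per country, passing the value with Quarto's `-P name:value` option. The template name `partner-report.qmd`, the parameter name `country` and the output file names are hypothetical; the template is assumed to declare that parameter (in a `params` YAML block for R/knitr, or in a cell tagged `parameters` for Jupyter) and to render to HTML.

```python
import shlex
import subprocess


def build_render_commands(template, countries, param_name="country"):
    """Build one `quarto render` command per country.

    The template file and parameter name are hypothetical placeholders;
    the .qmd file is assumed to declare a parameter called `param_name`.
    """
    commands = []
    for country in countries:
        # File-name-friendly version of the country, e.g. "Czech Republic" -> "czech-republic"
        slug = country.lower().replace(" ", "-")
        commands.append([
            "quarto", "render", template,
            "-P", f"{param_name}:{country}",   # value injected into the template's parameter
            "--to", "html",
            "--output", f"report-{slug}.html",  # one output file per partner
        ])
    return commands


def render_all(commands, dry_run=True):
    """Print each command (dry run) or actually execute it."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failed render


if __name__ == "__main__":
    cmds = build_render_commands("partner-report.qmd", ["France", "Czech Republic"])
    render_all(cmds)  # switch to dry_run=False to render for real
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them before rendering dozens of reports.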

To follow along, participants should have basic familiarity with Quarto, R Markdown, or Jupyter notebooks, and some experience writing code in R or Python.
Speakers

Leopold Salzenstein

Data coordinator, Arena for Journalism in Europe
Leopold Salzenstein is a freelance investigative data journalist and trainer based in the south of France. At Arena, he coordinates the handling of data for publications and trainings. He is also a member of the collective of journalists Environmental Investigative Forum (EIF).
1.04

4:15pm CEST

Beyond data cleaning: Enhancing OpenRefine with LLM
Saturday May 30, 2026 4:15pm - 4:45pm CEST
Data journalism has always relied on clean, structured data, but cleaning messy datasets remains one of the most time-consuming parts of the workflow. Enter OpenRefine, our old buddy for data wrangling, now enhanced by Large Language Models (LLMs).

In this 20-minute session, we explore how combining OpenRefine’s powerful transformation capabilities with modern AI unlocks new possibilities for journalists. Using the open-source LLM extension for OpenRefine, we’ll demonstrate practical workflows for:
- Automated Enrichment: Extracting entities, categorizing content, and enriching records using natural language prompts.
- Smart Disambiguation: Resolving inconsistencies and matching fuzzy data with AI-assisted reconciliation.
- Rapid Prototyping: Turning raw, unstructured text into structured datasets ready for investigation.

Why This Matters Now: Journalists are increasingly working with large, messy datasets, from leaked documents to public records.

While LLMs offer powerful analysis, they often lack precision on structured data. OpenRefine provides that precision. Together, they create a workflow that is both scalable and auditable, which is critical for investigative reporting, where accuracy is non-negotiable.

What Attendees Will Take Away:
- A clear understanding of how to integrate LLMs into existing OpenRefine workflows.
- Practical examples relevant to journalistic investigations (entity extraction, classification, enrichment).

To attend this session, participants should have experience with data cleaning.
Speakers

Herve Letoqueux

OpenFacto
Co-founder, with Lou (@CapteursOuverts) and Aliaume (@yaolri), of OpenFacto, a French NGO dedicated to online investigation for journalists and activists. I love open-source research, Python, Gephi, R and OpenRefine. I used to deal with money laundering, financial fraud and terrorism...
1.04

4:15pm CEST

From 007 to n8n - build your own no-code AI Agents
Saturday May 30, 2026 4:15pm - 4:45pm CEST
With so-called low-code platforms like n8n, you can quickly click together programs that would otherwise require tedious Python coding. And you can integrate LLMs at various points to, for example, extract information from texts or summarize content. This allows you to build complex workflows. Receive a Teams message from an agent when a nearby river level approaches extreme values? No problem! Automatically monitor the police website for accident reports and generate suggestions for brief news items? With n8n, this can be automated quickly. This workshop provides an introduction to the free platform n8n. No prior knowledge is expected.
Speakers

Claus Hesseling

Freelance journalist and trainer
Does data work for NDR and HR, develops newsroom tools for the Interlink Academy in the EU project INJECT, and is a trainer at the ARD.ZDF-Medienakademie and elsewhere. Twitter: @the_claus...
2.03

4:15pm CEST

No download button? Getting web data without writing a scraper
Saturday May 30, 2026 4:15pm - 4:45pm CEST
Journalists often run into data that is visible on a website but impossible to download directly: a table buried in a government page, a list of public records, or search results that change with every query. Writing a full scraper can be time-consuming and technically demanding for what is often a one-time task.

This session introduces three lightweight approaches that cover most of these cases: reading a table directly from a page using pandas, downloading raw HTML and parsing it into a dataframe, and pulling data through network requests. These techniques are practical tools for everyday newsroom situations. Participants will take home a GitHub repository with a working notebook to try on their own data, though some adaptation will be needed to apply it to different websites.
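For the second approach, downloading raw HTML and turning it into rows, a minimal sketch using only Python's standard library might look like this. It deliberately swaps in `html.parser` instead of pandas (`pandas.read_html` handles the first approach in one call, but needs an HTML parser such as lxml installed). The sketch assumes a simple page with a single table, one header row and explicitly closed `<tr>`, `<td>` and `<th>` tags; the first row it sees is treated as the header.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url):
    """Download a page as text; some sites reject requests without a User-Agent."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode(resp.headers.get_content_charset() or "utf-8")


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse whitespace and line breaks inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per body row, keyed by the header row's cell texts."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

Calling `parse_table(fetch_html(url))` returns one dictionary per row, which `pandas.DataFrame(rows)` can turn into a dataframe.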

The three approaches vary in complexity. Basic Python knowledge is enough to follow along, but participants with more experience will be able to go further, and the code can be adapted with the help of an LLM.
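The third approach, pulling data through network requests, often means spotting the JSON request a page makes in the browser's developer tools (Network tab) and calling it directly. The sketch below assumes a hypothetical endpoint that takes a `page` query parameter and returns its records under a `results` key; real sites will use different names (and may already have a query string in the URL), so treat both as placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_page(base_url, page):
    """GET one page of a JSON endpoint.

    The `page` query parameter is a hypothetical placeholder; copy the real
    parameter names from the request seen in the browser's Network tab.
    """
    url = f"{base_url}?{urlencode({'page': page})}"
    req = Request(url, headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def collect_records(get_page, max_pages=100):
    """Call get_page(1), get_page(2), ... until a page comes back empty."""
    records = []
    for page in range(1, max_pages + 1):
        payload = get_page(page)
        batch = payload.get("results", [])  # hypothetical key holding the records
        if not batch:
            break
        records.extend(batch)
    return records
```

For example, `collect_records(lambda p: fetch_page("https://example.org/api/search", p))` would walk through the pages until one returns no results. Passing the page-fetching function in as an argument keeps the pagination logic testable without a network connection.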
Speakers

Teodora Curcic

BBC
Teodora Ćurčić is an investigative and data journalist from Serbia with over seven years of experience reporting on corruption, political finance, gender-based violence, and social justice. She spent most of her career at the award-winning Center for Investigative Journalism of...
3.05

5:15pm CEST

How to look up named entities in text – fast
Saturday May 30, 2026 5:15pm - 5:45pm CEST
Have you ever stumbled over the problem "I have a bunch of documents, give me all the politicians named in them"? If so, you know the hassle: NER is noisy, and qualifying names (is this a politician or not?) requires external services, APIs or a large language model.

Or use "Juditha": an open-source, poor man's entity extraction and resolution tool. No external service is required: just put in your list of names and then extract them from arbitrary unstructured content. It works on any laptop and is super fast. Of course, it works with names of criminals, too. Or company names. Whatever you need.
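The core idea of list-based lookup, matching a known list of names instead of asking a model to guess what counts as a name, can be sketched in a few lines of Python. This is not Juditha's implementation or API, only an illustration of the general technique with the standard `re` module: all names are compiled into one case-insensitive pattern, longest first, and the space between words may be any whitespace, including line breaks. The example names are hypothetical.

```python
import re
from collections import Counter


def _norm(s):
    # Collapse runs of whitespace and fold case, so "Anna\nMARIA" == "anna maria"
    return " ".join(s.split()).casefold()


def build_matcher(names):
    """Compile one regex that matches any name from the list."""
    # Longest names first, so "Anna Maria Weber" is preferred over "Anna Maria"
    ordered = sorted({n for n in names if n.strip()}, key=len, reverse=True)
    # Escape each word and allow any whitespace (including line breaks) between words
    alternatives = [r"\s+".join(re.escape(w) for w in n.split()) for n in ordered]
    # (?<!\w) and (?!\w) prevent matches inside longer words ("Weber" in "Webern")
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE)


def find_names(text, names):
    """Count how often each name from the watchlist occurs in text."""
    canonical = {_norm(n): n for n in names if n.strip()}
    if not canonical:
        return Counter()
    matcher = build_matcher(names)
    return Counter(canonical[_norm(m.group(0))] for m in matcher.finditer(text))
```

Because everything is a single compiled pattern, each document is scanned once, no matter how long the watchlist is.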

In this session I'll walk through how to use the "juditha" command line and how to populate it with names of interest. By the end, anyone can take it home and detect the names that matter in their own material.

Knowing how to use a command line and install Python packages helps. If you have ever suffered through the problems of named entity recognition, you'll have even more fun.

Speakers

Simon Wörpel

Director of Technology, Data and Research Center – DARC


5:15pm CEST

Mining data from unstructured documents
Saturday May 30, 2026 5:15pm - 5:45pm CEST
You have a folder of documents and you want to extract data points from each one. And the data isn't in a structured table with neat rows and columns either. Here's where string functions and regular expressions can help. The demonstration will be in R but the skills are generic to all languages.
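The demonstration is in R, but the same idea carries over to Python. Below is a minimal sketch using the standard `re` and `pathlib` modules: a few patterns each pull one value out of every `.txt` file in a folder, producing one row per document. The field names and patterns (case number, date, dollar amount) are hypothetical examples, and the sketch assumes the documents have already been converted to plain text.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration; adapt them to the documents at hand.
PATTERNS = {
    "case_number": re.compile(r"Case\s+No\.?\s*([A-Z]{2}-\d{4}-\d+)"),
    "date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b"),
    "amount": re.compile(r"\$\s?([\d,]+(?:\.\d{2})?)"),
}


def extract_fields(text):
    """Return the first match of each pattern, or None when it is absent."""
    row = {}
    for field, pattern in PATTERNS.items():
        m = pattern.search(text)
        row[field] = m.group(1) if m else None
    return row


def extract_folder(folder):
    """Apply extract_fields to every .txt file in a folder, one row per file."""
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        row = extract_fields(path.read_text(encoding="utf-8", errors="replace"))
        row["file"] = path.name  # keep track of where each value came from
        rows.append(row)
    return rows
```

Returning `None` for a missing field, rather than skipping the document, makes it easy to spot the files whose wording the patterns do not yet cover.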
Speakers

Robert Gebeloff

Reporter, New York Times
Robert Gebeloff has worked as a data projects reporter for The New York Times since 2008 and has taught data journalism for many years in newsrooms and at conferences. He was co-winner of the George Polk Award in 2015 and was a Pulitzer Prize finalist in both 2015 and 2016 for projects...
3.05

6:00pm CEST

Bluetooth Trackers for Investigations
Saturday May 30, 2026 6:00pm - 6:30pm CEST
Bluetooth trackers can help you develop interesting investigations. This team started using trackers while following two cars from Germany to Siberia, then a parcel from Prague to Moscow. In late 2024, they tracked more than 230 letters sent within Germany, using up to 80 trackers simultaneously. For almost 18 months they tracked 24 items of electronic waste from Germany to places as far afield as Pakistan.

In this session, the team will share lessons learned, along with the technology, scraping tools and software behind all these projects. They will also bring some trackers and covers to inspire colleagues to use these devices, and share lessons from ongoing collaborations in various countries, where other journalists and newsrooms have licensed them to move their own projects forward.
Speakers

Marcus Lindemann

Managing author, autoren(werk) GmbH & Co. KG
Marcus Lindemann is a lecturer in research, TV journalism and press law, and managing author at the TV production company autoren(werk). For 25 years he has produced magazine segments and documentaries for public broadcasters, especially on economic and...

6:00pm CEST

Modern document processing with Natural PDF
Saturday May 30, 2026 6:00pm - 6:30pm CEST
Say hello to Natural PDF, a new Python library for wrangling PDFs that's focused on usability and feature-completeness. Process PDFs with scraping-like selectors and spatially-aware queries, asking for "the red alphanumeric string" or "the content below the big Summary header." Beyond the basics, Natural PDF is also full of modern conveniences like table detection, multiple OCR engines, and citation-aware LLM data extraction.

To get the most out of this session, participants should have experience with Python and with struggling through terrible PDFs.
Speakers

Jonathan Soma

Knight Chair in Data Journalism, Columbia University
Jonathan Soma is the Knight Chair in Data Journalism at Columbia University, where he serves as Director of the Data Journalism MS program and the Lede Program, an intensive data journalism summer course. His lectures cover everything from basic Python and data analysis to interactive...
3.05