Venue: 3.05
Saturday, May 30
 

9:30am CEST

AI-Assisted OSINT: Automating the investigative workflow
Saturday May 30, 2026 9:30am - 10:45am CEST
Most investigative workflows still rely on manually juggling dozens of tools. In this session, we'll walk through a live demo of a semi-automated pipeline built for real casework: web search and archiving with Playwright, face extraction, reverse image search, database cross-referencing with Telegram bots, social media analysis, and structured reporting via Obsidian MCP. All of this is orchestrated by Claude, an AI layer you can teach your own investigative methodology. At the end, participants will work through a simplified case using a workflow of their own.

Before the session, please install Python and Claude Code; no other special tools are needed. This session will teach participants to combine several smaller OSINT tools so they work together efficiently with little manual effort.
Speakers
Anastasiia Morozova

Data and investigative journalist, Onet.pl/Ringier Axel Springer
I’m a data and investigative journalist with a background in tracking Russian influence, disinformation operations and sanctions evasion in Europe. I’m especially interested in projects where I can combine data analysis and visual storytelling to expose hidden networks or financial...
Leopold Salzenstein

Data coordinator, Arena for Journalism in Europe
Leopold Salzenstein is a freelance investigative data journalist and trainer based in the south of France. At Arena, he coordinates the handling of data for publications and trainings. He is also a member of the collective of journalists Environmental Investigative Forum (EIF).

3.05

11:15am CEST

Embracing agents with Pydantic AI
Saturday May 30, 2026 11:15am - 12:30pm CEST
"Agentic AI" is all the rage, but what does it offer beyond traditional LLM workflows? In this hands-on session we'll answer this question (and more) while leveraging Python's Pydantic AI library to build a start-to-finish agentic AI workflow.

Participants will learn how agents work, when they're useful, how to build custom tools, and options for tracing and evaluation. You'll leave able to write agentic workflows to extract information from texts, do semi-autonomous research, and deliver clean, structured results.

Basic experience with Python/LLMs is helpful but not required. After attending this session, participants will be able to understand when and how to apply agentic approaches to problems. Participants should have Python/Jupyter installed or a Google account for working in the cloud.
Speakers
Jonathan Soma

Knight Chair in Data Journalism, Columbia University
Jonathan Soma is the Knight Chair in Data Journalism at Columbia University, where he serves as Director of the Data Journalism MS program and the Lede Program, an intensive data journalism summer course. His lectures cover everything from basic Python and data analysis to interactive...
Jan van der Burgt

Investigative coder / AI specialist, Freelance / Open State Foundation
I leverage AI technologies to collect and analyse data at scale, uncovering the hidden patterns that build stories.

Investigative focus: lobbying, government overreach, migration, global food supply chains.
3.05

1:45pm CEST

How to manage mass FOI projects using AI, vibe coding and verification
Saturday May 30, 2026 1:45pm - 3:00pm CEST
Projects involving FOI requests to multiple bodies often create significant challenges, from different file formats and data trapped in PDFs, to organisations providing data in different structures and different levels of detail. Getting the big picture often requires data extraction, cleaning, reshaping, and checking.

In this session, we will share a series of tips and tools used to manage one project — including vibe coding with AI — which can be used to make any multi-response FOI project more efficient and accurate. No prior knowledge is required. By the end of this session, attendees should be able to design a data structure for an FOI project, use a range of tools, including AI, to extract, reshape, clean, and combine data from FOI responses, and design a data validation process to check AI outputs.

You will need a laptop with Google Drive and an account with an AI tool such as ChatGPT, Gemini, Claude, or Copilot. Installing Tabula and OpenRefine will help you get more out of the session.
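The validation step mentioned above could look something like the following minimal Python sketch. It assumes each FOI response has already been extracted into rows with the hypothetical fields `body`, `year` and `count`; the field names, the sample rows and the plausibility rules are illustrative assumptions, not the speakers' actual schema or process.

```python
from dataclasses import dataclass

# Hypothetical target structure for one row of a combined FOI dataset.
EXPECTED_FIELDS = {"body", "year", "count"}


@dataclass
class Issue:
    row_index: int
    message: str


def validate_rows(rows):
    """Return a list of Issues found in AI-extracted rows (a list of dicts)."""
    issues = []
    for i, row in enumerate(rows):
        missing = EXPECTED_FIELDS - row.keys()
        if missing:
            issues.append(Issue(i, f"missing fields: {sorted(missing)}"))
            continue
        # Counts should be whole, non-negative numbers.
        count = str(row["count"]).strip()
        if not count.isdecimal():
            issues.append(Issue(i, f"count is not a whole number: {row['count']!r}"))
        # Years should be numeric and fall in a plausible range (assumed here).
        year = str(row["year"]).strip()
        if not (year.isdecimal() and 1990 <= int(year) <= 2100):
            issues.append(Issue(i, f"implausible year: {row['year']!r}"))
    return issues


# Made-up rows standing in for AI-extracted FOI data.
rows = [
    {"body": "Council A", "year": "2024", "count": "17"},
    {"body": "Council B", "year": "2024", "count": "n/a"},
    {"body": "Council C", "count": "5"},
]
issues = validate_rows(rows)
for issue in issues:
    print(issue.row_index, issue.message)
```

Flagged rows would then be checked by hand against the original response rather than corrected automatically, which keeps a human in the loop for anything the AI extraction got wrong.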
Speakers
Paul Bradshaw

Journalist and Academic, BBC/Birmingham City University
Paul Bradshaw runs the MA in Data Journalism at Birmingham City University and also works as a consulting data journalist with the BBC Shared Data Unit. A journalist, writer and trainer, he has worked with news organisations including The Guardian, Telegraph, Mirror, Der Tagesspi...
Ioanna Petsiou

Data Journalist, Freelancer
Ioanna Petsiou is an investigative data journalist working across data analysis, satellite imagery, and mapping to uncover and explain complex stories. She is particularly drawn to environmental reporting and to building clear, reproducible ways of working with data that others can...
3.05

3:30pm CEST

Hack your CMS (and the rest of the web!): Tampermonkey 101
Saturday May 30, 2026 3:30pm - 4:00pm CEST
Tampermonkey is an age-old browser extension that allows you to inject scripts and stylesheets into any web page, turning the web into your personal playground. We'll look at how to customize your CMS with DIY features, add "Download all" buttons to paginated websites, automate tedious processes like filling out forms, and redesign websites however you'd like. Best of all, Tampermonkey scripts are saveable and shareable, allowing you to give other members of your newsroom superpowers without fiddling with distributing extensions or asking them to run Python scripts. To follow along, participants should be able to install extensions in their web browser of choice.
Speakers
Jonathan Soma

Knight Chair in Data Journalism, Columbia University
Jonathan Soma is the Knight Chair in Data Journalism at Columbia University, where he serves as Director of the Data Journalism MS program and the Lede Program, an intensive data journalism summer course. His lectures cover everything from basic Python and data analysis to interactive...
3.05

4:15pm CEST

No download button? Getting web data without writing a scraper
Saturday May 30, 2026 4:15pm - 4:45pm CEST
Journalists often run into data that is visible on a website but impossible to download directly: a table buried in a government page, a list of public records, or search results that change with every query. Writing a full scraper can be time-consuming and technically demanding for what is often a one-time task.

This session introduces three lightweight approaches that cover most of these cases: reading a table directly from a page using pandas, downloading raw HTML and parsing it into a dataframe, and pulling data through network requests. These techniques are practical tools for everyday newsroom situations. Participants will take home a GitHub repository with a working notebook to try on their own data, though some adaptation will be needed to apply it to different websites.

The three approaches vary in complexity. Basic Python knowledge is enough to follow along, but participants with more experience will be able to go further, and the code can be adapted with the help of an LLM.
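As a rough illustration of the second approach (parsing raw HTML into rows), here is a minimal sketch using only Python's standard-library `html.parser`. The `TableParser` class and the sample HTML are hypothetical and are not taken from the session's notebook. For the first approach, `pandas.read_html` does a similar job and returns a list of DataFrames, one per table it finds.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up HTML standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Agency</th><th>Contracts</th></tr>
  <tr><td>Ministry of Health</td><td>12</td></tr>
  <tr><td>City Hall</td><td>7</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows
table = [dict(zip(header, record)) for record in records]
print(table)
```

In practice the HTML string would come from a downloaded page, and a list of dicts like `table` can be passed straight to `pandas.DataFrame` to get a dataframe.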
Speakers
Teodora Ćurčić

BBC
Teodora Ćurčić is an investigative and data journalist from Serbia with over seven years of experience reporting on corruption, political finance, gender-based violence, and social justice. She spent most of her career at the award-winning Center for Investigative Journalism of...
3.05

5:15pm CEST

Mining data from unstructured documents
Saturday May 30, 2026 5:15pm - 5:45pm CEST
You have a folder of documents and want to extract data points from each one, but the data isn't in a structured table with neat rows and columns. This is where string functions and regular expressions can help. The demonstration will be in R, but the skills transfer to any language.
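To give a flavour of the technique, here is a minimal sketch of the same idea in Python's `re` module rather than R. The document snippets, the field labels and the patterns are made up for illustration; real documents will need their own patterns.

```python
import re

# Made-up document snippets standing in for files in a folder.
documents = {
    "report_01.txt": "Inspection date: 2024-03-14. Inspector: J. Doe. Fine issued: $1,250.00",
    "report_02.txt": "Inspector: A. Smith\nInspection date: 2023-11-02\nFine issued: $300",
}

# Each pattern anchors on a label, then captures the value that follows it.
DATE_RE = re.compile(r"Inspection date:\s*(\d{4}-\d{2}-\d{2})")
FINE_RE = re.compile(r"Fine issued:\s*\$([\d,]+(?:\.\d{2})?)")


def extract(text):
    """Pull the date and fine out of one document; None if a field is absent."""
    date = DATE_RE.search(text)
    fine = FINE_RE.search(text)
    return {
        "date": date.group(1) if date else None,
        "fine": float(fine.group(1).replace(",", "")) if fine else None,
    }


records = {name: extract(text) for name, text in documents.items()}
for name, record in records.items():
    print(name, record)
```

Returning `None` for a missing field, instead of raising an error, makes it easy to list the documents where a pattern failed and refine it.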
Speakers
Robert Gebeloff

Reporter, New York Times
Robert Gebeloff has worked as a data projects reporter for The New York Times since 2008 and has taught data journalism for many years in newsrooms and at conferences. He was co-winner of the George Polk Award in 2015 and was a Pulitzer Prize finalist in both 2015 and 2016 for projects...
3.05

6:00pm CEST

Modern document processing with Natural PDF
Saturday May 30, 2026 6:00pm - 6:30pm CEST
Say hello to Natural PDF, a new Python library for wrangling PDFs that's focused on usability and feature-completeness. Process PDFs with scraping-like selectors and spatially-aware queries, asking for "the red alphanumeric string" or "the content below the big Summary header." Beyond the basics, Natural PDF is also full of modern conveniences like table detection, multiple OCR engines, and citation-aware LLM data extraction.

To get the most out of this session, participants should have experience with Python and with struggling through terrible PDFs.
Speakers
Jonathan Soma

Knight Chair in Data Journalism, Columbia University
Jonathan Soma is the Knight Chair in Data Journalism at Columbia University, where he serves as Director of the Data Journalism MS program and the Lede Program, an intensive data journalism summer course. His lectures cover everything from basic Python and data analysis to interactive...
3.05
 