Saturday May 30, 2026 4:15pm - 4:45pm CEST
Data journalism has always relied on clean, structured data, but cleaning messy datasets remains one of the most time-consuming parts of the workflow. Enter OpenRefine, our old buddy for data wrangling, now enhanced by Large Language Models (LLMs).

In this 20-minute session, we explore how combining OpenRefine’s powerful transformation capabilities with modern AI unlocks new possibilities for journalists. Using the open-source LLM extension for OpenRefine, we’ll demonstrate practical workflows for:
- Automated Enrichment: Extracting entities, categorizing content, and enriching records using natural language prompts.
- Smart Disambiguation: Resolving inconsistencies and matching fuzzy data with AI-assisted reconciliation.
- Rapid Prototyping: Turning raw, unstructured text into structured datasets ready for investigation.

Why This Matters Now: Journalists are increasingly working with large, messy datasets, from leaked documents to public records.

While LLMs offer powerful analysis, they often lack precision on structured data. OpenRefine provides that precision. Together, they create a workflow that is both scalable and auditable, which is critical for investigative reporting where accuracy is non-negotiable.

What Attendees Will Take Away:
- A clear understanding of how to integrate LLMs into existing OpenRefine workflows.
- Practical examples relevant to journalistic investigations (entity extraction, classification, enrichment).

To attend this session, participants should have experience with data cleaning.
Speakers
Herve Letoqueux

OpenFacto
Co-founder, with Lou (@CapteursOuverts) and Aliaume (@yaolri), of OpenFacto, a French NGO dedicated to online investigation for journalists and activists. I love open-source research, Python, Gephi, R and OpenRefine. I used to deal with money laundering, financial frauds and terrorism...
1.04
