Name: Beyond data cleaning: Enhancing OpenRefine with LLM
Start: 2026-05-30T16:15:00+0200
End: 2026-05-30T16:45:00+0200

Beyond data cleaning: Enhancing OpenRefine with LLM

Saturday May 30, 2026 4:15pm - 4:45pm CEST

3.05

Data journalism has always relied on clean, structured data, but cleaning messy datasets remains one of the most time-consuming parts of the workflow. Enter OpenRefine, our old buddy for data wrangling, now enhanced by Large Language Models (LLMs).

In this 30-minute session, we explore how combining OpenRefine’s powerful transformation capabilities with modern AI unlocks new possibilities for journalists. Using the open-source LLM extension for OpenRefine, we’ll demonstrate practical workflows for:
- Automated Enrichment: Extracting entities, categorizing content, and enriching records using natural language prompts.
- Smart Disambiguation: Resolving inconsistencies and matching fuzzy data with AI-assisted reconciliation.
- Rapid Prototyping: Turning raw, unstructured text into structured datasets ready for investigation.

Why This Matters Now: Journalists are increasingly working with large, messy datasets, from leaked documents to public records. While LLMs offer powerful analysis, they often lack precision on structured data. OpenRefine provides that precision. Together, they create a workflow that is both scalable and auditable, which is critical for investigative reporting where accuracy is non-negotiable.

What Attendees Will Take Away:
- A clear understanding of how to integrate local LLMs into existing OpenRefine workflows in a secure, even disconnected, environment.
- Practical examples relevant to journalistic investigations (entity extraction/transforms, classification, enrichment).

To attend this session, participants should have some experience cleaning data with OpenRefine.

Repository on GitHub: https://github.com/herve-checkfirst/DataHarvest2026-Refine_with_llm

Please note that Ministral 3B is a 2 GB model. Feel free to read the tutorial and install everything before the session.

Speakers

Hervé Letoqueux

CEO, Checkfirst

CEO of Check First, a Finnish company working on regulation (DSA...), FIMI, OSINT investigations and technology for CSOs and journalists. Former head of operations at VIGINUM, France. Also co-founder of OpenFacto, a French NGO dedicated to online investigation for journalists and activists...

Data skills, Mini

Dataharvest 2026 - the European Investigative Journalism Conference