Name: Using LLMs in R to expand and categorise your datasets: the Ellmer package
Start: 2026-05-29T14:00:00+0200
End: 2026-05-29T15:15:00+0200

Using LLMs in R to expand and categorise your datasets: the Ellmer package

Friday May 29, 2026 2:00pm - 3:15pm CEST

1.04

Large language models can do more than generate text: they can help clean and structure messy data files and enrich datasets. As LLMs become increasingly useful to data journalists, the Ellmer package gives R users an easy way to work with them. The Guardian data team has used Ellmer to clean and organise thousands of emails from the Epstein files, to investigate private equity firms in the United Kingdom, and to classify recipients of climate finance.

Using some of these examples, attendees will learn when this package is the right tool for an investigation, what good practice looks like when using LLMs, how to connect to an LLM's API, how to write an efficient prompt, how to submit prompts in bulk using the batch function for structured data, and how to evaluate the results and iterate to improve them.

This is an advanced R session and we will assume that attendees have some prior knowledge of R.

Speakers

Carmen Aguilar Garcia

Data Projects Editor, The Guardian

Data journalist and data projects editor at The Guardian. I work on a variety of subjects - always finding the data angle in every story. Scraping, cleaning, data analysis, but above all JOURNALISM!

Michael Goodier

Data journalism, Presentation

Dataharvest 2026 - the European Investigative Journalism Conference
