Data journalism projects often rely on manually executed scripts, spreadsheet updates, or code running on private computers. As investigations become more complex, span longer timeframes, or require regular updates, these methods become inefficient and unsustainable. Automated data pipelines offer a solution to these challenges.
This workshop introduces Apache Airflow, an open-source platform for automating and managing workflows. The session demonstrates how Airflow can be used to automate data journalism processes efficiently, from scraping data to creating and updating visualizations. Participants should have basic programming skills.
After attending this session, participants will know why and when to use automated pipelines and will understand the basics of Airflow.
Materials:
https://github.com/Tilana/dataharvest2026_automated_pipelines