Published in Martin Paul Eve

I am currently conducting a research project at Crossref that requires me to build a database from large backend files (e.g. building a relational database from a 3GB XML file). We need to rebuild this monthly, so Apache Airflow seemed a good tool for running these periodic tasks. There are, however, lots of “gotchas” in this framework that can trip up a newcomer, so I thought it might be helpful to document some of them.