Marija Selakovic is a developer advocate at Crate.io, working with the CrateDB database and various other data engineering tools. She holds a Ph.D. in computer science from TU Darmstadt and a Master's degree in software engineering from VU University Amsterdam. As a developer advocate, Marija creates technical content, speaks at developer conferences, and helps other software developers be productive and successful in using CrateDB.
With the rise of complex data solutions, automating and orchestrating database processes is becoming increasingly important: in more and more use cases, a database change requires a chain of operations in which each operation depends on the execution of the previous ones. Furthermore, to quickly develop and adapt database orchestrations, one needs tools that scale and that are easy to monitor and manage.
This talk will illustrate how easy it is to automate orchestration workflows with Apache Airflow and CrateDB. Apache Airflow is one of the most popular platforms for programmatically creating, scheduling, and monitoring workflows. The workflows are defined as directed acyclic graphs (DAGs), where each node in a DAG represents an execution task. Initially, Airflow was designed so that each task runs independently. Airflow 2.3 introduced dynamic task mapping, which makes Airflow well suited to building dynamic workflows.
On the other hand, CrateDB is an open-source, distributed database that makes storage and analysis of massive amounts of data simple and efficient. CrateDB offers a high degree of scalability, flexibility, and availability. One of CrateDB’s key strengths is its compatibility with many data engineering tools, including Apache Airflow.
In this talk, you will learn how to set up a new orchestration project, and how to use Airflow with CrateDB to orchestrate complex tasks.
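To make the DAG idea above concrete, here is a minimal sketch in plain Python. It does not use Airflow or CrateDB: it relies only on the standard library's `graphlib` module, and the task names are hypothetical. It shows how a set of dependencies determines a valid execution order, which is the kind of ordering a workflow scheduler has to respect.

```python
from graphlib import TopologicalSorter

# Hypothetical task names for illustration only.
# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "create_table": set(),
    "import_data": {"create_table"},
    "check_row_count": {"import_data"},
    "refresh_report": {"import_data"},
    "notify": {"check_row_count", "refresh_report"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In this sketch, `check_row_count` and `refresh_report` do not depend on each other, so either may come first; only the constraints expressed in the graph are guaranteed.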