Apache Airflow

This topic provides an overview of Apache Airflow on HPE Ezmeral Data Fabric.

Starting from EEP 8.1.0, HPE Ezmeral Data Fabric supports Apache Airflow on core 6.2.x and core 7.0.0.

You can use Airflow to author, schedule, and monitor workflows and data pipelines.

The following image shows the Apache Airflow workflow:


A workflow is a Directed Acyclic Graph (DAG) of tasks used to handle big data processing pipelines. Workflows are started on a schedule or triggered by an event. A DAG defines the order in which tasks run, or rerun in case of failure. The tasks define the actions to perform, such as ingest, monitor, and report.
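The ordering and rerun behavior described above can be illustrated with a minimal sketch in plain Python. It does not use the Airflow API. The task names come from the examples in this topic, while `run_workflow`, `max_retries`, and the deliberately flaky `monitor` task are hypothetical names introduced only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "ingest": set(),
    "monitor": {"ingest"},
    "report": {"monitor"},
}

attempts = {"monitor": 0}


def flaky_monitor():
    """Fail on the first call only, to simulate a transient error."""
    attempts["monitor"] += 1
    if attempts["monitor"] == 1:
        raise RuntimeError("transient failure")


# Hypothetical actions; a real task would ingest data, send a report, and so on.
actions = {
    "ingest": lambda: None,
    "monitor": flaky_monitor,
    "report": lambda: None,
}


def run_workflow(dependencies, actions, max_retries=1):
    """Run each task after its upstream tasks, rerunning a task that fails."""
    completed = []
    for task in TopologicalSorter(dependencies).static_order():
        for attempt in range(max_retries + 1):
            try:
                actions[task]()
                completed.append(task)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: the workflow fails
    return completed


order = run_workflow(dependencies, actions)
print(order)
```

In this sketch the dependency graph alone determines the run order, and a failed task is rerun before any downstream task starts. Airflow itself expresses the same ideas through DAG files and per-task retry settings.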

Airflow Architecture

The following image shows the Apache Airflow architecture:


Airflow Components

Airflow consists of the following components:
Scheduler
Triggers the scheduled workflows and submits the tasks to an executor to run.
Executor
Executes the tasks or delegates the tasks to workers for execution.
Worker
Executes the tasks.
Web Server
Provides a user interface to analyze, schedule, monitor, and visualize tasks and DAGs. The Web Server also enables you to manage users and roles and to set configuration options.
DAG Directory
Contains the DAG files read by the Scheduler, Executor, and Web Server.
Metadata Database
Stores metadata about DAG state, DAG runs, and Airflow configuration options.
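
The way these components hand work to one another can be sketched as a toy model in plain Python. The class names mirror the component names above but are not Airflow classes, and the dictionary-based DAG directory, the `example_pipeline` DAG ID, and the state strings are assumptions made for illustration. Real Airflow components run as separate processes and are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataDatabase:
    """Toy stand-in: records the latest state of each task."""
    task_states: dict = field(default_factory=dict)


class Worker:
    """Executes a task."""

    def execute(self, action):
        action()
        return "success"


class Executor:
    """Delegates tasks to a worker and records their state."""

    def __init__(self, worker, database):
        self.worker = worker
        self.database = database

    def submit(self, task_name, action):
        self.database.task_states[task_name] = "queued"
        self.database.task_states[task_name] = self.worker.execute(action)


class Scheduler:
    """Triggers a workflow and submits its tasks to the executor."""

    def __init__(self, dag_directory, executor):
        self.dag_directory = dag_directory
        self.executor = executor

    def trigger(self, dag_id):
        for task_name, action in self.dag_directory[dag_id]:
            self.executor.submit(task_name, action)


log = []

# Hypothetical DAG directory: DAG ID -> ordered list of (task name, action).
dag_directory = {
    "example_pipeline": [
        ("ingest", lambda: log.append("ingest ran")),
        ("report", lambda: log.append("report ran")),
    ]
}

database = MetadataDatabase()
scheduler = Scheduler(dag_directory, Executor(Worker(), database))
scheduler.trigger("example_pipeline")
print(database.task_states)
```

A web server component would read the same task states from the metadata database to display them. That part is omitted here to keep the sketch short.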

To learn more about Airflow, see Airflow Concepts.