The installation is quick and straightforward.
    # airflow needs a home, ~/airflow is the default,
    # but you can lay foundation somewhere else if you prefer
    # (optional)
    export AIRFLOW_HOME=~/airflow

    # install from pypi using pip
    pip install apache-airflow

    # initialize the database
    airflow initdb

    # start the web server, default port is 8080
    airflow webserver -p 8080

    # start the scheduler
    airflow scheduler

    # visit localhost:8080 in the browser and enable the example dag in the home page
Upon running these commands, Airflow will create the $AIRFLOW_HOME folder
and lay an “airflow.cfg” file with defaults that get you going fast. You can
inspect the file either in
$AIRFLOW_HOME/airflow.cfg, or through the UI in
the Admin->Configuration menu. The PID file for the webserver will be stored in
$AIRFLOW_HOME/airflow-webserver.pid, or in a different location if started by systemd.
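As a quick way to confirm that the webserver is still alive, the PID file can be checked from the shell. The sketch below assumes the webserver was started directly (not through systemd), so the file sits under $AIRFLOW_HOME; pid_status is a hypothetical helper written for this example, not an Airflow command.

```shell
# Report whether the process recorded in a PID file is still running.
# pid_status is a hypothetical helper, not part of the Airflow CLI.
pid_status() {
    # $1 = path to a PID file
    if [ ! -f "$1" ]; then
        echo "missing"
        return 1
    fi
    # kill -0 sends no signal; it only tests whether the PID can be signalled
    if kill -0 "$(cat "$1")" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
        return 1
    fi
}

# Assumption: default layout, PID file directly under $AIRFLOW_HOME
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
echo "webserver: $(pid_status "$AIRFLOW_HOME/airflow-webserver.pid")"
```

A result of "stale" means the file exists but no matching process can be signalled, which usually points to a webserver that exited without cleaning up.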
Out of the box, Airflow uses a sqlite database, which you should outgrow
fairly quickly since no parallelization is possible using this database
backend. It works in conjunction with the
airflow.executors.sequential_executor.SequentialExecutor which will
only run task instances sequentially. While this is very limiting, it allows
you to get up and running quickly and take a tour of the UI and the
command line utilities.
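To see which executor and database backend an installation is currently using, one option is to read the relevant keys out of airflow.cfg. The sketch below assumes the keys are named executor and sql_alchemy_conn, as in the generated default file, and that each appears once; cfg_value is a hypothetical helper for this example, and it deliberately ignores INI section headers.

```shell
# Print the value of one key from an INI-style file such as airflow.cfg.
# cfg_value is a hypothetical helper, not part of the Airflow CLI.
cfg_value() {
    # $1 = path to the config file, $2 = key name
    # Strip "key = " from the matching line and print only the first match.
    sed -n "s/^$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Assumption: default layout, config file directly under $AIRFLOW_HOME
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
cfg="$AIRFLOW_HOME/airflow.cfg"

if [ -f "$cfg" ]; then
    echo "executor: $(cfg_value "$cfg" executor)"
    echo "sql_alchemy_conn: $(cfg_value "$cfg" sql_alchemy_conn)"
else
    echo "no config found at $cfg; run 'airflow initdb' first"
fi
```

On a fresh install the executor line is expected to name the SequentialExecutor and the connection string to point at a sqlite file, which matches the defaults described above.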
Here are a few commands that will trigger a few task instances. You should
be able to see the status of the jobs change in the
example_bash_operator DAG as you
run the commands below.
    # run your first task instance
    airflow run example_bash_operator runme_0 2015-01-01

    # run a backfill over 2 days
    airflow backfill example_bash_operator -s 2015-01-01 -e 2015-01-02