Security

Reporting Vulnerabilities

⚠️ Please do not file GitHub issues for security vulnerabilities as they are public! ⚠️

The Apache Software Foundation takes security issues very seriously. Apache Airflow specifically offers security features and is responsive to issues around its features. If you have any concerns about Airflow security or believe you have uncovered a vulnerability, we suggest that you get in touch via the e-mail address security@apache.org. In the message, try to provide a description of the issue and ideally a way of reproducing it. The security team will get back to you after assessing the description.

Note that this security address should be used only for undisclosed vulnerabilities. Dealing with fixed issues or general questions on how to use the security features should be handled via the user and dev lists. Please report any security problems to the project security address before disclosing them publicly.

The ASF Security team’s page describes how vulnerability reports are handled, and includes PGP keys if you wish to use that.

Web Authentication

By default, Airflow requires users to specify a password prior to login. You can use the following CLI commands to create an account:

# create an admin user
airflow users create --username admin --firstname Peter --lastname Parker --role Admin --email spiderman@superhero.org

It is, however, possible to switch on authentication by either using one of the supplied backends or creating your own.

Be sure to check out the Developer Interface for securing the API.

Note

Airflow uses the config parser of Python. This config parser interpolates '%'-signs. Make sure to escape any % signs in your config file (but not environment variables) as %%, otherwise Airflow might leak these passwords to a log on a config parser exception.
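To see why, here is a minimal sketch of how Python's configparser treats a stray % in a value (the password value is hypothetical):

from configparser import ConfigParser, InterpolationSyntaxError

config = ConfigParser()
config.read_string("[core]\npassword = pa%sword\n")
try:
    config.get("core", "password")  # a bare '%' triggers interpolation
except InterpolationSyntaxError as exc:
    print("config parser error:", exc)  # the raw value may surface in logs

config.read_string("[core]\npassword = pa%%sword\n")
print(config.get("core", "password"))  # escaped as '%%', reads back as 'pa%sword'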

Password

One of the simplest mechanisms for authentication is requiring users to specify a password before logging in.

Please use the command line interface airflow users create to create accounts, or do that in the UI.

Other Methods

Standing on the shoulders of the underlying Flask-AppBuilder framework, Airflow also supports authentication methods like OAuth, OpenID, LDAP, and REMOTE_USER. You can configure these in webserver_config.py. For details, please refer to the Security section of the FAB documentation.
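For example, a webserver_config.py that switches the UI to LDAP authentication could look like this sketch (the server address and search base are hypothetical placeholders; see the FAB documentation for the full list of options):

# webserver_config.py (sketch)
from flask_appbuilder.security.manager import AUTH_LDAP

AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldap://ldap.example.com"      # hypothetical LDAP server
AUTH_LDAP_SEARCH = "ou=people,dc=example,dc=com"  # hypothetical search base
AUTH_LDAP_UID_FIELD = "uid"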

API Authentication

Authentication for the API is handled separately from the web authentication. The default is to not require any authentication on the API, i.e. it is wide open by default. This is not recommended if your Airflow webserver is publicly accessible, and you should probably use the deny all backend:

[api]
auth_backend = airflow.api.auth.backend.deny_all
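To verify the backend is active, any API request should now be rejected. A quick sketch using the requests package (host and endpoint are examples):

import requests

resp = requests.get("http://localhost:8080/api/experimental/test")
print(resp.status_code)  # expect 403 with the deny_all backend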

Kerberos authentication is currently supported for the API.

To enable Kerberos authentication, set the following in the configuration:

[api]
auth_backend = airflow.api.auth.backend.kerberos_auth

[kerberos]
keytab = <KEYTAB>

The Kerberos service is configured as airflow/fully.qualified.domainname@REALM. Make sure this principal exists in the keytab file.
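Clients can then authenticate via SPNEGO. A sketch using the third-party requests-kerberos package (hostname and endpoint are placeholders), assuming a valid ticket is already in the ticket cache:

import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

# requires a valid ticket in the cache, e.g. from kinit or 'airflow kerberos'
resp = requests.get(
    "https://airflow.example.com/api/experimental/test",
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
)
print(resp.status_code)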

Kerberos

Airflow has initial support for Kerberos. This means that Airflow can renew Kerberos tickets for itself and store them in the ticket cache. The hooks and DAGs can make use of the ticket to authenticate against kerberized services.

Limitations

Please note that at this time, not all hooks have been adjusted to make use of this functionality. Also, it does not integrate Kerberos into the web interface, so you will have to rely on network-level security for now to make sure your service remains secure.

Celery integration has not been tried and tested yet. However, if you generate a keytab for every host and launch a ticket renewer next to every worker, it will most likely work.

Enabling kerberos

Airflow

To enable kerberos you will need to generate a (service) keytab.

# in the kadmin.local or kadmin shell, create the airflow principal
kadmin:  addprinc -randkey airflow/fully.qualified.domain.name@YOUR-REALM.COM

# Create the airflow keytab file that will contain the airflow principal
kadmin:  xst -norandkey -k airflow.keytab airflow/fully.qualified.domain.name

Now store this file in a location where the airflow user can read it (chmod 600), and then add the following to your airflow.cfg:

[core]
security = kerberos

[kerberos]
keytab = /etc/airflow/airflow.keytab
reinit_frequency = 3600
principal = airflow

Launch the ticket renewer by running:

# run ticket renewer
airflow kerberos

Hadoop

If you want to use impersonation, this needs to be enabled in the core-site.xml of your Hadoop configuration.

<property>
  <name>hadoop.proxyuser.airflow.groups</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.airflow.users</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.airflow.hosts</name>
  <value>*</value>
</property>

Of course, if you need to tighten your security, replace the asterisk with something more appropriate.

Using kerberos authentication

The Hive hook has been updated to take advantage of Kerberos authentication. To allow your DAGs to use it, simply update the connection details with, for example:

{ "use_beeline": true, "principal": "hive/[email protected]"}

Adjust the principal to your settings. The _HOST part will be replaced by the fully qualified domain name of the server.

You can specify if you would like to use the DAG owner as the user for the connection or the user specified in the login section of the connection. For the login user, specify the following as extra:

{ "use_beeline": true, "principal": "hive/[email protected]", "proxy_user": "login"}

For the DAG owner use:

{ "use_beeline": true, "principal": "hive/[email protected]", "proxy_user": "owner"}

and in your DAG, when initializing the HiveOperator, specify:

run_as_owner=True
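Putting it together, a minimal DAG sketch (the DAG id, query, and connection id are hypothetical):

from datetime import datetime

from airflow import DAG
from airflow.operators.hive_operator import HiveOperator

with DAG(
    dag_id="kerberized_hive_example",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
) as dag:
    count_rows = HiveOperator(
        task_id="count_rows",
        hql="SELECT COUNT(*) FROM my_table",
        hive_cli_conn_id="hive_cli_default",  # connection carrying the extras above
        run_as_owner=True,  # proxy the DAG owner, per the 'owner' setting
    )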

To use Kerberos authentication, you must install Airflow with the kerberos extras group:

pip install 'apache-airflow[kerberos]'

SSL

SSL can be enabled by providing a certificate and key. Once enabled, be sure to use “https://” in your browser.

[webserver]
web_server_ssl_cert = <path to cert>
web_server_ssl_key = <path to key>

Enabling SSL will not automatically change the web server port. If you want to use the standard port 443, you’ll need to configure that too. Be aware that super user privileges (or cap_net_bind_service on Linux) are required to listen on port 443.

# Optionally, set the server to listen on the standard SSL port.
web_server_port = 443
base_url = https://<hostname or IP>:443

Enable CeleryExecutor with SSL. Ensure you properly generate client and server certs and keys.

[celery]
ssl_active = True
ssl_key = <path to key>
ssl_cert = <path to cert>
ssl_cacert = <path to cacert>

Rendering Airflow UI in a Web Frame from another site

Using Airflow in a web frame is enabled by default. To disable this (and prevent clickjacking attacks), set the following:

[webserver]
x_frame_enabled = False
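You can check that the webserver now sends the corresponding response header; a quick sketch (the URL is a placeholder):

import requests

resp = requests.get("http://localhost:8080/")
print(resp.headers.get("X-Frame-Options"))  # expect "DENY" once framing is disabled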

Impersonation

Airflow has the ability to impersonate a unix user while running task instances based on the task’s run_as_user parameter, which takes a user’s name.

NOTE: For impersonation to work, Airflow must be run with sudo, as subtasks are run with sudo -u and the permissions of files are changed. Furthermore, the unix user needs to exist on the worker. Here is what a simple sudoers file entry could look like to achieve this, assuming airflow is running as the airflow user. Note that this means that the airflow user must be trusted and treated the same way as the root user.

airflow ALL=(ALL) NOPASSWD: ALL

Subtasks with impersonation will still log to the same folder, except that the files they log to will have permissions changed such that only the unix user can write to them.
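For example, a task that impersonates a specific unix user might look like this sketch (the DAG id and user name are hypothetical; the user must exist on the worker):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="impersonation_example",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
) as dag:
    whoami = BashOperator(
        task_id="whoami",
        bash_command="whoami",   # prints the impersonated user, not 'airflow'
        run_as_user="etl_user",  # hypothetical unix user on the worker
    )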

Default Impersonation

To prevent tasks that don't use impersonation from being run with sudo privileges, you can set the core:default_impersonation config, which sets a default user to impersonate if run_as_user is not set.

[core]
default_impersonation = airflow

Flower Authentication

Basic authentication for Celery Flower is supported.

You can specify the details either as an optional argument in the Flower process launching command, or as a configuration item in your airflow.cfg. In both cases, please provide user:password pairs separated by a comma.

airflow flower --basic-auth=user1:password1,user2:password2

[celery]
flower_basic_auth = user1:password1,user2:password2

RBAC UI Security

Security of the Airflow Webserver UI is handled by Flask-AppBuilder (FAB). Please read its related security documentation regarding its security model.

Default Roles

Airflow ships with a set of roles by default: Admin, User, Op, Viewer, and Public. Only Admin users can configure or alter the permissions of other roles. However, it is not recommended that Admin users alter these default roles in any way by removing or adding permissions to them.

Admin

Admin users have all possible permissions, including granting or revoking permissions from other users.

Public

Public users (anonymous) don’t have any permissions.

Viewer

Viewer users have limited viewer permissions

airflow/www/security.py

    VIEWER_PERMS = {
        'menu_access',
        'can_index',
        'can_list',
        'can_show',
        'can_chart',
        'can_dag_stats',
        'can_dag_details',
        'can_task_stats',
        'can_code',
        'can_log',
        'can_get_logs_with_metadata',
        'can_tries',
        'can_graph',
        'can_tree',
        'can_task',
        'can_task_instances',
        'can_xcom',
        'can_gantt',
        'can_landing_times',
        'can_duration',
        'can_blocked',
        'can_rendered',
        'can_version',
    }

on limited web views

airflow/www/security.py

    VIEWER_VMS = {
        'Airflow',
        'DagModelView',
        'Browse',
        'DAG Runs',
        'DagRunModelView',
        'Task Instances',
        'TaskInstanceModelView',
        'SLA Misses',
        'SlaMissModelView',
        'Jobs',
        'JobModelView',
        'Logs',
        'LogModelView',
        'Docs',
        'Documentation',
        'Github',
        'About',
        'Version',
        'VersionView',
    }

User

User users have Viewer permissions plus additional user permissions

airflow/www/security.py

    USER_PERMS = {
        'can_dagrun_clear',
        'can_run',
        'can_trigger',
        'can_add',
        'can_edit',
        'can_delete',
        'can_paused',
        'can_refresh',
        'can_success',
        'muldelete',
        'set_failed',
        'set_running',
        'set_success',
        'clear',
        'can_clear',
    }

on User web views, which are the same as the Viewer web views.

Op

Op users have User permissions plus additional op permissions

airflow/www/security.py

    OP_PERMS = {
        'can_conf',
        'can_varimport',
    }

on User web views plus these additional op web views

airflow/www/security.py

    OP_VMS = {
        'Admin',
        'Configurations',
        'ConfigurationView',
        'Connections',
        'ConnectionModelView',
        'Pools',
        'PoolModelView',
        'Variables',
        'VariableModelView',
        'XComs',
        'XComModelView',
    }

Custom Roles

DAG Level Role

Admins can create a set of roles which are only allowed to view a certain set of DAGs. This is called DAG level access. Each DAG defined in the DAG model table is treated as a View which has two permissions associated with it (can_dag_read and can_dag_edit). There is a special view called all_dags which allows a role to access all DAGs. The default Admin, Viewer, User, and Op roles can all access the all_dags view.
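Programmatically, DAG level access amounts to attaching a DAG's permission views to a role through the FAB security manager. A hedged sketch using internal APIs (the role and DAG names are hypothetical, and import paths vary between Airflow versions):

from airflow.www.app import cached_appbuilder  # airflow.www_rbac.app on older 1.10 releases

sm = cached_appbuilder().sm  # the FAB security manager

role = sm.add_role("dag_x_viewer")  # create (or fetch) the custom role
perm = sm.find_permission_view_menu("can_dag_read", "example_dag")
if perm:
    sm.add_permission_role(role, perm)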

Securing Connections

Airflow uses Fernet to encrypt passwords in the connection configuration. It guarantees that a password encrypted using it cannot be manipulated or read without the key. Fernet is an implementation of symmetric (also known as “secret key”) authenticated cryptography.
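To illustrate that guarantee, a short sketch with the cryptography package:

from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"connection password")
assert f.decrypt(token) == b"connection password"

try:
    Fernet(Fernet.generate_key()).decrypt(token)  # a different key cannot read the token
except InvalidToken:
    print("cannot be read without the key")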

The first time Airflow is started, the airflow.cfg file is generated with the default configuration and the unique Fernet key. The key is saved to option fernet_key of section [core].

You can also configure a fernet key using environment variables. This will override the value from the airflow.cfg file:

# Note the double underscores
export AIRFLOW__CORE__FERNET_KEY=your_fernet_key

Generating fernet key

If you need to generate a new fernet key you can use the following code snippet.

from cryptography.fernet import Fernet

fernet_key = Fernet.generate_key()
print(fernet_key.decode())  # your fernet key; keep it in a secure place!

Rotating encryption keys

Once connection credentials and variables have been encrypted using a fernet key, changing the key will cause decryption of existing credentials to fail. To rotate the fernet key without invalidating existing encrypted values, prepend the new key to the fernet_key setting, run airflow rotate_fernet_key, and then drop the original key from fernet_key:

  1. Set fernet_key to new_fernet_key,old_fernet_key

  2. Run airflow rotate_fernet_key to re-encrypt existing credentials with the new fernet key

  3. Set fernet_key to new_fernet_key
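Under the hood, this scheme matches MultiFernet from the cryptography package: decryption is attempted with each listed key in turn, while rotation re-encrypts with the first. A standalone sketch:

from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()
token = Fernet(old_key).encrypt(b"my connection password")

# fernet_key = new_fernet_key,old_fernet_key corresponds to:
mf = MultiFernet([Fernet(new_key), Fernet(old_key)])
rotated = mf.rotate(token)  # decrypts with old_key, re-encrypts with new_key
assert Fernet(new_key).decrypt(rotated) == b"my connection password"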