Alternative secrets backend

New in version 1.10.10.

In addition to retrieving connections and variables from environment variables or the metastore database, you can enable an alternative secrets backend, such as AWS SSM Parameter Store or Hashicorp Vault, to retrieve Airflow connections and variables. You can also roll your own.

Note

The Airflow UI shows only the connections and variables stored in the metadata database, not those defined by any other method. If you use an alternative secrets backend, check inside your backend to view the values of your variables and connections.

Search path

When looking up a connection/variable, by default Airflow will search environment variables first and metastore database second.

If you enable an alternative secrets backend, it will be searched first, followed by environment variables, then metastore. This search ordering is not configurable.
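The lookup order described above can be pictured as a chain that returns the first answer it finds. The following is a minimal sketch, not Airflow's actual implementation: the names DictBackend and lookup_conn_uri are hypothetical, and plain dictionaries stand in for the real stores.

```python
from typing import Dict, List, Optional


class DictBackend:
    """Hypothetical stand-in for one secrets source, backed by a dict."""

    def __init__(self, name: str, conn_uris: Dict[str, str]) -> None:
        self.name = name
        self.conn_uris = conn_uris

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return self.conn_uris.get(conn_id)


def lookup_conn_uri(search_path: List[DictBackend], conn_id: str) -> Optional[str]:
    """Return the URI from the first source in the search path that has one."""
    for backend in search_path:
        uri = backend.get_conn_uri(conn_id)
        if uri is not None:
            return uri
    return None


# Order mirrors the text: alternative backend, environment variables, metastore.
search_path = [
    DictBackend("alternative", {"smtp_default": "smtps://alt.example.com:465"}),
    DictBackend("environment", {"smtp_default": "smtps://env.example.com:465"}),
    DictBackend("metastore", {"mysql_default": "mysql://db.example.com"}),
]
```

In this sketch, smtp_default resolves from the alternative backend even though the environment source also defines it, while mysql_default falls through to the metastore.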

Configuration

The [secrets] section has the following options:

[secrets]
backend =
backend_kwargs =

Set backend to the fully qualified class name of the backend you want to enable.

You can set backend_kwargs to a JSON object; its keys are passed as keyword arguments to the __init__ method of your secrets backend.
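To illustrate how a backend_kwargs value reaches the constructor, here is a minimal sketch that parses the JSON text and expands it into keyword arguments. ExampleBackend and its parameters are hypothetical and exist only for this illustration.

```python
import json


class ExampleBackend:
    """Hypothetical backend used only to show how kwargs arrive in __init__."""

    def __init__(self, connections_prefix: str = "/airflow/connections", sep: str = "/") -> None:
        self.connections_prefix = connections_prefix
        self.sep = sep


# The text as it would appear after "backend_kwargs =" in airflow.cfg.
backend_kwargs = '{"connections_prefix": "/my/prefix"}'

# Parse the JSON object and expand it into keyword arguments.
kwargs = json.loads(backend_kwargs)
backend = ExampleBackend(**kwargs)
```

Keys that are omitted keep the constructor's defaults. Because the value is parsed as JSON, keys and string values must use double quotes.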

Local Filesystem Secrets Backend

This backend is especially useful in the following use cases:

  • Development: It keeps data synchronized between all terminal windows (as a database does), and the values are retained after a database restart (as with environment variables)

  • Kubernetes: It allows you to store secrets in Kubernetes Secrets or you can synchronize values using the sidecar container and a shared volume

To use variables and connections from a local file, specify LocalFilesystemBackend as the backend in the [secrets] section of airflow.cfg.

Available parameters to backend_kwargs:

  • variables_file_path: File location with variables data.

  • connections_file_path: File location with connections data.

Here is a sample configuration:

[secrets]
backend = airflow.secrets.local_filesystem.LocalFilesystemBackend
backend_kwargs = {"variables_file_path": "/files/var.json", "connections_file_path": "/files/conn.json"}

JSON, YAML and .env files are supported. All parameters are optional. If the file path is not passed, the backend returns an empty collection.

Storing and Retrieving Connections

If you have set connections_file_path as /files/my_conn.json, then the backend will read the file /files/my_conn.json when it looks for connections.

The file can be defined in JSON, YAML or env format. Depending on the format, the data should be saved as a URL or as a connection object. Any extra JSON parameters can be provided using the keys extra_dejson and extra. Use extra_dejson to provide the parameters as a JSON object, and extra to provide them as a JSON string. The keys extra and extra_dejson are mutually exclusive.

The JSON file must contain an object in which each key is a connection ID and each value is the definition of one connection. The connection can be defined as a URI (string) or as a JSON object. For a guide to defining a connection as a URI, see Generating a connection URI. For a description of the connection object parameters, see Connection. The following is a sample JSON file.

{
    "CONN_A": "mysql://host_a",
    "CONN_B": {
        "conn_type": "scheme",
        "host": "host",
        "schema": "lschema",
        "login": "Login",
        "password": "None",
        "port": "1234"
    }
}
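A connection value in the JSON file is either a URI string or an object, so a reader of the file has to branch on the value's type. The sketch below shows that distinction with the standard json module. The function classify_connections is hypothetical and is not part of Airflow.

```python
import json
from typing import Any, Dict

# A trimmed version of the sample file, inlined so the sketch is self-contained.
sample = """
{
    "CONN_A": "mysql://host_a",
    "CONN_B": {"conn_type": "scheme", "host": "host", "port": "1234"}
}
"""


def classify_connections(text: str) -> Dict[str, str]:
    """Map each connection ID to 'uri' or 'object' depending on its value type."""
    data: Dict[str, Any] = json.loads(text)
    kinds: Dict[str, str] = {}
    for conn_id, definition in data.items():
        if isinstance(definition, str):
            kinds[conn_id] = "uri"
        elif isinstance(definition, dict):
            kinds[conn_id] = "object"
        else:
            raise ValueError(f"Unsupported definition for {conn_id!r}")
    return kinds


kinds = classify_connections(sample)
```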

The YAML file structure is similar to that of the JSON file: each key is a connection ID and each value defines one or more connections. In this format, a connection can be defined as a URI (string) or as an object.

CONN_A: 'mysql://host_a'

CONN_B:
  - 'mysql://host_a'
  - 'mysql://host_b'

CONN_C:
  conn_type: scheme
  host: host
  schema: lschema
  login: Login
  password: None
  port: 1234
  extra_dejson:
    a: b
    nestedblock_dict:
      x: y

You can also define connections using a .env file. In that case, the key is the connection ID and the value describes the connection as a URI. Each connection ID must appear only once; a repeated ID raises an exception. The following is a sample file.

mysql_conn_id=mysql://log:password@13.1.21.1:3306/mysqldbrd
google_custom_key=google-cloud-platform://?extra__google_cloud_platform__key_path=%2Fkeys%2Fkey.json
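The .env rules above (one KEY=VALUE pair per line, no repeated IDs) can be sketched as a small parser. This is an illustrative stand-in, not Airflow's own loader: parse_env_text is a hypothetical name, and skipping blank lines and lines starting with # is an assumption of the sketch.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines and reject repeated keys."""
    result: Dict[str, str] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        # Split on the first '=' only, so '=' inside the URI is preserved.
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError(f"Line {line_no}: expected KEY=VALUE")
        if key in result:
            raise ValueError(f"Line {line_no}: duplicate key {key!r}")
        result[key] = value
    return result


sample = (
    "mysql_conn_id=mysql://log:password@13.1.21.1:3306/mysqldbrd\n"
    "google_custom_key=google-cloud-platform://"
    "?extra__google_cloud_platform__key_path=%2Fkeys%2Fkey.json\n"
)
connections = parse_env_text(sample)
```

Splitting on the first equals sign matters for values such as the google_custom_key URI, whose query string contains another equals sign.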

Storing and Retrieving Variables

If you have set variables_file_path as /files/my_var.json, then the backend will read the file /files/my_var.json when it looks for variables.

The file can be defined in JSON, YAML or env format.

The JSON file must contain an object where the key contains the variable key and the value contains the variable value. The following is a sample JSON file.

{
    "VAR_A": "some_value",
    "var_b": "different_value"
}

The YAML file structure is similar to that of JSON, with key containing the variable key and the value containing the variable value. The following is a sample YAML file.

VAR_A: some_value
VAR_B: different_value

You can also define variables using a .env file. In that case, the key is the variable key and the value is the variable value. The following is a sample file.

VAR_A=some_value
var_B=different_value

AWS SSM Parameter Store Secrets Backend

To enable SSM Parameter Store, specify SystemsManagerParameterStoreBackend as the backend in the [secrets] section of airflow.cfg.

Here is a sample configuration:

[secrets]
backend = airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend
backend_kwargs = {"connections_prefix": "/airflow/connections", "variables_prefix": "/airflow/variables", "profile_name": "default"}

Storing and Retrieving Connections

If you have set connections_prefix as /airflow/connections, then for a connection id of smtp_default, you would want to store your connection at /airflow/connections/smtp_default.

Optionally, you can supply a profile name to reference an AWS profile, e.g. one defined in ~/.aws/config.

The value of the SSM parameter must be the connection URI representation of the connection object.
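As a rough illustration of the two pieces involved, the sketch below joins a prefix and a connection ID into a parameter name, and assembles a connection URI with percent-encoded credentials. Both helper names are hypothetical, the password is a made-up example, and the code is not taken from the backend itself.

```python
from urllib.parse import quote


def build_parameter_name(prefix: str, conn_id: str) -> str:
    """Join the configured prefix and the connection ID with a slash."""
    return f"{prefix}/{conn_id}"


def build_conn_uri(conn_type: str, login: str, password: str, host: str, port: int) -> str:
    """Assemble a URI, percent-encoding the login and password."""
    user = quote(login, safe="")
    secret = quote(password, safe="")
    return f"{conn_type}://{user}:{secret}@{host}:{port}"


name = build_parameter_name("/airflow/connections", "smtp_default")
uri = build_conn_uri("smtps", "user", "p@ss/word", "relay.example.com", 465)
```

Percent-encoding keeps characters such as @ and / in a password from being mistaken for URI delimiters.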

Storing and Retrieving Variables

If you have set variables_prefix as /airflow/variables, then for a Variable key of hello, you would want to store your Variable at /airflow/variables/hello.

Optionally, you can supply a profile name to reference an AWS profile, e.g. one defined in ~/.aws/config.

AWS Secrets Manager Backend

To enable Secrets Manager, specify SecretsManagerBackend as the backend in the [secrets] section of airflow.cfg.

Here is a sample configuration:

[secrets]
backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables", "profile_name": "default"}

To authenticate, you can either supply a profile name to reference an AWS profile, e.g. one defined in ~/.aws/config, or set environment variables such as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

Storing and Retrieving Connections

If you have set connections_prefix as airflow/connections, then for a connection id of smtp_default, you would want to store your connection at airflow/connections/smtp_default.

Example:

aws secretsmanager put-secret-value \
    --secret-id airflow/connections/smtp_default \
    --secret-string "smtps://user:host@relay.example.com:465"

Verify that you can get the secret:

❯ aws secretsmanager get-secret-value --secret-id airflow/connections/smtp_default
{
    "ARN": "arn:aws:secretsmanager:us-east-2:314524341751:secret:airflow/connections/smtp_default-7meuul",
    "Name": "airflow/connections/smtp_default",
    "VersionId": "34f90eff-ea21-455a-9c8f-5ee74b21be672",
    "SecretString": "smtps://user:host@relay.example.com:465",
    "VersionStages": [
        "AWSCURRENT"
    ],
    "CreatedDate": "2020-04-08T02:10:35.132000+01:00"
}

The value of the secret must be the connection URI representation of the connection object.
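Because the secret value is a connection URI, its parts can be recovered with a standard URL parser. The sketch below takes a trimmed copy of the CLI response shown above and splits the SecretString with Python's urllib.parse. It only illustrates the URI layout and is not how Airflow itself builds the connection.

```python
import json
from urllib.parse import urlsplit

# Trimmed copy of the get-secret-value response from the example above.
response_text = """
{
    "Name": "airflow/connections/smtp_default",
    "SecretString": "smtps://user:host@relay.example.com:465",
    "VersionStages": ["AWSCURRENT"]
}
"""

response = json.loads(response_text)
conn_uri = response["SecretString"]

# Split the URI into the fields a connection object would carry.
parts = urlsplit(conn_uri)
conn_type = parts.scheme
login = parts.username
password = parts.password
host = parts.hostname
port = parts.port
```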

Storing and Retrieving Variables

If you have set variables_prefix as airflow/variables, then for a Variable key of hello, you would want to store your Variable at airflow/variables/hello.

Hashicorp Vault Secrets Backend

To enable Hashicorp Vault to retrieve Airflow connections and variables, specify VaultBackend as the backend in the [secrets] section of airflow.cfg.

Here is a sample configuration:

[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "variables_path": "variables", "mount_point": "airflow", "url": "http://127.0.0.1:8200"}

The default KV engine version is 2. Pass kv_engine_version: 1 in backend_kwargs if you use KV Secrets Engine Version 1.

You can also set and pass values to Vault client by setting environment variables. All the environment variables listed at https://www.vaultproject.io/docs/commands/#environment-variables are supported.

Hence, if you set VAULT_ADDR environment variable like below, you do not need to pass url key to backend_kwargs:

export VAULT_ADDR="http://127.0.0.1:8200"
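The fallback from an explicit url to the VAULT_ADDR environment variable can be pictured with a small helper. This is a sketch under assumptions: resolve_vault_url is a hypothetical function, and the precedence shown (an explicit url wins over the environment) is a choice made for this illustration rather than documented behaviour.

```python
import os
from typing import Any, Mapping, Optional


def resolve_vault_url(
    backend_kwargs: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer an explicit "url" kwarg, otherwise use VAULT_ADDR if it is set."""
    env = os.environ if environ is None else environ
    return backend_kwargs.get("url") or env.get("VAULT_ADDR")


# Passing the environment explicitly keeps the example deterministic.
from_kwargs = resolve_vault_url(
    {"url": "http://127.0.0.1:8200"}, {"VAULT_ADDR": "http://vault.internal:8200"}
)
from_env = resolve_vault_url({}, {"VAULT_ADDR": "http://vault.internal:8200"})
```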

Storing and Retrieving Connections

If you have set connections_path as connections and mount_point as airflow, then for a connection id of smtp_default, you would want to store your secret as:

vault kv put airflow/connections/smtp_default conn_uri=smtps://user:host@relay.example.com:465

Note that the key is conn_uri, the value is smtps://user:host@relay.example.com:465, and mount_point is airflow.

You can make a mount_point for airflow as follows:

vault secrets enable -path=airflow -version=2 kv

Verify that you can get the secret from vault:

❯ vault kv get airflow/connections/smtp_default
====== Metadata ======
Key              Value
---              -----
created_time     2020-03-19T19:17:51.281721Z
deletion_time    n/a
destroyed        false
version          1

====== Data ======
Key         Value
---         -----
conn_uri    smtps://user:host@relay.example.com:465

To be read as a connection, the value stored under the conn_uri key must be the connection URI representation of the connection object.
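The relationship between mount_point, connections_path and the conn_uri key can be sketched as follows. The helper names are hypothetical, and sample_response is a hand-written dictionary in the nested data/metadata layout that a KV version 2 read returns. No Vault client is called.

```python
from typing import Any, Dict, Optional


def build_cli_path(mount_point: str, connections_path: str, conn_id: str) -> str:
    """Compose the path used with 'vault kv put' and 'vault kv get'."""
    return f"{mount_point}/{connections_path}/{conn_id}"


def extract_conn_uri(response: Dict[str, Any]) -> Optional[str]:
    """Read conn_uri from a KV v2 style response, or None if it is absent."""
    return response.get("data", {}).get("data", {}).get("conn_uri")


# Hand-written stand-in for a KV v2 read response.
sample_response = {
    "data": {
        "data": {"conn_uri": "smtps://user:host@relay.example.com:465"},
        "metadata": {"version": 1, "destroyed": False},
    }
}

path = build_cli_path("airflow", "connections", "smtp_default")
conn_uri = extract_conn_uri(sample_response)
```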

Storing and Retrieving Variables

If you have set variables_path as variables and mount_point as airflow, then for a variable with hello as key, you would want to store your secret as:

vault kv put airflow/variables/hello value=world

Verify that you can get the secret from vault:

❯ vault kv get airflow/variables/hello
====== Metadata ======
Key              Value
---              -----
created_time     2020-03-28T02:10:54.301784Z
deletion_time    n/a
destroyed        false
version          1

==== Data ====
Key      Value
---      -----
value    world

Note that the secret key is value, the secret value is world, and mount_point is airflow.

GCP Secret Manager Backend

To enable GCP Secret Manager to retrieve connections and variables, specify CloudSecretManagerBackend as the backend in the [secrets] section of airflow.cfg.

Available parameters to backend_kwargs:

  • connections_prefix: Specifies the prefix of the secret to read to get Connections.

  • variables_prefix: Specifies the prefix of the secret to read to get Variables.

  • gcp_key_path: Path to GCP Credential JSON file

  • gcp_scopes: Comma-separated string containing GCP scopes

  • sep: Separator used to concatenate connections_prefix and conn_id. Default: "-"

Note: The full GCP Secret Manager secret id should follow the pattern "[a-zA-Z0-9-_]".
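To make the prefix, separator and naming pattern concrete, the sketch below builds a full secret id and checks it against the allowed characters. build_secret_id is a hypothetical helper, and the regular expression is written for this example from the pattern in the note above.

```python
import re

# Letters, digits, underscore and hyphen, as in the pattern from the note.
SECRET_ID_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def build_secret_id(prefix: str, secret_id: str, sep: str = "-") -> str:
    """Concatenate prefix, separator and id, then validate the result."""
    full_id = f"{prefix}{sep}{secret_id}"
    if not SECRET_ID_PATTERN.fullmatch(full_id):
        raise ValueError(f"Invalid secret id: {full_id!r}")
    return full_id


conn_secret_id = build_secret_id("airflow-connections", "smtp_default")
var_secret_id = build_secret_id("airflow-variables", "hello")
```

A separator such as "/" would produce an id outside the allowed character set, which is why this sketch rejects it.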

Here is a sample configuration if you want to just retrieve connections:

[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"connections_prefix": "airflow-connections", "sep": "-"}

Here is a sample configuration if you want to just retrieve variables:

[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"variables_prefix": "airflow-variables", "sep": "-"}

and if you want to retrieve both Variables and connections use the following sample config:

[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "sep": "-"}

When gcp_key_path is not provided, it will use the Application Default Credentials (ADC) to obtain credentials.

Note

For more information about the Application Default Credentials (ADC), see:

The value of the Secret Manager secret must be the connection URI representation of the connection object.

Roll your own secrets backend

A secrets backend is a subclass of airflow.secrets.BaseSecretsBackend and must implement either get_connections() or get_conn_uri().

After writing your backend class, provide the fully qualified class name in the backend key in the [secrets] section of airflow.cfg.

Additional arguments to your SecretsBackend can be configured in airflow.cfg by supplying a JSON string to backend_kwargs, which will be passed to the __init__ of your SecretsBackend. See Configuration for more details, and SSM Parameter Store for an example.
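The shape of a custom backend can be sketched as below. To keep the example runnable without Airflow installed, it defines a local stand-in named BaseSecretsBackend; in a real deployment you would subclass airflow.secrets.BaseSecretsBackend instead. InlineBackend and its conn_uris parameter are hypothetical, and returning None for an unknown ID is an assumption of this sketch.

```python
import json
from typing import Dict, Optional


class BaseSecretsBackend:
    """Local stand-in so the sketch runs without Airflow installed."""


class InlineBackend(BaseSecretsBackend):
    """Hypothetical backend that serves connection URIs passed in backend_kwargs."""

    def __init__(self, conn_uris: Optional[Dict[str, str]] = None) -> None:
        self._conn_uris = dict(conn_uris or {})

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        """Return the URI for conn_id, or None if this backend does not know it."""
        return self._conn_uris.get(conn_id)


# JSON text as it could be supplied to backend_kwargs for this hypothetical class.
backend_kwargs = '{"conn_uris": {"smtp_default": "smtps://user:host@relay.example.com:465"}}'
backend = InlineBackend(**json.loads(backend_kwargs))
```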

Note

If you are rolling your own secrets backend, you don't strictly need to use Airflow's URI format. But doing so makes it easier to switch between environment variables, the metastore, and your secrets backend.