Google Cloud AutoML Operators

Google Cloud AutoML makes the power of machine learning available to you even if you have limited knowledge of machine learning. You can use AutoML to build on Google's machine learning capabilities, create custom models tailored to your business needs, and then integrate those models into your applications and websites.

Prerequisite Tasks

To use these operators, you must do a few things:

Creating Datasets

To create a Google AutoML dataset, you can use AutoMLCreateDatasetOperator. The operator returns the dataset id in XCom under the dataset_id key.

airflow/providers/google/cloud/example_dags/example_automl_tables.py

create_dataset_task = AutoMLCreateDatasetOperator(
    task_id="create_dataset_task",
    dataset=DATASET,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

dataset_id = (
    "{{ task_instance.xcom_pull('create_dataset_task', key='dataset_id') }}"
)
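The DATASET object passed to the operator is not shown in this excerpt. As a rough sketch, assuming an AutoML Tables dataset, it can be a plain dictionary that mirrors the fields of the AutoML Dataset message. The display name below is a placeholder, and the empty target column id is an assumption based on the update step later in this guide, which fills it in.

```python
# Hypothetical dataset definition for AutoML Tables; the values are placeholders.
DATASET = {
    "display_name": "test_set",
    # Left empty here; the target column is set once the column specs are known.
    "tables_dataset_metadata": {"target_column_spec_id": ""},
}
```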

After creating a dataset, you can import data into it using AutoMLImportDataOperator.

import_dataset_task = AutoMLImportDataOperator(
    task_id="import_dataset_task",
    dataset_id=dataset_id,
    location=GCP_AUTOML_LOCATION,
    input_config=IMPORT_INPUT_CONFIG,
)
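IMPORT_INPUT_CONFIG is also defined outside this excerpt. A minimal sketch, assuming the data is a CSV file in Cloud Storage, is a dictionary in the shape of the AutoML InputConfig message. The variable GCP_AUTOML_DATASET_BUCKET and its URI are hypothetical.

```python
# Hypothetical Cloud Storage location of the CSV file to import.
GCP_AUTOML_DATASET_BUCKET = "gs://your-bucket/path/to/data.csv"

# InputConfig in dictionary form: import from one or more Cloud Storage URIs.
IMPORT_INPUT_CONFIG = {"gcs_source": {"input_uris": [GCP_AUTOML_DATASET_BUCKET]}}
```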

To update a dataset, you can use AutoMLTablesUpdateDatasetOperator.

from copy import deepcopy

# Work on a copy so the original DATASET definition is left unchanged.
update = deepcopy(DATASET)
update["name"] = '{{ task_instance.xcom_pull("create_dataset_task")["name"] }}'
update["tables_dataset_metadata"][  # type: ignore
    "target_column_spec_id"
] = "{{ get_target_column_spec(task_instance.xcom_pull('list_columns_spec_task'), target) }}"

update_dataset_task = AutoMLTablesUpdateDatasetOperator(
    task_id="update_dataset_task",
    dataset=update,
    location=GCP_AUTOML_LOCATION,
)

Listing Table And Column Specs

To list table specs you can use AutoMLTablesListTableSpecsOperator.

list_tables_spec_task = AutoMLTablesListTableSpecsOperator(
    task_id="list_tables_spec_task",
    dataset_id=dataset_id,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

To list column specs you can use AutoMLTablesListColumnSpecsOperator.

list_columns_spec_task = AutoMLTablesListColumnSpecsOperator(
    task_id="list_columns_spec_task",
    dataset_id=dataset_id,
    table_spec_id="{{ extract_object_id(task_instance.xcom_pull('list_tables_spec_task')[0]) }}",
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)
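The templates in this guide call two helpers, extract_object_id and get_target_column_spec, whose definitions are not part of this excerpt. The sketch below shows one plausible implementation, assuming each spec is a dictionary with a "name" resource path and a "display_name". The sample column specs are made up for illustration. For the templates to call such functions, they would likely need to be exposed to Jinja, for example through the DAG's user_defined_macros argument.

```python
from typing import Dict, List


def extract_object_id(obj: Dict) -> str:
    """Return the last path segment of the resource name stored under "name"."""
    return obj["name"].rpartition("/")[-1]


def get_target_column_spec(columns_specs: List[Dict], column_name: str) -> str:
    """Return the id of the column spec whose display name equals column_name."""
    for column in columns_specs:
        if column["display_name"] == column_name:
            return extract_object_id(column)
    raise ValueError(f"Unknown target column: {column_name}")


# Made-up column specs in the shape assumed above.
columns = [
    {
        "name": "projects/p/locations/l/datasets/d/tableSpecs/t/columnSpecs/111",
        "display_name": "Age",
    },
    {
        "name": "projects/p/locations/l/datasets/d/tableSpecs/t/columnSpecs/222",
        "display_name": "Deposit",
    },
]

target_id = get_target_column_spec(columns, "Deposit")  # "222"
```

Raising an error for an unknown column makes a misspelled target name fail the task early instead of silently sending an empty id to the update call.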

Operations On Models

To create a Google AutoML model, you can use AutoMLTrainModelOperator. The operator waits for the training operation to complete. It also returns the model id in XCom under the model_id key.

create_model_task = AutoMLTrainModelOperator(
    task_id="create_model_task",
    model=MODEL,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

model_id = "{{ task_instance.xcom_pull('create_model_task', key='model_id') }}"
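The MODEL object is defined outside this excerpt. As a hedged sketch for AutoML Tables, it can be a dictionary shaped like the AutoML Model message that points at the dataset created earlier. The display name and the training budget are placeholder values, not recommendations.

```python
# Templated dataset id, resolved from XCom when the task runs.
dataset_id = "{{ task_instance.xcom_pull('create_dataset_task', key='dataset_id') }}"

# Hypothetical model definition; 1000 milli node hours equals one node hour.
MODEL = {
    "display_name": "auto_model_1",
    "dataset_id": dataset_id,
    "tables_model_metadata": {"train_budget_milli_node_hours": 1000},
}
```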

To get an existing model, you can use AutoMLGetModelOperator.

get_model_task = AutoMLGetModelOperator(
    task_id="get_model_task",
    model_id=MODEL_ID,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

Once a model is created, it can be deployed using AutoMLDeployModelOperator.

deploy_model_task = AutoMLDeployModelOperator(
    task_id="deploy_model_task",
    model_id=MODEL_ID,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

If you wish to delete a model you can use AutoMLDeleteModelOperator.

delete_model_task = AutoMLDeleteModelOperator(
    task_id="delete_model_task",
    model_id=model_id,
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

Making Predictions

To obtain predictions from a Google Cloud AutoML model, you can use AutoMLPredictOperator or AutoMLBatchPredictOperator. AutoMLPredictOperator requires the model to be deployed.

predict_task = AutoMLPredictOperator(
    task_id="predict_task",
    model_id=MODEL_ID,
    payload={},  # Add your own payload, the used model_id must be deployed
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

batch_predict_task = AutoMLBatchPredictOperator(
    task_id="batch_predict_task",
    model_id=MODEL_ID,
    input_config={},  # Add your config
    output_config={},  # Add your config
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)
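The batch prediction example leaves input_config and output_config empty. One possible shape, assuming both the input and the results live in Cloud Storage, follows the BatchPredictInputConfig and BatchPredictOutputConfig messages. The names BATCH_INPUT_CONFIG and BATCH_OUTPUT_CONFIG and the bucket paths are hypothetical.

```python
# Hypothetical Cloud Storage paths; replace them with your own.
BATCH_INPUT_CONFIG = {
    "gcs_source": {"input_uris": ["gs://your-bucket/batch/input.csv"]}
}
BATCH_OUTPUT_CONFIG = {
    "gcs_destination": {"output_uri_prefix": "gs://your-bucket/batch/output/"}
}
```

These dictionaries would then be passed as input_config and output_config in place of the empty placeholders.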

Listing And Deleting Datasets

You can get a list of AutoML datasets using AutoMLListDatasetOperator. The operator returns a list of dataset ids in XCom under the dataset_id_list key.

list_datasets_task = AutoMLListDatasetOperator(
    task_id="list_datasets_task",
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)

To delete a dataset, you can use AutoMLDeleteDatasetOperator. The delete operator also accepts a list or a comma-separated string of dataset ids to delete.

delete_datasets_task = AutoMLDeleteDatasetOperator(
    task_id="delete_datasets_task",
    dataset_id="{{ task_instance.xcom_pull('list_datasets_task', key='dataset_id_list') | list }}",
    location=GCP_AUTOML_LOCATION,
    project_id=GCP_PROJECT_ID,
)
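Because a rendered template is a string, the dataset_id value above arrives as text that looks like a Python list. The following sketch illustrates how a list, a stringified list, or a comma-separated string could be normalized into a list of ids. The function name parse_dataset_ids is hypothetical, and this is not necessarily how the operator does it internally.

```python
import ast
from typing import List, Union


def parse_dataset_ids(dataset_id: Union[str, List[str]]) -> List[str]:
    """Normalize a list, a stringified list, or a comma-separated string."""
    if not isinstance(dataset_id, str):
        return list(dataset_id)
    try:
        # Handles rendered templates such as "['TBL1', 'TBL2']".
        parsed = ast.literal_eval(dataset_id)
    except (SyntaxError, ValueError):
        parsed = None
    if isinstance(parsed, (list, tuple)):
        return [str(item) for item in parsed]
    # Fall back to splitting a plain comma-separated string.
    return [part.strip() for part in dataset_id.split(",") if part.strip()]


ids_from_list = parse_dataset_ids(["TBL1", "TBL2"])
ids_from_template = parse_dataset_ids("['TBL1', 'TBL2']")
ids_from_csv = parse_dataset_ids("TBL1, TBL2")
```

All three calls produce the same two-element list, so downstream code can loop over the ids without caring how they were supplied.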