airflow.contrib.hooks.aws_athena_hook

This module contains AWS Athena hook

Module Contents

class airflow.contrib.hooks.aws_athena_hook.AWSAthenaHook(aws_conn_id='aws_default', sleep_time=30, *args, **kwargs)[source]

Bases: airflow.contrib.hooks.aws_hook.AwsHook

Interact with AWS Athena to run, poll queries and return query results

Parameters
  • aws_conn_id (str) – aws connection to use.

  • sleep_time (int) – Time to wait between two consecutive call to check query status on athena

INTERMEDIATE_STATES = ['QUEUED', 'RUNNING'][source]
FAILURE_STATES = ['FAILED', 'CANCELLED'][source]
SUCCESS_STATES = ['SUCCEEDED'][source]
get_conn(self)[source]

check if aws conn exists already or create one and return it

Returns

boto3 session

run_query(self, query, query_context, result_configuration, client_request_token=None, workgroup='primary')[source]

Run Presto query on athena with provided config and return submitted query_execution_id

Parameters
  • query (str) – Presto query to run

  • query_context (dict) – Context in which query need to be run

  • result_configuration (dict) – Dict with path to store results in and config related to encryption

  • client_request_token (str) – Unique token created by user to avoid multiple executions of same query

  • workgroup (str) – Athena workgroup name, when not specified, will be ‘primary’

Returns

str

check_query_status(self, query_execution_id)[source]

Fetch the status of submitted athena query. Returns None or one of valid query states.

Parameters

query_execution_id (str) – Id of submitted athena query

Returns

str

get_state_change_reason(self, query_execution_id)[source]

Fetch the reason for a state change (e.g. error message). Returns None or reason string.

Parameters

query_execution_id (str) – Id of submitted athena query

Returns

str

get_query_results(self, query_execution_id)[source]

Fetch submitted athena query results. returns none if query is in intermediate state or failed/cancelled state else dict of query output

Parameters

query_execution_id (str) – Id of submitted athena query

Returns

dict

poll_query_status(self, query_execution_id, max_tries=None)[source]

Poll the status of submitted athena query until query state reaches final state. Returns one of the final states

Parameters
  • query_execution_id (str) – Id of submitted athena query

  • max_tries (int) – Number of times to poll for query state before function exits

Returns

str

stop_query(self, query_execution_id)[source]

Cancel the submitted athena query

Parameters

query_execution_id (str) – Id of submitted athena query

Returns

dict