Configuring Data Fabric SASL and SSL for Hooks Connections

This topic describes configuration options for Data Fabric SASL and SSL for hook connections in Airflow.

Using Airflow, you can import data from and export data to multiple systems. Airflow provides a high-level interface called Hooks, which integrates with Connections to reach these systems.

A connection is an object that stores the information needed to reach a system: credentials such as your username and password, the hostname, the type of system you are connecting to, and other configuration options.
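As a minimal sketch of the pieces a connection holds, the dictionary below mirrors the fields described above. The connection ID, host, and credentials are placeholders, not values from this topic; hooks read their type-specific options from the JSON stored in the extra field.

```python
import json

# Placeholder values illustrating the parts of an Airflow connection.
connection = {
    "conn_id": "my_datafabric_conn",   # name a hook uses to look up the connection
    "conn_type": "hive_cli",           # type of system you are connecting to
    "host": "node1.cluster.example",   # hostname of the target service
    "login": "mapr",                   # username
    "password": "secret",              # credential
    # Additional options live as a JSON string in the "extra" field.
    "extra": json.dumps({"auth": "maprsasl"}),
}

print(connection["extra"])
```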

HPE Ezmeral Data Fabric 7.0.0 supports Data Fabric SASL authentication for Airflow.

To support Data Fabric SASL authentication for HPE Ezmeral Data Fabric 6.2.x, see Applying a Patch.

Airflow authenticates with Data Fabric SASL in the following ways:

Using the Ecosystem Component Client

To authenticate with Data Fabric SASL, Airflow uses the ecosystem component clients installed on the node. To submit tasks on a secure cluster, configure a Data Fabric user ticket. See Generating an HPE Ezmeral Data Fabric User Ticket.

Using the REST API or Thrift protocol

To authenticate with Data Fabric SASL, you can use the REST API or the Thrift protocol by setting additional configuration options.

WebEZFSHook (webezfs_default connection id)
To connect to the file system, set the following configuration options in the extra section of the connection configuration.
Data Fabric SASL: Set {"auth": "maprsasl"}.
SSL: On secure clusters, set the {"use_ssl": "true"} option. For a nondefault SSL configuration, set {"cert":"/path_to_truststore.pem"}.
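One way to apply these options is with the Airflow CLI. The sketch below builds the extra JSON for webezfs_default on a secure cluster and prints the corresponding command; the connection type shown and the truststore path are assumptions for illustration, not values from this topic.

```python
import json

# Extras for webezfs_default on a secure cluster with a nondefault
# truststore (the path is a placeholder).
extra = {
    "auth": "maprsasl",                   # Data Fabric SASL
    "use_ssl": "true",                    # enable SSL on secure clusters
    "cert": "/path_to_truststore.pem",    # nondefault truststore location
}

# Assumed Airflow 2 CLI invocation; --conn-type http is a guess for a
# REST-based hook and may differ in your environment.
command = (
    "airflow connections add webezfs_default "
    "--conn-type http --conn-extra '{}'".format(json.dumps(extra))
)
print(command)
```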
EzHiveCliHook (hive_cli_default connection id)
To connect to Hive, set the following configuration options in the connection configuration.
Data Fabric SASL: Set {"use_beeline": true, "auth": "maprsasl", "ssl":"true"}.
SSL: On secure clusters, set {"use_beeline": true, "auth": "maprsasl", "ssl":"true"}.
Add the following configuration options to the hive-site.xml file:
<property>
    <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
    <value>mapred.job.name|airflow.ctx.*</value>
</property>
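This property appends mapred.job.name and the airflow.ctx.* settings to the SQL-standard authorization whitelist, so that Airflow can set them in a Hive session. If you template hive-site.xml, the snippet above can be generated programmatically, as in this sketch:

```python
import xml.etree.ElementTree as ET

# Build the whitelist property shown above; the name/value pair
# matches the hive-site.xml snippet in this topic.
prop = ET.Element("property")
ET.SubElement(prop, "name").text = (
    "hive.security.authorization.sqlstd.confwhitelist.append"
)
ET.SubElement(prop, "value").text = "mapred.job.name|airflow.ctx.*"

print(ET.tostring(prop, encoding="unicode"))
```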
EzHiveMetastoreHook (metastore_default connection id)
To connect to Hive Metastore, set the following configuration options in the extra section of the connection configuration.
Data Fabric SASL: Set {"authMechanism":"MAPRSASL"}.
EzHiveServer2Hook (hiveserver2_default connection id)
To connect to HiveServer2, set the following configuration options in the extra section of the connection configuration.
Data Fabric SASL: Set {"authMechanism":"MAPRSASL"}.
SSL: On secure clusters, set the {"ssl": "true"} option. For a nondefault SSL configuration, set {"certificate":"/path_to_truststore.pem"}.
EzLivyHook (livy_default connection id)
To connect to Livy, set the following configuration options in the extra section of the connection configuration.
Data Fabric SASL: Set {"auth":"maprsasl"}.
SSL: On secure clusters, set the {"use_ssl": "true"} option. For a nondefault SSL configuration, set {"cert":"/path_to_truststore.pem"}.
EzS3Hook (aws_default connection id)
To connect to S3, set the following configuration options in the extra section of the connection configuration.
SSL: On secure clusters, for a nondefault SSL configuration, set the {"cert":"/path_to_truststore.pem"} option.
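Because the extras key names differ slightly between hooks (for example, "cert" for WebEZFS and Livy but "certificate" for HiveServer2, and "use_ssl" versus "ssl"), it can help to keep them in one place. The summary below mirrors the options listed in this topic; the truststore paths are the same placeholders used above.

```python
import json

# Per-hook extras from this topic, keyed by default connection ID.
# Paths are placeholders; note the differing SSL key names.
EXTRAS = {
    "webezfs_default":     {"auth": "maprsasl", "use_ssl": "true",
                            "cert": "/path_to_truststore.pem"},
    "hive_cli_default":    {"use_beeline": True, "auth": "maprsasl",
                            "ssl": "true"},
    "metastore_default":   {"authMechanism": "MAPRSASL"},
    "hiveserver2_default": {"authMechanism": "MAPRSASL", "ssl": "true",
                            "certificate": "/path_to_truststore.pem"},
    "livy_default":        {"auth": "maprsasl", "use_ssl": "true",
                            "cert": "/path_to_truststore.pem"},
    "aws_default":         {"cert": "/path_to_truststore.pem"},
}

for conn_id, extra in EXTRAS.items():
    print(conn_id, json.dumps(extra))
```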