Cloudera Enterprise 5.15.x

Configuring Oozie to Enable MapReduce Jobs To Read/Write from Microsoft Azure (ADLS)

MapReduce jobs controlled by Oozie as part of a workflow can read from and write to Azure Data Lake Storage (ADLS). The steps below show you how to enable this capability. Before you begin, you will need the following information from your Microsoft Azure account:
  • The client ID.
  • The client secret.
  • The refresh URL. To get this value, in the Azure portal, go to Azure Active Directory > App registrations > Endpoints. In the Endpoints region, copy the OAUTH 2.0 TOKEN ENDPOINT. This is the value you need for the refresh_URL, below.
After storing these credentials in the keystore (the JCEKS file), specify the path to this keystore in the Oozie workflow configuration.
  Note: This setup is for use in the context of Oozie workflows only, and does not support running shell scripts on Microsoft Azure or other types of scenarios.
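
Because a mistyped refresh URL only surfaces later as an authentication failure, it can help to sanity-check the value before storing it. The following is a minimal sketch, not part of the Cloudera procedure: is_token_endpoint is a hypothetical helper, and the URL pattern is an assumption based on the common public-cloud form of the Azure Active Directory token endpoint (https://login.microsoftonline.com/TENANT/oauth2/token). Other Azure clouds use different hosts and would need a different pattern.

```shell
# Hypothetical helper (not part of Hadoop or Oozie): returns 0 if the
# argument looks like an Azure AD OAuth 2.0 token endpoint, 1 otherwise.
# ASSUMPTION: the endpoint is on login.microsoftonline.com and its path
# contains /oauth2/ and ends in "token".
is_token_endpoint() {
  case "$1" in
    https://login.microsoftonline.com/*/oauth2/*token) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_token_endpoint "$REFRESH_URL" || echo "unexpected refresh URL" prints a warning when the value copied from the portal does not match the assumed form. REFRESH_URL is an illustrative variable name.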

In the steps below, replace path/to/file with the HDFS location of the .jceks file, replace USER_NAME with your HDFS user name, and replace the client ID, client secret, and refresh URL placeholders with the values from your Microsoft Azure account.

  1. Create the credential store (.jceks) and add your Azure Client ID, Client Secret, and refresh URL to the store as follows:
    hadoop credential create dfs.adls.oauth2.client.id -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value client_ID
    hadoop credential create dfs.adls.oauth2.credential -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value client_secret
    hadoop credential create dfs.adls.oauth2.refresh.url -provider jceks://hdfs/user/USER_NAME/adlskeyfile.jceks -value refresh_URL
  2. In Oozie's workflow.xml file, set hadoop.security.credential.provider.path to the path of the .jceks file. Place the property in the <configuration> section of the MapReduce action so that the MapReduce framework can load the Azure credentials that give access to ADLS:
    <action name="ADLSjob">
        <map-reduce>
            <job-tracker>${jobtracker}</job-tracker>
            <name-node>${namenode}</name-node>
            <configuration>
                <property>
                    <name>hadoop.security.credential.provider.path</name>
                    <value>jceks://hdfs/path/to/file.jceks</value>
                </property>
                ....
            </configuration>
            ....
        </map-reduce>
        ....
    </action>
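
The commands in step 1 can be wrapped in a small script that also confirms the aliases were written. This is a sketch under stated assumptions rather than the documented procedure: create_adls_credentials and list_adls_credentials are hypothetical function names, and PROVIDER reuses the placeholder path from step 1. The sketch omits -value, in which case hadoop credential create is expected to prompt for the value, which keeps the client secret out of the shell history.

```shell
#!/usr/bin/env bash
# Placeholder keystore path from step 1; replace USER_NAME before use.
PROVIDER="jceks://hdfs/user/USER_NAME/adlskeyfile.jceks"

# Hypothetical wrapper: creates the three ADLS OAuth2 aliases in the
# keystore given as $1. Stops at the first failing command.
create_adls_credentials() {
  for cred_alias in dfs.adls.oauth2.client.id \
                    dfs.adls.oauth2.credential \
                    dfs.adls.oauth2.refresh.url; do
    # No -value argument: the value is entered at the prompt instead.
    hadoop credential create "$cred_alias" -provider "$1" || return 1
  done
}

# Hypothetical wrapper: lists the aliases stored in the keystore given
# as $1, so you can check that all three entries exist.
list_adls_credentials() {
  hadoop credential list -provider "$1"
}
```

After sourcing the script, running create_adls_credentials "$PROVIDER" && list_adls_credentials "$PROVIDER" should show the three dfs.adls.oauth2.* aliases if the keystore was written successfully.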
Page generated May 18, 2018.