Encrypted Shuffle and Encrypted Web UIs
- If you use Cloudera Manager, do not use these command-line instructions. For the Cloudera Manager instructions, see Configuring TLS/SSL for HDFS, YARN and MapReduce.
- This information applies specifically to CDH 5.15.0. If you use a lower version of CDH, see the documentation for that version located at Cloudera Documentation.
CDH 5 supports encryption of the MapReduce shuffle phase for both MapReduce v1 (MRv1) and MapReduce v2 (MRv2), also known as YARN. CDH also supports enabling TLS/SSL for the MRv1 and YARN web UIs, with optional client authentication (also known as bi-directional HTTPS, or HTTPS with client certificates). The configuration properties required to enable these features have been combined. In most cases, these properties are common to both MRv1 and YARN. They include:
- hadoop.ssl.enabled:
  - Toggles the shuffle for MRv1 between HTTP and HTTPS.
  - Toggles the MRv1 and YARN web UIs between HTTP and HTTPS.
- mapreduce.shuffle.ssl.enabled: Toggles the shuffle for YARN between HTTP and HTTPS.
  By default, this property is not specified in mapred-site.xml, and YARN encrypted shuffle is controlled by the value of hadoop.ssl.enabled. Setting this property to true enables encrypted shuffle for YARN, but only if hadoop.ssl.enabled is also set to true. Setting this property alone does not enable encrypted shuffle while hadoop.ssl.enabled is still false.
- Configuration settings for specifying keystore and truststore properties that are used by the MapReduce shuffle service, the Reducer tasks that fetch shuffle data, and the web UIs.
- ssl.server.truststore.reload.interval: Sets how often truststores are reloaded, so that certificate changes are picked up across the cluster when a node is added or removed.
When the web UIs are served over HTTPS, you must specify https:// as the protocol. There is no redirection from http://. If you attempt to access an HTTPS resource over HTTP, your browser will show an empty screen with no warning.
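Because there is no redirection, one way to confirm that a web UI is being served over HTTPS is to request it with an explicit https:// URL. The following is a minimal sketch using curl. The function name, host, port, and CA file path are placeholders chosen for illustration and are not defined by CDH.

```shell
# Hypothetical helper: request a web UI over HTTPS and print the HTTP status code.
# $1 = host
# $2 = port (use the HTTPS port your daemon is configured with)
# $3 = optional PEM file with the CA certificate that signed the server certificate
check_webui_https() {
  local host="$1" port="$2" cacert="${3:-}"
  if [ -n "$cacert" ]; then
    curl -sS -o /dev/null -w '%{http_code}\n' --cacert "$cacert" "https://${host}:${port}/"
  else
    curl -sS -o /dev/null -w '%{http_code}\n' "https://${host}:${port}/"
  fi
}

# Example (placeholder host and port):
# check_webui_https rm01.example.com 8090
```

If the certificate is not trusted by the system CA bundle, pass the CA file as the third argument rather than disabling verification.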
Configuring Encrypted Shuffle and Encrypted Web UIs
Configure encryption for the MapReduce shuffle, and the MRv1 and YARN web UIs, as follows:
- Enable encrypted shuffle for MRv1, and encryption for the MRv1 and YARN web UIs (core-site.xml)
- Enable encrypted shuffle for YARN (mapred-site.xml)
- Configure the keystore and truststore for the Shuffle server (ssl-server.xml)
- Configure the keystore and truststore for the Reducer/Fetcher (ssl-client.xml)
Enable encrypted shuffle for MRv1, and encryption for the MRv1 and YARN web UIs (core-site.xml)
- hadoop.ssl.enabled
  Default value: false
  For MRv1, set this value to true to enable encryption for both the MapReduce shuffle and the web UI.
  For YARN, this property enables encryption for the web UI only. Enable shuffle encryption with the mapreduce.shuffle.ssl.enabled property in the mapred-site.xml file, as described in Enable encrypted shuffle for YARN (mapred-site.xml).
- hadoop.ssl.require.client.cert
  Default value: false
  When this property is set to true, client certificates are required for all shuffle operations and for all browsers used to access the web UIs.
  Cloudera recommends that this be set to false, because client certificates can be read and misused by malicious clients or jobs. For more details, see Client Certificates.
- hadoop.ssl.hostname.verifier
  Default value: DEFAULT
  The SSLHostnameVerifier interface in the hadoop-common security library checks whether a hostname matches the name stored in the server's X.509 certificate. The value assigned to this property determines how Hadoop verifies hostnames when it establishes new HttpsURLConnection instances. Valid values are:
  - DEFAULT: The hostname must match either the first common name (CN) or any of the subjectAltNames (SAN). Wildcards can occur in either the CN or the SANs. For example, a certificate name with a wildcard, such as *.example.com, matches all subdomains, including test.cloudera.example.com.
  - DEFAULT_AND_LOCALHOST: This verifier works just like DEFAULT. However, it also allows all hostnames of the type: localhost, localhost.example, or 127.0.0.1.
  - STRICT: This verifier works just like DEFAULT, with an additional restriction for names with wildcards. A name with a wildcard, such as *.example.com, matches only subdomains at the same level. Hence, cloudera.example.com matches but, unlike with DEFAULT, test.cloudera.example.com is rejected.
  - STRICT_IE6: This verifier works just like STRICT. However, it allows hostnames that match any of the common names (CN) within the server's X.509 certificate, not just the first one.
  - ALLOW_ALL: This verifier essentially turns off hostname verification.
- hadoop.ssl.keystores.factory.class
  Default value: org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory
  The KeyStoresFactory implementation to be used. Currently, FileBasedKeyStoresFactory is the only implementation of KeyStoresFactory.
- hadoop.ssl.server.conf
  Default value: ssl-server.xml
  Resource file from which TLS/SSL server keystore information is extracted. Typically, it should be in the /etc/hadoop/conf/ directory so that it can be looked up in the CLASSPATH.
- hadoop.ssl.client.conf
  Default value: ssl-client.xml
  Resource file from which TLS/SSL client keystore information is extracted. Typically, it should be in the /etc/hadoop/conf/ directory so that it can be looked up in the CLASSPATH.
...
<property>
  <name>hadoop.ssl.require.client.cert</name>
  <value>false</value>
  <final>true</final>
</property>
<property>
  <name>hadoop.ssl.hostname.verifier</name>
  <value>DEFAULT</value>
  <final>true</final>
</property>
<property>
  <name>hadoop.ssl.keystores.factory.class</name>
  <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
  <final>true</final>
</property>
<property>
  <name>hadoop.ssl.server.conf</name>
  <value>ssl-server.xml</value>
  <final>true</final>
</property>
<property>
  <name>hadoop.ssl.client.conf</name>
  <value>ssl-client.xml</value>
  <final>true</final>
</property>
<property>
  <name>hadoop.ssl.enabled</name>
  <value>true</value>
</property>
...
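The difference between the DEFAULT and STRICT values of hadoop.ssl.hostname.verifier, as described above, comes down to how many subdomain levels a wildcard may cover. The following sketch illustrates only that rule. It is not Hadoop's SSLHostnameVerifier code, the function name is hypothetical, and it ignores details such as case-insensitive comparison and the CN versus SAN distinction.

```shell
# Illustrative only: this is NOT Hadoop's SSLHostnameVerifier implementation.
# $1 = verifier (DEFAULT or STRICT)
# $2 = name from the certificate, possibly starting with "*."
# $3 = hostname being verified
# Returns 0 on a match and 1 otherwise.
cert_name_matches() {
  local verifier="$1" cert_name="$2" host="$3" suffix prefix
  case "$cert_name" in
    '*.'*)
      suffix="${cert_name#\*}"            # "*.example.com" -> ".example.com"
      case "$host" in
        *"$suffix") prefix="${host%"$suffix"}" ;;
        *) return 1 ;;
      esac
      [ -n "$prefix" ] || return 1        # the wildcard must cover something
      if [ "$verifier" = "STRICT" ]; then
        case "$prefix" in
          *.*) return 1 ;;                # STRICT: only one subdomain level
        esac
      fi
      return 0
      ;;
    *)
      [ "$cert_name" = "$host" ]          # no wildcard: exact match
      ;;
  esac
}

cert_name_matches DEFAULT '*.example.com' test.cloudera.example.com && echo "DEFAULT: match"
cert_name_matches STRICT  '*.example.com' test.cloudera.example.com || echo "STRICT: rejected"
```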
Enable encrypted shuffle for YARN (mapred-site.xml)
To enable encrypted shuffle for YARN, set the following property in the mapred-site.xml file on every node in the cluster:
- mapreduce.shuffle.ssl.enabled
  Default value: Not specified
  By default, this property is not specified in mapred-site.xml, and YARN encrypted shuffle is controlled by the value of hadoop.ssl.enabled. Setting this property to true enables encrypted shuffle for YARN, but only if hadoop.ssl.enabled is also set to true. Setting this property alone does not enable encrypted shuffle while hadoop.ssl.enabled is still false.
...
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
  <final>true</final>
</property>
...
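The interaction between hadoop.ssl.enabled and mapreduce.shuffle.ssl.enabled can be restated as a small truth function. This sketch encodes one reading of the rules in this section and is not taken from the Hadoop source. The function name is hypothetical, and treating an explicit false for mapreduce.shuffle.ssl.enabled as "shuffle not encrypted" is an assumption.

```shell
# Hypothetical helper restating the rules above; not Hadoop code.
# $1 = value of hadoop.ssl.enabled (true or false)
# $2 = value of mapreduce.shuffle.ssl.enabled (true, false, or empty if not specified)
# Prints "true" if YARN shuffle would be encrypted under these rules, else "false".
yarn_shuffle_encrypted() {
  local hadoop_ssl="${1:-false}" shuffle_ssl="${2:-}"
  # Property not specified: the result follows hadoop.ssl.enabled.
  if [ -z "$shuffle_ssl" ]; then
    shuffle_ssl="$hadoop_ssl"
  fi
  # Setting only the shuffle property is not enough; both must be true.
  if [ "$hadoop_ssl" = "true" ] && [ "$shuffle_ssl" = "true" ]; then
    echo "true"
  else
    echo "false"
  fi
}

yarn_shuffle_encrypted false true   # shuffle property alone is not sufficient
yarn_shuffle_encrypted true true
```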
Configure the keystore and truststore for the Shuffle server (ssl-server.xml)
- Configure the Linux Task Controller for MRv1
- Configure the Linux Container Executor for YARN
Currently, FileBasedKeyStoresFactory is the only implementation of KeyStoresFactory. It uses properties in the ssl-server.xml and ssl-client.xml files to configure the keystores and truststores.
The ssl-server.xml file should be owned by the hdfs or mapred Hadoop system user, belong to the hadoop group, and have 440 permissions. Regular users should not belong to the hadoop group.
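The ownership and permission requirements for ssl-server.xml can be applied with chown and chmod. The following is a minimal sketch. The function name is hypothetical, the path in the example assumes the /etc/hadoop/conf/ directory mentioned earlier, and the commands normally need to be run as root.

```shell
# Hypothetical helper: apply the ownership and mode described above.
# $1 = path to ssl-server.xml
# $2 = owning user (hdfs or mapred); defaults to mapred
secure_ssl_server_conf() {
  local conf="$1" owner="${2:-mapred}"
  chown "${owner}:hadoop" "$conf" && chmod 440 "$conf"
}

# Example (run as root; path is an assumption):
# secure_ssl_server_conf /etc/hadoop/conf/ssl-server.xml mapred
```

Mode 440 gives read access to the owner and the hadoop group only, which is why regular users must not be members of that group.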
Use the following settings to configure the keystores and truststores in the ssl-server.xml file.
Property | Default Value | Description |
---|---|---|
ssl.server.keystore.type | jks | Keystore file type |
ssl.server.keystore.location | NONE | Keystore file location. The mapred user must own this file and have exclusive read access to it. |
ssl.server.keystore.password | NONE | Keystore file password |
ssl.server.keystore.keypassword | NONE | Key password |
ssl.server.truststore.type | jks | Truststore file type |
ssl.server.truststore.location | NONE | Truststore file location. The mapred user must own this file and have exclusive read access to it. |
ssl.server.truststore.password | NONE | Truststore file password |
ssl.server.truststore.reload.interval | 10000 | Truststore reload interval, in milliseconds |
Sample ssl-server.xml
<configuration>
  <!-- Server Certificate Store -->
  <property>
    <name>ssl.server.keystore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.location</name>
    <value>${user.home}/keystores/server-keystore.jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>serverfoo</value>
  </property>
  <property>
    <name>ssl.server.keystore.keypassword</name>
    <value>serverfoo</value>
  </property>
  <!-- Server Truststore -->
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>${user.home}/keystores/truststore.jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.password</name>
    <value>clientserverbar</value>
  </property>
  <property>
    <name>ssl.server.truststore.reload.interval</name>
    <value>10000</value>
  </property>
</configuration>
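Several ssl-server.xml properties are listed with a default of NONE, so a quick check that each one appears in the file can catch omissions before a restart. The sketch below is a plain text search, not an XML parser, and the function name is hypothetical. Whether every one of these properties is needed in a given deployment is not something this check can decide.

```shell
# Hypothetical helper: report ssl-server.xml properties that have no default
# (listed as NONE in the table above) and are not named in the file.
# $1 = path to ssl-server.xml
# Returns 0 if all are present, 1 if any is missing.
check_ssl_server_conf() {
  local conf="$1" missing=0 prop
  for prop in \
      ssl.server.keystore.location \
      ssl.server.keystore.password \
      ssl.server.keystore.keypassword \
      ssl.server.truststore.location \
      ssl.server.truststore.password; do
    # -F treats the pattern as a fixed string, so dots are matched literally.
    if ! grep -qF "<name>${prop}</name>" "$conf"; then
      echo "missing: ${prop}"
      missing=1
    fi
  done
  return "$missing"
}

# Example (path is an assumption):
# check_ssl_server_conf /etc/hadoop/conf/ssl-server.xml
```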
Configure the keystore and truststore for the Reducer/Fetcher (ssl-client.xml)
Use the following settings to configure the keystore and truststore in the ssl-client.xml file. This file must be owned by the mapred user for MRv1 and by the yarn user for YARN. The file permissions should be 444 (read access for all users).
Property | Default Value | Description |
---|---|---|
ssl.client.keystore.type | jks | Keystore file type |
ssl.client.keystore.location | NONE | Keystore file location. The mapred user must own this file and should have read access to it. |
ssl.client.keystore.password | NONE | Keystore file password |
ssl.client.keystore.keypassword | NONE | Key password |
ssl.client.truststore.type | jks | Truststore file type |
ssl.client.truststore.location | NONE | Truststore file location. The mapred user must own this file and should have read access to it. |
ssl.client.truststore.password | NONE | Truststore file password |
ssl.client.truststore.reload.interval | 10000 | Truststore reload interval, in milliseconds |
Sample ssl-client.xml
<configuration>
  <!-- Client Certificate Store -->
  <property>
    <name>ssl.client.keystore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.client.keystore.location</name>
    <value>${user.home}/keystores/client-keystore.jks</value>
  </property>
  <property>
    <name>ssl.client.keystore.password</name>
    <value>clientfoo</value>
  </property>
  <property>
    <name>ssl.client.keystore.keypassword</name>
    <value>clientfoo</value>
  </property>
  <!-- Client Truststore -->
  <property>
    <name>ssl.client.truststore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.location</name>
    <value>${user.home}/keystores/truststore.jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.password</name>
    <value>clientserverbar</value>
  </property>
  <property>
    <name>ssl.client.truststore.reload.interval</name>
    <value>10000</value>
  </property>
</configuration>
Activating Encrypted Shuffle
Encrypted shuffle has a significant performance impact. You should benchmark this before implementing it in production. In many cases, one or more additional cores are needed to maintain performance.
When you have made the configuration changes described in the previous section, activate encrypted shuffle by restarting all TaskTrackers in MRv1 and all NodeManagers in YARN.
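On each node, the restart might look like the following sketch. The function name is hypothetical, and the service names hadoop-0.20-mapreduce-tasktracker and hadoop-yarn-nodemanager are assumptions based on a package-based CDH 5 installation, so check the names used on your hosts.

```shell
# Hypothetical helper: restart the daemon that serves shuffle data on this node.
# $1 = mrv1 or yarn
# The service names below are assumptions for a package-based CDH 5 install.
restart_shuffle_daemon() {
  case "$1" in
    mrv1) service hadoop-0.20-mapreduce-tasktracker restart ;;
    yarn) service hadoop-yarn-nodemanager restart ;;
    *) echo "usage: restart_shuffle_daemon mrv1|yarn" >&2; return 2 ;;
  esac
}

# Example (run as root on every worker node):
# restart_shuffle_daemon yarn
```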
Client Certificates
Client certificates are supported, but they do not guarantee that the client is a reducer task for the job. The client certificate keystore file that contains the private key must be readable by all users who submit jobs to the cluster, so a rogue job could read that keystore file and use the client certificate in it to establish a secure connection with a Shuffle server. The JobToken mechanism that Hadoop provides protects the data better: each job uses its own JobToken to retrieve only the shuffle data that belongs to it. Unless the rogue job has a proper JobToken, it cannot retrieve shuffle data from the Shuffle server.
However, if your cluster requires client certificates, ensure that browsers connecting to the web UIs are configured with appropriately signed certificates. If your certificates are signed by a certificate authority (CA), make sure you include the complete chain of CA certificates in the server's keystore.
Reloading Truststores
By default, each truststore reloads its configuration every 10 seconds. If you bring in a new truststore file to replace an old one, the new certificates override the previous ones when the truststore is reloaded. If a client certificate is added to (or removed from) all the truststore files in the system, both YARN and MRv1 pick up the new configuration without a restart of the TaskTracker or NodeManager daemons. This mechanism is useful for adding or removing nodes from the cluster, or for adding or removing trusted clients.
The reload interval is controlled by the ssl.client.truststore.reload.interval and ssl.server.truststore.reload.interval configuration properties in the ssl-client.xml and ssl-server.xml files described in the preceding sections.
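Replacing a truststore file and then waiting one reload interval could be scripted as in the sketch below. The function name and the temporary file suffix are hypothetical. The copy-then-rename step is a general precaution so that a reader never sees a partially written file, not something this document prescribes. The new file must still meet the ownership and permission requirements stated for the truststore.

```shell
# Hypothetical helper: swap in a new truststore, then wait one reload interval.
# $1 = new truststore file
# $2 = truststore path configured in ssl-server.xml or ssl-client.xml
# $3 = reload interval in milliseconds (defaults to 10000, the documented default)
replace_truststore() {
  local new="$1" target="$2" interval_ms="${3:-10000}"
  local tmp="${target}.new"
  # Copy next to the target first, then rename over it in one step.
  cp "$new" "$tmp" && mv "$tmp" "$target" || return 1
  # Wait for the interval, rounded up to whole seconds.
  sleep $(( (interval_ms + 999) / 1000 ))
}

# Example (paths are assumptions):
# replace_truststore /tmp/truststore-updated.jks /etc/hadoop/conf/truststore.jks
```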
Debugging
To enable TLS/SSL debugging in the reducers, set the mapred.reduce.child.java.opts property as follows. You can do this on a per-job basis, or by means of a cluster-wide setting in mapred-site.xml:
<configuration>
  ...
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xmx200m -Djavax.net.debug=all</value>
  </property>
  ...
</configuration>
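For the per-job case, the same property can be passed on the command line with the generic -D option. The sketch below assumes the job's driver parses generic options (for example, by running through ToolRunner). The function name, JAR name, class name, and paths are placeholders.

```shell
# Hypothetical helper: submit a job with TLS/SSL debugging enabled in its reducers.
# $1 = job JAR, $2 = main class, remaining arguments are passed to the job.
# Assumes the driver parses generic options such as -D.
run_job_with_ssl_debug() {
  local jar="$1" main_class="$2"
  shift 2
  hadoop jar "$jar" "$main_class" \
    -Dmapred.reduce.child.java.opts="-Xmx200m -Djavax.net.debug=all" \
    "$@"
}

# Example (placeholder names):
# run_job_with_ssl_debug my-job.jar com.example.MyJob /input /output
```

The debug output is verbose, so a per-job setting is usually easier to manage than a cluster-wide one.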
To enable debugging for MRv1 TaskTrackers, edit hadoop-env.sh as follows:
HADOOP_TASKTRACKER_OPTS="-Djavax.net.debug=all $HADOOP_TASKTRACKER_OPTS"
To enable debugging for YARN NodeManagers, edit yarn-env.sh as follows:
YARN_OPTS="-Djavax.net.debug=all $YARN_OPTS"
©2016 Cloudera, Inc. All rights reserved