Configuring YARN Security
This page explains how to configure, start, and test secure YARN. For instructions on MapReduce1, see Configuring MRv1 Security.
Step 1: Configure Secure YARN
Before you start:
- The Kerberos principals for the ResourceManager and NodeManager are configured in the yarn-site.xml file. The same yarn-site.xml file must be installed on every host machine in the cluster.
- Make sure that each user who runs YARN jobs exists on all cluster nodes (that is, on every node that hosts any YARN daemon).
To configure secure YARN:
- Add the following properties to the yarn-site.xml file on every machine in the cluster:
<!-- ResourceManager security configs -->
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value>   <!-- path to the YARN keytab -->
</property>
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>yarn/_HOST@YOUR-REALM.COM</value>
</property>

<!-- NodeManager security configs -->
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value>   <!-- path to the YARN keytab -->
</property>
<property>
  <name>yarn.nodemanager.principal</name>
  <value>yarn/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>

<!-- To enable TLS/SSL -->
<property>
  <name>yarn.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
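The `_HOST` token in the principals above is expanded by Hadoop at startup to each machine's fully qualified domain name, which is why the same yarn-site.xml can be installed cluster-wide. A minimal sketch of that substitution (the hostname shown is a hypothetical example, not from this guide):

```python
# Sketch of how Hadoop expands the _HOST token in a Kerberos principal
# to the local machine's fully qualified domain name (lowercased, as
# Hadoop's SecurityUtil does). The fqdn here is a hypothetical example.
def expand_principal(principal: str, fqdn: str) -> str:
    """Replace the _HOST placeholder with the local hostname."""
    return principal.replace("_HOST", fqdn.lower())

print(expand_principal("yarn/_HOST@YOUR-REALM.COM", "node01.example.com"))
# yarn/node01.example.com@YOUR-REALM.COM
```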
- Add the following properties to the mapred-site.xml file on every machine in the cluster:
<!-- MapReduce JobHistory Server security configs -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>host:port</value>   <!-- Host and port of the MapReduce JobHistory Server; default port is 10020 -->
</property>
<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/hadoop/conf/mapred.keytab</value>   <!-- path to the MAPRED keytab for the JobHistory Server -->
</property>
<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>mapred/_HOST@YOUR-REALM.COM</value>
</property>

<!-- To enable TLS/SSL -->
<property>
  <name>mapreduce.jobhistory.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
- Create a file called container-executor.cfg for the Linux Container Executor program that contains the following information:
yarn.nodemanager.local-dirs=<comma-separated list of paths to local NodeManager directories. Should be same values specified in yarn-site.xml. Required to validate paths passed to container-executor in order.>
yarn.nodemanager.linux-container-executor.group=yarn
yarn.nodemanager.log-dirs=<comma-separated list of paths to local NodeManager log directories. Should be same values specified in yarn-site.xml. Required to set proper permissions on the log files so that they can be written to by the user's containers and read by the NodeManager for log aggregation.>
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
Note: In the container-executor.cfg file, the banned.users property defaults to hdfs, yarn, mapred, and bin to prevent jobs from being submitted from those user accounts. The min.user.id property defaults to 1000 to prevent jobs from being submitted with a user ID lower than 1000, which conventionally belongs to system accounts. Some operating systems, such as CentOS 5, assign regular user IDs starting at 500 rather than 1000; if that is the case on your system, change min.user.id to 500. If a job is submitted from a user account whose ID is lower than the min.user.id value, the NodeManager returns error code 255.
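To find accounts that would trip the min.user.id check before jobs fail with error code 255, you can scan the password database. A quick sketch that parses /etc/passwd-style lines (the sample entries below are illustrative, not real cluster data):

```python
# Sketch: list accounts whose UID falls below min.user.id, which the
# LinuxContainerExecutor would reject with error code 255.
def low_uid_users(passwd_lines, min_user_id=1000):
    """Return (user, uid) pairs whose UID is below min_user_id."""
    offenders = []
    for line in passwd_lines:
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            uid = int(fields[2])
            if uid < min_user_id:
                offenders.append((fields[0], uid))
    return offenders

# Illustrative /etc/passwd-style entries, not real cluster data.
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "alice:x:1001:1001::/home/alice:/bin/bash",
    "legacy:x:500:500::/home/legacy:/bin/bash",
]
print(low_uid_users(sample))  # [('root', 0), ('legacy', 500)]
```

On a real node you would feed this the lines of /etc/passwd; any account it reports must either be added to banned.users knowingly or have its UID raised above min.user.id.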
- The path to the container-executor.cfg file is determined relative to the location of the container-executor binary. Specifically, the path is <dirname of container-executor binary>/../etc/hadoop/container-executor.cfg. If you installed the CDH 5 package, this path will always correspond to /etc/hadoop/conf/container-executor.cfg.
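The relative-path rule above can be sketched as a small computation. The binary location used here is illustrative (in CDH packages the resolved directory is typically a symlink to /etc/hadoop/conf):

```python
import os

def executor_cfg_path(container_executor_binary: str) -> str:
    """Resolve <dirname of binary>/../etc/hadoop/container-executor.cfg."""
    d = os.path.dirname(container_executor_binary)
    return os.path.normpath(
        os.path.join(d, "..", "etc", "hadoop", "container-executor.cfg"))

# Illustrative binary location; adjust to where your package installs it.
print(executor_cfg_path("/usr/lib/hadoop-yarn/bin/container-executor"))
# /usr/lib/hadoop-yarn/etc/hadoop/container-executor.cfg
```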
Note:
The container-executor program requires that the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, and every path component leading up to them, be set to 755 permissions, as shown in the table of directory permissions.
- Verify that the ownership and permissions of the container-executor program match the following:
---Sr-s--- 1 root yarn 36264 May 20 15:30 container-executor
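The listing above corresponds to octal mode 6050: setuid and setgid set, group read/execute only, no owner-execute and no access for others. As a sketch, here is how that symbolic string maps to the octal bits:

```python
def mode_from_ls(perm: str) -> int:
    """Convert a 10-char ls -l permission string to an octal mode.
    's'/'S' in the owner or group exec slot marks setuid/setgid
    (lowercase also sets the exec bit); 't'/'T' marks the sticky bit."""
    assert len(perm) == 10
    bits = 0
    special = {3: 0o4000, 6: 0o2000, 9: 0o1000}  # setuid, setgid, sticky
    for i, ch in enumerate(perm[1:], start=1):
        if ch in "rwx":
            bits |= 1 << (9 - i)
        elif ch in "sStT":
            bits |= special[i]
            if ch in "st":            # lowercase: exec bit is also set
                bits |= 1 << (9 - i)
    return bits

print(oct(mode_from_ls("---Sr-s---")))  # 0o6050
```

If the mode on your node differs, reset it with `chmod 6050 container-executor` and `chown root:yarn container-executor`.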
Note: For more information about the Linux Container Executor program, see Information about Other Hadoop Security Programs.
Step 2: Start the ResourceManager
You are now ready to start the ResourceManager.
If you're using the /etc/init.d/hadoop-yarn-resourcemanager script, then you can use the service command to run it now:
$ sudo service hadoop-yarn-resourcemanager start
You can verify that the ResourceManager is working properly by opening a web browser to http://host:8088/ where host is the name of the machine where the ResourceManager is running.
Step 3: Start the NodeManager
You are now ready to start the NodeManager.
If you're using the /etc/init.d/hadoop-yarn-nodemanager script, then you can use the service command to run it now:
$ sudo service hadoop-yarn-nodemanager start
You can verify that the NodeManager is working properly by opening a web browser to http://host:8042/ where host is the name of the machine where the NodeManager is running.
Step 4: Start the MapReduce Job History Server
You are now ready to start the MapReduce JobHistory Server.
If you're using the /etc/init.d/hadoop-mapreduce-historyserver script, then you can use the service command to run it now:
$ sudo service hadoop-mapreduce-historyserver start
You can verify that the MapReduce JobHistory Server is working properly by opening a web browser to http://host:19888/ where host is the name of the machine where the MapReduce JobHistory Server is running.
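The three web-UI checks in Steps 2 through 4 can be scripted. A hedged sketch that builds the default (non-TLS) URLs and probes them; hostnames are placeholders for your cluster's hosts, and with yarn.http.policy set to HTTPS_ONLY you would use https and the corresponding TLS ports instead:

```python
import urllib.request

# Default (non-TLS) web UI ports for the daemons started in Steps 2-4.
DEFAULT_PORTS = {"resourcemanager": 8088, "nodemanager": 8042, "jobhistory": 19888}

def ui_url(daemon: str, host: str) -> str:
    """Build the daemon's default web UI URL."""
    return "http://{}:{}/".format(host, DEFAULT_PORTS[daemon])

def ui_ok(daemon: str, host: str, timeout: float = 5.0) -> bool:
    """Return True if the daemon's web UI answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ui_url(daemon, host), timeout=timeout) as r:
            return r.status == 200
    except OSError:
        return False

# Placeholder hostname; substitute your own before probing with ui_ok().
print(ui_url("resourcemanager", "rm-host.example.com"))
# http://rm-host.example.com:8088/
```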
Step 5: Try Running a Map/Reduce YARN Job
You should now be able to run Map/Reduce jobs. To confirm, try launching a sleep or a pi job from the provided Hadoop examples (/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar). You need Kerberos credentials to do so.
To try running a MapReduce job using YARN, set the HADOOP_MAPRED_HOME environment variable and then submit the job. For example:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
$ /usr/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
Step 6: (Optional) Configure YARN for Long-running Applications
Long-running applications such as Spark Streaming jobs need additional configuration, because the default settings limit the hdfs user's delegation tokens to a maximum lifetime of 7 days, which is not always sufficient.
You can work around this by configuring the ResourceManager as a proxy user for the corresponding HDFS NameNode so that the ResourceManager can request new tokens when the existing ones are past their maximum lifetime. YARN will then be able to continue performing localization and log-aggregation on behalf of the hdfs user.
Set the following property in yarn-site.xml to true:
<property>
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>

Configure the following properties in core-site.xml on the HDFS NameNode. You can use a more restrictive configuration by specifying hosts/groups instead of * as in the example below.
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
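For reference, the 7-day limit mentioned above comes from the HDFS delegation token maximum lifetime, dfs.namenode.delegation.token.max-lifetime, which is expressed in milliseconds and defaults to 7 days. A quick check of that arithmetic:

```python
# dfs.namenode.delegation.token.max-lifetime defaults to 7 days,
# expressed in milliseconds in hdfs-site.xml.
MAX_LIFETIME_MS = 7 * 24 * 60 * 60 * 1000
print(MAX_LIFETIME_MS)  # 604800000
```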