Work Preserving Recovery for YARN Components
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
After moving the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only. New jobs started after the move are not affected. For any existing jobs that have the incorrect JobHistory Server URL, there is no option other than to allow the jobs to roll off the history over time. For new jobs, make sure that all clients have the updated mapred-site.xml that references the correct JobHistory Server.
Configuring Work Preserving Recovery Using Cloudera Manager
Enabling Work Preserving Recovery on ResourceManager with Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
If you use Cloudera Manager and you enable YARN (MRv2) ResourceManager High Availability, work preserving recovery is enabled by default for the ResourceManager.
Disabling Work Preserving Recovery on ResourceManager Using Cloudera Manager
To disable Work Preserving Recovery for the ResourceManager:
- Go to the YARN service.
- Click the Configuration tab.
- Search for Enable ResourceManager Recovery.
- In the Enable ResourceManager Recovery field, clear the ResourceManager Default Group checkbox.
- Click Save Changes.
Enabling Work Preserving Recovery on NodeManager with Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
Work preserving recovery is enabled by default in Cloudera Manager managed clusters, and the default recovery directory is /var/lib/hadoop-yarn/yarn-nm-recovery.
To enable work preserving recovery for a given NodeManager:
- Edit the advanced configuration snippet for yarn-site.xml on that NodeManager, and set the value of yarn.nodemanager.recovery.enabled to true.
- Configure the directory on the local filesystem where state information is stored when work preserving recovery is enabled.
- Go to the YARN service.
- Click the Configuration tab.
- Search for NodeManager Recovery Directory.
- Enter the directory path in the NodeManager Recovery Directory field (for example, /var/lib/hadoop-yarn/yarn-nm-recovery).
- Click Save Changes.
Configuring Work Preserving Recovery Using the Command Line
- Follow these command-line instructions on systems that do not use Cloudera Manager.
- This information applies specifically to CDH 5.15.0. See Cloudera Documentation for information specific to other releases.
- Set the value of yarn.resourcemanager.work-preserving-recovery.enabled to true to enable work preserving recovery for the ResourceManager, and set the value of yarn.nodemanager.recovery.enabled to true for the NodeManager.
- For each NodeManager, set yarn.nodemanager.recovery.dir to the directory on the local filesystem where state information is stored when work preserving recovery is enabled. The default value is ${hadoop.tmp.dir}/yarn-nm-recovery, which usually points to the /tmp directory on the local filesystem. Because many operating systems do not preserve the contents of /tmp across a reboot, Cloudera strongly recommends changing yarn.nodemanager.recovery.dir to a directory under the root partition. If the drive that hosts this directory fails, the NodeManager also fails. The example below uses /home/cloudera/recovery.
- Configure a valid RPC address for the NodeManager by setting yarn.nodemanager.address to an address with a specific port number (such as 0.0.0.0:45454). Ephemeral ports (default is port 0) cannot be used for the NodeManager's RPC server; this could cause the NodeManager to use different ports before and after a restart, preventing clients from connecting to the NodeManager. The NodeManager RPC address is also important for auxiliary services that run in a YARN cluster.
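The checks described in the steps above can be sketched as a small script. This is an illustration only and not part of CDH: the function names are hypothetical, the input is a plain dictionary of property names to values, and the fallback values used when a property is missing are assumptions chosen to stand in for "not enabled", "under the temporary directory", and "ephemeral port".

```python
import posixpath


def is_under_tmp(path):
    """Return True if the path is /tmp or a directory below it."""
    normalized = posixpath.normpath(path)
    return normalized == "/tmp" or normalized.startswith("/tmp/")


def has_fixed_port(address):
    """Return True if the address ends in a specific, non-zero port."""
    host, sep, port = address.rpartition(":")
    if not sep or not port.isdigit():
        return False
    return 0 < int(port) <= 65535


def check_nodemanager_recovery(props):
    """Return a list of problems found in the NodeManager recovery settings.

    props maps property names to string values. Missing properties fall
    back to assumed placeholder defaults (see the lead-in above).
    """
    problems = []

    enabled = props.get("yarn.nodemanager.recovery.enabled", "false")
    if enabled.strip().lower() != "true":
        problems.append("yarn.nodemanager.recovery.enabled is not true")

    recovery_dir = props.get(
        "yarn.nodemanager.recovery.dir", "${hadoop.tmp.dir}/yarn-nm-recovery"
    )
    if recovery_dir.startswith("${hadoop.tmp.dir}") or is_under_tmp(recovery_dir):
        problems.append(
            "yarn.nodemanager.recovery.dir may not survive a reboot: " + recovery_dir
        )

    address = props.get("yarn.nodemanager.address", "0.0.0.0:0")
    if not has_fixed_port(address):
        problems.append(
            "yarn.nodemanager.address needs a specific port: " + address
        )

    return problems
```

Calling check_nodemanager_recovery with the values used in this topic (recovery enabled, /home/cloudera/recovery, and 0.0.0.0:45454) returns an empty list. An empty dictionary reports all three problems.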
Auxiliary services should be designed to support recoverability by reloading the previous state after a NodeManager restarts. An example auxiliary service, the ShuffleHandler service for MapReduce, follows the correct pattern for an auxiliary service that supports work preserving recovery of the NodeManager.
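The recoverability pattern can be illustrated with a toy sketch: the service persists its state on every change and reloads it when it starts. This is not the YARN auxiliary service API and not how ShuffleHandler is implemented. The class name, the JSON state file, and the entry names are all hypothetical and exist only to show the save-then-reload idea.

```python
import json
import os
import tempfile


class RecoverableService:
    """Toy stand-in for a service that reloads its state after a restart."""

    def __init__(self, state_dir):
        # Hypothetical state file kept under a directory that survives restarts.
        self.state_file = os.path.join(state_dir, "service-state.json")
        self.entries = {}
        self._recover()

    def _recover(self):
        # On startup, reload whatever the previous instance persisted.
        if os.path.exists(self.state_file):
            with open(self.state_file) as f:
                self.entries = json.load(f)

    def _persist(self):
        # Write to a temporary file, then atomically replace the state file
        # so that a crash mid-write does not leave a truncated file behind.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(self.state_file))
        with os.fdopen(fd, "w") as f:
            json.dump(self.entries, f)
        os.replace(tmp_path, self.state_file)

    def register(self, key, value):
        self.entries[key] = value
        self._persist()

    def remove(self, key):
        self.entries.pop(key, None)
        self._persist()


if __name__ == "__main__":
    state_dir = tempfile.mkdtemp()
    first = RecoverableService(state_dir)
    first.register("job_0001", "in-flight")
    # Simulate a restart by constructing a new instance over the same directory.
    restarted = RecoverableService(state_dir)
    print(restarted.entries)
```

The point of the sketch is that state is written before the restart and read back in the constructor, so work registered with the first instance is still known to the second.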
For more information, see Starting, Stopping, and Restarting Services.
Example Configuration for Work Preserving Recovery
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
  <description>Whether to enable work preserving recovery for the Resource Manager</description>
</property>
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
  <description>Whether to enable work preserving recovery for the Node Manager</description>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/home/cloudera/recovery</value>
  <description>The location for stored state on the Node Manager, if work preserving recovery is enabled.</description>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
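To confirm that a yarn-site.xml file carries these settings, the properties can be read back with a short script. The following is a minimal sketch that uses only the Python standard library. The function name load_properties is hypothetical, and the sample text wraps the properties in a configuration element, as Hadoop configuration files do, so that it parses as a single XML document.

```python
import xml.etree.ElementTree as ET

# Sample text mirroring the example configuration in this topic.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/home/cloudera/recovery</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a name-to-value dictionary."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Treat a missing or empty value element as an empty string.
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    for name, value in sorted(load_properties(SAMPLE_YARN_SITE).items()):
        print(name, "=", value)
```

To check a real file, read its contents and pass the text to load_properties in place of the sample string.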
©2016 Cloudera, Inc. All rights reserved