Cloudera Enterprise 5.15.x

Configuring Other CDH Components to Use HDFS HA

You can use the HDFS high availability NameNodes with other components of CDH.

Continue reading:

Configuring HBase to Use HDFS HA
Configuring the Hive Metastore to Use HDFS HA
Configuring Hue to Work with HDFS HA Using Cloudera Manager
Configuring Impala to Work with HDFS HA
Configuring Oozie to Use HDFS HA

Configuring HBase to Use HDFS HA

Configuring HBase to Use HDFS HA Using Cloudera Manager

If you configure HBase to use an HA-enabled HDFS instance, Cloudera Manager automatically handles HA configuration for you.

Configuring HBase to Use HDFS HA Using the Command Line

To configure HBase to use HDFS HA, proceed as follows.

Shut Down the HBase Cluster
Configure hbase.rootdir
Restart HBase
HBase-HDFS HA Troubleshooting

Shut Down the HBase Cluster

Stop the Thrift server and clients:
```
sudo service hbase-thrift stop
```
Stop the cluster by shutting down the Master and the RegionServers:
- Use the following command on the Master host:
```
sudo service hbase-master stop
```
- Use the following command on each RegionServer host:
```
sudo service hbase-regionserver stop
```

Configure hbase.rootdir

Change the HDFS URI in the hbase.rootdir property of hbase-site.xml so that it uses the nameservice ID specified by the dfs.nameservices property in hdfs-site.xml. HBase clients must also have access to the dfs.client.* settings in hdfs-site.xml to use HA correctly.

For example, suppose the HDFS HA property dfs.nameservices is set to ha-nn in hdfs-site.xml. To configure HBase to use the HA NameNodes, specify that same value as part of your hbase-site.xml's hbase.rootdir value:

```
<!-- Configure HBase to use the HA NameNode nameservice -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://ha-nn/hbase</value>
</property>
```
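To catch a mismatch before restarting HBase, you could compare the two files programmatically. The following Python sketch is illustrative only: the helper names read_properties and check_rootdir are hypothetical, and it assumes standard Hadoop-style XML made of <property> elements with <name> and <value> children. It checks that the authority of hbase.rootdir is one of the IDs listed in dfs.nameservices, and that the matching dfs.client.failover.proxy.provider.<nameservice> client setting is present in hdfs-site.xml.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_properties(xml_text: str) -> dict[str, str]:
    """Return {name: value} for every <property> in Hadoop-style XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def check_rootdir(hdfs_site: str, hbase_site: str) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    hdfs = read_properties(hdfs_site)
    hbase = read_properties(hbase_site)

    # dfs.nameservices may list several comma-separated nameservice IDs.
    nameservices = [
        ns.strip() for ns in hdfs.get("dfs.nameservices", "").split(",") if ns.strip()
    ]
    if not nameservices:
        return ["dfs.nameservices is not set in hdfs-site.xml"]

    # For hdfs://ha-nn/hbase, netloc is the logical nameservice "ha-nn".
    rootdir = urlparse(hbase.get("hbase.rootdir", ""))
    if rootdir.scheme != "hdfs" or rootdir.netloc not in nameservices:
        return [
            f"hbase.rootdir authority {rootdir.netloc!r} is not one of {nameservices}"
        ]

    # HA clients locate the active NameNode through a failover proxy provider.
    key = f"dfs.client.failover.proxy.provider.{rootdir.netloc}"
    if key not in hdfs:
        return [f"{key} is not set in hdfs-site.xml"]
    return []
```

To use it, read the text of your hdfs-site.xml and hbase-site.xml (their locations depend on your installation) and pass both strings to check_rootdir. An empty list means hbase.rootdir refers to a configured nameservice that clients can resolve.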

Restart HBase

Start the HBase Master.
Start each of the HBase RegionServers.

HBase-HDFS HA Troubleshooting

Problem: HMasters fail to start.

Solution: Check for this error in the HMaster log:

```
2012-05-17 12:21:28,929 FATAL master.HMaster (HMaster.java:abort(1317)) - Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-nn
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:431)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:161)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:126)
...
```

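If so, HBase is treating the nameservice ID (ha-nn in this example) as a hostname because it cannot see the HDFS HA client configuration, which is why the stack trace shows createNonHAProxy. Verify that Hadoop's hdfs-site.xml and core-site.xml files are in your hbase/conf directory. Copying them there may be necessary if you keep your configuration files in non-standard locations.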
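The check described above can be scripted. This Python sketch is hypothetical: the function names unknown_hosts and missing_hadoop_configs are illustrative and not part of HBase or Hadoop. It extracts the hostnames reported by UnknownHostException from HMaster log text and lists which of hdfs-site.xml and core-site.xml are absent from a given HBase configuration directory. The directory you pass (for example /etc/hbase/conf on a package-based install) is an assumption about your layout.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hadoop client files that HBase needs in order to resolve an HA nameservice.
REQUIRED_FILES = ("hdfs-site.xml", "core-site.xml")

# Matches log lines such as "java.net.UnknownHostException: ha-nn".
UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException: (\S+)")


def unknown_hosts(log_text: str) -> list[str]:
    """Return distinct hostnames from UnknownHostException lines, in order."""
    hosts = []
    for host in UNKNOWN_HOST.findall(log_text):
        if host not in hosts:
            hosts.append(host)
    return hosts


def missing_hadoop_configs(hbase_conf_dir: str | Path) -> list[str]:
    """Return the REQUIRED_FILES that are not regular files in hbase_conf_dir."""
    conf = Path(hbase_conf_dir)
    # is_file() follows symlinks, so alternatives-style links still count.
    return [name for name in REQUIRED_FILES if not (conf / name).is_file()]
```

If unknown_hosts returns your dfs.nameservices value and missing_hadoop_configs returns a non-empty list, the HMaster failure most likely matches the problem described in this section.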

Configuring the Hive Metastore to Use HDFS HA

The Hive metastore can be configured to use HDFS high availability by using Cloudera Manager or, for unmanaged clusters, by using the command line.

Configuring the Hive Metastore to Use HDFS HA Using Cloudera Manager

In the Cloudera Manager Admin Console, go to the Hive service.
Select Actions > Stop.
Note: You may want to stop the Hue and Impala services first, if present, as they depend on the Hive service.
Click Stop again to confirm the command.
Back up the Hive metastore database.
Select Actions > Update Hive Metastore NameNodes and confirm the command.
Select Actions > Start and click Start to confirm the command.
Restart the Hue and Impala services if you stopped them prior to updating the metastore.

Upgrading the Hive Metastore to Use HDFS HA Using the Command Line

Important:

Follow these command-line instructions on systems that do not use Cloudera Manager.
This information applies specifically to CDH 5.15.0. See Cloudera Documentation for information specific to other releases.

To configure the Hive metastore to use HDFS HA, use the Hive metatool to list the filesystem locations recorded in the metastore and update them to the nameservice specified in the dfs.nameservices property.

Note: Before attempting to upgrade the Hive metastore to use HDFS HA, shut down the metastore and back it up to a persistent store.

If you are unsure which version of the Avro SerDe your tables use, specify both the serdePropKey and tablePropKey arguments. For example:

```
$ hive --service metatool -listFSRoot
...
hdfs://oldnamenode.com/user/hive/warehouse

$ hive --service metatool -updateLocation hdfs://nameservice1 \
    hdfs://oldnamenode.com -tablePropKey avro.schema.url \
    -serdePropKey schema.url
...

$ hive --service metatool -listFSRoot
...
hdfs://nameservice1/user/hive/warehouse
```

where:

hdfs://oldnamenode.com/user/hive/warehouse identifies the NameNode location.
hdfs://nameservice1 specifies the new location and should match the value of the dfs.nameservices property.
tablePropKey is a table property key whose value field may reference the HDFS NameNode location and hence may require an update. To update the Avro SerDe schema URL, specify avro.schema.url for this argument.
serdePropKey is a SerDe property key whose value field may reference the HDFS NameNode location and hence may require an update. To update the Haivvreo schema URL, specify schema.url for this argument.
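To make the effect of -updateLocation concrete, the following Python sketch shows how a single recorded location, such as a table location or an avro.schema.url value, changes when the old NameNode URI is replaced by the nameservice URI. It is an illustration under stated assumptions, not the metatool implementation: update_location is a hypothetical helper, it compares scheme and authority exactly (including any port), and it never reads or modifies a real metastore. The argument order mirrors the metatool command: new location first, then old location.

```python
from __future__ import annotations

from urllib.parse import urlparse, urlunparse


def update_location(location: str, new_root: str, old_root: str) -> str:
    """Replace old_root's scheme and authority in location with new_root's.

    Locations that do not reference old_root are returned unchanged, since
    only records pointing at the old NameNode need updating.
    """
    loc, old, new = urlparse(location), urlparse(old_root), urlparse(new_root)
    if (loc.scheme, loc.netloc) != (old.scheme, old.netloc):
        return location
    # Keep the path (for example /user/hive/warehouse); swap only the root.
    return urlunparse(loc._replace(scheme=new.scheme, netloc=new.netloc))
```

For example, update_location("hdfs://oldnamenode.com/user/hive/warehouse", "hdfs://nameservice1", "hdfs://oldnamenode.com") yields hdfs://nameservice1/user/hive/warehouse, which matches the second -listFSRoot output in the example.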

Note: The Hive metatool is a best-effort tool that tries to update as many Hive metastore records as possible. If it encounters an error while updating a record, it skips to the next record.

Configuring Hue to Work with HDFS HA Using Cloudera Manager

Add the HttpFS role to the HDFS service.
After the command has completed, go to the Hue service.
Click the Configuration tab.
Locate the HDFS Web Interface Role property or search for it by typing its name in the Search box.
Select the HttpFS role you just created instead of the NameNode role, and save your changes.
Restart the Hue service.

Configuring Impala to Work with HDFS HA

Complete the steps to reconfigure the Hive metastore database, as described in the preceding section. Impala uses the same underlying metastore database as Hive to manage metadata for databases, tables, and so on.
Issue the INVALIDATE METADATA statement from an Impala shell. This one-time operation makes all Impala daemons across the cluster aware of the latest settings for the Hive metastore database. Alternatively, restart the Impala service.

Configuring Oozie to Use HDFS HA

To configure an Oozie workflow to use HDFS HA, use the HDFS nameservice instead of the NameNode URI in the <name-node> element of the workflow.

Example:

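```
<action name="mr-node">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>hdfs://ha-nn</name-node>
    <!-- ... remaining map-reduce elements ... -->
  </map-reduce>
  <!-- ... ok and error transitions ... -->
</action>
```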

where ha-nn is the value of dfs.nameservices in hdfs-site.xml.
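Before resubmitting existing workflows, you might scan them for <name-node> values that still point at a single NameNode. This Python sketch assumes the workflow definition is available as text; check_name_nodes and local_name are hypothetical helpers. Because Oozie workflows declare an XML namespace (for example uri:oozie:workflow:0.5), the sketch compares tags without the namespace prefix. Values containing EL expressions such as ${nameNode} are reported separately, because they are resolved from job properties at submission time and must be checked there.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{uri:oozie:workflow:0.5}name-node'."""
    return tag.rsplit("}", 1)[-1]


def check_name_nodes(workflow_xml: str, nameservice: str) -> tuple[list[str], list[str]]:
    """Return (non_ha_values, unresolved_el_values) for every <name-node>."""
    root = ET.fromstring(workflow_xml)
    non_ha, unresolved = [], []
    for elem in root.iter():
        if local_name(elem.tag) != "name-node":
            continue
        value = (elem.text or "").strip()
        if "${" in value:
            # Parameterized value; inspect job.properties instead.
            unresolved.append(value)
        elif urlparse(value).netloc != nameservice:
            # Authority is a host (and port), not the HA nameservice.
            non_ha.append(value)
    return non_ha, unresolved
```

A workflow is ready for HDFS HA when the first list is empty and every value in the second list resolves to hdfs://<nameservice> in its job properties.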

Page generated May 18, 2018.