Configuring Apache Hive Metastore High Availability in CDH
You can enable Hive metastore high availability (HA) so that your cluster is resilient to failures if a metastore becomes unavailable. When HA mode is enabled, one of the metastores is designated as the master and the others are slaves. If a master metastore fails, one of the slave metastores takes over.
Continue reading:
- Prerequisites
- Enabling Hive Metastore High Availability Using Cloudera Manager
- Enabling Hive Metastore High Availability Using the Command Line
Prerequisites
- Cloudera recommends that each instance of the metastore runs on a separate cluster host, to maximize high availability.
- Hive metastore HA requires a database that is also highly available, such as MySQL with replication in active-active mode. Refer to the documentation for your database of choice to configure it correctly.
Enabling Hive Metastore High Availability Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Go to the Hive service.
- If you have a secure cluster, enable the Hive token store. Non-secure clusters can skip this step.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Hive Metastore Delegation Token Store property or search for it by typing its name in the Search box.
- Select org.apache.hadoop.hive.thrift.DBTokenStore.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Click the Instances tab.
- Click Add Role Instances.
- Click the text field under Hive Metastore Server.
- Check the box by the host on which to run the additional metastore and click OK.
- Click Continue and click Finish.
- Check the box by the new Hive Metastore Server role.
- Select the Start action for the selected role, and click Start to confirm.
- Click Close, and then click the stale configuration indicator to display the stale configurations page.
- Click Restart Stale Services and click Restart Now.
- Click Finish after the cluster finishes restarting.
Enabling Hive Metastore High Availability Using the Command Line
To configure the Hive metastore for high availability, configure each metastore to store its state in a replicated database, then provide the metastore clients with a list of URIs where metastores are available. The client starts with the first URI in the list. If it does not get a response, it randomly picks another URI in the list and attempts to connect. This continues until the client receives a response.
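The client-side selection order described above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the Hive client's actual code: `connect_with_failover` and `try_connect` are hypothetical names, and `try_connect` stands in for opening a Thrift connection. As a simplification, the sketch tries each URI once and then gives up, rather than retrying indefinitely.

```python
import random
from typing import Callable, Optional, Sequence


def connect_with_failover(
    uris: Sequence[str],
    try_connect: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the first URI that answers, or None if none of them do.

    try_connect is a hypothetical stand-in for opening a Thrift
    connection; it returns True when the metastore responds.
    """
    if not uris:
        return None
    rng = rng or random.Random()
    # Start with the first URI in the list.
    if try_connect(uris[0]):
        return uris[0]
    # Then try the remaining URIs in random order.
    remaining = list(uris[1:])
    rng.shuffle(remaining)
    for uri in remaining:
        if try_connect(uri):
            return uri
    return None
```

With the three example URIs used on this page, a `try_connect` that only succeeds for `thrift://metastore3.example.com` makes the function return that URI, regardless of the order in which the other two are tried.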
- Follow these command-line instructions on systems that do not use Cloudera Manager.
- This information applies specifically to CDH 5.15.0. See Cloudera Documentation for information specific to other releases.
- Configure Hive on each of the cluster hosts where you want to run a metastore, following the instructions at Configuring the Hive Metastore for CDH.
- On the server where the master metastore instance runs, edit the /etc/hive/conf.server/hive-site.xml file, setting the hive.metastore.uris property's value to a list of URIs where a Hive metastore is available for failover.
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore1.example.com,thrift://metastore2.example.com,thrift://metastore3.example.com</value>
  <description>URI for client to contact metastore server</description>
</property>
- If you use a secure cluster, enable the Hive token store by setting the hive.cluster.delegation.token.store.class property to org.apache.hadoop.hive.thrift.DBTokenStore. Non-secure clusters can skip this step.
<property>
  <name>hive.cluster.delegation.token.store.class</name>
  <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
</property>
- Save your changes and restart each Hive instance.
- Connect to each metastore and update it to use a nameservice instead of a NameNode, as a requirement for high availability.
- From the command line, as the Hive user, retrieve the list of URIs representing the filesystem roots:
hive --service metatool -listFSRoot
- Run the following command with the -dryRun option to be sure that the nameservice is available and configured correctly. This does not change your configuration.
hive --service metatool -updateLocation nameservice-uri namenode-uri -dryRun
- Run the same command again without the -dryRun option to direct the metastore to use the nameservice instead of a NameNode.
hive --service metatool -updateLocation nameservice-uri namenode-uri
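To make the effect of the update easier to picture, here is a minimal Python sketch of the prefix substitution that replacing a NameNode URI with a nameservice URI implies. It is not metatool's implementation and does not touch the metastore; `preview_update_location` is a hypothetical helper, and the host name and `nameservice1` are placeholder values. The argument order mirrors the command above: new root first, old root second.

```python
from typing import Dict, Iterable


def preview_update_location(
    locations: Iterable[str], new_root: str, old_root: str
) -> Dict[str, str]:
    """Map each location under old_root to its rewritten form.

    Locations that do not sit under old_root are left out of the result.
    """
    old = old_root.rstrip("/")
    new = new_root.rstrip("/")
    changes = {}
    for loc in locations:
        # Match the root itself or anything below it, but not a
        # different authority that merely shares the same prefix.
        if loc == old or loc.startswith(old + "/"):
            changes[loc] = new + loc[len(old):]
    return changes


# Placeholder values for illustration only.
roots = [
    "hdfs://namenode1.example.com:8020/user/hive/warehouse",
    "hdfs://namenode1.example.com:8020/user/hive/warehouse/sales.db",
]
for before, after in preview_update_location(
    roots, "hdfs://nameservice1", "hdfs://namenode1.example.com:8020"
).items():
    print(before, "->", after)
```

Comparing such a preview with the output of the -dryRun command is one way to sanity-check that the two URIs were passed in the intended order.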
- Test your configuration by stopping your main metastore instance, and then attempting to connect to one of the other metastores from a client. The following is an example of doing this on a RHEL or Fedora system. The example first stops the local metastore, then connects to the metastore on the host metastore2.example.com and runs the SHOW TABLES command.
$ sudo service hive-metastore stop
$ /usr/lib/hive/bin/beeline
beeline> !connect jdbc:hive2://metastore2.example.com:10000 username password org.apache.hive.jdbc.HiveDriver
0: jdbc:hive2://localhost:10000> SHOW TABLES;
show tables;
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.238 seconds)
0: jdbc:hive2://localhost:10000>
- Restart the local metastore when you have finished testing.
$ sudo service hive-metastore start
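As a complement to the manual test, the following Python sketch checks whether each URI from hive.metastore.uris accepts a TCP connection. It rests on several assumptions: `probe_metastores` is a hypothetical helper, port 9083 (the default metastore port) is used when a URI omits the port, and an open port only shows that something is listening, not that the metastore is healthy.

```python
import socket
from typing import Dict, Iterable
from urllib.parse import urlsplit

# Assumption: fall back to the default metastore port when a URI has none.
DEFAULT_METASTORE_PORT = 9083


def probe_metastores(uris: Iterable[str], timeout: float = 2.0) -> Dict[str, bool]:
    """Return {uri: True/False} depending on whether a TCP connect succeeds."""
    status = {}
    for uri in uris:
        parts = urlsplit(uri)
        port = parts.port or DEFAULT_METASTORE_PORT
        try:
            # Open and immediately close a plain TCP connection.
            with socket.create_connection((parts.hostname, port), timeout=timeout):
                status[uri] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            status[uri] = False
    return status


# Example call (not executed here):
# probe_metastores([
#     "thrift://metastore1.example.com",
#     "thrift://metastore2.example.com",
#     "thrift://metastore3.example.com",
# ])
```

Running the probe while the main metastore is stopped should report that host as unreachable and the others as reachable, which matches what the failover test above expects.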
©2016 Cloudera, Inc. All rights reserved