How To Set Up a Shared Amazon RDS as Your Hive Metastore for CDH
In CDH 5.10 and later, clusters running in the AWS cloud can share a single persistent instance of the Amazon Relational Database Service (RDS) as the backend database for the Hive metastore (HMS). This lets metadata persist beyond a cluster's life cycle, so subsequent clusters need not regenerate it, as was previously necessary.
Supported Scenarios
The following limitations apply to the jobs you run when you use an RDS server as a remote backend database for the Hive metastore.
- No overlapping data or metadata changes to the same data sets across clusters.
- No reads during data or metadata changes to the same data sets across clusters.
Data or metadata changes overlap when multiple clusters concurrently:
- Make updates to the same table or partitions within the table located on S3.
- Add or change the same parent schema or database.
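The definition above can be expressed as a simple check to run before you schedule jobs on more than one cluster. The following Python sketch is illustrative only and is not part of CDH or Cloudera Manager. The `PlannedChange` record, its fields, and the `overlaps` function are hypothetical names, and the sketch assumes you track a start and end time for each planned change yourself. It reads the definition literally: two changes overlap only if they come from different clusters, run at the same time, and target either the same table (including its partitions) or the same database or schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlannedChange:
    """Hypothetical record of a data or metadata change planned on one cluster."""
    cluster: str
    database: str
    table: Optional[str] = None  # None: the change targets the database/schema itself
    start: float = 0.0           # planned start time (any consistent unit)
    end: float = 0.0             # planned end time


def runs_concurrently(a: PlannedChange, b: PlannedChange) -> bool:
    """True if the two time windows intersect."""
    return a.start < b.end and b.start < a.end


def overlaps(a: PlannedChange, b: PlannedChange) -> bool:
    """True if a and b are overlapping changes across clusters.

    Assumption: partition-level updates are treated as updates to their
    table, and a database-level change is compared only with another
    database-level change.
    """
    if a.cluster == b.cluster:
        return False  # the limitation concerns changes across clusters
    if not runs_concurrently(a, b):
        return False
    if a.database != b.database:
        return False
    if a.table is None and b.table is None:
        return True   # both add or change the same parent database
    return a.table is not None and a.table == b.table


if __name__ == "__main__":
    first = PlannedChange("cluster-a", "sales", "orders", start=0, end=10)
    second = PlannedChange("cluster-b", "sales", "orders", start=5, end=15)
    print(overlaps(first, second))
```

In this example both clusters update the same table during intersecting time windows, so the check reports an overlap. Moving the second change to start at or after the first one ends would make it acceptable under the limitations listed above.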
Important: If you are running a shared RDS, Cloudera Support will help licensed
customers repair any unexpected metadata issues, but will not do "root-cause" analysis.
Page generated May 18, 2018.