Cloudera Enterprise 5.15.x

Best Practices for Using Apache Hive in CDH

Hive data warehouse software enables reading, writing, and managing large datasets in distributed storage. Queries written in the Hive query language (HiveQL), which is very similar to SQL, are converted into a series of jobs that execute on a Hadoop cluster through MapReduce or Apache Spark.
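
As a minimal sketch of the query-to-job flow described above, the following HiveQL selects an execution engine for the session, asks Hive for the plan of a query, and then runs it. The table web_logs and its columns are hypothetical and exist only for illustration; whether the spark engine is available depends on how the cluster is configured.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS web_logs (
  ip     STRING,
  url    STRING,
  status INT
)
STORED AS PARQUET;

-- Choose the execution engine for this session: mr (MapReduce) or spark.
SET hive.execution.engine=spark;

-- Show the stages Hive plans to run, without executing the query.
EXPLAIN
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;

-- Run the query; Hive submits the resulting jobs to the cluster.
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;
```

Comparing the EXPLAIN output after setting the engine to mr and then to spark is one way to see how the same HiveQL statement maps onto different job types.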

Users can run batch processing workloads with Hive while also analyzing the same data for interactive SQL or machine-learning workloads using tools like Apache Impala or Apache Spark, all within a single platform.

As part of CDH, Hive also benefits from:

Unified resource management provided by YARN
Simplified deployment and administration provided by Cloudera Manager
Shared security and governance, provided by Apache Sentry and Cloudera Navigator, to meet compliance requirements

Continue reading:

Overview of Apache Hive Installation and Upgrade in CDH
Configuring Apache Hive in CDH
Using & Managing Apache Hive in CDH
Tuning Apache Hive in CDH
Overview of Apache Hive Data Replication in CDH
Overview of Apache Hive Security in CDH
Troubleshooting Apache Hive in CDH

Page generated May 18, 2018.