Configuring Spark on YARN for Long-Running Applications
Long-running applications such as Spark Streaming jobs must be able to write to HDFS, which means they may need HDFS delegation tokens beyond the tokens' default lifetime. For this workload type, pass a Kerberos principal and keytab to the spark-submit script using the --principal and --keytab parameters. The keytab is copied to the host running the ApplicationMaster, and the Kerberos login is renewed periodically, using the principal and keytab to generate the delegation tokens required for HDFS.
Note: For secure distribution of the keytab to the ApplicationMaster host, the cluster should be configured for TLS/SSL communication for YARN and for HDFS encryption.
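The following is a minimal sketch of such a submission, assuming a Spark Streaming application packaged as a JAR and the spark.keytab file created in the next section. The principal, keytab path, main class (com.example.StreamingApp), and JAR path are hypothetical placeholders; only --master, --deploy-mode, --principal, --keytab, and --class are standard spark-submit options. To make it safe to try, the script prints the assembled command and submits it only when DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: submit a long-running Spark Streaming job to YARN in cluster mode,
# passing a Kerberos principal and keytab so that delegation tokens can be
# regenerated for the life of the job. Every default value below is a
# hypothetical placeholder; override it through the environment.
set -euo pipefail

PRINCIPAL="${PRINCIPAL:-spark/fully.qualified.domain.name@YOUR-REALM.COM}"
KEYTAB="${KEYTAB:-/path/to/spark.keytab}"           # local path on the submitting host (hypothetical)
APP_CLASS="${APP_CLASS:-com.example.StreamingApp}"  # hypothetical main class
APP_JAR="${APP_JAR:-/path/to/streaming-app.jar}"    # hypothetical application JAR

# Build the command as an array so that arguments containing spaces survive.
SUBMIT_CMD=(
  spark-submit
  --master yarn
  --deploy-mode cluster
  --principal "$PRINCIPAL"
  --keytab "$KEYTAB"
  --class "$APP_CLASS"
  "$APP_JAR"
)

# Always show the command; submit it only when DRY_RUN=0 is set explicitly.
echo "${SUBMIT_CMD[*]}"
if [[ "${DRY_RUN:-1}" == "0" ]]; then
  "${SUBMIT_CMD[@]}"
fi
```

For example, saving the script under a hypothetical name such as submit-streaming.sh and running DRY_RUN=0 KEYTAB=./spark.keytab ./submit-streaming.sh would submit the job using a keytab in the current directory.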
Create the Spark Principal and Keytab File
The principal and keytab are needed only for long-running applications that run on Spark on YARN in cluster mode.
- Create the spark principal and spark.keytab file:
  kadmin: addprinc -randkey spark/fully.qualified.domain.name@YOUR-REALM.COM
  kadmin: xst -k spark.keytab spark/fully.qualified.domain.name
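Before passing the keytab to spark-submit, it can help to confirm that it contains the expected key and can actually authenticate. The sketch below assumes the MIT Kerberos client tools (klist and kinit) are installed on the host that holds the keytab; the function name verify_keytab is hypothetical. Note that kinit replaces the tickets in the current credential cache.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a keytab created with the kadmin commands above.
# Assumes the MIT Kerberos client tools klist and kinit are on the PATH.
# verify_keytab is a hypothetical helper name, not part of any Cloudera tool.

verify_keytab() {
  local keytab="$1"
  local principal="$2"

  # Fail early if the keytab file is missing or unreadable.
  if [[ ! -r "$keytab" ]]; then
    echo "keytab not readable: $keytab" >&2
    return 1
  fi

  # List the key entries stored in the keytab, with their timestamps.
  klist -kt "$keytab" || return 1

  # Obtain a ticket using only the keytab (no password prompt), then show it.
  # This replaces the tickets in the current credential cache.
  kinit -kt "$keytab" "$principal" || return 1
  klist
}

# Example invocation with the placeholder principal from the kadmin example:
# verify_keytab spark.keytab spark/fully.qualified.domain.name@YOUR-REALM.COM
```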
See Step 4: Create and Deploy the Kerberos Principals and Keytab Files for more information about Kerberos and its use with Cloudera clusters.
Page generated May 18, 2018.
©2016 Cloudera, Inc. All rights reserved.