How to Enable Kerberos Debugging for Spark Application and “hdfs dfs” Command


Kerberos debugging means enabling debug output for the JVM's Kerberos implementation, which the Krb5LoginModule login module relies on. This gives us much more detailed security-related logging, which helps when troubleshooting authentication failures.

By default, no Kerberos-related messages appear in the console or in the application logs.

For example, error messages like the ones below are not shown in the regular output. As a result, we miss a key piece of troubleshooting information and can be misled by the regular logs:

>>> KDCRep: init() encoding tag is 126 req type is 13
>>>KRBError:
sTime is Wed Jan 13 16:39:11 JST 2017 1488154293000
suSec is 949350
error code is 14
error Message is KDC has no support for encryption type
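Once debug output like this has been saved to a file, a short filter can pull out just the KRBError details instead of scrolling through everything. This is only a sketch: the function name extract_krb_errors is hypothetical, and it assumes the JDK prints each error as "error code is N" followed by "error Message is ..." as in the excerpt above.

```bash
#!/usr/bin/env bash
# extract_krb_errors (hypothetical helper): print "code=N message=..." for
# every KRBError block in a saved Kerberos debug log.
# Assumes the JDK wording shown in the excerpt: "error code is N" followed
# later by "error Message is <text>".
extract_krb_errors() {
  local log_file="$1"
  awk '
    /KRBError:/ { in_err = 1; code = ""; next }      # a new KRBError block starts
    in_err && /error code is/ { code = $NF; next }    # last field is the numeric code
    in_err && /error Message is/ {
      msg = $0
      sub(/.*error Message is[ \t]*/, "", msg)        # keep only the message text
      sub(/[ \t]+$/, "", msg)                         # drop trailing whitespace
      print "code=" code " message=" msg
      in_err = 0                                      # block finished
    }
  ' "$log_file"
}
```

For example, `extract_krb_errors hdfs-krb-debug.log` (hypothetical file name) prints one line per KRBError found in that log.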

Below are the steps to enable Kerberos debugging for a Spark application and for the "hdfs dfs" command.

HDFS DFS command

There are multiple scenarios where we need to enable debugging to get more Kerberos-related information for troubleshooting.

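Exporting the following environment variables before running the "hdfs dfs" command gives us the extra logging:

# Enable the JDK's Kerberos debug output
export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
# Send Hadoop's DEBUG-level log messages to the console
export HADOOP_ROOT_LOGGER=DEBUG,console
hdfs dfs -ls /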
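Exported variables stay in effect for the rest of the shell session. If we only want debug logging for a single command, and want to keep a copy of the output, a small wrapper can set both variables for that one call. This is a sketch: the function name hdfs_krb_debug and the HDFS_KRB_DEBUG_LOG variable are hypothetical. Redirecting 2>&1 merges both output streams, so the log file captures every message regardless of which stream it was written to.

```bash
#!/usr/bin/env bash
# hdfs_krb_debug (hypothetical helper): run one "hdfs dfs" subcommand with
# Kerberos and Hadoop debug logging enabled, without exporting the variables
# into the current shell. stdout and stderr are merged and copied to a log file.
hdfs_krb_debug() {
  # Log file location is an assumption; override it with HDFS_KRB_DEBUG_LOG.
  local log_file="${HDFS_KRB_DEBUG_LOG:-./hdfs-krb-debug.log}"

  # Keep any existing HADOOP_OPTS and append the JDK Kerberos debug flag.
  # These assignments apply only to this one hdfs invocation.
  HADOOP_OPTS="${HADOOP_OPTS:+$HADOOP_OPTS }-Dsun.security.krb5.debug=true" \
  HADOOP_ROOT_LOGGER="DEBUG,console" \
    hdfs dfs "$@" 2>&1 | tee "$log_file"

  # Return the exit status of hdfs, not of tee.
  return "${PIPESTATUS[0]}"
}
```

Usage: `hdfs_krb_debug -ls /` runs the same listing as above and leaves the full debug output in ./hdfs-krb-debug.log.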

23/01/14 13:10:42 DEBUG security.SaslRpcClient: RPC Server's Kerberos principal name for protocol=org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB is hdfs/<hostname>@testing.site
23/01/14 13:10:42 DEBUG security.SaslRpcClient: Creating SASL GSSAPI(KERBEROS)  client to authenticate to service at <hostname>
23/01/14 13:10:42 DEBUG security.SaslRpcClient: Use KERBEROS authentication for protocol ClientNamenodeProtocolPB
Found ticket for test@testing.site to go to krbtgt/testing.site@testing.site expiring on Sun Jan 15 13:09:56 UTC 2023
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for test@testing.site to go to krbtgt/testing.site@testing.site expiring on Sun Jan 15 13:09:56 UTC 2023
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23 1 3.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Des3CbcHmacSha1KdEType
>>> KrbKdcReq send: kdc=<hostname> UDP:88, timeout=10000, number of retries =3, #bytes=720
>>> KDCCommunication: kdc=<hostname> UDP:88, timeout=10000,Attempt =1, #bytes=720
>>> KrbKdcReq send: #bytes read=722
>>> KdcAccessibility: remove <hostname>:88
>>> EType: sun.security.krb5.internal.crypto.Des3CbcHmacSha1KdEType
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Des3CbcHmacSha1KdEType
Krb5Context setting mySeqNumber to: 1067920184
Created InitSecContextToken:
23/01/14 13:10:43 DEBUG ipc.ProtobufRpcEngine: Call: getListing took 6ms
Found 3 items
drwxr-xr-x   - solr    solr                0 2023-01-14 04:32 /solr
drwxrwxrwt   - hdfs    supergroup          0 2023-01-14 04:35 /tmp
drwxr-xr-x   - hdfs    supergroup          0 2023-01-14 04:31 /user
23/01/14 13:10:43 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@2dde1bff
23/01/14 13:10:43 DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@2dde1bff

Spark Application

For a Spark application, Kerberos debugging has to be enabled for both the driver and the executors, since each of them runs in its own JVM.

For the Spark Driver:

spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true

For Spark Executors:

spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true

These properties can be passed on the spark-submit command line with --conf, as needed.
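When the same debug flags are needed on several spark-submit runs (for example, the client-mode and cluster-mode commands in this post), we can keep them in one place. This is an illustration only: KRB_DEBUG_CONFS and spark_submit_krb_debug are hypothetical names. Keep in mind that a value passed with --conf takes precedence over the same property in spark-defaults.conf, so any extraJavaOptions already configured there are replaced for that run, not extended.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the two Kerberos debug properties from this section,
# stored once so every spark-submit call can reuse them.
KRB_DEBUG_CONFS=(
  --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true"
  --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true"
)

# spark_submit_krb_debug: forward all arguments to spark-submit, with the
# debug properties placed first so they come before the application JAR.
spark_submit_krb_debug() {
  spark-submit "${KRB_DEBUG_CONFS[@]}" "$@"
}
```

Usage: `spark_submit_krb_debug --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10`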

Example:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true" --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10 10

Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>> KdcAccessibility: reset
>>> KdcAccessibility: reset
>>>KinitOptions cache name is /tmp/krb5cc_0
>>>DEBUG <CCacheInputStream>  client principal is test@testing.site
>>>DEBUG <CCacheInputStream> server principal is krbtgt/testing.site@testing.site
>>>DEBUG <CCacheInputStream> key type: 16

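In Spark cluster mode, the driver runs inside a YARN container, and Spark's log4j configuration sends Spark's logging output to stderr. Sending that output to stdout instead places the Spark log lines next to the Kerberos debug output in the same application log, which makes troubleshooting easier.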

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true" --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar  10 10

To do that, we need to add the following property to the Spark gateway logging configuration.

For the Cloudera distribution:

Cloudera Manager -> Spark -> Configuration -> search for "Gateway Logging Advanced Configuration Snippet (Safety Valve)"

Add the following property, then save and deploy the client configuration:

log4j.appender.console.target=System.out

For other distributions:

Add the following line to log4j.properties in the /etc/spark/conf directory, or to whichever Spark log4j file is in use:

log4j.appender.console.target=System.out

This line has to be added on every node where Spark applications are executed.
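Since the change has to be repeated on every node, a small idempotent helper avoids adding the line twice when a script is re-run. This is a sketch under assumptions: the function name is hypothetical, the default path is the /etc/spark/conf/log4j.properties location mentioned above, and it targets the log4j 1.x properties format used in this post. If a console target line already exists (for example one set to System.err), it is rewritten in place and a .bak copy of the file is kept.

```bash
#!/usr/bin/env bash
# ensure_console_stdout (hypothetical helper): make sure a log4j 1.x
# properties file sends the console appender to System.out.
# Safe to re-run: the line is rewritten or appended, never duplicated.
ensure_console_stdout() {
  local props="${1:-/etc/spark/conf/log4j.properties}"
  local line="log4j.appender.console.target=System.out"

  if grep -Eq '^log4j\.appender\.console\.target[[:space:]]*=' "$props"; then
    # Replace the existing target (e.g. System.err); keeps "$props.bak" as a backup.
    sed -i.bak -E \
      "s/^log4j\.appender\.console\.target[[:space:]]*=.*/${line}/" \
      "$props"
  else
    # No target configured yet: append the line.
    printf '%s\n' "$line" >> "$props"
  fi
}
```

Usage: `ensure_console_stdout` (default path) or `ensure_console_stdout /path/to/log4j.properties`.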

To get the application logs, we can use the following yarn command:

yarn logs -applicationId <ApplicationId> -appOwner <AppOwner> > application-id.log
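Because the aggregated log of a whole application can be large, it can help to save it once and then list only the Kerberos-related lines with their line numbers. This sketch wraps the yarn logs command above; the function name and the grep pattern are assumptions (the pattern simply matches the ">>>" prefix and a few message fragments seen in the debug output in this post), so adjust them to what your logs actually contain.

```bash
#!/usr/bin/env bash
# fetch_krb_debug (hypothetical helper): save an application's aggregated YARN
# logs to a file, then print the Kerberos debug lines with their line numbers.
# Usage: fetch_krb_debug <ApplicationId> <AppOwner> [output-file]
fetch_krb_debug() {
  local app_id="$1" app_owner="$2"
  local out_file="${3:-${app_id}.log}"

  # Same yarn command as above; stop if the logs cannot be fetched.
  yarn logs -applicationId "$app_id" -appOwner "$app_owner" > "$out_file" || return

  # Heuristic pattern based on the debug lines shown in this post.
  grep -nE '>>>|KRBError|Found ticket|Krb5Context|error code is' "$out_file"
}
```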

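In Spark client mode, the driver runs on the machine where spark-submit is launched, so its Kerberos debug output is printed in the console and no extra property is needed. Executor output still ends up in the YARN container logs.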

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true" --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar  10 10

This command prints the driver's Kerberos debug output directly in the console.

Conclusion

We have learned how to enable Kerberos debugging for both Spark applications and the "hdfs dfs" command, which makes troubleshooting much more efficient.

Good luck with your learning!
