Resolve the “User application exited with status 1” issue in Spark

User application exited with status 1

“User application exited” means that an application has stopped running, and this can happen for multiple reasons. In this post, we will look specifically at the error “User application exited with status 1”.

“User application exited with status 1” indicates that a user program terminated with exit code 1. On Unix-based operating systems, a zero exit code generally indicates successful completion, while a non-zero exit code indicates a failure.
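To see this convention in action, here is a minimal Python sketch (not Spark-specific) that launches a child process which exits with status 1, then reads the code back in the parent, much like YARN inspects the Spark driver's exit status:

```python
import subprocess
import sys

# Launch a child Python process that terminates with sys.exit(1),
# then read its exit status back in the parent process.
result = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
print(result.returncode)  # → 1 (non-zero, so the child is treated as failed)
```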

Symptoms

A non-zero exit code usually indicates a failure or error in the user application, so the system treats the run as unsuccessful and marks the application as failed.

Error snippet from the Spark application log:

ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark

Cause

This issue is caused by a code problem: the application itself returns a non-zero exit code, or an external task it invokes (such as a shell script or an Impala query) completes or fails with a non-zero exit code that the application then propagates.
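As a sketch of how an external task's failure can propagate, assuming a simple shell command stands in for the real script or query:

```python
import subprocess

def run_external_step(command):
    """Run an external shell step and hand back its exit code."""
    completed = subprocess.run(["sh", "-c", command])
    return completed.returncode

# A failing step (here a stand-in for a shell script or query wrapper)
# returns a non-zero code; if the driver then calls sys.exit() with it,
# the Spark application is reported as failed.
code = run_external_step("exit 3")
print(code)  # → 3
```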

Code to reproduce the issue:

  • The Python code below estimates the value of pi using the Spark framework.
  • A minor change was made so that it returns exit code 1 even on successful completion.
from __future__ import print_function

import sys
from random import random
from operator import add

from pyspark.sql import SparkSession


if __name__ == "__main__":
    """
        Usage: pi [partitions]
    """
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()

    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    # Code to replicate the issue: exit non-zero even on success
    sys.exit(1)

    spark.stop()
  • After printing the pi value, this code exits with exit code 1 instead of calling spark.stop().
  • For a Spark application, any non-zero exit code means application failure.
  • The job fails with the final status “User application exited with status 1”.

Steps to run the sample code:

Save the code above as a pi.py file and run the following command:

spark-submit --master yarn --deploy-mode cluster pi.py 10
22/11/22 12:10:54 INFO yarn.Client: Application report for application_1669092136770_0001 (state: RUNNING)
22/11/22 12:10:55 INFO yarn.Client: Application report for application_1669092136770_0001 (state: FINISHED)
22/11/22 12:10:55 INFO yarn.Client: 
	 client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
	 diagnostics: User application exited with status 1
	 ApplicationMaster host: 
	 ApplicationMaster RPC port: 36269
	 queue: root.users.test_spark
	 start time: 1669119000924
	 final status: FAILED
	 tracking URL: https://:8090/proxy/application_1669092136770_0001/
	 user: test_spark
22/11/22 12:10:55 ERROR yarn.Client: Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_1669092136770_0001 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1603)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/11/22 12:10:55 INFO util.ShutdownHookManager: Shutdown hook called

Troubleshoot similar issues:

Check the ApplicationMaster container log for the error “User application exited with status 1”.

– Compare the time of failure with the other container logs to see if there is a correlation (see the end of the post for how to collect YARN application logs).

– In the example below, we can see that a SELECT query failed and returned an error code to Spark, which failed the container with the status “User application exited with status 1”.

– As the subsequent AM container attempt also failed, the application was marked as failed.

– In this case, the code is trying to connect to an external service (such as Hive or Impala) to read data.

– Using this method, we can find which part of the job/code is failing.
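One way to do the time comparison programmatically is sketched below; it assumes log lines start with a yy/MM/dd HH:mm:ss timestamp, as in the spark-submit output above (the executor line is a made-up example):

```python
from datetime import datetime, timedelta

FMT = "%y/%m/%d %H:%M:%S"  # e.g. "22/11/22 12:10:55"

def near_in_time(line_a, line_b, window_seconds=30):
    """Return True if the leading timestamps of two log lines fall within
    the given window -- a cheap way to correlate an ApplicationMaster
    failure with an executor-side error."""
    t_a = datetime.strptime(line_a[:17], FMT)
    t_b = datetime.strptime(line_b[:17], FMT)
    return abs(t_a - t_b) <= timedelta(seconds=window_seconds)

print(near_in_time(
    "22/11/22 12:10:55 ERROR yarn.ApplicationMaster: User application exited with status 1",
    "22/11/22 12:10:53 ERROR executor: Failed select * from table test_spark",
))  # → True
```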

Application Master Container Log

1st Attempt (container_02_13746384932_1323_01_000001)

ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@32dwf242f{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
INFO ui.SparkUI: Stopped Spark web UI at

Executor Container log:

ERROR: Failed select * from table test_spark

Application Master Container Log

2nd Attempt (container_02_13746384932_1323_02_000001)

ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@d3423423{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
INFO ui.SparkUI: Stopped Spark web UI at 
INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors

Executor Container log:

ERROR: Failed select * from table test_spark

Resolution:

Based on the example above, to fix this issue we need to check the code for unwanted exits.

In this case, we can see that the application returned the non-zero exit code 1, so Spark treated it as an application failure and failed the application with:

Final status: User application exited with status 1

Updating the code so that it does not exit with a non-zero code on completion resolved the issue.
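A minimal sketch of the fix pattern: exit non-zero only when the work actually failed, and otherwise let the process end normally with the implicit exit code 0 (the function name here is an illustration, not part of the original driver):

```python
def exit_status(job_succeeded):
    # Map the job outcome to a process exit status:
    # 0 on success, 1 only on a genuine failure.
    return 0 if job_succeeded else 1

status = exit_status(job_succeeded=True)
print(status)  # → 0
# In the real driver you would call spark.stop() and then, only if
# needed, sys.exit(status) -- never a hard-coded sys.exit(1) on success.
```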

Do comment with your failure and how you fixed the issue, and let us know if you got stuck as well 🙂

Additional points:

To collect the Spark application logs, use the command below:

yarn logs -applicationId <application ID> -appOwner <AppOwner>

where:

<application ID> is the application ID of the corresponding job.

<AppOwner> is the user name of the user who submitted the job.
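Once the aggregated log is saved to a file (for example by redirecting the yarn logs output to app.log), a small helper like the hypothetical one below can pull out the matching lines; the function name and marker default are assumptions for illustration:

```python
def find_failures(path, marker="User application exited with status 1"):
    """Return (line number, line) pairs where the marker string appears
    in the collected YARN application log."""
    hits = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if marker in line:
                hits.append((lineno, line.rstrip()))
    return hits
```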

Good luck with your learning!
