Resolve the “User application exited with status 1” issue in Spark

“User application exited” means an application has stopped running, and that can happen for multiple reasons. Specifically, we are going to learn what “User application exited with status 1” means.
“User application exited with status 1” indicates that a user program has terminated with exit code 1. In Unix-based operating systems, a zero exit code is generally considered a successful completion, while a non-zero exit code indicates a failure.
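As a quick illustration of this convention (a minimal sketch using Python's standard library; the two child commands are placeholders, not part of any Spark job), spawning two child processes shows how exit codes are reported:

```python
import subprocess
import sys

# A process that exits with code 0 is treated as a success...
ok = subprocess.run([sys.executable, "-c", "import sys; sys.exit(0)"])
print(ok.returncode)    # 0 -> success

# ...while any non-zero exit code (like the 1 in this error) signals failure.
fail = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
print(fail.returncode)  # 1 -> failure
```

YARN applies the same rule: it only looks at the final exit code of the driver process, not at whether useful work was done before it exited.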
Symptoms
Non-zero exit codes usually indicate a failure or error in the user-created application, so the system treats the application as failed and marks it accordingly.
ERROR Snippet from Spark application log:
ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark
Cause
This issue is caused by a code problem: it appears when there is a non-zero exit code upon completion or failure of an external task (such as a shell script or an Impala query).
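For example, a driver that shells out to an external command and forwards its return code will reproduce this pattern. The sketch below is illustrative only; `run_external_task` and the failing `/bin/sh` command are stand-ins for a real shell script or Impala query:

```python
import subprocess

def run_external_task(cmd):
    # Run an external command (e.g. a shell script or an impala-shell query)
    # and report its exit code back to the caller.
    result = subprocess.run(cmd)
    return result.returncode

# Stand-in for a failing external task.
code = run_external_task(["/bin/sh", "-c", "exit 1"])
print("external task exit code:", code)

# If the driver forwards this code via sys.exit(code), YARN reports
# "User application exited with status 1".
```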
To reproduce the issue, use the code below:
- The Python code below estimates the value of pi using the Spark framework.
- I made a minor change so that it returns exit code 1 even on successful completion.
from __future__ import print_function

import sys
from random import random
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
    Usage: pi [partitions]
    """
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()

    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    # Code to replicate the issue
    sys.exit(1)
    spark.stop()
- After printing the pi value, this code exits with exit code 1 instead of reaching spark.stop().
- For a Spark application, any non-zero exit code means application failure.
- The job will fail with the final status “User application exited with status 1”.
Steps to run the sample code:
Save the above code as pi.py and run the command below:
spark-submit --master yarn --deploy-mode cluster pi.py 10
22/11/22 12:10:54 INFO yarn.Client: Application report for application_1669092136770_0001 (state: RUNNING)
22/11/22 12:10:55 INFO yarn.Client: Application report for application_1669092136770_0001 (state: FINISHED)
22/11/22 12:10:55 INFO yarn.Client:
	 client token: Token { kind: YARN_CLIENT_TOKEN, service: }
	 diagnostics: User application exited with status 1
	 ApplicationMaster host:
	 ApplicationMaster RPC port: 36269
	 queue: root.users.test_spark
	 start time: 1669119000924
	 final status: FAILED
	 tracking URL: https://:8090/proxy/application_1669092136770_0001/
	 user: test_spark
22/11/22 12:10:55 ERROR yarn.Client: Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_1669092136770_0001 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1603)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/11/22 12:10:55 INFO util.ShutdownHookManager: Shutdown hook called
Troubleshoot similar issues:
– Check the Application Master container log for the error “User application exited with status 1”.
– Compare the time of failure with the other container logs to see if there is a correlation (see the end of the post for how to collect YARN application logs).
– In the example below, we can see a SELECT query failed, which returned an error code to Spark and caused the container to fail with the “User application exited with status 1” status.
– As the subsequent attempt of the AM container also failed, the application was marked as failed.
– In this case, the code is trying to connect to an external service (such as Hive or Impala) to read the data.
– Using this method, we can find which part of the job/code is failing.
Application Master Container Log
1st Attempt (container_02_13746384932_1323_01_000001)
ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@32dwf242f{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
INFO ui.SparkUI: Stopped Spark web UI at
Executor Container log:
ERROR: Failed select * from table test_spark
Application Master Container Log
2nd Attempt (container_02_13746384932_1323_02_000001)
ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@d3423423{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
INFO ui.SparkUI: Stopped Spark web UI at
INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
Executor Container log:
ERROR: Failed select * from table test_spark
Resolution:
Based on the above example, to fix this issue we need to check the code for unwanted exits.
In this case, we can see the application returned a non-zero exit code (1), so Spark considers it an application failure and fails the application with:
Final status: User application exited with status 1
Changing the code so that it does not send a non-zero exit code upon successful completion resolved the issue.
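One safe pattern (sketched below with a placeholder `run_job` standing in for the real Spark work, since the exact fix depends on the application) is to exit with 0 on success and reserve non-zero codes for genuine failures:

```python
import sys

def run_job():
    # Placeholder for the real Spark work (e.g. the Pi computation).
    return "Pi is roughly 3.141593"

def main():
    exit_code = 0
    try:
        print(run_job())
    except Exception as exc:
        print("Job failed:", exc, file=sys.stderr)
        exit_code = 1          # non-zero only when something actually went wrong
    finally:
        pass                   # spark.stop() belongs here so it always runs
    return exit_code

# In the real script: sys.exit(main())
print(main())
```

Because spark.stop() sits in the finally block, the session is shut down on both the success and the failure paths, and YARN only sees a non-zero status when the job truly failed.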
Do comment on your failure and how you fixed the issue, and let us know if you got stuck as well 🙂
Additional points:
To collect Spark application logs, use the command below:
yarn logs -applicationId <application ID> -appOwner <AppOwner>
Where:
– Application ID is the corresponding application ID.
– AppOwner is the user name of the user who submitted the job.
Good luck with your learning!