Resolve "Task serialization failed: java.lang.StackOverflowError" in Spark

"Task serialization failed: java.lang.StackOverflowError" usually happens when the JVM cannot allocate a new stack frame because the thread stack is exhausted, which results in a StackOverflowError.


Symptoms

A StackOverflowError causes the application to fail, and you will see messages like the following in the application logs:

INFO scheduler.DAGScheduler: ResultStage 11 (showString at NativeMethodAccessorImpl.java:0) failed in 0.062 s due to Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError

java.lang.StackOverflowError

In this scenario, the Spark application is processing an XML file with many nested fields and, as a result, has exhausted the default stack size (1024 KB).

NOTE: The default stack size for a 64-bit Linux JDK is 1024 KB.
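
You can confirm the effective default on your own JVM (this assumes a HotSpot JVM, which reports ThreadStackSize in KB) with:

java -XX:+PrintFlagsFinal -version | grep ThreadStackSize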

Cause

The most common cause of this issue is infinite recursion or a loop with no termination condition in the user code.

According to the official Java documentation, a StackOverflowError is thrown as a result of very deep recursion in a particular code snippet.

Apart from recursion, this issue can also occur when the application keeps calling methods within methods until the stack is exhausted, or when it uses a very large number of local variables. (These are some of the known causes.)
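
As a minimal illustration (hypothetical code, not taken from any real application), the following Scala snippet recurses with no termination condition and throws java.lang.StackOverflowError:

object DeepRecursion {
  // No base case: every call pushes a new frame onto the thread stack.
  // The "+ 1" keeps the recursive call out of tail position, so the
  // Scala compiler cannot rewrite it into a loop.
  def boom(n: Long): Long = boom(n - 1) + 1

  def main(args: Array[String]): Unit = {
    try {
      boom(Long.MaxValue)
    } catch {
      case e: StackOverflowError => println("Caught: " + e)
    }
  }
}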

Resolution

When you see this error, it means that the user code (not the Spark framework code) is doing something that exhausts the stack. There are two ways to address it:

– Rewrite the application to avoid the issue, or

– Increase the memory allocated to each thread stack, which is the quickest way to resolve the issue.

To increase the stack size, add the following configuration to the spark-submit command:

--conf "spark.executor.extraJavaOptions=-XX:ThreadStackSize=2048000"

As mentioned earlier, the default stack size is 1024 KB. The value of 2048 shown above is just an example; -XX:ThreadStackSize is specified in kilobytes, so this sets a 2048 KB (2 MB) stack per thread. You can experiment with smaller or larger values; the exact value to use is application specific and depends on your application logic, code, and input data.

Similarly, if you hit a StackOverflowError on the driver, you can set:

--conf "spark.driver.extraJavaOptions=-XX:ThreadStackSize=2048000"

For long-term use, you may want to set these properties in the spark-defaults.conf file, which contains the other standard property values for your application.
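
For example, the corresponding spark-defaults.conf entries (the value of 2048 is illustrative and should be tuned for your application) would look like this:

spark.driver.extraJavaOptions    -XX:ThreadStackSize=2048
spark.executor.extraJavaOptions  -XX:ThreadStackSize=2048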

Example spark-submit command with the above configuration:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --conf "spark.driver.extraJavaOptions=-XX:ThreadStackSize=2048" --conf "spark.executor.extraJavaOptions=-XX:ThreadStackSize=2048" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 1 1

For a more durable resolution, it is always recommended to handle this at the code level (code optimization). Make sure the code addresses the following points (see the sketch after this list):

  • Incorrectly implemented recursion or loops (i.e. with no termination condition)
  • Cyclic dependencies between classes
  • Instantiating a class within the same class
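
As a sketch of such a code-level fix (hypothetical code, assuming the deep recursion can be expressed as a tail call), moving the recursive call into tail position lets the Scala compiler turn it into a loop that uses constant stack space; the @tailrec annotation makes the compiler verify this:

import scala.annotation.tailrec

object SafeRecursion {
  // With the recursive call in tail position, @tailrec guarantees the
  // compiler rewrites this into a loop, so the stack no longer grows
  // with the depth of the input.
  @tailrec
  def sum(xs: List[Long], acc: Long = 0L): Long = xs match {
    case Nil          => acc
    case head :: tail => sum(tail, acc + head)
  }

  def main(args: Array[String]): Unit = {
    // Handles inputs far deeper than a naive, non-tail recursion could.
    println(sum(List.range(0L, 1000000L)))
  }
}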

JDK 1.8 Reference

https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html

The topics and properties above are discussed in the Apache Spark configuration guide:

https://spark.apache.org/docs/latest/configuration.html

Good luck with your learning!
