“Task serialization failed: java.lang.StackOverflowError” usually happens when the JVM cannot create a new stack frame because the thread stack space is exhausted, which results in a StackOverflowError.
A StackOverflowError will cause the application to fail, and you will see messages like the following in the application logs:
INFO scheduler.DAGScheduler: ResultStage 11 (showString at NativeMethodAccessorImpl.java:0) failed in 0.062 s due to Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError
In this scenario, the Spark application is processing an XML file with many deeply nested fields. As a result, it exceeded the default stack size (which is 1024 KB).
NOTE: The default stack size for a 64-bit Linux JDK is 1024 KB.
The most common causes of this issue are infinite recursion and unterminated loops in user code.
According to the official Java documentation, a StackOverflowError is thrown as a result of very deep recursion in a particular code snippet.
Apart from recursion, this issue can also occur when the application keeps calling methods within methods until the stack is exhausted, or when it uses a very large number of local variables. (These are some of the known causes.)
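To illustrate the recursion case, here is a minimal, hypothetical Java snippet (not from the failing application) in which a method with no termination condition keeps adding stack frames until the thread stack is exhausted:

```java
// Minimal sketch of unterminated recursion.
// Each call adds a new stack frame; with no base case, the thread
// stack eventually runs out and the JVM throws StackOverflowError.
public class DeepRecursionDemo {
    static long depth = 0;

    public static void recurse() {
        depth++;
        recurse(); // no termination condition
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The depth reached before the overflow depends on the
            // configured stack size (-Xss / -XX:ThreadStackSize).
            System.out.println("Caught StackOverflowError after " + depth + " calls");
        }
    }
}
```

Running this with a larger stack size only delays the error; the real fix is a termination condition.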
When you see this error, it means the user code (i.e., not the Spark framework code) is doing something that leads to this issue. There are two ways to address it:
– Rewrite the application to avoid the issue
– Increase the memory allocated to the thread stack, which is the easiest way to resolve the issue quickly
To increase the stack size, add the configuration below to the spark-submit command.
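For the executors, the option could look like the following (a sketch only; the 2048 KB value is illustrative, as discussed below):

```
--conf "spark.executor.extraJavaOptions=-XX:ThreadStackSize=2048"
```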
As mentioned earlier, the default stack size is 1024 KB. The 2048 KB value shown in the example is just an example; you can experiment with smaller or larger values. The exact value to use is application specific and depends on your application logic, code, and input data.
Similarly, if you hit a StackOverflowError in the driver, you can set the same option via spark.driver.extraJavaOptions.
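For the driver, the equivalent option could look like this (again, 2048 KB is only an illustrative value):

```
--conf "spark.driver.extraJavaOptions=-XX:ThreadStackSize=2048"
```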
For long-term use, you may want to set this property in the spark-defaults.conf file, alongside the other standard property values for your application.
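A minimal sketch of what those entries might look like in spark-defaults.conf (the 2048 KB value is only an example, as noted above):

```
spark.driver.extraJavaOptions    -XX:ThreadStackSize=2048
spark.executor.extraJavaOptions  -XX:ThreadStackSize=2048
```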
Spark command example to add the above configuration:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --conf "spark.driver.extraJavaOptions=-XX:ThreadStackSize=2048" --conf "spark.executor.extraJavaOptions=-XX:ThreadStackSize=2048" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 1 1
For a better long-term resolution, it is always recommended to handle this at the code level (code optimization). Make sure the code addresses the following points:
- Incorrectly implemented recursion or loops (i.e., with no termination condition)
- Cyclic dependencies between classes
- Instantiating a class within the same class
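To illustrate the first point, a small hypothetical Java example contrasting recursion with a proper termination condition against an iterative rewrite whose stack depth does not grow with the input:

```java
// Sketch of fixing stack-depth problems at the code level.
public class FactorialSafe {
    // Recursive version WITH a termination condition:
    // still uses one stack frame per step, so it is safe only for small n.
    public static long factorialRecursive(long n) {
        if (n <= 1) return 1; // base case terminates the recursion
        return n * factorialRecursive(n - 1);
    }

    // Iterative rewrite: constant stack depth regardless of n,
    // so it can never cause a StackOverflowError.
    public static long factorialIterative(long n) {
        long result = 1;
        for (long i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorialRecursive(10)); // 3628800
        System.out.println(factorialIterative(10)); // 3628800
    }
}
```

When the recursion depth depends on input size (e.g., nesting depth of an XML document), the iterative form is the safer choice.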
JDK 1.8 Reference
The topics and properties above are discussed in the Apache Spark guide below.
Good luck with your learning!