spark.driver.memoryOverhead and spark.executor.memoryOverhead explained

In this article, we will learn about memory overhead configuration in Spark, explore spark.driver.memoryOverhead and spark.executor.memoryOverhead, and see why they matter for the stability and optimal performance of your Spark applications.

What is Memory Overhead?

Memory overhead refers to the additional memory a container needs beyond the memory allocated for the JVM heap. In other words, it is the non-heap memory the container uses for things such as JVM overheads, interned strings, and other native allocations, rather than for the actual computation.

Allocating sufficient overhead memory improves application stability and performance and helps avoid out-of-memory (OOM) failures. The actual overhead requirement depends on factors such as data size, application logic, and available resources.

Spark exposes these memory settings separately for the driver and executor containers:

  • spark.driver.memoryOverhead
  • spark.executor.memoryOverhead
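Both properties can be set per job with --conf on spark-submit (as shown in the examples below) or cluster-wide in spark-defaults.conf, for example (illustrative values, not a recommendation):

```
# spark-defaults.conf — whitespace-separated key/value pairs
spark.driver.memoryOverhead    1g
spark.executor.memoryOverhead  1g
```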

spark.driver.memoryOverhead

spark.driver.memoryOverhead is a configuration property that specifies the amount of overhead memory to allocate to the driver process, in addition to the driver memory itself; together they make up the container memory that YARN allocates.

The driver process is responsible for coordinating task execution and managing the application, and it requires a certain amount of memory overhead to perform these functions effectively. This overhead memory covers non-heap needs such as JVM overheads, interned strings, and metadata structures.

Tuning spark.driver.memoryOverhead

By default, YARN derives spark.driver.memoryOverhead from the "spark.driver.memoryOverheadFactor" value, but it can be overridden to suit the application.

spark.driver.memoryOverheadFactor is set to 0.10 by default, i.e., 10% of the assigned driver memory.

NOTE: If 10% of the driver memory is less than 384 MB, YARN allocates the minimum of 384 MB as spark.driver.memoryOverhead.
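The default allocation described above can be sketched as a small Python helper (an illustrative sketch, not Spark code; `default_overhead_mb` is a hypothetical name):

```python
MIN_OVERHEAD_MB = 384   # hard minimum applied to memoryOverhead
DEFAULT_FACTOR = 0.10   # spark.driver.memoryOverheadFactor default

def default_overhead_mb(memory_mb: int, factor: float = DEFAULT_FACTOR) -> int:
    """Approximate the overhead YARN allocates when memoryOverhead
    is not set explicitly: factor * memory, floored at 384 MB."""
    return max(MIN_OVERHEAD_MB, int(memory_mb * factor))

print(default_overhead_mb(4096))  # 4 GB driver -> 409 MB overhead
print(default_overhead_mb(1024))  # 1 GB driver -> 384 MB (minimum applies)
```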

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --num-executors 4 --executor-memory 4G --driver-memory 4G --conf spark.driver.memoryOverhead=1g  /opt/cloudera/parcels/CDH/jars/spark-examples.jar 10

Because the driver process starts before logging is fully initialized, this memory setting is not printed in the driver logs. However, the equivalent allocation details are visible for executor containers, which follow the same logic and are covered in the next section.

The amount of memory overhead required depends on the specifics of the application and the resources available on the cluster. A common recommendation is to set this property to 10-20% of the memory assigned to the driver process.

spark.executor.memoryOverhead

spark.executor.memoryOverhead is a configuration property that specifies the amount of overhead memory to allocate to each executor, in addition to the executor memory itself; together they make up the container memory that YARN allocates.

The executors are the processes that run the application's tasks, and they require a certain amount of memory overhead to operate effectively. This overhead memory covers non-heap needs such as JVM overheads, interned strings, and metadata structures.

Tuning spark.executor.memoryOverhead

Tuning spark.executor.memoryOverhead works the same way as tuning spark.driver.memoryOverhead.

In this example, I didn't set spark.executor.memoryOverhead, so YARN allocates the default factor of 10% of the executor memory as overhead memory => 409 MB:

(4096 MB executor memory / 100) * 10 ≈ 409 MB of executor.memoryOverhead

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --num-executors 4 --executor-memory 4G --driver-memory 4G /opt/cloudera/parcels/CDH/jars/spark-examples.jar 10

23/02/09 11:49:46 DEBUG yarn.ResourceRequestHelper: Custom resources requested: Map()

23/02/09 11:49:46 DEBUG yarn.YarnAllocator: Created resource capability: <memory:4505, vCores:1>

23/02/09 11:49:46 DEBUG yarn.YarnAllocator: Updating resource requests, target: 4, pending: 0, running: 0, executorsStarting: 0

23/02/09 11:49:46 INFO yarn.YarnAllocator: Will request 4 executor container(s), each with 1 core(s) and 4505 MB memory (including 409 MB of overhead)

23/02/09 11:49:46 DEBUG impl.RemoteRequestsTable: Added priority=1
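As a quick sanity check, the 409 MB overhead and 4505 MB container size in the log above can be reproduced with simple arithmetic (a sketch, assuming truncation to whole MB):

```python
executor_memory_mb = 4 * 1024                           # --executor-memory 4G
overhead_mb = max(384, int(executor_memory_mb * 0.10))  # default 10% factor
container_mb = executor_memory_mb + overhead_mb         # YARN container size

print(overhead_mb, container_mb)  # 409 4505, matching the log line
```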

Now, I am setting spark.executor.memoryOverhead to 1 GB as below.

Example:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --num-executors 4 --executor-memory 4G --driver-memory 4G --conf "spark.executor.memoryOverhead=1g" /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10

Now we can see that executor.memoryOverhead has been allocated as 1 GB (1024 MB) on top of the 4 GB executor memory, for a total of 5120 MB per executor container:

23/02/09 12:06:21 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.

23/02/09 12:06:22 INFO yarn.YarnAllocator: Driver requested a total number of 7 executor(s).

23/02/09 12:06:22 INFO yarn.YarnAllocator: Will request 2 executor container(s), each with 1 core(s) and 5120 MB memory (including 1024 MB of overhead)
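An explicitly configured spark.executor.memoryOverhead replaces the 10% default entirely, which is where the 5120 MB in the log above comes from (a quick check):

```python
executor_memory_mb = 4 * 1024   # --executor-memory 4G
explicit_overhead_mb = 1024     # spark.executor.memoryOverhead=1g

# An explicit overhead setting replaces the 10% default outright.
container_mb = executor_memory_mb + explicit_overhead_mb
print(container_mb)             # 5120 MB, as reported by YarnAllocator
```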

If 10% of the requested driver or executor memory is less than 384 MB, YARN allocates the 384 MB minimum as the memory overhead.

Example:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --num-executors 4 --executor-memory 1G --driver-memory 1G  /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 10

23/02/09 12:14:37 INFO yarn.YarnAllocator: Driver requested a total number of 5 executor(s).

23/02/09 12:14:37 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)

23/02/09 12:14:37 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.

In this example, 10% of 1 GB is about 102 MB, which is less than 384 MB, so YARN uses 384 MB instead of 102 MB (shown in the above log snippet): 1024 MB + 384 MB = 1408 MB per container.
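The 1408 MB container size follows directly from the 384 MB floor (quick check, assuming truncation to whole MB):

```python
executor_memory_mb = 1024                        # --executor-memory 1G
ten_percent_mb = int(executor_memory_mb * 0.10)  # 102 MB, below the floor
overhead_mb = max(384, ten_percent_mb)           # 384 MB minimum wins
container_mb = executor_memory_mb + overhead_mb

print(ten_percent_mb, overhead_mb, container_mb)  # 102 384 1408
```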

Just like the driver process, the amount of memory overhead required by the executors depends on the specifics of the application and the resources available on the cluster. It is generally recommended to set this property to a value that is 10-20% of the total memory assigned to each executor.

Conclusion

In conclusion, spark.driver.memoryOverhead and spark.executor.memoryOverhead configuration properties play a critical role in ensuring the stability and performance of your Spark applications. By correctly setting these values, you can ensure that your applications have enough memory overhead to perform their functions effectively and deliver the results you expect.

Good Luck with your Learning !!
