How to set Apache Spark Executor and Driver Memory/Cores (pyspark and spark-submit)


In this article, we will learn why executor and driver memory/cores matter in a Spark job, and how tuning these properties helps improve performance and avoid Out Of Memory (OOM) issues

We will discuss two ways to specify executor and driver memory/cores for a Spark application:

  • By specifying it at runtime (via commands)
  • By specifying it in the Spark config files

By specifying it at runtime (via commands)

This means specifying the memory/cores settings on the spark command line while triggering the Spark application. This is usually the recommended approach in a Spark memory tuning exercise, because each job needs its own memory/cores based on its data load and code logic, so it is ideal to keep this configuration separate for each application

Below are the properties that need to be added based on the use case

--num-executors

The number of executors to be spawned to complete the tasks. This is a hard limit: YARN will assign the specified number of executors and will not increase it beyond that

If you have dynamic allocation enabled, then --num-executors only seeds the initial executor count, and YARN will scale the number of executors up based on the pending container requests
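As a sketch, dynamic allocation can also be enabled per application from pyspark; the min/max values below are illustrative, and a YARN cluster with the external shuffle service running is assumed:

from pyspark.sql import SparkSession

# Sketch: enable dynamic allocation instead of a fixed --num-executors.
# Assumes YARN with the external shuffle service available; the app name
# and min/max executor counts are illustrative.
spark = (
    SparkSession.builder
    .appName("dynamic-allocation-example")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "10")
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)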

--executor-memory

This property specifies the memory per executor. If you set the executor memory to 4 GB, then each executor will be assigned 4 GB of memory

--driver-memory

Specifies the memory for the Spark driver

--executor-cores

The number of cores assigned per executor. If you set 4 cores, then each executor will get 4 cores and can run up to 4 tasks in parallel
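Putting these together: a job submitted with --num-executors 2 --executor-memory 4G --executor-cores 5 asks YARN for 2 executor containers of 4 GB and 5 cores each, i.e. 8 GB of memory and 10 cores for the executors in total (plus a small per-container memory overhead), in addition to whatever --driver-memory requests for the driver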

Different types of spark commands and the ways to add memory settings

Spark-submit

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --num-executors 2 --executor-memory 4G --driver-memory 4G --executor-cores 5 /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 1 1

We can use the Spark History Server to confirm that the memory has been assigned as specified in the command

[Screenshot: Spark History Server, Environment tab]
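If the History Server UI is not handy, the effective values can also be read back from inside a running session. A minimal pyspark sketch, assuming the properties were passed on the command line as above:

# Sketch: read the effective settings back from the live session.
# sparkContext.getConf() returns a copy of the session's SparkConf.
print(spark.sparkContext.getConf().get("spark.executor.memory"))  # e.g. '4G'
print(spark.sparkContext.getConf().get("spark.executor.cores"))   # e.g. '5'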

Spark-shell

spark-shell --num-executors 2 --executor-memory 4G --driver-memory 4G --executor-cores 5

Pyspark

pyspark --num-executors 2 --executor-memory 4G --driver-memory 4G --executor-cores 5
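When Spark is started from a plain Python script rather than one of these shells, the same settings can be passed on the SparkSession builder. A minimal sketch (the app name is illustrative; note that the driver JVM's memory must be fixed before it starts, so when launching through spark-submit prefer the --driver-memory flag):

from pyspark.sql import SparkSession

# Sketch: the command-line flags above map to these Spark properties.
spark = (
    SparkSession.builder
    .appName("memory-tuning-example")          # illustrative name
    .config("spark.executor.instances", "2")   # --num-executors
    .config("spark.executor.memory", "4g")     # --executor-memory
    .config("spark.executor.cores", "5")       # --executor-cores
    .config("spark.driver.memory", "4g")       # --driver-memory (see note above)
    .getOrCreate()
)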

By specifying it in the Spark config files

We can also specify the memory/cores settings in the spark-defaults.conf file. Any Spark application triggered on the cluster will pick up these defaults if the settings are not passed explicitly in the command

Example:

spark-defaults.conf

spark.executor.memory   4g
spark.eventLog.enabled  true
spark.driver.memory     1g

Usually, this file is located in /etc/spark/conf

Tuning memory settings via spark-defaults.conf is usually not recommended. Say we set the executor memory there to 12 GB: that 12 GB will then be allocated by default to every Spark application in the cluster, small or big, which wastes resources and can cause a resource crunch
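Note that properties set directly on the SparkSession/SparkConf take precedence over flags passed to spark-submit, which in turn take precedence over spark-defaults.conf, so overriding a cluster-wide default for one job is straightforward. A sketch (the app name is illustrative):

from pyspark.sql import SparkSession

# Sketch: override a 4g executor-memory default from spark-defaults.conf
# for this one application only.
spark = (
    SparkSession.builder
    .appName("per-job-override")            # illustrative name
    .config("spark.executor.memory", "8g")  # wins over spark-defaults.conf
    .getOrCreate()
)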

Points to remember:

If you are seeing an OOM (Out Of Memory) error in the executor container logs, then increasing the executor memory is usually sufficient; the same applies to the driver.

Make sure to assign the Memory/Cores based on the need and the availability
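Also keep in mind that on YARN each executor container reserves an off-heap memory overhead on top of the executor memory, by default the larger of 384 MB and 10% of the executor memory. If YARN kills containers for exceeding memory limits, raising spark.executor.memoryOverhead (spark.yarn.executor.memoryOverhead on older Spark releases) can help. A sketch, with an illustrative overhead value:

from pyspark.sql import SparkSession

# Sketch: raise the off-heap overhead when YARN kills executors for
# exceeding container memory limits (the 1g value is illustrative).
spark = (
    SparkSession.builder
    .config("spark.executor.memory", "4g")
    .config("spark.executor.memoryOverhead", "1g")
    .getOrCreate()
)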

Conclusion

In this article, we have discussed the need to set memory/cores for the executor and driver containers, and the commands used to set these properties

Check here to learn more about the Spark driver

Good luck with your learning!!
