How to set Apache Spark Executor and Driver Memory/Cores (pyspark and spark-submit)

In this article, we will learn why memory and cores matter in a Spark job, and how tuning these properties helps improve performance and avoid Out of Memory (OOM) issues.
We will discuss the ways to specify executor and driver memory/cores in different Spark commands:
- By specifying it at runtime (via commands)
- By specifying it in the Spark config files
By specifying it at runtime (via commands)
This means passing the memory/cores settings in the Spark command itself, while triggering the Spark application. This is usually the recommended approach in a Spark memory tuning exercise, because each job needs its own memory/cores based on its data volume and code logic, so it is ideal to keep this configuration separate for each application.
Below are the properties that need to be added based on the use case
--num-executors
The number of executors to be spawned to complete the tasks. This is a hard limit: YARN will assign the specified number of executors and will not increase it beyond that.
If dynamic allocation is enabled, YARN will instead grow the number of executors based on the container requests made by the application (see the example after this list).
--executor-memory
Specifies the memory per executor. If you set the executor memory to 4 GB, then each executor will be assigned 4 GB of memory.
--driver-memory
Specifies the memory for the Spark driver.
--executor-cores
The number of cores assigned per executor. If you set 4 cores, then each executor will have 4 cores.
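The same settings can also be passed as Spark configuration properties with --conf. A minimal sketch, reusing the SparkPi example shown in the spark-submit section below; the values are illustrative:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client \
  --conf spark.executor.instances=2 \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=4g \
  --conf spark.executor.cores=5 \
  /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 1 1

To let YARN scale the executor count instead of fixing it, the relevant dynamic allocation properties are the ones below (the bounds are example values; on many clusters this also requires the external shuffle service, spark.shuffle.service.enabled=true, to be available):

  --conf spark.dynamicAllocation.enabled=true
  --conf spark.dynamicAllocation.minExecutors=1
  --conf spark.dynamicAllocation.maxExecutors=10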
Different types of Spark commands and the ways to add memory settings
Spark-submit
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --num-executors 2 --executor-memory 4G --driver-memory 4G --executor-cores 5 /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 1 1
We can use the Spark History Server (Environment tab) to confirm that the memory has been assigned as specified in the command.
Spark History Server screenshot (Environment tab)


Spark-shell
spark-shell --num-executors 2 --executor-memory 4G --driver-memory 4G --executor-cores 5
Pyspark
pyspark --num-executors 2 --executor-memory 4G --driver-memory 4G --executor-cores 5
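Once the shell is up, we can also double-check the effective values from the session itself. A minimal sketch for the pyspark shell, using the sc (SparkContext) object that the shell creates automatically:

sc.getConf().get("spark.executor.memory")   # e.g. '4G' when the flag above was applied
sc.getConf().getAll()                       # lists every property set for this application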
By specifying it in the Spark config files
We can also specify the memory/cores settings in the spark-defaults.conf file. Any Spark application that is triggered will pick up these defaults if the config is not explicitly specified in the command.
Example:
spark-defaults.conf
spark.executor.memory 4g
spark.eventLog.enabled true
spark.driver.memory 1g
Usually, it will be located in /etc/spark/conf
Tuning memory settings via spark-defaults.conf is usually not recommended. For example, if we set the executor memory to 12 GB there, then 12 GB will be allocated by default for every Spark application in the cluster, small as well as big, which is wasteful and can cause a resource crunch.
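For applications launched programmatically (for example, a Python script started directly rather than through the pyspark shell), a per-application alternative is to set these properties while building the SparkSession. A minimal PySpark sketch, assuming the session has not been created yet; the app name and values are placeholders, and note that spark.driver.memory generally cannot be changed this way because the driver JVM is already running, so set it via --driver-memory or the config file instead:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("per-app-memory-example")        # hypothetical application name
    .config("spark.executor.instances", "2")  # same effect as --num-executors 2
    .config("spark.executor.memory", "4g")    # same effect as --executor-memory 4G
    .config("spark.executor.cores", "5")      # same effect as --executor-cores 5
    .getOrCreate()
)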
Points to remember:
If you are seeing an OOM (Out Of Memory) error in the executor container logs, increasing the executor memory is usually sufficient; similarly, increase the driver memory if the driver is the one running out of memory (see the example after this list).
Make sure to assign memory/cores based on the application's needs and the cluster's available resources.
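For example, if the SparkPi job submitted earlier fails with an executor OOM, a first step could be to resubmit it with a larger --executor-memory (8G here is only an illustrative value; pick a size that fits your cluster and data volume):

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --num-executors 2 --executor-memory 8G --driver-memory 4G --executor-cores 5 /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 1 1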
Conclusion
In this article, we have discussed the need for setting the memory/cores for the executor and driver containers, and the commands used to set these properties.
Check here to know more about Spark Driver
Good luck with your learning!!