Yarn application stuck in the ACCEPTED state (Includes Spark, Hive, Tez, and MapReduce jobs)

Usually, a Yarn application gets stuck in the ACCEPTED state when the scheduler cannot find enough resources in the cluster to create a new container and schedule a task.
Below are the scenarios where we usually face this issue:
- When the total cluster resource or queue resource is exhausted
- When the Application Master (AM) container creation threshold reaches its maximum
When the Total cluster resource or Queue resource is exhausted
- When your total cluster capacity or a particular queue (where you are submitting your job) is being used at its maximum capacity
- Yarn will accept your job submission request, but since it cannot satisfy the memory/core requirement (due to the resource constraint), it will keep the job in the ACCEPTED state until it is able to allocate the requested resources
- Until then, your job will remain in the ACCEPTED state
How to find the resource usage:
Check the Yarn Resource Manager Web UI -> Scheduler page
– At the top of this page, you can easily identify the total cluster capacity (memory and cores) and the current usage (the same numbers can also be pulled from the RM REST API; see the sketch after the screenshot)
– If you see usage of more than 90%, then you are in a resource crunch
Example Screenshot:
Here, we can see the total memory is 44 GB and the total VCores is 16; the cluster is currently idle, so there is no resource utilization.
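If you prefer to script this check, the same cluster-level numbers are exposed by the ResourceManager's Cluster Metrics REST endpoint. The sketch below is a minimal example: the RM host/port is a placeholder, and it assumes the default unauthenticated HTTP endpoint (a Kerberized cluster would need SPNEGO authentication).

```python
import requests

# Placeholder RM address - replace with your ResourceManager host/port
RM = "http://resourcemanager-host:8088"

# Cluster Metrics API: total vs. allocated memory and vcores for the whole cluster
metrics = requests.get(f"{RM}/ws/v1/cluster/metrics").json()["clusterMetrics"]

mem_pct = 100.0 * metrics["allocatedMB"] / metrics["totalMB"]
core_pct = 100.0 * metrics["allocatedVirtualCores"] / metrics["totalVirtualCores"]

print(f"Memory : {metrics['allocatedMB']}/{metrics['totalMB']} MB ({mem_pct:.1f}% used)")
print(f"VCores : {metrics['allocatedVirtualCores']}/{metrics['totalVirtualCores']} ({core_pct:.1f}% used)")
print(f"Pending applications (likely ACCEPTED): {metrics['appsPending']}")
```

If the memory or vcore percentage printed here is above ~90%, you are in the resource crunch described above.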

- Also check the usage at a specific queue level; there are cases where a particular queue is fully occupied and unable to allocate resources for new jobs submitted to the same queue
- This can be identified on the RM -> Scheduler page, where you can see each queue's maximum resource and current usage (a queue-by-queue REST check is sketched after the screenshot)
Example Screenshot:
In the below screenshot, we can easily see the used and max resources for each queue.
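The per-queue numbers can also be read from the Scheduler REST endpoint. This is a rough sketch that assumes the Capacity Scheduler (the Fair Scheduler returns a differently shaped JSON), and the RM address is a placeholder.

```python
import requests

# Placeholder RM address - replace with your ResourceManager host/port
RM = "http://resourcemanager-host:8088"

# Scheduler API: walk the queue hierarchy and print used vs. max capacity
scheduler = requests.get(f"{RM}/ws/v1/cluster/scheduler").json()
root = scheduler["scheduler"]["schedulerInfo"]

def print_queue_usage(queue, indent=0):
    name = queue.get("queueName", "root")
    used = queue.get("usedCapacity", 0.0)   # percentage of the queue's configured capacity
    maxc = queue.get("maxCapacity", 100.0)  # max capacity the queue can grow to
    print(f"{' ' * indent}{name:<20} used={used:6.1f}%  max={maxc:6.1f}%")
    for child in (queue.get("queues") or {}).get("queue", []):
        print_queue_usage(child, indent + 2)

print_queue_usage(root)
```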

Resolution:
- If your resources are being used at their maximum, we need to make sure there is no rogue job occupying all the resources and causing the resource crunch (see the sketch below for one way to spot it)
- If it is queue-specific, we can move the job to a different queue as an immediate workaround
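One quick way to spot a rogue job is to list the running applications sorted by how much memory they currently hold. The sketch below uses the RM's Applications REST endpoint; again, the RM address is a placeholder.

```python
import requests

# Placeholder RM address - replace with your ResourceManager host/port
RM = "http://resourcemanager-host:8088"

# Applications API: list RUNNING apps sorted by allocated memory to find a
# job that is hogging the cluster or a particular queue
resp = requests.get(f"{RM}/ws/v1/cluster/apps", params={"states": "RUNNING"}).json()
apps = (resp.get("apps") or {}).get("app", [])

for app in sorted(apps, key=lambda a: a.get("allocatedMB", 0), reverse=True)[:10]:
    print(f"{app['id']}  queue={app['queue']:<15} user={app['user']:<10} "
          f"mem={app.get('allocatedMB', 0)} MB  vcores={app.get('allocatedVCores', 0)}  {app['name']}")
```

For the queue-specific case, an already-submitted application can usually be moved with the yarn CLI (for example `yarn application -movetoqueue <applicationId> -queue <targetQueue>` on most Hadoop 2.x/3.x versions), or the job can simply be resubmitted to a less busy queue.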
When the Application Master container creation threshold reaches its max
In Yarn, there is a restriction on how much of each queue's capacity can be used to create AM containers; this is to make sure we do not get into a deadlock situation
In this case, you will see the below message in the application logs
waiting for AM container to be allocated
This can be checked on the RM -> Scheduler page, where we need to check the Max Application Master Resources and the currently Used Application Master Resources. If the used AM resources are at their maximum, Yarn will not allow another AM container to be scheduled in the same queue, which results in your job staying in the ACCEPTED state.
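This check can also be scripted against the Scheduler REST endpoint. Treat the sketch below as an assumption-laden starting point: it assumes the Capacity Scheduler, and the AM-related field names (`AMResourceLimit`, `usedAMResource`) vary between Hadoop versions and scheduler types, so verify them against your RM's actual JSON first.

```python
import requests

# Placeholder RM address - replace with your ResourceManager host/port
RM = "http://resourcemanager-host:8088"

# Assumed Capacity Scheduler field names - confirm against your RM's JSON,
# as they differ across Hadoop versions and scheduler implementations
scheduler = requests.get(f"{RM}/ws/v1/cluster/scheduler").json()
root = scheduler["scheduler"]["schedulerInfo"]

def print_am_usage(queue):
    name = queue.get("queueName", "root")
    am_limit = queue.get("AMResourceLimit")
    am_used = queue.get("usedAMResource")
    if am_limit and am_used:
        print(f"{name:<20} AM used {am_used.get('memory', 0)} MB "
              f"of {am_limit.get('memory', 0)} MB limit")
    for child in (queue.get("queues") or {}).get("queue", []):
        print_am_usage(child)

print_am_usage(root)
```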

Resolution:
– We can check whether there are multiple small applications occupying the AM resources
– Otherwise, we can increase the AM max resource limit to a higher value
In Cloudera, we can use the Dynamic Resource Pool Configuration and edit the pool config to increase the AM max share as below (this maps to the Fair Scheduler's maxAMShare setting; on the Capacity Scheduler the equivalent knob is yarn.scheduler.capacity.maximum-am-resource-percent)

You might be wondering why we need to restrict AM container creation in the first place.
AM max share
This is to make sure you do not get into a deadlock situation where your jobs stay in the ACCEPTED state forever.
Example:
Let’s say you have 100 GB of memory and 100 cores of capacity in your cluster, and you have only one queue with 100% of the cluster capacity
- What will happen when you submit 100 Oozie jobs?
- Oozie will create 1 AM (Application Master) container for itself and 1 AM container for each action it is going to perform. Let’s assume each container needs 1 GB of memory and 1 core
- In this case, 100 Oozie launcher AMs will be created, and your cluster capacity is now used to its maximum
- Now you can’t move anywhere; all the jobs will stay in this state forever, as Oozie is unable to trigger any new AM containers for its actions
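As a quick back-of-the-envelope check, the sketch below plugs the numbers from this example into an AM share limit (using the Fair Scheduler's default maxAMShare of 0.5 purely as an illustrative value) to show why capping AM containers keeps capacity free for the containers that do the actual work.

```python
# Numbers from the example above
cluster_gb = 100        # total cluster memory
am_container_gb = 1     # memory needed by each AM container
jobs_submitted = 100    # Oozie jobs submitted at once

# Without an AM share limit, every launcher AM can start until the cluster is full
ams_unlimited = min(jobs_submitted, cluster_gb // am_container_gb)
print(f"No AM limit   : {ams_unlimited} AMs running, "
      f"{cluster_gb - ams_unlimited * am_container_gb} GB left for actual work -> deadlock")

# With an AM share limit (e.g. the Fair Scheduler default maxAMShare = 0.5),
# only part of the queue can be spent on AM containers
max_am_share = 0.5
am_budget_gb = int(cluster_gb * max_am_share)
ams_limited = min(jobs_submitted, am_budget_gb // am_container_gb)
print(f"maxAMShare=0.5: at most {ams_limited} AMs, "
      f"{cluster_gb - am_budget_gb} GB always left for worker containers")
```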
Start a discussion if you have any questions or a better way to resolve the issue
Hope this article is useful for you. Good luck with your learning!