Resolve “node has conditions: [DiskPressure]”
You may have seen “DiskPressure” error messages in your Kubernetes cluster, which result in pod/container eviction. In this article, we will look at the cause of this issue and how to fix it.
“node has conditions: [DiskPressure]” means that the node’s disk usage has exceeded a certain threshold and there is not enough space left to schedule new pods. The kubelet sets this condition when free space on the node or image filesystem falls below its eviction threshold. In the setup described here, the Docker disk has an 80% usage threshold; once usage exceeds it, the node is marked as “NodeUnderDiskPressure”.
![Resolve "node has conditions: [DiskPressure]" 1 node has conditions: [DiskPressure]](https://learnerkb.com/wp-content/uploads/2023/03/pexels-kelly-2881154.jpg)
Summary:
Docker needs at least 20% unused space to keep the disk clear of space issues. If usage goes above 80%, the node reports “node has conditions: [DiskPressure]”, and if important pods are evicted this can also bring a worker or master node down.
Before going further, let’s discuss taints, because they come up again in the cause of this issue.
What are taints
In Kubernetes, taints are applied to a node to repel pods from being scheduled on that node unless a particular condition is met. A taint can be added to a node to indicate that the node has a specific condition or limitation that makes it unsuitable for running certain types of workloads.
We can add a taint to a node manually with the command below, replacing <node> with the node name
kubectl taint nodes <node> key1=value1:NoSchedule
Let’s proceed further on the symptoms, cause, and solution for this issue
Symptoms
As mentioned above, disk pressure can bring a node down (master or worker). In the worst case, if it is the master node, your complete Kubernetes cluster will be down.
You will find messages like the ones below in the logs.
kubectl get pods shows which pods are being evicted. You can then use the pod name to get the description or logs and find the exact error message.
kubectl get pods
nginx 0/1 Evicted 0 5m27s
nginx1 0/1 Evicted 0 5m27s
kubectl describe pod <pod name>
Failed to admit pod nginx_default(030153de-d3d5-49c6-be03-178d49449f42) - node has conditions: [DiskPressure]
Failed to admit pod nginx1_default(5208dd2a-de90-46ca-925f-45f7a62eb2e7) - node has conditions: [DiskPressure]
Once you see the above error message, you can confirm that the node is marked as under disk pressure, so no new pod can be scheduled on it.
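On a busy cluster the evicted pods can be hard to spot in the full listing. Here is a minimal sketch for filtering them out, assuming the default kubectl get pods column layout shown above, where STATUS is the third column. The function name evicted_pods is made up for this example.

```shell
# Print the names of pods whose STATUS column reads "Evicted".
# Reads `kubectl get pods`-style text on stdin (STATUS is the 3rd column).
evicted_pods() {
  awk '$3 == "Evicted" { print $1 }'
}

# Sample input copied from the listing above.
printf '%s\n' \
  'nginx 0/1 Evicted 0 5m27s' \
  'nginx1 0/1 Evicted 0 5m27s' | evicted_pods
```

Against a live cluster this would be used as kubectl get pods | evicted_pods. If you list all namespaces, an extra NAMESPACE column comes first, so the column number in the awk filter would need to change.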
Cause
There are two things to consider as the cause of this issue:
Disk Space
Docker has a disk usage threshold to make sure the disk is not under pressure. In this case it is 80%: if disk usage goes above 80 percent, it is treated as disk pressure and the node is marked as “node has conditions: [DiskPressure]”.
This can be validated with a few docker commands
# docker info | grep -i "data space"
Data Space Used: 350.8 GB
Data Space Total: 400 GB
Data Space Available: 49.1 GB
Metadata Space Used: 211.6 MB
Metadata Space Total: 4.997 GB
Metadata Space Available: 2.856 GB
From the above output, it is clear that Docker disk usage is above 80%: 350.8 GB used out of 400 GB is about 87.7%. Because of that, the node is marked as under disk pressure. In our case it is the master node 🙁 and the entire Kubernetes cluster is down.
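The percentage can be worked out from the docker info output instead of by eye. This is a rough sketch, assuming the devicemapper-style "Data Space Used" and "Data Space Total" lines shown above and decimal units (kB, MB, GB, TB). The function name data_space_usage is hypothetical.

```shell
# Print devicemapper data-space usage as a percentage.
# Reads `docker info`-style text on stdin.
data_space_usage() {
  awk '
    # Convert a value plus its unit suffix to bytes (decimal units assumed).
    function to_bytes(value, unit) {
      if (unit == "kB") return value * 1000
      if (unit == "MB") return value * 1000000
      if (unit == "GB") return value * 1000000000
      if (unit == "TB") return value * 1000000000000
      return value            # plain bytes or an unknown unit
    }
    # Anchor at the start of the line so "Metadata Space ..." lines are skipped.
    /^[ \t]*Data Space Used:/  { used  = to_bytes($4, $5) }
    /^[ \t]*Data Space Total:/ { total = to_bytes($4, $5) }
    END {
      if (total <= 0) { print "unknown"; exit 1 }
      printf "%.1f\n", used * 100 / total
    }
  '
}

# Sample input copied from the output above.
data_space_usage <<'EOF'
 Data Space Used: 350.8 GB
 Data Space Total: 400 GB
 Metadata Space Used: 211.6 MB
 Metadata Space Total: 4.997 GB
EOF
```

On a node this would be run as docker info | data_space_usage. Storage drivers other than devicemapper may not print these lines at all, in which case the function prints "unknown".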
Tainted Node
By default, the node controller taints a node automatically when certain conditions are true. For a node under disk pressure, including the master in our case, the taint below is added.
node.kubernetes.io/disk-pressure: Node has disk pressure.
We can easily confirm the taint by running the describe command on the node
kubectl describe node <node>
taints: node.kubernetes.io/disk-pressure
Because of this taint, once a node is marked as having disk pressure, no other pod can be placed on that node (in this case, the master node). When you try to submit a pod to this node, the pod is evicted with the messages below
nginx1 0/1 Evicted 0 5m27s
Failed to admit pod nginx_default(030153de-d3d5-49c6-be03-178d49449f42) - node has conditions: [DiskPressure]
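If you need to check several nodes, the taint lookup can be scripted. This is a small sketch that only searches the describe output for the taint key named above. The function name has_disk_pressure_taint is an assumption for illustration.

```shell
# Succeed (exit status 0) if the text on stdin mentions the disk-pressure taint key.
has_disk_pressure_taint() {
  grep -q 'node\.kubernetes\.io/disk-pressure'
}

# Sample input copied from the describe output above.
if echo 'taints: node.kubernetes.io/disk-pressure' | has_disk_pressure_taint; then
  echo "disk-pressure taint present"
fi
```

Against a live node this would be kubectl describe node <node> | has_disk_pressure_taint, with the exit status telling you whether the taint is set.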
Resolution
To resolve the issue, we can either clean up unused images or increase the disk space.
Cleanup
To clean up unused images or other objects, we can run the following docker commands to release more space:
docker image prune
The above command removes all dangling images. If you specify -a, it also removes all images not referenced by any container.
docker system prune
The above command removes all stopped containers, unused networks, and dangling images. With -a it also removes images not referenced by any container, and with --volumes it removes unused volumes as well.
You can run both of the above commands and then run the info command below to check whether you now have sufficient space
# docker info | grep -i "data space"
Data Space Used: 350.8 GB
Data Space Total: 400 GB
Once you have cleaned up enough disk space on the node, the DiskPressure condition should resolve automatically, and new pods can be scheduled on the node again.
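How much is "enough" can be estimated from the same figures. The sketch below assumes the 80% threshold used in this article and takes used and total space in GB. The function name space_to_free is made up for the example.

```shell
# Print how many GB must be freed to bring usage down to the threshold.
# Arguments: used_gb total_gb [threshold_percent, default 80]
space_to_free() {
  awk -v used="$1" -v total="$2" -v pct="${3:-80}" 'BEGIN {
    target = total * pct / 100      # largest "used" value still within the threshold
    excess = used - target
    if (excess < 0) excess = 0      # already under the threshold
    printf "%.1f\n", excess
  }'
}

# Figures from the docker info output in this article.
space_to_free 350.8 400
```

With 350.8 GB used on a 400 GB disk, the 80% mark is 320 GB, so roughly 30.8 GB has to be reclaimed before the condition can clear.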
If the cleanup freed enough space, we are good to proceed by restarting all the pods 🙂
If not, we need to look at the disk capacity itself
Increase disk space
If cleanup didn’t free up enough disk space, your cluster needs all of these images and none of them are unused. In this case, you need to increase the disk capacity to accommodate all the images.
{Device:docker DeviceMajor:452 Capacity:410023172096 Type:devicemapper Inodes:0 HasInodes:false}
Based on the Cloudera recommendation, we need to keep at least 1 TB for storing Docker images.
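To see how far the current disk is from that recommendation, the Capacity value in the device line above (in bytes) can be compared with the target. This is a sketch that assumes 1 TB means 10^12 bytes. The function name capacity_shortfall_gb is hypothetical.

```shell
# Print the extra capacity, in GB, needed to reach a target size.
# Arguments: capacity_bytes [target_bytes, default 1 TB = 10^12 bytes]
capacity_shortfall_gb() {
  awk -v cap="$1" -v target="${2:-1000000000000}" 'BEGIN {
    gap = target - cap
    if (gap < 0) gap = 0            # capacity already meets the target
    printf "%.1f\n", gap / 1000000000
  }'
}

# Capacity value from the device line above (about 410 GB).
capacity_shortfall_gb 410023172096
```

For the 410 GB device in this example, that leaves a gap of roughly 590 GB to the 1 TB recommendation.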
Conclusion
As mentioned, “node has conditions: [DiskPressure]” is a precautionary measure that keeps new pods off a node that is short on disk. It can be caused by a variety of factors, including unexpected increases in data size or issues with pods. To resolve it, free up disk space on the node or increase its capacity. Once enough disk space is available, the [DiskPressure] condition should clear, and new pods can be scheduled.
Good Luck with your Learning !!