Resolve “Slow BlockReceiver write packet to mirror”
The “Slow BlockReceiver write packet to mirror” warning usually indicates an unhealthy node in your Hadoop cluster or a heavily loaded node, either of which can result in poor performance. In this article, we will see how to identify the issue and the possible ways to resolve it.

Symptoms:
This issue usually causes performance problems: jobs and clients take a long time to complete, and all client read/write requests to HDFS are delayed.
You will see messages like the below in the DataNode logs.
DataNode logs:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 4467ms (threshold=300ms), downstream DNs=[10.34.54.1:1004], blockId=235234234
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 3585ms (threshold=300ms), downstream DNs=[10.34.54.1:1004], blockId=235234234
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 5468ms (threshold=300ms), downstream DNs=[10.34.54.1:1004], blockId=235234234
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 12849ms (threshold=300ms), downstream DNs=[10.34.54.1:1004], blockId=235234234
The above warning messages show that write requests are taking longer than the threshold of 300 milliseconds.
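To get a quick picture of how widespread the problem is, you can count these warnings and extract the downstream DataNodes they point to; a node that keeps appearing as the downstream DN (the mirror the packet is being forwarded to) is a good suspect. A minimal sketch, assuming the DataNode logs live under /var/log/hadoop-hdfs (adjust the path and file pattern for your distribution):
# Count the slow-mirror warnings per DataNode log file
grep -c "Slow BlockReceiver write packet to mirror" /var/log/hadoop-hdfs/*datanode*.log
# List which downstream DataNodes appear in the warnings, most frequent first
grep -o "downstream DNs=\[[^]]*]" /var/log/hadoop-hdfs/*datanode*.log | sort | uniq -c | sort -rn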
Also, if you are seeing slowness in a YARN application, you will see messages like the below in the application logs.
Client Logs:
WARN [ResponseProcessor for block BP-<pool_id>:blk_<block_id>] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 342485ms (threshold=30000ms);
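If the application has already finished, you can pull the aggregated application logs and search for the same message. A quick sketch, assuming YARN log aggregation is enabled and <application_id> is replaced with the actual application ID:
# Search the aggregated YARN application logs for the slow-read warning
yarn logs -applicationId <application_id> | grep "Slow ReadProcessor"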
Disk/Network Slowness
As a quick tip, you can search for the messages below in the DataNode logs; all of them point to disk/network issues, and the debugging process is the same for each.
Message in the DataNode log                      | What it indicates
Slow flushOrSync took                            | Delay in writing the block to the disk or OS cache
Slow BlockReceiver write data to disk cost       | Disk/network issue
Slow manageWriterOsCache took                    | Delay in writing the block to the OS cache
Slow BlockReceiver write packet to mirror took   | Disk/network issue
Slow PacketResponder send ack to upstream took   | Disk/network issue
The above messages indicate a problem with read/write requests to HDFS or the local disks, which can be caused by disk I/O or network issues. Any of these operations taking longer than the default threshold of 300 milliseconds is treated as a performance issue and logged.
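To see which of these categories dominates on a node, you can count each pattern in the DataNode log. A minimal sketch, assuming the logs live under /var/log/hadoop-hdfs (adjust the path for your distribution). If you need to tune the 300 ms threshold, recent Hadoop releases expose it as dfs.datanode.slow.io.warning.threshold.ms (and the client-side 30000 ms threshold as dfs.client.slow.io.warning.threshold.ms); verify the property names against your version's hdfs-default.xml.
# Count each category of slow-I/O warning in the DataNode logs
for p in "Slow flushOrSync" \
         "Slow BlockReceiver write data to disk" \
         "Slow manageWriterOsCache" \
         "Slow BlockReceiver write packet to mirror" \
         "Slow PacketResponder send ack to upstream"; do
  printf '%-50s %s\n' "$p" "$(cat /var/log/hadoop-hdfs/*datanode*.log | grep -c "$p")"
done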
Resolution:
The root cause of this kind of issue is hard to pinpoint because it involves overall node health: disk I/O, system load, and network.
Below are some of the checks I use to validate this issue.
— If you are seeing the above WARNING messages on a single DataNode, you can stop that DataNode for temporary relief and then troubleshoot the node; otherwise, you need to validate every node that is logging these messages
— To investigate node resource usage
- top -c [to check the load average and CPU utilization]
- free -g [to check the memory usage]
- df -h [to check the disk usage]
- sar reports [to check the history; see the example just below this list]
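If the sysstat package is installed, sar can show whether the slowness lines up with CPU, run-queue, or memory pressure at the time of the warnings. A minimal sketch; the historical data file path is an assumption (e.g. /var/log/sa/saDD on RHEL-based systems, where DD is the day of the month):
# Current utilization, sampled 5 times at 2-second intervals
sar -u 2 5      # CPU utilization (watch %iowait)
sar -q 2 5      # run queue length and load averages
sar -r 2 5      # memory utilization
# Historical data for a previous day
sar -u -f /var/log/sa/saDD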
— To investigate the network, it would be helpful to start with the below commands
- ifconfig -a [run it multiple times and check whether the error/drop counters are increasing]
- netstat -s | grep -i retrans [check for retransmitted packets; see the example below]
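To make the network check easier to read, you can sample the counters twice a few minutes apart and compare, and also look at per-interface statistics. A minimal sketch, assuming eth0 is the interface carrying HDFS traffic (replace it with your interface name) and that the ip and ethtool tools are available:
# Growing retransmission counts point to network trouble
netstat -s | grep -i retrans
# Per-interface drop/error counters; run twice and compare
ip -s link show eth0
# NIC-level statistics, if ethtool is available
ethtool -S eth0 | grep -iE "drop|err"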
— To investigate I/O, it would be helpful to start with the below commands
- iostat [check for a high iowait percentage; see the extended example below]
- dmesg [this log helps to identify whether there are any issues with the disks]
- df -h (Linux) or wmic logicaldisk get size,freespace,caption (Windows) [to check disk capacity and health]
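For a closer look at per-disk latency, iostat's extended view shows service times and utilization per device, and dmesg often records failing-disk messages. A minimal sketch; the device name /dev/sda and the use of smartctl are assumptions (smartctl requires the smartmontools package and root access):
# Extended per-device statistics, 5 samples at 2-second intervals; watch the await and %util columns
iostat -x 2 5
# Look for disk/filesystem errors reported by the kernel
dmesg | grep -iE "error|fail|sector" | tail -50
# SMART health summary for a suspect disk
smartctl -H /dev/sda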
Conclusion
In conclusion, the “Slow BlockReceiver write packet to mirror” message shows that there is a network or I/O issue on the node. If you are seeing these messages, understand that the problem is not with the application, and debug at the node level. Use the commands shared above to validate whether there are any issues at the node level.
Good Luck with your Learning!!
Related Topics:
How to Recover Standby Namenode (Bootstrap Standby Namenode)
How to find and delete files older than X days in HDFS