What is meant by “Tablet server is quiescing” in Apache Kudu
“Tablet server is quiescing” is a state in which a Kudu tablet server is being drained: it gives up leadership of the tablet replicas it hosts and stops accepting new scan requests, so that leadership moves to replicas on other tablet servers. This is normally done before a server is taken down, to make sure all tablets still have an active leader elsewhere.
Symptom:
While in the quiescing state, the tablet server stops hosting tablet leaders and rejects new scan requests. It is fully quiesced only once it hosts no tablet leaders and has no active scans.
During this time, queries against a Kudu table can fail if the table has a leader replica on the quiescing node.
Error while executing a query in Impala: [HY000] : Runtime Error: Query 1342b2933423417c19:4nrs33e00000000: 99% Complete (371 out of 447)\nUnable to open scanner for node with id '87' for Kudu table 'test::table': Timed out: exceeded configured scan timeout of 180.000s: after 91 scan attempts: Scan RPC to <hostname>:<port> timed out after -5.061s (SENT): Service unavailable: Tablet server is quiescing
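To confirm which server is reporting this state, the address can be pulled out of the error text. The helper below is only a minimal sketch: the function name `quiescing_tserver` is made up for illustration, and it assumes the message has the same shape as the error above.

```shell
# Hypothetical helper: print the host:port of the tablet server named in a
# "Tablet server is quiescing" scan error. Prints nothing for other errors.
quiescing_tserver() {
  printf '%s\n' "$1" |
    grep -F 'Tablet server is quiescing' |
    sed -n 's/.*Scan RPC to \([^ ]*\) timed out.*/\1/p'
}
```

In Kudu versions that ship the `kudu tserver quiesce` tool, the result can then be checked with `sudo -u kudu kudu tserver quiesce status "$(quiescing_tserver "$err")"`, where `$err` is assumed to hold the error text.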
Failure:
There is a failure case when a tablet has only one replica and that replica is on the quiescing server. With no other replica to take over, the tablet cannot be moved to an active tablet server, so the node can stay in the “quiescing” state forever.
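Whether a table is exposed to this case can be checked from its replication factor. The sketch below assumes that the output of `kudu table describe` contains a line of the form `REPLICAS <n>`; the function name `replicas_of` is hypothetical.

```shell
# Hypothetical helper: read `kudu table describe` output on stdin and print
# the replication factor. Assumes the output has a "REPLICAS <n>" line.
replicas_of() {
  awk '$1 == "REPLICAS" { print $2 }'
}
```

For example, `sudo -u kudu kudu table describe master-01,master-02,master-03 <table name> | replicas_of` should print 1 for a table whose tablets have a single replica.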
Resolution:
If the host is still active, we can try rebalancing the cluster.
Example:
sudo -u kudu kudu cluster rebalance master-01,master-02,master-03
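If the server is healthy and was put into this state with the `kudu tserver quiesce` tool, another option is to stop quiescing and then re-check cluster health. This is a sketch under that assumption, not a confirmed fix for every case; the `run` and `stop_quiescing` helpers and the `DRY_RUN` variable are made up for illustration.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop quiescing on one tablet server, then run a cluster health check.
# $1: tablet server address (host:port)
# $2: comma-separated master addresses
stop_quiescing() {
  run sudo -u kudu kudu tserver quiesce stop "$1" &&
    run sudo -u kudu kudu cluster ksck "$2"
}
```

Setting `DRY_RUN=1` before calling `stop_quiescing <hostname>:<port> master-01,master-02,master-03` prints the two commands without running them, which is a safe way to review them first.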
In the worst case, if the server is unrecoverable and the tablet has only one replica, the tablet may be lost.
Good luck with your learning.