Resolve “STUCK Region-In-Transition” in Apache HBase
What is a “Region-In-Transition” state?
It is a state in which a region is marked as transitioning and does not serve any requests. While a region is in this state, we cannot access its data.
One of the reasons HBase regions go into a Region-In-Transition state is “Region Splitting”.
So, what is region splitting?
HBase table data is stored in regions. The data of one table can be spread across multiple regions, each holding a sorted range of adjacent rows. Region servers manage the data stored in the regions.
Whenever a region’s file size grows beyond the limit set by “hbase.hregion.max.filesize”, the region server splits the region into two so that it can be handled efficiently (and to improve parallelism).
We won’t go into much detail about splitting here and will focus on how to resolve this issue.
Whenever something goes wrong during “Region Splitting”, the region can stay stuck in the “Region-In-Transition” state indefinitely. In this state we cannot access the region’s data, which can cause critical job failures.
Log Snippet: RIT
2022-08-22 14:22:02,687 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING, location=<hostname>,16020,1626068908198, table=nm:tablename, region=g15345345347883c27c179623rghkae3dfea
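When the Master log is large, it can help to pull out only the region and the RIT state from each STUCK line. The following is a minimal sketch and not part of the original procedure: the function name extract_stuck_rits is made up for illustration, and the sed pattern assumes the log lines have the same layout as the snippet above.

```shell
# Read HBase Master log lines on stdin and print "<encoded-region> <rit-state>"
# for every STUCK Region-In-Transition warning, de-duplicated.
# Assumption: each line contains "rit=<STATE>," and ends with "region=<name>".
extract_stuck_rits() {
  grep 'STUCK Region-In-Transition' |
    sed -n 's/.*rit=\([A-Z_]*\),.*region=\([0-9a-z]*\).*/\2 \1/p' |
    sort -u
}
```

For example, extract_stuck_rits < <path_to_master_log> prints one region per line together with its state, which gives the region IDs needed in the steps that follow.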
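To resolve the issue, we need to assign the region manually:
- Check the HBase Master web UI for RITs (to get the ID of each region that is marked as RIT).
- Check the Procedures tab in the HBase Master web UI for any stuck procedures, or look for a log entry like the one below. The logging class belongs to the org.apache.hadoop.hbase.master package, so look for it in the HBase Master log.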
2022-08-22 14:23:24,216 WARN org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase: Can not add remote operation pid=4535, ppid=2278, state=RUNNABLE, locked=true; org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure for region {ENCODED => g15345345347883c27c179623rghkae3dfea, NAME => 'nm:tablename,,1626068908198.g15345345347883c27c179623rghkae3dfea.', STARTKEY => '', ENDKEY => ''} to server <hostname>,16020,1626068908198, this usually because the server is alread dead, give up and mark the procedure as complete, the parent procedure will take care of this.
If you find any stuck procedures, bypass them using the HBCK2 jar as shown below, where <PID> is the ID of the stuck procedure:
hbase hbck -j <path_to_hbck2_jar> -s bypass -o -r <PID>
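If several procedures are stuck, the PIDs can be collected from the log instead of being copied by hand. This is only a sketch under stated assumptions: the function names stuck_pids and print_bypass_cmds are hypothetical, and the pattern assumes the “Can not add remote operation pid=...” layout shown in the log line above. The second function only prints the bypass commands so they can be reviewed before anything is run.

```shell
# Read log lines on stdin and print the pid of every procedure that logged
# "Can not add remote operation", one per line, de-duplicated.
stuck_pids() {
  grep 'Can not add remote operation' |
    sed -n 's/.*Can not add remote operation pid=\([0-9][0-9]*\),.*/\1/p' |
    sort -un
}

# Read PIDs on stdin and print (not execute) one HBCK2 bypass command per PID.
# $1 is the path to the HBCK2 jar.
print_bypass_cmds() {
  while read -r pid; do
    if [ -n "$pid" ]; then
      printf 'hbase hbck -j %s -s bypass -o -r %s\n' "$1" "$pid"
    fi
  done
}
```

A possible use is stuck_pids < <path_to_master_log> | print_bypass_cmds <path_to_hbck2_jar>, followed by running the printed commands once each PID has been checked against the Procedures tab.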
Once the stuck procedures are bypassed, we can manually assign the regions either via the HBCK2 tool or the HBase shell. If you do not see any stuck procedures, proceed directly to assigning the regions manually:
hbase hbck -j <path_to_hbck2_jar> -s setRegionState g15345345347883c27c179623rghkae3dfea CLOSED
hbase hbck -j <path_to_hbck2_jar> assigns -o g15345345347883c27c179623rghkae3dfea
Or
hbase shell -> assign 'g15345345347883c27c179623rghkae3dfea'
A quick trick to assign multiple regions in a single command:
1. Get the “hbck -details” output as below
sudo -u hbase hbase hbck -details > hbckdetails.txt
2. Use the grep command below to get the regions that are stuck in the Region-In-Transition state
grep 'not deployed on any region server' hbckdetails.txt | awk -F 'hdfs =>' '{print $2}' | awk -F'/' '{print $(NF)}' | awk '{print $1}' | sort | uniq | sed 's/,$//' > regions.txt
3. Assign all the regions using the command below
sudo -u hbase hbase hbck -j /tmp/hbase-hbck2-1.0.0.1.0.0.0-1007.jar assigns -o -i regions.txt
This helps when you see many regions in the Region-In-Transition state.
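The extraction pipeline from step 2 can also be wrapped in a small function, which makes each stage easier to follow and to try out on a sample line first. This is a sketch, not a definitive implementation: the function name rit_regions_from_hbck is made up, and it assumes each matching line of the hbck output contains “hdfs => <region directory path>,” before the “not deployed on any region server” message, which is what the original pipeline relies on.

```shell
# Read "hbase hbck -details" output on stdin and print the encoded name of
# every region reported as "not deployed on any region server", one per line.
# Stages, in order:
#   1. keep only the lines that report an undeployed region
#   2. keep the text after "hdfs =>" (the region directory and what follows)
#   3. keep the last path component (it starts with the encoded region name)
#   4. keep the first word, for example "abc123,"
#   5. strip the trailing comma, then sort and de-duplicate
rit_regions_from_hbck() {
  grep 'not deployed on any region server' |
    awk -F 'hdfs =>' '{print $2}' |
    awk -F '/' '{print $NF}' |
    awk '{print $1}' |
    sed 's/,$//' |
    sort -u
}
```

With this in place, step 2 becomes rit_regions_from_hbck < hbckdetails.txt > regions.txt. The comma is stripped before de-duplicating here, so a region name that appears both with and without a trailing comma is listed only once.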