Resolve “STUCK Region-In-Transition” in Apache HBase

What is a “Region-In-Transition” state?

It is a state in which a region is marked as transitioning and does not serve any requests. During this time, we cannot access the affected regions.

One of the reasons HBase regions go into the Region-In-Transition state is “Region Splitting”.

So, what is Region Splitting?

HBase table data is stored in regions. A single table can be spread across multiple regions, each holding a contiguous, sorted range of rows. Region servers manage the data stored in these regions.

Whenever a region’s files grow beyond the size limit set by “hbase.hregion.max.filesize”, the region server splits the region into two daughter regions so that the data can be handled more efficiently and in parallel.
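To get a rough idea of which regions are close to the split threshold, the on-disk size of each region directory can be compared against “hbase.hregion.max.filesize”. This is only a sketch under stated assumptions: it reads the output of “hdfs dfs -du” for a table directory (size in the first column, path in the last), regions_over_limit is a hypothetical helper name, and the example limit of 10737418240 bytes (10 GB) is the default in recent HBase releases. The real split decision is made by the region server’s configured split policy, which may look at individual store sizes rather than the whole directory, so treat the output as a hint.

```shell
#!/usr/bin/env bash
# Sketch: flag region directories whose on-disk size exceeds a size limit.
# Assumption: input is "hdfs dfs -du <table_dir>" output (without -h), with
# the size in bytes in the first column and the full path in the last column.
# regions_over_limit is a hypothetical helper, not part of HBase.
regions_over_limit() {
  local limit="$1"
  # For every line whose size is strictly greater than the limit, print the
  # last path component, which is the encoded region name.
  awk -v limit="$limit" '($1 + 0) > (limit + 0) { n = split($NF, parts, "/"); print parts[n] }'
}

# Example (the table path is an assumption):
# hdfs dfs -du /hbase/data/default/mytable | regions_over_limit 10737418240
```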

Let’s not go into detail about splitting here and instead focus on how to resolve this issue.

If something goes wrong during “Region Splitting”, the region can stay stuck in the “Region-In-Transition” state indefinitely. In this state, the region’s data cannot be accessed, which can cause critical jobs to fail.

Log snippet (HMaster log) showing a stuck RIT:

2022-08-22 14:22:02,687 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING, location=<hostname>,16020,1626068908198, table=nm:tablename, region=g15345345347883c27c179623rghkae3dfea
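When many regions are affected, it can help to list them straight from the HMaster log instead of reading the web UI one entry at a time. A minimal sketch, assuming the log line format shown above (rit=<STATE>, ..., region=<encoded name>); list_stuck_rits is a hypothetical helper, and the log path you pass is specific to your installation:

```shell
#!/usr/bin/env bash
# Sketch: list the encoded region names (and their states) reported as
# "STUCK Region-In-Transition" in an HMaster log.
# Assumes the AssignmentManager log format shown above; list_stuck_rits is a
# hypothetical helper, not an HBase command.
list_stuck_rits() {
  local log_file="$1"
  # 1. keep only the stuck-RIT warnings
  # 2. capture the state after "rit=" and the encoded name after "region="
  # 3. de-duplicate, since the same region can be reported many times
  grep 'STUCK Region-In-Transition' "$log_file" \
    | sed -n 's/.*rit=\([A-Z_]*\),.*region=\([[:alnum:]]*\).*/\2 \1/p' \
    | LC_ALL=C sort -u
}

# Example (the log path is an assumption; use your HMaster log):
# list_stuck_rits /var/log/hbase/hbase-master.log
```

Each output line is “<encoded region name> <state>”, which is the region id needed in the steps below.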

To resolve the issue, we need to assign the region manually:

- Check the HBase Master web UI for RITs to get the encoded name of each region that is stuck in transition.

- Check the Procedures & Locks page of the HBase Master web UI for any stuck procedures, or look for a log entry like the one below in the HMaster log:

2022-08-22 14:23:24,216 WARN org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase: Can not add remote operation pid=4535, ppid=2278, state=RUNNABLE, locked=true; org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure for region {ENCODED => g15345345347883c27c179623rghkae3dfea, NAME => 'nm:tablename,,1626068908198.g15345345347883c27c179623rghkae3dfea.', STARTKEY => '', ENDKEY => ''} to server <hostname>,16020,1626068908198, this usually because the server is alread dead, give up and mark the procedure as complete, the parent procedure will take care of this.

If you find any stuck procedures, bypass them with the HBCK2 jar as shown below (-o overrides the procedure even if it is still running or stuck, and -r also bypasses its child procedures):

hbase hbck -j <path_to_hbck2_jar> -s bypass -o -r <PID>
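If the master log reports several stuck procedures, their PIDs can be collected from warnings like the “Can not add remote operation pid=...” entry shown above. A minimal sketch, assuming that log format; list_stuck_pids is a hypothetical helper, and the PIDs it prints should still be reviewed on the Procedures & Locks page before bypassing anything:

```shell
#!/usr/bin/env bash
# Sketch: collect the pid values from "Can not add remote operation pid=..."
# warnings in an HMaster log, one PID per line, sorted and de-duplicated.
# list_stuck_pids is a hypothetical helper; it assumes the log format shown above.
list_stuck_pids() {
  local log_file="$1"
  # Anchor on the message text so the ppid= (parent) value is not picked up.
  grep 'Can not add remote operation pid=' "$log_file" \
    | sed -n 's/.*Can not add remote operation pid=\([0-9]*\),.*/\1/p' \
    | sort -un
}

# Example (the log path is an assumption). HBCK2's bypass command accepts
# more than one PID:
# hbase hbck -j <path_to_hbck2_jar> -s bypass -o -r $(list_stuck_pids /var/log/hbase/hbase-master.log)
```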

Once the stuck procedures are bypassed, we can manually assign the regions using either the HBCK2 tool or the HBase shell. If you do not see any stuck procedures, proceed directly to assigning the regions manually:

hbase hbck -j <path_to_hbck2_jar> -s setRegionState g15345345347883c27c179623rghkae3dfea CLOSED

hbase hbck -j <path_to_hbck2_jar> assigns -o g15345345347883c27c179623rghkae3dfea
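The two HBCK2 steps above can be wrapped for several regions at once. This is a minimal sketch, not an official tool: fix_regions is a hypothetical helper, and setting DRY_RUN=1 prints the commands instead of running them so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: run the two HBCK2 steps above for several regions.
# fix_regions is a hypothetical helper: it marks every region CLOSED with
# setRegionState, then assigns all of them in a single "assigns -o" call.
# Set DRY_RUN=1 to print the commands instead of running them.
fix_regions() {
  if (( $# < 2 )); then
    echo "usage: fix_regions <hbck2_jar> <encoded_region_name>..." >&2
    return 1
  fi
  local jar="$1"; shift
  local runner=""
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    runner="echo"   # only print each command
  fi
  local region
  for region in "$@"; do
    $runner hbase hbck -j "$jar" -s setRegionState "$region" CLOSED
  done
  # HBCK2 "assigns" accepts several encoded region names at once.
  $runner hbase hbck -j "$jar" assigns -o "$@"
}

# Example (the jar path is an assumption):
# DRY_RUN=1 fix_regions /path/to/hbase-hbck2.jar g15345345347883c27c179623rghkae3dfea
```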

Or

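hbase shell -> assign 'g15345345347883c27c179623rghkae3dfea'

(In the shell the command is assign, singular, and it only works when the region is in the CLOSED or OFFLINE state.)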
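When several regions need to go through the HBase shell rather than HBCK2, the assign calls can be generated from a file of encoded region names, one per line (for example, the regions.txt file produced by the quick trick later in this post). A minimal sketch; make_assign_script is a hypothetical helper, and piping into “hbase shell -n” assumes your HBase version supports the non-interactive flag:

```shell
#!/usr/bin/env bash
# Sketch: turn a file of encoded region names (one per line) into HBase shell
# "assign" commands. make_assign_script is a hypothetical helper.
make_assign_script() {
  local regions_file="$1" region
  # "|| [[ -n $region ]]" also handles a last line without a trailing newline.
  while IFS= read -r region || [[ -n "$region" ]]; do
    [[ -z "$region" ]] && continue   # skip blank lines
    printf "assign '%s'\n" "$region"
  done < "$regions_file"
}

# Example: feed the generated commands to a non-interactive HBase shell.
# make_assign_script regions.txt | sudo -u hbase hbase shell -n
```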

A quick trick to assign multiple regions with a single command:

1. Save the “hbck -details” output to a file:

sudo -u hbase hbase hbck -details > hbckdetails.txt

2. Use the command below to extract the encoded names of the regions that are stuck in the Region-In-Transition state (reported as “not deployed on any region server”):

grep 'not deployed on any region server' hbckdetails.txt | awk -F 'hdfs =>' '{print $2}' | awk -F'/' '{print $(NF)}' | awk '{print $1}' | sed 's/,$//' | sort -u > regions.txt

3. Assign all of these regions with a single command:

sudo -u hbase hbase hbck -j /tmp/hbase-hbck2-1.0.0.1.0.0.0-1007.jar assigns -o -i regions.txt

This is especially helpful when many regions are stuck in transition at the same time.
