How to Recover Standby Namenode (Bootstrap Standby Namenode)


– In some scenarios the standby NameNode cannot be brought back up because of a disk crash, OS failure, or data corruption.

– In this situation the active NameNode's metadata needs to be synced to the standby NameNode quickly, because the cluster has a single point of failure while only one NameNode is running.

Start on the active NameNode (the one that is up and running):

1) Put the NameNode into safe mode (on the active NameNode)

hdfs dfsadmin -safemode enter

2) Save the namespace (on the active NameNode)

hdfs dfsadmin -saveNamespace

3) On the standby NameNode: go to the “NameNode directories” (the dfs.namenode.name.dir setting) and move the “current” directory to /tmp or another backup location. (Make sure the backup is complete before continuing.)

4) Bootstrap the standby NameNode by running the command below (on the standby NameNode)

hdfs namenode -bootstrapStandby

5) Leave safe mode (on the active NameNode)

hdfs dfsadmin -safemode leave

6) Start the standby NameNode
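The standby-side part of this procedure (steps 3 and 4) can be sketched as a small shell helper. This is only an illustration: the function names, the timestamped backup name, and the example paths are assumptions, not something the procedure prescribes. It handles one NameNode directory at a time and should be run as the user that owns the NameNode data (commonly hdfs).

```shell
# Sketch only: function names and paths are hypothetical.

# Step 3: move <name_dir>/current to a timestamped directory under
# <backup_root> and print the backup path.
backup_current_dir() {
  name_dir="$1"      # one entry from dfs.namenode.name.dir
  backup_root="$2"   # /tmp or another backup location
  if [ ! -d "$name_dir/current" ]; then
    echo "no current directory under $name_dir" >&2
    return 1
  fi
  dest="$backup_root/nn-current-$(date +%Y%m%d%H%M%S)"
  mv "$name_dir/current" "$dest" || return 1
  echo "$dest"
}

# Steps 3 and 4 together: back up the old metadata, then bootstrap.
recover_standby() {
  backup_current_dir "$1" "$2" || return 1
  # Step 4: copy the latest namespace image from the active NameNode.
  hdfs namenode -bootstrapStandby || return 1
}

# Example call (hypothetical paths):
#   recover_standby /hadoop/hdfs/namenode /tmp
```

The active-side steps (1, 2 and 5) stay as the plain commands shown above. How the standby is started in step 6 depends on the installation: for example `hdfs --daemon start namenode` on Hadoop 3.x, `hadoop-daemon.sh start namenode` on 2.x, or the cluster manager's UI.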

– You may not be allowed to take downtime on the cluster (NOTE: putting the NameNode into safe mode makes the cluster read-only).

– In that case you can copy just the fsimage to the standby NameNode. The downside is that the edits the standby then has to replay may be large, which can cause a timeout.
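One rough way to gauge that risk before choosing this route is to compare the transaction ID of the newest fsimage with the transaction ID of the open edit segment on the active NameNode. The sketch below assumes the usual fsimage_<txid> and edits_inprogress_<txid> file names inside the “current” directory; the helper names and the example path are hypothetical, and the result is only a lower bound on the number of transactions to replay.

```shell
# Sketch only: helper names and paths are hypothetical.

# Print the transaction ID embedded in the newest file with the given prefix.
latest_txid() {
  dir="$1"
  prefix="$2"   # "fsimage_" or "edits_inprogress_"
  name=$(ls "$dir" | grep -E "^${prefix}[0-9]+\$" | sort | tail -n 1)
  [ -n "$name" ] || return 1
  # Strip the prefix and leading zeros so the shell does not read it as octal.
  echo "${name#"$prefix"}" | sed 's/^0*//; s/^$/0/'
}

# Print the distance between the newest fsimage and the open edit segment.
edits_gap() {
  img=$(latest_txid "$1" fsimage_) || return 1
  cur=$(latest_txid "$1" edits_inprogress_) || return 1
  echo $((cur - img))
}

# Example call on the active NameNode (hypothetical path):
#   edits_gap /hadoop/hdfs/namenode/current
```

A large number suggests the standby will spend a long time replaying edits after it starts from the copied fsimage.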

1) Copy the latest versions of these files from the “NameNode directories” on the active NameNode:

VERSION
fsimage_<version>
fsimage_<version>.md5

2) On the standby NameNode: “cd” into the “NameNode directories” and move the contents of the “current” directory to /tmp or another backup location. (Make sure the backup is complete before continuing.)

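3) Place the fsimage_<version>, fsimage_<version>.md5 and VERSION files from the active NameNode into the “current” directory on the standby NameNode. Make sure the permissions and ownership match those on the active NameNode (owner and group “hdfs hdfs”).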

4) Start the standby NameNode
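The file selection in steps 1 and 3 can be sketched as follows. It assumes the active NameNode's files have already been made available in a local staging directory on the standby (for example fetched with scp); the function names and paths are hypothetical. Because the transaction ID in the file name is zero-padded, a plain sort is enough to find the newest fsimage.

```shell
# Sketch only: function names and paths are hypothetical.

# Print the name of the newest fsimage_<txid> file in a directory
# (ignoring the .md5 companions).
latest_fsimage() {
  ls "$1" | grep -E '^fsimage_[0-9]+$' | sort | tail -n 1
}

# Copy VERSION, the newest fsimage and its .md5 into the standby's
# "current" directory, preserving mode and timestamps.
copy_fsimage_files() {
  src="$1"   # staged copy of the active's "current" directory
  dst="$2"   # the standby's emptied "current" directory
  img=$(latest_fsimage "$src")
  if [ -z "$img" ]; then
    echo "no fsimage found in $src" >&2
    return 1
  fi
  for f in VERSION "$img" "$img.md5"; do
    cp -p "$src/$f" "$dst/" || return 1
  done
  echo "$img"
}

# Example call (hypothetical paths), followed by the ownership fix:
#   copy_fsimage_files /tmp/active-current /hadoop/hdfs/namenode/current
#   chown hdfs:hdfs /hadoop/hdfs/namenode/current/*
```

The chown is left as a separate manual step because it normally needs root and depends on the service account used on the cluster.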

