Recovering / Resyncing a distributed DRBD dual primary Split Brain – [servera] has a different data from [serverb]
A client had a pair of servers running DRBD in order to keep a large file system synchronized and highly available. However, at some point the DRBD connection failed, the two servers got out of sync, and it went unnoticed long enough that new files were written on both ‘servera’ and on ‘serverb’.
At this point both servers believe that they are the primary, and they are running in what is called a ‘Split Brain’.
To determine that a split brain has happened you can run several commands. In our scenario we have two servers, servera and serverb:
servera# drbd-overview
0:r0/0 WFConnection Primary/Unknown UpToDate/DUnknown C r----- /data ocfs2 1.8T 1001G 799G 56%
serverb# drbd-overview
0:r0/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ocfs2 1.8T 1.1T 757G 58%
From the output above we can see that serverb knows that it is in StandAlone mode; the server realizes that it can not connect. We can research the logs to find out why it thinks it is in StandAlone mode. To do this we grep the syslog.
serverb# grep split /var/log/syslog
Nov 2 10:15:26 serverb kernel: [41853948.860147] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Nov 2 10:15:26 serverb kernel: [41853948.862910] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Nov 2 10:15:26 serverb kernel: [41853948.862934] block drbd0: Split-Brain detected but unresolved, dropping connection!
Nov 2 10:15:26 serverb kernel: [41853948.862950] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Nov 2 10:15:26 serverb kernel: [41853948.865829] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
This set of log entries lets us know that when serverb attempted to connect to servera, it detected a situation where both file systems had been written to, so it could no longer synchronize. It made these entries and put itself into StandAlone mode.
servera, on the other hand, says that it is waiting for a connection (WFConnection).
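For reference, the reason DRBD simply drops the connection here is that, by default, it will not pick a winner on its own. DRBD does offer automatic split-brain recovery policies in the net section of a resource; the snippet below is only a sketch of what those options look like (it is not our client's configuration, and with dual primaries the automatic options are very limited, which is why we recovered manually):
resource r0 {
  net {
    after-sb-0pri discard-zero-changes;   # neither node was primary when the split brain was detected
    after-sb-1pri discard-secondary;      # one node was primary: discard the changes on the secondary
    after-sb-2pri disconnect;             # both nodes were primary: refuse to auto-resolve, drop the connection
  }
  ....................
}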
The next step is to determine which of the two servers has the ‘master’ set of data. This set of data will sync OVER THE TOP of the other server.
In our client’s case we had to do some investigation in order to determine what differences there were on the two servers.
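One way to do that kind of comparison is an rsync dry run from one server against the other, which lists files that differ without copying anything. This is only a sketch, not necessarily the exact procedure we followed; the /data path and root SSH access between the nodes are assumptions:
serverb# rsync -rtvn --itemize-changes servera:/data/ /data/ | less   # dry run: files on servera that are missing or different on serverb
serverb# rsync -rtvn --itemize-changes /data/ servera:/data/ | less   # and the reverse direction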
After some discovery we realized that in our case serverb had the most up to date information, except for one directory; we simply copied that data from servera to serverb, and then serverb was ready to become our primary. In the terminology of DRBD, servera is our ‘split-brain victim’ and serverb is our ‘split-brain survivor’. We will need to run a set of commands which:
- ensure the victim’s status is ‘StandAlone’ (currently it is ‘WFConnection’)
- umount the drive on the victim (servera) so that the filesystem is no longer accessible
- set the victim to be the ‘secondary’ server; this will allow us to sync from the survivor to the victim, KNOWING the direction the data will go
- start the victim (servera) and let the ‘split brain detector’ know that it is okay to overwrite the data on the victim (servera) with the data on the survivor (serverb)
- start the survivor (serverb) (if the survivor had been in WFConnection mode it would not need to be started, however ours was in StandAlone mode so it will need to be restarted)
At first we were concerned that we would have to resync 1.2 TB of data, however we read here that
The split brain victim is not subjected to a full device synchronization. Instead, it has its local modifications rolled back, and any modifications made on the split brain survivor propagate to the victim.
The client runs a dual primary; however, as we rebuild the synced pair, we need to ensure that the ‘victim’ is rebuilt from the survivor, so we move the victim from a primary to a secondary. It seems that we are unable to mount a drive (using our ocfs2 filesystem) while it is a secondary, so we had to ‘umount’ the drive and were unable to remount it while it remained a secondary. In a future test (in which restoring data redundancy primary/primary is less critical), we will find out whether we are able to keep the primary/primary status while we are rebuilding from a split brain.
While the drbd-overview tool shows all of the ‘resources’, the drbdadm commands require a parameter specifying the ‘resource’ to operate on. If you have more than one drbd resource defined you will need to identify which resource you are working with. You can look in your /etc/drbd.conf file or in your /etc/drbd.d/disk.res (your file may be named differently). The file has the form of
resource r0 {
....................
}
where r0 is your resource name. You can also see this buried in the output of drbd-overview:
servera# drbd-overview
0:r0/0 WFConnection Primary/Unknown UpToDate/DUnknown C r----- /data ocfs2 1.8T 1001G 799G 56%
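If you would rather pull the resource names straight out of the configuration, a quick grep works; the paths below match the defaults mentioned above, but yours may differ:
servera# grep -h '^resource' /etc/drbd.conf /etc/drbd.d/*.res   # prints something like: resource r0 {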
So we ran the following commands on servera to prepare it as the victim
servera# drbd-overview #check the starting status of the victim
0:r0/0 WFConnection Primary/Unknown UpToDate/DUnknown C r----- /data ocfs2 1.8T 1001G 799G 56%
serverb# drbd-overview #check the starting status of the survivor
0:r0/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ocfs2 1.8T 1.1T 760G 58%
From the output above we can see that serverb has 58% usage and 760G free, whereas servera has 56% usage and 799G free.
Based on what I know about the difference between servera and serverb, this helps me to confirm that serverb has more data and is the ‘survivor’
servera# drbdadm disconnect r0 # 1. ensures the victim is standalone
servera# drbd-overview #confirm it is now StandAlone
0:r0/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ocfs2 1.8T 1001G 799G 56%
servera# umount /data # 2. we can not mount the secondary drive with read write
servera# drbdadm secondary r0 # 3. ensures the victim is the secondary
servera# drbd-overview #confirm it is now secondary
0:r0/0 StandAlone Secondary/Unknown UpToDate/DUnknown r-----
servera# drbdadm connect --discard-my-data r0 # 4. connect the victim again, knowing that its data will be overwritten by the primary
servera# drbd-overview #confirm the status and that it is now connected [WFConnection]
0:r0/0 WFConnection Secondary/Unknown UpToDate/DUnknown C r-----
I also checked the logs to confirm the status change
servera# grep drbd /var/log/syslog | tail -4
Nov 4 05:14:03 servera kernel: [278068.555213] drbd r0: conn( StandAlone -> Unconnected )
Nov 4 05:14:03 servera kernel: [278068.555247] drbd r0: Starting receiver thread (from drbd_w_r0 [19105])
Nov 4 05:14:03 servera kernel: [278068.555331] drbd r0: receiver (re)started
Nov 4 05:14:03 servera kernel: [278068.555364] drbd r0: conn( Unconnected -> WFConnection )
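Depending on your DRBD version you can also query the same states directly with drbdadm instead of grepping the logs; these are standard drbdadm sub-commands, though the exact output format may vary:
servera# drbdadm cstate r0   # connection state, should now show WFConnection
servera# drbdadm role r0     # roles, should now show Secondary/Unknown
servera# drbdadm dstate r0   # disk state, e.g. UpToDate/DUnknown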
Next we simply have to run this command on serverb to let it know that it can connect as the survivor (as mentioned above, if the survivor had been in WFConnection mode it would have reconnected automatically, however ours was in StandAlone mode).
serverb# drbd-overview #check one more time that serverb is not yet connected
0:r0/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ocfs2 1.8T 1.1T 760G 58%
serverb# drbdadm connect r0 # 5. start the surviving server to ensure that it reconnects
serverb# drbd-overview #confirm serverb and servera are communicating again
0:r0/0 SyncSource Primary/Secondary UpToDate/Inconsistent C r----- /data ocfs2 1.8T 1.1T 760G 58%
[>....................] sync'ed: 0.1% (477832/478292)M
servera# drbd-overview #check that servera confirms what serverb says about communicating again
0:r0/0 SyncTarget Secondary/Primary Inconsistent/UpToDate C r-----
[>....................] sync'ed: 0.3% (477236/478292)M
Another way to confirm that the resync started happening is to check the logs
servera# grep drbd /var/log/syslog | grep resync
Nov 4 05:18:09 servera kernel: [278314.571951] block drbd0: Began resync as SyncTarget (will sync 489771348 KB [122442837 bits set]).
serverb# grep drbd /var/log/syslog | grep resync
Nov 4 05:18:09 serverb kernel: [42008909.652451] block drbd0: Began resync as SyncSource (will sync 489771348 KB [122442837 bits set]).
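To keep an eye on the resync as it runs, /proc/drbd (on DRBD 8.x) shows the same progress bar along with an estimated finish time; wrapping it in watch gives a simple live view:
servera# watch -n 10 cat /proc/drbd
servera# watch -n 10 drbd-overview   # equivalent view using the overview tool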
Finally, we simply run a command to promote servera to be a primary again, and then both servers will be writable
servera# drbdadm primary r0
servera# drbd-overview
0:r0/0 Connected Primary/Primary UpToDate/UpToDate C r-----
servera# mount /data #remount the data drive we unmounted previously
Now that we have ‘started’ recovering from the split-brain issue, we just have to watch the two servers to confirm that they have fully recovered. Once that is complete we will put in place log watchers and filesystem tests to send out a notification to the system administrator if it should happen again.
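As a starting point for those watchers, something as simple as the following cron-able script would have caught this much earlier. It is only a sketch: the admin address, the use of the system mail command, and the choice of patterns are assumptions to adapt to your environment.
#!/bin/sh
# minimal DRBD split-brain watcher - run from cron on both servers
ADMIN="root@localhost"                       # assumption: change to your admin address
STATUS=$(drbd-overview 2>/dev/null)
# alert if the resource is not connected (StandAlone, or stuck waiting for a connection)
if echo "$STATUS" | grep -Eq 'StandAlone|WFConnection'; then
    echo "$STATUS" | mail -s "DRBD not connected on $(hostname)" "$ADMIN"
fi
# alert if the kernel has logged a split brain
if grep -q 'Split-Brain detected' /var/log/syslog; then
    tail -20 /var/log/syslog | mail -s "DRBD split brain reported on $(hostname)" "$ADMIN"
fi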