How does Fault Tolerance prevent a split brain scenario?
Administration, Disaster Recovery
I’m training all of my partner engineers this week and they always ask the toughest technical questions. Thanks to Scott Phillips for asking me this one:
What does Fault Tolerance do to prevent a split brain if both Primary and Secondary VMs become isolated?
Fault Tolerance (FT) uses an on-disk generation number file. When FT is enabled, the primary VM creates a file on shared storage called generation.N, where N is a counter. The secondary VM is then started, and when it connects to the primary, the primary tells it the current generation number. When either the primary or the secondary detects a failure in the other half of the VM pair, it tries to rename generation.N to generation.N+1. If the rename succeeds, that VM takes over as the primary (or remains the primary if it already was) and takes corrective action to rebuild a secondary and become protected again. If the rename fails, the other VM in the pair has already renamed the file and taken over, so the current VM shuts down.
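To make the rename race concrete, here is a minimal sketch in Python. It is not VMware's implementation. It only simulates the idea on a local temporary directory standing in for shared storage, and it assumes the filesystem's rename is atomic. The function names try_take_over and simulate_isolation are hypothetical, made up for this illustration.

```python
import os
import tempfile


def try_take_over(shared_dir: str, generation: int) -> bool:
    """Try to rename generation.N to generation.N+1.

    Returns True if this VM won the race and should act as primary,
    False if the rename failed (the other VM got there first).
    Hypothetical helper, not a VMware API.
    """
    current = os.path.join(shared_dir, f"generation.{generation}")
    successor = os.path.join(shared_dir, f"generation.{generation + 1}")
    try:
        os.rename(current, successor)
    except OSError:
        # generation.N is gone: the other half of the pair already renamed it.
        return False
    return True


def simulate_isolation() -> list:
    """Both VMs believe the other has failed and both attempt the rename."""
    with tempfile.TemporaryDirectory() as shared_dir:
        # The primary creates generation.0 when FT is enabled.
        open(os.path.join(shared_dir, "generation.0"), "w").close()

        first_attempt = try_take_over(shared_dir, 0)   # one VM reaches the disk first
        second_attempt = try_take_over(shared_dir, 0)  # the other VM arrives second

        return [first_attempt, second_attempt, sorted(os.listdir(shared_dir))]


if __name__ == "__main__":
    print(simulate_isolation())
```

Whichever VM reaches the disk first wins, regardless of whether it was the primary or the secondary. The second attempt fails because generation.0 no longer exists, which is the signal for that VM to shut down.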
There you have it: because only one of the two renames can succeed, the disk subsystem prevents both VMs from becoming the primary at the same time and creating a split brain.
June 14th, 2009 at 10:32 pm
I really like your post. Is it copyright protected?