One of the topics I receive a lot of questions about is Datacenter Activation Coordination Mode, or DAC Mode for short. Here is an excerpt from Deploying and Managing Exchange Server 2013 High Availability that covers this topic in more detail.
Datacenter Activation Coordination (DAC) Mode is a property of DAGs that is designed to prevent split brain conditions from occurring by enabling a protocol called Datacenter Activation Coordination Protocol (DACP).
In addition, DAC Mode enables the use of three PowerShell cmdlets for site resilience:
- Stop-DatabaseAvailabilityGroup
- Restore-DatabaseAvailabilityGroup
- Start-DatabaseAvailabilityGroup
Without those cmdlets, any datacenter switchover or failover scenario involves using other combinations of Exchange and cluster management tools. These site resilience cmdlets make datacenter switchovers and failovers much easier to manage.
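As a minimal sketch, DAC Mode is turned on by setting the DatacenterActivationMode property of the DAG from the Exchange Management Shell. The DAG name DAG1 below is a placeholder assumed for illustration.

```powershell
# DAG1 is a hypothetical DAG name; substitute your own.
# Enable DAC Mode (the property accepts Off or DagOnly).
Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

# Confirm the current setting.
Get-DatabaseAvailabilityGroup -Identity DAG1 | Format-List Name,DatacenterActivationMode
```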
A split brain condition can occur in a multi-site DAG when one datacenter goes offline entirely. It can also occur in a single-site DAG in some network failure situations. Let’s take a look at an example of a multi-site failure where the benefits of DAC mode become clear.
In this example the Sydney and Melbourne datacenters each host two DAG members, with Sydney also hosting the file share witness server. To keep this example simple a single database exists in the DAG, currently active on a Sydney DAG member.
The Sydney datacenter has a power failure that takes the entire site offline. With two DAG members and the FSW offline in Sydney, and just two DAG members online in Melbourne, quorum can’t be maintained and the database goes offline.
The administrators activate the alternate file share witness in Melbourne to restore quorum, and bring the database online in Melbourne to restore service.
Eventually the datacenter in Sydney has power restored and the Sydney DAG members and file share witness come back online. However, the WAN connection remains offline, preventing the DAG members in each site from communicating with each other.
The two Sydney DAG members and file share witness have enough votes to achieve quorum, so the database is brought online in Sydney.
At this stage the problem should be apparent. Both Sydney and Melbourne have an active copy of the same database because the DAG members in each site were not able to communicate with each other. A split brain condition has occurred.
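The vote counting behind this example can be sketched with a little arithmetic. The function below is purely illustrative (the name Test-HasQuorum is made up for this sketch, and it is not how the cluster service is implemented); it only checks whether a group of voters holds more than half of the votes it knows about.

```powershell
# Illustrative arithmetic only; Test-HasQuorum is a hypothetical helper.
function Test-HasQuorum {
    param(
        [int]$VotesReachable,  # voters this partition can see, including the witness if reachable
        [int]$TotalVotes       # all voters in the cluster: DAG members plus the witness
    )
    # A majority is more than half of all votes.
    $majority = [math]::Floor($TotalVotes / 2) + 1
    return $VotesReachable -ge $majority
}

# 4 DAG members + 1 file share witness = 5 votes, so 3 are needed.
Test-HasQuorum -VotesReachable 2 -TotalVotes 5   # Melbourne alone after the outage: False
Test-HasQuorum -VotesReachable 3 -TotalVotes 5   # Sydney's 2 members + FSW on return: True
```

Melbourne only regained quorum because the administrators intervened, and Sydney then reached a majority on its own. Quorum by itself therefore did nothing to stop both sites from mounting the database.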
DAC Mode and DACP prevent this behavior by requiring a DAG member to check with other DAG members before it is allowed to bring a database online.
DACP is implemented as a single bit (a 0 or 1) that is held in memory on each DAG member. When DAC Mode is enabled, each DAG member starts up with a DACP bit of 0. Until it can communicate with a DAG member that has a DACP bit of 1, or with every other member of the DAG, it will not attempt to activate its database copies, even if it can achieve quorum with some of the DAG members.
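The rule just described can be modelled in a few lines. This is a simplified sketch of the decision, not Exchange's actual code; the function name and parameters are invented for illustration.

```powershell
# Simplified model of the DACP rule; Test-CanActivateDatabases is a hypothetical helper.
function Test-CanActivateDatabases {
    param(
        [int[]]$ReachableMemberDacpBits,  # DACP bits (0 or 1) of the other members this server can contact
        [int]$OtherMemberCount            # number of other members in the DAG
    )
    # Condition 1: a reachable member already has its DACP bit set to 1.
    $sawBitSet = $ReachableMemberDacpBits -contains 1
    # Condition 2: every other member of the DAG is reachable.
    $sawEveryone = $ReachableMemberDacpBits.Count -eq $OtherMemberCount
    return ($sawBitSet -or $sawEveryone)
}

# A restarted member in a 4-member DAG that can reach only one peer, whose bit is also 0.
Test-CanActivateDatabases -ReachableMemberDacpBits @(0) -OtherMemberCount 3        # False

# The same member once it can also reach two peers whose bit is 1.
Test-CanActivateDatabases -ReachableMemberDacpBits @(0, 1, 1) -OtherMemberCount 3  # True
```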
To demonstrate this let’s go back in the example scenario above to the stage where the Sydney datacenter was coming back online again.
When DAC Mode has been configured in advance, the Sydney DAG members start up with a DACP bit of 0 and are unable to communicate with the Melbourne DAG members because the WAN link is still offline.
Therefore they do not bring the database online in Sydney, preventing a split brain condition.
When the WAN connection is restored the Sydney DAG members are able to communicate with the Melbourne DAG members. Their DACP bit is set from 0 to 1 and, because they now realize that the database is already active in Melbourne, their database copies become passive copies.
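With DAC Mode enabled, the switchover to Melbourne and the later return of Sydney map onto the three site resilience cmdlets listed earlier. The outline below is a sketch only: DAG1, the Active Directory site names, and the alternate witness server and directory (MELFS1, C:\DAG1FSW) are placeholders assumed for illustration, and the exact sequence depends on the failure and the Exchange version, so check Microsoft's datacenter switchover guidance before relying on it.

```powershell
# Hypothetical names: DAG1 (DAG), Sydney and Melbourne (Active Directory sites),
# MELFS1 and C:\DAG1FSW (alternate witness server and directory).

# 1. Mark the failed Sydney members as stopped. -ConfigurationOnly updates
#    Active Directory without trying to contact the offline servers.
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Sydney -ConfigurationOnly

# 2. On each Melbourne DAG member, stop the Cluster service.
Stop-Service ClusSvc

# 3. Activate the Melbourne members with the alternate witness.
Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Melbourne -AlternateWitnessServer MELFS1 -AlternateWitnessDirectory C:\DAG1FSW

# 4. Later, once Sydney is online and the WAN link is back, rejoin its members.
Start-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Sydney
```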
For more on DAC Mode and other features of database availability groups, check out Deploying and Managing Exchange Server 2013 High Availability.
Thanks Paul!
Your Exam Ref 70-345 is top notch!
Hi Paul,
We have 12 Exchange 2016 servers distributed evenly across two datacenters, DC1 and DC2. The FSW is in a third site, DC3.
We’d like to understand if DAC mode is required in this scenario.
Could you please also tell us whether the FSW will contain information about the other servers holding the active copies?
Scenario:
Assume there is a power outage in DC2 and the databases are switched over to DC1. DC2 then comes back up, but the network between DC2 and DC3 is restored before the network between DC2 and DC1.
Without DAC mode:
1. Will the databases in DC2 be mounted because those servers can still communicate with DC3 (the FSW), resulting in split brain? Or
2. Will the FSW have the information to tell the DC2 servers that the DC1 servers already have the mounted copies, and stop the DC2 servers from mounting them, avoiding split brain?
Hi Paul,
Can you explain how DAC will behave if only the WAN link fails at the primary site and all servers remain up without restarting?
What happens when the WAN link is restored and we have already manually failed over the DAG?
My understanding is that the primary site will stay active because the DACP bit will already be set to 1 and the site will maintain quorum. Manually failing over to the backup site will change the configuration only at the backup site and will not update the configuration on the primary site (which will stay isolated).
Thank you!
The Real Person!
Author Paul Cunningham acts as a real person and passed all tests against spambots. Anti-Spam by CleanTalk.
The DAG will stay active at the primary site if the primary site DAG members are able to retain quorum.
If the servers are up, but the primary site is isolated and you need to activate the secondary site for your users, you either need to make sure you take down the primary site servers first, or make sure the primary site remains isolated when the WAN connection is re-established so that you don’t suddenly have a split brain condition.
Hi Paul,
I have two sites with the same namespace. In the first site I have 1 CAS and 2 MBX servers, and in the second site I have 1 CAS and 1 MBX server.
What happens when the first site’s connection is lost?
Do we have a split brain?
Sounds like you have a DAG with 3 members. If the network connection between the two sites is lost then the DAG will be active in the site with the 2 DAG members, because that is the majority (quorum).
figured it out:
Set-DatabaseAvailabilityGroup dagname -DatacenterActivationMode Off
Hi Paul
I would like to decommission an Exchange 2010 DAG that has DatacenterActivationMode enabled. It doesn’t allow me to remove the last two servers. How can I disable DatacenterActivationMode?
Thanks,
Harry
Hi Paul,
I just wanted to advise that I had success with the above issue. For one reason or another I tried the same technique as before and it worked.
stop-clusternode
net.exe stop clussvc
net.exe start clussvc /forcequorum
Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-1
Get-MailboxDatabase -Server | Mount-Database
Hi Paul,
Your information is fantastic to read as always.
I need some advice on how to mount my database on Exchange 2013. My setup is a DAG split across 2 sites, Prod and DR, each hosting one member, with an FSW and an alternate FSW. I have a single database. My DR site is currently turned off as it’s moving (so I will not bring the site up). Replication was suspended on the DR Exchange server, which holds the passive copy of the database, before the site was switched off. The Prod server has the PAM role and the witness share in use.
The problem started after I patched the Prod server. I decided to dismount the database and rebooted the server (which may have been a mistake). After the reboot I am unable to mount the database, as it complains: Error : Active Manager Operation Failed. I have tried variations of forcing quorum and other troubleshooting techniques from Google, but unfortunately to no avail.
Looking very much forward to your reply hoping you can shed some light.
Hi Paul,
In your Sydney/Melbourne stretched DAG example, the alternate file share witness is activated manually in Melbourne when the Sydney site has gone offline.
If the AFSW had been specified before the failure of the primary site (i.e. there were equal votes in both sites, FSW in Sydney and AFSW in Melbourne), what impact on failover behaviour would this have, if any?
Given that 3 of the 6 votes are available when the Sydney site has failed, would the Melbourne nodes achieve quorum and activate the databases automatically, without administrator intervention?
Even if the Alt FSW is pre-configured, the FSW in use never fails over automatically. You need to manually switch from Primary to Alternate FSW as part of the manual switchover process, and switch it back to Primary FSW as part of the recovery process later.
If you want automatic site failover, place the FSW in a third location (as long as you can meet the connectivity requirements).
Excellent Explanation Paul..:-)
Very well explained.
Thanks Paul 🙂
Great article as always, thanks Paul. But what happens in the case where Dynamic Quorum is also enabled? Take your example but remove a mail server from each DC. With Dynamic Quorum on, Sydney fails and Melbourne would work fine, as Dynamic Quorum allows this. But what happens when Sydney comes back online and then the WAN reconnects? Do the two features co-exist?
Dynamic Quorum is only able to adjust the quorum requirements if quorum was still maintained in the failure scenario. It works in sequential failures, e.g. you lose server1, then an hour later you lose server2. It *may* be able to prevent loss of quorum all the way to a “last man standing” scenario, but it isn’t guaranteed, because any failure that causes a complete loss of quorum isn’t going to be saved by DQ.
Furthermore, it is a cluster feature, not an Exchange feature, so while DQ might be able to help with quorum it has no ability to prevent split brain.
Gotcha- thanks Paul. Good explanation. Cheers
Tony
This is simply awesome.
Thanks great explanation!
Great explanation,Well explained…..
Great Post as always!!!
another great article, thanks
FYI Microsoft confirmed that turning on DAC mode can produce MSExchangeRepl errors (4133 and 4376) every 15 minutes in the Application Event Log
Annoying, but I would rather have it enabled
I appreciate your effort in writing this article, and I hope there is no difference compared to the earlier version for Exchange 2010. My questions are below:
1. Will this work only for a multi-site active-passive design, or does it work for an active-active design as well? (If it works for an active-active design, then the questions below are where I am confused.)
2. When DAC mode is activated, on which DAG member is the DACP bit set to 1, and on what basis does Exchange decide the suitable DAG member?
3. Though I see it works fine, I still wanted to ask you this question. As per the theory, while DAC mode is enabled a DB should not be activated until its server communicates with other DAG members (in other words, I assume, the DAG member that has a DACP value of 1, located in the primary site). In that case, how can the secondary site DBs be activated when that site cannot communicate with the primary site (in the same way that the DBs in the primary site stay passive)?
Hello Eswar!
2. I understood that the DACP bit is set to 1 on all of the DAG servers when DAC mode is activated. But when one of the servers restarts, it starts with a DACP bit of 0 and can’t set it to 1 until it communicates with all the other servers in the DAG.
3. When one site has failed, you need to perform a manual switchover process.
Well explained as always, thank you Paul
Very well explained.
Simple and to the point … Great!
Well Explained..as always..
Excellent Paul. You are a genius.
Great explanation Paul!