Active Directory Replication Failures: Causes, Diagnostics, and Remediation - ToddMaxey/Technical-Documentation GitHub Wiki
Active Directory (AD) uses a multi-master replication model to synchronize directory data among all domain controllers (DCs). When replication fails, directory objects (users, groups, passwords, GPOs, etc.) become inconsistent across DCs, leading to login failures, policy errors, and application issues【1】. Administrators often detect these failures when diagnostic tools report errors or when newly created/modified AD objects do not appear on all DCs in the site. For example, the `repadmin /showrepl` command will list failed replication attempts【2】. Symptoms include error events in the Directory Service log (e.g. Event IDs 1311, 1925, 2087, 2042) and user complaints of group membership or login problems.
Replication depends on several prerequisites【1】【2】:
- Network Connectivity and RPC: All DCs must be reachable over the network. AD replication uses RPC over TCP (with an RPC Endpoint Mapper on port 135) and dynamic high-numbered ports. Firewalls or network failures blocking these ports will stop replication【5】.
- DNS and Name Resolution: Each DC must be able to resolve its replication partners’ DNS names. Active Directory relies on DNS SRV and A records for DC location. If a DC’s GUID-based DNS name cannot be resolved, replication fails (e.g. Event ID 2087 is logged when a DC’s GUID DNS name cannot be resolved【6】).
- Time Synchronization: All DCs must have closely synchronized clocks (typically within 5 minutes) so that Kerberos authentication succeeds. Excessive time skew will break secure replication channels【2】.
- Authentication/Permissions: DC computer accounts and trusts must be valid. Replication occurs over secure channels: if there is a broken trust or bad account password, you will see “Access is denied” errors in tools like Repadmin【7】.
- Directory Database Health: The AD DS database must be intact and have sufficient resources (CPU, memory, disk). If the database is corrupted or cannot process changes in time, replication can fail【2】.
- Replication Topology (Sites and Links): The AD Sites and Services configuration must correctly map to the physical network. Intersite replication requires properly configured site links. If DCs lack the necessary site links or if the replication topology is misconfigured, some links never form and replication for those domain partitions will fail【5】.
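The first two prerequisites can be spot-checked from any machine with Python. The following is a minimal sketch rather than an official tool: the helper names `tcp_port_open`, `resolves`, and `check_partner` are illustrative assumptions, and it probes only the RPC Endpoint Mapper port (TCP 135), not the dynamic RPC range.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def resolves(name: str) -> bool:
    """Return True if the name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(name, None)) > 0
    except socket.gaierror:
        return False


def check_partner(name: str) -> dict:
    """Collect basic reachability facts for one replication partner."""
    dns_ok = resolves(name)
    return {
        "name": name,
        "dns_ok": dns_ok,
        # Only try the RPC Endpoint Mapper port if the name resolved.
        "rpc_135_ok": dns_ok and tcp_port_open(name, 135),
    }
```

Calling `check_partner` with a partner's FQDN (for instance a hypothetical `dc01.contoso.com`) returns a small dictionary that shows whether the failure is at the DNS layer or the network layer. A successful connection to port 135 does not prove that the dynamic RPC ports are open.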
When replication breaks, the culprit is usually one of a few common issues【1】【5】:
- Network and Firewall Issues: A broken or misconfigured network is the most frequent cause. If LAN/WAN connectivity is down or if firewalls block RPC/135, the DCs cannot communicate【5】. Always verify IP connectivity (ping) and ensure TCP 135 (RPC Endpoint Mapper) and the dynamic RPC range are allowed.
- DNS Misconfiguration: DNS problems are a very common cause. For example, a missing or stale DNS SRV record (such as `_ldap._tcp.dc._msdcs.contoso.com`) for a domain controller means other DCs cannot find it【5】【6】. If a DC’s DNS settings point to the wrong server or if DNS zones are not updated, replication partners will not be located (Event ID 2087 is logged on the destination DC when it cannot resolve its partner’s name【6】). Ensure each DC’s A (host) and service (SRV) records are registered and correct【7】【6】.
- Authentication and Secure Channel Issues: Failed or broken secure channels cause “Access is denied” errors【7】. Common scenarios include computer account password mismatches, invalid trusts, or overly restrictive security policies. For example, error code 5 (“Access is denied”) in Repadmin output indicates an authentication problem【7】.
- Time Synchronization Errors: Kerberos requires that the clocks on all DCs agree with the domain time source (ultimately the PDC emulator) to within the configured maximum skew (default 5 minutes). A larger skew causes authentication, and therefore replication, to fail (often surfaced as status 0xC0000133, STATUS_TIME_DIFFERENCE_AT_DC)【2】.
- Replication Topology or Scheduling Issues: Intersite replication runs on a schedule. If the schedule is too infrequent or the volume of changes is large, the inbound replication queue can back up. In extreme cases, a DC can go without receiving changes for longer than the tombstone lifetime, so objects deleted on one DC are garbage-collected before the other DC ever learns of the deletion【5】. Also, if a site link is missing or disabled, DCs in different sites may show “No inbound neighbors” in Repadmin, indicating a topology problem【7】.
- Lingering Objects and Tombstone Lifetime: If a DC has been offline for longer than the AD tombstone lifetime (default 180 days), objects deleted elsewhere will have been garbage-collected, making those objects “lingering” on the offline DC. When the DC comes back online, replication of those objects fails and Event ID 2042 is logged【7】.
- Schema or Configuration Mismatches: If two DCs have inconsistent schema versions or domain functional levels, replication of schema partitions will fail (error 8418 in Repadmin often indicates a schema mismatch)【3】.
- DC Offline or Decommissioned: Sometimes a DC was intentionally taken offline (e.g. a staged DC in a branch site). Until its metadata is cleaned up, other DCs may report replication errors referring to that DC【5】.
- Hardware or System Problems: Disk full, memory exhaustion, or faulty hardware on a DC can interrupt the replication engine. Always check the Directory Service event log for database-related errors.
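The time-skew condition in the list above is simple arithmetic, sketched below in Python. This is an illustration only: the function names and the sample timestamps are hypothetical, and the 5-minute limit is the default stated above, which a domain's Kerberos policy may override.

```python
from datetime import datetime, timedelta, timezone

# Default Kerberos maximum clock skew, as stated in the text above.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(reference: datetime, observed: datetime) -> timedelta:
    """Return the absolute difference between two timezone-aware timestamps."""
    return abs(observed - reference)


def skew_exceeds_limit(reference: datetime, observed: datetime,
                       limit: timedelta = MAX_SKEW) -> bool:
    """Return True if the observed clock is further from the reference than limit."""
    return clock_skew(reference, observed) > limit


# Hypothetical readings taken at the same instant from two DCs.
pdc_time = datetime(2025, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
dc02_time = datetime(2025, 1, 10, 12, 6, 30, tzinfo=timezone.utc)

print(clock_skew(pdc_time, dc02_time))          # 0:06:30
print(skew_exceeds_limit(pdc_time, dc02_time))  # True
```

Both timestamps must be timezone-aware (or both naive) so the subtraction is valid; comparing in UTC avoids false alarms from DCs in different time zones.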
To diagnose replication problems, use a systematic approach and Microsoft tools【1】【5】:
- Check Event Logs: Review the Directory Service event log on each DC for replication-related errors (IDs 1311, 1566, 1793, 1844, 1925, 2087, 2088, 2042, etc.) and any hints in the DNS Server log. For example, Event 2042 indicates tombstone issues, and 2087 indicates a DNS lookup failure【7】【6】. The Directory Service log often suggests specific fixes in its description.
- Repadmin Tool: Run `repadmin /showrepl` to list replication status for each naming context on a DC【2】. This shows inbound neighbor failures or error codes (5, 49, 8406, etc.). Use `repadmin /replsum` to summarize replication health across the forest. For example, “No inbound neighbors” in `repadmin /showrepl` output means no connection objects exist【7】. Detailed messages like “Access is denied” or “Cannot open LDAP connection” help pinpoint the issue【7】.
- Dcdiag Tool: On each DC, run `dcdiag /test:replications` and `dcdiag /test:DNS` (and `nltest /dsregdns`) to check service availability, secure channel validity, and DNS registration【21】【25】. The `DNS` test verifies that each DC’s A record and required service (SRV) records exist in DNS【25】. If DNS tests fail, fix the DNS entries (for example, run `ipconfig /registerdns` or restart the Netlogon service to reregister records).
- Directory Service Diagnosis: In Active Directory Sites and Services or via PowerShell (`Get-ADReplicationPartnerMetadata`), verify that the replication topology is correct and connections exist between sites. In Sites and Services, ensure DCs are assigned to the correct site and have site links configured.
- Microsoft Support Tools: Consider using the Microsoft Support and Recovery Assistant for Active Directory (a download from Microsoft), which can run tests and provide guided fixes【1】. This GUI tool can rapidly identify common issues.
- PowerShell and Other Tools: Use `Get-ADReplicationFailure`, `Get-ADReplicationPartnerMetadata`, and `Test-ComputerSecureChannel` (PowerShell) to check for replication failures and secure channel health. Network troubleshooting (ping, telnet on port 135, etc.) can verify connectivity.
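When many DCs are involved, Repadmin output is easier to triage programmatically. The sketch below assumes the output was captured with `repadmin /showrepl * /csv` and that its header uses the column names shown in `SAMPLE`. Verify the header against your own output, because the sample rows, server names, and the `failing_links` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample in the assumed shape of `repadmin /showrepl * /csv` output.
SAMPLE = (
    "showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,"
    "Source DSA Site,Source DSA,Transport Type,Number of Failures,"
    "Last Failure Time,Last Success Time,Last Failure Status\n"
    'showrepl_INFO,HQ,DC01,"DC=contoso,DC=com",Branch,DC02,RPC,0,0,'
    "2025-01-10 08:15:02,0\n"
    'showrepl_INFO,Branch,DC02,"DC=contoso,DC=com",HQ,DC01,RPC,12,'
    "2025-01-10 08:20:41,2025-01-09 22:03:17,5\n"
)


def failing_links(csv_text: str) -> list:
    """Return one dict per replication link whose failure count is above zero."""
    reader = csv.DictReader(io.StringIO(csv_text))
    failures = []
    for row in reader:
        count = (row.get("Number of Failures") or "0").strip()
        # Skip rows without a numeric failure count (for example, error rows).
        if count.isdigit() and int(count) > 0:
            failures.append({
                "destination": row["Destination DSA"],
                "source": row["Source DSA"],
                "naming_context": row["Naming Context"],
                "failures": int(count),
                "last_status": row["Last Failure Status"],
            })
    return failures


for link in failing_links(SAMPLE):
    print(link["source"], "->", link["destination"],
          "failures:", link["failures"], "status:", link["last_status"])
```

With the sample data, only the DC01 to DC02 link is reported, with status 5 (“Access is denied”), which points to the authentication checks described above.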
Remediation depends on the root cause found:
- Fix Network Connectivity: If firewalls or routers are blocking RPC, configure them to allow TCP 135 and the RPC dynamic port range (by default 49152–65535 on Windows Server 2008 and later). Ensure DCs can reach each other by IP or name. A simple test is `nc -z <DCname> 135` (or `telnet`, or `Test-NetConnection <DCname> -Port 135` in PowerShell) and `ping`. If a domain controller was on an isolated network or is permanently offline, use metadata cleanup to remove it from AD【21】.
- Correct DNS Issues: Ensure each DC is pointing to a valid DNS server (preferably another DC) and that the DC’s own A and SRV records exist and are correct【25】【21】. For example, if Event 2087 occurred, confirm that the problematic GUID CNAME record and host A record exist on the DNS servers used by the DC【19】. Use `dcdiag /fix` and `nltest /dsregdns`, or manually add SRV records if necessary【21】. If a DC was reinstalled, ensure the old DNS records are removed. Once DNS is fixed, force replication (`repadmin /syncall`) or reboot the DCs so they update their records.
- Reset Secure Channels and Credentials: If Repadmin shows “Access is denied” (error 5) due to a password mismatch or broken trust, reset the machine account password. For example, run `netdom resetpwd /server:<remoteDC> /userd:DOMAIN\Administrator /passwordd:*` on the affected DC to reset its machine account password against another DC【26】. Alternatively, use `nltest /sc_verify:<domain>` and `nltest /sc_reset:<domain>`, or the PowerShell cmdlet `Reset-ComputerMachinePassword`. Check that the Enterprise Domain Controllers group has the “Access this computer from the network” right on each DC.
- Synchronize Time: Ensure the PDC emulator is the authoritative time source and that other DCs synchronize to it (check with `w32tm /query /status`). If time skew is the problem, correcting the time (and ensuring the Windows Time service is running) will restore replication【13】.
- Correct Topology and Schedules: If the replication topology is incorrect, create the missing site links or connection objects, or adjust schedules, in AD Sites and Services. For domain controllers in the same site, ensure they have connection objects. If the replication queue is overwhelmed, consider splitting sites or adjusting schedules. If a DC has gone without replicating for longer than the tombstone lifetime (Event 2042), remove any lingering objects before re-enabling replication; if the DC cannot be brought back to a consistent state, rebuild it (for example, demote it and promote it again)【7】.
- Clean up Lingering Objects: If tombstone problems are suspected, run `repadmin /removelingeringobjects` or perform an authoritative restore of deleted objects (if appropriate).
- Demote/Reinstall DC (if necessary): As a last resort, remove Active Directory from the troubled server and reinstall it. Microsoft recommends this if an issue cannot be resolved by normal means【5】. You can use normal demotion (dcpromo) or, if that fails, perform a forced removal in Directory Services Restore Mode and then clean up metadata【5】. After the rebuild, force replication and verify that the directory data is consistent across DCs.
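The remediation steps above amount to a lookup from symptom to first action. The sketch below captures that mapping as a small triage table; it is a convenience built only from the symptoms and commands named in this article, and the `PLAYBOOK` entries and `suggest` helper are illustrative assumptions, not a Microsoft-defined mapping.

```python
from typing import Optional, Tuple

# (text to look for, likely cause, suggested first step), all taken from this article.
PLAYBOOK = [
    ("access is denied", "Secure channel or credentials",
     "nltest /sc_verify:<domain>, then netdom resetpwd if the channel is broken"),
    ("no inbound neighbors", "Replication topology",
     "Check site links and connection objects in AD Sites and Services"),
    ("event 2087", "DNS lookup failure",
     "dcdiag /test:DNS, then nltest /dsregdns"),
    ("event 2042", "Tombstone lifetime exceeded",
     "repadmin /removelingeringobjects"),
]


def suggest(symptom: str) -> Optional[Tuple[str, str]]:
    """Return (likely cause, first step) for a symptom string, or None if unknown."""
    text = symptom.lower()
    for needle, cause, step in PLAYBOOK:
        if needle in text:
            return cause, step
    return None


print(suggest("Last error: 5 (0x5): Access is denied."))
print(suggest("Event 2042 logged on DC03"))
```

An unknown symptom returns `None` instead of a guess, which keeps the table honest: anything not listed still needs the full diagnostic pass through the event logs, Repadmin, and Dcdiag.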
To reduce the likelihood of replication failures, follow these practices:
- Monitor Replication Health: Regularly run `repadmin /replsum` or an equivalent check, and review the Directory Service event logs daily. Promptly address any warnings or errors.
- Maintain DNS Health: Ensure DNS servers are properly configured and that DCs register their records. Consider running `dcdiag /test:DNS` during maintenance.
- Keep Clocks in Sync: Configure a reliable NTP hierarchy (the forest PDC emulator should use a trusted external time source) to avoid Kerberos-related replication failures.
- Document and Verify Topology: Keep the AD Sites and Services topology in sync with the physical network. Whenever adding a new site or DC, verify that site links and subnets are configured so that replication paths are possible.
- Plan for Tombstone Lifetime: Avoid scenarios where a DC is offline longer than the tombstone lifetime. If a DC will be unavailable for an extended period (e.g. shipped to another location), plan to disable it in AD or demote it first.
- Use Support Tools: Use Microsoft’s Support and Recovery Assistant and official documentation (such as the references below) to guide troubleshooting.
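Planning for the tombstone lifetime can be automated by comparing each DC's last successful replication time against the lifetime. The sketch below assumes the 180-day default mentioned earlier; the 50% warning threshold, the function name, and the sample dates are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone

# Default tombstone lifetime stated earlier; confirm the value for your forest.
TOMBSTONE_LIFETIME = timedelta(days=180)


def tombstone_risk(last_success: datetime, now: datetime,
                   lifetime: timedelta = TOMBSTONE_LIFETIME,
                   warn_fraction: float = 0.5) -> str:
    """Classify a DC by how long it has gone without successful replication."""
    age = now - last_success
    if age >= lifetime:
        return "expired"   # lingering objects are likely; do not simply reconnect
    if age >= lifetime * warn_fraction:
        return "warning"   # restore replication soon or plan a demotion
    return "ok"


now = datetime(2025, 7, 1, tzinfo=timezone.utc)
print(tombstone_risk(datetime(2025, 6, 1, tzinfo=timezone.utc), now))   # ok
print(tombstone_risk(datetime(2025, 3, 1, tzinfo=timezone.utc), now))   # warning
print(tombstone_risk(datetime(2024, 12, 1, tzinfo=timezone.utc), now))  # expired
```

The last-success timestamps could come from Repadmin or `Get-ADReplicationPartnerMetadata` output. Running a check like this on a schedule gives early warning well before a DC crosses the lifetime.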
By understanding the dependencies and common failure modes, and by using Microsoft’s diagnostic tools and guidance, most Active Directory replication issues can be identified and resolved before they cause major disruptions【1】【5】. Consistent monitoring and prompt remediation ensure that the AD forest remains healthy and synchronized.
[1] Microsoft, “Troubleshooting Active Directory Replication Problems,” Microsoft Learn (2024). Available at: https://learn.microsoft.com/windows-server/identity/ad-ds/manage/troubleshoot/troubleshooting-active-directory-replication-problems
[2] Microsoft, “Diagnose Active Directory replication failures,” Microsoft Learn (2025). Available at: https://learn.microsoft.com/windows-server/active-directory/active-directory-diagnose-replication-failures
[3] Microsoft, “Active Directory replication troubleshooting guidance,” Microsoft Learn (2025). Available at: https://learn.microsoft.com/troubleshoot/windows-server/active-directory/troubleshoot-adreplication-guidance
[4] Microsoft, “How to troubleshoot Active Directory replication error 5 in Windows Server: Access is denied,” Microsoft Learn (2025). Available at: https://learn.microsoft.com/troubleshoot/windows-server/active-directory/replications-fail-with-error-5
[5] Microsoft, “Troubleshoot common Active Directory replication errors,” Microsoft Learn (2025). Available at: https://learn.microsoft.com/troubleshoot/windows-server/active-directory/common-active-directory-replication-errors
[6] Microsoft, “Active Directory replication Event ID 2087: DNS lookup failure caused replication to fail,” Microsoft Learn (2025). Available at: https://learn.microsoft.com/troubleshoot/windows-server/active-directory/active-directory-replication-event-id-2087
[7] Microsoft, “Verify DNS functionality to support directory replication,” Microsoft Learn (2021). Available at: https://learn.microsoft.com/windows-server/identity/ad-ds/manage/troubleshoot/verify-dns-functionality-to-support-directory-replication