Failover is a disaster recovery mechanism for web servers or data centers. It automatically re-routes network traffic to healthy servers during outages, which keeps services available and minimizes downtime.
Reasons for downtime
A number of scenarios can cause a website or application server to experience downtime or outages. Scheduled server maintenance is the least threatening of them. The real culprits include hacker or DDoS attacks, unexpected spikes in traffic, and natural disasters.
What is failover?
Failover architectures work by running continuous health checks on the web servers that are part of the configuration. Whenever a server fails a health check, network traffic is re-routed to a backup server.
There are two types of failover, distinguished by the level they operate on. DNS failover operates at the DNS level, whereas IP or cloud failover works at the network layer.
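The check-then-re-route loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tcp_health_check` and `choose_target` are hypothetical, a successful TCP connection is assumed to be a sufficient health signal, and the addresses come from the reserved documentation range.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_health_check(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Report a server as healthy if a TCP connection opens within the timeout.

    Assumption: a reachable port is "healthy enough". Real checks often
    also inspect an HTTP status code or an application-level response.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_target(
    primary: str,
    backups: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the primary if it is healthy, else the first healthy backup.

    Returns None when every server fails its health check.
    """
    for server in (primary, *backups):
        if is_healthy(server):
            return server
    return None
```

In practice a monitor would call `choose_target("192.0.2.10", ["192.0.2.20"], tcp_health_check)` on a short interval and update routing whenever the returned server changes. Passing the check in as a function keeps the selection logic independent of how health is actually measured.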
What is DNS failover?
DNS failover comes into play whenever the client initiates a DNS query. DNS servers resolve the human-readable website name into a machine-readable IP address. The client then uses this IP address to access a web server.
DNS failover allows DNS queries to be re-routed among a set of servers in the event of a server failure, so clients are still directed to a server that can answer them. This is what ultimately leads to high availability and uptime.
How does DNS failover work?
Historically, DNS failover has used a routing methodology called unicast routing. In unicast routing, each server advertises a different IP address. Traffic is routed to those servers based on a number of factors, including geographical proximity. Traditional DNS failover architectures run continuous checks across a number of different servers providing the same service.
In sequential mode, users assign a priority to each server that handles DNS queries. Queries are forwarded only to the highest-priority server that passes its health checks. In round-robin mode, servers are assigned equal priority and DNS queries are distributed across them. In both modes, any server that fails its health checks is simply taken out of circulation.
Responding to DNS queries based on geographical proximity, known as GeoDNS, has one critical shortcoming. Clients whose IP addresses map to a specific geographical location are always served from the data center closest to that location, even if the client has since moved and that data center is no longer the optimal destination for its traffic.
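The two modes can be contrasted with a small Python sketch. The `Server` type, the function names, and the convention that a lower number means a higher priority are all assumptions made for illustration; they do not describe any particular DNS provider's API.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Server:
    ip: str
    priority: int  # assumption: lower number means higher priority


def sequential_answer(
    servers: Sequence[Server], healthy: Dict[str, bool]
) -> Optional[str]:
    """Sequential mode: answer with the highest-priority healthy server."""
    candidates = [s for s in servers if healthy.get(s.ip, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.priority).ip


def round_robin_answers(
    servers: Sequence[Server], healthy: Dict[str, bool], queries: int
) -> List[str]:
    """Round-robin mode: spread queries evenly over all healthy servers."""
    pool = [s.ip for s in servers if healthy.get(s.ip, False)]
    if not pool:
        return []
    rotation = cycle(pool)
    return [next(rotation) for _ in range(queries)]
```

With three servers at `192.0.2.10`, `192.0.2.20`, and `192.0.2.30` (priorities 1, 2, and 3) and the first one failing its health check, `sequential_answer` returns `192.0.2.20`, while `round_robin_answers` alternates between the two remaining servers.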
Using anycast for DNS failover
Anycast is a routing methodology that allows multiple servers to advertise the same IP address. Datapath.io has built an anycast DNS service that allows multiple geographically distributed DNS name servers to advertise the same IP address. AWS customers can also replicate DNS servers by code. The anycast DNS failover fabric runs continuous health checks on the servers and re-routes DNS queries automatically in case of outages or downtime.
Anycast DNS also plays an important role in failover at the IP or network level. DNS name servers respond to DNS queries with the same IP address. Additionally, multiple instances of a web server providing the same service advertise the same IP address. Once the client receives the IP address for the web server, BGP routes those requests to the topologically nearest web server.
The DNS failover architecture from Datapath.io keeps failover times under 10 seconds. Most other failover solutions can take anywhere from 2 to 30 minutes to recover.
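As a rough illustration of "topologically nearest", the following Python sketch picks among sites that all advertise the same address. It is a deliberate simplification: real BGP best-path selection weighs several attributes, and here AS-path length is assumed to be the only one. The site names, path lengths, and the `nearest_site` function are hypothetical.

```python
from typing import Dict, FrozenSet, Optional


def nearest_site(
    as_path_lengths: Dict[str, int],
    withdrawn: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Pick the site with the shortest AS path that still advertises the IP.

    as_path_lengths maps a site name to the AS-path length seen from the
    client's network. Sites in `withdrawn` have stopped advertising the
    shared anycast address, for example after failing a health check.
    """
    reachable = {
        site: hops
        for site, hops in as_path_lengths.items()
        if site not in withdrawn
    }
    if not reachable:
        return None
    return min(reachable, key=reachable.get)
```

Given path lengths of 2, 4, and 5 for sites named `fra`, `nyc`, and `sin`, traffic goes to `fra`. If `fra` withdraws its advertisement, the same address is reached at `nyc` without the client changing anything, which is why anycast failover does not depend on DNS records being updated.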