Availability refers to the amount of time a system or component is functional compared to the total time it is required or expected to function.
In the context of web services and network performance, availability is the share of time during which a web service is accessible, usually expressed as a percentage. It is one of the most critical performance metrics for web services and applications. Ideally, a web service would be available 100% of the time.
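As a minimal sketch of the definition above, availability can be computed as uptime divided by the time the service was required to run. The function name and the example figures (a 30-day month with 2 hours of downtime) are hypothetical and chosen only for illustration.

```python
def availability_percent(uptime_seconds: float, required_seconds: float) -> float:
    """Return availability as a percentage of the required operating time."""
    if required_seconds <= 0:
        raise ValueError("required_seconds must be positive")
    if not 0 <= uptime_seconds <= required_seconds:
        raise ValueError("uptime_seconds must be between 0 and required_seconds")
    return 100.0 * uptime_seconds / required_seconds


# Hypothetical example: a service required for a 30-day month that was
# unreachable for 2 hours in total.
required = 30 * 24 * 3600   # seconds in the measurement window
downtime = 2 * 3600         # seconds of downtime
monthly_availability = availability_percent(required - downtime, required)
print(f"{monthly_availability:.3f}%")  # prints 99.722%
```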
Effects of Outages and Downtime
Almost all public cloud providers have suffered outages and downtime at one point or another. In June 2012, Amazon Web Services suffered an 11-hour outage. This outage hit major content providers including Netflix, Pinterest, Heroku and Instagram. It resulted in AWS falling below its SLA of 99.95% for the EC2 service. Microsoft Azure and Google Cloud suffered almost 44 hours of outages in 2014.
The four major cloud providers (AWS, Google Cloud, Microsoft Azure and IBM SoftLayer) suffered a combined downtime of almost 41 hours in 2015.
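To put the 99.95% SLA figure in perspective, the sketch below converts an SLA percentage into the downtime it permits and compares that budget with an 11-hour outage. The measurement window is an assumption, since the article does not say whether the SLA is evaluated per month or per year, so both are shown. The function name is hypothetical.

```python
def allowed_downtime_hours(sla_percent: float, window_hours: float) -> float:
    """Return the downtime an SLA permits within a measurement window."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return window_hours * (100.0 - sla_percent) / 100.0


SLA_PERCENT = 99.95   # SLA figure quoted in the article
OUTAGE_HOURS = 11     # outage length quoted in the article

# Assumed measurement windows; the article does not specify one.
windows = {"30-day month": 30 * 24, "365-day year": 365 * 24}

budgets = {
    label: allowed_downtime_hours(SLA_PERCENT, hours)
    for label, hours in windows.items()
}
for label, budget in budgets.items():
    exceeded = OUTAGE_HOURS > budget
    print(f"{label}: budget {budget:.2f} h, exceeded by outage: {exceeded}")
```

Under either assumed window, a single 11-hour outage uses up far more than the permitted downtime.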
According to recent research, downtime costs data centers an average of $7,900 per minute. These costs, however, are not limited to data centers. Web services also suffer when they are unreachable, even for short periods. Users tend to lose trust in services that are frequently unavailable or that suffer downtime on a regular basis.
Online service providers need to make their services more tolerant of outages at public cloud providers. One way of doing this is to provide a service that can intelligently switch between web servers based on availability and health checks.
How to Ensure High Availability
Here at Datapath.io, we designed an anycast architecture that guarantees highly available web services. The anycast architecture spans multiple AWS regions and conducts continuous health checks of network resources. This enables us to load balance internet traffic as well as fail over in case of an outage.
Because health checks run continuously while the anycast routes are advertised, the system can remove or degrade the route to a failed server. Requests are then forwarded to the next-closest network node. Once the server recovers and a health check confirms the recovery, the route is advertised again and requests are forwarded to it once more.
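The health-check loop described above can be sketched as a small in-memory simulation. This is not Datapath.io's implementation: the node names, the `distance` field and the `health` dictionary are hypothetical, and the sketch models only route removal, not degradation. A real deployment would announce and withdraw routes through BGP rather than toggle a flag.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    distance: int            # hypothetical routing distance from the client
    advertised: bool = True  # whether the anycast route is currently announced


def run_health_checks(nodes: List[Node], is_healthy: Callable[[Node], bool]) -> None:
    """Advertise the route for healthy nodes and withdraw it for failed ones."""
    for node in nodes:
        node.advertised = is_healthy(node)


def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Return the closest node whose route is currently advertised."""
    candidates = [n for n in nodes if n.advertised]
    return min(candidates, key=lambda n: n.distance, default=None)


# Simulated health state; a real check would probe the server itself.
health: Dict[str, bool] = {"node-a": True, "node-b": True}
nodes = [Node("node-a", distance=1), Node("node-b", distance=2)]


def check(node: Node) -> bool:
    return health[node.name]


run_health_checks(nodes, check)
before_outage = pick_node(nodes).name   # closest node serves traffic

health["node-a"] = False                # node-a fails its health check
run_health_checks(nodes, check)
during_outage = pick_node(nodes).name   # traffic moves to the next-closest node

health["node-a"] = True                 # node-a recovers
run_health_checks(nodes, check)
after_recovery = pick_node(nodes).name  # route is advertised again

print(before_outage, during_outage, after_recovery)
```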
Using Failover on the AWS Cloud
To understand how Datapath.io uses anycast to fail over and ensure high availability, consider two VPCs in two AWS regions. Both VPCs use our anycast fabric to advertise the same IP address in an active/passive scenario. This means that only one VPC is active at any given time.
The anycast fabric advertises the IP address to transit providers with one caveat: the route advertised for the inactive VPC is degraded. The degradation is strong enough that any router on the public internet prefers routes originating at the active VPC. Any failure at the active VPC triggers a response from the routing fabric: it withdraws the degradation from the route advertised to BGP routers at the passive VPC. Traffic is then re-routed to the standby VPC, which keeps the service available. When the failed VPC recovers and a health check confirms the recovery, internet traffic is re-routed back to it.
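The active/passive behavior can be illustrated with a toy model. The article does not say how the degradation is implemented, so this sketch assumes it is extra path length on the passive route, similar in spirit to AS-path prepending. It also assumes the failed VPC's route is withdrawn while it is unhealthy. The class, VPC names and penalty value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DEGRADATION = 5  # hypothetical penalty added to the passive route


@dataclass
class Announcement:
    origin: str       # VPC that originates the route
    path_length: int  # lower values are preferred by routers in this model


class AnycastFabric:
    """Toy model of two VPCs announcing one IP address in active/passive mode."""

    def __init__(self, active: str, passive: str, base_length: int = 1) -> None:
        self.active = active
        self.passive = passive
        self.base_length = base_length
        self.healthy: Dict[str, bool] = {active: True, passive: True}

    def report_health(self, vpc: str, healthy: bool) -> None:
        """Record the result of a health check for one VPC."""
        self.healthy[vpc] = healthy

    def announcements(self) -> List[Announcement]:
        """Return the routes currently announced to transit providers."""
        routes: List[Announcement] = []
        active_ok = self.healthy[self.active]
        if active_ok:
            routes.append(Announcement(self.active, self.base_length))
        if self.healthy[self.passive]:
            # The passive route is degraded only while the active VPC is healthy.
            penalty = DEGRADATION if active_ok else 0
            routes.append(Announcement(self.passive, self.base_length + penalty))
        return routes


def best_origin(routes: List[Announcement]) -> Optional[str]:
    """Mimic a router that prefers the route with the shortest path."""
    best = min(routes, key=lambda r: r.path_length, default=None)
    return best.origin if best else None


fabric = AnycastFabric(active="vpc-region-1", passive="vpc-region-2")
normal = best_origin(fabric.announcements())      # active VPC is preferred

fabric.report_health("vpc-region-1", False)       # active VPC fails
failover = best_origin(fabric.announcements())    # standby takes over

fabric.report_health("vpc-region-1", True)        # active VPC recovers
recovered = best_origin(fabric.announcements())   # traffic returns

print(normal, failover, recovered)
```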
To learn more, you can download the Anycast Whitepaper.