Basic IaaS Resiliency – Availability Sets

Whether we like it or not, Virtual Machines are almost certainly the most common instrument running your applications and workloads. And because we run our mission-critical, production applications on Virtual Machines, it’s important to think about resiliency, availability, and disaster recovery.

There are multiple ways to build resiliency in an IaaS solution on Azure, and one way is through the use of Availability Sets. An Availability Set allows you to take a group of machines and spread them out across Fault Domains and Update Domains. But wait…what are those?

Think of a traditional datacenter with rows of racks, where each rack holds multiple servers and networking gear. In this example, each rack would represent an Azure Fault Domain (FD). It has its own power, network, and compute. Separating workloads across multiple FDs (or racks) adds a level of resiliency should a power or network failure bring an entire FD down.

An Update Domain (UD) is a logical grouping of physical servers that get restarted together during planned maintenance. Imagine you have 10 servers in each of 3 racks and they all need to be patched and restarted. You wouldn’t (or shouldn’t) patch and reboot them all at the same time, and you wouldn’t patch and restart all 10 servers in a single rack at the same time. You’d want to stagger the patches and reboots, maybe 2 servers per rack at a time.
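To make the staggering concrete, here is a minimal Python sketch of that 3-rack, 10-servers-per-rack example. The function name `plan_waves` and the server naming scheme are made up for illustration; each "wave" plays the role of an Update Domain in the analogy, and this is not how Azure itself schedules maintenance.

```python
from collections import defaultdict

RACKS = 3               # racks, i.e. Fault Domains in the analogy
SERVERS_PER_RACK = 10
PER_RACK_PER_WAVE = 2   # servers patched per rack at a time

def plan_waves(racks, servers_per_rack, per_rack_per_wave):
    """Group servers into patch waves; each wave acts like an Update Domain."""
    waves = defaultdict(list)
    for rack in range(racks):
        for slot in range(servers_per_rack):
            wave = slot // per_rack_per_wave  # slots 0-1 -> wave 0, 2-3 -> wave 1, ...
            waves[wave].append(f"rack{rack}-srv{slot:02d}")
    return dict(waves)

waves = plan_waves(RACKS, SERVERS_PER_RACK, PER_RACK_PER_WAVE)
for wave, servers in sorted(waves.items()):
    print(f"wave {wave}: {len(servers)} servers -> {servers}")
```

With these numbers you end up with 5 waves of 6 servers each, so 24 of the 30 servers stay up while any one wave is being patched.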

Deploy and Use Availability Sets

Availability Sets (AS) should be created before you build your servers. This is because you can only add a server to an AS at the time you build the server. So start by creating an Availability Set in Azure:

Be sure to set both the Fault and Update Domain counts to values that make sense for your deployment, as you can’t change them after creation. In the above image, the defaults are 2 FDs and 2 UDs, which I’ve left alone. If you intend to use managed disks for your servers (and you should), then you’ll want to ensure the Use Managed Disks option is set to “Yes”. This ensures that all disks for a VM reside in the same FD as the VM.
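If you’d rather keep these settings in a template than click through the Portal, the same choices map onto an ARM template resource. The sketch below just builds that resource as a Python dictionary and prints it as JSON. The helper name, the `demo-as` name, and the `apiVersion` value are my assumptions, so check the current template reference before using it. The “Use Managed Disks” option corresponds to the `Aligned` SKU, and `Classic` is the unmanaged equivalent.

```python
import json

def availability_set_resource(name, location, fault_domains=2,
                              update_domains=2, managed_disks=True):
    """Build an ARM template resource for an Availability Set (illustrative helper)."""
    return {
        "type": "Microsoft.Compute/availabilitySets",
        "apiVersion": "2023-03-01",  # assumption: verify against the current reference
        "name": name,
        "location": location,
        # "Aligned" = managed disks, "Classic" = unmanaged disks
        "sku": {"name": "Aligned" if managed_disks else "Classic"},
        "properties": {
            "platformFaultDomainCount": fault_domains,
            "platformUpdateDomainCount": update_domains,
        },
    }

# "demo-as" is a hypothetical name; westus2 matches the region used in this post.
resource = availability_set_resource("demo-as", "westus2")
print(json.dumps(resource, indent=2))
```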

With the Availability Set created, the next step is to populate it with servers. This is what it looks like in the Portal to pick the AS during server build:

Repeat this for two or more servers to get the full benefit of an Availability Set. In the example below, I’ve built three servers in the AS.
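Since a server can only join an AS when it’s built, a templated VM definition has to carry the reference from the start. Here is a rough sketch of just that piece of an ARM VM resource, again generated from Python. The helper name and the `demo-as` name are hypothetical, and a real VM resource needs far more than this (hardware profile, OS profile, network interfaces, and so on).

```python
import json

def vm_availability_reference(availability_set_name):
    """Return only the VM resource fields that tie it to an Availability Set."""
    as_id = ("[resourceId('Microsoft.Compute/availabilitySets', "
             f"'{availability_set_name}')]")
    return {
        "dependsOn": [as_id],  # deploy the Availability Set before the VM
        "properties": {"availabilitySet": {"id": as_id}},
    }

servers = ["SERVER01", "SERVER02", "SERVER03"]
fragments = {name: vm_availability_reference("demo-as") for name in servers}
print(json.dumps(fragments["SERVER01"], indent=2))
```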

You can see which Fault and Update Domain each server lives in. A different way to view how these three servers are dispersed is the graphic below:

If a failure causes Fault Domain 0 to go down or lose connectivity completely, SERVER03 will still be able to process incoming requests. Likewise, if everything in Update Domain 0 is restarted for physical server patching, SERVER01 and SERVER03 will still be available to process incoming requests.
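You can reason about these scenarios with a few lines of Python. The placement below is an assumption I’ve chosen to match the outcomes described above; Azure decides the real assignment, so substitute the FD and UD values your own Availability Set reports. The `survivors` helper is purely illustrative.

```python
# Assumed placement, picked to match the two scenarios described above.
PLACEMENT = {
    "SERVER01": {"fd": 0, "ud": 1},
    "SERVER02": {"fd": 0, "ud": 0},
    "SERVER03": {"fd": 1, "ud": 1},
}

def survivors(placement, *, fd_down=None, ud_down=None):
    """List servers outside the failed Fault Domain and the restarting Update Domain."""
    return sorted(
        name
        for name, domains in placement.items()
        if domains["fd"] != fd_down and domains["ud"] != ud_down
    )

print(survivors(PLACEMENT, fd_down=0))  # ['SERVER03']
print(survivors(PLACEMENT, ud_down=0))  # ['SERVER01', 'SERVER03']
```

If either call ever returns an empty list for your own placement, a single failure or maintenance event could take the whole workload offline.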

To see a quick demo, click here to watch a short YouTube video I made. I’m trying out video editing (more on this in another post).

Notes and Additional Reading

Availability Sets are one of many possible ways to build resiliency into a solution. They do not make sense in all cases, so it’s important to read Microsoft’s official documentation on them and plan early if you decide to use them¹.

Availability Sets do not cost anything to use by themselves (you’ll still incur compute costs for the VMs themselves, but nothing additional for using Availability Sets).
Availability Sets are good for Azure Regions which do not support Availability Zones (West US, as an example).
Microsoft recommends leveraging Availability Zones where available instead of Availability Sets (and if you’re eagle-eyed, you’ll notice I built my example AS in West US 2, which supports AZs).
Microsoft recommends the use of Load Balancers in front of workloads in an Availability Set².
For maximum resiliency, if your application is built in an N-Tier fashion, each tier should be grouped in its own Availability Set.

¹ https://learn.microsoft.com/en-us/azure/virtual-machines/availability-set-overview
² https://learn.microsoft.com/en-us/azure/architecture/checklist/resiliency-per-service#virtual-machines
