Azure.Pillar.Reliability

March 30, 2026

Microsoft Azure Well-Architected Framework - Reliability pillar specific baseline.

Rules

The following rules are included within the Azure.Pillar.Reliability baseline.

This baseline includes a total of 102 rules.

| Name | Synopsis | Severity | Maturity |
| --- | --- | --- | --- |
| Azure.ACR.GeoReplica | Applications or infrastructure relying on a container image may fail if the registry is not available at the time they start. | Important | - |
| Azure.ACR.MinSku | The Basic SKU provides limited performance and features for production container registry workloads. | Important | - |
| Azure.ADX.SLA | Use SKUs that include an SLA when configuring Azure Data Explorer (ADX) clusters. | Important | - |
| Azure.AKS.AvailabilityZone | AKS clusters deployed with virtual machine scale sets should use availability zones in supported regions for high availability. | Important | - |
| Azure.AKS.CNISubnetSize | AKS clusters using Azure CNI should use large subnets to reduce IP exhaustion issues. | Important | - |
| Azure.AKS.MaintenanceWindow | Configure customer-controlled maintenance windows for AKS clusters. | Important | - |
| Azure.AKS.MinNodeCount | AKS clusters should have a minimum number of system nodes for failover and updates. | Important | - |
| Azure.AKS.MinUserPoolNodes | User node pools in an AKS cluster should have a minimum number of nodes for failover and updates. | Important | - |
| Azure.AKS.PoolVersion | AKS node pools should match the Kubernetes control plane version. | Important | - |
| Azure.AKS.UptimeSLA | AKS clusters should have Uptime SLA enabled for a financially backed SLA. | Important | - |
| Azure.AKS.Version | Older versions of Kubernetes may have known bugs or security vulnerabilities, and may have limited support. | Important | - |
| Azure.APIM.AvailabilityZone | API Management instances should use availability zones in supported regions for high availability. | Important | - |
| Azure.APIM.CertificateExpiry | Renew certificates used for custom domain bindings. | Important | - |
| Azure.APIM.MultiRegion | Enhance service availability and resilience by deploying API Management instances across multiple regions. | Important | - |
| Azure.APIM.MultiRegionGateway | API Management instances should have multi-region deployment gateways enabled. | Important | - |
| Azure.AppConfig.GeoReplica | Replicate app configuration store across all points of presence for an application. | Important | - |
| Azure.AppConfig.PurgeProtect | Consider purge protection for app configuration store to ensure the store cannot be purged in the retention period. | Important | - |
| Azure.AppConfig.SKU | App Configuration should use a minimum size of Standard. | Important | - |
| Azure.AppGw.AvailabilityZone | Application Gateway (App Gateway) should use availability zones in supported regions for improved resiliency. | Important | - |
| Azure.AppGw.MigrateWAFPolicy | Migrate to Application Gateway WAF policy. | Critical | - |
| Azure.AppGw.MinInstance | Application Gateways should use a minimum of two instances. | Important | - |
| Azure.AppService.AlwaysOn | Configure Always On for App Service apps. | Important | - |
| Azure.AppService.AvailabilityZone | Deploy app service plan instances using availability zones in supported regions to ensure high availability and resilience. | Important | - |
| Azure.AppService.PlanInstanceCount | App Service Plan should use a minimum number of instances for failover. | Important | - |
| Azure.AppService.WebProbe | Configure and enable instance health probes. | Important | - |
| Azure.AppService.WebProbePath | Configure a dedicated path for health probe requests. | Important | - |
| Azure.ASE.AvailabilityZone | Deploy app service environments using availability zones in supported regions to ensure high availability and resilience. | Important | - |
| Azure.AVD.ScheduleAgentUpdate | Define a window for agent updates to minimize disruptions to users. | Important | - |
| Azure.ContainerApp.AvailabilityZone | Use Container Apps environments that are zone redundant to improve reliability. | Important | - |
| Azure.ContainerApp.HealthProbe | Container app ingress that uses HTTP should have HTTP health probes configured for liveness and readiness checks. | Important | - |
| Azure.ContainerApp.MinReplicas | Use multiple replicas to remove a single point of failure. | Important | - |
| Azure.ContainerApp.Storage | Use Azure Files volume mounts to persist container data. | Awareness | - |
| Azure.Cosmos.AvailabilityZone | Use zone redundant Cosmos DB accounts in supported regions to improve reliability. | Important | L1 |
| Azure.Cosmos.ContinuousBackup | Enable continuous backup on Cosmos DB accounts. | Important | - |
| Azure.Cosmos.MongoAvailabilityZone | Use zone redundant Cosmos DB vCore clusters in supported regions to improve reliability. | Important | L1 |
| Azure.Cosmos.SLA | Use a paid tier to qualify for a Service Level Agreement (SLA). | Important | - |
| Azure.DataFactory.Version | Consider migrating to DataFactory v2. | Awareness | - |
| Azure.EntraDS.MinReplicas | Applications or infrastructure relying on a managed domain may fail if the domain is not available. | Important | - |
| Azure.EntraDS.SKU | The default SKU for Microsoft Entra Domain Services supports resiliency in a single region. | Important | - |
| Azure.EventHub.AvailabilityZone | Use zone redundant Event Hub namespaces in supported regions to improve reliability. | Important | L1 |
| Azure.Firewall.AvailabilityZone | Deploy firewall instances using availability zones in supported regions to ensure high availability and resilience. | Important | - |
| Azure.FrontDoor.Probe | Use health probes to check the health of each backend. | Important | - |
| Azure.FrontDoor.ProbeMethod | Configure health probes to use HEAD requests to reduce performance overhead. | Important | - |
| Azure.FrontDoor.ProbePath | Configure a dedicated path for health probe requests. | Important | - |
| Azure.Grafana.AvailabilityZone | Use zone redundant Grafana workspaces in supported regions to improve reliability. | Important | L1 |
| Azure.Grafana.Version | Grafana workspaces should be on Grafana version 10. | Important | - |
| Azure.KeyVault.PurgeProtect | Enable Purge Protection on Key Vaults to prevent early purge of vaults and vault items. | Important | - |
| Azure.KeyVault.SoftDelete | Enable Soft Delete on Key Vaults to protect vaults and vault items from accidental deletion. | Important | - |
| Azure.LB.AvailabilityZone | Load balancers deployed with Standard SKU should be zone-redundant for high availability. | Important | - |
| Azure.LB.Probe | Use a specific probe for web protocols. | Important | - |
| Azure.LB.StandardSKU | Load balancers should be deployed with Standard SKU for production workloads. | Important | - |
| Azure.Log.Replication | Log Analytics workspaces should have workspace replication enabled to improve service availability. | Important | - |
| Azure.MariaDB.GeoRedundantBackup | Azure Database for MariaDB should store backups in geo-redundant storage. | Important | - |
| Azure.MICassandra.AvailabilityZone | Use zone redundant Managed Instance for Apache Cassandra clusters in supported regions to improve reliability. | Important | L1 |
| Azure.Monitor.ServiceHealth | Configure Service Health alerts to notify administrators. | Important | - |
| Azure.MySQL.GeoRedundantBackup | Azure Database for MySQL should store backups in geo-redundant storage. | Important | - |
| Azure.MySQL.MaintenanceWindow | Configure a customer-controlled maintenance window for Azure Database for MySQL servers. | Important | - |
| Azure.MySQL.UseFlexible | Use the Azure Database for MySQL Flexible Server deployment model. | Important | - |
| Azure.MySQL.ZoneRedundantHA | Deploy Azure Database for MySQL servers using zone-redundant high availability (HA) in supported regions to ensure high availability and resilience. | Important | - |
| Azure.NIC.UniqueDns | Network interfaces (NICs) should inherit DNS from virtual networks. | Awareness | - |
| Azure.NSG.DenyAllInbound | When all inbound traffic is denied, some functions that affect the reliability of your service may not work as expected. | Important | - |
| Azure.PostgreSQL.GeoRedundantBackup | Azure Database for PostgreSQL should store backups in geo-redundant storage. | Important | - |
| Azure.PostgreSQL.MaintenanceWindow | Configure a customer-controlled maintenance window for Azure Database for PostgreSQL servers. | Important | - |
| Azure.PostgreSQL.ZoneRedundantHA | Deploy Azure Database for PostgreSQL servers using zone-redundant high availability (HA) in supported regions to ensure high availability and resilience. | Important | - |
| Azure.PublicIP.AvailabilityZone | Public IP addresses deployed with Standard SKU should use availability zones in supported regions for high availability. | Important | - |
| Azure.PublicIP.StandardSKU | The basic SKU is being retired on 30 September 2025, and does not include several reliability and security features. | Important | - |
| Azure.Redis.AvailabilityZone | Premium Redis cache should be deployed with availability zones for high availability. | Important | - |
| Azure.Redis.Version | Azure Cache for Redis should use the latest supported version of Redis. | Important | - |
| Azure.RedisEnterprise.Zones | Enterprise Redis cache should be zone-redundant for high availability. | Important | - |
| Azure.RSV.ReplicationAlert | Recovery Services Vaults (RSV) without replication alerts configured may be at risk. | Important | - |
| Azure.RSV.StorageType | Recovery Services Vaults (RSV) not using geo-replicated storage (GRS) may be at risk. | Important | - |
| Azure.Search.IndexSLA | Use a minimum of 3 replicas to receive an SLA for query and index updates. | Important | - |
| Azure.Search.QuerySLA | Use a minimum of 2 replicas to receive an SLA for index queries. | Important | - |
| Azure.SignalR.SLA | Use SKUs that include an SLA when configuring SignalR Services. | Important | - |
| Azure.SQL.MaintenanceWindow | Configure a customer-controlled maintenance window for Azure SQL databases. | Important | - |
| Azure.SQLMI.MaintenanceWindow | Configure a customer-controlled maintenance window for Azure SQL Managed Instances. | Important | - |
| Azure.Storage.ContainerSoftDelete | Enable container soft delete on Storage Accounts. | Important | - |
| Azure.Storage.FileShareSoftDelete | Enable soft delete on Storage Accounts file shares. | Important | - |
| Azure.Storage.SoftDelete | Enable blob soft delete on Storage Accounts. | Important | - |
| Azure.Storage.UseReplication | Storage Accounts using the LRS SKU are only replicated within a single zone. | Important | - |
| Azure.Template.LocationDefault | Set the default value for the location parameter within an ARM template to resource group location. | Awareness | - |
| Azure.TrafficManager.Endpoints | Traffic Manager should use at least two enabled endpoints. | Important | - |
| Azure.VM.ASAlignment | Use availability sets aligned with managed disks fault domains. | Important | - |
| Azure.VM.ASDistributeTraffic | Ensure high availability by distributing traffic among members in an availability set. | Important | - |
| Azure.VM.ASMinMembers | Availability sets should be deployed with at least two virtual machines (VMs). | Important | - |
| Azure.VM.BasicSku | Virtual machines (VMs) should not use Basic sizes. | Important | - |
| Azure.VM.MaintenanceConfig | Use a maintenance configuration for virtual machines. | Important | - |
| Azure.VM.Standalone | Single instance VMs are a single point of failure; however, reliability can be improved by using premium storage. | Important | - |
| Azure.VMSS.AutoInstanceRepairs | Applications or infrastructure relying on a virtual machine scale set may fail if VM instances are unhealthy. | Important | - |
| Azure.VMSS.AvailabilityZone | Deploy virtual machine scale set instances using availability zones in supported regions to ensure high availability and resilience. | Important | - |
| Azure.VMSS.ZoneBalance | Deploy virtual machine scale set instances using the best-effort zone balance in supported regions. | Important | - |
| Azure.VNET.BastionSubnet | VNETs with a GatewaySubnet should have an AzureBastionSubnet to allow for out of band remote access to VMs. | Important | - |
| Azure.VNET.FirewallSubnetNAT | Zonal-deployed Azure Firewalls should consider using an Azure NAT Gateway for outbound access. | Awareness | - |
| Azure.VNET.LocalDNS | Virtual networks (VNETs) should use DNS servers deployed within the same Azure region. | Important | - |
| Azure.VNET.SingleDNS | Virtual networks (VNETs) should have at least two DNS servers assigned. | Important | - |
| Azure.VNG.ERAvailabilityZoneSKU | Use availability zone SKU for virtual network gateways deployed with ExpressRoute gateway type. | Important | - |
| Azure.VNG.ERLegacySKU | Migrate from legacy SKUs to improve reliability and performance of ExpressRoute (ER) gateways. | Critical | - |
| Azure.VNG.MaintenanceConfig | Use a customer-controlled maintenance configuration for virtual network gateways. | Important | - |
| Azure.VNG.VPNActiveActive | Use VPN gateways configured to operate in an Active-Active configuration to reduce connectivity downtime. | Important | - |
| Azure.VNG.VPNAvailabilityZoneSKU | Use availability zone SKU for virtual network gateways deployed with VPN gateway type. | Important | - |
| Azure.VNG.VPNLegacySKU | Migrate from legacy SKUs to improve reliability and performance of VPN gateways. | Critical | - |
| Azure.WebPubSub.SLA | Use SKUs that include an SLA when configuring Web PubSub Services. | Important | - |
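
To limit an analysis run to the rules listed above, the baseline can be selected by name. The fragment below is a minimal sketch, not taken from this page: it assumes the baseline ships in the PSRule.Rules.Azure module, that options are read from a `ps-rule.yaml` file in the working directory, and that your PSRule version supports the `rule.baseline` option. Check these against the options reference for your version before relying on it.

```yaml
# ps-rule.yaml (assumed file name and location: the working directory)

# Assumption: the baseline is provided by the PSRule.Rules.Azure module.
include:
  module:
  - PSRule.Rules.Azure

# Assumption: rule.baseline selects the default baseline for the run.
rule:
  baseline: Azure.Pillar.Reliability
```

With a baseline selected, only the rules it includes are evaluated, so results should be scoped to the Reliability pillar rather than the full rule set.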