Cloud factory: The synchronized optimization framework behind Azure cloud network growth
This post was co-authored by Nasser Elawaar, Partner Engineering Manager, Azure Networking.
Cloud factory is the production process that sustains the constant growth of the Microsoft cloud. It is a globally distributed assembly line connecting datacenter clusters. New and existing regions are connected through a nested, closed-loop optimization of the planning and design variables. While datacenters are housed in massive warehouses, they are deployed and connected in a carefully coordinated factory operation.
The Microsoft cloud has achieved global scale for greater flexibility, resilience, and efficiency. The cloud scales out with capabilities such as geo-redundant and zone-redundant storage (GRS, ZRS), and control plane and traffic engineering levers. As a result, each cloud service can withstand failures in multiple independent failure domains.
An intended consequence of this global scale is the network effect (described by Metcalfe's law, which states that the value of a network is proportional to the square of the number of connected nodes). The network effect helps draw new developers and consumers to cloud customers, simplifies application deployment and service, and streamlines the act of finding relevant talent.
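To make the quadratic relationship concrete, here is a minimal sketch of Metcalfe's law in Python. The function names and the constant `k` are illustrative assumptions, not part of any Azure tooling; the only point is that doubling the number of connected nodes roughly quadruples the modeled value.

```python
def metcalfe_value(nodes: int, k: float = 1.0) -> float:
    """Relative network value under Metcalfe's law: k * n^2.

    `k` is a hypothetical proportionality constant; only ratios
    between two network sizes are meaningful here.
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return k * nodes ** 2


def value_ratio(before: int, after: int) -> float:
    """How many times more valuable the network is after growth."""
    return metcalfe_value(after) / metcalfe_value(before)


# Doubling the connected nodes quadruples the modeled value.
print(value_ratio(100, 200))  # 4.0
```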
The option to scale Azure networking up, out, or down opens up new collaboration opportunities and triggers breakthrough business insights. The “lift & shift” migration strategy from on-premises to Azure regions necessitates a highly responsive cloud, able to withstand rapid burst-out needs and macro-level scaling across all regions.
The cloud network stack, as shown, is deployed within new and existing datacenters. The metro/regional fabric employs graph theory design principles to connect the long-haul network segments, turning multiple warehouse-sized datacenters into a hyper-connected computer. The factory process applies even more to cloud networking, which is a leading indicator of the next growth spurt. Cloud networking outpaces overall infrastructure growth by a factor of five. The high-throughput cloud networking factory decouples the cloud supply chain, attenuates spikes in demand for cloud infrastructure, and avoids choking capacity in critical locations. If left unabated, capacity starvation becomes a slow-moving outage that impacts overall cloud reliability.
Network traffic management, control plane policy, and network design choices are distributed over different timelines as a nested optimization problem. Azure networking has closed the loop on these choices in a unified framework. This all boils down to a programmatic way to optimize the use of our ecosystem, including servers, network, and people: a system that automates deployment and infrastructure planning, based on set rules for managing the network, to continuously improve the cloud efficiency frontier.
Azure cloud factory continues to bend the infrastructure growth curve by dampening the bullwhip effect (the amplification of demand swings as they propagate up a supply chain). Azure leads the competitive landscape by meeting customer demand with a cloud infrastructure that grows at a steady, optimized scale. Such an efficient cloud factory is run through SKU standardization, configuration standardization, fungibility, and t-shirt sizing of network widgets. As a result of this global optimization, Azure networking can pool massive resources across IaaS (Infrastructure as a Service) virtual networks, load balancers, and PaaS (Platform as a Service) technologies to offer new capabilities such as Content Delivery Networks and Traffic Manager.
The factory principles of identifying bottlenecks and scaling the network in the right places, at the right time, and in the right quantity are critical to meeting the network scale and availability targets.
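As a rough illustration of bottleneck identification, the sketch below ranks network segments by utilization and flags those above a threshold. Everything in it is a hypothetical assumption made for the example: the `Link` type, the segment names, the capacity and demand figures, and the 80 percent threshold. It is not a description of how Azure's own planning systems work.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str             # hypothetical segment label
    capacity_gbps: float  # provisioned capacity (illustrative)
    demand_gbps: float    # observed or forecast demand (illustrative)

    @property
    def utilization(self) -> float:
        return self.demand_gbps / self.capacity_gbps


def find_bottlenecks(links, threshold=0.8):
    """Return links at or above the threshold, most utilized first."""
    hot = [link for link in links if link.utilization >= threshold]
    return sorted(hot, key=lambda link: link.utilization, reverse=True)


links = [
    Link("metro-a", capacity_gbps=400.0, demand_gbps=360.0),
    Link("longhaul-b", capacity_gbps=800.0, demand_gbps=400.0),
    Link("metro-c", capacity_gbps=200.0, demand_gbps=170.0),
]

# Segments to scale first: the right places, in order of urgency.
for link in find_bottlenecks(links):
    print(link.name, round(link.utilization, 2))
```

Ranking by utilization rather than by raw demand keeps the focus on where capacity is closest to starvation, which is the condition the factory process aims to prevent.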
This collection of systems optimizes our assets from server to client throughout Azure networking as follows:
- Allow the customer to define network and resource use by defining the business rules instead of having to learn the mechanics of the network operating system.
- Improve service performance for the customer with increased visibility into the performance of the app, rather than visibility into the network alone, so that the network is not a hindrance to business goals.
- Embed rapid scaling capability in all systems, and push to decouple software and hardware scaling.
Azure networking has doubled its network density in the past 12 months. A dense network lowers the transaction costs of further expansion, as it requires incrementally less equipment and acquisition cost. Meaningful network density also enables better end-to-end performance, as the Azure cloud network becomes increasingly attractive as a peering and interconnection point in the middle-mile and last-mile ecosystem.
In cloud computing, we’ve seen these network effects help solidify the position of massive players, and now the same effect is repeating itself in the cloud networking space.
Source: Microsoft Azure News