High Availability Architecture

High availability networks are complex and costly systems to roll out, with long planning timeframes and high expectations from users that the system will perform. To choose the correct hardware and software and the optimum network design, IT administrators and their staff invest a considerable amount of time analyzing corporate needs. Network architecture rated as high availability must meet very strict uptime standards.

Redundancy, rather than being a stated goal, becomes a necessary component of high availability networks, inherent in the design architecture. From RAID and redundant networking to hot spare servers, high availability requires them all. From the user to the data and back again, almost every connection will be duplicated at least once, although in mission critical networks that require 100% uptime this may increase to 4, 6, or 8 duplications.

High availability networks are measured by the percentage of time they are online doing their assigned task. Mission critical networks that need 24 hour uptime 365 days a year cannot fail, and their uptimes are measured in the high 99th percentile. To put this in perspective, the architecture needs to be designed with goals of just under one hour of downtime per year (99.99%, about 53 minutes), or just over 5 minutes of downtime per year (99.999%).
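The downtime figures above follow from simple arithmetic, sketched here in Python. The function names are illustrative only, and the second function assumes that redundant components fail independently of one another, which real installations only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def parallel_availability(single: float, copies: int) -> float:
    """Availability (0..1) of `copies` redundant components.

    Assumption: failures are independent, so the group is down only
    when every copy is down at the same time.
    """
    return 1 - (1 - single) ** copies


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows {minutes:.1f} minutes of downtime per year")
```

Under the independence assumption, duplicating a component that is 99% available gives `parallel_availability(0.99, 2)`, or roughly 99.99%, which is why each added layer of redundancy moves a design further into the high 99th percentile.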


Levels of uptime in the high 99th percentile are only achieved through developing architecture that allows part of the system to be brought down without affecting running processes and user access. Scheduled maintenance of primary servers can then be carried out over periods stretching into several hours without loss of productivity.

The architecture of high availability networks is generally designed with one of two purposes in mind: local high availability, typically for networks in smaller corporations, or geographically distributed high availability, more typically rolled out within very large corporations where disaster recovery is an important design consideration.

A number of node configurations are possible in high availability architectures, and no single design can be described as best, since each installation is different and favors specific design features over others. The service level standards being applied will, for the most part, dictate which of the commonly used designs is chosen, namely active/active, active/passive, N+1, N+M, N-to-1, and N-to-N.

To be effective, a high availability network needs to be configured to avoid conflicts in the event of a failed device. Planning the design architecture should promote reliability, availability, and serviceability (RAS) through designated succession, whether this is pre-configured or voted on by the remaining servers.
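The two styles of succession mentioned above can be sketched as follows. This is a minimal illustration rather than the algorithm of any particular clustering product: the function names, node names, and the rule that a winner needs a strict majority of the full cluster are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def successor_by_priority(priority: list[str], alive: set[str]) -> Optional[str]:
    """Pre-configured succession: the first surviving node in a fixed order."""
    for node in priority:
        if node in alive:
            return node
    return None


def successor_by_vote(votes: dict[str, str], cluster_size: int) -> Optional[str]:
    """Voted succession: `votes` maps each voting node to its chosen candidate.

    Assumption: a candidate wins only with a strict majority of the
    whole cluster, counting failed or unreachable nodes in the total.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if count > cluster_size // 2 else None


if __name__ == "__main__":
    # Hypothetical three-node cluster in which node "a" has failed.
    print(successor_by_priority(["a", "b", "c"], alive={"b", "c"}))
    print(successor_by_vote({"b": "b", "c": "b"}, cluster_size=3))
```

Requiring a majority of the full cluster, rather than of the nodes that happen to respond, means that two isolated groups of servers cannot both elect a successor at the same time.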

Correctly configured, most high availability servers include heartbeat daemons that actively send small packets to the other servers in the network, which in turn send heartbeat packets of their own. All servers listen for these packets so that a failure is quickly identified, the failed server is fenced off, and its duties are reassigned within the system. This avoids a split brain scenario, where two or more servers attempt to control the same resources, a situation that will quickly bring down the network and could result in severe damage to data.
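The listening side of a heartbeat scheme can be sketched as a simple timeout check. The class below is a hypothetical, in-memory illustration: it does not send or receive real packets, and the `HeartbeatMonitor` name, the timeout value, and the fencing behavior (ignoring a node once it is declared failed) are assumptions made for the example rather than the behavior of any specific heartbeat daemon.

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Tracks when each peer was last heard from and fences silent peers."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: dict[str, float] = {}
        self.fenced: set[str] = set()

    def record_heartbeat(self, node: str, now: Optional[float] = None) -> None:
        """Note that a heartbeat arrived from `node`."""
        if node in self.fenced:
            return  # a fenced node stays out until an operator readmits it
        self.last_seen[node] = time.monotonic() if now is None else now

    def check(self, now: Optional[float] = None) -> list[str]:
        """Fence and return every node whose heartbeat has timed out."""
        current = time.monotonic() if now is None else now
        newly_failed = [
            node
            for node, seen in self.last_seen.items()
            if node not in self.fenced and current - seen > self.timeout_seconds
        ]
        self.fenced.update(newly_failed)
        return newly_failed


if __name__ == "__main__":
    # Explicit timestamps stand in for real packet arrival times.
    monitor = HeartbeatMonitor(timeout_seconds=3.0)
    monitor.record_heartbeat("a", now=0.0)
    monitor.record_heartbeat("b", now=0.0)
    monitor.record_heartbeat("a", now=2.0)  # "b" has gone silent
    print(monitor.check(now=4.0))
```

In this sketch, the nodes returned by `check` are the ones whose duties would need to be reassigned. A production system would typically also cut the fenced server off from shared storage or power so that it cannot keep writing to data it no longer owns.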
