High availability clusters (HA clusters), also known as failover clusters, are groups of networked servers configured to operate as a single system, with data replicated in real time so that if one machine fails, backup machines take over seamlessly and service continues. High availability clusters are designed to recover automatically, without requiring IT support staff to intervene and restart the network.
When a standalone server fails, the services it provides go down. In some organizations the resulting downtime can be measured in days or weeks and threatens business continuity. Even brief downtime of critical servers can in some instances cause significant loss of income. Building fault-tolerant servers with redundant power supplies, duplicated components, and RAID storage offers some protection of data and improved uptime, yet such a server is still a single machine: if a component that is not duplicated, such as the CPU, fails, so does the server.
By contrast, high availability clustered servers are set up with intelligent software that monitors every server in the cluster for problems, and the cluster is designed to keep running despite component failures. In business-critical situations downtime is costly in terms of both revenue and customer trust, and in extreme settings such as hospitals it may even be life-threatening.
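The monitoring described above is commonly built on heartbeats: each server reports in at regular intervals, and a server that stays silent for too long is treated as failed. The following is only a minimal sketch of that idea, not the behavior of any particular cluster product. The `Node` and `ClusterMonitor` names, the 3-second timeout, and the rule of promoting the first live standby are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time (in seconds) of the most recent heartbeat
    is_primary: bool = False


class ClusterMonitor:
    """Hypothetical monitor that promotes a standby when the primary goes silent."""

    def __init__(self, nodes, timeout=3.0):
        self.nodes = {n.name: n for n in nodes}
        self.timeout = timeout  # assumed silence threshold, in seconds

    def record_heartbeat(self, name, now):
        """Note that the named node reported in at time `now`."""
        self.nodes[name].last_heartbeat = now

    def is_alive(self, node, now):
        return (now - node.last_heartbeat) <= self.timeout

    def primary(self):
        return next((n for n in self.nodes.values() if n.is_primary), None)

    def check(self, now):
        """Return the current primary's name, failing over first if needed."""
        current = self.primary()
        if current is not None and self.is_alive(current, now):
            return current.name
        # The primary is missing or silent: demote it, then promote
        # the first standby that is still sending heartbeats.
        if current is not None:
            current.is_primary = False
        for node in self.nodes.values():
            if node is not current and self.is_alive(node, now):
                node.is_primary = True
                return node.name
        return None  # no live node is left to take over
```

Calling `check` on a timer stands in for the automatic, unattended recovery described earlier. A production monitor would also have to guard against two servers both believing they are primary after a network split, which this sketch ignores.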
Very few Fortune 500 companies and other major corporations operate without high availability clusters. Each node in the cluster is capable of taking over from another, so although some servers are set up as backups ready to assume the primary role, a primary can equally take over from a backup. Clustered backup servers are therefore never configured in a cold or warm state; they are always hot, with real-time synchronization occurring.
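To make the meaning of a hot standby concrete, the sketch below applies every write to both servers before acknowledging it, so either one can assume the primary role at any moment without losing data. This is an illustrative assumption about how real-time synchronization might be modelled, not a description of any specific product. `Replica` and `HotStandbyPair` are hypothetical names, and an in-memory dictionary stands in for real storage.

```python
class Replica:
    """Hypothetical server holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class HotStandbyPair:
    """Two replicas kept in step; roles can be swapped in either direction."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def write(self, key, value):
        # Synchronous replication: the write is acknowledged only after
        # both the primary and the standby have applied it.
        self.primary.apply(key, value)
        self.standby.apply(key, value)
        return True

    def read(self, key):
        return self.primary.data.get(key)

    def fail_over(self):
        # Swap roles. Because the standby is hot, no catch-up step is needed.
        self.primary, self.standby = self.standby, self.primary
        return self.primary.name
```

A cold or warm standby would need to load or replay data before serving requests. Here `fail_over` is just a role swap, which is the property that makes the roles reversible.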
Adding further redundancy to a clustered server setup usually means establishing geographically separate data centers, linked by very high-speed network connections and synchronized in the same way, so that a major disaster affecting the corporation will not shut down the network.
Multiple-site high availability clusters are more expensive to set up and more difficult to configure, but once the network is operational, maintenance is routine. Servers are connected using multiple links from different carriers, ensuring further redundancy. Typically, data centers are chosen so that they do not share the same disaster risks: a data center in California that could be affected by an earthquake might be mirrored with one in Chicago that would not be.
High availability cluster design is technically demanding, and it can be tricky to configure a cluster so that load is balanced evenly across all servers. Seamless load balancing of some tasks, such as serving web pages, is accomplished quite easily, but it is harder with transactional tasks or read/write services unless disk mirroring is very close to real time and sessions can be shared across servers.
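The difference between the easy and hard cases can be sketched in a few lines. The example below rotates requests across healthy servers (round robin, one of several possible balancing policies) and keeps session state in a store shared by all of them, so a session survives being served by a different machine on each request. The `RoundRobinBalancer` class and the server names are hypothetical, and a plain dictionary stands in for what would in practice be an external shared session store.

```python
from itertools import count


class RoundRobinBalancer:
    """Hypothetical balancer: rotates over healthy servers, shares sessions."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._counter = count()
        # Stand-in for a session store that every server can reach.
        self.sessions = {}

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Choose the next healthy server in rotation."""
        live = [s for s in self.servers if s in self.healthy]
        if not live:
            raise RuntimeError("no healthy servers available")
        return live[next(self._counter) % len(live)]

    def handle(self, session_id, key=None, value=None):
        """Route one request; optionally update the shared session state."""
        server = self.pick()
        state = self.sessions.setdefault(session_id, {})
        if key is not None:
            state[key] = value
        return server, dict(state)
```

For stateless work such as serving static web pages, `pick` alone is enough. The shared `sessions` store is what allows a read/write workload to move between servers, and if each server kept its sessions privately, a failover would lose them.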
Software for running a high availability cluster is readily available; however, applications need to be tested for robustness, particularly their handling of data in the event of a crash and their use of non-volatile storage in preference to shared storage.
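One property worth testing is whether an application can leave a half-written file behind if it crashes mid-write. A common defensive pattern, shown here purely as an illustration, is to write to a temporary file, force it to disk, and then atomically rename it over the old file. The `durable_write` helper is a hypothetical name, and the sketch assumes a local filesystem on which a rename within one directory is atomic.

```python
import os
import tempfile


def durable_write(path, data: bytes):
    """Replace the file at `path` with `data` without exposing a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to non-volatile storage
        # Readers see either the old contents or the new, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A crash before `os.replace` leaves the previous version intact, so a node restarting or taking over finds consistent data. A stricter version would also sync the containing directory, which is omitted here for portability.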