In IT terms, high availability describes a server or network, and by extension all of its constituent components, that is continuously available for data input, processing, and report querying. In its simplest sense, we see high availability in our everyday interactions on the web: it is very rare for the top ten websites to be offline.
The systems and protocols in place largely determine whether a high availability goal is met. Components fail randomly, and no group of components will ever be failsafe, so ensuring a continuous supply of service depends on planning and on understanding the tolerances that affect the network.
Twenty-first-century society is so dependent on computers and their processing power that anything less than reliable, permanent uptime is considered a failing; corporations thrive or fail on the reputation of their network or service uptime.
Even for small corporations and private individuals, high availability may still be necessary and desirable. Our dependence on timely email, online banking and financial trades, and up-to-date weather and news are all examples of services that require high availability; in a free market, failing to deliver only benefits the competition.
Fault tolerance is an important measure of any high availability system or network: no single point of failure should ever bring the entire system down, and an alternate component should be able to take over the task transparently until the original is repaired or replaced.
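The transparent-takeover idea can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the server names and the `is_healthy` probe are hypothetical stand-ins for a real health check (a ping, an HTTP request, or similar), and here the primary is simply simulated as failed.

```python
# Minimal failover sketch: route work to the first healthy server, so a
# single failed component does not bring the whole service down.
# Server names and the health probe are hypothetical illustrations.

SERVERS = ["primary.example.com", "standby.example.com"]

def is_healthy(server: str) -> bool:
    # A real deployment would probe the server here (ping, HTTP check, etc.).
    # For this sketch we simulate a failure of the primary.
    return server != "primary.example.com"

def pick_server(servers: list[str]) -> str:
    """Return the first healthy server; raise only if every one is down."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy servers available")

print(pick_server(SERVERS))  # the standby transparently takes over
```

Real load balancers and cluster managers apply the same principle continuously, re-probing components so that a repaired primary can be returned to service.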
Reliable components with long service lives, capable of sustaining extensive periods without maintenance, also form an important aspect of high availability networks. These are typically not the cheapest models on the market; increased reliability is a factor of how well a component is designed and made, and higher quality standards come at a price.
Reliability is often measured as an uptime percentage; most telecommunications and web hosting corporations advertise uptime guarantees of 99 percent or higher. Mission-critical availability of servers and networking systems is often described in terms of 99.999% uptime, equating to just 5.26 minutes of downtime per year.
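The arithmetic behind these figures is straightforward: allowed downtime is the total minutes in a period multiplied by the unavailable fraction. A short sketch, using a non-leap 365-day year:

```python
# Convert an uptime guarantee into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def downtime_minutes_per_year(uptime_pct: float) -> float:
    """Minutes of downtime permitted per year at a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100)

for pct in (99.0, 99.9, 99.999):
    print(f"{pct}% uptime -> {downtime_minutes_per_year(pct):.2f} minutes/year")
```

At 99% uptime this works out to roughly 87.6 hours of downtime a year, while "five nines" leaves only about 5.26 minutes, which is why each additional nine is dramatically harder and more expensive to achieve.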
In practice, this measure is not easily quantified and is subject to interpretation: a server that is up may not be available to the end user, for example when an outage occurs on some other part of the network, or when a software glitch prevents the user from completing their task.
Creating a high availability system involves extensive planning, with the goal of making the entire system as simple as possible, with the fewest potential failure points. Overly complex systems typically have more failure points to account for, increasing cost, complexity, and monitoring effort. By contrast, simpler systems are easier to monitor and, despite using more expensive and more reliable components, are often cheaper to build in the long run.
High availability networks should be tested regularly at their failure points, and backup systems and redundant components require testing as well, to ensure the entire network remains available in the event of component failure or disaster.