ClusterLabs

Open Source High Availability Cluster Stack

The ClusterLabs stack unifies a large group of Open Source projects related to High Availability into a cluster offering suitable for both small and large deployments. Together, Corosync, Pacemaker, DRBD, ScanCore, and many other projects have been enabling detection and recovery of machine and application-level failures in production clusters since 1999. The ClusterLabs stack supports practically any redundancy configuration imaginable.

Quick Overview

Deploy

We support many deployment scenarios, from the simplest 2-node standby cluster to a 32-node active/active configuration. We can also dramatically reduce hardware costs by allowing several active/passive clusters to be combined and share a common backup node.
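The hardware saving from a shared backup node comes down to simple arithmetic. The sketch below is illustrative only. It assumes each standalone active/passive cluster has exactly one active and one passive machine, and that the combined cluster keeps every active machine plus a single shared backup. The function name machines_needed is hypothetical.

```python
def machines_needed(active_services: int, shared_backup: bool) -> int:
    """Count machines for a set of active/passive services.

    Assumption: without sharing, every active machine has its own
    dedicated passive machine; with sharing, all of them rely on one
    common backup node.
    """
    if active_services < 1:
        raise ValueError("need at least one active service")
    if shared_backup:
        return active_services + 1   # N active machines + 1 shared backup
    return 2 * active_services       # N separate 2-node clusters


separate = machines_needed(3, shared_backup=False)  # 6 machines
combined = machines_needed(3, shared_backup=True)   # 4 machines
print(f"machines saved: {separate - combined}")      # machines saved: 2
```

The trade-off under these assumptions is that a single backup node can only absorb one failure at a time.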

Monitor

We monitor the system for both hardware and software failures. In the event of a failure, we automatically recover your application and make sure it is available from one of the remaining machines in the cluster.

Recover

After a failure, we use advanced algorithms to quickly determine the optimum locations for services based on relative node preferences and/or requirements to run with other cluster services (we call these "constraints").
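To make the idea of constraints concrete, here is a deliberately simplified sketch of score-based placement. It is not Pacemaker's actual algorithm. The function place_service, the per-node scores and the must_run_with parameter are all hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional


def place_service(
    preferences: Dict[str, int],
    online_nodes: List[str],
    must_run_with: Optional[str] = None,
) -> Optional[str]:
    """Pick a node for one service.

    preferences:   hypothetical per-node scores (higher is preferred).
    online_nodes:  nodes that survived the failure.
    must_run_with: node hosting a service this one has to share a node
                   with, or None if there is no such requirement.
    """
    candidates = [n for n in online_nodes if n in preferences]
    if must_run_with is not None:
        # A "run together" requirement leaves at most one candidate.
        candidates = [n for n in candidates if n == must_run_with]
    if not candidates:
        return None  # the service cannot be placed anywhere
    # Highest score first; ties go to the alphabetically first node name.
    return sorted(candidates, key=lambda n: (-preferences[n], n))[0]


prefs = {"node1": 100, "node2": 50, "node3": 10}
print(place_service(prefs, ["node1", "node2", "node3"]))  # node1
print(place_service(prefs, ["node2", "node3"]))           # node2 (node1 failed)
```

A real cluster evaluates all services and all constraints together. This sketch places one service at a time purely to show how preferences and co-location requirements narrow the choice.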

Why clusters

At its core, a cluster is a distributed finite state machine capable of co-ordinating the startup and recovery of inter-related services across a set of machines.

System HA is possible without a cluster manager, but using one saves many headaches.

Even a distributed and/or replicated application that is able to survive the failure of one or more components can benefit from a higher-level cluster:

awareness of other applications in the stack
a shared quorum implementation and calculation
data integrity through fencing (a non-responsive process does not imply it is not doing anything)
automated recovery of instances to ensure capacity
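As an illustration of what a shared quorum calculation involves, the following minimal sketch applies a plain majority rule. It assumes one vote per node and a strict-majority policy. The function name has_quorum is hypothetical, and the configurable policies of a real cluster stack are not modelled.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Return True when strictly more than half of all votes are present."""
    if total_votes <= 0 or not 0 <= votes_present <= total_votes:
        raise ValueError("votes_present must be between 0 and total_votes")
    # Integer comparison avoids floating-point rounding on odd totals.
    return 2 * votes_present > total_votes


print(has_quorum(3, 5))  # True: 3 of 5 is a majority
print(has_quorum(2, 4))  # False: an even split is not a majority
print(has_quorum(1, 2))  # False: a lone survivor of a 2-node cluster
```

The last case shows why a two-node cluster cannot rely on a plain majority alone, and why quorum works hand in hand with fencing.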

While SysV init replacements like systemd can provide deterministic recovery of a complex stack of services, the recovery is limited to one machine and lacks the context of what is happening on other machines - context that is crucial for telling apart a local failure, a clean startup and recovery after a total site failure.

Features

The ClusterLabs stack, incorporating Corosync and Pacemaker, defines an Open Source, High Availability cluster offering suitable for both small and large deployments.

Detection and recovery of machine and application-level failures
Supports practically any redundancy configuration
Supports both quorate and resource-driven clusters
Configurable strategies for dealing with quorum loss (when multiple machines fail)
Supports application startup/shutdown ordering, without requiring the applications to run on the same node
Supports applications that must or must not run on the same node
Supports applications which need to be active on multiple nodes
Supports applications with dual roles (promoted and unpromoted)
Provably correct response to any failure or cluster state. The cluster's response to any stimulus can be tested offline before the condition exists
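Startup/shutdown ordering can be pictured as a dependency graph that is independent of where each application runs. The sketch below is an assumption-laden illustration using Python's standard graphlib module (Python 3.9 or later), not the stack's own implementation. The service names and the start_order and stop_order helpers are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def start_order(starts_after: Dict[str, Set[str]]) -> List[str]:
    """Return a start sequence that honours every ordering rule.

    starts_after maps each service to the services that must already
    be running before it starts. Node placement plays no part here.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(starts_after).static_order())


def stop_order(starts_after: Dict[str, Set[str]]) -> List[str]:
    """Shutdown is the start sequence in reverse."""
    return list(reversed(start_order(starts_after)))


rules = {"web": {"app"}, "app": {"db"}, "db": set()}
print(start_order(rules))  # ['db', 'app', 'web']
print(stop_order(rules))   # ['web', 'app', 'db']
```

Because the rules say nothing about nodes, the database, application and web tier in this example could each run on a different machine and still start and stop in the right sequence.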

Components

"The definitive open-source high-availability stack for the Linux platform builds upon the Pacemaker cluster resource manager."
-- LINUX Journal, "Ahead of the Pack: the Pacemaker High-Availability Stack"

A Pacemaker stack is built on five core components:

libQB - Core services (logging, IPC, etc.)
Corosync - Membership, messaging and quorum
Resource agents - A collection of scripts that interact with the underlying services managed by the cluster
Fencing agents - A collection of scripts that interact with network power switches and SAN devices to isolate cluster members
Pacemaker itself

We describe each of these in more detail as well as other optional components such as CLIs and GUIs.

Background

Pacemaker has been around since 2004 and is primarily a collaborative effort between Red Hat and SUSE. We also receive considerable help and support from the folks at LinBit and the community in general.

Corosync also began life in 2004 but was then part of the OpenAIS project. It is primarily a Red Hat initiative, with considerable help and support from the folks in the community.

The core ClusterLabs team is made up of full-time developers from Australia, Austria, Canada, China, Czech Republic, England, Germany, Sweden and the USA. Contributions to the code or documentation are always welcome.

The ClusterLabs stack ships with most modern enterprise distributions and has been deployed in many critical environments, including Deutsche Flugsicherung GmbH (DFS), which uses Pacemaker to ensure its air traffic control systems are always available.