There are two models for clustering in Neo4j as of version 3.1:
- High Availability (HA) clustering
- Causal clustering
In both configurations, the full dataset is replicated to every instance that stores data – Neo4j doesn't currently support "sharding" a graph across the instances of a cluster.
High Availability clustering
HA clusters are made up of at least three Neo4j instances: one "master" and two "slave" instances. The master instance performs writes, and pushes data out to the slave instances after each write completes successfully. Reads can be served by the master or by any slave. All members of the cluster that hold data should have identical hardware, since any slave may be elected master.
It's also possible to run one instance as an "arbiter" in place of a slave. An arbiter only participates in quorums to elect a master and does not replicate data, so it can be a low-powered server.
HA clusters are useful for applications that need 24/7 uptime and increased read performance, since reads can be spread across the master and slave instances.
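The division of work in an HA cluster can be illustrated with a small toy model. This is only a sketch in Python, not Neo4j's API: the `Instance` and `HaRouter` names, the instance names, and the round-robin read policy are all assumptions made for illustration. It sends writes to the master, spreads reads across the master and slaves, and never sends queries to an arbiter.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Instance:
    name: str
    role: str  # "master", "slave", or "arbiter"


class HaRouter:
    """Toy router (hypothetical, not a Neo4j class) for the HA roles above."""

    def __init__(self, instances):
        masters = [i for i in instances if i.role == "master"]
        if len(masters) != 1:
            raise ValueError("expected exactly one master")
        self.master = masters[0]
        # Arbiters hold no data, so only the master and slaves serve reads.
        readers = [i for i in instances if i.role in ("master", "slave")]
        self._readers = cycle(readers)

    def route_write(self) -> Instance:
        # In this model, every write goes to the master.
        return self.master

    def route_read(self) -> Instance:
        # Round-robin is an assumed policy, chosen only to keep the sketch short.
        return next(self._readers)


# A three-member cluster where an arbiter takes the place of the second slave.
cluster = [
    Instance("neo4j-1", "master"),
    Instance("neo4j-2", "slave"),
    Instance("neo4j-3", "arbiter"),
]
router = HaRouter(cluster)
```

Calling `router.route_write()` always returns `neo4j-1`, while repeated calls to `router.route_read()` alternate between `neo4j-1` and `neo4j-2`.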
Causal clustering
Causal clustering was introduced in Neo4j 3.1 to support data replication between geographical regions, and to keep read and write operations available in the event of multiple hardware and network failures.
A causal cluster is made up of two groups:
- One set of "core servers" that handle read and write operations. A simple majority of core servers must remain in operation for writes to continue. For example, with five core servers, three form a majority, so two can fail before write capability is lost.
- One or more "read replicas." These are read-only instances whose data is asynchronously replicated from the core servers. They are suitable for wide geographic distribution of data, and allow query workloads to be scaled out across many servers.
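The majority rule for core servers comes down to simple arithmetic, sketched here in Python. The function names `write_quorum` and `tolerated_failures` are hypothetical helpers for illustration, not part of Neo4j.

```python
def write_quorum(core_servers: int) -> int:
    """Smallest number of core servers that forms a simple majority."""
    if core_servers < 1:
        raise ValueError("a cluster needs at least one core server")
    return core_servers // 2 + 1


def tolerated_failures(core_servers: int) -> int:
    """How many core servers can fail before the majority is lost."""
    return core_servers - write_quorum(core_servers)


# Print cluster size, majority size, and tolerated failures for a few sizes.
for n in (3, 4, 5, 7):
    print(n, write_quorum(n), tolerated_failures(n))
```

With five core servers this gives a majority of three and two tolerated failures. An even cluster size adds no extra tolerance under this rule: four core servers still tolerate only one failure, the same as three.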
Graph Story can set up both HA and causal Neo4j clusters for your organization. In general, we find most applications are well served by a single HA cluster, but systems that require extremely fault-tolerant setups and geographic distribution may benefit from causal clustering.