Stardog Academy Training FAQ:
High Availability

A cluster provides high availability. If a node goes down, the remaining servers pick up the slack and continue serving requests.

At least 3 nodes are needed. If more than 3 nodes are used, an odd number is recommended for the best fault tolerance.
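For reference, a minimal per-node cluster configuration might look like the following stardog.properties sketch; the ZooKeeper ensemble and node addresses are placeholders, and the full set of cluster properties is described in the Stardog Docs:

    # enable cluster mode on this node
    pack.enabled=true
    # ZooKeeper ensemble that coordinates the cluster (itself typically an odd number of nodes)
    pack.zookeeper.address=zk1:2181,zk2:2181,zk3:2181
    # address this node advertises to the rest of the cluster
    pack.node.address=stardog1:5820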

Yes. If a request is sent directly to an individual node, that node will respond as expected, and any writes will be propagated to the rest of the cluster as needed.

Reads will be more performant, while writes will be less performant, since every write must be committed on all nodes. The goal of the cluster is high availability, so performance must be balanced against cluster size.

The remaining cluster nodes will elect a new coordinator, and the old coordinator will need to sync before it can rejoin the cluster.

The node will be expelled from the cluster and must sync before asking to rejoin the cluster.

Each Stardog node in the cluster can mount a volume created from the snapshot and bulk load the data at startup. Since any node can independently respond to a read request, the load balancer can distribute requests round-robin. Joining nodes aren’t blocked by read requests, so nodes will generally be able to join on their first attempt. Refer to the Stardog Docs for more information.
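For example, a round-robin load balancer in front of the cluster could be as simple as the following HAProxy sketch; the node addresses are placeholders, and 5820 is Stardog’s default port:

    frontend stardog_front
        bind *:5820
        default_backend stardog_nodes

    backend stardog_nodes
        balance roundrobin
        # each backend entry is one Stardog cluster node (addresses are placeholders)
        server stardog1 10.0.0.1:5820 check
        server stardog2 10.0.0.2:5820 check
        server stardog3 10.0.0.3:5820 check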

If it’s important to your use case that a joining node not block writes, you can configure Stardog to never forcibly obtain the join lock. A real consideration is weighing the risk of losing a node: for example, are you operating your cluster in an unreliable environment? How many nodes can you afford to lose, and for how long? If you deploy a three-node cluster but it’s too risky to operate your production cluster with only two nodes for HA, then it may make sense to deploy a larger cluster so you can afford to lose more nodes during write-heavy times and wait for them to rejoin once writes subside. Refer to the Stardog blog post, Tuning Cluster for Cloud, for more information.

You can configure a joining node to obtain the lock on its second attempt. In this case, the joining node will block writes, but since it syncs without the lock on the first attempt, it will mostly catch up to the other nodes in the cluster. On the second attempt, it will forcibly obtain the lock, sync any transactions it missed in that short window, and join, only blocking writes for a short time.

Finally, if your workload consists of many small transactions and higher throughput (in terms of writes per second) is more important than transaction latency, consider batching your smaller transactions into fewer, larger transactions. Larger transactions reduce the cluster overhead required to commit the data on all of the nodes.
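As an illustration, a batching sketch using the pystardog client might group several small files into a single transaction; the database name, endpoint, credentials, and file names below are placeholders:

    import stardog

    conn_details = {
        'endpoint': 'http://localhost:5820',  # placeholder endpoint
        'username': 'admin',
        'password': 'admin',
    }

    # Commit several small files in one transaction instead of one transaction
    # per file, reducing the per-commit coordination overhead across the cluster.
    with stardog.Connection('mydb', **conn_details) as conn:
        conn.begin()
        for path in ['batch-001.ttl', 'batch-002.ttl', 'batch-003.ttl']:
            conn.add(stardog.content.File(path))
        conn.commit()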