Stardog Academy Training FAQ:
High Availability

A cluster provides high availability. If a node goes down, the remaining servers pick up the slack and continue serving requests.

At least 3 nodes are needed. If more than 3 nodes are used, an odd number is recommended for the best fault tolerance.
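For reference, a minimal per-node cluster configuration might look like the following stardog.properties sketch; the ZooKeeper ensemble and node addresses are placeholders, and the full set of cluster properties is described in the Stardog Docs:

    # enable cluster mode on this node
    pack.enabled=true
    # ZooKeeper ensemble that coordinates the cluster (itself typically an odd number of nodes)
    pack.zookeeper.address=zk1:2181,zk2:2181,zk3:2181
    # address this node advertises to the rest of the cluster
    pack.node.address=stardog1:5820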

Yes. If a request is sent directly to an individual node, that node will respond as expected, and any writes will be propagated to the rest of the cluster as needed.

Reads will be more performant, while writes will be less performant, since every write must be committed on all nodes. The goal of the cluster is high availability, so performance must be balanced against cluster size.

The remaining cluster nodes will elect a new coordinator, and the old coordinator will need to sync before it can rejoin the cluster.

The node will be expelled from the cluster and must sync before asking to rejoin the cluster.

Each Stardog node in the cluster can mount a volume created from the snapshot and bulk load the data at startup. Since any node can independently respond to a read request, the load balancer can distribute requests round-robin. Joining nodes aren’t blocked by read requests, so nodes will generally be able to join on their first attempt. Refer to the Stardog Docs for more information.
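For example, a round-robin load balancer in front of the cluster could be as simple as the following HAProxy sketch; the node addresses are placeholders, and 5820 is Stardog’s default port:

    frontend stardog_front
        bind *:5820
        default_backend stardog_nodes

    backend stardog_nodes
        balance roundrobin
        # each backend entry is one Stardog cluster node (addresses are placeholders)
        server stardog1 10.0.0.1:5820 check
        server stardog2 10.0.0.2:5820 check
        server stardog3 10.0.0.3:5820 check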

If it’s important to your use case that a joining node not block writes, you can configure Stardog to never forcibly obtain the join lock. A real consideration is weighing the risk of losing a node: for example, are you operating your cluster in an unreliable environment? How many nodes can you afford to lose, and for how long? If you deploy a three-node cluster but it’s too risky to operate your production cluster with only two nodes for HA, then it may make sense to deploy a larger cluster so you can afford to lose more nodes during write-heavy times and wait for them to rejoin once writes subside. Refer to the Stardog blog post, Tuning Cluster for Cloud, for more information.

You can configure a joining node to obtain the lock on its second attempt. In this case, the joining node will block writes, but since it syncs without the lock on the first attempt, it will mostly catch up to the other nodes in the cluster. On the second attempt, it will forcibly obtain the lock, sync any transactions it missed in that short window, and join, only blocking writes for a short time.

Finally, if your workload consists of many small transactions and higher throughput (in terms of writes per second) is more important than transaction latency, consider batching your smaller transactions into fewer, larger transactions. Larger transactions reduce the cluster overhead required to commit the data on all of the nodes.
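As an illustration, a batching sketch using the pystardog client might group several small files into a single transaction; the database name, endpoint, credentials, and file names below are placeholders:

    import stardog

    conn_details = {
        'endpoint': 'http://localhost:5820',  # placeholder endpoint
        'username': 'admin',
        'password': 'admin',
    }

    # Commit several small files in one transaction instead of one transaction
    # per file, reducing the per-commit coordination overhead across the cluster.
    with stardog.Connection('mydb', **conn_details) as conn:
        conn.begin()
        for path in ['batch-001.ttl', 'batch-002.ttl', 'batch-003.ttl']:
            conn.add(stardog.content.File(path))
        conn.commit()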