Skip to content

Server - Cluster

Lavender Data can be distributed to multiple servers to improve performance and scalability. This allows you to handle larger datasets and higher request loads by spreading the work across several machines.

To enable the cluster mode, you need to set the following environment variables for each server instance (node).

VariableDescriptionDefault
LAVENDER_DATA_CLUSTER_ENABLEDWhether to enable the cluster mode.false
LAVENDER_DATA_CLUSTER_SECRETThe shared secret for the cluster.""
LAVENDER_DATA_CLUSTER_HEAD_URLThe URL of the head node (must be reachable by workers).""
LAVENDER_DATA_CLUSTER_NODE_URLThe URL of the current node (must be reachable by head).""

LAVENDER_DATA_CLUSTER_SECRET is a random string (e.g. 34f8f98ee8d9afdf81eb) used for authentication between nodes. All nodes in the cluster must have the same secret.

LAVENDER_DATA_CLUSTER_HEAD_URL and LAVENDER_DATA_CLUSTER_NODE_URL must be accessible from all other nodes in the cluster. These URLs define the communication endpoints for the cluster coordination.

Example Configuration

Let’s go through an example of how to configure a cluster.

Env files

In this example, we’ll run a head node and a worker node on the same machine. The head node will be running on port 8000 and the worker node will be running on port 8001. Thus, LAVENDER_DATA_CLUSTER_NODE_URL is set to http://127.0.0.1:8000 for the head node and http://127.0.0.1:8001 for the worker node.

LAVENDER_DATA_DB_URL and LAVENDER_DATA_LOG_FILE are also set to ensure the nodes can have their own independent database and log file.

.env.head (Head Node)

LAVENDER_DATA_DB_URL=sqlite:///head.db
LAVENDER_DATA_LOG_FILE=logs/head.log
LAVENDER_DATA_CLUSTER_SECRET=your_shared_secret_here # Replace with a secure secret
LAVENDER_DATA_CLUSTER_ENABLED=true
LAVENDER_DATA_CLUSTER_HEAD_URL=http://127.0.0.1:8000 # URL of this head node
LAVENDER_DATA_CLUSTER_NODE_URL=http://127.0.0.1:8000 # URL of this head node

.env.worker (Worker Node)

LAVENDER_DATA_DB_URL=sqlite:///worker.db
LAVENDER_DATA_LOG_FILE=logs/worker.log
LAVENDER_DATA_CLUSTER_SECRET=your_shared_secret_here
LAVENDER_DATA_CLUSTER_ENABLED=true
LAVENDER_DATA_CLUSTER_HEAD_URL=http://127.0.0.1:8000 # URL of the head node
LAVENDER_DATA_CLUSTER_NODE_URL=http://127.0.0.1:8001 # URL of this worker node

Running the Cluster

Use --env-file option to specify the environment file for each node.

Terminal window
# Terminal 1: Start Head Node
lavender-data server run --port 8000 --env-file .env.head
# Terminal 2: Start Worker Node
lavender-data server run --port 8001 --env-file .env.worker

Head Node Startup Log

When the head node starts successfully and workers connect, you’ll see logs indicating registration.

INFO - uvicorn.error: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO - uvicorn.access: 127.0.0.1:50356 - "GET /version HTTP/1.1" 200
INFO - uvicorn.access: 127.0.0.1:50357 - "POST /cluster/register HTTP/1.1" 200
INFO - uvicorn.access: 127.0.0.1:50358 - "POST /cluster/heartbeat HTTP/1.1" 200
INFO - lavender_data.server.distributed.cluster: Node http://127.0.0.1:8001 registered <- worker node registered

Worker Node Startup Log

Worker nodes will wait for the head node to become available before completing startup. They log synchronization attempts.

INFO - lavender_data.server.distributed.cluster: Waiting for head node to be ready: http://127.0.0.1:8000 <- waiting for head node to be ready
INFO - uvicorn.error: Application startup complete.
INFO - uvicorn.error: Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
INFO - uvicorn.access: 127.0.0.1:50359 - "GET /version HTTP/1.1" 200
INFO - uvicorn.access: 127.0.0.1:50360 - "POST /cluster/sync HTTP/1.1" 200 <- worker node synced with head node

If a worker node cannot reach the head node specified in LAVENDER_DATA_CLUSTER_HEAD_URL, it will fail to start:

INFO - lavender_data.server.distributed.cluster: Waiting for head node to be ready: http://127.0.0.1:8000
ERROR: Traceback (most recent call last):
...
RuntimeError: Node http://127.0.0.1:8000 did not start in 10.0 seconds
ERROR: Application startup failed. Exiting.

Data Synchronization

When you create a dataset or add shardsets via any node, this information is synchronized across the cluster.

Shard Discovery

Head node will inspect the shardset location and discover the shards in it when a shardset is created or being synced.

You can trigger shardset sync with CLI.

Terminal window
lavender-data client --api-url <any-node-url> \
shardsets sync --dataset-id <dataset-id> --shardset-id <shardset-id>

If you have different shard files in each node, iteration will fail in the middle as some nodes cannot find some shard files.

For example, if you have shard files in two different nodes like below, node_1 will not be able to find shard.03.csv, shard.04.csv, and shard.05.csv.

  • Directory

    node_1: /path/to/the/shardset/

    • shard.00.csv
    • shard.01.csv
    • shard.02.csv
  • Directory

    node_2: /path/to/the/shardset/

    • shard.03.csv
    • shard.04.csv
    • shard.05.csv

Therefore, all nodes must have the same shard files.

  • Directory

    node_1: /path/to/the/shardset/

    • shard.00.csv
    • shard.01.csv
    • shard.02.csv
    • shard.03.csv
    • shard.04.csv
    • shard.05.csv
  • Directory

    node_2: /path/to/the/shardset/

    • shard.00.csv
    • shard.01.csv
    • shard.02.csv
    • shard.03.csv
    • shard.04.csv
    • shard.05.csv

Client Interaction

Clients can interact with the cluster by connecting to any node (head or worker) using its URL. The cluster handles the internal routing and coordination.

if args.node == "head":
api_url = "http://localhost:8000" # Head node URL
elif args.node == "worker":
api_url = "http://localhost:8001" # Worker node URL
lavender.init(api_url=api_url)
iteration = Iteration.from_dataset(
dataset_name="my-dataset",
api_url=api_url, # you can also specify the api_url directly here
)