Server - Cluster
Lavender Data can be distributed to multiple servers to improve performance and scalability. This allows you to handle larger datasets and higher request loads by spreading the work across several machines.
To enable the cluster mode, you need to set the following environment variables for each server instance (node).
Variable | Description | Default |
---|---|---|
LAVENDER_DATA_CLUSTER_ENABLED | Whether to enable the cluster mode. | false |
LAVENDER_DATA_CLUSTER_SECRET | The shared secret for the cluster. | "" |
LAVENDER_DATA_CLUSTER_HEAD_URL | The URL of the head node (must be reachable by workers). | "" |
LAVENDER_DATA_CLUSTER_NODE_URL | The URL of the current node (must be reachable by head). | "" |
LAVENDER_DATA_CLUSTER_SECRET
is a random string (e.g. 34f8f98ee8d9afdf81eb
) used for authentication between nodes. All nodes in the cluster must have the same secret.
LAVENDER_DATA_CLUSTER_HEAD_URL
and LAVENDER_DATA_CLUSTER_NODE_URL
must be accessible from all other nodes in the cluster. These URLs define the communication endpoints for the cluster coordination.
Example Configuration
Let’s go through an example of how to configure a cluster.
Env files
In this example, we’ll run a head node and a worker node on the same machine.
The head node will be running on port 8000 and the worker node will be running on port 8001.
Thus, LAVENDER_DATA_CLUSTER_NODE_URL
is set to http://127.0.0.1:8000
for the head node and http://127.0.0.1:8001
for the worker node.
LAVENDER_DATA_DB_URL
and LAVENDER_DATA_LOG_FILE
are also set to ensure the nodes can have their own independent database and log file.
.env.head
(Head Node)
LAVENDER_DATA_DB_URL=sqlite:///head.dbLAVENDER_DATA_LOG_FILE=logs/head.log
LAVENDER_DATA_CLUSTER_SECRET=your_shared_secret_here # Replace with a secure secretLAVENDER_DATA_CLUSTER_ENABLED=trueLAVENDER_DATA_CLUSTER_HEAD_URL=http://127.0.0.1:8000 # URL of this head nodeLAVENDER_DATA_CLUSTER_NODE_URL=http://127.0.0.1:8000 # URL of this head node
.env.worker
(Worker Node)
LAVENDER_DATA_DB_URL=sqlite:///worker.dbLAVENDER_DATA_LOG_FILE=logs/worker.log
LAVENDER_DATA_CLUSTER_SECRET=your_shared_secret_hereLAVENDER_DATA_CLUSTER_ENABLED=trueLAVENDER_DATA_CLUSTER_HEAD_URL=http://127.0.0.1:8000 # URL of the head nodeLAVENDER_DATA_CLUSTER_NODE_URL=http://127.0.0.1:8001 # URL of this worker node
Running the Cluster
Use --env-file
option to specify the environment file for each node.
# Terminal 1: Start Head Nodelavender-data server run --port 8000 --env-file .env.head
# Terminal 2: Start Worker Nodelavender-data server run --port 8001 --env-file .env.worker
Head Node Startup Log
When the head node starts successfully and workers connect, you’ll see logs indicating registration.
INFO - uvicorn.error: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)INFO - uvicorn.access: 127.0.0.1:50356 - "GET /version HTTP/1.1" 200INFO - uvicorn.access: 127.0.0.1:50357 - "POST /cluster/register HTTP/1.1" 200INFO - uvicorn.access: 127.0.0.1:50358 - "POST /cluster/heartbeat HTTP/1.1" 200INFO - lavender_data.server.distributed.cluster: Node http://127.0.0.1:8001 registered <- worker node registered
Worker Node Startup Log
Worker nodes will wait for the head node to become available before completing startup. They log synchronization attempts.
INFO - lavender_data.server.distributed.cluster: Waiting for head node to be ready: http://127.0.0.1:8000 <- waiting for head node to be readyINFO - uvicorn.error: Application startup complete.INFO - uvicorn.error: Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)INFO - uvicorn.access: 127.0.0.1:50359 - "GET /version HTTP/1.1" 200INFO - uvicorn.access: 127.0.0.1:50360 - "POST /cluster/sync HTTP/1.1" 200 <- worker node synced with head node
If a worker node cannot reach the head node specified in LAVENDER_DATA_CLUSTER_HEAD_URL
, it will fail to start:
INFO - lavender_data.server.distributed.cluster: Waiting for head node to be ready: http://127.0.0.1:8000ERROR: Traceback (most recent call last): ...RuntimeError: Node http://127.0.0.1:8000 did not start in 10.0 seconds
ERROR: Application startup failed. Exiting.
Data Synchronization
When you create a dataset or add shardsets via any node, this information is synchronized across the cluster.
Shard Discovery
Head node will inspect the shardset location and discover the shards in it when a shardset is created or being synced.
You can trigger shardset sync with CLI.
lavender-data client --api-url <any-node-url> \ shardsets sync --dataset-id <dataset-id> --shardset-id <shardset-id>
If you have different shard files in each node, iteration will fail in the middle as some nodes cannot find some shard files.
For example, if you have shard files in two different nodes like below,
node_1
will not be able to find shard.03.csv
, shard.04.csv
, and shard.05.csv
.
Directory
node_1: /path/to/the/shardset/
- shard.00.csv
- shard.01.csv
- shard.02.csv
Directory
node_2: /path/to/the/shardset/
- shard.03.csv
- shard.04.csv
- shard.05.csv
Therefore, all nodes must have the same shard files.
Directory
node_1: /path/to/the/shardset/
- shard.00.csv
- shard.01.csv
- shard.02.csv
- shard.03.csv
- shard.04.csv
- shard.05.csv
Directory
node_2: /path/to/the/shardset/
- shard.00.csv
- shard.01.csv
- shard.02.csv
- shard.03.csv
- shard.04.csv
- shard.05.csv
Client Interaction
Clients can interact with the cluster by connecting to any node (head or worker) using its URL. The cluster handles the internal routing and coordination.
if args.node == "head": api_url = "http://localhost:8000" # Head node URLelif args.node == "worker": api_url = "http://localhost:8001" # Worker node URL
lavender.init(api_url=api_url)
iteration = Iteration.from_dataset( dataset_name="my-dataset", api_url=api_url, # you can also specify the api_url directly here)