Release Notes
0.1.2 (2025-05-20)
Added
Preview media
Preview media files such as images and videos in the dataset preview.

Selectable columns
Select columns and sort them as you like.

HTTP Storage
Added HTTP/HTTPS storage support, so files can be fetched from HTTP/HTTPS URLs. For example:
https://docs.lavenderdata.com/example-dataset/images/shard.00000.csv
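As a rough illustration of what reading a shard over HTTP involves, the sketch below checks that a location uses an HTTP/HTTPS scheme and parses a CSV shard body into rows. The helper names (is_http_location, parse_csv_shard, fetch_csv_shard) are hypothetical and are not part of the Lavender Data API; the server's own fetching logic may differ.

```python
import csv
import io
from urllib.parse import urlparse
from urllib.request import urlopen


def is_http_location(location: str) -> bool:
    """Return True if the location uses the http or https scheme."""
    return urlparse(location).scheme in ("http", "https")


def parse_csv_shard(text: str) -> list:
    """Parse CSV text into a list of row dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv_shard(url: str) -> list:
    """Download a CSV shard over HTTP/HTTPS and parse it (hypothetical helper)."""
    if not is_http_location(url):
        raise ValueError(f"not an HTTP/HTTPS location: {url}")
    with urlopen(url) as response:
        return parse_csv_shard(response.read().decode("utf-8"))
```

Parsing is kept separate from downloading so the row handling can be exercised without network access.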
shardset_location option for dataset creation
Added a shardset_location option so you can create a dataset and its shardset at the same time.
lavender-data client \
  datasets create \
  --name my_dataset \
  --uid-column-name id \
  --shardset-location https://docs.lavenderdata.com/example-dataset/images/
0.1.1 (2025-05-19)
Added
overwrite and drop_last options for dataset preprocessing
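The sketch below illustrates the usual meaning of a drop_last flag when samples are grouped into batches, assuming it behaves like the option of the same name in PyTorch's DataLoader: a trailing batch smaller than the batch size is discarded. The make_batches helper and its batch_size parameter are hypothetical; these notes do not spell out the exact semantics of the option.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def make_batches(
    samples: Iterable[T], batch_size: int, drop_last: bool = False
) -> Iterator[List[T]]:
    """Group samples into lists of batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the incomplete trailing batch only when drop_last is False.
    if batch and not drop_last:
        yield batch
```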
Fixed
Background Preprocessing
Background preprocessing might not preserve the order of samples.
Only one worker is used for background preprocessing.
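One common way to use several workers while keeping sample order is to map over the inputs with an executor that returns results in submission order. The sketch below uses Python's concurrent.futures for this. It only illustrates the idea; preprocess_in_order and its parameters are hypothetical names, not the project's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def preprocess_in_order(
    samples: Sequence[T], preprocess: Callable[[T], R], num_workers: int = 4
) -> List[R]:
    """Run preprocess on every sample in parallel, keeping the input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # Executor.map yields results in input order,
        # regardless of which worker finishes first.
        return list(executor.map(preprocess, samples))
```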
Server Daemon
Server does not terminate gracefully when lavender-data server stop is called.
Server is not aware of whether all background workers are ready or have errored.
LavenderDataLoader
LavenderDataLoader unnecessarily calls the /version endpoint multiple times.
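A simple way to avoid repeated calls to an endpoint such as /version is to cache the first response. The sketch below shows the pattern with a stand-in fetch function; VersionCache and fetch_version are hypothetical and do not reflect how LavenderDataLoader is actually implemented.

```python
from typing import Callable, Optional


class VersionCache:
    """Cache the result of a version lookup so it is fetched only once."""

    def __init__(self, fetch_version: Callable[[], str]) -> None:
        self._fetch_version = fetch_version
        self._version: Optional[str] = None
        self.fetch_count = 0  # number of real fetches, for illustration

    def get(self) -> str:
        # Only call the underlying fetch function on the first access.
        if self._version is None:
            self._version = self._fetch_version()
            self.fetch_count += 1
        return self._version
```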
0.1.0 (2025-05-15)
Added
Categorizer
Please refer here for more details.
Background preprocessing
Please refer here for more details.
0.0.13 (2025-05-12)
Added
Background worker
Heavy tasks such as shardset synchronization and sample processing now run in background workers so that they do not block the main process.
Added the LAVENDER_DATA_NUM_WORKERS environment variable to control the number of background workers.
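As an illustration, the sketch below builds an environment mapping with LAVENDER_DATA_NUM_WORKERS set, which could then be passed to a process that launches the server. The server_env helper is hypothetical, and the commented-out launch assumes the lavender-data executable is on the PATH.

```python
import os
from typing import Dict, Mapping, Optional


def server_env(
    num_workers: int, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with the worker count set."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable values must be strings.
    env["LAVENDER_DATA_NUM_WORKERS"] = str(num_workers)
    return env


# Example launch (not executed here):
# import subprocess
# subprocess.run(["lavender-data", "server", "start"], env=server_env(4), check=True)
```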
0.0.10 (2025-05-08)
Added
Delete dataset & shardset
Added delete_dataset and delete_shardset APIs. Both actions are also available in the UI.
0.0.9 (2025-05-02)
Added
Fault handling
Added fault handling features.
skip_on_failure option
max_retry_count option
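The sketch below shows one plausible reading of how these two options interact, under stated assumptions: each sample is retried up to max_retry_count times after its first failure, and once retries are exhausted the sample is either skipped (skip_on_failure enabled) or the error is raised. The process_samples function is hypothetical and only models these assumed semantics; it is not the library's implementation.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_samples(
    samples: Iterable[T],
    process: Callable[[T], R],
    max_retry_count: int = 0,
    skip_on_failure: bool = False,
) -> List[R]:
    """Process samples with retries and optional skipping (assumed semantics)."""
    results: List[R] = []
    for sample in samples:
        # One initial attempt plus up to max_retry_count retries.
        for attempt in range(max_retry_count + 1):
            try:
                results.append(process(sample))
                break
            except Exception:
                if attempt < max_retry_count:
                    continue  # retry the same sample
                if not skip_on_failure:
                    raise  # retries exhausted and skipping is disabled
                # Retries exhausted: drop this sample and move on.
    return results
```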
0.0.8 (2025-04-30)
Added
Daemon commands
Added daemon-related CLI commands: start, stop, restart, and logs.
start starts the server daemon in the background. You can stop or restart it with the stop and restart commands.
lavender-data server start --init
> lavender-data is running on 0.0.0.0:8000
> UI is running on http://localhost:8000
> API key created: la-...
You can check the daemon's logs with the logs command.
lavender-data server logs -n 100  # print last 100 lines of the logs
lavender-data server logs -f  # wait for new logs and print them
Converters
A Converter is a tool that converts data from a source to a Lavender Data shardset.
import webdataset as wds
# Assumes `lavender` refers to the client package, e.g. `import lavender_data.client as lavender`.

url = "https://storage.googleapis.com/webdataset/testdata/publaynet-train-{000000..000009}.tar"
pil_dataset = wds.WebDataset(url).decode("pil")

converter = lavender.Converter.get("webdataset")
converter.to_shardset(
    pil_dataset,
    "dataset_name",
    location="file:///path/to/shardset",
    uid_column_name="id",
)
UI breadcrumb
Added a breadcrumb to the UI.
Fixed
Client sdk interface
Before
from lavender_data.client import api as lavender, Iteration
lavender.init(api_url="")
lavender.get_datasets()
for row in Iteration.from_dataset(...).to_torch_data_loader(): ...
After
import lavender_data.client as lavender
lavender.init(api_url="")
lavender.api.get_datasets()
# or
lavender.get_client().get_datasets()
# or (in case you need to bypass `lavender.init()`)
lavender.LavenderDataClient(api_url="").get_datasets()
for row in lavender.LavenderDataLoader(...).torch(): ...
0.0.7 (2025-04-21)
Added
Cluster
Added cluster features (LAVENDER_DATA_CLUSTER_* envs, cluster_sync option)
Fixed
Made the UI's API_URL env a runtime setting so it can be injected when running lavender-data server run
Fixed a bug where shards were not shuffled
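Shard-level shuffling is typically a seeded permutation of the shard order, so the order varies with the seed yet stays reproducible for a given seed. The following sketch illustrates that idea only; shuffle_shards and its seed parameter are hypothetical and are not taken from the project.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_shards(shards: Sequence[T], seed: int) -> List[T]:
    """Return a reproducibly shuffled copy of the shard list."""
    shuffled = list(shards)  # copy so the caller's list is left untouched
    # A dedicated Random instance avoids touching the global random state.
    random.Random(seed).shuffle(shuffled)
    return shuffled
```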