
Server - Background Preprocess

Start background preprocessing with the preprocess_dataset API. The preprocessed results are saved as a new shardset.

These are the parameters you need to specify:

  1. Dataset ID
  2. Source shardset IDs: List of shardsets to preprocess
  3. Preprocessors: List of preprocessors to apply
  4. Batch size: Number of samples to process in each batch
  5. Export columns: List of columns to export
  6. Destination shardset location: Location to save the preprocessed shardset
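As a rough illustration of how these six parameters fit together, here is a minimal sketch that collects and validates them before they are handed to preprocess_dataset. The class name `PreprocessRequest`, its field names, the shape of the preprocessor entries, and all example values are assumptions made for illustration only; they are not the actual signature of the API, so check the API reference for the real argument names.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class PreprocessRequest:
    """Hypothetical container for the six parameters listed above."""

    dataset_id: str
    source_shardset_ids: list[str]   # shardsets to preprocess
    preprocessors: list[dict]        # preprocessors to apply, in order
    batch_size: int                  # samples processed per batch
    export_columns: list[str]        # columns written to the new shardset
    destination_location: str        # where the new shardset is saved

    def validate(self) -> None:
        # Reject obviously incomplete requests before submitting them.
        if not self.dataset_id:
            raise ValueError("dataset_id is required")
        if not self.source_shardset_ids:
            raise ValueError("at least one source shardset is required")
        if not self.preprocessors:
            raise ValueError("at least one preprocessor is required")
        if self.batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        if not self.export_columns:
            raise ValueError("at least one export column is required")
        if not self.destination_location:
            raise ValueError("destination_location is required")

    def to_payload(self) -> dict:
        # Validate first, then return a plain dict copy of the fields.
        self.validate()
        return asdict(self)


# All values below are placeholders, not real IDs or preprocessor names.
request = PreprocessRequest(
    dataset_id="ds-example",
    source_shardset_ids=["ss-images"],
    preprocessors=[{"name": "resize", "params": {"width": 224, "height": 224}}],
    batch_size=32,
    export_columns=["image", "label"],
    destination_location="file:///tmp/preprocessed",
)
payload = request.to_payload()
```

Validating up front is a design choice in this sketch: because the job runs in the background, catching a missing or malformed parameter before submission is cheaper than discovering it after the job has been enqueued.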

Click the “Preprocess” button on the dataset settings page.

[Screenshots: Dataset settings; Preprocess dataset]

The preprocess job will be enqueued and executed in the background.