
This guide provides recommendations on how to integrate W&B into your Python training script or notebook for hyperparameter search optimization.

Original training script

Suppose you have a Python script that trains a model (see below). Your goal is to find the hyperparameters that maximize the validation accuracy (val_acc). In your Python script, you define two functions: train_one_epoch and evaluate_one_epoch. The train_one_epoch function simulates training for one epoch and returns the training accuracy and loss. The evaluate_one_epoch function simulates evaluating the model on the validation data set and returns the validation accuracy and loss.

You define a configuration dictionary (config) that contains hyperparameter values such as the learning rate (lr), batch size (batch_size), and number of epochs (epochs). The values in the configuration dictionary control the training process. Next, you define a function called main that mimics a typical training loop. For each epoch, the accuracy and loss are computed on the training and validation data sets.
This code is a mock training script. It does not train a model, but simulates the training process by generating random accuracy and loss values. The purpose of this code is to demonstrate how to integrate W&B into your training script.
train.py
import random
import numpy as np

def train_one_epoch(epoch, lr, batch_size):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss

def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss

# config variable with hyperparameter values
config = {"lr": 0.0001, "batch_size": 16, "epochs": 5}

def main():
    lr = config["lr"]
    batch_size = config["batch_size"]
    epochs = config["epochs"]

    # Train for the configured number of epochs (epochs are 1-indexed).
    for epoch in np.arange(1, epochs + 1):
        train_acc, train_loss = train_one_epoch(epoch, lr, batch_size)
        val_acc, val_loss = evaluate_one_epoch(epoch)

        print("epoch: ", epoch)
        print("training accuracy:", train_acc, "training loss:", train_loss)
        print("validation accuracy:", val_acc, "validation loss:", val_loss)

if __name__ == "__main__":
    main()
In the next section, you will add W&B to your Python script to track hyperparameters and metrics during training. You want to use W&B to find the best hyperparameters that maximize the validation accuracy (val_acc).

Add W&B to your training script

Update your training script to include W&B. How you integrate W&B into your Python script or notebook depends on how you manage sweeps: you can use either the W&B Python SDK or the W&B CLI to start, stop, and manage them. The walkthrough below defines the sweep in a YAML file and manages it with the CLI; a Python SDK sketch appears after the CLI steps.
Create a YAML file that defines the hyperparameters to vary and the metric to optimize. W&B uses this file to determine which hyperparameters to try during the sweep and which metric to optimize. Add the name of your Python script to the program key on the first line of the YAML file.
The sweep agent selects a value from the values list and passes it to wandb.config in the training script. For example, if you define the batch_size parameter with the values [16, 32, 64], the sweep agent selects one of those values and passes it to the training script as wandb.config.batch_size.
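As a minimal illustration (the full script appears later in this guide), the selected value is available on the run's config object once wandb.init() is called:

import wandb

with wandb.init() as run:
    # When launched by a sweep agent, run.config holds the values the agent
    # selected, for example a batch_size chosen from [16, 32, 64].
    batch_size = run.config["batch_size"]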
The following YAML file corresponds to the original training script shown earlier. The training script varies the batch_size, lr, and epochs hyperparameters; the parameters section of the YAML file defines the same hyperparameters and specifies the values to try for each one. The training script also computes the validation accuracy metric, val_acc; the metric section of the YAML file specifies that the sweep should maximize val_acc.
config.yaml
program: train.py
method: random
name: sweep
metric:
  goal: maximize
  name: val_acc
parameters:
  batch_size:
    values: [16, 32, 64]
  lr:
    min: 0.0001
    max: 0.1
  epochs:
    values: [5, 10, 15]
For more information on how to create a W&B Sweep configuration, see Define sweep configuration.

After you define your sweep configuration in a YAML file, add W&B to your training script to read in the YAML file and log the metric you want to optimize.

Within your training script, add the following code snippets to integrate W&B:
  1. Import the W&B Python SDK (wandb).
  2. Read the YAML configuration file with a Python package such as yaml.
  3. Initialize a run with wandb.init(), passing the configuration object to the config parameter.
  4. Retrieve the hyperparameter values from wandb.Run.config so that your script uses the values defined in the YAML file instead of hard-coded values. W&B flattens configuration values, so you can access nested values with dot notation or bracket notation as though they were top-level keys.
  5. Log the metric that you want to optimize with wandb.Run.log().
The following code snippet shows how to integrate W&B into your training script. The main function reads in the YAML configuration file and passes the configuration to wandb.init(). The hyperparameter values are then fetched from the wandb.Run.config object, and the metric you are optimizing for (val_acc) is logged with wandb.Run.log().
train.py
import wandb
import yaml
import random
import numpy as np

def train_one_epoch(epoch, lr, batch_size):
    """Simulates training for one epoch and returns the training accuracy and loss."""
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss

def evaluate_one_epoch(epoch):
    """Simulates evaluation for one epoch and returns the validation accuracy and loss."""
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss

def main():
    # Read in the configuration file
    with open("./config.yaml") as file:
        config = yaml.load(file, Loader=yaml.FullLoader)

    with wandb.init(config=config) as run:
        # Train for the number of epochs chosen by the sweep (1-indexed).
        for epoch in np.arange(1, run.config['epochs'] + 1):
            train_acc, train_loss = train_one_epoch(epoch, run.config['lr'], run.config['batch_size'])
            val_acc, val_loss = evaluate_one_epoch(epoch)
            run.log(
                {
                    "epoch": epoch,
                    "train_acc": train_acc,
                    "train_loss": train_loss,
                    "val_acc": val_acc,
                    "val_loss": val_loss,
                }
            )

if __name__ == "__main__":
    main()
W&B flattens configuration values passed to wandb.init(config=)

Normally, you access nested values in a configuration object with dot notation or bracket notation. For example, consider the following nested configuration:
sample.yaml
key1: value1
key2:
    nested_key1: nested_value1
    nested_key2: nested_value2
You then read in the file with the yaml package:
import yaml

with open("sample.yaml") as file:
    yaml_sample = yaml.load(file, Loader=yaml.FullLoader)
You can then access nested_value1 with yaml_sample["key2"]["nested_key1"]. (Because yaml.load returns a plain Python dictionary, dot notation such as yaml_sample.key2.nested_key1 is not available here.) When you pass a configuration to wandb.init(config=), W&B flattens the values. This means that you access nested values as though they were top-level keys. For example, consider the following YAML file:
config.yaml
program: train.py
method: random
name: sweep
metric:
    goal: maximize
    name: val_acc
parameters:
    epochs:
        values: [10, 20, 30]
    learning_rate:
        min: 0.001
        max: 0.1
After you read in the file and pass the configuration to wandb.init(config=), access the goal value with run.config["goal"] instead of run.config["metric"]["goal"] or run.config.metric.goal.
import wandb
import yaml

with open("config.yaml") as file:
    config = yaml.load(file, Loader=yaml.FullLoader)
with wandb.init(config=config) as run:
    # Access the metric goal
    metric_goal = run.config["goal"] # "maximize"
Next, run the sweep from your shell. Optionally, set a maximum number of runs for the sweep agent to try. In this example, we set the maximum number to 5.
NUM=5
Next, initialize the sweep with the wandb sweep command. Provide the name of the YAML file and, optionally, the name of the project with the --project flag:
wandb sweep --project project_name config.yaml
This returns a sweep ID. For more information on how to initialize sweeps, see Initialize sweeps. Copy the sweep ID and replace sweepID in the following code snippet to start the sweep job with the wandb agent command:
wandb agent --count $NUM your-entity/project_name/sweepID
For more information, see Start sweep jobs.
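If you prefer to manage the sweep from Python rather than the CLI, the W&B Python SDK provides wandb.sweep() and wandb.agent(). The following is a minimal sketch, not the full walkthrough: it expresses the same sweep configuration as a Python dictionary instead of a YAML file, uses a placeholder main function standing in for the training loop shown in train.py above, and assumes project_name is a placeholder you replace:

import wandb

def main():
    # Placeholder for the training loop in train.py above. The agent supplies
    # the hyperparameter values, so wandb.init() needs no config argument.
    with wandb.init() as run:
        run.log({"val_acc": 0.5})  # log the metric the sweep optimizes

sweep_config = {
    "method": "random",
    "name": "sweep",
    "metric": {"goal": "maximize", "name": "val_acc"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64]},
        "lr": {"min": 0.0001, "max": 0.1},
        "epochs": {"values": [5, 10, 15]},
    },
}

# Initialize the sweep, then start an agent that calls `main` up to 5 times.
sweep_id = wandb.sweep(sweep=sweep_config, project="project_name")
wandb.agent(sweep_id, function=main, count=5)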
Logging metrics to W&B in a sweep

You must log the metric that you define in your sweep configuration with wandb.Run.log(). For example, if you define the metric to optimize as val_acc within your sweep configuration, you must also log val_acc to W&B. If you do not log the metric, W&B does not know what to optimize for.
with wandb.init() as run:
    val_loss, val_acc = train()
    run.log(
        {
            "val_loss": val_loss,
            "val_acc": val_acc,
        }
    )
The following is an incorrect example of logging the metric to W&B. The metric optimized for in the sweep configuration is val_acc, but the code logs val_acc inside a nested dictionary under the key validation. You must log the metric directly, not within a nested dictionary.
with wandb.init() as run:
    val_loss, val_acc = train()
    run.log(
        {
            "validation": {
                "val_loss": val_loss,
                "val_acc": val_acc,
            }
        }
    )
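If you want grouped metric names, one option (a sketch; it assumes the metric.name in your sweep configuration is set to the exact logged key, here "validation/val_acc") is to log a flat, slash-separated key rather than a nested dictionary:

with wandb.init() as run:
    val_loss, val_acc = train()
    # Slash-separated keys stay flat (and group together in the W&B UI), so
    # the sweep can optimize them when metric.name matches the key exactly.
    run.log({"validation/val_loss": val_loss, "validation/val_acc": val_acc})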