Quick start

A worked example

Let’s delve into the API of Hypertunity by going through a worked example: neural network hyperparameter optimisation. In the following, we will tune the number of layers and units, the type of non-linearity, as well as the dropout rate and the learning rate of the optimiser.

Disclaimer: this example serves demonstration purposes only. It does not represent an advanced way of performing neural network architecture search!

The first thing we do is import Hypertunity, TensorFlow and NumPy, and define a helper data-loading function:

import hypertunity as ht
import numpy as np
import tensorflow as tf

import hypertunity.reports.tensorboard as ht_tb


def load_mnist():
    (train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
    data_shape = train_x.shape[1:]
    # Flatten the images, scale to [0, 1] and centre with the training-set mean.
    train_x = train_x.reshape(-1, np.prod(data_shape)).astype(np.float32) / 255.
    mean_train = np.mean(train_x, axis=0)
    train_x -= mean_train
    test_x = test_x.reshape(-1, np.prod(data_shape)).astype(np.float32) / 255.
    test_x -= mean_train
    # One-hot encode the labels for the categorical cross-entropy loss.
    train_y = tf.keras.utils.to_categorical(train_y, num_classes=10)
    test_y = tf.keras.utils.to_categorical(test_y, num_classes=10)
    return (train_x, train_y), (test_x, test_y)

Next, we define a function that builds the model from the architectural hyperparameters, followed by the objective function, which wraps model building, training with a given learning rate, and evaluation:

def build_model(inp_size, out_size, n_layers, n_units, p_dropout, activation):
    inp = tf.keras.Input(shape=(inp_size,))
    h = inp
    # Stack n_layers - 1 hidden blocks of dense + dropout, followed by
    # a final dense layer and a softmax over the classes.
    for _ in range(n_layers - 1):
        h = tf.keras.layers.Dense(n_units, activation=activation)(h)
        h = tf.keras.layers.Dropout(rate=p_dropout)(h)
    h = tf.keras.layers.Dense(out_size, activation=None)(h)
    out = tf.keras.layers.Softmax()(h)
    model = tf.keras.models.Model(inputs=inp, outputs=out)
    return model


def objective_fn(**config) -> float:
    (train_x, train_y), (test_x, test_y) = load_mnist()
    model = build_model(train_x.shape[-1], train_y.shape[-1],
                        config["arch"]["n_layers"],
                        config["arch"]["n_units"],
                        config["arch"]["p_dropout"],
                        config["arch"]["activation"])
    opt = tf.keras.optimizers.Adam(learning_rate=config["opt"]["lr"])
    model.compile(optimizer=opt, loss="categorical_crossentropy")
    model.fit(train_x, train_y, batch_size=100, epochs=1)
    # evaluate() returns the test-set loss, i.e. the categorical cross-entropy,
    # which is the score we want to minimise.
    score = model.evaluate(test_x, test_y, batch_size=test_x.shape[0])
    return score
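
Before handing the objective over to an optimiser, it can be helpful to sanity-check it with a single hand-picked configuration. The values below are arbitrary; the nested keyword arguments mirror the keys accessed inside objective_fn:

score = objective_fn(arch={"n_layers": 3,
                           "n_units": 100,
                           "p_dropout": 0.5,
                           "activation": "relu"},
                     opt={"lr": 1e-3})
print(score)  # the test-set cross-entropy of this configuration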

Now that we can build a model, we should define the ranges of possible values for these parameters. This is done by creating a Domain instance, in which sets denote discrete options and two-element lists denote continuous intervals:

domain = ht.Domain({
    "arch": {
        "n_layers": {1, 3, 5},
        "n_units": {10, 50, 100, 500},
        "p_dropout": [0, 0.9999],
        "activation": {"relu", "selu", "elu"}
    },
    "opt": {
        "lr": [1e-9, 1e-2]
    }
})

The Domain plays a central role in Hypertunity and we will make frequent use of it later as well. An important related class is the Sample, which can be thought of as one realisation of the variables from the domain; in our case, one particular configuration of network hyperparameters.
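
To get a feel for what a Sample looks like, we can draw a random realisation from the domain. A minimal sketch, assuming the Domain class exposes a sample() method (as_dict() is the same conversion we will use later to unpack a sample into keyword arguments):

# Draw one random configuration from the domain (assumes Domain.sample() exists).
random_sample = domain.sample()
# Inspect it as a nested dict, e.g.
# {"arch": {"n_layers": 3, "n_units": 50, "p_dropout": 0.42, "activation": "relu"},
#  "opt": {"lr": 0.003}}
print(random_sample.as_dict())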

Using the domain, we can set up the optimiser and the results visualiser, which also serves as an experiment logger. In this case we use BayesianOptimisation and Tensorboard respectively:

optimiser = ht.BayesianOptimisation(domain)
tb_rep = ht_tb.Tensorboard(domain,
                           metrics=["cross-entropy"],
                           logdir="./mnist_mlp",
                           database_path="./mnist_mlp")

After we create the Tensorboard reporter, we will be prompted to run tensorboard --logdir=./mnist_mlp in the console and open Tensorboard in the browser. We can also do this before we launch the actual optimisation.

One last step before running is to define the job schedule and the update loop for the optimiser and the reporter. This ensures that samples are generated, experiments are run, and the results are used to improve the underlying model of the BayesianOptimisation optimiser. To schedule one experiment at a time for 50 consecutive steps, we create a Job for each call of objective_fn with a set of suggested hyperparameters:

n_steps = 50
batch_size = 1
with ht.Scheduler(n_parallel=batch_size) as scheduler:
    for i in range(n_steps):
        # Ask the optimiser for the next batch of configurations to try.
        samples = optimiser.run_step(batch_size=batch_size, minimise=True)
        jobs = [ht.Job(task=objective_fn, args=s.as_dict()) for s in samples]
        scheduler.dispatch(jobs)
        # Gather the finished evaluations and feed them back to the optimiser.
        evaluations = [r.data for r in scheduler.collect(n_results=batch_size, timeout=100.0)]
        optimiser.update(samples, evaluations)
        for sample_evaluation_pair in zip(samples, evaluations):
            tb_rep.log(sample_evaluation_pair)

If we have a look at the Tensorboard dashboard while this is running, we should be able to see results being updated live!

[Image: the Tensorboard dashboard visualising the logged experiment results]
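
As a side note, if parallel scheduling is not needed, the same optimisation loop can be written without a Scheduler by evaluating the samples inline. A minimal sketch of the scheduler-free variant:

for i in range(n_steps):
    samples = optimiser.run_step(batch_size=1, minimise=True)
    # Evaluate each suggested configuration directly in the current process.
    evaluations = [objective_fn(**s.as_dict()) for s in samples]
    optimiser.update(samples, evaluations)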

Even quicker start

The high-level wrapper class Trial allows for seamless parallel optimisation without having to schedule jobs, update the optimiser or log results explicitly. The API is reduced to the minimum, yet it remains flexible, as one can specify any optimiser or reporter:

trial = ht.Trial(objective=objective_fn,
                 domain=domain,
                 optimiser="bo",
                 reporter="tensorboard",
                 logdir="./mnist_mlp",
                 database_path="./mnist_mlp",
                 metrics=["cross-entropy"])

trial.run(n_steps, batch_size=batch_size, n_parallel=batch_size)
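
Since the optimiser and the reporter are selected by name, swapping them is a matter of changing the string arguments. For instance, a hypothetical variant using random search and a plain-table reporter; the string keys "random_search" and "table" below are assumptions, so consult the API reference for the exact values:

trial = ht.Trial(objective=objective_fn,
                 domain=domain,
                 optimiser="random_search",  # assumed key, not verified
                 reporter="table")           # assumed key, not verified

trial.run(n_steps, batch_size=batch_size, n_parallel=batch_size)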