Hyperparameter optimization

Hyperparameter optimization (HPO) is the process of tuning the hyperparameters of your machine learning model, e.g., the learning rate or filter sizes. Popular HPO algorithms include grid search, random search, Bayesian optimization, and genetic optimization. Several libraries and tools implement these algorithms, each with its own tradeoffs in usability, flexibility, and feature support.
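To make the difference between grid search and random search concrete, here is a minimal standard-library sketch. The `validation_loss` function is a hypothetical stand-in for training a model and measuring its validation loss, and the search ranges are illustrative assumptions rather than recommendations.

```python
import itertools
import math
import random


def validation_loss(lr, batch_size):
    """Hypothetical stand-in for 'train a model, return its validation loss'.

    The minimum is placed at lr=1e-3, batch_size=64 purely for illustration.
    """
    return (math.log10(lr) + 3.0) ** 2 + 0.01 * abs(batch_size - 64)


def grid_search(lrs, batch_sizes):
    """Evaluate every combination of the listed values and return the best trial."""
    trials = [
        ((lr, bs), validation_loss(lr, bs))
        for lr, bs in itertools.product(lrs, batch_sizes)
    ]
    return min(trials, key=lambda trial: trial[1])


def random_search(n_trials, seed=0):
    """Sample configurations at random and return the best trial."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-5, -1)        # log-uniform learning rate
        bs = rng.choice([16, 32, 64, 128])    # categorical batch size
        trials.append(((lr, bs), validation_loss(lr, bs)))
    return min(trials, key=lambda trial: trial[1])


best_grid = grid_search([1e-4, 1e-3, 1e-2], [32, 64, 128])
best_random = random_search(n_trials=20)
print("grid search best:", best_grid)
print("random search best:", best_random)
```

Each trial here is independent of the others, which is why both strategies parallelize easily across GPUs or nodes. Bayesian and genetic methods instead use earlier results to choose later trials.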

This page collects recommendations and examples for running distributed HPO tasks on our HPC systems.

Weights and Biases

Weights and Biases (W&B) is a tool for experiment logging and visualization that also supports HPO. The W&B website has documentation and examples: https://wandb.ai/

Additionally, we provide a PyTorch codebase that can serve as a template for logging and HPO with W&B for your deep learning applications (including multi-GPU distributed data parallel applications). See the template here: W&B template for NERSC
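As a rough illustration of what HPO with W&B can look like, the sketch below defines a sweep configuration and the functions that would launch it. This is not taken from the NERSC template: the project name `hpo-demo`, the metric name `val_loss`, the parameter ranges, and the placeholder `fake_validation_loss` are all assumptions made for this example. The `wandb` import is deferred so that the configuration can be inspected without the package installed; running `launch()` needs `wandb`, a W&B login, and network access.

```python
import math

# Sweep definition: search strategy, target metric, and search space.
# All names and ranges below are illustrative assumptions.
sweep_config = {
    "method": "bayes",  # "grid" and "random" are the other built-in methods
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-1},
        "batch_size": {"values": [32, 64, 128]},
    },
}


def fake_validation_loss(lr, batch_size):
    """Hypothetical placeholder for a real training and validation loop."""
    return (math.log10(lr) + 3.0) ** 2 + 0.01 * abs(batch_size - 64)


def train():
    """One trial: read the hyperparameters chosen by the sweep and log the metric."""
    import wandb  # deferred import; requires the wandb package

    with wandb.init() as run:
        loss = fake_validation_loss(run.config["lr"], run.config["batch_size"])
        run.log({"val_loss": loss})


def launch(project="hpo-demo", count=10):
    """Register the sweep and run `count` trials in this process."""
    import wandb  # deferred import; requires the wandb package

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train, count=count)
```

Several agents can be started against the same sweep ID, for example one per GPU, and each will request its next configuration from the W&B server.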

KerasTuner

KerasTuner is an easy-to-use HPO tool for Keras models: https://keras.io/keras_tuner/

RayTune

RayTune is an open-source Python library for experiment execution and hyperparameter tuning at any scale. RayTune:

supports any ML framework
implements state-of-the-art HPO strategies
natively integrates with optimization libraries (HyperOpt, BayesianOpt, and Facebook Ax)
integrates well with Slurm
handles trial micro scheduling on multi-GPU-node resources (no GPU binding boilerplate needed)

We provide RayTune in all of our GPU TensorFlow and PyTorch modules and Shifter images. You can also use our slurm-ray-cluster scripts for running multi-GPU node HPO campaigns, and the repo includes a "hello world" MNIST example.
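For orientation, a minimal RayTune run might look like the sketch below. It assumes a Ray 2.x installation, where `tune.Tuner` is available (older releases used `tune.run` instead), and it is not the MNIST example from the repo. The `objective` function, the metric name `loss`, and the search ranges are hypothetical. The `ray` import is deferred so the objective can be checked on its own.

```python
import math


def objective(config):
    """Hypothetical trainable: a stand-in for training a model with `config`.

    Returning a dict reports it to Tune as the trial's final result.
    """
    loss = (math.log10(config["lr"]) + 3.0) ** 2 + 0.01 * abs(config["batch_size"] - 64)
    return {"loss": loss}


def run_tuning(num_samples=8, gpus_per_trial=0):
    """Run `num_samples` trials and return the best configuration found."""
    from ray import tune  # deferred import; requires Ray 2.x with Tune

    trainable = objective
    if gpus_per_trial:
        # Ask Tune to reserve GPUs per trial; Tune handles placing trials
        # on the available devices.
        trainable = tune.with_resources(objective, {"gpu": gpus_per_trial})

    tuner = tune.Tuner(
        trainable,
        param_space={
            "lr": tune.loguniform(1e-5, 1e-1),
            "batch_size": tune.choice([32, 64, 128]),
        },
        tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=num_samples),
    )
    results = tuner.fit()
    return results.get_best_result().config
```

Calling `run_tuning(gpus_per_trial=1)` on a GPU node would schedule one trial per GPU. With no search algorithm specified, Tune samples the search space at random.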

HYPPO

HYPPO is an HPO tool developed by researchers at LBNL and tested on NERSC systems: https://hpo-uq.gitlab.io/

DeepHyper

DeepHyper is a Python package for distributed hyperparameter optimization, neural architecture search, and uncertainty quantification. It can distribute computation through several backends, such as threads, processes, Ray, and MPI.