
Hyperparameter optimization

Hyperparameter optimization (HPO) is the process of tuning the hyperparameters of a machine learning model, such as the learning rate or the convolutional filter sizes. Popular HPO algorithms include grid search, random search, Bayesian optimization, and genetic optimization. Several libraries and tools implement these algorithms, each with its own tradeoffs in usability, flexibility, and feature support.
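As a minimal, library-free sketch of the difference between the first two strategies, the code below runs grid search and random search with the same budget of 9 evaluations. The `objective` function, its two hyperparameters (`lr`, `n_filters`), and their ranges are hypothetical stand-ins for "train a model and return a validation score"; in a real HPO campaign each evaluation would be a training job.

```python
import itertools
import math
import random


def objective(lr, n_filters):
    # Hypothetical stand-in for training a model and returning a validation
    # score. Higher is better; the peak is at lr = 0.01, n_filters = 64.
    return -((lr - 0.01) ** 2) * 1e4 - ((n_filters - 64) / 64) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid; return (best_score, best_config)."""
    best = (-math.inf, None)
    for lr, n_filters in itertools.product(grid["lr"], grid["n_filters"]):
        score = objective(lr, n_filters)
        if score > best[0]:
            best = (score, {"lr": lr, "n_filters": n_filters})
    return best


def random_search(n_trials, seed=0):
    """Sample n_trials random configurations; return (best_score, best_config)."""
    rng = random.Random(seed)
    best = (-math.inf, None)
    for _ in range(n_trials):
        # lr is sampled log-uniformly in [1e-4, 1e-1]; n_filters from a fixed set.
        config = {"lr": 10 ** rng.uniform(-4, -1),
                  "n_filters": rng.choice([16, 32, 64, 128])}
        score = objective(**config)
        if score > best[0]:
            best = (score, config)
    return best


print(grid_search({"lr": [0.001, 0.01, 0.1], "n_filters": [32, 64, 128]}))
print(random_search(n_trials=9))
```

Grid search only ever tries 3 values per axis, while random search draws a fresh value of every hyperparameter on each trial, which tends to help when only a few hyperparameters really matter. Bayesian and genetic optimization go further by using earlier results to choose the next trials.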

On this page we will collect recommendations and examples for running distributed HPO tasks on our HPC systems.

Weights and Biases

Weights and Biases (W&B) is a great tool for HPO as well as for experiment logging and visualization. The W&B website has documentation and examples:

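For HPO, W&B uses "sweeps": you describe the search strategy and search space in a configuration, and one or more agents run the trials. The sketch below assumes the `wandb` package is installed and that you are logged in; the project name `hpo-demo`, the metric `val_loss`, the hyperparameter names, and the `toy_val_loss` function are hypothetical placeholders for your own training code.

```python
import math

# Sweep definition in the W&B sweep-configuration format.
# Metric and parameter names are hypothetical placeholders.
sweep_config = {
    "method": "random",  # W&B also supports "grid" and "bayes"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [32, 64, 128]},
    },
}


def toy_val_loss(learning_rate, batch_size):
    # Hypothetical stand-in for training a model and measuring validation loss.
    return (math.log10(learning_rate) + 2.0) ** 2 + batch_size / 1000.0


def train():
    import wandb  # imported lazily so this file loads without wandb installed

    run = wandb.init()  # the sweep agent fills run.config with sampled values
    loss = toy_val_loss(run.config.learning_rate, run.config.batch_size)
    wandb.log({"val_loss": loss})
    run.finish()


def launch_sweep(project="hpo-demo", count=20):
    import wandb

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train, count=count)
```

Calling `launch_sweep()` creates the sweep and runs 20 trials in the current process. Additional agents started with the same sweep ID (for example in other jobs or on other nodes) pull trials from the same sweep, which is how the work can be spread across an allocation.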

An easy-to-use tool if you're using Keras:


Ray Tune is an open-source Python library for experiment execution and hyperparameter tuning at any scale. Ray Tune:

  • supports any ML framework
  • implements state-of-the-art HPO strategies
  • natively integrates with optimization libraries (HyperOpt, BayesianOpt, and Facebook Ax)
  • integrates well with Slurm
  • handles trial micro-scheduling on multi-GPU-node resources (no GPU binding boilerplate needed)

We provide Ray Tune in all of our GPU TensorFlow and PyTorch modules and Shifter images. You can also use our slurm-ray-cluster scripts to run multi-GPU-node HPO campaigns; the repo also includes a "hello world" MNIST example.
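A minimal Ray Tune sketch, modeled on the `Tuner` workflow from the Ray Tune documentation, tunes the same toy objective as the DeepHyper example further down this page. It assumes a Ray 2.x installation (for example from one of the modules above); the names `objective` and `run_tuning`, the search range, and the 20 random samples are illustrative choices. Passing `address="auto"` is assumed to attach to a Ray cluster that is already running, such as one started by the slurm-ray-cluster scripts.

```python
def objective(config):
    # Toy objective, maximized at x = 0. A real trainable would train a model
    # with the sampled hyperparameters and return a validation metric.
    return {"score": -config["x"] ** 2}


def run_tuning(num_samples=20, address=None):
    # Ray is imported here so the objective can be inspected without Ray installed.
    import ray
    from ray import tune

    if address is not None:
        # e.g. address="auto" to attach to an already running Ray cluster
        ray.init(address=address)

    tuner = tune.Tuner(
        objective,
        param_space={"x": tune.uniform(-10.0, 10.0)},  # random search over x
        tune_config=tune.TuneConfig(metric="score", mode="max",
                                    num_samples=num_samples),
    )
    results = tuner.fit()
    # get_best_result() uses the metric and mode set in TuneConfig.
    return results.get_best_result().config
```

Calling `run_tuning()` returns the best configuration found, a dict whose `x` should be close to 0. Ray Tune schedules the trials on the available CPUs and GPUs; per-trial resource requests (for example one GPU per trial) can be attached to the trainable with `tune.with_resources`.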


A new tool built by LBNL researchers and tested on NERSC systems:

Cray HPO

Cray provides an HPO library that integrates very naturally with Cray systems. It can use Slurm to request and use an allocation, and it provides genetic search, random search, grid search, and population-based training.
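Independent of the Cray HPO API (see its documentation), the idea behind genetic search can be shown with a short, generic sketch. This is not Cray HPO's implementation: it is a plain genetic algorithm with truncation selection, blend crossover, and Gaussian mutation over a single hypothetical hyperparameter `x`, and the population size, generation count, and mutation scale are made-up values.

```python
import random


def objective(x):
    # Toy objective, maximized at x = 0 (stand-in for a validation score).
    return -x * x


def genetic_search(pop_size=20, generations=30, bounds=(-10.0, 10.0),
                   mutation_scale=1.0, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Initial population: random candidates within the bounds.
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by score and keep the best half as parents.
        pop.sort(key=objective, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = 0.5 * (a + b)                     # crossover: blend two parents
            child += rng.gauss(0.0, mutation_scale)   # mutation: Gaussian noise
            children.append(min(hi, max(lo, child)))  # clip to the search bounds
        pop = parents + children
    return max(pop, key=objective)


print(genetic_search())
```

Because the best half of each generation survives unchanged, the best score never gets worse from one generation to the next. The evaluations within a generation are independent of each other, which is what allows an HPO library to run them in parallel across an allocation.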

The official Cray HPO documentation can be found here:

You can load the latest version on Cori with:

module load cray-hpo

You can find an example Jupyter notebook for genetic search here:


DeepHyper is a Python package for distributed hyperparameter optimization, neural architecture search, and uncertainty quantification. It can interface with different backends to distribute computation, such as threads, processes, Ray, and MPI.

If you run into an issue, contact Prasanna Balaprakash (pbalapra[at]anl[dot]gov) or open an issue directly on the DeepHyper GitHub repository.

A quick example of the DeepHyper API:

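def run(config: dict):
    # Objective to maximize: -x**2 peaks at x = 0
    return -config["x"] ** 2

# The __main__ guard is necessary: without it, the script would enter an
# infinite loop when a subprocess loads the 'run' function
if __name__ == "__main__":
    # Import paths follow the DeepHyper 0.4-era API; newer releases may have
    # moved these modules
    from deephyper.problem import HpProblem
    from deephyper.search.hps import CBO
    from deephyper.evaluator import Evaluator

    # define the variable you want to optimize
    problem = HpProblem()
    problem.add_hyperparameter((-10.0, 10.0), "x")

    # define the evaluator to distribute the computation
    # (here: 2 local worker processes)
    evaluator = Evaluator.create(
        run,
        method="process",
        method_kwargs={
            "num_workers": 2,
        },
    )

    # define your search and execute it
    search = CBO(problem, evaluator)

    results = search.search(max_evals=100)
    print(results)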

This outputs a pandas DataFrame in which the best x is clearly near 0:

         p:x  job_id     objective  timestamp_submit  timestamp_gather
0  -7.744105       1 -5.997117e+01          0.011047          0.037649
1  -9.058254       2 -8.205196e+01          0.011054          0.056398
2  -1.959750       3 -3.840621e+00          0.049750          0.073166
3  -5.150553       4 -2.652819e+01          0.065681          0.089355
4  -6.697095       5 -4.485108e+01          0.082465          0.158050
..       ...     ...           ...               ...               ...
95 -0.034096      96 -1.162566e-03         26.479630         26.795639
96 -0.034204      97 -1.169901e-03         26.789255         27.155481
97 -0.037873      98 -1.434366e-03         27.148506         27.466934
98 -0.000073      99 -5.387088e-09         27.460253         27.774704
99  0.697162     100 -4.860350e-01         27.768153         28.142431
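To pull the best configuration out of that DataFrame, one convenient pandas idiom is to take the row with the largest `objective`, since the search maximizes the value returned by `run`. The sketch below rebuilds a few rows of the output shown above so that it runs on its own; with a real search, apply the same `idxmax` line to the `results` DataFrame directly.

```python
import pandas as pd

# A few rows copied from the output above; in practice use the DataFrame
# returned by search.search(...).
results = pd.DataFrame({
    "p:x": [-7.744105, -1.959750, -0.034096, -0.000073, 0.697162],
    "job_id": [1, 3, 96, 99, 100],
    "objective": [-5.997117e01, -3.840621e00, -1.162566e-03,
                  -5.387088e-09, -4.860350e-01],
})

# The search maximizes the objective, so the best trial has the largest value.
best = results.loc[results["objective"].idxmax()]
print(f"best x = {best['p:x']} (job {int(best['job_id'])})")
```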