# Hyperparameter optimization
Hyperparameter optimization (HPO) is the process of tuning the hyperparameters of your machine learning model, e.g., the learning rate, filter sizes, etc. Several popular algorithms are used for HPO, including grid search, random search, Bayesian optimization, and genetic optimization. Likewise, several libraries and tools implement these algorithms, each with its own tradeoffs in usability, flexibility, and feature support.
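To make these algorithms concrete, here is a minimal random-search sketch in plain Python; the `evaluate` function and its two hyperparameters are hypothetical stand-ins for training your model and returning a validation metric:

```python
import random

# Hypothetical objective: stands in for training a model with the given
# hyperparameters and returning a validation loss.
def evaluate(learning_rate, batch_size):
    return (learning_rate - 0.01) ** 2 + 0.001 * batch_size

# Each entry samples one hyperparameter value.
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),   # log-uniform
    "batch_size": lambda: random.choice([32, 64, 128, 256]),
}

best_score, best_config = float("inf"), None
for _ in range(50):  # 50 random trials
    config = {name: sample() for name, sample in search_space.items()}
    score = evaluate(**config)
    if score < best_score:
        best_score, best_config = score, config

print(best_config, best_score)
```

Grid search would instead enumerate a fixed Cartesian product of values, while Bayesian optimization uses the scores of past trials to choose the next configuration; the libraries below implement these strategies for you.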
On this page we will collect recommendations and examples for running distributed HPO tasks on our HPC systems.
## Weights and Biases
W&B is a great tool for experiment logging and visualization, in addition to HPO. The W&B webpage has documentation and examples: https://wandb.ai/
Additionally, we provide a PyTorch codebase that can serve as a template for logging and HPO with W&B for your deep learning applications (including multi-GPU distributed data parallel applications). See the template here: W&B template for NERSC
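As a rough illustration of the W&B sweeps workflow (not taken from our template), here is a minimal sketch; the project name, search space, and placeholder `train` function are assumptions for illustration:

```python
import wandb

# Sweep configuration: random search over a toy search space.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [32, 64, 128]},
    },
}

def train():
    # wandb.init() pulls the sampled hyperparameters into run.config
    with wandb.init() as run:
        # ... build and train your model with run.config here ...
        val_loss = (run.config.learning_rate - 0.01) ** 2  # placeholder metric
        run.log({"val_loss": val_loss})

sweep_id = wandb.sweep(sweep_config, project="my-hpo-project")  # hypothetical project name
wandb.agent(sweep_id, function=train, count=10)  # run 10 trials in this process
```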
## KerasTuner
An easy-to-use tool if you're using Keras: https://keras.io/keras_tuner/
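For a flavor of the API, here is a minimal sketch that tunes a dense network on MNIST; the layer sizes, learning rates, and trial counts are arbitrary illustrative choices:

```python
import keras
import keras_tuner

# Load and flatten MNIST for a quick demo.
(x_train, y_train), (x_val, y_val) = keras.datasets.mnist.load_data()
x_train, x_val = x_train.reshape(-1, 784) / 255.0, x_val.reshape(-1, 784) / 255.0

# KerasTuner calls this with a HyperParameters object and samples
# the values declared inside it for each trial.
def build_model(hp):
    model = keras.Sequential([
        keras.layers.Dense(hp.Int("units", min_value=32, max_value=512, step=32),
                           activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = keras_tuner.RandomSearch(build_model, objective="val_accuracy", max_trials=5)
tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))
best_model = tuner.get_best_models(num_models=1)[0]
```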
## RayTune
RayTune is an open-source Python library for experiment execution and hyperparameter tuning at any scale. RayTune:

- supports any ML framework
- implements state-of-the-art HPO strategies
- natively integrates with optimization libraries (HyperOpt, BayesianOpt, and Facebook Ax)
- integrates well with Slurm
- handles trial micro-scheduling on multi-GPU-node resources (no GPU-binding boilerplate needed)
We provide RayTune in all of our GPU TensorFlow and PyTorch modules and Shifter images. You can also use our slurm-ray-cluster scripts for running multi-GPU-node HPO campaigns; the repo includes a "hello world" MNIST example.
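As a minimal sketch of the Ray 2.x `Tuner` API, with a toy objective standing in for your training function:

```python
from ray import tune

# Trainable: receives a sampled config and returns the metric(s) to optimize.
# For a real GPU training function, wrap it with
# tune.with_resources(objective, {"gpu": 1}) so each trial gets one GPU.
def objective(config):
    return {"loss": (config["x"] - 3) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(-10.0, 10.0)},
    tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)  # best sampled x, near 3
```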
## HYPPO
A newer tool developed at LBNL and tested on NERSC systems: https://hpo-uq.gitlab.io/
## DeepHyper
DeepHyper is a Python package for distributed Hyperparameter Optimization, Neural Architecture Search, and Uncertainty Quantification. It can interface with different backends, such as threads, processes, Ray, and MPI, to distribute computation.
In case of issues, contact Prasanna Balaprakash (pbalapra[at]anl[dot]gov) or open an issue directly on the GitHub repository.
A quick example of the DeepHyper API:
```python
def run(config: dict):
    return -config["x"]**2


# Necessary IF statement, otherwise it will enter an infinite loop
# when loading the 'run' function from a subprocess
if __name__ == "__main__":
    from deephyper.problem import HpProblem
    from deephyper.search.hps import CBO
    from deephyper.evaluator import Evaluator

    # define the variable you want to optimize
    problem = HpProblem()
    problem.add_hyperparameter((-10.0, 10.0), "x")

    # define the evaluator to distribute the computation
    evaluator = Evaluator.create(
        run,
        method="process",
        method_kwargs={
            "num_workers": 2,
        },
    )

    # define your search and execute it
    search = CBO(problem, evaluator)
    results = search.search(max_evals=100)
```
which outputs a Pandas DataFrame where the best `x` is clearly near 0:
```
          p:x  job_id     objective  timestamp_submit  timestamp_gather
0   -7.744105       1 -5.997117e+01          0.011047          0.037649
1   -9.058254       2 -8.205196e+01          0.011054          0.056398
2   -1.959750       3 -3.840621e+00          0.049750          0.073166
3   -5.150553       4 -2.652819e+01          0.065681          0.089355
4   -6.697095       5 -4.485108e+01          0.082465          0.158050
..        ...     ...           ...               ...               ...
95  -0.034096      96 -1.162566e-03         26.479630         26.795639
96  -0.034204      97 -1.169901e-03         26.789255         27.155481
97  -0.037873      98 -1.434366e-03         27.148506         27.466934
98  -0.000073      99 -5.387088e-09         27.460253         27.774704
99   0.697162     100 -4.860350e-01         27.768153         28.142431
```