libEnsemble¶
libEnsemble is a complete Python toolkit for steering dynamic ensembles of calculations. Workflows are highly portable and detect/integrate heterogeneous resources with little effort. For instance, libEnsemble can automatically detect, assign, and reassign allocated processors and GPUs to ensemble members.
Users select or supply generator and simulator functions to express their ensembles; the generator typically steers the ensemble based on prior simulator results. Such functions can also launch and monitor external executables at any scale.
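At its simplest, a simulator function receives an array of generated inputs and returns a matching output array. The sketch below is only illustrative (the field names x and y and the squaring are placeholders, not part of any libEnsemble example); the full forces simulator later on this page follows the same calling convention.
import numpy as np

def toy_sim(H, persis_info, sim_specs, libE_info):
    # Output array shaped by the fields declared in sim_specs["out"]
    out = np.zeros(1, dtype=sim_specs["out"])
    # Placeholder computation on the generated input "x"
    out["y"] = H["x"][0] ** 2
    return out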
Installing libEnsemble¶
Begin by loading the python module:
module load python
Create a conda virtual environment:
conda create -n my_environment python=3
Activate your virtual environment:
export PYTHONNOUSERSITE=1
conda activate my_environment
Then either install via pip:
pip install libensemble
or via conda:
conda config --add channels conda-forge
conda install -c conda-forge libensemble
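To confirm the installation, start Python in the activated environment and check that the package imports and reports its version:
import libensemble
print(libensemble.__version__)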
Other installation options are described in Advanced Installation.
Example¶
This example runs an ensemble of simulations whose inputs are selected by random sampling.
Each simulation runs an MPI/OpenMP forces application that uses one GPU per MPI rank and reads the output energy from a file.
First, obtain the forces.c code.
Compile to forces.x:
module load PrgEnv-nvidia cudatoolkit craype-accel-nvidia80
cc -DGPU -Wl,-znoexecstack -O3 -fopenmp -mp=gpu -target-accel=nvidia80 -o forces.x forces.c
Simulation function:
Put the following in a file called forces_simf.py. Or use the latest forces sim_f.
import numpy as np


def run_forces(H, _, sim_specs, libE_info):
    # Parse out num particles and make arguments for forces.x
    particles = str(int(H["x"][0][0]))
    args = particles + " " + str(10) + " " + particles

    # Retrieve our MPI Executor and submit application
    exctr = libE_info["executor"]
    task = exctr.submit(
        app_name="forces",
        app_args=args,
        auto_assign_gpus=True,
        match_procs_to_gpus=True,
    )

    task.wait()
    data = np.loadtxt("forces.stat")
    final_energy = data[-1]

    # Define our output array, populate with energy reading
    output = np.zeros(1, dtype=sim_specs["out"])
    output["energy"] = final_energy

    return output
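If you prefer not to block on task.wait(), the Executor's task object can also be polled. The sketch below would replace the task.wait() line inside run_forces; it follows libEnsemble's Task interface (task.poll(), task.finished, task.state), but check these names against the Executor documentation for your installed version.
import time

# Sketch: poll the task until it finishes instead of calling task.wait()
while not task.finished:
    time.sleep(1)
    task.poll()  # refreshes task.finished and task.state

if task.state != "FINISHED":
    print(f"forces run ended in state {task.state}")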
Put the following in a file called run_libe_forces.py. Or find the latest forces run script.
import os
import sys
import numpy as np
from pprint import pprint

from forces_simf import run_forces  # Sim func from current dir

from libensemble import Ensemble
from libensemble.alloc_funcs.start_only_persistent import only_persistent_gens as alloc_f
from libensemble.executors import MPIExecutor
from libensemble.gen_funcs.persistent_sampling import persistent_uniform as gen_f
from libensemble.specs import AllocSpecs, ExitCriteria, GenSpecs, LibeSpecs, SimSpecs

if __name__ == "__main__":
    # Initialize MPI Executor
    exctr = MPIExecutor()
    sim_app = os.path.join(os.getcwd(), "forces.x")
    exctr.register_app(full_path=sim_app, app_name="forces")

    # Parse number of workers, comms type, etc. from arguments
    ensemble = Ensemble(parse_args=True, executor=exctr)
    nsim_workers = ensemble.nworkers

    # Persistent gen does not need resources
    ensemble.libE_specs = LibeSpecs(
        gen_on_manager=True,
        sim_dirs_make=True,
    )

    ensemble.sim_specs = SimSpecs(
        sim_f=run_forces,
        inputs=["x"],
        outputs=[("energy", float)],
    )

    ensemble.gen_specs = GenSpecs(
        gen_f=gen_f,
        inputs=[],  # No input when starting the persistent generator
        persis_in=["sim_id"],  # Return sim_ids of evaluated points to generator
        outputs=[("x", float, (1,))],
        user={
            "initial_batch_size": nsim_workers,
            "lb": np.array([50000]),  # min particles
            "ub": np.array([100000]),  # max particles
        },
    )

    # Starts one persistent generator. Simulated values are returned in batch.
    ensemble.alloc_specs = AllocSpecs(
        alloc_f=alloc_f,
        user={
            "async_return": False,  # False causes batch returns
        },
    )

    # Instruct libEnsemble to exit after this many simulations
    ensemble.exit_criteria = ExitCriteria(sim_max=8)

    # Seed random streams for each worker, particularly for gen_f
    ensemble.add_random_streams()

    # Run ensemble
    H, persis_info, flag = ensemble.run()

    if ensemble.is_manager:
        pprint(H[["sim_id", "x", "energy"]])
Obtain a node allocation on Perlmutter (try on one or more nodes). For one node:
salloc -N 1 -t 20 -C gpu -q interactive -A <project_id>
And run with:
python run_libe_forces.py --nworkers 4
The four workers will concurrently run forces simulations, each using one MPI rank and one GPU. You may generate as many inputs as you wish at a time; libEnsemble will schedule the simulations inside your node allocation.
The simulation ID, input value and energy for each simulation will be printed.
To see GPU usage, ssh into the node you are on in another window and run:
watch -n 0.1 nvidia-smi
Try running on two nodes to see that each forces simulation uses 2 GPUs (as there are still 4 workers). Similarly, if you run this on 8 nodes, you will see that each run of forces uses 8 GPUs (across 2 nodes).
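If you would rather pin resources explicitly than rely on auto-assignment, recent libEnsemble versions let the Executor's submit call take explicit counts. As a sketch, the exctr.submit(...) call in forces_simf.py could instead look like the following (the values are only illustrative; confirm the keyword names against your version's Executor documentation):
task = exctr.submit(
    app_name="forces",
    app_args=args,
    num_procs=4,  # illustrative: MPI ranks for this run
    num_gpus=4,   # illustrative: GPUs for this run
)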
You can try the forces example (without GPUs) and learn more, including how to run using an input file, in our forces notebook.
Note that these scripts will work with no modification on most platforms, including those with AMD or Intel GPUs, such as Frontier or Aurora.
Dynamic ensembles¶
Most real cases use a dynamic generator that takes back simulation results and uses some model or algorithm to produce new simulation inputs. Find examples that use dynamic generators in the regression tests, or try them online in the APOSMM notebook.
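For orientation, a persistent generator typically loops: send a batch of points, wait for evaluated results, and use them to choose the next batch. The sketch below is modeled on libEnsemble's persistent sampling generator (PersistentSupport, EVAL_GEN_TAG, and send_recv come from the library's persistent-worker tools); a real dynamic generator would replace the random sampling with a model or optimizer driven by the returned calc_in.
import numpy as np

from libensemble.message_numbers import EVAL_GEN_TAG, FINISHED_PERSISTENT_GEN_TAG, PERSIS_STOP, STOP_TAG
from libensemble.tools.persistent_support import PersistentSupport


def sketch_persistent_gen(_, persis_info, gen_specs, libE_info):
    batch = gen_specs["user"]["initial_batch_size"]
    lb, ub = gen_specs["user"]["lb"], gen_specs["user"]["ub"]
    ps = PersistentSupport(libE_info, EVAL_GEN_TAG)

    tag = None
    while tag not in [STOP_TAG, PERSIS_STOP]:
        # Produce a batch of candidate inputs (random here; a dynamic generator
        # would use the returned results in calc_in to steer these choices)
        H_o = np.zeros(batch, dtype=gen_specs["out"])
        H_o["x"] = persis_info["rand_stream"].uniform(lb, ub, (batch, len(lb)))

        # Send the batch to the manager and block until evaluated results return
        tag, Work, calc_in = ps.send_recv(H_o)

    return H_o, persis_info, FINISHED_PERSISTENT_GEN_TAG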
Resources¶
The libEnsemble documentation also has a Perlmutter guide.
See this video for a demonstration workflow that coordinates GPU application runs on Perlmutter.
Find more examples on GitHub.