libEnsemble¶
libEnsemble is a complete Python toolkit for steering dynamic ensembles of calculations. Workflows are highly portable and detect/integrate heterogeneous resources with little effort. For instance, libEnsemble can automatically detect, assign, and reassign allocated processors and GPUs to ensemble members.
Users select or supply generator and simulator functions to express their ensembles; the generator typically steers the ensemble based on prior simulator results. Such functions can also launch and monitor external executables at any scale.
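At its simplest, a simulator function receives an array of generated inputs and returns a matching output array. The sketch below is only illustrative (the field names x and y and the squaring are placeholders, not part of any libEnsemble example); the full forces simulator later on this page follows the same calling convention.
import numpy as np

def toy_sim(H, persis_info, sim_specs, libE_info):
    # Output array shaped by the fields declared in sim_specs["out"]
    out = np.zeros(1, dtype=sim_specs["out"])
    # Placeholder computation on the generated input "x"
    out["y"] = H["x"][0] ** 2
    return out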
Installing libEnsemble¶
Begin by loading the python module:
module load python
Create a conda virtual environment:
conda create -n my_environment python=3
Activate your virtual environment:
export PYTHONNOUSERSITE=1
conda activate my_environment
Then either install via pip:
pip install libensemble
or via conda:
conda config --add channels conda-forge
conda install -c conda-forge libensemble
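To confirm the installation, start Python in the activated environment and check that the package imports and reports its version:
import libensemble
print(libensemble.__version__)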
Other installation options are described in Advanced Installation.
Example¶
This example runs an ensemble of simulations whose inputs are selected by random sampling.
Each simulation runs an MPI/OpenMP forces application that uses one GPU per MPI rank and reads the output energy from a file.
First, obtain the forces.c code.
Compile to forces.x:
module load PrgEnv-nvidia cudatoolkit craype-accel-nvidia80
cc -DGPU -Wl,-znoexecstack -O3 -fopenmp -mp=gpu -target-accel=nvidia80 -o forces.x forces.c
Simulation function:
Put the following in a file called forces_simf.py. Or use the latest forces sim_f.
import numpy as np


def run_forces(H, _, sim_specs, libE_info):
    # Parse out num particles and make arguments for forces.x
    particles = str(int(H["x"][0][0]))
    args = particles + " " + str(10) + " " + particles

    # Retrieve our MPI Executor and submit application
    exctr = libE_info["executor"]
    task = exctr.submit(
        app_name="forces",
        app_args=args,
        auto_assign_gpus=True,
        match_procs_to_gpus=True,
    )

    task.wait()
    data = np.loadtxt("forces.stat")
    final_energy = data[-1]

    # Define our output array, populate with energy reading
    output = np.zeros(1, dtype=sim_specs["out"])
    output["energy"] = final_energy

    return output
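If you prefer not to block on task.wait(), the Executor's task object can also be polled. The sketch below would replace the task.wait() line inside run_forces; it follows libEnsemble's Task interface (task.poll(), task.finished, task.state), but check these names against the Executor documentation for your installed version.
import time

# Sketch: poll the task until it finishes instead of calling task.wait()
while not task.finished:
    time.sleep(1)
    task.poll()  # refreshes task.finished and task.state

if task.state != "FINISHED":
    print(f"forces run ended in state {task.state}")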
Put the following in a file called run_libe_forces.py. Or find the latest forces run script.
import os
import sys
import numpy as np
from pprint import pprint

from forces_simf import run_forces  # Sim func from current dir

from libensemble import Ensemble
from libensemble.alloc_funcs.start_only_persistent import only_persistent_gens as alloc_f
from libensemble.executors import MPIExecutor
from libensemble.gen_funcs.persistent_sampling import persistent_uniform as gen_f
from libensemble.specs import AllocSpecs, ExitCriteria, GenSpecs, LibeSpecs, SimSpecs

if __name__ == "__main__":
    # Initialize MPI Executor
    exctr = MPIExecutor()
    sim_app = os.path.join(os.getcwd(), "forces.x")
    exctr.register_app(full_path=sim_app, app_name="forces")

    # Parse number of workers, comms type, etc. from arguments
    ensemble = Ensemble(parse_args=True, executor=exctr)
    nsim_workers = ensemble.nworkers

    # Persistent gen does not need resources
    ensemble.libE_specs = LibeSpecs(
        gen_on_manager=True,
        sim_dirs_make=True,
    )

    ensemble.sim_specs = SimSpecs(
        sim_f=run_forces,
        inputs=["x"],
        outputs=[("energy", float)],
    )

    ensemble.gen_specs = GenSpecs(
        gen_f=gen_f,
        inputs=[],  # No input when starting the persistent generator
        persis_in=["sim_id"],  # Return sim_ids of evaluated points to generator
        outputs=[("x", float, (1,))],
        user={
            "initial_batch_size": nsim_workers,
            "lb": np.array([50000]),  # min particles
            "ub": np.array([100000]),  # max particles
        },
    )

    # Starts one persistent generator. Simulated values are returned in batch.
    ensemble.alloc_specs = AllocSpecs(
        alloc_f=alloc_f,
        user={
            "async_return": False,  # False causes batch returns
        },
    )

    # Instruct libEnsemble to exit after this many simulations
    ensemble.exit_criteria = ExitCriteria(sim_max=8)

    # Seed random streams for each worker, particularly for gen_f
    ensemble.add_random_streams()

    # Run ensemble
    H, persis_info, flag = ensemble.run()

    if ensemble.is_manager:
        pprint(H[["sim_id", "x", "energy"]])
Obtain a node allocation on Perlmutter (try on one or more nodes). For one node:
salloc -N 1 -t 20 -C gpu -q interactive -A <project_id>
And run with:
python run_libe_forces.py --nworkers 4
The four workers will concurrently run forces simulations, each using one MPI rank and one GPU. You may generate as many inputs as you wish at a time; libEnsemble will schedule the simulations inside your node allocation.
The simulation ID, input value and energy for each simulation will be printed.
To see GPU usage, ssh into the node you are on in another window and run:
watch -n 0.1 nvidia-smi
Try running on two nodes to see that each forces simulation uses 2 GPUs (as there are still 4 workers). Similarly, if you run this on 8 nodes, you will see that each run of forces uses 8 GPUs (across 2 nodes).
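If you would rather pin resources explicitly than rely on auto-assignment, recent libEnsemble versions let the Executor's submit call take explicit counts. As a sketch, the exctr.submit(...) call in forces_simf.py could instead look like the following (the values are only illustrative; confirm the keyword names against your version's Executor documentation):
task = exctr.submit(
    app_name="forces",
    app_args=args,
    num_procs=4,  # illustrative: MPI ranks for this run
    num_gpus=4,   # illustrative: GPUs for this run
)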
You can try the forces example (without GPUs) and learn more, including how to run using an input file, in our forces notebook.
Note that these scripts will work with no modification on most platforms, including those with AMD or Intel GPUs, such as Frontier or Aurora.
Dynamic ensembles¶
Most real cases use a dynamic generator that takes back simulation results and uses some model or algorithm to produce new simulation inputs. Find examples that use dynamic generators in the regression tests, or try them online in the APOSMM notebook.
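For orientation, a persistent generator typically loops: send a batch of points, wait for evaluated results, and use them to choose the next batch. The sketch below is modeled on libEnsemble's persistent sampling generator (PersistentSupport, EVAL_GEN_TAG, and send_recv come from the library's persistent-worker tools); a real dynamic generator would replace the random sampling with a model or optimizer driven by the returned calc_in.
import numpy as np

from libensemble.message_numbers import EVAL_GEN_TAG, FINISHED_PERSISTENT_GEN_TAG, PERSIS_STOP, STOP_TAG
from libensemble.tools.persistent_support import PersistentSupport


def sketch_persistent_gen(_, persis_info, gen_specs, libE_info):
    batch = gen_specs["user"]["initial_batch_size"]
    lb, ub = gen_specs["user"]["lb"], gen_specs["user"]["ub"]
    ps = PersistentSupport(libE_info, EVAL_GEN_TAG)

    tag = None
    while tag not in [STOP_TAG, PERSIS_STOP]:
        # Produce a batch of candidate inputs (random here; a dynamic generator
        # would use the returned results in calc_in to steer these choices)
        H_o = np.zeros(batch, dtype=gen_specs["out"])
        H_o["x"] = persis_info["rand_stream"].uniform(lb, ub, (batch, len(lb)))

        # Send the batch to the manager and block until evaluated results return
        tag, Work, calc_in = ps.send_recv(H_o)

    return H_o, persis_info, FINISHED_PERSISTENT_GEN_TAG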
Resources¶
The libEnsemble documentation also has a Perlmutter guide.
See this video for a demonstration workflow that coordinates GPU application runs on Perlmutter.
Find more examples on GitHub.