Machine Learning at NERSC

NERSC supports a variety of software for Machine Learning and Deep Learning on our systems.

These docs cover our system-optimized frameworks, deploying large language models, multi-node training libraries, and performance guidelines.

Classical Machine Learning

scikit-learn and other non-deep-learning libraries are supported through our standard installations and environments for Python and R; see the Analytics pages for more details.

Deep Learning Frameworks

We have prioritized support for the most widely used Deep Learning frameworks; see the framework-specific pages for details on our optimized installations.

Deploying with Jupyter

Users can deploy distributed deep learning workloads from Jupyter notebooks using parallel execution libraries such as IPyParallel. Jupyter notebooks can be used to submit workloads to the batch system and also provide powerful interactive capabilities for monitoring and controlling those workloads.
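The sketch below illustrates the general IPyParallel pattern of starting engines from a notebook and running a function on each of them. It is a minimal, generic example, not a NERSC-specific recipe: the engine count and the placeholder training function are illustrative assumptions.

```python
# Minimal sketch: launch an IPyParallel cluster from a notebook cell
# and run a function on every engine (ipyparallel >= 7 API).
import ipyparallel as ipp

cluster = ipp.Cluster(n=4)              # request 4 engines (illustrative)
rc = cluster.start_and_connect_sync()   # start engines and return a Client

def train_shard(rank):
    # placeholder for per-worker training logic
    return f"worker {rank} finished"

# Map the function over all engines and gather the results.
results = rc[:].map_sync(train_shard, range(len(rc)))
print(results)
```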

Science use-cases

Machine Learning and Deep Learning are increasingly used to analyze scientific data across diverse fields. The science use-cases page gathers examples of ongoing work at NERSC, including code, datasets, and instructions for running them on our systems.

Deploying Large Language Models on NERSC

The NERSC Chatbot Deployment package enables users to deploy Hugging Face large language models (LLMs) on NERSC supercomputers using Slurm and the vLLM serving framework. It supports both command-line and Python library interfaces, with utilities for seamless Gradio integration on NERSC JupyterHub. The deployed models expose an OpenAI-compatible API endpoint, facilitating easy integration with existing tools while enforcing secure API key access. For detailed usage and best practices, see the documentation.
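Because the deployed model exposes an OpenAI-compatible endpoint, it can be queried with the standard openai Python client. The sketch below shows the general pattern; the server URL, API key, and model name are illustrative assumptions, not actual NERSC values.

```python
# Minimal sketch: querying a vLLM-served model through its
# OpenAI-compatible API endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://hostname:8000/v1",  # hypothetical server address
    api_key="YOUR_API_KEY",              # key issued by the deployment
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example Hugging Face model
    messages=[{"role": "user", "content": "Summarize what NERSC is."}],
)
print(response.choices[0].message.content)
```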