Changes expected as we migrate services from Cori to Perlmutter
As part of the migration from Cori to Perlmutter, we are preparing to offer a new GitLab runner setup for running CI jobs on Perlmutter. At the same time, we will be modifying and retiring the currently offered services on Cori.
GitLab is a DevOps platform that lets software development teams collaborate by hosting code in a source repository and providing infrastructure for automated builds, integration, and verification of code using Continuous Integration/Continuous Delivery (CI/CD). The GitLab project is open source and actively maintained by GitLab Inc.
NERSC provides a GitLab service for users at https://software.nersc.gov/. You must log in with your NERSC credentials to access this service.
Running CI Pipelines at NERSC
The GitLab server provides shared runners in order to run CI jobs on NERSC resources. Currently we have the following runners:
| Tag (used in CI YAML file) | Runner Name | System | Access |
The cori runner uses the system default Slurm binaries, such as /usr/bin/sbatch, to submit jobs to the cluster.
CI Job Parameters
We use Jacamar CI, a GitLab custom executor for CI/CD jobs on HPC systems. Jacamar integrates with batch schedulers and downscopes permissions so that jobs run under your user account. Jacamar directs CI jobs through the Slurm scheduler. Slurm parameters are defined via the SCHEDULER_PARAMETERS variable, which is used to request a job allocation on a compute node. This variable can be defined in your .gitlab-ci.yml file or as a project CI/CD variable.
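As a rough sketch of the .gitlab-ci.yml option, SCHEDULER_PARAMETERS can be set once at the top level of the file so that every job inherits it, and then overridden inside an individual job. The stage name, job names, and script commands below are hypothetical placeholders; the Slurm flags are the ones used in the example on this page and may need adjusting for your allocation.

```yaml
# Sketch only: stage and job names are placeholders, not NERSC-provided jobs.
stages:
  - test

variables:
  # Default Slurm options inherited by every job in this pipeline.
  SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"

short-job:
  stage: test
  tags: [cori]
  script:
    - hostname

longer-job:
  stage: test
  tags: [cori]
  variables:
    # A job-level value takes precedence over the top-level default.
    SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:10:00"
  script:
    - hostname
```

Under GitLab's documented variable precedence, a project CI/CD variable with the same name should override both values in the file, so it is worth defining SCHEDULER_PARAMETERS in only one place.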
You are responsible for your CI jobs!
Please be careful about what you run automatically in your CI jobs, as they run under your user account. Each GitLab job has access to all your shared filesystems, including your $HOME directory. Sensitive information should not be stored on NERSC systems or displayed in a GitLab job. It is your responsibility to use NERSC systems properly, including this GitLab service. We are not responsible for any loss of data or issues with your user environment as a result of CI jobs.
The GitLab CI configuration is declared in a special file, .gitlab-ci.yml by default, that is typically located in the root of the project. Please review the reference guide for .gitlab-ci.yml. We also encourage you to review the GitLab CI/CD documentation. Make sure you read the documentation for the appropriate GitLab version; you can see the GitLab version by navigating to https://software.nersc.gov/help.
Please see Slurm example jobs for information about job submission parameters. If options are not correctly defined via SCHEDULER_PARAMETERS, your CI job will fail during Slurm allocation. Here is a simple example of how to submit a job to a Cori Haswell node. The tags keyword selects the GitLab runner to use; in this case, tags: [cori] tells GitLab to send the job to the Cori system. The keywords script, before_script, and after_script are sections where you can run arbitrary shell commands. The stages keyword defines a list of stage names used to group GitLab jobs; all jobs within a stage can execute in parallel. The stage keyword is used in the context of a single GitLab job; in this example, the name of the job is cori-haswell.
You can find this example at https://software.nersc.gov/ci-resources/hello-environment.
The GitLab runner will be down when the system is offline, which may result in termination or failure of CI jobs.
stages:
  - examine

cori-haswell:
  stage: examine
  tags: [cori]
  variables:
    SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"
  script:
    - echo "Script"
    - bash ./environment.bash
  before_script:
    - echo "Before Script"
    - pwd
    - ls -la
  after_script:
    - echo "After Script"
    - whoami
    - hostname
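To illustrate how stages group jobs, the following sketch uses two stages. Jobs that share a stage may run in parallel, and by default GitLab starts a stage only after every job in the previous stage has succeeded. The stage names, job names, and echo commands are hypothetical; only the cori tag and the Slurm flags come from the example on this page.

```yaml
# Sketch only: stage names, job names, and commands are placeholders.
stages:
  - build
  - test

variables:
  SCHEDULER_PARAMETERS: "-C haswell --qos=debug -N1 -t 00:05:00"

build-code:
  stage: build
  tags: [cori]
  script:
    - echo "build step"

# The two jobs below share the test stage, so they may run in parallel
# once build-code has finished successfully.
unit-tests:
  stage: test
  tags: [cori]
  script:
    - echo "unit tests"

integration-tests:
  stage: test
  tags: [cori]
  script:
    - echo "integration tests"
```

Each job requests its own Slurm allocation through SCHEDULER_PARAMETERS, so jobs running in parallel wait in the queue independently of one another.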
Increase Job Timeout
By default, a GitLab job times out after 60 minutes; GitLab then terminates the job and marks it as failed. You can increase the job timeout in the project settings by navigating to Settings > CI/CD > General Pipelines and setting the Timeout value in minutes (10m), hours (10h), or days (10d). The maximum time limit is 30 days.
For more details see https://docs.gitlab.com/ee/ci/pipelines/settings.html#set-a-limit-for-how-long-jobs-can-run
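GitLab also documents a per-job timeout keyword for .gitlab-ci.yml. The sketch below assumes that keyword behaves on this server as described in the upstream documentation, which has not been verified here. The job name, the command, and the regular QOS are placeholders chosen for illustration. The Slurm wall time requested with -t in SCHEDULER_PARAMETERS is a separate limit enforced by the scheduler, so both values need to be long enough for the job to finish.

```yaml
# Sketch only: job name, QOS, and durations are assumptions for illustration.
long-running-job:
  tags: [cori]
  # GitLab job timeout, written as a duration.
  timeout: 2h
  variables:
    # Slurm wall time (-t) is enforced independently of the GitLab timeout.
    SCHEDULER_PARAMETERS: "-C haswell --qos=regular -N1 -t 01:30:00"
  script:
    - echo "long-running work goes here"
```

Keeping the GitLab timeout somewhat longer than the Slurm wall time leaves room for time spent waiting in the queue, which is likely to count against the GitLab limit as well.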
To use our GitLab server, you will need to create a Personal Access Token to perform any action, since we have disabled SSH authentication for cloning repositories. To create an access token, navigate to https://software.nersc.gov/-/profile/personal_access_tokens and create a token with a name and an appropriate scope. We recommend enabling the write_repository scope to read and write to repositories; if you plan to use the GitLab API, you may also enable the api scope. Once you create a token, GitLab displays a randomly generated token string. Please save this token; on a Mac, you can use Keychain Access to store it.
| Introduction to CI at NERSC | July 7th, 2021 | Slides |
- CI Tutorial: https://software.nersc.gov/ci-resources