FAQ and Troubleshooting

The shifter --help command can be very useful.

Multi-arch builds

Users who build on non-x86 hardware may see an error like this:

shifter: /bin/bash: Exec format error

To fix this, users can try a multi-arch build. Here is an example:

docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64/v8 --push -t elvis/image:latest .

This example both builds the cross-platform image and pushes it to the registry. To verify that the build worked as intended, the user can check the image metadata in the registry (for example, Docker Hub) to confirm that the image architectures are correct.
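
To check from the command line instead, one can inspect the pushed manifest list, which should report both platforms (a quick sketch reusing the example tag above):

docker buildx imagetools inspect elvis/image:latest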

Using Intel compilers

Intel compilers previously required access to a license server. Fortunately for container users, this is no longer the case. Users who wish to use the Intel compilers can pull and build using one of Intel's oneAPI HPC Toolkit images. These images come with compilers such as ifort. Users can use them as a base image and compile their own application on top.
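
As a minimal sketch (the image tag and file names are placeholders, and it assumes the compilers are on PATH in the image), a Dockerfile along these lines uses a oneAPI HPC Toolkit image as a base and compiles a Fortran application with ifort:

FROM intel/oneapi-hpckit:latest
# Copy in the application source (placeholder name) and compile it with ifort,
# assuming the toolkit image puts the compilers on PATH.
COPY main.f90 /app/main.f90
WORKDIR /app
RUN ifort -o myapp main.f90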

Failed to lookup Image

Warning

If you are trying to start many tasks at the same time with Shifter, this can create congestion on the image gateway.

If all the processes will use the same image, then you can avoid this by specifying the image in the batch submit script instead of on the command-line.

For example:

#SBATCH --image=myimage:latest

shifter /path/to/app arg1 arg2

Using this format, the image will be looked up at submission time and cached as part of the job.
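
Putting this together, a complete batch script might look like the following sketch (the <arch>, <num_nodes>, and <time> values are placeholders to fill in):

#!/bin/bash
#SBATCH -C <arch>
#SBATCH -N <num_nodes>
#SBATCH -t <time>
#SBATCH --image=myimage:latest

# srun launches the containerized application across the allocated nodes
srun shifter /path/to/app arg1 arg2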

If your job needs to use multiple images during execution, then the approach above will not be sufficient. A workaround is to specify the image by its ID, which avoids the lookup. Specify the image as id: followed by the ID, which can be obtained with shifterimg lookup. Do the lookup in advance so that it does not occur during the job.

# Done in advance...
user:~> shifterimg lookup centos:8
76d24f3ba3317fa945743bb3746fbaf3a0b752f10b10376960de01da70685fbd
# In the job...
shifter --image=id:76d24f3ba3317fa945743bb3746fbaf3a0b752f10b10376960de01da70685fbd /bin/hostname

Invalid Volume Map

Sometimes volume mounting a directory will fail with an "invalid volume map" message or with this error:

ERROR: unclean exit from bind-mount routine. /var/udiMount/tmp may still be mounted.
BIND MOUNT FAILED from /var/udiMount/<full path to directory> to /var/udiMount/tmp
FAILED to setup user-requested mounts.
FAILED to setup image.

This can happen for different reasons, but a common cause is the permissions of the directory being mounted. Consider this example:

shifter --volume /global/cfs/cdirs/myproj/a/b --image=myimage bash

In order for Shifter to allow the mount, it must be able to traverse the path as user nobody. The easiest way to fix this is to use setfacl to grant limited access to each directory leading up to the final one. For example:

setfacl -m u:nobody:x /global/cfs/cdirs/myproj/
setfacl -m u:nobody:x /global/cfs/cdirs/myproj/a

Note that only the owner of a directory can change the access controls, so you may need the project owner to fix some path elements.
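
To confirm the change, getfacl can list the ACLs on each directory; a user:nobody:--x entry should appear in the output (a quick check, not part of the original example):

getfacl /global/cfs/cdirs/myproj/
getfacl /global/cfs/cdirs/myproj/a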

GLIBC_2.25 not found

This error will typically contain the following line, but other variations may appear.

/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by /opt/udiImage/modules/mpich/mpich-7.7.19/lib64/dep/libexpat.so.1)

By default, Shifter automatically injects libraries to provide MPI and GPU support (where applicable). This can sometimes conflict with the contents of the image if the image uses an older version of GLIBC. If the application doesn't require MPI support, you can try adding the flag --module=none to disable the injection.

elvis@nid00042:~> shifter --image=elvis/test:123 --module=none /bin/bash

If your application requires MPI support, you may need to rebuild your image on top of a newer OS.
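
To see which GLIBC version an image provides, one can query it directly inside the container (a quick diagnostic, reusing the example image name above; it assumes the image contains ldd):

shifter --image=elvis/test:123 ldd --version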

Issues with MPI: No space left on device

This can show up as an error like the following.

create_endpoint(1361).......: OFI EP enable failed (ofi_init.c:1361:create_endpoint:No space left on device)

Try adding the flag --network=no_vni to the srun command.

srun --network=no_vni shifter myapp

Loading the Cray PrgEnv programming environment into Shifter

Launching executables which depend on shared libraries that live on parallel file systems can be problematic when running at scale, e.g., at high MPI concurrency. Attempts to srun these kinds of applications often encounter Slurm timeouts, since each task in the job is attempting to dlopen() the same set of ~20 shared libraries on the parallel file system, which can take a very long time. These timeouts often manifest in job logs as error messages like the following:

Tue Apr 21 20:02:50 2020: [PE_517381]:_pmi_mmap_tmp: Warning bootstrap barrier failed: num_syncd=58, pes_this_node=64, timeout=180 secs

Shifter solves this scalability problem by storing shared libraries in memory on each compute node, but this has the disadvantage of isolating the user from the full Cray programming environment, which the user may depend on for compiling and linking applications. For example, the application may depend on a shared library provided by a NERSC module located in /usr/common/software (a parallel file system), rather than in /opt or /usr, which are encoded directly into the compute node's boot image.

This example works around that limitation by combining the scalability of Shifter with the existing Cray programming environment that many users rely on for compilation and linking.

First, one must gather together all of the shared libraries on which a dynamically linked executable depends. This can be accomplished with the lddtree utility, which recurses through the tree of shared libraries dynamically linked to an executable. After one has installed lddtree.sh and put it in $PATH, one can run the following script, which finds all shared libraries the application is linked against, copies them into a temporary directory, and then creates a tarball from that directory.

#!/bin/bash

# Script to gather shared libs for a parallel code compiled on a Cray XC
# system. The libs and exe can be put into a Shifter image. Depends on the
# `lddtree` utility provided by Gentoo 'pax-utils':
# https://gitweb.gentoo.org/proj/pax-utils.git/tree/lddtree.sh

if [ -z "$1" ]; then
  printf "%s\n" "Usage: "$0" <file>"
  exit 1
elif [ "$#" -ne 1 ]; then
  printf "%s\n" "ERROR: this script takes exactly 1 argument"
  exit 1
fi

if [ ! $(command -v lddtree.sh) ]; then
  printf "%s\n" "ERROR: lddtree.sh not in \$PATH"
  exit 1
fi

exe="$1"

if [ ! -f $exe ]; then
  printf "%s\n" "ERROR: file ${exe} does not exist"
  exit 1
fi

# Ensure file has dynamic symbols before we proceed. If the file is a static
# executable, we don't need this script anyway; the user can simply sbcast the
# executable to /tmp on each compute node.
dyn_sym=$(objdump -T ${exe})
if [ $? -ne 0 ]; then
  printf "%s\n" "ERROR: file has no dynamic symbols"
  exit 1
fi

target_dir=$(mktemp -d -p ${PWD} $(basename $exe)-tmpdir.XXXXXXXXXX)
tar_file="$(basename $exe).tar.bz2"
printf "%s\n" "Putting libs into this dir: ${target_dir}"

# First copy all of the shared libs which are dynamically linked to the exe.

lddtree.sh ${exe} | while read f; do
  file_full=$(echo $f | grep -Po "=> (.*)" | cut -d" " -f2)
  cp ${file_full} ${target_dir}
done

# Then find the network libs that Cray manually dlopen()s. (Just running
# lddtree on the compiled executable won't find these.) Shifter has to do this
# step manually too: see
# https://github.com/NERSC/shifter/blob/master/extra/prep_cray_mpi_libs.py#L282.
# Also copy all of their dependent shared libs.

for f in $(find /opt \
  -name '*wlm_detect*\.so' -o \
  -name '*rca*\.so' -o \
  -name '*alps*\.so' 2>/dev/null); do
  lddtree.sh $f | while read f; do
    file_full=$(echo $f | grep -Po "=> (.*)" | cut -d" " -f2)
    cp ${file_full} ${target_dir}
  done
done

tar cjf ${tar_file} -C ${target_dir} .
mv ${tar_file} ${target_dir}
printf "%s\n" "Combined shared libraries into this tar file: ${target_dir}/${tar_file}"

cat << EOF > ${target_dir}/Dockerfile
FROM opensuse/leap:15.2
ADD ${tar_file} /my_dynamic_exe
ENV PATH="/my_dynamic_exe:\${PATH}"
ENV LD_LIBRARY_PATH="/my_dynamic_exe:\${LD_LIBRARY_PATH}"
EOF
printf "%s\n\n" "Created this Dockerfile: ${target_dir}/Dockerfile"

printf "%s\n" "Now copy the following files to your workstation:"
printf "%s\n" ${target_dir}/${tar_file}
printf "%s\n" ${target_dir}/Dockerfile
printf "\n"
printf "%s\n" "Then create a Docker image and push it to the NERSC Shifter registry."
printf "%s\n" "Instructions for doing this are provided here:"
printf "%s\n" "https://docs.nersc.gov/languages/shifter/how-to-use/#using-nerscs-private-registry"
printf "%s\n" "After your image is in the NERSC Shifter registry, you can execute your code"
printf "%s\n" "using a script like the following:"
printf "\n"
printf "%s\n" "#!/bin/bash"
printf "%s\n" "#SBATCH -C <arch>"
printf "%s\n" "#SBATCH -N <num_nodes>"
printf "%s\n" "#SBATCH -t <time>"
printf "%s\n" "#SBATCH -q <QOS>"
printf "%s\n" "#SBATCH -J <job_name>"
printf "%s\n" "#SBATCH -o <job_log_file>"
printf "%s\n" "#SBATCH --image=registry.services.nersc.gov/${USER}/<image_name>:<version>"
printf "\n"
printf "%s\n" "srun <args> shifter $(basename ${exe}) <inputs>"

One can test this script on a simple MPI example:

program main
  use mpi
  implicit none

  integer :: ierr, world_size, world_rank

  ! Initialize the MPI environment
  call MPI_Init(ierr)

  ! Get the number of processes
  call MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)

  ! Get the rank of the process
  call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)

  ! Print off a hello world message
  print *, "Hello world from rank ", world_rank, " out of ", world_size, " processors"

  ! Finalize the MPI environment.
  call MPI_Finalize(ierr)

end program main

Then one can run the script shown above to gather the relevant shared libraries into a new directory in $PWD, and create a tarball from those shared libraries:

user@login03:~> ftn -o mpi-hello-world.ex main.f90
user@login03:~> ./shifterize.sh mpi-hello-world.ex
Putting libs into this dir: /global/homes/u/user/mpi-hello-world.ex-tmpdir.GbITsvCZZX
Combined shared libraries into this tar file: /global/homes/u/user/mpi-hello-world.ex-tmpdir.GbITsvCZZX/mpi-hello-world.ex.tar.bz2
Created this Dockerfile: /global/homes/u/user/mpi-hello-world.ex-tmpdir.GbITsvCZZX/Dockerfile

Now copy the following files to your workstation:
/global/homes/u/user/mpi-hello-world.ex-tmpdir.GbITsvCZZX/mpi-hello-world.ex.tar.bz2
/global/homes/u/user/mpi-hello-world.ex-tmpdir.GbITsvCZZX/Dockerfile

Then create a Docker image and push it to the NERSC Shifter registry.
Instructions for doing this are provided here:
https://docs.nersc.gov/languages/shifter/how-to-use/#using-nerscs-private-registry
After your image is in the NERSC Shifter registry, you can execute your code
using a script like the following:

#!/bin/bash
#SBATCH -C <arch>
#SBATCH -N <num_nodes>
#SBATCH -t <time>
#SBATCH -q <QOS>
#SBATCH -J <job_name>
#SBATCH -o <job_log_file>
#SBATCH --image=registry.services.nersc.gov/user/<image_name>:<version>

srun <args> shifter mpi-hello-world.ex <inputs>
user@login03:~>

where the new tar file mpi-hello-world.ex.tar.bz2 contains the dynamic executable and all of its dependent shared libraries:

user@login03:~> tar tjf mpi-hello-world.ex.tar.bz2 | head
./
./libalps.so
./libjob.so.0
./libmunge.so.2
./libudreg.so.0
./libintlc.so.5
./libnodeservices.so.0
./libugni.so.0
./libalpsutil.so
./libalpslli.so

and the newly generated Dockerfile is already configured to build a working Shifter image:

FROM opensuse/leap:15.2
ADD mpi-hello-world.ex.tar.bz2 /my_dynamic_exe
ENV PATH="/my_dynamic_exe:${PATH}"
ENV LD_LIBRARY_PATH="/my_dynamic_exe:${LD_LIBRARY_PATH}"

Development in Shifter using VSCode

Here's how to do remote development at NERSC, inside Shifter containers, using Visual Studio Code on your local machine. This makes the remote VS-Code server instance at NERSC run inside the container instance, so the container's file system is fully visible to VS-Code.

VS-Code doesn't natively support non-Docker-like container runtimes yet, but the workaround described below works well in practice. The procedure is a bit involved, but the result is worth the effort.

Requirements

You'll need VS-Code >= v1.64 (older versions don't support the SSH RemoteCommand setting).

Step 1

At NERSC, create a script $HOME/.local/bin/run-shifter that looks like this

#!/bin/sh
export XDG_RUNTIME_DIR="${TMPDIR:-/tmp}/`whoami`/run"
exec shifter --image="$1"

This is necessary because VS-Code tries to access $XDG_RUNTIME_DIR. At NERSC, $XDG_RUNTIME_DIR points to /run/user/YOUR_UID by default, which is not accessible from within Shifter container instances, so we need to override the default location.
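
Before wiring the script into SSH, you can sanity-check it by hand; it should drop you into a shell inside the container (using the same example image name as in Step 2 below):

chmod +x ~/.local/bin/run-shifter
~/.local/bin/run-shifter someorg/someimage:latest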

Step 2

In your "$HOME/.ssh/config" on your local system, add something like

Host someimage~*
  RemoteCommand ~/.local/bin/run-shifter someorg/someimage:latest
  RequestTTY yes

Host otherimage~*
  RemoteCommand ~/.local/bin/run-shifter someorg/otherimage:latest
  RequestTTY yes

Host perlmutter*.nersc.gov
  IdentityFile ~/.ssh/nersc

Host perlmutter someimage~perlmutter otherimage~perlmutter
  HostName perlmutter.nersc.gov
  User YOUR_NERSC_USERNAME

Test whether this works by running ssh someimage~perlmutter on your local system. This should drop you into an SSH session running inside of an instance of the someorg/someimage:latest container image at NERSC.

Step 3

In your VS-Code settings on your local system, set

"remote.SSH.enableRemoteCommand": true

Step 4

Since VS-Code reuses remote server instances, the above is not sufficient to run multiple container images on the same NERSC host at the same time. To get separate (per container image) VS-Code server instances on the same host, add something like this to your VS-Code settings on your local system:

"remote.SSH.serverInstallPath": {
  "someimage~perlmutter": "~/.vscode-container/someimage",
  "otherimage~perlmutter": "~/.vscode-container/otherimage"
}

Step 5

Connect to NERSC from VS-Code running on your local system:

F1 > "Connect to Host" > "someimage~perlmutter” should now start a remote VS-Code session with the VS-Code server component running inside a Shifter container instance at NERSC. The same for "otherimage~perlmutter".

Tips and tricks

If things don't work, try "Kill server on remote" from VS-Code and reconnect.

You can also try starting over from scratch with brute force: Close the VS-Code remote connection. Then, from an external terminal, kill the remote VS-Code server instance (and everything else):

ssh perlmutter
pkill -9 node

(This will kill all Node.js processes you own on the remote host.)

Remove the ~/.vscode-server directory in your NERSC home directory.

Further troubleshooting

If you have a Shifter question or problem, please open a ticket at help.nersc.gov.