FAQ and Troubleshooting

Running shifter --help lists the available options and is a good first step when troubleshooting.

Multi-arch builds

Users who build on non-x86 hardware may see an error like this:

shifter: /bin/bash: Exec format error

To fix this, try a multi-arch build. Here is an example:

docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64/v8 --push -t elvis/image:latest .

This example both builds the cross-platform image and pushes it to the registry. To verify that the build worked as intended, check the image metadata in the registry (for example, Docker Hub) to see whether the expected architectures are listed.
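The same check can be done from the command line with docker buildx imagetools inspect, which prints the manifest list of a pushed image. The sketch below is an illustration only: the helper name has_platform is hypothetical, and it assumes the inspect output reports each entry on a line containing "Platform:".

```shell
# has_platform (hypothetical helper): read manifest text on stdin and
# succeed if any "Platform:" line mentions the requested platform string.
has_platform() {
    awk -v wanted="$1" '
        /Platform:/ && index($0, wanted) { found = 1 }
        END { exit !found }
    '
}

# Usage, assuming the image from the build example has been pushed:
#   docker buildx imagetools inspect elvis/image:latest | has_platform linux/arm64 \
#       && echo "arm64 variant present"
```

The helper does a plain substring match, so linux/arm64 also matches an entry reported as linux/arm64/v8.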

Using Intel compilers

Intel used to require access to a license server to use the Intel compilers. Fortunately for container users, this is no longer the case. Users who wish to use the Intel compilers can pull one of the Intel oneAPI HPC Toolkit images and build on top of it. These images come with compilers such as ifort, so they can serve as a base image on which users compile their own application.

Failed to lookup Image

Warning

If you are trying to start many tasks at the same time with Shifter, this can create congestion on the image gateway.

If all the processes will use the same image, you can avoid this by specifying the image in the batch submit script instead of on the command line.

For example:

#SBATCH --image=myimage:latest

shifter /path/to/app arg1 arg2

Using this format, the image will be looked up at submission time and cached as part of the job.

If your job needs to use multiple images during execution, the approach above will not be sufficient. A workaround is to specify the image by its ID, which avoids the lookup. Specify the image as id: followed by the ID, which can be obtained with shifterimg lookup. Do the lookup in advance so that it does not occur during the job.

# Done in advance...
user:~> shifterimg lookup centos:8
76d24f3ba3317fa945743bb3746fbaf3a0b752f10b10376960de01da70685fbd
# In the job...
shifter --image=id:76d24f3ba3317fa945743bb3746fbaf3a0b752f10b10376960de01da70685fbd /bin/hostname
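The two steps above can be combined in a small helper that is run before the job starts. This is a sketch only: the function name image_ref is hypothetical, and it assumes shifterimg lookup prints just the 64-character hexadecimal ID, as in the example above.

```shell
# image_ref (hypothetical helper): turn an ID printed by `shifterimg lookup`
# into the id:<ID> form accepted by --image. Anything that is not a
# 64-character lowercase hex string is rejected, so a failed lookup is
# caught before any tasks are launched.
image_ref() {
    if printf '%s' "$1" | grep -Eq '^[0-9a-f]{64}$'; then
        printf 'id:%s\n' "$1"
    else
        echo "unexpected image ID: '$1'" >&2
        return 1
    fi
}

# Done in advance:
#   ref=$(image_ref "$(shifterimg lookup centos:8)") || exit 1
# In the job:
#   shifter --image="$ref" /bin/hostname
```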

Invalid Volume Map

Sometimes volume mounting a directory will fail with invalid volume map or with this error:

ERROR: unclean exit from bind-mount routine. /var/udiMount/tmp may still be mounted.
BIND MOUNT FAILED from /var/udiMount/<full path to directory> to /var/udiMount/tmp
FAILED to setup user-requested mounts.
FAILED to setup image.

This can happen for different reasons, but a common cause is the permissions of the directory being mounted. Consider this example:

shifter --volume /global/cfs/cdirs/myproj/a/b --image=myimage bash

For Shifter to allow the mount, it needs to be able to traverse every directory in the path, up to the final one, as user nobody. The easiest way to fix this is to use setfacl to allow limited access to the directory. This needs to be done for the full path up to the final directory. For example:

setfacl -m u:nobody:x /global/cfs/cdirs/myproj/
setfacl -m u:nobody:x /global/cfs/cdirs/myproj/a

Note that only the owner of a directory can change the access controls, so you may need the project owner to fix some path elements.
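Because the ACL has to be set on each directory above the one being mounted, it can help to generate the list of commands first and review it. The sketch below only prints setfacl commands and does not run them. The function name print_nobody_acls is hypothetical; its first argument is the topmost directory that needs adjusting and its second is the directory being mounted.

```shell
# print_nobody_acls START TARGET (hypothetical helper)
# Print one setfacl command for START and for every directory between
# START and TARGET, excluding TARGET itself. Nothing is executed.
# Assumes absolute paths without spaces or newlines.
print_nobody_acls() {
    start="${1%/}"                 # strip a trailing slash, if any
    [ -n "$start" ] || return 1    # refuse to start from "/"
    dir=$(dirname "${2%/}")        # parent of the directory being mounted
    cmds=""
    while :; do
        case "$dir" in
            "$start"|"$start"/*) ;;   # still at or below START
            *) break ;;               # walked above START (or never below it)
        esac
        # Prepend so the output runs from the top of the tree downwards.
        cmds="setfacl -m u:nobody:x $dir
$cmds"
        [ "$dir" = "$start" ] && break
        dir=$(dirname "$dir")
    done
    printf '%s' "$cmds"
}

# Prints the commands for myproj and myproj/a, matching the example above.
print_nobody_acls /global/cfs/cdirs/myproj /global/cfs/cdirs/myproj/a/b
```

Printing rather than executing makes it easy to see which path elements belong to someone else and need to be passed on to the project owner.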

Mounting directories from $HOME

Users may encounter an error when they try to mount directories located in their $HOME directory. The fix is to grant execute permission on $HOME to other using the chmod command, as demonstrated here:

elvis@perlmutter:login01:~> shifter --image=ubuntu:latest --volume=$HOME/test:/tmp cat /tmp/hello
ERROR: unclean exit from bind-mount routine. /var/udiMount/tmp may still be mounted.
BIND MOUNT FAILED from /var/udiMount//global/homes/s/stephey/test to /var/udiMount/tmp
FAILED to setup user-requested mounts.
FAILED to setup image.
elvis@perlmutter:login01:~> chmod o+x $HOME
elvis@perlmutter:login01:~> shifter --image=ubuntu:latest --volume=$HOME/test:/tmp cat /tmp/hello
hello
elvis@perlmutter:login01:~> 
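Before changing permissions, you can check whether the execute bit for other is already set. The sketch below assumes GNU stat (the -c option), as found on typical Linux systems; the function name other_can_traverse is hypothetical.

```shell
# other_can_traverse DIR (hypothetical helper)
# Succeed if the "other" class has execute (traverse) permission on DIR.
# Returns 2 if DIR cannot be inspected.
other_can_traverse() {
    mode=$(stat -c '%a' "$1") || return 2   # octal mode, e.g. 700 or 711
    last=${mode#"${mode%?}"}                # last octal digit: bits for "other"
    case "$last" in
        1|3|5|7) return 0 ;;                # execute bit is set
        *)       return 1 ;;
    esac
}

# Usage:
#   other_can_traverse "$HOME" || echo "consider: chmod o+x $HOME"
```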

GLIBC_2.25 not found

This error typically contains the following line, but other variations may appear.

/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by /opt/udiImage/modules/mpich/mpich-7.7.19/lib64/dep/libexpat.so.1)

By default, Shifter automatically injects libraries to support MPI and GPUs (where applicable). This can sometimes conflict with the contents of the image if the image uses an older version of GLIBC. If the application doesn't require MPI support, you can try adding the flag --module=none to disable the injection.

elvis@nid00042:~> shifter --image=elvis/test:123 --module=none /bin/bash

If your application requires MPI support, you may need to rebuild your image on top of a newer OS.

Issues with MPI: No space left on device

This can show up as an error like the following.

create_endpoint(1361).......: OFI EP enable failed (ofi_init.c:1361:create_endpoint:No space left on device)

Try adding --network=no_vni to the srun command.

srun --network=no_vni shifter myapp

Further troubleshooting

If you have a Shifter question or problem, please open a ticket at help.nersc.gov.