Containers at CSCS
JFrog
Pull a container from JFrog locally
- Enable the VPN (if not on the CSCS network)
- Log in to JFrog
The password is the JFrog API key, which can be generated in the JFrog web interface:
User Menu > Edit Profile > API Key
- Pull the container
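The login and pull commands are not shown in the steps above. A minimal sketch, assuming podman is the local client (docker accepts the same login and pull subcommands); the function name jfrog_pull and its three arguments are hypothetical, and the registry host, image path, and user name must be replaced with your own values:

```shell
# Hypothetical helper: log in to the registry, then pull one image.
# Arguments: <REGISTRY> <IMAGE> <USER> (all placeholders, not real values).
jfrog_pull() {
    local registry="$1" image="$2" user="$3"
    # podman prompts for the password: paste the JFrog API key here.
    podman login --username "$user" "$registry" &&
    podman pull "$registry/$image"
}
```

Call it as jfrog_pull <REGISTRY> <IMAGE> <USER> once the VPN is up; the pull only runs if the login succeeds.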
MPI Containers
MPICH
The following containers were created by Andreas Fink.
Eiger (zen2)
FROM docker.io/ubuntu:24.04
ARG libfabric_version=1.22.0
ARG mpi_version=4.3.1
ARG osu_version=7.5.1
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends build-essential ca-certificates automake autoconf libtool make gdb strace wget python3 git gfortran \
&& rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/hpc/xpmem \
&& cd xpmem/lib \
&& gcc -I../include -shared -o libxpmem.so.1 libxpmem.c \
&& ln -s libxpmem.so.1 libxpmem.so \
&& mv libxpmem.so* /usr/lib64 \
&& cp ../include/xpmem.h /usr/include/ \
&& ldconfig \
&& cd ../../ \
&& rm -Rf xpmem
RUN wget -q https://github.com/ofiwg/libfabric/archive/v${libfabric_version}.tar.gz \
&& tar xf v${libfabric_version}.tar.gz \
&& cd libfabric-${libfabric_version} \
&& ./autogen.sh \
&& ./configure --prefix=/usr \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf v${libfabric_version}.tar.gz libfabric-${libfabric_version}
RUN wget -q https://www.mpich.org/static/downloads/${mpi_version}/mpich-${mpi_version}.tar.gz \
&& tar xf mpich-${mpi_version}.tar.gz \
&& cd mpich-${mpi_version} \
&& ./autogen.sh \
&& ./configure --prefix=/usr --enable-fast=O3,ndebug --enable-fortran --enable-cxx --with-device=ch4:ofi --with-libfabric=/usr --with-xpmem=/usr \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf mpich-${mpi_version}.tar.gz mpich-${mpi_version}
RUN wget -q http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-v${osu_version}.tar.gz \
&& tar xf osu-micro-benchmarks-v${osu_version}.tar.gz \
&& cd osu-micro-benchmarks-v${osu_version} \
&& ./configure --prefix=/usr/local CC=$(which mpicc) CFLAGS=-O3 \
&& make -j$(nproc) \
&& make install \
&& cd .. \
&& rm -rf osu-micro-benchmarks-v${osu_version} osu-micro-benchmarks-v${osu_version}.tar.gz
The container engine will inject xpmem and libfabric into the container at runtime.
Building and running the container
Build the container:
- Import the container with enroot for running with the container engine.
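The build commands themselves are not included above. A minimal sketch, assuming podman builds the image and enroot converts it to a squashfs file; the function name build_and_import, the image tag, and the output path are hypothetical, and the -x mount option and the podman:// URI are assumptions that should be checked against enroot import --help on the target system:

```shell
# Hypothetical helper: build the image from a Dockerfile in the current
# directory, then import it as a squashfs file for the container engine.
# Arguments: <TAG> <DOCKERFILE> <OUTPUT.sqsh> (all placeholders).
build_and_import() {
    local tag="$1" dockerfile="$2" sqsh="$3"
    podman build -f "$dockerfile" -t "$tag" . &&
    # Assumed enroot usage: read the image from local podman storage.
    enroot import -x mount -o "$sqsh" "podman://$tag"
}
```

For example, build_and_import mpich-osu:4.3.1 Dockerfile "$PWD/mpich-osu.sqsh" would leave a squashfs file whose path can then be used as <IMAGE> in the TOML file.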
Prepare the TOML file for the container engine:
image = '<IMAGE>'
mounts = ["/capstor/scratch/cscs/"]
workdir = '<WORKDIR>'
writable = true
entrypoint = true
annotations.com.hooks.cxi.enabled = 'true' # (1)!
Run the OSU micro-benchmarks:
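The run command for Eiger is not shown here. A minimal sketch, assuming the same srun options as the Daint (GH200) example on this page (PMI2, the debug partition, two nodes with one rank each) and the install prefix /usr/local from the Dockerfile; the function name run_osu and the osu.toml file name are hypothetical:

```shell
# Hypothetical helper: run one OSU benchmark inside the container.
# Arguments: <ENV_FILE.toml> <BENCHMARK relative to .../mpi/> [benchmark args]
run_osu() {
    local env_file="$1" benchmark="$2"
    shift 2
    # The partition name and node counts are assumptions; adjust as needed.
    srun --mpi=pmi2 -p debug -N2 -n2 --environment="$env_file" \
        "/usr/local/libexec/osu-micro-benchmarks/mpi/$benchmark" "$@"
}
```

For example, run_osu "$PWD/osu.toml" pt2pt/osu_bw would start the point-to-point bandwidth test between two nodes.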
Daint (GH200)
# (1)!
FROM docker.io/nvidia/cuda:12.8.1-devel-ubuntu24.04
ARG libfabric_version=1.22.0
ARG mpi_version=4.3.1
ARG osu_version=7.5.1
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends build-essential ca-certificates automake autoconf libtool make gdb strace wget python3 git gfortran \
&& rm -rf /var/lib/apt/lists/*
RUN echo '/usr/local/cuda/lib64/stubs' > /etc/ld.so.conf.d/cuda_stubs.conf && ldconfig # (2)!
RUN git clone https://github.com/hpc/xpmem \
&& cd xpmem/lib \
&& gcc -I../include -shared -o libxpmem.so.1 libxpmem.c \
&& ln -s libxpmem.so.1 libxpmem.so \
&& mv libxpmem.so* /usr/lib \
&& cp ../include/xpmem.h /usr/include/ \
&& ldconfig \
&& cd ../../ \
&& rm -Rf xpmem
RUN wget -q https://github.com/ofiwg/libfabric/archive/v${libfabric_version}.tar.gz \
&& tar xf v${libfabric_version}.tar.gz \
&& cd libfabric-${libfabric_version} \
&& ./autogen.sh \
# (3)!
&& ./configure --prefix=/usr --with-cuda=/usr/local/cuda \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf v${libfabric_version}.tar.gz libfabric-${libfabric_version}
RUN wget -q https://www.mpich.org/static/downloads/${mpi_version}/mpich-${mpi_version}.tar.gz \
&& tar xf mpich-${mpi_version}.tar.gz \
&& cd mpich-${mpi_version} \
&& ./autogen.sh \
# (4)!
&& ./configure --prefix=/usr --enable-fast=O3,ndebug --enable-fortran --enable-cxx --with-device=ch4:ofi --with-libfabric=/usr --with-xpmem=/usr --with-cuda=/usr/local/cuda \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf mpich-${mpi_version}.tar.gz mpich-${mpi_version}
RUN wget -q http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-v${osu_version}.tar.gz \
&& tar xf osu-micro-benchmarks-v${osu_version}.tar.gz \
&& cd osu-micro-benchmarks-v${osu_version} \
&& ./configure --prefix=/usr/local --with-cuda=/usr/local/cuda CC=$(which mpicc) CFLAGS=-O3 \
&& make -j$(nproc) \
&& make install \
&& cd .. \
&& rm -rf osu-micro-benchmarks-v${osu_version} osu-micro-benchmarks-v${osu_version}.tar.gz
RUN rm /etc/ld.so.conf.d/cuda_stubs.conf && ldconfig
- Use the NVIDIA CUDA container as base image.
- Add /usr/local/cuda/lib64/stubs as default linking directory during the build process. This is required because at build time no CUDA driver/GPU is available. This path is removed at the end of the build process.
- Build libfabric with CUDA support.
- Build MPICH with CUDA support.
Building and running the container
Build the container:
- Import the container with enroot for running with the container engine.
Prepare the TOML file for the container engine:
image = '<IMAGE>'
mounts = ["/capstor/scratch/cscs/"]
workdir = '<WORKDIR>'
writable = true
entrypoint = true
annotations.com.hooks.cxi.enabled = 'true' # (1)!
Run the OSU micro-benchmarks device-to-device (D D) bandwidth test:
srun --mpi=pmi2 -p debug -N2 -n2 --environment=$PWD/osu.toml \
env MPIR_CVAR_CH4_OFI_ENABLE_HMEM=1 \
/usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw D D
- MPIR_CVAR_CH4_OFI_ENABLE_HMEM=1 enables GPU direct RDMA support in the provider. Equivalent to MPICH_GPU_SUPPORT_ENABLED=1 in Cray-MPICH.
Run the OSU micro-benchmarks host-to-host (H H) bandwidth test:
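The command for this step is missing from the page. A minimal sketch, assuming it mirrors the device-to-device invocation with the buffer arguments changed to H H; the function name osu_bw_host is hypothetical, and whether MPIR_CVAR_CH4_OFI_ENABLE_HMEM should also be set for host buffers is not stated in the source, so it is left out:

```shell
# Hypothetical helper: host-to-host bandwidth test between two nodes.
# Argument: <ENV_FILE.toml>, the container engine TOML file.
osu_bw_host() {
    local env_file="$1"
    # "H H" places both the send and the receive buffer in host memory.
    srun --mpi=pmi2 -p debug -N2 -n2 --environment="$env_file" \
        /usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw H H
}
```

Call it as osu_bw_host "$PWD/osu.toml" and compare the reported bandwidth with the device-to-device run.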