GPU Tasks
How to request a GPU for your job
Whilst GPU tasks can simply be submitted to the short.gq or long.gq queues, fsl_sub also provides helper options that can automatically select a GPU queue and the appropriate CUDA toolkit for you (a worked example follows the lists below):
- -c|--coprocessor <coprocessor name>: This selects the coprocessor with the given name (see fsl_sub --help for details of available coprocessors).
- --coprocessor_multi <number>: This allows you to request multiple GPUs. On the FMRIB cluster you can select no more than two GPUs. You will automatically be given a two-slot openmp parallel environment.
- --coprocessor_class <class>: This allows you to select which GPU hardware model you require; see fsl_sub --help for details.
- --coprocessor_toolkit <toolkit version>: This allows you to select the API toolkit your software needs. This will automatically make the requested CUDA libraries available where these haven't been compiled into the software.
The coprocessor names available with -c|--coprocessor are:
- cuda selects GPUs capable of high-performance double-precision workloads and would normally be used for queued tasks such as Eddy and BedpostX.
- cuda_all selects all GPUs.
- cuda_ml selects GPUs more suited to machine learning tasks. They typically have very poor double-precision performance, being optimised instead for single-, half- and quarter-precision workloads. Use these for ML inference and development; training may still perform better on the general-purpose GPUs depending on the task, so ask the developer of the software for advice. On the FMRIB SLURM cluster there is no difference in double-precision capability across our GPUs; this partition is only included to allow straightforward porting of your scripts to BMRC's cluster.
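For example, a submission using these options might look like the following sketch (the script name my_gpu_job.sh and the toolkit version are placeholders; run fsl_sub --help to see the coprocessors, classes and toolkit versions actually available on your cluster):
# Single GPU, letting fsl_sub choose a suitable GPU queue
fsl_sub --coprocessor cuda ./my_gpu_job.sh
# Two GPUs and an explicit CUDA toolkit version (version number is illustrative)
fsl_sub --coprocessor cuda --coprocessor_multi 2 --coprocessor_toolkit 10.2 ./my_gpu_job.sh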
Interactive jobs (including GPU/machine learning tasks)
Where your program requires interaction, you can select a GPU when requesting a VDI, graphical MATLAB, Jupyter or RStudio session.
Alternatively, within a VDI session, you can request a text-only interactive session using:
salloc -p gpu_short --gres=gpu:1 --cpus-per-gpu=2 --mem-per-cpu=8G
(...wait for job allocation...)
srun --pty /bin/bash -l
There may be a delay during the salloc command whilst the system finds a suitable host. Adapt the options as required; the example above requests:
- -p gpu_short - the gpu_short partition (1.25 day time limit)
- --gres=gpu:1 - requests a single GPU; to request a specific GPU type use `gpu:k40:1`, and change the final number to 2 to request two GPUs
- --cpus-per-gpu=2 - requests two CPU cores for each GPU allocated.
- --mem-per-cpu=8G - allocates 8GB of memory per CPU core (16GB in total for the two cores requested here).
The `srun` command then launches a terminal into this interactive job.
When you have finished, use the command `exit` twice to return to your original terminal.
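Putting these options together, a request for two GPUs of a specific type might look like the following sketch (the k40 type name is only illustrative; the GPU types available differ between clusters, so check your cluster documentation):
salloc -p gpu_short --gres=gpu:k40:2 --cpus-per-gpu=2 --mem-per-cpu=8G
(...wait for job allocation...)
srun --pty /bin/bash -l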