GPU Tasks
How to request a GPU for your job
Whilst GPU tasks can simply be submitted to the short.gq or long.gq queues, fsl_sub also provides helper options that can automatically select a GPU queue and the appropriate CUDA toolkit for you (a worked example follows the lists below):
- -c|--coprocessor <coprocessor name>: This selects the coprocessor with the given name (see fsl_sub --help for details of available coprocessors).
- --coprocessor_multi <number>: This allows you to request multiple GPUs. On the FMRIB cluster you can select no more than two GPUs. You will automatically be given a two-slot openmp parallel environment.
- --coprocessor_class <class>: This allows you to select which GPU hardware model you require; see fsl_sub --help for details.
- --coprocessor_toolkit <toolkit version>: This allows you to select the API toolkit your software needs. This will automatically make the requested CUDA libraries available where these haven't been compiled into the software.
The coprocessor names available with -c|--coprocessor are:
- cuda selects GPUs capable of high-performance double-precision workloads and would normally be used for queued tasks such as Eddy and BedpostX.
- cuda_all selects all GPUs.
- cuda_ml selects GPUs more suited to machine learning tasks. They typically have very poor double-precision performance, being optimised instead for single-, half- and quarter-precision workloads. Use these for ML inference and development; training may still perform better on the general-purpose GPUs depending on the task, so ask the developer of the software for advice. On the FMRIB SLURM cluster there is no difference in double-precision capability across our GPUs; this partition is only included to allow straightforward porting of your scripts to BMRC's cluster.
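For example, a submission using these options might look like the following sketch (the script name my_gpu_job.sh and the toolkit version are placeholders; run fsl_sub --help to see the coprocessors, classes and toolkit versions actually available on your cluster):
# Single GPU, letting fsl_sub choose a suitable GPU queue
fsl_sub --coprocessor cuda ./my_gpu_job.sh
# Two GPUs and an explicit CUDA toolkit version (version number is illustrative)
fsl_sub --coprocessor cuda --coprocessor_multi 2 --coprocessor_toolkit 10.2 ./my_gpu_job.sh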
Interactive jobs (including GPU/machine learning tasks)
Where your program requires interaction, you can select a GPU when requesting a VDI, graphical MATLAB, Jupyter or RStudio session.
Alternatively, within a VDI session, you can request a text-only interactive session using:
salloc -p gpu_short --gres=gpu:1 --cpus-per-gpu=2 --mem-per-cpu=8G
(...wait for job allocation...)
srun --pty /bin/bash -l
There may be a delay during the salloc command whilst the system finds a suitable host. Adapt the options as required; the example above requests:
- -p gpu_short - the gpu_short partition (1.25 day time limit)
- --gres=gpu:1 - requests a single GPU; to request a specific GPU type use `gpu:k40:1`, and change the final number to 2 to request two GPUs
- --cpus-per-gpu=2 - requests two CPU cores for each GPU allocated.
- --mem-per-cpu=8G - allocates 8GB of memory per CPU core (16GB in total for the two cores requested here).
The `srun` command then launches a terminal into this interactive job.
When you have finished, use the command `exit` twice to return to your original terminal.
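Putting these options together, a request for two GPUs of a specific type might look like the following sketch (the k40 type name is only illustrative; the GPU types available differ between clusters, so check your cluster documentation):
salloc -p gpu_short --gres=gpu:k40:2 --cpus-per-gpu=2 --mem-per-cpu=8G
(...wait for job allocation...)
srun --pty /bin/bash -l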