GPU Tasks
How to request a GPU for your job
Whilst GPU tasks can be submitted directly to the short.gq or long.gq queues, fsl_sub also provides helper options that can automatically select a GPU queue and make the appropriate CUDA toolkit available for you.
- -c|--coprocessor <coprocessor name>: Selects the coprocessor with the given name (see fsl_sub --help for details of the available coprocessors)
- --coprocessor_multi <number>: Allows you to request multiple GPUs. On the FMRIB cluster you can select no more than two GPUs. You will automatically be given a two-slot openmp parallel environment
- --coprocessor_class <class>: Allows you to select which GPU hardware model you require; see fsl_sub --help for details
- --coprocessor_class_strict: If a class is requested you will normally be allocated a card at least as capable as the model requested. Adding this option ensures that you only get the exact GPU model you asked for
- --coprocessor_toolkit <toolkit version>: Allows you to select the API toolkit your software needs. This will automatically make the requested CUDA libraries available where these haven't been compiled into the software
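Combining the options above, a GPU submission might look like the following sketch (the script name and toolkit version here are illustrative, not prescriptive):

```shell
# Submit a hypothetical GPU script via fsl_sub, requesting two GPUs
# and asking for the CUDA 10.2 libraries to be made available.
# (./my_gpu_script.sh and 10.2 are placeholders for your own job)
fsl_sub --coprocessor cuda \
        --coprocessor_multi 2 \
        --coprocessor_toolkit 10.2 \
        ./my_gpu_script.sh
```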
The coprocessor names available with the -c option are:
- cuda selects GPUs capable of high-performance double-precision workloads and would normally be used for queued tasks such as Eddy and BedpostX.
- cuda_all selects all GPUs.
- cuda_ml selects GPUs more suited to machine learning tasks. These typically have very poor double-precision performance, being optimised instead for single-, half- and quarter-precision workloads. Use them for tasks involving ML inference and development; training may still run better on the general-purpose GPUs depending on the task, so ask the developer of the software for advice on this.
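For example, an inference workload (script name hypothetical) could be directed to the ML-optimised cards like this:

```shell
# Run a hypothetical inference script on the ML-optimised GPUs
fsl_sub --coprocessor cuda_ml ./run_inference.sh
```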
INTERACTIVE JOBS (INCLUDING GPU/MACHINE LEARNING TASKS)
Where your program requires interaction, we offer an interactive queue which can be used to get a terminal session on one of the cluster nodes.
To request a terminal session for GPU tasks, issue the following command on a rescomp head node:
srun -p gpu_short --pty bash
There may be a delay whilst the system finds a suitable host. Once one becomes available, if this is the first time you have logged into a particular node you may be asked to accept the host key. Enter `yes` to accept this host key and you will then be presented with a terminal session. Your job will be subject to the same limits as a batch job, and if you expect your session to be interrupted (for example by a dropped connection) then you should start screen or tmux on rescomp before running srun.
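A session that survives disconnection can be sketched as follows (the session name is arbitrary):

```shell
# On rescomp, start a named tmux session first...
tmux new -s gpu-session
# ...then, inside tmux, request the interactive GPU node
srun -p gpu_short --pty bash
# If your connection drops, reattach later from rescomp with:
tmux attach -t gpu-session
```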
To request a specific GPU model, for example a Quadro RTX 8000, add the --gres option:
srun -p gpu_short --gres gpu:quadro-rtx8000:1 --pty bash
The RTX8000 nodes are deployed in pairs, so if your software is multi-GPU aware you can request two GPUs to double your available GPU compute power and GPU memory. You can do this by increasing the number of GPUs requested with:
srun -p gpu_short --gres gpu:quadro-rtx8000:2 --pty bash
The pairs of cards share a high-speed interconnect, so although accessing the other card's memory is slower than accessing local GPU memory, it is significantly faster than in a traditional multi-GPU setup.
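Once inside a two-GPU session you can confirm that both cards, and the interconnect between them, are visible using nvidia-smi (standard on CUDA nodes):

```shell
# List the GPUs allocated to this session
nvidia-smi -L
# Show the interconnect topology between the pair of cards
nvidia-smi topo -m
```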