Advanced Usage

If your task comprises a complicated pipeline of interconnected tasks, there are several options for splitting it into dependent tasks or for parallelising independent portions across many cluster nodes. This section describes these techniques and other advanced options.

Cluster advanced usage

GPU Tasks

How to request a GPU for your job

Whilst GPU tasks can simply be submitted to the short.gq or long.gq queues, fsl_sub also provides helper options that can automatically select a GPU queue and the appropriate CUDA toolkit for you.

-c|--coprocessor <coprocessor name>: This selects the coprocessor with the given name (see fsl_sub --help for details of the available coprocessors).
--coprocessor_multi <number>: This allows you to request multiple GPUs. On the FMRIB cluster you can select no more than two GPUs. You will automatically be given a two-slot OpenMP parallel environment.
--coprocessor_class <class>: This allows you to select which GPU hardware model you require; see fsl_sub --help for details.
--coprocessor_toolkit <toolkit version>: This allows you to select the API toolkit your software needs. The requested CUDA libraries will automatically be made available where these haven't been compiled into the software.
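If you drive submissions from a Python script, the options above can be assembled into an fsl_sub argument list. The sketch below is hypothetical: the helper name `gpu_submit_args`, the placeholder program `./my_gpu_task` and any class or toolkit values you pass are illustrative only. It uses only the flags listed above and enforces the two-GPU limit stated for the FMRIB cluster.

```python
from typing import List, Optional, Sequence

# Two-GPU limit stated above for the FMRIB cluster
MAX_GPUS = 2


def gpu_submit_args(
    command: Sequence[str],
    coprocessor: str = "cuda",
    n_gpus: int = 1,
    gpu_class: Optional[str] = None,
    toolkit: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an fsl_sub command line that requests GPUs."""
    if not 1 <= n_gpus <= MAX_GPUS:
        raise ValueError(f"n_gpus must be between 1 and {MAX_GPUS}")
    args = ["fsl_sub", "-c", coprocessor]
    if n_gpus > 1:
        args += ["--coprocessor_multi", str(n_gpus)]
    if gpu_class is not None:
        args += ["--coprocessor_class", gpu_class]
    if toolkit is not None:
        args += ["--coprocessor_toolkit", toolkit]
    # fsl_sub's own options come first, then the command to run
    return args + list(command)


if __name__ == "__main__":
    # ./my_gpu_task is a placeholder for your own program
    print(gpu_submit_args(["./my_gpu_task", "input.nii.gz"], n_gpus=2))
```

The resulting list could then be passed to `subprocess.run` on a submission host.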

There are three CUDA coprocessor definitions configured for fsl_sub: cuda, cuda_all and cuda_ml.

cuda selects GPUs capable of high-performance double-precision workloads and would normally be used for queued tasks such as Eddy and BedpostX.
cuda_all selects all GPUs.
cuda_ml selects GPUs better suited to machine learning tasks. These typically have very poor double-precision performance, being optimised instead for single-, half- and quarter-precision workloads. Use them for ML inference and development; training may still be better suited to the general-purpose GPUs depending on the task, so ask the developer of the software for advice on this. On the FMRIB SLURM cluster all our GPUs have the same double-precision capability; cuda_ml is only included to allow straightforward porting of your scripts to BMRC's cluster.

GPU and queue aware tools will automatically select the cuda queue if they detect it.

There are six A30 units, each divided into two, providing twelve 12GB GPUs, and two H100 units: one used as a single 80GB card and one divided into seven 10GB GPUs.

When indicating RAM requirements with -R you should consider the following quotation from the BMRC pages:

When submitting jobs, the total memory requirement for your job should be equal to the compute memory + GPU memory, i.e. you will need to request a sufficient number of slots to cover this total memory requirement.
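As a worked example of the rule quoted above, the request must cover host (compute) memory plus GPU memory. The memory available per slot is not stated on this page, so the 16GB-per-slot figure below is purely an assumption; check the real value for your cluster before relying on the result.

```python
import math


def slots_needed(compute_mem_gb: float, gpu_mem_gb: float, mem_per_slot_gb: float) -> int:
    """Smallest number of slots whose memory covers compute + GPU memory."""
    total_gb = compute_mem_gb + gpu_mem_gb
    return math.ceil(total_gb / mem_per_slot_gb)


if __name__ == "__main__":
    # 20GB of host memory plus one 12GB A30 partition = 32GB in total;
    # the 16GB-per-slot figure is an assumption, not a cluster fact.
    print(slots_needed(20, 12, 16))  # 2
```

In this example the value given to -R would be the 32GB total rather than the 20GB of host memory alone.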

INTERACTIVE JOBS (INCLUDING GPU/MACHINE LEARNING TASKS)

Where your program requires interaction you can select a GPU when requesting a VDI, graphical MATLAB, Jupyter or RStudio session.

Alternatively, within a VDI session, you can request a text only interactive session using:

salloc -p gpu_short --gres=gpu:1 --cpus-per-gpu=2 --mem-per-cpu=8G

(...wait for job allocation...)

srun --pty /bin/bash -l

There may be a delay during the salloc command whilst the system finds a suitable host. Adapt the options as required; the example above requests:

-p gpu_short - gpu_short partition (1.25 days)
--gres=gpu:1 - requests a single GPU; for a specific type use `gpu:k40:1`, and change the number to 2 to request two GPUs.
--cpus-per-gpu=2 - requests two CPU cores for each GPU allocated.
--mem-per-cpu=8G - allocates 8GB per CPU core, i.e. 16GB in total for the task with the two cores requested above.
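With `--mem-per-cpu`, the total memory is the per-core value multiplied by the number of cores, and with `--cpus-per-gpu` the number of cores is cores-per-GPU multiplied by GPUs. The hypothetical helper below builds the same salloc options shown above and reports the resulting total; it does not run salloc itself.

```python
from typing import List, Tuple


def salloc_args(gpus: int = 1, cpus_per_gpu: int = 2, mem_per_cpu_gb: int = 8,
                partition: str = "gpu_short") -> Tuple[List[str], int]:
    """Return salloc arguments and the total memory (in GB) they request."""
    args = [
        "salloc",
        "-p", partition,
        f"--gres=gpu:{gpus}",
        f"--cpus-per-gpu={cpus_per_gpu}",
        f"--mem-per-cpu={mem_per_cpu_gb}G",
    ]
    # --mem-per-cpu applies to every allocated core
    total_mem_gb = gpus * cpus_per_gpu * mem_per_cpu_gb
    return args, total_mem_gb


if __name__ == "__main__":
    args, mem = salloc_args()
    print(" ".join(args))  # the example command shown above
    print(f"{mem}G")       # 1 GPU x 2 cores x 8G = 16G
```

Requesting two GPUs with the same per-GPU settings doubles both the cores and the memory (32G).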

The `srun` command then launches a terminal into this interactive job.

When you have finished, use the command `exit` twice to return to your original terminal.

Conda Environments

How to submit commands that rely on a personal Conda environment

When a job starts up on an OOD compute node it does not inherit the environment that you had when it was submitted. Modules are ordinarily re-loaded, so if the software is configured using a module then it should still work when submitted (you could write your own modules if you wish - see https://lmod.readthedocs.io/en/latest/015_writing_modules.html).

If your command is installed into a Conda environment not configured by a module then the cluster node will not know where to find it. You can either specify the full path to the script (typically <path of environment>/bin/<scriptname>) or you can create a wrapper script and submit this script to fsl_sub. A basic (generic) wrapper follows:

#!/bin/bash
# Enable Conda in this non-interactive shell
eval "$(conda shell.bash hook)"
# Activate your environment
conda activate <name or path to environment>
# Run the command (and its arguments) passed to this script
exec "$@"

Save this as, for example, conda_wrapper.sh and make it executable (chmod +x conda_wrapper.sh); you can then use it to run commands as follows.

If you wish to run 'mypython_command option1 option2', use:

fsl_sub -R 1 -T 1 ./conda_wrapper.sh mypython_command option1 option2

Multi-Threaded Software

How to request a multi-threaded slot and how to ensure your software only uses the CPU cores it has been allocated

Running multi-threaded programs can cause significant problems for cluster scheduling software if it is not made aware of the multiple threads: your job is allocated one slot but actually consumes many more, often ALL the CPUs, overloading the machine.

We support the running of shared memory multi-threaded software only (e.g. OpenMP, multi-threaded MKL, OpenBLAS etc).

To submit an OpenMP job, use the -s (or --parallelenv) option to fsl_sub. For example:

fsl_sub -s 2 <command or script>

where 2 is the number of threads you wish to allocate to your job.

The task running on the queue will be able to determine how many slots it has by querying the environment variable pointed to by FSLSUB_NSLOTS. For example in BASH the number of slots is equal to ${!FSLSUB_NSLOTS}.

In Python you would be able to get this figure with the following code:

import os
# FSLSUB_NSLOTS holds the *name* of the variable containing the slot count
slots = int(os.environ[os.environ['FSLSUB_NSLOTS']])
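Reading the slot count is only half the job: the number then needs to be passed on to any threading libraries. A minimal sketch, assuming your code uses libraries that honour the common `OMP_NUM_THREADS`, `MKL_NUM_THREADS` and `OPENBLAS_NUM_THREADS` variables (these must be set before such libraries, e.g. NumPy, are first imported). Falling back to one slot when FSLSUB_NSLOTS is absent (e.g. when testing outside the cluster) is a choice made by this sketch, not documented fsl_sub behaviour.

```python
import os


def allocated_slots(default: int = 1) -> int:
    """Slots granted by the scheduler, read via the FSLSUB_NSLOTS indirection."""
    var_name = os.environ.get("FSLSUB_NSLOTS")  # e.g. "NSLOTS"
    if not var_name:
        return default
    value = os.environ.get(var_name)
    return int(value) if value else default


def limit_threads() -> int:
    """Restrict common threading libraries to the allocated slots."""
    n = allocated_slots()
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n)
    return n


if __name__ == "__main__":
    n = limit_threads()  # call before importing numpy, scipy, etc.
    print(f"Using {n} thread(s)")
```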

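Within MATLAB you can limit the number of computational threads to the allocated slots with:

nslots_var = getenv('FSLSUB_NSLOTS');   % name of the variable holding the slot count
n = str2double(getenv(nslots_var));     % getenv returns text, so convert it to a number
LASTN = maxNumCompThreads(n);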

To provide these threads the cluster software needs to reserve slots on compute nodes; this may lead to significant wait times whilst sufficient slots become available on a single node.

MATLAB Tasks

How to submit non-interactive MATLAB scripts to the queues

See the MATLAB page for details on selecting MATLAB versions, compilation and using compilation runtimes.

For non-interactive MATLAB, it is more efficient to compile your code (see the MATLAB page).

From MATLAB R2019a onwards there is a new command line option, `-batch`, specifically for running MATLAB in the most efficient manner on compute clusters. Unfortunately, in this mode MATLAB only accepts simple commands, so if your script is in a file called 'mytask.m' you would call it with:

fsl_sub -R 16 -T 100 matlab -batch "run('mytask.m')"

For older MATLAB versions the equivalent would be:

fsl_sub -R 16 -T 100 matlab -nodisplay -nosplash \< mytask.m

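NB The "\" is very important. It stops your own shell from interpreting "<" when you submit, so the redirection is carried out when the job runs and MATLAB reads your script from mytask.m. Without it, your shell would feed the script to fsl_sub instead and MATLAB would never see it.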

Warning: MATLAB will often attempt to carry out some operations using multiple threads. Our cluster is configured to run only single-threaded programs unless you request multiple threads. SLURM enforces these limits, preventing MATLAB from overloading the system, but it may still be advisable to limit the number of threads MATLAB uses to ensure optimum performance.

Once the task is running you can look at the file "matlab.o<jobid>" for any output.

If you wish to take advantage of the multi-threaded facilities in MATLAB request multiple cores with the -s option to fsl_sub.

Where you must interact with the process, see the section on the MATLAB GUI within the VDI.

fsl_sub Environment variables

Environment variables that can be set to control fsl_sub submitted tasks

Available Environment Variables

fsl_sub sets, or can be controlled with, the following shell variables. These can be set either for the duration of a single fsl_sub run, by prefixing the call with the assignment:

ENVVAR=VALUE fsl_sub ...

or by exporting the value to your shell so that all subsequent calls will also have this variable set this way:

export ENVVAR=VALUE
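If you launch fsl_sub from Python, the standard subprocess module gives the same two behaviours: pass a modified copy of the environment to a single call (like `ENVVAR=VALUE fsl_sub ...`), or change `os.environ`, which every later child process inherits (like `export`). The sketch uses a small Python child process as a stand-in for fsl_sub so that it can run anywhere.

```python
import os
import subprocess
import sys
from typing import Dict, Optional

# Stand-in for fsl_sub: a child process that reports FSLSUB_PARALLEL
CHILD = [sys.executable, "-c",
         "import os; print(os.environ.get('FSLSUB_PARALLEL', 'unset'))"]


def run_child(env: Optional[Dict[str, str]] = None) -> str:
    """Run the child; env=None means it inherits the current environment."""
    result = subprocess.run(CHILD, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    os.environ.pop("FSLSUB_PARALLEL", None)
    # One call only, like: FSLSUB_PARALLEL=4 fsl_sub ...
    print(run_child(env={**os.environ, "FSLSUB_PARALLEL": "4"}))  # 4
    print(run_child())                                             # unset
    # All later calls, like: export FSLSUB_PARALLEL=4
    os.environ["FSLSUB_PARALLEL"] = "4"
    print(run_child())                                             # 4
```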

Environment variable	Who sets	Purpose	Example values
FSLSUB_JOBID_VAR	fsl_sub	Variable name of Grid job id	JOB_ID
FSLSUB_ARRAYTASKID_VAR	fsl_sub	Variable name of Grid task id	SGE_TASK_ID
FSLSUB_ARRAYSTARTID_VAR	fsl_sub	Variable name of Grid first task id	SGE_TASK_FIRST
FSLSUB_ARRAYENDID_VAR	fsl_sub	Variable name of Grid last task id	SGE_TASK_LAST
FSLSUB_ARRAYSTEPSIZE_VAR	fsl_sub	Variable name of Grid step between task ids	SGE_TASK_STEPSIZE
FSLSUB_ARRAYCOUNT_VAR	fsl_sub	Variable name of Grid number of tasks in array	Not supported in Grid Engine
FSLSUB_MEMORY_REQUIRED	You	Advise fsl_sub of expected memory required	32G
FSLSUB_PROJECT	You	Name of Grid project to run jobs under	MyProject
FSLSUB_PARALLEL	You/fsl_sub	Control array task parallelism when running without a cluster engine (e.g. when a queued task itself submits an array task)	4 (for four threads), 0 to let fsl_sub's shell plugin use all available cores
FSLSUB_CONF	You	Provides the path to the configuration file	/usr/local/etc/fslsub_conf.yml
FSLSUB_NSLOTS	fsl_sub	Variable name of Grid allocated slots	NSLOTS
FSLSUB_DEBUG	You/fsl_sub	Enable debugging in child fsl_sub	1
FSLSUB_PLUGINPATH	You	Where to find installed plugins (do not change this variable)	/path/to/folder
FSLSUB_NOTIMELIMIT	You	Disable notification of job time to the cluster	1

Where an FSLSUB_* variable holds the name of another variable, you need to read the contents of the variable it refers to. This can be achieved as follows:

BASH: the value is ${!FSLSUB_VARIABLE}

Python:

import os
value = os.environ[os.environ['FSLSUB_VARIABLE']]

MATLAB:

VAR_NAME = getenv('FSLSUB_VARIABLE');   % name of the referenced variable
value = getenv(VAR_NAME);               % its contents (as text)
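Since several FSLSUB_* variables use this indirection, a small Python helper avoids repeating the double lookup and returns a default when a variable is missing (for example when a script is run outside fsl_sub). The function name `read_fslsub_var` is hypothetical.

```python
import os
from typing import Optional


def read_fslsub_var(name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the value of the variable whose name is stored in `name`.

    For example, read_fslsub_var("FSLSUB_NSLOTS") first reads FSLSUB_NSLOTS
    (which might contain "NSLOTS") and then returns the value of NSLOTS.
    """
    target = os.environ.get(name)
    if not target:
        return default  # not running under fsl_sub, or variable unset
    return os.environ.get(target, default)


if __name__ == "__main__":
    print("slots:", read_fslsub_var("FSLSUB_NSLOTS", default="1"))
```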

Miscellaneous Techniques

Other potentially useful submission options or techniques

Capturing job submission information

fsl_sub can store the commands used to submit a job if you provide the option --keep_jobscript. After submission you will then find a script called wrapper-<jobid>.sh in the current folder (assuming you have write permission there). The same submission can be repeated using:

fsl_sub -F wrapper-<jobid>.sh

The contents of the script are described below:

#!/bin/bash	Run the script in BASH
#SBATCH OPTION	SLURM options
#SBATCH OPTION	SLURM options
module load <module name>	Load a Shell Module
# Built by fsl_sub v.2.3.0 and fsl_sub_plugin_sge v.1.3.0	Version of fsl_sub and plugin that submitted the job
# Command line: <command line>	Command line that invoked fsl_sub
# Submission time (H:M:S DD/MM/YYYY) <date/time>	Date and time that the job was submitted
<command>
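If you keep many of these scripts, their header comments let you recover when and how each job was submitted. A minimal Python sketch, assuming the comment lines appear as in the layout above (`# Command line:` and `# Submission time (H:M:S DD/MM/YYYY)`, with an optional colon tolerated); the function name `submission_info` is hypothetical.

```python
import re
from pathlib import Path
from typing import Dict

# Header comments written by fsl_sub --keep_jobscript (layout as described above)
PATTERNS = {
    "command_line": re.compile(r"^# Command line: (.*)$"),
    "submitted": re.compile(r"^# Submission time \(H:M:S DD/MM/YYYY\):? (.*)$"),
}


def submission_info(script: Path) -> Dict[str, str]:
    """Extract the submitting command line and time from a wrapper script."""
    info: Dict[str, str] = {}
    for line in script.read_text().splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                info[key] = match.group(1).strip()
    return info
```

For example, `submission_info(Path("wrapper-12345.sh"))` would return a dictionary with the command line and submission time of job 12345.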

PASSING ENVIRONMENT VARIABLES TO QUEUED JOBS

It is not possible to inherit all the environment variables from the shell that submits a job, so fsl_sub allows you to specify environment variables that should be transferred to the job. This can also be useful if you are scheduling many similar tasks and need to specify a different value for an environment variable for each run, for example SUBJECTS_DIR which FreeSurfer uses to specify where your data sets reside. The --export option is used for this purpose.

SKIPPING COMMAND VALIDATION

By default fsl_sub checks that the command given (or each command listed in an array task file) can be found and is executable. If this causes problems, often because a particular program is only available on the compute nodes and not on the submission host, you can disable this check with -n (--novalidation).

Requesting a specific resource

Some resources may have a limited quantity available for use, e.g. software licenses or RAM. fsl_sub can request these resources from the cluster (the --coprocessor options do this automatically to request the appropriate number of GPUs). The option -r (--resource) allows you to pass a resource string directly through to the Grid Engine software. If you need to do this, the computing help team or the software documentation will tell you the exact string to pass.

Submitting Serial Jobs (Job Holds)

How to submit pipeline stages such that they wait for their predecessor to complete

If you have a multi-stage task to run, you can submit the jobs all at once, specifying that later stages must wait for the previous task to complete. This is achieved by providing the '-j' (or --jobhold) option with the job id of the task to wait for. For example:

jid=$(fsl_sub -R 3 -T 16 ./my_first_stage)
fsl_sub -R 1 -T 8 -j $jid ./my_second_stage

Note the $() surrounding the first fsl_sub command; this captures the output of the command (the job id) and stores it in the variable 'jid', which is then passed as the job id to wait for before running 'my_second_stage'.

It is also possible to submit array holds with the --array_hold option, which takes the job id of the predecessor array task. This can only be used when the first and subsequent jobs are both array tasks of the same size (same number of sub-tasks) and each sub-task in the second array depends only on the equivalent sub-task in the first array.
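Longer pipelines can be chained the same way from Python. A minimal sketch, assuming (as the shell example above relies on) that fsl_sub prints the new job id on standard output; the function names are hypothetical, and the `runner` parameter exists only so the logic can be exercised without a cluster.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple

Stage = Tuple[Sequence[str], Sequence[str]]  # (fsl_sub options, command)


def submit(options: Sequence[str], command: Sequence[str],
           hold: Optional[str] = None, runner=subprocess.run) -> str:
    """Submit one job with fsl_sub and return the job id it prints."""
    args: List[str] = ["fsl_sub", *options]
    if hold:
        args += ["-j", hold]  # wait for the previous job to finish
    args += list(command)
    result = runner(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def submit_pipeline(stages: Sequence[Stage], runner=subprocess.run) -> List[str]:
    """Submit stages in order, each held on the job submitted before it."""
    job_ids: List[str] = []
    previous: Optional[str] = None
    for options, command in stages:
        previous = submit(options, command, hold=previous, runner=runner)
        job_ids.append(previous)
    return job_ids
```

For example, `submit_pipeline([(["-R", "3", "-T", "16"], ["./my_first_stage"]), (["-R", "1", "-T", "8"], ["./my_second_stage"])])` mirrors the two shell commands above.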

Array Jobs

How to submit independent 'clone' tasks for running in parallel

An array task is a set of closely related tasks that do not rely on the output of any other members of the set. An example might be where you need to process each slice of a brain volume but there is no need to know or affect the content of any other slice (the array tasks can't communicate with each other to advise of changes to data). Array tasks allow you to submit large numbers of discrete jobs and manage them under one job id, with each sub-task being allocated a unique task id and potentially able to run in parallel given enough available compute slots.

You can submit an array task with the -t/--array_task option or with the --array_native option:

TEXT FILE ARRAY TASKS

The -t (or --array_task) option needs the name of a text file that contains the array task commands, one per line. Sub-tasks will be generated from these lines, with the task ID being equivalent to the line number in the file (starting from 1). e.g.

fsl_sub -R 12 -T 8 -t ./myparalleljobs

The array task has a parent job id which can be used to control or delete all of the sub-tasks; individual sub-tasks may be specified as job id:sub-task id, e.g. '12345:10' for sub-task 10 of job 12345.
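Task files are often generated rather than written by hand. A sketch, assuming a hypothetical per-subject script `./process_subject.sh` and subject IDs of your own; because the task ID equals the line number, the order of the list decides which sub-task handles which subject. `shlex.join` (Python 3.8+) quotes any argument containing spaces or shell metacharacters.

```python
import shlex
from pathlib import Path
from typing import Iterable


def write_task_file(path: Path, subjects: Iterable[str]) -> int:
    """Write one command per line and return the number of sub-tasks.

    Line 1 becomes task id 1, line 2 task id 2, and so on.
    """
    # ./process_subject.sh is a hypothetical per-subject script
    lines = [shlex.join(["./process_subject.sh", subject]) for subject in subjects]
    path.write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, `write_task_file(Path("myparalleljobs"), ["sub-01", "sub-02"])` followed by `fsl_sub -R 12 -T 8 -t ./myparalleljobs` would submit a two-task array.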

NATIVE ARRAY TASKS

The --array_native option requires an argument n[-m[:s]] which specifies the array:

n provided alone will run the command n times in parallel
n-m will run the command once for each number in the range with task ids equal to the position in this range
n-m:s similarly, but with s specifying the increment in task id.
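The specification format above can be expanded into the task ids it produces. A sketch under two assumptions not stated here: that "n" alone numbers its sub-tasks from 1, and that a range produces task ids equal to the numbers in the range (as Grid Engine and SLURM do); check against your cluster's behaviour.

```python
import re
from typing import List

_SPEC = re.compile(r"^(\d+)(?:-(\d+)(?::(\d+))?)?$")


def expand_array_spec(spec: str) -> List[int]:
    """Expand an n[-m[:s]] specification into a list of task ids."""
    match = _SPEC.match(spec)
    if not match:
        raise ValueError(f"not of the form n[-m[:s]]: {spec!r}")
    n, m, s = match.groups()
    if m is None:
        # n alone: n sub-tasks, assumed to be numbered from 1
        return list(range(1, int(n) + 1))
    step = int(s) if s else 1
    if step < 1:
        raise ValueError("step must be a positive integer")
    # assumed: task ids take the values n, n+s, n+2s, ... up to m
    return list(range(int(n), int(m) + 1, step))
```

For example, `expand_array_spec("1-10:3")` gives `[1, 4, 7, 10]`.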

The cluster software will set environment variables that the script/binary can use to determine which task it needs to carry out. For example, this might be used to represent the brain volume slice to process. As these environment variables differ between cluster software, fsl_sub sets several environment variables to the name of the environment variable the script can use to obtain its task id from the cluster software:

Environment variable	...points to variable containing
FSLSUB_JOBID_VAR	job id
FSLSUB_ARRAYTASKID_VAR	task id
FSLSUB_ARRAYSTARTID_VAR	first task id
FSLSUB_ARRAYENDID_VAR	last task id
FSLSUB_ARRAYSTEPSIZE_VAR	step between task ids
FSLSUB_ARRAYCOUNT_VAR	number of tasks in array (not supported in Grid Engine)

To use these you need to look up the variable name and then read the value from the variable, for example in BASH use ${!FSLSUB_ARRAYTASKID_VAR} to get the value of the task id.
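For example, to map each sub-task to a zero-based slice index, read the task id, first task id and step through their FSLSUB_* indirections. A sketch in Python; the slice-processing use case is illustrative, and falling back to a first id of 1 and a step of 1 when those variables are unavailable is an assumption of this example.

```python
import os


def _indirect(name: str, default: str) -> str:
    """Read the variable whose name is stored in `name`."""
    target = os.environ.get(name)
    return os.environ.get(target, default) if target else default


def slice_index() -> int:
    """Zero-based index of this sub-task within the array."""
    task_id = int(_indirect("FSLSUB_ARRAYTASKID_VAR", "1"))
    first = int(_indirect("FSLSUB_ARRAYSTARTID_VAR", "1"))
    step = int(_indirect("FSLSUB_ARRAYSTEPSIZE_VAR", "1"))
    # task ids run first, first+step, first+2*step, ... -> 0, 1, 2, ...
    return (task_id - first) // step
```

With task ids 1, 4, 7 (first 1, step 3), the third sub-task (id 7) would process slice 2.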

Important: the tasks must be truly independent, i.e. they must not write to the same file(s) or rely on calculations in other array jobs in this set, otherwise you may get unpredictable results (or sub-tasks may crash).

LIMITING CONCURRENT ARRAY TASKS

Sometimes it may be necessary to limit the number of array sub-tasks running at any one time. You can do this by providing the -x (or --array_limit) option, which takes an integer, e.g.:

fsl_sub -T10 -x 10 -t ./myparalleljobs

This will limit the sub-tasks to ten running at any one time.

ARRAY TASKS WITH THE SHELL RUNNER

If running without a cluster backend, or when fsl_sub is called from within an already scheduled task, the shell backend can run array tasks in parallel. If running as a cluster job, the shell plugin will run no more tasks at once than the number of threads selected in your parallel environment (if one was specified; otherwise it runs one task at a time).

If you are not running on a cluster then by default fsl_sub will use all of the CPUs on your system. You can control this either using the -x|--array_limit option or by setting the environment variable FSLSUB_PARALLEL to the maximum number of array tasks to run at once. It is also possible to configure this in your own personal fsl_sub configuration file (see below).
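The behaviour described for the shell runner can be pictured with a small Python analogue: a worker pool whose size comes from FSLSUB_PARALLEL, using every CPU when it is unset or 0. This only illustrates the concept; it is not fsl_sub's shell plugin code, and `run_tasks` with its `commands` argument are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def max_parallel() -> int:
    """Concurrency limit: FSLSUB_PARALLEL if positive, otherwise all CPUs."""
    value = int(os.environ.get("FSLSUB_PARALLEL", "0") or "0")
    return value if value > 0 else (os.cpu_count() or 1)


def run_tasks(commands: Sequence[Sequence[str]]) -> List[int]:
    """Run each command, at most max_parallel() at a time; return exit codes in order."""
    with ThreadPoolExecutor(max_workers=max_parallel()) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```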
