If your task comprises a complicated pipeline of interconnected tasks, there are several options for splitting it into dependent tasks or parallelising independent portions across many cluster nodes. Information on these techniques and other advanced options is in this section.

Cluster advanced usage

How to request a GPU for your job

Whilst GPU tasks can simply be submitted to the short.gq or long.gq queues, fsl_sub also provides helper options that can automatically select a GPU queue and the appropriate CUDA toolkit for you.

  • -c|--coprocessor <coprocessor name>: Selects the coprocessor with the given name (see fsl_sub --help for details of available coprocessors).
  • --coprocessor_multi <number>: Requests multiple GPUs. On the FMRIB cluster you can select no more than two GPUs. You will automatically be given a two-slot OpenMP parallel environment.
  • --coprocessor_class <class>: Selects which GPU hardware model you require; see fsl_sub --help for details.
  • --coprocessor_class_strict: If a class is requested you will normally be allocated a card at least as capable as the model requested. Adding this option ensures that you only get the GPU model you asked for.
  • --coprocessor_toolkit <toolkit version>: Selects the API toolkit your software needs. This automatically makes the requested CUDA libraries available where they haven't been compiled into the software.
There are three CUDA coprocessor definitions configured for fsl_sub: cuda, cuda_all and cuda_ml.
  • cuda selects GPUs capable of high-performance double-precision workloads and would normally be used for queued tasks such as Eddy and BedpostX.
  • cuda_all selects all GPUs.
  • cuda_ml selects GPUs more suited to machine learning tasks. These typically have very poor double-precision performance, being optimised instead for single, half and quarter precision workloads. Use these for tasks involving ML inference and development. Training may still run better on the general purpose GPUs, depending on the task; ask the developer of the software for advice on this. On the FMRIB SLURM cluster there is no difference in double-precision capability between our GPUs; this partition is only included to allow straightforward porting of your scripts to BMRC's cluster.
GPU and queue aware tools will automatically select the cuda queue if they detect it.
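
As a rough illustration of how the options above combine, the following Python sketch assembles (but does not run) an fsl_sub command line for a GPU job. The helper name gpu_submit_args and the program ./my_gpu_task are hypothetical; only the option names are taken from the list above, and the values your cluster accepts should be checked with fsl_sub --help.

```python
import shlex


def gpu_submit_args(command, coprocessor="cuda", gpus=1,
                    gpu_class=None, strict=False, toolkit=None):
    """Build an fsl_sub argument list for a GPU job (nothing is executed).

    Hypothetical helper: only the option names come from the documentation.
    """
    args = ["fsl_sub", "--coprocessor", coprocessor]
    if gpus > 1:
        args += ["--coprocessor_multi", str(gpus)]
    if gpu_class is not None:
        args += ["--coprocessor_class", gpu_class]
        if strict:
            # Only meaningful when a class has been requested.
            args.append("--coprocessor_class_strict")
    if toolkit is not None:
        args += ["--coprocessor_toolkit", str(toolkit)]
    return args + list(command)


# Example: two GPUs for a placeholder program called ./my_gpu_task
example_args = gpu_submit_args(["./my_gpu_task", "input.nii.gz"], gpus=2)
print(shlex.join(example_args))
```

Building the command as a list keeps each option and its value together and avoids quoting mistakes if it is later passed to subprocess.run.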
 
At present we only have four A30 units (provided by two cards) available for use on the SLURM cluster. We will be adding more A30 units in April 2024, with H100 units arriving in June 2024.
 
When indicating RAM requirements with -R you should consider the following quotation from the BMRC pages:

When submitting jobs, the total memory requirement for your job should be equal to the compute memory + GPU memory, i.e. you will need to request a sufficient number of slots to cover this total memory requirement.

INTERACTIVE JOBS (INCLUDING GPU/MACHINE LEARNING TASKS)

Where your program requires interaction you can select a GPU when requesting a VDI, graphical MATLAB, Jupyter or RStudio session.

Alternatively, within a VDI session, you can request a text only interactive session using:

salloc -p gpu_short --gres=gpu:1 --cpus-per-gpu=2 --mem-per-cpu=8G

(...wait for job allocation...)

srun --pty /bin/bash -l

There may be a delay during the salloc command whilst the system finds a suitable host. Adapt the options as required; the example above requests:

  • -p gpu_short - the gpu_short partition (1.25 days)
  • --gres=gpu:1 - a single GPU; for a specific type use `gpu:k40:1`, and change the number to 2 to request two GPUs
  • --cpus-per-gpu=2 - two CPU cores for each GPU allocated
  • --mem-per-cpu=8G - 8GB of memory per CPU core, giving 16GB in total for this task
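
The memory total follows from multiplying the three options together. A minimal Python sketch of that arithmetic (the function name is illustrative only):

```python
def total_memory_gb(gpus, cpus_per_gpu, mem_per_cpu_gb):
    """Total RAM implied by --gres=gpu:N, --cpus-per-gpu and --mem-per-cpu."""
    return gpus * cpus_per_gpu * mem_per_cpu_gb


# The salloc example above: 1 GPU x 2 CPUs per GPU x 8GB per CPU
example_total = total_memory_gb(gpus=1, cpus_per_gpu=2, mem_per_cpu_gb=8)
print(example_total)  # 16
```

Requesting two GPUs with the same per-GPU and per-CPU settings would therefore double the memory allocation.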

 

The `srun` command then launches a terminal into this interactive job.

When you have finished, use the command `exit` twice to return to your original terminal.

How to request a multi-threaded slot and how to ensure your software only uses the CPU cores it has been allocated

Running multi-threaded programs can cause significant problems with cluster scheduling software if the clustering software is not made aware of the multiple threads (your job is allocated one slot but actually consumes many more, often ALL the CPUs, overloading the machine).

We support the running of shared memory multi-threaded software only (e.g. OpenMP, multi-threaded MKL, OpenBLAS etc).

To submit an OpenMP job, use the -s (or --parallelenv) option to fsl_sub. For example:

fsl_sub -s 2 <command or script>

Here 2 is the number of threads you wish to allocate to your job.

The task running on the queue will be able to determine how many slots it has by querying the environment variable pointed to by FSLSUB_NSLOTS. For example in BASH the number of slots is equal to ${!FSLSUB_NSLOTS}.

In Python you would be able to get this figure with the following code:

import os
slots = os.environ[os.environ['FSLSUB_NSLOTS']]
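
Note that the value read this way is a string. Building on the snippet above, the sketch below converts it to an integer, falls back to a single slot when run outside a queued job, and uses the result to size a thread pool. The function names are illustrative, and it assumes the scheduler stores a plain integer in the referenced variable.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def allocated_slots(env=os.environ, default=1):
    """Return the number of slots granted to this job as an int.

    FSLSUB_NSLOTS holds the *name* of the scheduler's variable, so two
    look-ups are needed. Falls back to `default` outside a queued job.
    """
    var_name = env.get("FSLSUB_NSLOTS")
    if not var_name:
        return default
    value = env.get(var_name)
    # Assumption: the scheduler writes a plain integer into this variable.
    return int(value) if value else default


def square(x):
    return x * x


if __name__ == "__main__":
    slots = allocated_slots()
    # Never start more workers than the slots the job was given.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        print(list(pool.map(square, range(8))))
```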

To be able to provide these threads the cluster software needs to reserve slots on compute nodes. This may lead to significant wait times whilst sufficient slots become available on a single node.

How to submit non-interactive MATLAB scripts to the queues

Wherever possible DO NOT run full MATLAB directly on the cluster; instead compile your code (see the MATLAB page). Where this is not possible, or you only need to run a quick single-job task, it is acceptable to run the full MATLAB environment on the cluster.

Any non-interactive MATLAB task needs to be submitted by creating a file (typically with the extension '.m'), e.g. 'mytask.m', containing all your MATLAB commands and submitting it using fsl_sub. Once the task is running you can look at the file "matlab.o<jobid>" for any output.

fsl_sub -q short.q matlab -singleCompThread -nodisplay -nosplash \< mytask.m

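NB The "\" is very important: it stops your own shell from interpreting the "<", so that the redirection is passed through to the job. Without it MATLAB won't read your script.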
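
To see what the escaped "<" amounts to, the Python sketch below treats the "<" as an ordinary argument and checks that the documented command line splits into exactly that list under POSIX shell quoting rules. It assumes, as the note above implies, that fsl_sub passes the literal "<" on to the job; nothing is submitted.

```python
import shlex

# Argument list as fsl_sub should receive it: the "<" is an ordinary argument.
matlab_args = [
    "fsl_sub", "-q", "short.q",
    "matlab", "-singleCompThread", "-nodisplay", "-nosplash",
    "<", "mytask.m",
]

# shlex.join shows a shell command line that would produce exactly this list.
shell_line = shlex.join(matlab_args)
print(shell_line)

# Splitting the documented command with POSIX quoting rules gives the same
# list back, i.e. the backslash turns "<" into a literal argument.
documented = (
    r"fsl_sub -q short.q matlab -singleCompThread -nodisplay -nosplash "
    r"\< mytask.m"
)
assert shlex.split(documented) == matlab_args
```

If you submit from Python with subprocess.run and an argument list like this one, no shell is involved and so no backslash is needed.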

Warning: MATLAB tasks will often attempt to carry out some operations using multiple threads. Our cluster is configured to run only single-threaded programs unless you request multiple threads. SLURM enforces these limits, preventing MATLAB from overloading the system.

If you wish to take advantage of the multi-threaded facilities in MATLAB, request multiple cores with the -s option to fsl_sub.

Where you must interact with the process, see the section on the MATLAB GUI within the VDI.

Environment variables that can be set to control fsl_sub submitted tasks

Available Environment Variables

fsl_sub sets or can be controlled with the following shell variables. These can be set either for the duration of the fsl_sub run by prepending the call with the setting of the value:

ENVVAR=VALUE fsl_sub ...

or by exporting the value to your shell so that all subsequent calls will also have this variable set this way:

export ENVVAR=VALUE

| Environment variable | Who sets | Purpose | Example values |
| --- | --- | --- | --- |
| FSLSUB_JOBID_VAR | fsl_sub | Variable name of Grid job id | JOB_ID |
| FSLSUB_ARRAYTASKID_VAR | fsl_sub | Variable name of Grid task id | SGE_TASK_ID |
| FSLSUB_ARRAYSTARTID_VAR | fsl_sub | Variable name of Grid first task id | SGE_TASK_FIRST |
| FSLSUB_ARRAYENDID_VAR | fsl_sub | Variable name of Grid last task id | SGE_TASK_LAST |
| FSLSUB_ARRAYSTEPSIZE_VAR | fsl_sub | Variable name of Grid step between task ids | SGE_TASK_STEPSIZE |
| FSLSUB_ARRAYCOUNT_VAR | fsl_sub | Variable name of Grid number of tasks in array | Not supported in Grid Engine |
| FSLSUB_MEMORY_REQUIRED | You | Advise fsl_sub of expected memory required | 32G |
| FSLSUB_PROJECT | You | Name of Grid project to run jobs under | MyProject |
| FSLSUB_PARALLEL | You/fsl_sub | Control array task parallelism when running without a cluster engine (e.g. when a queued task itself submits an array task) | 4 (for four threads), 0 to let fsl_sub's shell plugin use all available cores |
| FSLSUB_CONF | You | Provides the path to the configuration file | /usr/local/etc/fslsub_conf.yml |
| FSLSUB_NSLOTS | fsl_sub | Variable name of Grid allocated slots | NSLOTS |
| FSLSUB_DEBUG | You/fsl_sub | Enable debugging in child fsl_sub | 1 |
| FSLSUB_PLUGINPATH | You | Where to find installed plugins (do not change this variable) | /path/to/folder |
| FSLSUB_NOTIMELIMIT | You | Disable notification of job time to the cluster | 1 |

Where an FSLSUB_* variable is a reference to another variable, you need to read the content of the referred-to variable. This can be achieved as follows:

BASH: the value is ${!FSLSUB_VARIABLE}

Python:

import os
value = os.environ[os.environ['FSLSUB_VARIABLE']]

MATLAB:

NSLOT_VAR = getenv('FSLSUB_VARIABLE')
N = getenv(NSLOT_VAR)
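
The same two-step look-up can be applied to every indirect variable at once. The following Python sketch returns None for anything that is unset, so it can be run safely outside a queued job. The names INDIRECT_VARS and resolve_all are illustrative; the variable list is taken from the table above.

```python
import os

# Variables that, per the table above, hold the *name* of another variable.
INDIRECT_VARS = (
    "FSLSUB_JOBID_VAR",
    "FSLSUB_ARRAYTASKID_VAR",
    "FSLSUB_ARRAYSTARTID_VAR",
    "FSLSUB_ARRAYENDID_VAR",
    "FSLSUB_ARRAYSTEPSIZE_VAR",
    "FSLSUB_ARRAYCOUNT_VAR",
    "FSLSUB_NSLOTS",
)


def resolve_all(env=os.environ):
    """Map each FSLSUB_* pointer to the value it refers to (or None)."""
    resolved = {}
    for name in INDIRECT_VARS:
        target = env.get(name)  # e.g. "JOB_ID"
        resolved[name] = env.get(target) if target else None
    return resolved


if __name__ == "__main__":
    for name, value in resolve_all().items():
        print(f"{name} -> {value}")
```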

How to change fsl_sub's configuration for all jobs you run

Some of the operation of fsl_sub can be configured such that all runs will enable/disable features. To configure fsl_sub create a file ~/.fsl_sub.yml and add the configuration to this file - it is in YAML format. To see what the current configuration is use:

fsl_sub --show_config

Take care: the system configuration has been set up to be optimal for the cluster, and changing these settings may cause your job to fail.

FSL_SUB.YML SECTIONS

TOP LEVEL

These options control the basic operation of fsl_sub and are keys in a YAML dictionary. To change a setting add 'keyname: value' to your file with no indent.

| Key name | Default | Purpose | Examples/Allowed Options |
| --- | --- | --- | --- |
| method | 'shell', 'slurm' (or 'sge') | Define whether to use the cluster ('slurm') or run things without a cluster ('shell') | 'shell' or the name of an installed plugin, e.g. 'slurm' |
| ram_units | 'G' | When -R is specified, what are the units | 'K', 'M', 'G', 'T', 'P'(!) - recommend this is not changed |
| modulecmd | False | Where 'modulecmd' is not findable via PATH, where is the program | Path to modulecmd |
| export_vars | Empty list = [] | List of environment variables (with optional values) to always pass to jobs running on the cluster. The list you provide will be added to the default list | [SUBJECTSDIR, "MYVARIABLE=MYVALUE"]. The list can also be specified by starting a new line and adding items as '  - SUBJECTSDIR' (note the two spaces before the '-') on separate lines |
| thread_control | ['OMP_NUM_THREADS', 'MKL_NUM_THREADS', 'MKL_DOMAIN_NUM_THREADS', 'OPENBLAS_NUM_THREADS', 'GOTO_NUM_THREADS'] | Environment variables to set to ensure threads are limited to those requested by a parallel environment. Any values you configure will be added to the default list | Names of environment variables |
| method_opts | {} | Control the method that runs your job | See below |
| coproc_opts | {} | Control the coprocessor options | Should not be changed |
| queues | {} | Control the queues | Must not be changed |

METHOD_OPTS

These control how the shell and cluster (slurm or sge) job runners operate. Most of these should not be changed, but some useful ones include:

method_opts:
  shell:
    parallel_disable_matches:
      - "*_string"

parallel_disable_matches lets you specify portions of a command name that should never be run in parallel when submitted as an array task while running with the shell backend. The default list contains '*_gpu', which ensures that the FSL GPU-enabled tools do not attempt to start up in parallel, as they are likely to be unable to access multiple GPUs. fsl_sub supports matching a full program name, a full path to a program, and *<name> or <name>* to match the end or start of a program name respectively.
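
To give a feel for how such patterns behave, the Python sketch below uses the standard fnmatch module to test a command against a list of patterns, by full path and by program name. This is an illustration of the matching rules described above, not fsl_sub's own implementation, and the command names are examples only.

```python
import fnmatch
import os


def is_parallel_disabled(command, patterns):
    """Return True if `command` matches any pattern, by full path or name.

    Illustration only: fsl_sub's real matching code may differ in detail.
    """
    name = os.path.basename(command)
    return any(
        fnmatch.fnmatchcase(command, pattern)
        or fnmatch.fnmatchcase(name, pattern)
        for pattern in patterns
    )


patterns = ["*_gpu"]
print(is_parallel_disabled("/opt/fsl/bin/eddy_gpu", patterns))  # True
print(is_parallel_disabled("bedpostx", patterns))               # False
```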

method_opts:
  slurm:
    keep_jobscript: True|False

or for the legacy Jalapeno cluster:

method_opts:
  sge:
    keep_jobscript: True|False

When the cluster backends submit your job they generate a submission script. The keep_jobscript option leaves a copy of this script in the current folder for reference or for later reuse.

You can also control this on a job by job basis with the option --keep_jobscript, but where tasks don't allow this (e.g. FEAT) you can control this here.

Other potentially useful submission options or techniques

Capturing job submission information

fsl_sub can store the commands used to submit the job if you provide the option --keep_jobscript. After submission you will then find a script called wrapper-<jobid>.sh in the current folder (assuming you have write permissions there). This exact submission may be repeated by using:

fsl_sub -F wrapper-<jobid>.sh

The contents of the script are described below:

| Script line | Meaning |
| --- | --- |
| #!/bin/bash | Run the script in BASH |
| #SBATCH OPTION (one line per option) | SLURM options |
| module load <module name> | Load a Shell Module |
| # Built by fsl_sub v.2.3.0 and fsl_sub_plugin_sge v.1.3.0 | Version of fsl_sub and plugin that submitted the job |
| # Command line: <command line> | Command line that invoked fsl_sub |
| # Submission time (H:M:S DD/MM/YYYY) <date/time> | Date and time that the job was submitted |
| <command> (one line per command) | The command(s) to run |

PASSING ENVIRONMENT VARIABLES TO QUEUED JOBS

It is not possible to inherit all the environment variables from the shell that submits a job, so fsl_sub allows you to specify environment variables that should be transferred to the job. This can also be useful if you are scheduling many similar tasks and need to specify a different value for an environment variable for each run, for example SUBJECTS_DIR which FreeSurfer uses to specify where your data sets reside. The --export option is used for this purpose.

SKIPPING COMMAND VALIDATION

By default fsl_sub will check the command given (or the commands in the lines in an array task file) can be found and are executable. If this causes issues, often because a particular program is only available on the compute nodes, not on jalapeno itself, then you can disable this check with -n (--novalidation).

Requesting a specific resource

Some resources may have a limited quantity available for use, e.g. software licenses or RAM. fsl_sub can request these resources from the cluster (the --coprocessor options do this automatically to request the appropriate number of GPUs). The option -r (--resource) allows you to pass a resource string directly through to the Grid Engine software. If you need to do this, the computing help team or the software documentation will advise you of the exact string to pass.

How to submit pipeline stages such that they wait for their predecessor to complete

If you have a multi-stage task to run, you can submit the jobs all at once, specifying that later stages must wait for the previous task to complete. This is achieved by providing the '-j' (or --jobhold) option with the job id of the task to wait for. For example:

jid=$(fsl_sub -R 3 -T 16 ./my_first_stage)
fsl_sub -R 1 -T 8 -j $jid ./my_second_stage

Note the $() surrounding the first fsl_sub command: this captures the output of the command and stores the text in the variable 'jid'. This is then passed as the job id to wait for before running 'my_second_stage'.
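
The same pattern extends to longer pipelines. The Python sketch below submits a list of stages in order, holding each on the job id returned by the one before. It assumes, as in the shell example above, that fsl_sub prints the job id on standard output. The function names are illustrative, and the submit function is passed in as a parameter so the chaining logic can be tried without a cluster.

```python
import subprocess


def run_fsl_sub(args):
    """Run fsl_sub and return the job id it prints on standard output."""
    result = subprocess.run(["fsl_sub", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def submit_chain(stages, submit=run_fsl_sub):
    """Submit stages in order, each held on the job id of the one before.

    `stages` is a list of fsl_sub argument lists; `submit` is injectable
    so the logic can be exercised without a cluster.
    """
    job_ids = []
    previous = None
    for stage_args in stages:
        args = list(stage_args)
        if previous is not None:
            # -j (--jobhold) makes this stage wait for the previous job.
            args = ["-j", previous] + args
        previous = submit(args)
        job_ids.append(previous)
    return job_ids


# The two stages from the shell example above (not submitted here).
stages = [
    ["-R", "3", "-T", "16", "./my_first_stage"],
    ["-R", "1", "-T", "8", "./my_second_stage"],
]
```

On the cluster, submit_chain(stages) would submit both stages and return their job ids.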

It is also possible to submit array holds with the --array_hold option, which takes the job id of the predecessor array task. This can only be used when the first and subsequent jobs are both array tasks of the same size (same number of sub-tasks) and each sub-task in the second array depends only on the equivalent sub-task in the first array.

How to submit independent 'clone' tasks for running in parallel

An array task is a set of closely related tasks that do not rely on the output of any other members of the set. An example might be where you need to process each slice of a brain volume but there is no need to know or affect the content of any other slice (the array tasks can't communicate with each other to advise of changes to data). These tasks allow you to submit large numbers of discrete jobs and manage them under one job id, with each sub-task being allocated a unique task id and potentially able to run in parallel given enough compute slot availability.

You can submit an array task with the -t/--array_task option or with the --array_native option:

TEXT FILE ARRAY TASKS

The -t (or --array_task) option needs the name of a text file that contains the array task commands, one per line. Sub-tasks will be generated from these lines, with the task ID being equivalent to the line number in the file (starting from 1). e.g.

fsl_sub -R 12 -T 8 -t ./myparalleljobs

The array task has a parent job id which can be used to control/delete all of the sub-tasks. The sub-tasks may be specified as job id:sub-task id, e.g. '12345:10' for sub-task 10 of job 12345.
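
The task file itself is easy to generate from a script. A minimal Python sketch, assuming a hypothetical per-subject program ./process_subject and made-up subject labels:

```python
from pathlib import Path


def write_array_task_file(path, commands):
    """Write one command per line; line N becomes sub-task N (from 1).

    Blank entries are dropped so that every line is a real command.
    Returns the number of sub-tasks written.
    """
    lines = [c.strip() for c in commands if c.strip()]
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


# Hypothetical per-subject commands; ./process_subject is a placeholder.
subjects = ["sub-01", "sub-02", "sub-03"]
commands = [f"./process_subject {s}" for s in subjects]

# Usage (writes the file in the current folder):
#   write_array_task_file("myparalleljobs", commands)
# then submit with: fsl_sub -R 12 -T 8 -t ./myparalleljobs
```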

NATIVE ARRAY TASKS

The --array_native option requires an argument n[-m[:s]] which specifies the array:

  • n provided alone will run the command n-times in parallel
  • n-m will run the command once for each number in the range with task ids equal to the position in this range
  • n-m:s similarly, but with s specifying the increment in task id.
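
As a worked illustration of the n[-m[:s]] form, the Python sketch below expands a specification into the list of task ids it describes. It assumes the usual Grid Engine/SLURM convention that a bare n gives ids 1 to n, and that n-m:s gives n, n+s, ... up to m; the function name is illustrative and this is not fsl_sub's own parser.

```python
import re

# n, optionally followed by -m, optionally followed by :s
_SPEC = re.compile(r"^(\d+)(?:-(\d+)(?::(\d+))?)?$")


def task_ids(spec):
    """Expand an n[-m[:s]] array specification into a list of task ids."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"not in n[-m[:s]] form: {spec!r}")
    n, m, s = match.groups()
    if m is None:
        # Assumption: a bare n means tasks 1..n.
        return list(range(1, int(n) + 1))
    step = int(s) if s is not None else 1
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(range(int(n), int(m) + 1, step))


print(task_ids("4"))       # [1, 2, 3, 4]
print(task_ids("2-5"))     # [2, 3, 4, 5]
print(task_ids("1-10:3"))  # [1, 4, 7, 10]
```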

The cluster software will set environment variables that the script/binary can use to determine what task it needs to carry out. For example, this might be used to represent the brain volume slice to process. As these environment variables differ between different cluster software, fsl_sub sets several environment variables to the name of the environment variable the script can use to obtain its task id from the cluster software:

| Environment variable | ...points to variable containing |
| --- | --- |
| FSLSUB_JOBID_VAR | job id |
| FSLSUB_ARRAYTASKID_VAR | task id |
| FSLSUB_ARRAYSTARTID_VAR | first task id |
| FSLSUB_ARRAYENDID_VAR | last task id |
| FSLSUB_ARRAYSTEPSIZE_VAR | step between task ids |
| FSLSUB_ARRAYCOUNT_VAR | number of tasks in array (not supported in Grid Engine) |

To use these you need to look up the variable name and then read the value from the variable, for example in BASH use ${!FSLSUB_ARRAYTASKID_VAR} to get the value of the task id.
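
In Python, the equivalent look-up can be used to pick the piece of work belonging to the current sub-task. This is a sketch only: the function names and slice file names are made up, and it assumes task ids start at 1 with a step of 1.

```python
import os


def current_task_id(env=os.environ):
    """Return this sub-task's id as an int via FSLSUB_ARRAYTASKID_VAR."""
    var_name = env["FSLSUB_ARRAYTASKID_VAR"]  # e.g. "SGE_TASK_ID"
    return int(env[var_name])


def item_for_task(items, env=os.environ):
    """Pick this sub-task's item, assuming task ids run 1..len(items)."""
    return items[current_task_id(env) - 1]


# Hypothetical list of slices to process, one per sub-task.
slices = ["slice_000.nii.gz", "slice_001.nii.gz", "slice_002.nii.gz"]

if __name__ == "__main__" and "FSLSUB_ARRAYTASKID_VAR" in os.environ:
    print(item_for_task(slices))
```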

Important: The tasks must be truly independent, i.e. they must not write to the same file(s) or rely on calculations in other array jobs in this set, otherwise you may get unpredictable results (or sub-tasks may crash).

LIMITING CONCURRENT ARRAY TASKS

Sometimes it may be necessary to limit the number of array sub-tasks running at any one time. You can do this by providing the -x (or --array_limit) option, which takes an integer, e.g.:

fsl_sub -T10 -x 10 -t ./myparalleljobs

This will limit sub-tasks to ten running at any one time.

ARRAY TASKS WITH THE SHELL RUNNER

If running without a cluster backend or when fsl_sub is called from within an already scheduled task, the shell backend is capable of running array tasks in parallel. If running as a cluster job, the shell plugin will run no more than the number of threads selected in your parallel environment (if one is specified, default is one task at a time).

If you are not running on a cluster then by default fsl_sub will use all of the CPUs on your system. You can control this either using the -x|--array_limit option or by setting the environment variable FSLSUB_PARALLEL to the maximum number of array tasks to run at once. It is also possible to configure this in your own personal fsl_sub configuration file (see the section on changing fsl_sub's configuration).
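
One reading of the FSLSUB_PARALLEL setting, sketched in Python: an unset value or 0 means "use every core", and any other number is taken as the maximum number of array tasks to run at once. This is an interpretation of the description above rather than the shell plugin's actual code, and the function name is illustrative.

```python
import os


def max_parallel_tasks(env=os.environ):
    """Interpret FSLSUB_PARALLEL: 0 or unset means all available cores."""
    cores = os.cpu_count() or 1
    requested = int(env.get("FSLSUB_PARALLEL", "0"))
    return cores if requested == 0 else requested


if __name__ == "__main__":
    print(max_parallel_tasks())
```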