Compute Cluster Usage
Submitting jobs to the FMRIB SLURM compute cluster
Please see our job submission and monitoring section.
The new FMRIB cluster, Ood, uses the SLURM scheduling software, and the fsl_sub module now submits jobs to SLURM.
SLURM is significantly different from Grid Engine; in particular, there are no RAM limits for jobs. We STRONGLY recommend that you specify RAM (with fsl_sub's -R option) to ensure efficient use of the cluster. Without this option, all jobs will default to requesting 15GB of RAM. This also means that the -S/--noramsplit option is meaningless.
To assist with converting scripts that use fsl_sub with Grid Engine queue names to the Ood cluster, we have provided a script. To enable it use:
module add queue_migration
Then call it with two arguments: your original Grid Engine based script, and the name of the new script to create with time-based partition selection.
queue_migration myscript.sh myscript_slurm.sh
If your script targets bigmem.q, the converted script will default to requesting 64GB of RAM. If you require more than this, use the --ram option to queue_migration.
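As a sketch of this (the exact --ram syntax and the script names here are illustrative assumptions, not confirmed by the queue_migration documentation):

```shell
# Convert a script that targeted bigmem.q, requesting 128GB instead of the 64GB default
# (assumes --ram takes the size in GB; script names are examples only)
queue_migration --ram 128 my_bigmem_script.sh my_bigmem_slurm.sh
```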
Queue mapping
Jalapeno Queue | Ood Queue
---|---
veryshort.q | short
short.q | short
long.q | long
verylong.q | long
bigmem.q | long (+ memory specifier)
interactive.q | Reserved for the remote desktop system; interactive tasks can be launched on any of the normal queues
gpu.q | gpu_short or gpu_long
Multi-Threaded Tasks
fsl_sub's native options remain the same, but note that SLURM does not support parallel environments, so when requesting slots for multi-threaded jobs you can simply use -s <number>. If you provide a parallel environment name it will be discarded, so existing scripts should continue to work as-is.
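For example (job names and resource values here are illustrative; the Grid Engine style `-s pe_name,slots` form is shown on the assumption that your existing scripts use it):

```shell
# Request 8 slots for a multi-threaded job under SLURM: a plain number is enough
fsl_sub -s 8 -T 120 -R 32 ./my_threaded_job

# An existing Grid Engine style call still works; the "openmp" name is simply discarded
fsl_sub -s openmp,8 -T 120 -R 32 ./my_threaded_job
```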
Interactive GUI apps
Interactive tasks should be run via the new Open OnDemand virtual desktop facility.
How to run tasks on the cluster queues
Auto-submitting software
Some FSL commands and/or GUIs automatically queue themselves where appropriate, i.e., you do not need to use 'fsl_sub' to submit these programs.
Please note that this list may not be exhaustive, so you may come across more commands which have been adapted to queue themselves. If you do submit one of these tools to the queues it will still run, but may not be able to make full use of the cluster resources (e.g. it may not be able to run multiple tasks in parallel).
Other commands run from the terminal command line will need to use the `fsl_sub` command, described below, to submit them to the queue.
First load the FSL module:

module add fsl

This line can be added to your .bash_profile to ensure it takes effect for every login session you have.
Submitting jobs with fsl_sub
Typing fsl_sub before the rest of your command will send the job to the cluster. By default, this will submit your task to the short partition. fsl_sub can automatically choose a queue for you if you provide information about your job's requirements - we would strongly recommend that you provide at least an estimated maximum run time (--jobtime) to allow SLURM to schedule jobs efficiently.
There are several ways to select a queue:
- Use the -R (--jobram) and -T (--jobtime) options to fsl_sub to specify the maximum memory and run-time requirements for your job (in GB and minutes of wall time*) respectively. fsl_sub will then select the most appropriate queue for you.
- Request GPU tasks using the --coprocessor options (see the Running GPU Tasks section).
- Specify a specific partition with the -q (--queue) option. For further information on the available queues and which to use when, see the queues section.
- The command you want to run on the queues must be in your path - this does NOT include the current folder. If it isn't then you must specify the path to the command; commands/scripts in the current folder must be prefixed with './', e.g. ./script.
- The FMRIB SLURM cluster does not have a 'verylong' or 'bigmem' equivalent queue. See Long Running Tasks below.
- Jobs submitted to the FMRIB SLURM cluster do NOT inherit the 'environment' of your login shell, e.g. environment variables such as FSLDIR are not copied over to your job. Load software configuration (such as FSL) from shell modules, or use the '--export' option to fsl_sub to copy the variables to your job (see Passing Environment Variables to Queued Jobs).
- Wall Time: Unlike the FMRIB Jalapeno cluster (which uses CPU time) the SLURM cluster measures job run-time in real time, often called wall time (as in the time on a clock on the wall).
To assess the time necessary for your job to complete you can look at the run-times of similar previous jobs using the 'sacct' command (see Monitoring Tasks).
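For example, sacct can report the elapsed time and peak memory of past jobs (the job ID below is illustrative):

```shell
# Elapsed wall time, peak memory and final state for a completed job
sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,State

# Or review all of your jobs from the last week
sacct --starttime=now-7days --format=JobID,JobName,Elapsed,MaxRSS,State
```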
Example Usage
To queue a job which requires 10GiB of memory and runs for 2 hours use:
fsl_sub -T 120 -R 10 ./myjob
FSLSUB_MEMORY_REQUIRED=32G feat mydesign.feat
would submit a FEAT task, informing fsl_sub that you expect to require 32GiB of memory. If no units are specified, the integer is assumed to be in the units given in the configuration file (GiB by default).
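If you prefer to pick the partition yourself, the -q option can be combined with the time and memory options (the partition name and resource values here are illustrative):

```shell
# Explicitly target the long partition with a 3-day run time and 64GiB of RAM
fsl_sub -q long -T 4320 -R 64 ./my_long_job
```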
The different partitions have different run-time and memory limits; when a task reaches these limits it will be terminated. Shorter queues also take precedence over longer ones, so it is advantageous to provide the scheduler with as much information as possible about your job's memory and time requirements.
The command you submit cannot run a graphical interface, as it will have nowhere to display its output.
If you want to run a non-interactive MATLAB task on the queues then see MATLAB jobs.
fsl_sub Options
To see a full list of the available options use:
fsl_sub --help
In addition to the list of options this will also display a list of partitions available for use with descriptions of allowed run times and memory availability. For details on how to use these options see the Advanced Usage section.
Long running tasks
Unlike the Jalapeno cluster, the SLURM cluster does not offer 'infinite' partitions (equivalent to verylong.q, bigmem.q and cuda.q on the Jalapeno cluster). You must break your task into shorter components, or regularly save state to allow restart, and submit these parts (or resubmit the job to continue where it left off) using job holds to prevent a task running before the previous one completes.
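A minimal sketch of chaining two parts with a job hold, assuming fsl_sub prints the job ID of the submitted job on stdout and that -j takes a job ID to hold on (script names are illustrative):

```shell
# Submit part 1 and capture its job ID
jid=$(fsl_sub -T 1440 -R 16 ./part1.sh)

# The -j job hold keeps part 2 queued until part 1 has completed
fsl_sub -T 1440 -R 16 -j "$jid" ./part2.sh
```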
How to monitor the progress of your submitted job
Please see BMRC documentation on job monitoring: Checking or deleting running jobs
How to pass environment variables to SLURM jobs
By default no environment variables from your current shell are passed to your job running on the cluster.
Where the important variables were set by loading an environment module, you do not need to do anything, as fsl_sub will automatically load your currently loaded modules in your cluster job. For other variables you can request that fsl_sub pass a subset of variables to your job with the --export option (pass this multiple times to export multiple variables). You can also use this option to set an environment variable in your job that is not already set in your shell.
For very complicated use cases or dynamic variable setting, create a script that sets up all your variables and then calls the software - submit this script to the cluster.
There are two ways to use --export:
- --export VARIABLENAME - This will copy the current environment variable setting into your job (specify multiple times for multiple variables)
- --export VARIABLENAME=VARIABLEVALUE - This will set the environment variable to the value after the '=' in the queued job only (without affecting your shell), so is ideal where you need to specify a job-specific value
fsl_sub will automatically transfer some internal variables and may have been configured to include some additional useful ones, see the 'exports:' option in the output of
fsl_sub --show_config
for the list of default exports. Any --export passed on the commandline will override these configured options. It is also possible to configure fsl_sub for your account to always copy over particular variables, see Configuring fsl_sub.
Option 2, where you provide the variable with a value is particularly useful if you are scheduling many similar tasks and need to specify a different value for an environment variable for each run, for example SUBJECTS_DIR which FreeSurfer uses to specify where your data sets reside.
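For example, to give each queued FreeSurfer run its own SUBJECTS_DIR (the paths and script names here are illustrative):

```shell
# Each job gets a different SUBJECTS_DIR without changing the current shell
fsl_sub --export SUBJECTS_DIR=/data/project/subj01 ./run_freesurfer.sh
fsl_sub --export SUBJECTS_DIR=/data/project/subj02 ./run_freesurfer.sh

# Copy an existing variable's current value from the shell instead
fsl_sub --export FSLOUTPUTTYPE ./myjob.sh
```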
Available SLURM partitions
There are currently five partitions (often called queues on other platforms) configured in fsl_sub:
Queue | Duration | Max Memory | Default Memory | Purpose
---|---|---|---|---
short | 1.2 days | 1TB | 15GB | batch jobs
long | 10 days | 1TB | 15GB | batch jobs
interactive | 10 days | 256GB | 15GB | interactive jobs
gpu_short | 4h | 94GB/GPU (A30), 48GB/GPU (H100) | 48GB | GPU batch jobs
gpu_long | 60h | 94GB/GPU (A30), 48GB/GPU (H100) | 48GB | GPU batch and interactive jobs
More advanced techniques for submitting jobs, e.g. GPU, array and MATLAB tasks and full fsl_sub usage information
If your task comprises a complicated pipeline of interconnected tasks, there are several options for splitting it into dependent tasks or parallelising independent portions across many cluster nodes. Information on these techniques and other advanced options is in this section.
Which tools automatically submit themselves to a cluster queue
The following programs/scripts are able to self-submit in a HPC cluster and should not be used in conjunction with fsl_sub.
Scripts that self-submit:
- FDT (bedpostx)
- FEAT
- FIRST
- FSLVBM
- POSSUM
- RANDOMISE
- TBSS

GUIs that self-submit:
- FDT
- FEAT
- FLIRT
- POSSUM
Note that all other FSL GUIs will only run jobs on the local machine; to submit to a cluster you must use the equivalent command-line call in conjunction with fsl_sub.
How to troubleshoot failed jobs
Occasionally tasks will fail. When tasks begin running they generate two files, jobname.ojobid (referred to as the .o file) and jobname.ejobid (referred to as the .e file), which by default are created in the folder from which fsl_sub was run (or wherever you specified on the command line; FEAT tasks will create a logs folder within the .(g)feat folder).
The .o file contains any text that the program writes to the console whilst running. For example, suppose
fsl_sub ls my_folder
outputs the job id 12345. The task would generate a file ls.o12345 containing the folder listing for my_folder.
If your command produces a running commentary of its progress you could monitor this with the tail command:
tail -f command.o12345
This will continue displaying the contents of command.o12345, showing new content as it arrives, until you exit (type CTRL-c).
The .e file contains any error messages the command produced. If you still need help then please contact the IT Team.
Killing jobs
If, after submitting a job, you realise that there is a problem, you can kill the job with the command
scancel job_id
If the job is currently running there may be a short pause whilst the task is terminated.