Advanced Usage
If your task comprises a complicated pipeline of interconnected tasks, there are several options for splitting it into dependent tasks or parallelising independent portions across many cluster nodes. This section covers these techniques and other advanced options.
How to request a GPU for your job
Whilst GPU tasks can simply be submitted to the short.gq or long.gq queues, fsl_sub also provides helper options which can automatically select a GPU queue and make the appropriate CUDA toolkit available for you.
- -c|--coprocessor <coprocessor name>: This selects the coprocessor with the given name (see fsl_sub --help for details of available coprocessors)
- --coprocessor_multi <number>: This allows you to request multiple GPUs. On the FMRIB cluster you can select no more than two GPUs. You will automatically be given a two-slot openmp parallel environment
- --coprocessor_class <class>: This allows you to select which GPU hardware model you require; see fsl_sub --help for details
- --coprocessor_toolkit <toolkit version>: This allows you to select the API toolkit your software needs. This will automatically make available the requested CUDA libraries where these haven't been compiled into the software
- cuda selects GPUs capable of high-performance double-precision workloads and would normally be used for queued tasks such as Eddy and BedpostX.
- cuda_all selects all GPUs.
- cuda_ml selects GPUs more suited to machine learning tasks. These typically have very poor double-precision performance, being optimised instead for single, half and quarter precision workloads. Use these for tasks involving ML inference and development; training may still perform better on the general-purpose GPUs depending on the task, so ask the developer of the software for advice. On the FMRIB SLURM cluster there is no difference in double-precision capability between our GPUs; this partition is only included to allow straightforward porting of your scripts to BMRC's cluster.
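For example, a minimal sketch of a GPU submission combining these options (the script name and toolkit version are illustrative; check fsl_sub --help for the values available on your cluster):
fsl_sub -c cuda --coprocessor_toolkit 11.2 ./my_gpu_task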
INTERACTIVE JOBS (INCLUDING GPU/MACHINE LEARNING TASKS)
Where your program requires interaction you can select a GPU when requesting a VDI, graphical MATLAB, Jupyter or RStudio session.
Alternatively, within a VDI session, you can request a text only interactive session using:
salloc -p gpu_short --gres=gpu:1 --cpus-per-gpu=2 --mem-per-cpu=8G
(...wait for job allocation...)
srun --pty /bin/bash -l
There may be a delay during the salloc command whilst the system finds a suitable host. Adapt the options as required; the example above requests:
- -p gpu_short - gpu_short partition (1.25 days)
- --gres=gpu:1 - requests a single GPU; for a specific type use `gpu:k40:1`, and change the number to 2 to request two GPUs
- --cpus-per-gpu=2 - requests two CPU cores for each GPU allocated.
- --mem-per-cpu=8G - allocates 8GB of memory per CPU core (16GB in total for the two cores requested).
The `srun` command then launches a terminal into this interactive job.
When you have finished, use the command `exit` twice to return to your original terminal.
How to request a multi-threaded slot and how to ensure your software only uses the CPU cores it has been allocated
Running multi-threaded programs can cause significant problems if the cluster scheduling software is not made aware of the multiple threads (your job is allocated one slot but actually consumes many more, often ALL the CPUs, overloading the machine).
We support the running of shared-memory multi-threaded software only (e.g. OpenMP, multi-threaded MKL, OpenBLAS etc.).
To submit an OpenMP job, use the -s (or --parallelenv) option to fsl_sub. For example:
fsl_sub -s 2 <command or script>
Here, 2 is the number of threads you wish to allocate to your job.
The task running on the queue will be able to determine how many slots it has by querying the environment variable pointed to by FSLSUB_NSLOTS. For example in BASH the number of slots is equal to ${!FSLSUB_NSLOTS}.
In Python you would be able to get this figure with the following code:
import os
slots = os.environ[os.environ['FSLSUB_NSLOTS']]
Within MATLAB you can limit the number of computational threads to the allocated slots with:
nslots_var = getenv('FSLSUB_NSLOTS');
n = str2double(getenv(nslots_var));
LASTN = maxNumCompThreads(n);
To be able to provide these threads the cluster software needs to reserve slots on compute nodes; this may lead to significant wait times whilst sufficient slots become available on a single node.
How to submit non-interactive MATLAB scripts to the queues
Wherever possible DO NOT run full MATLAB directly on the cluster; instead, compile your code (see the MATLAB page). Where this is not possible, or you only need to run a quick single-job task, it is acceptable to run the full MATLAB environment on the cluster.
Any non-interactive MATLAB task needs to be submitted by creating a file (typically with the extension '.m'), e.g. 'mytask.m', containing all your MATLAB commands, and submitting it using 'fsl_sub'; once the task is running you can look at the file "matlab.o<jobid>" for any output.
fsl_sub -q short.q matlab -singleCompThread -nodisplay -nosplash \< mytask.m
NB The "\" is very important: it stops your current shell from interpreting the "<" redirection, so that the redirection is instead applied when the job runs; without it MATLAB won't read your script.
Warning: MATLAB tasks will often attempt to carry out some operations using multiple threads. Our cluster is configured to run only single-threaded programs unless you request multiple threads. SLURM will enforce these limits, preventing MATLAB from overloading the system.
If you wish to take advantage of the multi-threaded facilities in MATLAB request multiple cores with the -s option to fsl_sub.
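For example, a sketch of the earlier MATLAB submission requesting four threads (queue and thread count illustrative); note that -singleCompThread is dropped so MATLAB can use the extra cores:
fsl_sub -q short.q -s 4 matlab -nodisplay -nosplash \< mytask.m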
Where you must interact with the process, see the section on the MATLAB GUI within the VDI.
Environment variables that can be set to control fsl_sub submitted tasks
Available Environment Variables
fsl_sub sets, or can be controlled with, the following shell variables. These can be set for the duration of a single fsl_sub run by prefixing the call with the variable assignment:
ENVVAR=VALUE fsl_sub ...
or by exporting the value in your shell so that all subsequent calls will also have this variable set:
export ENVVAR=VALUE
Environment variable | Who sets | Purpose | Example values |
---|---|---|---|
FSLSUB_JOBID_VAR | fsl_sub | Variable name of Grid job id | JOB_ID |
FSLSUB_ARRAYTASKID_VAR | fsl_sub | Variable name of Grid task id | SGE_TASK_ID |
FSLSUB_ARRAYSTARTID_VAR | fsl_sub | Variable name of Grid first task id | SGE_TASK_FIRST |
FSLSUB_ARRAYENDID_VAR | fsl_sub | Variable name of Grid last task id | SGE_TASK_LAST |
FSLSUB_ARRAYSTEPSIZE_VAR | fsl_sub | Variable name of Grid step between task ids | SGE_TASK_STEPSIZE |
FSLSUB_ARRAYCOUNT_VAR | fsl_sub | Variable name of Grid number of tasks in array | Not supported in Grid Engine |
FSLSUB_MEMORY_REQUIRED | You | Advise fsl_sub of expected memory required | 32G |
FSLSUB_PROJECT | You | Name of Grid project to run jobs under | MyProject |
FSLSUB_PARALLEL | You/fsl_sub | Control array task parallelism when running without a cluster engine (e.g. when a queued task itself submits an array task) | 4 (for four threads), 0 to let fsl_sub's shell plugin use all available cores |
FSLSUB_CONF | You | Provides the path to the configuration file | /usr/local/etc/fslsub_conf.yml |
FSLSUB_NSLOTS | fsl_sub | Variable name of Grid allocated slots | NSLOTS |
FSLSUB_DEBUG | You/fsl_sub | Enable debugging in child fsl_sub | 1 |
FSLSUB_PLUGINPATH | You | Where to find installed plugins (do not change this variable) | /path/to/folder |
FSLSUB_NOTIMELIMIT | You | Disable notification of job time to the cluster | 1 |
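For example, to advise fsl_sub that a job needs 32GB of RAM for a single run (the script name is illustrative):
FSLSUB_MEMORY_REQUIRED=32G fsl_sub ./my_analysis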
Where a FSLSUB_* variable is a reference to another variable, you need to read the content of the referred-to variable. This can be achieved as follows:
BASH: the number of slots is equal to ${!FSLSUB_VARIABLE}
Python:
import os
value = os.environ[os.environ['FSLSUB_VARIABLE']]
MATLAB:
NSLOT_VAR = getenv('FSLSUB_VARIABLE');
N = getenv(NSLOT_VAR);
How to change fsl_sub's configuration for all jobs you run
Some aspects of fsl_sub's operation can be configured so that features are enabled/disabled for all runs. To configure fsl_sub, create a file ~/.fsl_sub.yml and add the configuration to this file - it is in YAML format. To see the current configuration use:
fsl_sub --show_config
Take care - the system configuration has been set up to be optimal for the cluster; changing these settings may cause your job to fail.
FSL_SUB.YML SECTIONS
TOP LEVEL
These options control the basic operation of fsl_sub and are keys in a YAML dictionary. To change a setting add 'keyname: value' to your file with no indent.
Key name | Default | Purpose | Examples/Allowed Options |
---|---|---|---|
method | 'shell', 'slurm' (or 'sge') | Define whether to use the cluster ('slurm') or run things without a cluster ('shell') | 'shell' or the name of an installed plugin, e.g. 'slurm' |
ram_units | 'G' | When -R is specified, what are the units | 'K', 'M', 'G', 'T', 'P'(!) - recommend this is not changed |
modulecmd | False | Path to the 'modulecmd' program where it cannot be found via your PATH | Path to modulecmd |
export_vars | Empty list = [] | List of environment variables (with optional values) to always pass to jobs running on the cluster. The list you provide will be added to the default list | [SUBJECTS_DIR, "MYVARIABLE=MYVALUE"] The list can also be specified by starting a new line and adding items as '  - SUBJECTS_DIR' (note the two spaces before the '-') on separate lines |
thread_control | ['OMP_NUM_THREADS', 'MKL_NUM_THREADS', 'MKL_DOMAIN_NUM_THREADS', 'OPENBLAS_NUM_THREADS', 'GOTO_NUM_THREADS'] | Environment variables to set to ensure threads are limited to those requested by a parallel environment. Any values you configure will be added to the default list. | Names of environment variables |
method_opts | {} | Control the method that runs your job | See below |
coproc_opts | {} | Control the coprocessor options | Should not be changed |
queues | {} | Control the queues | Must not be changed |
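For example, a minimal ~/.fsl_sub.yml sketch that adds a variable to the export list (the variable name is illustrative):
export_vars:
  - SUBJECTS_DIR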
METHOD_OPTS
method_opts:
  shell:
    parallel_disable_matches:
      - "*_string"
parallel_disable_matches enables you to specify portions of a command name that should never be run in parallel when submitted as an array task but running with the shell backend. The default list contains '*_gpu', which ensures that the FSL GPU-enabled tools do not attempt to start up in parallel, as they are likely to be unable to access multiple GPUs. fsl_sub supports matching a full program name, a full path to a program, and *<name> or <name>* to match the end or start of a program name respectively.
method_opts:
  slurm:
    keep_jobscript: True|False
or for the legacy Jalapeno cluster:
method_opts:
  sge:
    keep_jobscript: True|False
When the cluster backends submit your job they generate a submission script; the keep_jobscript option will leave a copy of this script in the current folder for reference or for later reuse.
You can also control this on a job-by-job basis with the option --keep_jobscript, but where tasks don't allow this (e.g. FEAT) you can control it here.
Other potentially useful submission options or techniques
Capturing job submission information
fsl_sub can store the commands used to submit the job if you provide the option --keep_jobscript. When provided, after submission you will find a script called wrapper-<jobid>.sh in the current folder (assuming you have write permissions there). This exact submission may be repeated by using:
fsl_sub -F wrapper-<jobid>.sh
The script contents are described below:
#!/bin/bash | Run the script in BASH |
#SBATCH OPTION | SLURM options |
#SBATCH OPTION | |
module load <module name> | Load a Shell Module |
# Built by fsl_sub v.2.3.0 and fsl_sub_plugin_sge v.1.3.0 | Version of fsl_sub and plugin that submitted the job |
# Command line: <command line> | Command line that invoked fsl_sub |
# Submission time (H:M:S DD/MM/YYYY) <date/time> | Date and time that the job was submitted |
<command> | The submitted command(s), one per line |
<command> | |
PASSING ENVIRONMENT VARIABLES TO QUEUED JOBS
It is not possible to inherit all the environment variables from the shell that submits a job, so fsl_sub allows you to specify environment variables that should be transferred to the job. This can also be useful if you are scheduling many similar tasks and need to specify a different value for an environment variable for each run, for example SUBJECTS_DIR which FreeSurfer uses to specify where your data sets reside. The --export option is used for this purpose.
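For example, a sketch passing a per-run FreeSurfer subjects folder to a queued task (the path and script name are illustrative):
fsl_sub --export SUBJECTS_DIR=/path/to/subjects ./my_freesurfer_task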
SKIPPING COMMAND VALIDATION
By default fsl_sub will check that the command given (or each command listed in an array task file) can be found and is executable. If this causes issues, often because a particular program is only available on the compute nodes, not on jalapeno itself, then you can disable this check with -n (--novalidation).
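For example (the program name is illustrative):
fsl_sub -n ./node_only_program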
Requesting a specific resource
Some resources may have a limited quantity available for use, e.g. software licenses or RAM. fsl_sub has the ability to request these resources from the cluster (the --coprocessor options do this automatically to request the appropriate number of GPUs). The option -r (--resource) allows you to pass a resource string directly through to the Grid Engine software. If you need to do this, the computing help team or the software documentation will advise you of the exact string to pass.
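For example, a sketch passing a hypothetical licence resource string (the actual string is site-specific):
fsl_sub -r "matlab_licence=1" ./my_matlab_task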
How to submit pipeline stages such that they wait for their predecessor to complete
If you have a multi-stage task to run, you can submit the jobs all at once, specifying that later stages must wait for the previous task to complete. This is achieved by providing the '-j' (or --jobhold) option with the job id of the task to wait for. For example:
jid=$(fsl_sub -R 3 -T 16 ./my_first_stage)
fsl_sub -R 1 -T 8 -j $jid ./my_second_stage
Note the $() surrounding the first fsl_sub command; this captures the output of the command and stores the text in the variable 'jid'. This is then passed as the job id to wait for before running 'my_second_stage'.
It is also possible to submit array holds with the --array_hold command which takes the job id of the predecessor array task. This can only be used when both the first and subsequent job are both array tasks of the same size (same number of sub-tasks) and each sub-task in the second array depends only on the equivalent sub-task in the first array.
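For example, a sketch holding one array task on another of the same size (task file names illustrative):
jid=$(fsl_sub -t ./stage_one_tasks)
fsl_sub --array_hold $jid -t ./stage_two_tasks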
How to submit independent 'clone' tasks for running in parallel
An array task is a set of closely related tasks that do not rely on the output of any other members of the set. An example might be where you need to process each slice of a brain volume but there is no need to know or affect the content of any other slice (the array tasks can't communicate with each other to advise of changes to data). These tasks allow you to submit large numbers of discrete jobs and manage them under one job id, with each sub-task being allocated a unique task id and potentially able to run in parallel given enough compute slot availability.
You can submit an array task with the -t/--array_task option or with the --array_native option:
TEXT FILE ARRAY TASKS
The -t (or --array_task) option needs the name of a text file that contains the array task commands, one per line. Sub-tasks will be generated from these lines, with the task ID being equivalent to the line number in the file (starting from 1). e.g.
fsl_sub -R 12 -T 8 -t ./myparalleljobs
The array task has a parent job id which can be used to control/delete all of the sub-tasks; the sub-tasks may be specified as job id:sub-task id, e.g. '12345:10' for sub-task 10 of job 12345.
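For example, a task file with three sub-tasks, each running FSL's bet on a different subject (filenames illustrative), might contain:
bet sub01_T1 sub01_brain
bet sub02_T1 sub02_brain
bet sub03_T1 sub03_brain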
NATIVE ARRAY TASKS
The --array_native option requires an argument n[-m[:s]] which specifies the array:
- n provided alone will run the command n times in parallel
- n-m will run the command once for each number in the range with task ids equal to the position in this range
- n-m:s similarly, but with s specifying the increment in task id.
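For example, a sketch running a hypothetical script four times, with task ids 1, 3, 5 and 7:
fsl_sub --array_native 1-7:2 ./mytask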
The cluster software will set environment variables that the script/binary can use to determine which task it needs to carry out; for example, this might represent the brain volume slice to process. As these environment variables differ between different cluster software, fsl_sub sets several environment variables to the name of the environment variable the script can use to obtain its task id from the cluster software:
Environment variable | ...points to variable containing |
---|---|
FSLSUB_JOBID_VAR | job id |
FSLSUB_ARRAYTASKID_VAR | task id |
FSLSUB_ARRAYSTARTID_VAR | first task id |
FSLSUB_ARRAYENDID_VAR | last task id |
FSLSUB_ARRAYSTEPSIZE_VAR | step between task ids |
FSLSUB_ARRAYCOUNT_VAR | number of tasks in array (not supported in Grid Engine) |
To use these you need to look up the variable name and then read the value from the variable, for example in BASH use ${!FSLSUB_ARRAYTASKID_VAR} to get the value of the task id.
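For example, a minimal sketch of an array sub-task script where the task id selects the slice to process (the processing command is illustrative):
#!/bin/bash
# Look up the name of the scheduler's task id variable, then read its value
slice=${!FSLSUB_ARRAYTASKID_VAR}
./process_slice "$slice"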
Important The tasks must be truly independent - i.e. they must not write to the same file(s) or rely on calculations in other array jobs in this set, otherwise you may get unpredictable results (or sub-tasks may crash).
LIMITING CONCURRENT ARRAY TASKS
Sometimes it may be necessary to limit the number of array sub-tasks running at any one time. You can do this by providing the -x (or --array_limit) option, which takes an integer, e.g.:
fsl_sub -T 10 -x 10 -t ./myparalleljobs
This will limit the number of sub-tasks running at any one time to ten.
ARRAY TASKS WITH THE SHELL RUNNER
If running without a cluster backend, or when fsl_sub is called from within an already scheduled task, the shell backend is capable of running array tasks in parallel. When running as a cluster job, the shell plugin will run no more than the number of threads selected in your parallel environment (if one is specified; the default is one task at a time).
If you are not running on a cluster then by default fsl_sub will use all of the CPUs on your system. You can control this either with the -x|--array_limit option or by setting the environment variable FSLSUB_PARALLEL to the maximum number of array tasks to run at once. It is also possible to configure this in your own personal fsl_sub configuration file (see 'How to change fsl_sub's configuration for all jobs you run' above).
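For example, a sketch limiting the shell runner to four concurrent sub-tasks for a single run (the task file name is illustrative):
FSLSUB_PARALLEL=4 fsl_sub -t ./myparalleljobs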