Submitting Jobs
How to run tasks on the cluster queues
Auto-submitting software
Some FSL commands and/or GUIs automatically queue themselves where appropriate, i.e. you do not need to use 'fsl_sub' to submit these programs.
Please note that this list may not be exhaustive, so you may come across other commands which have been adapted to queue themselves. If you do submit one of these tools to the queues yourself it will still run, but it may not be able to make full use of the cluster resources (e.g. it may not be able to run multiple tasks in parallel).
Other commands run from the terminal command line will need to use the `fsl_sub` command, described below, to submit them to the queue.
module add fsl
This line can be added to your .bash_profile to ensure it takes effect for every login session you have.
Submitting jobs with fsl_sub
Typing fsl_sub before the rest of your command will send the job to the cluster. By default, this will submit your task to the short partition. fsl_sub can automatically choose a queue for you if you provide information about your job's requirements - we would strongly recommend that you provide at least an estimated maximum run time (--jobtime) to allow SLURM to schedule your job efficiently.
There are several ways to select a queue:
- Use the -R (--jobram) and -T (--jobtime) options to fsl_sub to specify the maximum memory and run-time requirements for your job (in GB and minutes of wall time*) respectively. fsl_sub will then select the most appropriate queue for you.
- GPU tasks can be requested using the --coprocessor options (see the Running GPU Tasks section).
- Select a specific partition with the -q (--queue) option. For further information on the available queues and which to use when, see the queues section.
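The queue-selection options above can be combined on the command line. A sketch of typical invocations follows; the script names, the partition name 'short' and the coprocessor name 'cuda' are illustrative and will depend on your site's configuration:

```shell
# Let fsl_sub pick a partition: 4 GiB of RAM, up to 90 minutes of wall time
fsl_sub -R 4 -T 90 ./myjob

# Request a GPU task ('cuda' is a common coprocessor name, but site-specific)
fsl_sub --coprocessor cuda -T 60 ./my_gpu_job

# Explicitly target a partition by name
fsl_sub -q short ./myjob
```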
- The command you want to run on the queues must be in your path - this does NOT include the current folder. If it isn't then you must specify the path to the command; commands/scripts in the current folder must be prefixed with './', e.g. './script'.
- The FMRIB SLURM cluster does not have a 'verylong' or 'bigmem' equivalent queue. See Long Running Tasks below.
- Jobs submitted to the FMRIB SLURM cluster do NOT inherit the 'environment' of your login shell, e.g. environment variables such as FSLDIR are not copied over to your job. Load software configuration (such as FSL) from shell modules or use the '--export' option to fsl_sub to copy the variables to your job (see Passing Environment Variables to Queued Jobs).
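The two approaches to providing your job's environment can be sketched as follows. The variable names and whether '--export' may be repeated are illustrative; check your fsl_sub version's --help output:

```shell
# Option 1: copy selected variables from your login shell into the job
fsl_sub --export FSLDIR --export FSLOUTPUTTYPE -T 60 ./myjob

# Option 2: have the job script load its own environment via modules,
# so nothing needs to be inherited from the login shell:
#   #!/bin/bash
#   module add fsl
#   feat mydesign.feat
```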
- Wall Time: Unlike the FMRIB Jalapeno cluster (which uses CPU time), the SLURM cluster measures job run-time in real time, often called wall time (as in the time on a clock on the wall).
To assess the time necessary for your job to complete you can look at the run-times of similar previous jobs using the 'sacct' command (see Monitoring Tasks).
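For example, sacct can report the elapsed wall time and peak memory of completed jobs, which is a good basis for setting --jobtime and --jobram on future submissions. The job ID and date below are placeholders:

```shell
# Elapsed wall time, peak resident memory and final state of one job
sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,State

# Summarise all of your jobs since a given date
sacct --starttime 2024-01-01 --format=JobID,JobName,Elapsed,State
```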
Example Usage
To queue a job which requires 10GiB of memory and runs for 2 hours use:
fsl_sub -T 120 -R 10 ./myjob
FSLSUB_MEMORY_REQUIRED=32G feat mydesign.feat
would submit a FEAT task informing fsl_sub that you expect to require 32GiB of memory. If units aren't specified then the integer is assumed to be in the units specified in the configuration file (default GiB).
The different partitions have different run-time and memory limits; when a task reaches these limits it will be terminated. Shorter queues also take scheduling precedence over longer ones, so it is advantageous to provide the scheduler with as much information as possible about your job's memory and time requirements.
The command you submit cannot run a graphical interface, as there will be nowhere to display its output.
If you want to run a non-interactive MATLAB task on the queues then see MATLAB jobs.
fsl_sub Options
To see a full list of the available options use:
fsl_sub --help
In addition to the list of options this will also display a list of partitions available for use with descriptions of allowed run times and memory availability. For details on how to use these options see the Advanced Usage section.
Long running tasks
Unlike the Jalapeno cluster, the SLURM cluster does not offer 'infinite' partitions (equivalent to verylong.q, bigmem.q and cuda.q on the Jalapeno cluster). You must break your task up into shorter components, or regularly save state to allow restart, and submit these parts (or resubmit the job continuing where it left off) using job holds to prevent a task running before the previous one completes.
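The chaining described above can be sketched as follows, assuming fsl_sub prints the submitted job's ID to standard output and accepts a -j (--jobhold) option to hold on a prior job; the script names are illustrative and the exact hold flag may vary with your fsl_sub version:

```shell
# Submit the first part and capture its job ID
jid=$(fsl_sub -T 600 ./part1)

# Submit the second part with a hold, so it only starts once part1 completes
fsl_sub -j "$jid" -T 600 ./part2
```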