Submitting A Job

Submitting a job to the FMRIB cluster

Some FSL commands and GUIs queue themselves automatically where appropriate, so you do not need to use fsl_sub to submit them. For a list of these programs, see FSL's auto-submission chart.

Please note that this list may not be exhaustive, so you may come across other commands which have been adapted to queue themselves. If you do submit one of these tools to the queues it will still run, but it may not be able to make full use of the cluster resources (e.g. it may not be able to run multiple tasks in parallel; see advanced usage).

Any other command run from the terminal command line can be scheduled to run on the cluster using the fsl_sub command, described below. You are of course free to use the underlying cluster submission tools directly; fsl_sub is an easy-to-use wrapper around these tools which aims to provide a submission method that does not depend on the particular cluster software.

Submitting jobs with fsl_sub

Typing fsl_sub before the rest of your command will send the job to the queues. By default, this will submit your task to the queue long.q. If your job requires less time than this, or needs different features or more memory, you will need to select a more appropriate queue. There are two ways to do this:

Use the -R (--jobram) and -T (--jobtime) options to fsl_sub to specify the maximum memory and run-time requirements for your job (in GB and minutes of CPU time*) respectively. fsl_sub will then select the most appropriate queue for you. GPU tasks can be requested using the --coprocessor options (see the Running GPU Tasks section below).
Select a particular queue with the -q (--queue) option. For further information on the available queues and when to use each one, see the queues section.
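
As a rough illustration of the two approaches, the sketch below builds both forms of the command. The submit wrapper and its RUN variable are hypothetical conveniences, added only so the example can be tried without submitting anything. The resource values are arbitrary, and short.q is used simply because it is one of the queues named on this page.

```shell
# Hypothetical dry-run wrapper: prints the fsl_sub command line unless RUN=1,
# in which case it calls the real fsl_sub.
submit() {
    if [ "${RUN:-0}" = "1" ]; then
        fsl_sub "$@"
    else
        echo "fsl_sub $*"
    fi
}

# Way 1: describe the job (30 minutes of CPU time, 4 GB of memory)
# and let fsl_sub pick the queue.
by_resources=$(submit -T 30 -R 4 ./myjob)

# Way 2: name the queue yourself.
by_queue=$(submit -q short.q ./myjob)

echo "$by_resources"
echo "$by_queue"
```

Setting RUN=1 before running the sketch on a submission host would hand the same options to fsl_sub itself.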

NB The command you want to run on the queues must be in your program search path (i.e. the list of folders in the variable $PATH); this does NOT include the current folder. For locations outside the search path you must specify the full filesystem path to the command. Commands/scripts in the current folder must be prefixed with './', e.g. ./script.
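
The rule above can be checked before submitting. The following is a minimal sketch, not part of fsl_sub: queue_name_for is a hypothetical helper that prints how a command should be written on the fsl_sub line, treating an absolute path reported by command -v as "found in $PATH".

```shell
# Hypothetical helper: print the form of a command to give to fsl_sub.
queue_name_for() {
    case "$1" in
        */*)
            # Already contains a path component (e.g. ./script or /opt/bin/tool).
            printf '%s\n' "$1" ;;
        *)
            case "$(command -v "$1" 2>/dev/null)" in
                /*)
                    # Resolved to an absolute path, so it is in the search path.
                    printf '%s\n' "$1" ;;
                *)
                    if [ -x "./$1" ]; then
                        # Executable in the current folder: needs the ./ prefix.
                        printf './%s\n' "$1"
                    else
                        echo "cannot find $1 in \$PATH or the current folder" >&2
                        return 1
                    fi ;;
            esac ;;
    esac
}
```

For example, fsl_sub "$(queue_name_for myjob)" would submit ./myjob when myjob exists only in the current folder.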

Specifying CPU Time

The FMRIB cluster is configured to limit job run time based on the time your task spends running code, not the actual time that has passed since the job started. In all cases CPU time is less than actual time and will differ between cluster nodes due to differences in hardware generations and the number of jobs running concurrently. We recommend you overestimate by around 20%. You can use the 'qacct' command to get run-time information for previously completed jobs to assist with this (see Monitoring Tasks).
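
The 20% margin is simple arithmetic, sketched below. Both helper names are hypothetical. cpu_minutes assumes that the output of qacct -j for a single-task job contains a line whose first field is cpu and whose second field is the CPU time in seconds; check this against the output on your system before relying on it.

```shell
# Pad a CPU-time estimate (whole minutes) by 20%, rounding up to a whole minute.
pad_minutes() {
    echo $(( ($1 * 120 + 99) / 100 ))
}

# Read `qacct -j <jobid>` output on standard input and print the CPU time
# of the first "cpu" line in whole minutes, rounded up.
# Assumption: that field is reported in seconds.
cpu_minutes() {
    awk '$1 == "cpu" { m = int($2 / 60); if (m * 60 < $2) m++; print m; exit }'
}
```

With these, a previous run that used 100 minutes of CPU time would be resubmitted with fsl_sub -T "$(pad_minutes 100)" -R 10 ./myjob, i.e. a request for 120 minutes.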

For example, to queue a job which requires 10GiB of memory and runs for 2 hours, use:

fsl_sub -T 120 -R 10 ./myjob

This will result in a job being put on the short.q.

Requesting memory for automatically queued jobs

If your software task automatically queues then you can also specify the memory you expect the task to require with the environment variable FSLSUB_MEMORY_REQUIRED, for example:

FSLSUB_MEMORY_REQUIRED=32G feat mydesign.feat

would submit a FEAT task informing fsl_sub that you expect to require 32GiB of memory. If units aren't specified then the integer is assumed to be in the units specified in the configuration file (default GiB).
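
Whether the variable applies to one command or to the whole session depends on how it is set, which is ordinary shell behaviour rather than anything specific to fsl_sub. The sketch below uses sh -c as a stand-in for an automatically queued tool, so that it can be run anywhere; the 16G value is arbitrary.

```shell
unset FSLSUB_MEMORY_REQUIRED

# Prefix form: the variable is visible only to this one command.
one_off=$(FSLSUB_MEMORY_REQUIRED=32G sh -c 'echo "$FSLSUB_MEMORY_REQUIRED"')
after=${FSLSUB_MEMORY_REQUIRED:-unset}

# Exported form: every later command in this shell session sees it.
export FSLSUB_MEMORY_REQUIRED=16G
session=$(sh -c 'echo "$FSLSUB_MEMORY_REQUIRED"')

echo "one-off command saw: $one_off"
echo "shell afterwards:    $after"
echo "exported value seen: $session"
```

If you export the variable, remember to unset it before running tasks with different memory needs.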

If your task requires more than the allocated RAM for a particular queue then fsl_sub will automatically request a parallel environment with sufficient slots to accommodate the required RAM. This has the side effect of providing additional CPU cores to your job. If the software supports 'thread' parallelism then you may find your job runs faster.
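
The number of slots this implies can be estimated with a ceiling division. This is only a sketch of the arithmetic: the exact rule fsl_sub applies is not stated here, slots_needed is a hypothetical helper, and the per-slot figure used in the example is made up.

```shell
# Estimate the slots needed to cover a memory request.
# $1 = memory required (GB), $2 = memory available per slot (GB).
slots_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example with an assumed 16 GB per slot: a 40 GB job would need 3 slots.
slots_needed 40 16
```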

For very large memory tasks this might result in scheduling difficulties if you also have a long run-time. On the jalapeno cluster you may need to manually select the bigmem.q in these scenarios. At this time, if your job would ordinarily need to run on the verylong.q and requires 12GB or more of memory, please do not specify RAM/time; instead, manually select the bigmem.q, otherwise your task may take several weeks to schedule.
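
The rule for large, long-running jobs can be written down as a small decision helper. submit_options is hypothetical and only encodes the guidance given here; it prints the options to pass to fsl_sub rather than submitting anything.

```shell
# Print fsl_sub options for a job.
# $1 = "yes" if the job would ordinarily need verylong.q, otherwise "no"
# $2 = memory required in GB
# $3 = CPU time required in minutes
submit_options() {
    if [ "$1" = "yes" ] && [ "$2" -ge 12 ]; then
        # Long run-time and 12GB or more: name bigmem.q, omit RAM/time.
        echo "-q bigmem.q"
    else
        # Otherwise describe the job and let fsl_sub choose the queue.
        echo "-T $3 -R $2"
    fi
}

submit_options yes 16 20000
submit_options no 16 120
```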

The different queues have different run-time and memory limits, and a task that reaches these limits will be terminated. Shorter queues also take precedence over longer ones. Given this, you should choose queues carefully to ensure your job is allowed to complete and does so in a timely manner.

The command you submit cannot run any graphical interface, as it will have nowhere to display the output. For most tasks this is not a problem, but some programs (in particular some MATLAB tasks) insist on displaying a progress bar or similar graphical output. In these cases, we provide a virtual X11 display system which can be used to dispose of this unnecessary output. If you want to run a non-interactive MATLAB task on the queues then see the MATLAB section.

FSL_SUB OPTIONS

To see a full list of the available options use:

fsl_sub --help

In addition to the list of options this will also display a list of cluster queues available for use with descriptions of allowed runtimes and memory/parallel environment availability. For details on how to use these options see the Advanced Usage section.

LONG RUNNING TASKS

Whilst we provide queues with infinite run times (verylong.q, bigmem.q and the cuda.q queues), we strongly recommend that you break your task up into shorter components where possible: there are many more slots on the shorter queues, and tasks running for many weeks or months are at risk of loss due to power cuts or server faults. Where chunking the analysis is not possible, you should investigate whether the program can save job state at regular points (often called checkpointing) so that the job can be restarted from a checkpoint without losing the work carried out up to that point. If the program supports this behaviour then you could submit several runs to finite queues, with job holds in place, so that the job runs to completion through regular restarts.
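
A chain of held runs might be set up along the following lines. This is a sketch under several assumptions: that fsl_sub prints the new job's ID on standard output, that the -j (--jobhold) option makes a job wait for the given job ID (check fsl_sub --help), and that your program resumes from its latest checkpoint when restarted. The --resume flag on ./myjob is purely illustrative, and fake_sub is a stand-in so the logic can be tried without a cluster.

```shell
# submit_chain SUBMITTER RUNS CMD...
# Submit CMD to long.q RUNS times, each run held until the previous one ends.
# Prints the ID of the last job in the chain.
submit_chain() {
    submitter=$1
    runs=$2
    shift 2
    prev=""
    i=1
    while [ "$i" -le "$runs" ]; do
        if [ -z "$prev" ]; then
            prev=$("$submitter" -q long.q "$@")
        else
            prev=$("$submitter" -q long.q -j "$prev" "$@")
        fi
        i=$((i + 1))
    done
    echo "$prev"
}

# Stand-in for fsl_sub: reports the command on stderr and prints a fake job ID
# (one more than the held job's ID, or 1001 for the first job).
fake_sub() {
    echo "would run: fsl_sub $*" >&2
    last=1000
    while [ $# -gt 0 ]; do
        [ "$1" = "-j" ] && last=$2
        shift
    done
    echo $((last + 1))
}

submit_chain fake_sub 3 ./myjob --resume
```

On the cluster you would pass fsl_sub in place of fake_sub. A job hold releases the next run once the previous one has finished, so the program should exit promptly if it finds the work is already complete.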