Queued Analysis Tasks

How to submit jobs to the FMRIB cluster and how to monitor them

Introduction

WIN operate a compute cluster formed from rack-mounted multi-core computers. To ensure efficient use of the hardware, tasks are distributed amongst these computers using grid scheduling software. This software monitors the utilisation of the computers in the cluster, launching new jobs onto the least used computers, preventing overloading of machines whilst ensuring a fair share of compute resources amongst all users of the system. When you submit a job it will sit in a queue until the scheduler software identifies a viable empty slot and your job has reached the top of the queue. The fair share algorithm in use ensures that heavy users of the system are less likely to reach the top than users who rarely use the system (usage history is cleared on a regular basis so that you aren't deprioritised forever).

Grid Engine and the queues

WIN's cluster runs the Grid Engine (GE) queuing software (using the Son of Grid Engine distribution). To ease job submission we provide a helper called fsl_sub which sets some useful options to Grid Engine's built-in qsub command.

GE manages a set of queues representing the available resources. Tasks are submitted to GE queues for distribution across the execution hosts. These queues are designed to divide the resources according to usage profiles to ensure that the majority of tasks get done in a favourable time-frame (see Jalapeno Queues).

The Jalapeno Cluster

The jalapeno cluster consists of a main user-accessible computer (sometimes called a Head Node), 'jalapeno', and a farm of processing nodes which are not directly visible. 'Jalapeno' itself should only be used to submit tasks and view results. All other tasks must be run via the job submission system described here; in this way jobs get shared among the available processing nodes.

Any non-trivial jobs found running on Jalapeno will be killed without warning if they impact on others' use of the computer.

Using the FMRIB Cluster

Submitting a job to the FMRIB cluster

Some FSL commands and/or GUIs automatically queue themselves where appropriate, i.e., you do not need to use fsl_sub to submit these programs. For a list of these FSL programs see FSL's auto-submission chart.

Please note that this list may not be exhaustive, so you may come across more commands which have been adapted to queue themselves. If you do submit one of these tools to the queues then they will still run, but may not be able to make full use of the cluster resources (e.g. they may not be able to run multiple tasks in parallel - see advanced usage).

Any other commands run from the terminal command line can be scheduled for running on the cluster using the fsl_sub command, described below. You are free to use the underlying cluster submission tools directly; fsl_sub is an easy-to-use wrapper around these tools which aims to provide a submission method that is independent of the cluster software.

Submitting jobs with fsl_sub

Typing fsl_sub before the rest of your command will send the job to the queues. By default, this will submit your task to the queue long.q. If your job requires less time than this, or needs different features or more memory, then you will need to select a more appropriate queue. There are two ways to do this:

Use the -R (--jobram) and -T (--jobtime) options to fsl_sub to specify the maximum memory and run-time requirements for your job (in GB and minutes of CPU time respectively; see Specifying CPU Time below). fsl_sub will then select the most appropriate queue for you. GPU tasks can be requested using the --coprocessor options (see the Running GPU Tasks section below).
Specify a specific queue with the -q (--queue) option. For further information on the available queues and which to use when, see the queues section.

NB The command you want to run on the queues must be in your program search path (e.g. the list of folders in the variable $PATH) - this does NOT include the current folder. For non-pathed locations you must specify the full filesystem path to the command; commands/scripts in the current folder must be prefixed with './', e.g. './script'.
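As a rough illustration of the rule above, the following sketch works out the name a command should be submitted under. The function name submission_name is hypothetical and is not part of fsl_sub; it only prints the name and submits nothing.

```shell
# Print the form of a command name that is safe to pass to fsl_sub.
# Note: 'command -v' also reports shell builtins, functions and aliases,
# so treat the answer as a hint rather than a guarantee.
submission_name() {
    cmd=$1
    case "$cmd" in
        */*) echo "$cmd"; return 0 ;;       # already given as a path
    esac
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd"                         # found on $PATH
    elif [ -x "./$cmd" ]; then
        echo "./$cmd"                       # only in the current folder
    else
        echo "cannot find $cmd" >&2
        return 1
    fi
}

submission_name ls
```

A script that exists only in the current folder comes back prefixed with './', which is the form the queues need.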

Specifying CPU Time

The FMRIB cluster is configured to limit job run time based on the time your task spends running code, not the actual time that has passed since the job started. In all cases CPU time is less than actual time and will differ between cluster nodes due to differences in hardware generations and the number of jobs running concurrently. We recommend you overestimate by around 20%. You can use the 'qacct' command to get run-time information for previously completed jobs to assist with this (see Monitoring Tasks).

For example, to queue a job which requires 10GiB of memory and runs for 2 hours, use:

fsl_sub -T 120 -R 10 ./myjob

This will result in a job being put on the short.q.
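The 20% safety margin recommended above can be applied with a little shell arithmetic. This is a minimal sketch: pad_minutes is a hypothetical helper, and the command is only printed, not submitted.

```shell
# Round a CPU-time estimate (in minutes) up by 20% for use with fsl_sub -T.
pad_minutes() {
    est=$1
    # Integer arithmetic: multiply by 1.2 and round up to a whole minute.
    echo $(( (est * 120 + 99) / 100 ))
}

# Example: a job measured at 100 minutes of CPU time, needing 10 GB of RAM.
# Remove the leading 'echo' to actually submit the job.
echo fsl_sub -T "$(pad_minutes 100)" -R 10 ./myjob
```

With an estimate of 100 minutes this prints the same command as the example above, fsl_sub -T 120 -R 10 ./myjob.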

Requesting memory for automatically queued jobs

If your software task automatically queues then you can also specify the memory you expect the task to require with the environment variable FSLSUB_MEMORY_REQUIRED, for example:

FSLSUB_MEMORY_REQUIRED=32G feat mydesign.feat

would submit a FEAT task informing fsl_sub that you expect to require 32GiB of memory. If units aren't specified then the integer is assumed to be in the units specified in the configuration file (default GiB).

If your task requires more than the allocated RAM for a particular queue then fsl_sub will automatically request a parallel environment with sufficient slots to accommodate the required RAM. This has the side effect of providing additional CPU cores to your job. If the software supports 'thread' parallelism then you may find your job runs faster.

For very large memory tasks this might result in scheduling difficulties if you also have a long run-time. On the jalapeno cluster in these scenarios you may need to manually select the bigmem.q. At this time, if your job would ordinarily need to run on the verylong.q and requires 12GB or more of memory then please do not specify RAM/time and instead manually select the bigmem.q; otherwise your task may take several weeks to schedule.

The different queues have different run-time and memory limits; when a task reaches these limits it will be terminated. Shorter queues also take precedence over the longer ones. Given this, you should choose queues carefully to ensure your job is allowed to complete and does so in a timely manner.

The command you submit cannot run any graphical interface, as it will have nowhere to display its output. For most tasks this is not a problem, but some programs (in particular some MATLAB tasks) insist on displaying a progress bar or similar graphical output. In these cases, we provide a virtual X11 display system which can be used to dispose of this unnecessary output. If you want to run a non-interactive MATLAB task on the queues then see the MATLAB section.
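To get a feel for how many slots a large memory request might translate into, the sketch below divides the requested RAM by a per-slot limit and rounds up. Both the helper name slots_needed and the rounding rule are assumptions for illustration; the exact calculation fsl_sub performs may differ, and the per-slot limit should be taken from the output of fsl_sub --help.

```shell
# Number of slots needed so that slots x per-slot RAM covers the request.
# Arguments: required RAM in GB, per-slot RAM limit in GB (whole numbers).
slots_needed() {
    required_gb=$1
    per_slot_gb=$2
    echo $(( (required_gb + per_slot_gb - 1) / per_slot_gb ))
}

# Example: 32 GB requested on a queue assumed to allow 16 GB per slot.
slots_needed 32 16
```

Every extra slot must be free on a single host at the same time, which is why large requests combined with long run-times can be slow to schedule.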

FSL_SUB OPTIONS

To see a full list of the available options use:

fsl_sub --help

In addition to the list of options this will also display a list of cluster queues available for use with descriptions of allowed runtimes and memory/parallel environment availability. For details on how to use these options see the Advanced Usage section.

LONG RUNNING TASKS

Whilst we provide queues with infinite run times (verylong.q, bigmem.q and the cuda.q queues) we strongly recommend that you attempt to break your task up into shorter components where possible - there are many more slots on the shorter queues and tasks running for many weeks or months are at risk of loss due to power cuts or server faults. Where chunking the analysis is not possible you should investigate whether it is possible to save job state at regular points (often called checkpointing) in such a way that the job can be restarted at a checkpoint without losing work carried out to that point. If the program supports this behaviour then you could submit several runs to finite queues with job holds in place to allow the job to run to completion with regular restarts.
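A chain of restarts with job holds might look like the sketch below. It assumes that fsl_sub prints the new job id on submission and that its -j option holds a job until the given job id has finished; ./resume_from_checkpoint.sh is a hypothetical script that continues from the latest checkpoint, and print_chain is a hypothetical helper. The commands are printed rather than run so that you can review them first.

```shell
# Print a chain of submissions in which each run waits for the previous one.
# Arguments: number of runs, command to submit.
print_chain() {
    runs=$1
    cmd=$2
    i=1
    while [ "$i" -le "$runs" ]; do
        if [ "$i" -eq 1 ]; then
            echo "jid=\$(fsl_sub -q long.q $cmd)"
        else
            # -j holds this run until job $jid has finished
            echo "jid=\$(fsl_sub -q long.q -j \$jid $cmd)"
        fi
        i=$((i + 1))
    done
}

print_chain 3 ./resume_from_checkpoint.sh
```

For this to work the script should exit immediately when it finds the work already complete, so that any surplus runs in the chain finish quickly.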

How to check on the progress of your jobs

Monitoring tasks

The Grid Engine software provides the qstat command for listing the state of all the queues, by default only listing jobs that you have submitted. Use:

qstat -help

for a list of all the options. If you want to see the overall state of the queues, including everyone else's tasks, then use:

qstat -u \*

INTERPRETING QSTAT'S OUTPUT

The qstat listing has several columns indicating the jobid, priority, owner, taskname, status and which queue the task was submitted to or is running on. The most important details are described below:

Priority - A floating point number in the second column of output indicates the relative priority of each task. The task with the highest priority, shown at the top of the pending list, will be the next one to get access to any available resources.
State - A string of characters EhqwRrdTsS indicating the following conditions:

State characters	Meaning
E	The task is in the Error state. Contact IT Support Staff for help.
h	Job is *held* until completion of some other task. Use qstat -j <jobid> to find out which task(s) it is dependent on.
qw	Job is in the *queued and waiting* or *pending* state. This task will be submitted to an execution host as soon as one becomes available and the task priority is the highest of those in the pending state.
r	This task is *running*. An extra field indicates which actual execution host the task is running on, e.g. long.q@jalapeno01.cluster.fmrib.ox.ac.uk etc.
R	*Re-scheduled*. This will usually mean one of the operators has restarted the task, perhaps because a node crashed. Contact IT Support Staff for a full explanation. For some jobs this will result in failure or corrupted output so please check the output of these tasks carefully.
d	*Deleted*. This job has been scheduled for deletion.
dr	*Deleted but still running*. This happens when a job is deleted but the node which was running the task isn't responding to the request to remove the task. You should contact computing-help@win.ox.ac.uk.
s/S/T	*Suspended*. This job has been temporarily suspended, probably because a machine has become overloaded with higher priority tasks. It will resume once the load reduces sufficiently. s is a job suspend, S is a queue suspend, and T is a task suspended because the node is overloaded.
t	Transitioning. This job is starting.

EXAMINING COMPLETED JOBS

Once a job completes, qstat will no longer be able to find the job id. You can now query the cluster software using the qacct command.

N.B. Because of the number of jobs submitted to our cluster, the database of completed jobs is regularly rotated for performance reasons. We provide a command, qacct-all, which calls qacct on all the archived job databases; try this if qacct does not return information on your job.

qacct takes several options but the most useful one is '-j <jobid>' which returns information on the provided job id. Of the information this command provides the most useful entries are:

Entry	Purpose
qname	Name of queue this ran on
hostname	Name of node job ran on - useful for IT Help to troubleshoot issues
start/end_time	Start and end time (real) - useful if there was a known issue at a particular time
slots	How many parallel environment slots/threads your job had
failed & exit_status	Whether the cluster software thinks the job failed, and the exit status of the job. N.B. A job can fail in many ways that the cluster is not aware of, so a clean result here is not necessarily proof that the job completed successfully
ru_wallclock	Real time run-time of job
cpu	CPU time of job (seconds)
maxvmem	Maximum memory job required (units given)
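The entries in the table can be picked out of qacct's output with a small filter. This is a sketch only: qacct_summary is a hypothetical helper, it assumes qacct prints one entry per line with the entry name first, and the sample lines fed to it are invented for illustration.

```shell
# Keep only the most useful entries from 'qacct -j <jobid>' output (on stdin).
qacct_summary() {
    awk '$1 ~ /^(qname|hostname|slots|failed|exit_status|ru_wallclock|cpu|maxvmem)$/ { print }'
}

# Real use (on the cluster):  qacct -j 12345 | qacct_summary
# The lines below are a made-up sample, not output from a real job.
printf '%s\n' \
    'qname        short.q' \
    'owner        apple' \
    'exit_status  0' \
    'cpu          5400.000' \
    'maxvmem      2.500G' | qacct_summary
```

The cpu and maxvmem values of a finished job are a good starting point for the -T and -R options of the next similar job.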

QSTAT EXAMPLES

3 joebloggs@jalapeno $ qstat -u \*
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
28704 0.55002 Connectivi moonunit r 04/23/2006 08:28:23 long.q@jalapeno10.cluster.fmrib.ox.ac.uk 1
28714 0.55002 feat_aoEky apple r 04/23/2006 05:25:23 long.q@jalapeno23.cluster.fmrib.ox.ac.uk 1
23668 0.55013 feat_aftwD heavenly dr 06/22/2005 22:28:16 long.q@jalapeno23.cluster.fmrib.ox.ac.uk 1
28706 0.55002 Connectivi moonunit R 04/21/2006 20:19:35 long.q@jalapeno01.cluster.fmrib.ox.ac.uk 1
28707 0.55002 Connectivi moonunit qw 04/21/2006 20:19:41 1
28673 0.55002 bedpost brooklyn qw 04/20/2006 15:49:43 11-42:1
27378 0.00000 STDIN geronimo hqw 02/10/2006 16:53:47 1
28544 0.00000 Franklin fuschia Eqw 04/06/2006 20:08:14 1
28674 0.00000 bp_postpro brooklyn hqw 04/20/2006 15:49:43 1

TO SEE A LISTING OF JUST THE "RUNNING" JOBS:

4 joebloggs@jalapeno $ qstat -u \* -s r
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
28704 0.55002 Connectivi moonunit r 04/23/2006 08:28:23 long.q@jalapeno10.cluster.fmrib.ox.ac.uk 1
28714 0.55002 feat_aoEky apple r 04/23/2006 05:25:23 long.q@jalapeno23.cluster.fmrib.ox.ac.uk 1
28706 0.55002 Connectivi moonunit R 04/21/2006 20:19:35 long.q@jalapeno01.cluster.fmrib.ox.ac.uk 1

TO SEE THE STATE OF ONLY THOSE JOBS BELONGING TO A PARTICULAR USER:

5 joebloggs@jalapeno $ qstat -u apple
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
28714 0.55002 feat_aoEky apple r 04/23/2006 05:25:23 long.q@jalapeno01.cluster.fmrib.ox.ac.uk 1
28721 0.55002 feat_afpcn apple r 04/23/2006 18:42:53 long.q@jalapeno1.cluster.fmrib.ox.ac.uk 1

TO SEE THE FULL STATE OF A PARTICULAR TASK:

qstat -j 14000
==============================================================
job_number: 14000
exec_file: job_scripts/14000
submission_time: Thu Apr 27 10:09:43 2006
owner: apple
uid: 123456
group: apple-group
gid: 987654
sge_o_home: /home/apple
sge_o_log_name: dcm
sge_o_path: /tmp/13999.1.short.q:/home/apple/bin:.....
sge_o_shell: /bin/bash
sge_o_workdir: /vols/Scratch/apple
sge_o_host: jalapeno02
account: sge
cwd: /vols/Scratch/apple
path_aliases: /private/automount/ * * /
stderr_path_list: /home/apple/.fsltmp
mail_list: apple@jalapeno02.cluster.fmrib.ox.ac.uk
notify: FALSE
job_name: feat
stdout_path_list: /home/apple/.fsltmp
jobshare: 0
hard_queue_list: long.q
env_list: MANPATH=/home/apple/man:/opt/sge/.....
job_args: /home/apple/.fsltmp/feat_4HlloR,1
script_file: /opt/fmrib/fsl/bin/feat
usage 1: cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
scheduling info:
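When the queues are busy it can help to summarise a qstat listing by job state. The sketch below is one way to do this, assuming the column layout shown in the listings above (two header lines, state in the fifth column); count_states is a hypothetical helper. Three lines from the first listing are replayed so that the example runs without access to the cluster.

```shell
# Count jobs per state from 'qstat' output read on stdin.
# Skips the two header lines; the state is the fifth column.
count_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Real use (on the cluster):  qstat -u '*' | count_states
count_states <<'EOF'
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------
28704 0.55002 Connectivi moonunit r 04/23/2006 08:28:23 long.q@jalapeno10.cluster.fmrib.ox.ac.uk 1
28707 0.55002 Connectivi moonunit qw 04/21/2006 20:19:41 1
28673 0.55002 bedpost brooklyn qw 04/20/2006 15:49:43 11-42:1
EOF
```

Job names that contain spaces would shift the columns, so treat the counts as a quick overview rather than an exact report.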

AVAILABLE QUEUES

WHAT QUEUES DOES THE JALAPENO CLUSTER PROVIDE AND WHAT ARE THEY FOR?

The jalapeno cluster provides four primary queues, veryshort.q, short.q, long.q and verylong.q, and several special purpose queues, including bigmem.q and interactive.q. By choosing the most appropriate queue you can gain access to more resources, so it pays to think about the right queue to use. Further, if you choose a queue that has resource limits and your job exceeds them (time or memory) then your task will be killed, wasting compute time.

NB To select a particular queue using the fsl_sub command use the -q <queue-name> option or use the -T and -R or --coprocessor* options to automatically select the queue to use.

NB The time limits we specify below refer to CPU time - this is NOT real-time. Because the compute cluster is shared, a job often gets a fraction of the available time on the CPU so a job that actually takes 1 hour to run may only have used 25 minutes of CPU time.

QUEUE DETAILS

veryshort.q - 30 minutes max CPU Provides a set of slots for very quick tasks. Provides plenty of highly available compute power on the cluster. Use these as much as possible to get your jobs off the shared login servers. RAM usage limited to 12GB.
short.q - 4 hrs max CPU Run brief tasks, i.e. less than 4 hours CPU run time, on this queue. The short queues take precedence over all other queues so if your task fits on this queue it would be in your best interests to run it here. RAM usage limited to 12GB.
long.q - 48 hrs max CPU The long.q is the default queue. Tasks can run for a maximum of 48 hours CPU time (see the note on CPU time above). RAM usage limited to 12GB. Most of the FSL software runs in this sort of time frame with the possible exception of group FEAT tasks. In this case the FEAT scripts have been written to ensure the right queues get used.
verylong.q - unlimited CPU time at low priority An unlimited duration queue with the lowest priority. Tasks which will take longer than 48 hours must be run here. These tasks get the lowest priority under the assumption that there will be plenty of spare CPU (esp. overnight) to ensure they run in a sensible time frame. RAM usage limited to 12GB.
bigmem.q - targets machines with larger memory capabilities The bigmem.q is for running large memory footprint tasks. It targets machines with large amounts of RAM and should be picked if you feel your analysis task is going to need unusual amounts of RAM. Currently there isn't a simple way of determining the memory footprint in advance, so you'll have to learn the hard way, i.e. by finding that jobs run out of memory on the other queues. Please seek assistance if this is the case.
interactive.q - interactive tasks Where you just can't run a task without interaction, for example you have to press a start button in a window, we offer an interactive queue. This cannot be used as a fsl_sub target. See interactive queue for further details.
cuda.q - targets machines with NVIDIA GPU hardware. Use the --coprocessor options to configure this resource (see GPU tasks in the Advanced Usage section). This queue has no limits but please limit long running tasks as this is a significantly more restricted resource.
lcmodel - targets the host with the LCModel spectroscopy software installed.

How to request and use an interactive queued session

Where your program requires interaction we offer an interactive queue which can be used to get a terminal session on one of the cluster nodes.

Request & Warning: You MUST log out AS SOON AS you finish using the interactive session as this uses up a slot on the cluster and so prevents other users from using the cluster.

To request a terminal session, issue the following command on jalapeno.fmrib.ox.ac.uk:

qlogin -q interactive.q

There may be a delay whilst the system finds a suitable host. Once one becomes available, if this is the first time you have logged into a particular node you may be asked to accept the host key. Enter `yes` to accept this host key and then you will be presented with a terminal session.

At this point (assuming you enabled X11 tunnelling to jalapeno.fmrib.ox.ac.uk) you should be able to run graphical X11 programs as well as terminal commands.

What queues are available and what to use them for

Available Queues

N.B. To select a particular queue using the fsl_sub command use the -q <queue-name> option or use the -T and -R or --coprocessor* options to automatically select the queue to use.

N.B. The time limits we specify below refer to CPU time - this is NOT real-time. Because the compute cluster is shared, a job often gets a fraction of the available time on the CPU so a job that actually takes 1 hour to run may only have used 25 minutes of CPU time.

Queue	Max Runtime	Max RAM (GB)	Usage
veryshort.q	30 mins	16	Very quick tasks. Largest number of slots. Use these as much as possible to get your jobs off the shared login servers
short.q	4h	16	Brief tasks. The short/veryshort queues take precedence over all other queues so if your task fits on this queue it would be in your best interests to run it here
long.q	48h	16	The default queue. Tasks can run for a maximum of 48 hours CPU time. Most of the FSL software runs in this sort of time frame with the possible exception of some large group FEAT tasks.
verylong.q	infinite	12	Lowest priority and limited number of slots. Tasks which will take longer than 48 hours must be run here. These tasks get the lowest priority under the assumption that there will be plenty of spare CPU (esp. overnight) to ensure they run in a sensible time frame.
bigmem.q	infinite	~300	The bigmem.q is for running large memory footprint tasks. There are very few of these slots, which may use any available RAM on a host. Avoid unless necessary, and please seek assistance before using.
cuda.q	infinite	~200	Targets machines with NVIDIA GPU hardware. Use the --coprocessor options to configure this resource (see GPU tasks in the Advanced Usage section). This queue has no limits but please limit long running tasks as this is a significantly more restricted resource.
interactive.q	infinite	16	Where you just can't run a task without interaction, for example you have to press a start button in a window, then we offer an interactive queue. This cannot be used as a fsl_sub target. See interactive queue for further details.
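The run-time limits in the table can be turned into a quick rule of thumb for choosing a queue by hand. The sketch below looks at CPU time only and ignores memory, GPU and interactive needs; suggest_queue is a hypothetical helper and is not how fsl_sub itself selects queues.

```shell
# Suggest a queue from a CPU-time estimate in minutes, using the
# run-time limits from the table above (30 mins, 4h, 48h, unlimited).
suggest_queue() {
    minutes=$1
    if   [ "$minutes" -le 30 ];   then echo veryshort.q
    elif [ "$minutes" -le 240 ];  then echo short.q
    elif [ "$minutes" -le 2880 ]; then echo long.q
    else                               echo verylong.q
    fi
}

# Example: a two hour job fits on the short.q.
suggest_queue 120
```

Pass in an estimate that already includes a safety margin, since a job that reaches the queue's limit is terminated.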

More advanced techniques for submitting jobs, e.g. GPU, array and MATLAB tasks and full fsl_sub usage information

If your task comprises a complicated pipeline of interconnected tasks there are several options for splitting it into dependent tasks or parallelising independent portions across many cluster nodes. Information on these techniques and other advanced options is in this section.

Cluster advanced usage

How to terminate jobs and solve submission/runtime problems

Occasionally tasks will fail. Grid Engine provides some tools and logging information for debugging such tasks.

When tasks begin running they generate two files, jobname.ojobid (e.g. feat.o12345) (referred to as the .o file) and jobname.ejobid (referred to as the .e file), which by default are created in the folder from which fsl_sub was run. The .o file contains any text that the program writes to the console whilst running, for example:

fsl_sub ls my_folder

outputs the job id 12345. The task would generate a file ls.o12345 containing the folder listing for my_folder. If your command produces a running commentary of its progress you could monitor this with the tail command:

tail -f command.o12345

This will continue displaying the contents of command.o12345, adding new content as it arrives until you exit (type CTRL-c). The .e file contains any error messages the command produced. If you still need help then please contact the IT Team.
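A first check on a finished job is whether its .e file contains anything. The sketch below follows the jobname.ojobid / jobname.ejobid naming described above; check_job_logs is a hypothetical helper and the folder argument defaults to the current folder.

```shell
# Report on the .o and .e files of a job.
# Arguments: job name, job id, and (optionally) the folder fsl_sub was run from.
check_job_logs() {
    name=$1
    jobid=$2
    dir=${3:-.}
    out="$dir/$name.o$jobid"
    err="$dir/$name.e$jobid"
    if [ -s "$err" ]; then
        # The .e file is not empty: show its last few lines.
        echo "errors reported in $err:"
        tail -n 5 "$err"
    elif [ -e "$out" ]; then
        echo "no error output; see $out"
    else
        echo "no log files found for $name job $jobid in $dir"
    fi
}

check_job_logs ls 12345
```

An empty .e file does not prove that the job succeeded, as some programs report errors in the .o file instead, so check both.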

KILLING JOBS

If, after submitting a job, you realise that you have made a mistake or that the job can't complete (perhaps you have insufficient disk space), you can kill the job with the command:

qdel job_id

If the job is currently running there may be a short pause whilst the task is terminated.
