Other potentially useful submission options or techniques
Capturing job submission information
fsl_sub can store the commands used to submit the job if you provide the option --keep_jobscript. After submission you will find a script called wrapper-<jobid>.sh in the current folder (assuming you have write permission there). The exact submission can be repeated with:
fsl_sub -F wrapper-<jobid>.sh
The script contents are described below:
|#!/bin/bash||Run the script in BASH|
|#$ GRID ENGINE OPTION||Grid Engine options (one such line per option)|
|module load <module name>||Load a Shell Module|
|# Built by fsl_sub v.2.3.0 and fsl_sub_plugin_sge v.1.3.0||Version of fsl_sub and plugin that submitted the job|
|# Command line: <command line>||Command line that invoked fsl_sub|
|# Submission time (H:M:S DD/MM/YYYY) <date/time>||Date and time that the job was submitted|
Passing Environment Variables to Queued Jobs
On the jalapeno cluster, by default, your entire shell environment (settings) is transferred to your job when it starts up. On some systems (for example the BMRC cluster) this is not possible, so fsl_sub allows you to specify the environment variables that should be transferred to the job with the --export option. This can also be useful if you are scheduling many similar tasks and need a different value of an environment variable for each run, for example SUBJECTS_DIR, which FreeSurfer uses to locate your data sets.
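As a rough sketch of the per-run use case, the loop below builds one fsl_sub command per study, each with its own SUBJECTS_DIR. It assumes that --export accepts the NAME=VALUE form (check fsl_sub --help on your system); the study paths and my_freesurfer_job.sh are hypothetical placeholders. The commands are only printed, so remove the echo to submit them for real.

```shell
#!/bin/bash
# Sketch only: print one fsl_sub command per study directory.
# Assumption: --export accepts NAME=VALUE to set a variable for the job.
# The paths and my_freesurfer_job.sh are hypothetical placeholders.
build_export_cmd() {
    local subjects_dir="$1"
    echo fsl_sub -q short.q --export "SUBJECTS_DIR=${subjects_dir}" ./my_freesurfer_job.sh
}

for study in /data/study_a /data/study_b; do
    build_export_cmd "$study"
done
```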
SKIPPING COMMAND VALIDATION
By default fsl_sub will check that the command given (or each command in the lines of an array task file) can be found and is executable. If this causes issues, often because a particular program is only available on the compute nodes and not on jalapeno itself, you can disable this check with -n (--novalidation).
AVOIDING JOB REQUEUING
If a compute node develops a problem while your job is running, for example it loses communication with the cluster manager or needs an immediate reboot, the admins can request that your job is moved to a new node, or the cluster will automatically restart the job on a new node when the faulty node reboots. Either way your task starts again from the beginning, which can cause issues if your job modifies the file system in a way that destroys data needed by earlier stages of the processing pipeline. If this is the case, start your job with the --not_requeueable option. This prevents the job from restarting automatically and potentially wasting time on a run that cannot complete successfully.
SUBMITTING TO A SPECIFIC HOST OR RANGE OF QUEUES
If you need to submit to a specific host you can achieve this by appending '@hostname' to the queue name, for example -q short.q@jalapeno01 would submit to short.q on jalapeno01.
To submit to a range of queues (not normally necessary on jalapeno's cluster) you can comma separate the queue names, for example -q short.q,veryshort.q.
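The two forms of the -q argument can be illustrated with a small helper. This is a sketch only: queue_spec and myscript.sh are hypothetical names, and the fsl_sub commands are printed rather than run.

```shell
#!/bin/bash
# queue_spec QUEUE[,QUEUE...] [HOST]
# Builds the value for -q: a comma separated queue list, optionally
# restricted to a single host with the @hostname suffix.
queue_spec() {
    local queues="$1" host="${2:-}"
    if [ -n "$host" ]; then
        echo "${queues}@${host}"
    else
        echo "$queues"
    fi
}

# Print (not run) the resulting submissions; myscript.sh is a placeholder.
echo fsl_sub -q "$(queue_spec short.q jalapeno01)" ./myscript.sh
echo fsl_sub -q "$(queue_spec short.q,veryshort.q)" ./myscript.sh
```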
VIRTUAL X11 SERVER OR HOW TO RUN A SUBSET OF GUI BASED APPLICATIONS ON THE CLUSTER
Some programs insist on displaying something in a window, even if it is just a progress bar. If you attempt to run these applications on the cluster they will immediately fail, as the machine has nowhere to display this progress bar. Where possible, try to find a way to run the program without this graphical output (maybe it has a command-line option to run it in a textual mode). In the cases where this is not possible, the cluster nodes have the X11 Virtual Frame Buffer (Xvfb) software installed.
To ease the use of this program, we have provided a wrapper script that starts and stops the Xvfb process for you in a more automated manner, returning the display number you have been allocated. You can use this information to set the DISPLAY environment variable before running your program. The following script creates the dummy display, sets DISPLAY, runs the program a_graphical_program and then destroys the dummy display.
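#!/bin/bash
# Start a virtual X server and capture the allocated display number
disp=$(/opt/fmrib/bin/virtual-x -f -q)
export DISPLAY=":$disp"
# Run the graphical program against the dummy display
a_graphical_program
# Destroy the dummy display
/opt/fmrib/bin/virtual-x -k "$disp"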
To use with your own program, replace a_graphical_program with the path and arguments for your particular program. The resulting script can then be submitted to the cluster using the fsl_sub command.
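If you wrap several programs this way, the same steps can be collected into a function that also passes the program's exit status back, so a failed program is reported as a failed job. This is a minimal sketch: run_with_virtual_display and the VIRTUAL_X variable are hypothetical names, and it assumes virtual-x behaves as described above (-f -q prints the display number, -k removes that display).

```shell
#!/bin/bash
# Sketch only. VIRTUAL_X may be overridden; the default path is taken
# from the example script above.
VIRTUAL_X="${VIRTUAL_X:-/opt/fmrib/bin/virtual-x}"

run_with_virtual_display() {
    local disp status
    # Create the dummy display and capture its number
    disp=$("$VIRTUAL_X" -f -q) || return 1
    # Run the given command with DISPLAY pointing at the dummy display
    DISPLAY=":$disp" "$@"
    status=$?
    # Always destroy the dummy display, then report the command's status
    "$VIRTUAL_X" -k "$disp"
    return "$status"
}

# When the script is called with arguments, treat them as the program to run
if [ "$#" -gt 0 ]; then
    run_with_virtual_display "$@"
fi
```

Saved as, say, with_display.sh (a hypothetical name), it would be called as ./with_display.sh a_graphical_program followed by that program's arguments.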
SUBMITTING GRID ENGINE SCRIPTS
If you have a particularly complicated job that can't be configured using the fsl_sub options then you can write your own script as per the Grid Engine documentation and pass this to fsl_sub with the -F (--usescript) option. All other options will be ignored/overridden. You can use this to resubmit a stored job script generated with the --keep_jobscript option.
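For orientation, a hand-written Grid Engine script is an ordinary shell script whose #$ lines carry the scheduler options. The sketch below uses a few standard Grid Engine options (-N, -q, -cwd, -j) purely as an illustration; the job name and queue are placeholders, and the right options for your job should come from the Grid Engine documentation.

```shell
#!/bin/bash
# Illustrative Grid Engine options; adjust to your job.
#$ -N example_job
#$ -q short.q
#$ -cwd
#$ -j y

# JOB_ID is set by Grid Engine when the job runs on a compute node;
# the fallback text is used when the script is run by hand.
msg="Job ${JOB_ID:-not-queued} running on $(uname -n)"
echo "$msg"
```

Saved as, for example, myjob.sh (a hypothetical name), this would be submitted with fsl_sub -F myjob.sh.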
CONTROLLING SCHEDULING PRIORITY - JOB URGENCY
If your job is not urgent then you can suggest that it only runs when the system is quiet by specifying a lower priority for the task with the -p (or --priority) option. Specify a number between -1023 (lowest priority) and 0 (normal priority). If you have a particularly urgent job then contact email@example.com to discuss raising the priority of your job above 0.
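A submission script can check the priority value before passing it on. The following is a sketch only: valid_priority and myscript.sh are hypothetical names, and the fsl_sub command is printed rather than run.

```shell
#!/bin/bash
# Succeeds only for whole numbers in the user range -1023..0
valid_priority() {
    local p="$1"
    [[ "$p" =~ ^-?[0-9]+$ ]] && [ "$p" -ge -1023 ] && [ "$p" -le 0 ]
}

prio=-500
if valid_priority "$prio"; then
    # Print (not run) the low-priority submission; myscript.sh is a placeholder
    echo fsl_sub -p "$prio" ./myscript.sh
else
    echo "priority must be between -1023 and 0" >&2
fi
```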
If you have been asked to run your jobs under a specific project name (this would typically be used to allow easy auditing of compute use by a particular project, or potentially to allow access to restricted resources) then you can use the --project option to specify a project. If you can't do this (for example if you are running an auto-submitting program such as FEAT) then you can also set the environment variable FSLSUB_PROJECT and this name will be used by fsl_sub commands within the program, e.g.
FSLSUB_PROJECT=myproject feat mydesign.feat
or, where you always use the same project, add the following to your .bash_profile:
export FSLSUB_PROJECT=myproject
REQUESTING A SPECIFIC RESOURCE
Some resources may have a limited quantity available for use, e.g. software licenses or RAM. fsl_sub has the ability to request these resources from the cluster (the --coprocessor options do this automatically to request the appropriate number of GPUs). The option -r (--resource) allows you to pass a resource string directly through to the Grid Engine software. If you need to do this, the computing help team or the software documentation will tell you the exact string to pass.