FMRIB Cluster (Jalapeno)
FMRIB hosts a small-scale HPC facility with interactive, visualisation-capable servers. This page gives instructions on suitability and usage.
Introduction
The FMRIB IT department provide several interactive servers and a compute farm for you to use for your analysis. The machines and their uses are detailed below.
High-level overview
- 16 GB per core queued job cluster
- K-series NVIDIA CUDA GPU cards
- Interactive use nodes, one with 1.25TB RAM
- ca 0.5PB NFS storage
- LTO8 tape archive facilities
Interactive and cluster submission computers
Jalapeno.fmrib.ox.ac.uk/jalapeno.cluster.fmrib.ox.ac.uk (internal networks)
This is our general purpose Linux computer, available to connect to via SSH from a University network. External to the University, use a VPN connection.
Appropriate uses:
- Submitting jobs to the grid engine computing cluster
- Looking at analysis results, e.g. FEAT reports, FSLView
- Graphical desktop sessions using VNC. The number of connections is limited, so please don't leave sessions running for more than a day or two if you don't need them.
Jalapeno must not be used to do computationally intensive tasks - use the cluster or another machine. For example we do not allow (and actively prevent) the use of MATLAB on this machine.
As this machine is our public facing computer it needs to be kept up-to-date with security updates, so this machine will be rebooted on a regular basis (as required by the updates necessary). The maintenance window is the second Tuesday of the month, 9-10am, unless a pressing security need requires more urgent action. The machine should be considered at risk during this period, but we aim to advise in advance of any required reboot.
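If you schedule work around the maintenance window, the rule above (second Tuesday of the month, 9-10am) can be checked programmatically. The following is a minimal sketch; the function names are illustrative, it assumes the times are local to the machine, and it cannot account for urgent out-of-window reboots.

```python
from datetime import date, datetime, time, timedelta


def second_tuesday(year: int, month: int) -> date:
    """Return the date of the second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def in_maintenance_window(moment: datetime) -> bool:
    """True if `moment` falls in the 9-10am slot on the second Tuesday."""
    window_day = second_tuesday(moment.year, moment.month)
    return moment.date() == window_day and time(9, 0) <= moment.time() < time(10, 0)
```

For example, `in_maintenance_window(datetime.now())` tells you whether jalapeno should currently be considered at risk under the regular schedule.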
Jalapeno00.cluster.fmrib.ox.ac.uk
This is our medium-memory (128GB) general purpose computer. Access to this machine from outside of the FMRIB network is via VPN or via SSH through jalapeno.
Appropriate uses:
- Interactive MATLAB sessions:
  - Where possible MATLAB processing should be done on the cluster using the interactive queues or, preferably, the work should be scripted so that you can submit it to the normal queues. However, where you have an interactive task that is likely to take a long time you may use this machine.
  - This is a shared computer, so anything requiring a large amount of memory (>8GB) should be run elsewhere (interactive queues, or scripted/compiled and using the normal queues). We also recommend that any long running process saves intermediate results, as we cannot guarantee 100% uptime on any computer.
- Cluster job submission is normally done from Jalapeno, but is also possible from this machine (for example, from a workflow that involves MATLAB).
- If you have a long running process that can't run on the cluster, use this machine, as it will need to be restarted much less regularly than jalapeno.
As this machine is not public facing, updates will be installed on a less aggressive schedule. Consequently, the machine will need to be rebooted much less frequently than Jalapeno.
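One way to reach this machine by SSH through jalapeno is to use jalapeno as a jump host. The sketch below only builds the command line; it assumes your OpenSSH client supports the `-J` (ProxyJump) option and that the same username works on both hosts. The username `abc123` is a placeholder, and this page does not confirm that a jump host is the intended route, so treat it as one possibility.

```python
import shlex
from typing import List

# Public-facing gateway named on this page.
GATEWAY = "jalapeno.fmrib.ox.ac.uk"


def ssh_via_jalapeno(user: str, target: str) -> List[str]:
    """Build an ssh command that hops through jalapeno to an internal host."""
    return ["ssh", "-J", f"{user}@{GATEWAY}", f"{user}@{target}"]


# Print the command so it can be copied into a terminal.
print(shlex.join(ssh_via_jalapeno("abc123", "jalapeno00.cluster.fmrib.ox.ac.uk")))
```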
Jalapeno18.cluster.fmrib.ox.ac.uk
This is our high-memory (1.25TB) general purpose computer. Access to this machine from outside of the FMRIB network is via VPN or via SSH through jalapeno.
Appropriate uses:
- Interactive MATLAB sessions:
  - Where possible MATLAB processing should be done on the cluster using the interactive queues or, preferably, the work should be scripted so that you can submit it to the normal queues. However, where you have an interactive task that is likely to take a long time you may use this machine.
  - This is a shared computer, provided for large memory tasks. Please use jalapeno00 for more general interactive compute tasks. We also recommend that any long running process saves intermediate results, as we cannot guarantee 100% uptime on any computer.
- Cluster job submission is normally done from Jalapeno, but is also possible from this machine (for example, from a workflow that involves MATLAB).
- If you have a long running process that can't run on the cluster, use this machine, as it will need to be restarted much less regularly than jalapeno.
As this machine is not public facing, updates will be installed on a less aggressive schedule. Consequently, the machine will need to be rebooted much less frequently than Jalapeno.
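The advice to save intermediate results can be followed with a simple checkpoint file. The sketch below is one possible pattern, not an FMRIB-provided tool: the function names, the JSON state layout and the squaring `work` function are all illustrative. It writes the state to a temporary file and then renames it, so a reboot mid-write should not leave a corrupt checkpoint. The same idea applies in MATLAB or any other language.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_index": 0, "results": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write the state to a temporary file, then atomically replace the checkpoint."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)


def process_all(items, path: Path, work=lambda x: x * x) -> list:
    """Process items in order, resuming from the checkpoint after a restart."""
    state = load_checkpoint(path)
    for index in range(state["next_index"], len(items)):
        state["results"].append(work(items[index]))
        state["next_index"] = index + 1
        save_checkpoint(path, state)  # progress survives an unexpected reboot
    return state["results"]
```

Saving after every item is the simplest choice; for cheap work items you may prefer to save every N items to reduce load on the shared storage.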
Cuda03.cluster.fmrib.ox.ac.uk
If your interactive task requires access to NVIDIA CUDA capable hardware then the computer cuda03.cluster.fmrib.ox.ac.uk provides access to two K80 class GPUs. These are provided on a first-come first-served basis.
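Because the GPUs are first-come first-served, it can help to check whether one is already busy before starting work. The sketch below parses the output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`; the "idle" thresholds are arbitrary assumptions, and it assumes the driver reports numeric utilisation and memory figures for these cards.

```python
import subprocess
from typing import Dict, List

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text: str) -> List[Dict[str, object]]:
    """Turn nvidia-smi CSV output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        index, name, util, mem = [part.strip() for part in line.split(",")]
        rows.append({
            "index": int(index),
            "name": name,
            "util_percent": int(util),
            "memory_used_mib": int(mem),
        })
    return rows


def idle_gpus(rows, max_util: int = 5, max_mem_mib: int = 100) -> List[int]:
    """Return indices of GPUs that look unused (thresholds are assumptions)."""
    return [
        row["index"]
        for row in rows
        if row["util_percent"] <= max_util and row["memory_used_mib"] <= max_mem_mib
    ]


def query_gpus() -> List[Dict[str, object]]:
    """Run nvidia-smi on the current machine and parse its output."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_rows(result.stdout)
```

On cuda03 you would call `idle_gpus(query_gpus())`; an empty list suggests both GPUs are in use.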
Interactive MATLAB
Wherever possible MATLAB should be run on the cluster. Where a task can only be run interactively, we provide interactive queues. Details are available in the queue submission documentation.
How to submit jobs to the FMRIB cluster and how to monitor them
Introduction
WIN operate a compute cluster formed from rack-mounted multi-core computers. To ensure efficient use of the hardware, tasks are distributed amongst these computers using grid scheduling software. This software monitors the utilisation of the computers in the cluster, launching new jobs onto the least used computers, preventing overloading of machines whilst ensuring a fair share of compute resources amongst all users of the system. When you submit a job it will sit in a queue until the scheduler software identifies a viable empty slot and your job has reached the top of the queue. The fair share algorithm in use ensures that heavy users of the system are less likely to reach the top than users who rarely use the system (this is cleared on a regular basis so that you aren't deprioritised forever).
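The fair share behaviour described above can be illustrated with a toy model. This is not Grid Engine's actual algorithm or configuration: the class, the "least recent usage first" rule and the halving decay are simplifying assumptions chosen only to show why heavy users wait longer and why their penalty fades over time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FairShareQueue:
    """Toy fair-share queue: users with less recent usage are served first."""
    usage: Dict[str, float] = field(default_factory=dict)  # recent usage per user
    waiting: List[Tuple[int, str, str]] = field(default_factory=list)
    _counter: int = 0  # submission order, used to break ties

    def submit(self, user: str, job: str) -> None:
        self.waiting.append((self._counter, user, job))
        self._counter += 1

    def next_job(self, cost: float = 1.0) -> Optional[Tuple[str, str]]:
        """Dispatch the job whose owner has the least recent usage."""
        if not self.waiting:
            return None
        entry = min(self.waiting, key=lambda e: (self.usage.get(e[1], 0.0), e[0]))
        self.waiting.remove(entry)
        _, user, job = entry
        self.usage[user] = self.usage.get(user, 0.0) + cost
        return user, job

    def decay(self, factor: float = 0.5) -> None:
        """Periodically shrink recorded usage so nobody is deprioritised forever."""
        for user in self.usage:
            self.usage[user] *= factor
```

In this model a user with a large recorded usage is overtaken by a light user who submitted later, and repeated calls to `decay()` gradually remove that disadvantage.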
Grid Engine and the queues
WIN's cluster runs the Grid Engine (GE) queuing software (using the Son of Grid Engine distribution). To ease job submission we provide a helper called fsl_sub which sets some useful options to Grid Engine's built-in qsub command.
GE manages a set of queues representing the available resources. Tasks are submitted to GE queues for distribution across the execution hosts. These queues are designed to divide the resources according to usage profiles to ensure that the majority of tasks get done in a favourable time-frame (see Jalapeno Queues).
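As a rough illustration of submitting through fsl_sub from a script, the sketch below builds a command line using fsl_sub's `-q` (queue) and `-N` (job name) options. The queue name `short.q`, the job name and the script `./my_analysis.sh` are placeholders, not names confirmed by this page; check the Jalapeno Queues documentation for the real queue names.

```python
import shlex
from typing import List, Optional


def fsl_sub_command(command: List[str], queue: str,
                    job_name: Optional[str] = None) -> List[str]:
    """Build an fsl_sub invocation for the given command and queue."""
    argv = ["fsl_sub", "-q", queue]
    if job_name:
        argv += ["-N", job_name]
    return argv + command


# Placeholder queue, job name and script; print rather than run the command.
argv = fsl_sub_command(["./my_analysis.sh", "subj01"], queue="short.q", job_name="demo")
print(shlex.join(argv))
```

On a submission host you could pass `argv` to `subprocess.run(argv, check=True)` to queue the job instead of printing it.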
The Jalapeno Cluster
Summary of available interactive servers
| Server name | Purpose | Restrictions |
| jalapeno.fmrib.ox.ac.uk / jalapeno.cluster.fmrib.ox.ac.uk (internal connections) | Interactive SSH/VNC/X11 sessions, queue submission. | No long running or high memory tasks - these MUST be run on the cluster. This includes MATLAB, which should be run in a batch process on the cluster, on a desktop machine or on Jalapeno00/18. |
| jalapeno00.cluster.fmrib.ox.ac.uk | Long running interactive sessions, interactive MATLAB, other interactive compute tasks | |
| jalapeno18.cluster.fmrib.ox.ac.uk | Large memory requirement, long running interactive sessions, interactive MATLAB, other interactive compute tasks | |
| jalapeno01-11,19-23.cluster.fmrib.ox.ac.uk | Compute cluster | No direct logins allowed, jobs should be submitted to the queue system |
| cuda03.cluster.fmrib.ox.ac.uk | Interactive CUDA development (2xK80 cores) | |
| cuda01-05.cluster.fmrib.ox.ac.uk | Compute cluster (CUDA) | No direct logins allowed, jobs should be submitted to the queue system |