The BMRC Compute Cluster
Information on compute facilities hosted by BioMedical Research Computing (BMRC)
BMRC Facilities
The WIN Centre recommends the use of the HPC facilities provided by the BioMedical Research Computing (BMRC) team. The BMRC facilities are located in the Big Data Institute and Wellcome Human Genetics buildings on the Old Road Campus. These pages provide basic information for researchers; BMRC's own documentation is available on their site.
Before you can use the BMRC equipment you and the project grant holder need to register with the BMRC team - see our BMRC account pages.
Data Governance Information
The BMRC cluster storage system uses RAID technology to minimise the risk of data loss from disk hardware failure; however, BMRC DO NOT back up any data stored on this system, so it should be treated the same way as /vols/Scratch on FMRIB's systems. Do not rely on this storage as the only copy of data you create, unless it can be trivially regenerated. Please ensure you copy important results to storage hosted at FMRIB/OHBA or to another University-approved secure storage location.
Raw MRI/MEG scan data is already protected at source so does not need additional protection and public datasets can be re-downloaded if required so may not need additional protection. You must document your proposed protection methodology in your data management plan and DPA/DPIA.
The BMRC storage is encrypted at rest (using AES256), but any project processing sensitive, identifiable human data will need to be separately assessed for suitability using the University's DPS/DPA/DPIA scheme. Until this approval is obtained from the Department/University you must not upload such data to their cluster.
Server Resources
In addition to the shared BMRC cluster the WIN Centre has exclusive and/or priority access to several machines. These are detailed on our BMRC Servers page.
Interactive/GUI Access
In addition to the BMRC shared interactive nodes (cluster1.bmrc.ox.ac.uk and cluster2.bmrc.ox.ac.uk), WIN has five dedicated interactive nodes accessible via BMRC's Virtual Desktop Infrastructure service.
Initial setup of your account
Getting started
When you get your new account you should first take a look at the BMRC help pages.
Access to their facilities is either via SSH to a pair of machines cluster1.bmrc.ox.ac.uk and cluster2.bmrc.ox.ac.uk or via WIN/BMRC's Virtual Desktop Infrastructure running on one of our dedicated servers.
The cluster1/cluster2 machines must not be used for any processing; they are purely for manipulating your files, uploading/downloading data, submitting tasks to the queues and gaining access to other resources. You may use the VDI hosts for any compute purpose.
On first connecting your SSH client will ask if you trust the remote server - check that the 'fingerprint' given matches these values - your version of SSH will decide whether you see the RSA, ECDSA or ED25519 fingerprint.
Host | Key type | Fingerprint |
---|---|---|
cluster1.bmrc.ox.ac.uk | RSA (2048) | SHA256:zCwHn9ElYYevVCWIgKUmSDefdHrOqnXWUzzOnfowBRg |
cluster1.bmrc.ox.ac.uk | ECDSA (256) | SHA256:jbKUwq+x8n/zfNZZhvM+oQR5ip66od+RSYvmmlA2SZo |
cluster1.bmrc.ox.ac.uk | ED25519 (256) | SHA256:xNDl7NG8ovL99Su3kqTF9b9bmczxLlsXo+AGxoK4jeQ |
cluster2.bmrc.ox.ac.uk | RSA (2048) | SHA256:zCwHn9ElYYevVCWIgKUmSDefdHrOqnXWUzzOnfowBRg |
cluster2.bmrc.ox.ac.uk | ECDSA (256) | SHA256:un7QAKQJR+HBwEsnGkLZYjQp2J7ZMPy5kYxCecJj2+I |
cluster2.bmrc.ox.ac.uk | ED25519 (256) | SHA256:LdZ54ymATu6YbgNLC0RUYCVBROmLkx1qqnFRCUSYpbw |
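As a sketch, a first connection looks something like the following (the username jdoe is a placeholder - use your own BMRC username):

```shell
# Connect to a BMRC login node ('jdoe' is a placeholder username)
ssh jdoe@cluster1.bmrc.ox.ac.uk

# On first connection your SSH client will print something like:
#   The authenticity of host 'cluster1.bmrc.ox.ac.uk' can't be established.
#   ED25519 key fingerprint is SHA256:xNDl7NG8ovL99Su3kqTF9b9bmczxLlsXo+AGxoK4jeQ.
#   Are you sure you want to continue connecting (yes/no/[fingerprint])?
# Only answer 'yes' if the fingerprint matches the table above.
```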
Instructions on getting started with your BMRC account, including WIN-specific details, are on our BMRC usage pages; instructions on using the BMRC cluster, along with details of the compute resources at BMRC, can be found on our BMRC cluster page.
Setting up your account
Once you have an account on their system you should add some WIN-specific settings to your account; you can do this with:
/well/win/software/config/setup_win
After running this command, please log out and back in. You will then have access to our WIN-specific software modules - see Modules. An important module is fsl_sub; use:
module add fsl_sub
to activate (you might want to put this line after the WIN Config section in your .bash_profile).
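As a sketch, the tail of your ~/.bash_profile might then look like this (the WIN Config block is whatever setup_win wrote; only the final line is added by hand):

```shell
# ~/.bash_profile (sketch)

# ... WIN Config section written by setup_win - leave as-is ...

# Added manually, after the WIN Config section: load fsl_sub at every login
module add fsl_sub
```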
Storage locations
User home folders (at /users/<groupname>/<username>...) are intentionally limited in size, so do not store any project data in this location. Users have a dedicated folder within their group area at /well/<groupname>/users/<username> - you should store all data and files there.
Graphical Programs
WIN Centre users have access to a web-based Virtual Desktop Infrastructure that can be used to run graphical software on any of our cluster nodes. This is preferred over X11 forwarding over SSH, particularly when connecting from a network outside the University and/or for long-running tasks.
Submitting to the Cluster
For information on submitting jobs to the CPU or GPU cluster nodes see our BMRC cluster pages.
Introduction to the BMRC Cluster
Introduction
WIN@BMRC operates a compute cluster formed from rack-mounted multi-core computers. To ensure efficient use of the hardware, tasks are distributed amongst these computers by queuing software, with the available time shared out amongst all users. This means you usually have to wait a while for your jobs to complete, but you can potentially utilise vastly more compute resource than would be available to you on a single computer such as your laptop or a workstation.
The BMRC user guides are available at https://www.medsci.ox.ac.uk/divisional-services/support-services-1/bmrc; some additional information, particularly about the fsl_sub submission program we provide, is detailed in this section.
The BMRC Cluster
Submitting and monitoring tasks
For details on how to submit and monitor the progress of your jobs see the BMRC cluster page.
Long Running Tasks
Unlike the Jalapeno cluster, BMRC's cluster does not offer 'infinite' queues (equivalents of verylong.q, bigmem.q and cuda.q on the Jalapeno cluster). You must break your task up into shorter components, or regularly save state so the job can restart where it left off, and submit these parts using job holds to prevent each task running before the previous one completes.
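As a sketch of this pattern, assuming a hypothetical restartable script run_part.sh that checkpoints its own state and resumes where it left off, fsl_sub's -j (job hold) option can chain the parts so each waits for the previous one:

```shell
# Chain the restartable parts of a long task with job holds.
# run_part.sh is a hypothetical script that saves state and resumes from it.
jid=$(fsl_sub ./run_part.sh 1)          # submit the first part; fsl_sub prints the job ID
for part in 2 3 4; do
    # -j holds this submission until job $jid has completed
    jid=$(fsl_sub -j "$jid" ./run_part.sh "$part")
done
```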
Submitting jobs to the compute cluster
The UKBB data set is available on the BMRC facilities. To gain access you need to be added to one or more of the following groups:
- win-ukb-imaging - this group gives access to the MRI imaging data
- win-ukb-genetics - this group gives access to the Genetics information
- win-ukb-life - this group gives access to the lifestyle data
Approval for addition to these groups must come from Professor Smith or WIN IT.
Once you are in the appropriate groups the data is available in the following locations:
- Imaging - /well/win-biobank/projects/imaging/data/data3
- Genetics - /well/win-biobank/projects/genetics/data/v3/21k
- Lifestyle - /well/win-biobank/projects/life/data/... (not currently available?)
N.B. You must 'cd' directly to the appropriate final folder (e.g. data3 in the case of the imaging dataset), as the permissions on the parent folders will not allow you to list the contents of the 'data' folder, only to pass through it (so no tab-completion).
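You can reproduce this permission behaviour locally as a small sketch (the demo paths are made up; on BMRC the same applies to the real 'data' folders):

```shell
# Simulate traverse-only permissions: a directory with mode --x--x--x can be
# passed through with 'cd' but not listed with 'ls' (hence no tab-completion).
mkdir -p demo/data/data3
echo results > demo/data/data3/out.txt
chmod a=x demo/data                 # execute (traverse) permission only, no read
ls demo/data 2>/dev/null || echo "listing denied"
cd demo/data/data3                  # cd-ing straight to the final folder still works
cat out.txt
```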
The BMRC website includes various user-guides, hints and tips. New users should start by reading the new users welcome page.
If your problem relates to a WIN software package (e.g. FSL, fsl_sub error messages) then contact computing-help@win.ox.ac.uk in the first instance. For issues with running tasks on the cluster (assuming fsl_sub managed to submit successfully) then contact bmrc-help@medsci.ox.ac.uk.
If you have a particular problem please use the following table to determine the most efficient route to obtaining support.
Issue is with... | Who to contact |
---|---|
Problems logging in | bmrc-help@medsci.ox.ac.uk |
fsl_sub gives an error | computing-help@win.ox.ac.uk |
fsl_sub submits job but it never runs | bmrc-help@medsci.ox.ac.uk and then computing-help@win.ox.ac.uk if they confirm fsl_sub has submitted incorrectly |
Unable to access a dataset | computing-help@win.ox.ac.uk to request group membership |
Error/crash when running programs | bmrc-help@medsci.ox.ac.uk |
Issue with using VNC | bmrc-help@medsci.ox.ac.uk |
Run out of disk space | Ask your PI to contact bmrc-help@medsci.ox.ac.uk |
Problem with WIN maintained software (in /well/win/software) | computing-help@win.ox.ac.uk |
BMRC Servers - WIN Priority Servers
In addition to the shared servers the BMRC facilities provide, WIN owns several machines which are either for our exclusive use or to which we can be given priority access; details of these resources appear here. N.B. Access to these machines is via rescomp.well.ox.ac.uk - you cannot SSH directly to these devices.
Interactive Machines
To run interactive tasks (e.g. MATLAB or other non-queued software), WIN has five machines available for use, as detailed in the table below. These machines should be accessed by direct login, the Virtual Desktop Infrastructure or via srun -p win --pty bash, as per the table entry. Please ONLY use win000 for large memory tasks.
Hostname | Purpose | Access via | Memory | CPU cores/model |
---|---|---|---|---|
win000 | High memory tasks (>128GB per job) | ssh, VDI | 3TB | 112 Intel Xeon Platinum 8280 @ 2.7GHz |
win001 | small jobs | ssh, VDI | 1.5TB | 56 Intel Xeon Gold 6258R @ 2.7GHz |
win002-004 | Medium memory tasks (<128GB) | VDI, srun -p win --pty bash | 1.5TB | 56 Intel Xeon Gold 6258R @ 2.7GHz |
GPU Machines
These machines are notionally shared with all users of the BMRC facilities (in return for access to the other GPU nodes). If you are having difficulty getting your jobs scheduled, or you know you have a large number of jobs to run or have a deadline, then please contact WIN IT support as we can request exclusive access to one or more nodes.
Hostname | Queues | GPU Card and Quantity | Memory | CPU cores/model |
---|---|---|---|---|
compg010-011 | gpu_short, gpu_long | P100 (16GB) x 4 | 384GB | 24 Intel Xeon Gold 5118 @ 2.3GHz |
compg016 | gpu_short, gpu_long | V100 (32GB) x 2 | 768GB | 24 Intel Xeon Gold 6136 @ 3.0GHz |
compg028-030 | gpu_short, gpu_long | Quadro RTX8000 (48GB) x 4 | 768GB | 32 Intel Xeon
compg031-033 | gpu_short, gpu_long | A100 (40GB) x 4 | 384GB | 32 Intel Xeon |
Research Group Dedicated Machines
In addition to the shared resources some research groups have purchased their own hardware. These devices would typically be accessed via SSH from rescomp.well.ox.ac.uk. Access is by arrangement with the head of the research group.
Hostname | Research Group | GPU Card and Quantity | Memory | CPU cores/model |
---|---|---|---|---|
compg017 | Mark Woolrich | V100 (32GB) x 2 | 768GB | 24 Intel Xeon Gold 6136 @ 3.0GHz |
- Access to a queued job cluster of close to 10,000 x86_64 compute cores (16GB/core)
- Ability to rent space on a 5PB+ high-performance cluster storage platform
- ca. 72 NVIDIA CUDA GPUs (36 of which offer priority or dedicated access to WIN members), including the latest A100 cards
- 4x 1.5TB, 56 core interactive Linux hosts dedicated to WIN users
- 1x 3TB, 112 core interactive Linux host dedicated to WIN users