User Driven Tape Archive
How to store and retrieve data from the WIN tape archive.
The WIN Centre maintains a self-service archive facility which allows you to transfer data to/from high-capacity LTO data tape (12-16TB per tape) from/to the FMRIB file servers. At the present time this service is only available on the FMRIB hosted server jalapeno.fmrib.ox.ac.uk, so data hosted elsewhere will need to be transferred to that server first.
By default, your folder is removed from the file server once the archive process completes successfully. You can opt to keep the disk copy instead, which gives you a point-in-time backup of that folder.
Each night a duplicate copy of your dataset will be made on a separate tape.
At present, data in the user archive is retained indefinitely, so you can create multiple 'versions' of a data set on tape.
WIN IT do not take steps to catalogue this data in any way, for example an index of the files in an archive is not automatically taken; it remains the responsibility of the researcher to maintain sufficient records of what archives have been made and what they represent. On request, WIN IT can obtain lists of archives made by a particular user (with date and folder path information).
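One simple way to keep such records is to append a line to a personal log each time you archive a folder. The following is a minimal sketch only: the function name log_archive, the ARCHIVE_LOG variable, the default log location and the tab-separated layout are all assumptions, not part of the WIN archive tools.

```shell
# Hypothetical helper: append one line per archive to a personal log.
# Columns (tab-separated): timestamp, absolute folder path, free-text note.
log_archive() (
    dir=$1
    note=$2
    # Log location is an assumption; override it by setting ARCHIVE_LOG.
    log=${ARCHIVE_LOG:-$HOME/archive_log.tsv}
    # Resolve the absolute path; fails if the folder does not exist.
    abs=$(cd "$dir" && pwd) || return 1
    printf '%s\t%s\t%s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$abs" "$note" >> "$log"
)
```

Run it before archiving, while the folder is still on disk, for example: log_archive mydir "pilot study, preprocessed data".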
All raw data sets from the OHBA and FMRIB MRI scanners are automatically curated, so under normal circumstances there is no requirement for you to also store your own personal copies of the raw DICOM or NIFTI files on User Archive. Where data is collected at other locations (which at present includes the OHBA MEG, BSB scanners, EEG datasets) it may be prudent to make an archive copy unless you are confident of being able to obtain the data from the original source in the future.
- The User Archive tool can only operate on folders - you cannot archive individual files. You should determine what the most appropriate unit to archive within a project is. Aim to archive the smallest item you are likely to want to retrieve at once, e.g. a subject's data or the analysis results for an experiment, rather than all the files for a project. Whilst the system can, theoretically, archive any amount of data, if you subsequently wish to recover only part of an archive the system still needs to read all data stored prior to your items of interest, and so may take much longer than recovering smaller archives.
- The self-service archive system is not able to recover files not owned by your account at the time of writing to tape - the system will not stop you archiving if you have read-access on these files, so it is fairly easy to mistakenly do this. Similarly, the removal of your data from disk after a successful tape archive will fail if you do not have write permissions on a sub-folder. See the troubleshooting section for more details.
- Archive and retrieve operations can take time, because the system runs several tasks that compete for limited resources.
- Archiving needs a small amount of space, to generate an index file before archiving and to write the archive retrieve file after pruning. If your file system is full, neither of these processes will complete.
- Avoid 'nesting' archives - archiving a folder structure containing archive retrieve files. This quickly becomes frustrating when retrieving the complete folder structure at a later date.
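Several of the points above can be checked before you archive a folder. The following is a minimal sketch, not part of the WIN archive tools: the function name precheck_archive_dir is hypothetical, and it assumes GNU find (for the -writable test), as found on most Linux systems.

```shell
# Hypothetical helper: report common problems in a folder before archiving it.
precheck_archive_dir() (
    dir=$1
    me=$(id -un)

    echo "== Files or folders not owned by $me (self-service recovery will not work for these) =="
    find "$dir" ! -user "$me" -print

    echo "== Folders you cannot write to (removal from disk after archiving would fail) =="
    find "$dir" -type d ! -writable -print

    echo "== Nested .retrieve files (avoid archiving these) =="
    find "$dir" -name '*.retrieve' -print

    echo "== Size of the folder and free space on its file system =="
    du -sh "$dir"
    df -h "$dir"
)
```

For example, precheck_archive_dir mydir lists anything under mydir that is likely to cause trouble; an empty section means nothing was found for that check.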
Committing a directory to tape
On jalapeno, cd into the parent directory and issue the following command:
archive mydir
The directory mydir will be written to tape and then deleted once the process has succeeded, and a script, mydir.retrieve, will be created in its place.
Before writing your folder to tape, the script will generate a compressed text file containing a list of all the files and folders in the archive. This will be written to mydir/.archive_index.txt.gz. For folders with many files/folders this can take a significant amount of time, so it can be skipped by using the -i option to the archive command. We recommend that you do allow the index to be generated.
N.B. The indexing functionality was added to the archive command in January 2023, so this file will not be available on archives made before 27/1/2023.
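When the folder is on disk (for example after archive -k, or after a retrieve), the index can be searched with standard tools. This is a minimal sketch: the function name search_archive_index is hypothetical, and it assumes the index holds one path per line, which the description above suggests but does not guarantee.

```shell
# Hypothetical helper: print index entries that match a pattern.
# Assumes <dir>/.archive_index.txt.gz exists and lists one path per line.
search_archive_index() (
    dir=$1
    pattern=$2
    index="$dir/.archive_index.txt.gz"
    if [ ! -f "$index" ]; then
        echo "No index found at $index" >&2
        return 1
    fi
    # Decompress to stdout and filter; "--" guards patterns starting with "-".
    gzip -dc "$index" | grep -- "$pattern"
)
```

For example, search_archive_index mydir sub01 prints the index lines that mention sub01.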
Recovering a directory from tape
When a directory is written to tape you will find a file called
mydir.retrieve
in its place. This is an executable shell script which can be used to recover the directory. Simply type:
./mydir.retrieve
and a while later the directory will reappear. By default, retrieve files will recover the archive to its original location, thus there is no requirement to store the retrieve file in its original location.
To change the location to which the folder will be recovered you can use the command:
retrieve_here <path to>/mydir.retrieve
whilst in the folder you wish to recover to.
Using the tape archive service to create a backup of a folder
If you wish to make a permanent copy of a folder but continue to work on/with it, you can use the archive command with the -k option to write the folder to tape without deleting it, e.g.:
archive -k mydir
will store the contents of mydir on tape, create a .retrieve file and leave mydir on disk. Any pre-existing .retrieve file of the same name will be overwritten, so if you wish to be able to easily recover older archives of this folder you should rename the existing .retrieve file before running archive again (see Creating Retrieve Files below).
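The renaming step can be scripted so that it is not forgotten. This is a minimal sketch under stated assumptions: the function name preserve_retrieve_file and the timestamped naming scheme are hypothetical, and the timestamp records when the file was renamed, not when the archive was made.

```shell
# Hypothetical helper: rename an existing <folder>.retrieve file so that the
# next "archive -k <folder>" run does not overwrite it.
preserve_retrieve_file() (
    dir=${1%/}                      # strip a trailing slash, if any
    old="$dir.retrieve"
    if [ ! -e "$old" ]; then
        return 0                    # nothing to preserve
    fi
    stamp=$(date +%Y%m%d-%H%M%S)    # time of renaming, not of the archive
    new="$dir.$stamp.retrieve"
    # Print the new name on success so it can be recorded.
    mv -- "$old" "$new" && echo "$new"
)
```

A typical sequence would be to run preserve_retrieve_file mydir and then archive -k mydir. Because retrieve files recover to the original location by default, the renamed file should remain usable.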
Recovering part of an archive
It is possible to request only part of an archive to be recovered from tape, assuming you know the full path of the folder(s)/file(s) you are interested in.
Whilst the archive system's database does not store a list of the files within an archive, your archive may contain an index file, .archive_index.txt.gz (this is the default for archives made after January 2023). The instructions below can be used to recover a copy of this index.
To recover file(s)/folder(s) from an archive you need to know the original full path of the items you wish to selectively recover.
The syntax of the command is:
retrieve_just [-d] <retrieve_file> <path> [<path> ...]
Where -d will recover the folders/files to the current folder, retrieve_file is the retrieve filename (and path if necessary) and path is the full path to the folder or file you wish to recover. You can add as many files/folders to the end of the command as you need.
The archive system sees your scratch folder as /vols/Scratch/<username> (~/scratch is just a convenience short-cut) so when specifying paths for files/folders archived from scratch, the path needs to begin with /vols/Scratch/<username>.
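The path translation described above can be done mechanically. This is a minimal sketch: the function name to_archive_path is hypothetical, and it assumes that ~/scratch expands to $HOME/scratch and that your username on the file server matches the output of id -un unless you pass one explicitly.

```shell
# Hypothetical helper: rewrite a path under ~/scratch into the form the
# archive system expects, /vols/Scratch/<username>/...
# Paths outside ~/scratch are printed unchanged.
to_archive_path() (
    path=$1
    user=${2:-$(id -un)}            # optional second argument: username
    case "$path" in
        "$HOME/scratch")
            printf '/vols/Scratch/%s\n' "$user" ;;
        "$HOME/scratch/"*)
            printf '/vols/Scratch/%s/%s\n' "$user" "${path#"$HOME/scratch/"}" ;;
        *)
            printf '%s\n' "$path" ;;
    esac
)
```

The result can then be passed to retrieve_just, for example: retrieve_just mydir.retrieve "$(to_archive_path "$HOME/scratch/mydir/sub01")".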
Recovering old versions from tape
Each time you write a directory to tape a new data set is created. This means that old versions of the directory remain available for retrieval.
This process will often require the assistance of one of the archive managers so e-mail email@example.com with the path name of the directory and an approximate date for the previous archive (if there are more than two copies).
It may be possible to self-service this by renaming the existing .retrieve file (so that you don't lose it) and using the following command:
mkretrieve <foldername>
Where <foldername> is the folder that you archived multiple times. If the folder name is fairly generic, e.g. 'data', then it is unlikely to be successful, but for unique names you should be presented with a list to choose from. Enter the number of the archive session you are interested in and a new .retrieve file will be generated, which you can then use to recover your data.
Troubleshooting
A few things can go wrong. Here is a list of some common problems and what you can do.
- What if I'm already over quota?
- The most common problem occurs when a user has already exhausted all their available quota and tries to archive. The archive program will still operate but may well fail to write the retrieve script and will also fail to delete the files (a limitation of the file system we use). In these cases a suitable retrieve script can be created by running the command
mkretrieve directory_name
Where directory_name is the name of the directory you archived. If there is more than one save-set matching this directory name then you will be presented with a list of options - choose the most appropriate item. Please note this can be used to generate retrieve files for older versions of the archived save-set.
- It's taking a long time. What's happening?
- The tape jukebox is shared between several tasks: user archive, scan archive, backup etc. When you start a task a suitable tape needs to be mounted in the drive. In the case of a retrieve the correct tape needs to be located first. If a required tape isn't in the jukebox then an operator needs to intervene to manually load the tape. The situation can also be complicated if another long-running task (say a large archive/retrieve) is underway and blocks access to one or more tape drives. There is very little that can be done in these cases and we ask that you try to be patient. Try also to plan archive/retrieval in advance, especially if you are going to require files at the weekend when there may not be a member of staff around.
- What if I need to recover the directory to a different location?
- To recover to the current directory use:
retrieve_here full_path_to_retrieve_file
For more advanced relocation tasks, the retrieve script can easily be altered. Here is an example of a retrieve script:
/usr/etc/mminfo -q'annotation=Joe Bloggs (My_Directory) Sun Jul 6 12:00:00 2003'
/usr/etc/nsrretrieve -s cocoa.fmrib.ox.ac.uk -A "Joe Bloggs (My_Directory) Sun Jul 6 12:00:00 2003"
To retrieve to a specific directory add a
-d full_path_to_retrieve_location
just before the '-A' on the nsrretrieve line, i.e.:
/usr/etc/nsrretrieve -s cocoa.fmrib.ox.ac.uk -d /usr/people/jbloggs -A "Joe Bloggs (My_Directory) Sun Jul 6 12:00:00 2003"
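The edit described above can also be applied with sed rather than by hand. This is a minimal sketch only: the function name relocate_retrieve_script is hypothetical, and it assumes the retrieve script has the layout of the example above (a single nsrretrieve line containing " -A "). Current retrieve scripts may differ, so inspect the output before running it. The destination must not contain spaces, '|', '&' or backslashes.

```shell
# Hypothetical helper: write a copy of a retrieve script that recovers to
# another location, by inserting "-d <destination>" before "-A" on the
# nsrretrieve line. The original script is left untouched.
relocate_retrieve_script() (
    src=$1      # existing retrieve script
    dest=$2     # directory to recover into
    out=$3      # name of the modified copy to write
    # Only lines mentioning nsrretrieve are changed; '|' is the sed
    # delimiter so that slashes in the destination need no escaping.
    sed "/nsrretrieve/ s| -A | -d $dest -A |" "$src" > "$out" &&
        chmod +x "$out"
)
```

For example, relocate_retrieve_script mydir.retrieve /path/to/destination mydir.relocated.retrieve writes a new executable script alongside the original.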
WIN scan archive
A separate archive is maintained of original scans. This data is written to two separate tapes for added safety. Access to scans via this archive is through the Calpendo interface. This only includes data acquired on the human MRI scanners.
The scan archives will not be overwritten. They will be periodically refreshed by duplicating them onto new media as and when new tape technology is employed. It is our intention, within the caveats imposed by the Data Protection Act 2018/GDPR, to maintain a permanent record of all scanner activity at the Centre.
The user archives are also expected to be kept. However, there is currently no commitment to refreshing these tapes (although we have done this in the past), so users with very long-term archive requirements should take extra measures to ensure their data remains available.