File versioning and curation with Git and GitLab
Why and how to use version control software
Introduction
Version control software is useful for almost any work on computer-based media. It curates your files so that you can easily browse a project's history or repurpose earlier tools and work, for example by keeping a record of your analysis or stimulus workflow scripts.
It is mainly used to manage software source code but is ideal for managing any text file content and can even have a role in maintaining a history for binary data files (e.g. images), although the software doesn't always manage these file types as efficiently (see below).
Before continuing it is useful to define some of the terms that version control uses. Files are stored in a repository, which could be on your computer or, if you are using the editors built into most Git web services, on the remote server. Once you are happy with the changes you have made, you commit them to the repository; each commit is a point in time that you can return to later. You can tag a commit with a textual description of that version for future reference, and you can branch your repository, which lets you change the files while keeping a working copy of the earlier code. Branching might be used to produce variations on a basic analysis workflow script for different projects, with each branch serving its own project.
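As a minimal sketch of the commit, tag and branch terms defined above, the following shell session builds a throwaway repository. The folder, file, tag and branch names and the user identity are placeholders invented for this illustration, not anything our systems require.

```shell
set -e

# Work in a temporary folder so that no real files are touched
workdir=$(mktemp -d)
cd "$workdir"

# Create a new, empty repository (all names here are hypothetical)
git init -q analysis
cd analysis

# Identity recorded in each commit, set for this repository only
git config user.name "Example User"
git config user.email "user@example.org"

# Commit: record a point in time that you can return to later
echo 'echo "running analysis"' > workflow.sh
git add workflow.sh
git commit -q -m "Add first version of the workflow script"

# Tag: attach a textual description to that version
git tag -a v1.0 -m "Version used for the pilot analysis"

# Remember the name of the starting branch (it varies between Git set-ups)
base=$(git symbolic-ref --short HEAD)

# Branch: change the files while the earlier code stays available
git checkout -q -b project-a
echo 'echo "extra step for project A"' >> workflow.sh
git commit -q -a -m "Adapt the workflow for project A"

# Return to the original branch, where workflow.sh is unchanged
git checkout -q "$base"
git log --oneline --all --decorate
```

The final command lists both commits along with the tag and branch names that point at them.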
In addition to your local repository, Git allows you to send changes to, and receive changes from, other developers or devices. The other end might be a collaborator or a central resource, e.g. a Git service such as GitHub or FMRIB's Git service at https://git.fmrib.ox.ac.uk. To download a repository from a remote service, you clone the repository. Where you don't own the remote repository, you are said to have taken a fork; you are then free to modify the code locally, diverging from the original. If you later want the owner to incorporate those changes, you issue a merge request. Merging may also be necessary if you have multiple copies of your own repository and change them concurrently, as it combines the different changes.
Sending changes to a remote repository is termed a push, retrieving the current version is a pull, and retrieving the details of all remote changes, branches etc. without applying them to your files is a fetch.
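The clone, push, fetch and pull operations can be tried without any server. In the sketch below a bare repository in a temporary folder stands in for the remote service, and two clones play the parts of two devices; the folder names (remote.git, laptop, desktop) and the identity are invented for the example.

```shell
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Identity for the commits made below (placeholder values)
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.org"

# A bare repository in a local folder stands in for the remote server
git init -q --bare remote.git

# Clone: download the repository (Git warns that it is still empty)
git clone -q remote.git laptop
cd laptop
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Push: send the commit to the remote and record it as this branch's upstream
git push -q -u origin HEAD

# A second clone plays the part of another device or collaborator
cd "$workdir"
git clone -q remote.git desktop
cd desktop
echo "second draft" >> notes.txt
git commit -q -a -m "Update notes"
git push -q

# Fetch: learn about the remote changes without touching your files
cd "$workdir/laptop"
git fetch -q
behind=$(git rev-list --count 'HEAD..@{upstream}')
echo "Commits waiting on the remote: $behind"

# Pull: bring the current version into your working copy
git pull -q --ff-only
cat notes.txt
```

With a real service the only difference is that the clone source is the URL shown for the project rather than a local folder.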
WIN GitLab
We provide a Git service using the GitLab software, available at https://git.fmrib.ox.ac.uk. We strongly recommend that you use it, even if only to back up your local repository. All users with an FMRIB computer account have access, initially with a limit of 10 projects per person (more are available on request).
By linking your local repository to this central repository you can easily back up your work and share it with your collaborators (those without an IT account would need to have accounts created), or even with the world by making the project public.
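Linking an existing local repository to a project on the server amounts to registering the project's URL as a remote. The sketch below assumes the usual GitLab SSH URL form; the path username/myproject.git is hypothetical, so copy the real URL from the project's page in GitLab. The push commands are left as comments because they need a registered SSH key, an existing project on the server and at least one local commit.

```shell
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q myproject
cd myproject

# Register the central repository under the conventional name "origin"
# (hypothetical project path; use the URL GitLab shows for your project)
git remote add origin git@git.fmrib.ox.ac.uk:username/myproject.git

# Confirm where fetches and pushes will go
git remote -v

# Once the prerequisites above are met, upload branches and tags with:
#   git push -u origin --all
#   git push origin --tags
```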
Externally hosted public services are also available, e.g. GitHub, GitLab.com, Bitbucket etc. These may be appropriate for public projects, but they often charge a fee for private repositories (students can usually use them for free).
Tools
Whilst Git is easy to operate from the command line, you may prefer to use a GUI with your local Git repositories. The SourceTree product is free (registration required), supports Windows and macOS, and incorporates support for the binary file management described in the Binary Files section below.
To connect SourceTree to a repository hosted on our GitLab server, choose File > New/Clone, then click on the +New Repository button and choose Clone from URL. The Source URL should then be set to the URL given for the project in GitLab.
Binary Files
Git was not designed to store large and/or binary files (e.g. MRI datasets, behavioural datasets), because taking a copy of the repository transfers all versions of every file. Rapidly changing binary files would quickly result in operations that take many minutes to complete.
Various Git add-ins exist to help with handling these files. The GitLab product includes support for the add-in Git LFS (Large File Storage), which is also used by GitHub. Details on how to use it (you need to install a tool on your computer) are in the Large Files section below; please read that first if your project contains binary files.
Using Git
For details on how to use the GIT tools please read the documentation at http://git-scm.com/doc and https://docs.gitlab.com/ee/topics/git/
To be able to send repositories on your computer to our GitLab, you need to add SSH keys for your devices to your git.fmrib.ox.ac.uk account. Once you have these key files created, access GitLab by visiting https://git.fmrib.ox.ac.uk in a web browser and logging in (using the LDAP tab) with your WIN computer account. When you have logged in you will be presented with a list of your projects.
At the top right of the page is your 'Avatar' (initially a circle containing a coloured pattern). Click on this and choose Edit Profile from the menu, then click on SSH Keys in the menu on the left. On the right you will see a large text-entry box. Copy the contents of your id_XXX.pub file into this box (NOT the file without .pub, which must be kept secret at all times), give the key a name and consider setting an expiry date. If you do set one, you will have to generate a new key when this one expires. Click 'Add key' to complete the process.
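The key files referred to above can be created with the standard ssh-keygen tool. The sketch below is one possible set-up rather than a required procedure: the ed25519 key type, the comment text and the output folder are illustrative choices. It writes to a temporary folder and uses an empty passphrase only so that it runs unattended; for a real key you would normally write to ~/.ssh and choose a passphrase.

```shell
set -e

# Temporary folder used here so that no existing key is overwritten
keydir=$(mktemp -d)

# Generate a key pair: id_ed25519 (private) and id_ed25519.pub (public)
# -C adds a label to help you recognise the key; -N "" means no passphrase
ssh-keygen -q -t ed25519 -C "user@laptop" -N "" -f "$keydir/id_ed25519"

# This is the text to paste into the SSH Keys box in GitLab
cat "$keydir/id_ed25519.pub"
```

Only the contents of the .pub file are ever pasted into GitLab; the file without the extension stays on your device.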
Large Files
When you clone a Git repository, you pull in the complete change history for all the files. If you have stored large binary files in it, this may make the clone take a very long time, especially if you have changed these files, because every complete copy of each file has to be downloaded. For smaller files, Git stores only the changes between file versions, but where a file changes significantly (often the case if it is in a compressed format) or is considered too large (by default >512MB), every new version is stored as a complete copy of the file.
GitLab includes support for Git LFS (Git Large File Storage), an add-on package which uses a separate facility for handling large/binary files. This is enabled by default on all projects stored on git.fmrib.ox.ac.uk, but before you can use it you need to install the Git LFS add-on. This adds a new Git subcommand, lfs, which allows you to tag specific files or file types as being handled by LFS rather than Git.
Our GitLab install supports HTTPS with SSH authentication. On macOS you can use MacPorts or Homebrew to install git-lfs, or download the binary release from the Git LFS site. If you use the binary release, uncompress it and install to your home folder:
    cd ~/Downloads
    tar zxf git-lfs-darwin-amd64-1.x.y.tar.gz
    cd git-lfs-1.x.y
    PREFIX=~/bin/ sh ./install.sh
This will install into $HOME/bin. Make sure this folder is in your PATH by default, e.g. add

    PATH=$PATH:$HOME/bin
    export PATH

to your .bash_profile/.zprofile.
Now read the Git LFS documentation on the Git LFS site or GitLab's LFS documentation.
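Once the add-on is installed, tagging a file type for LFS handling might look like the sketch below. The repository name and the *.nii.gz pattern are examples chosen for illustration; substitute whatever large file types your project contains. The script checks that git-lfs is available before using it.

```shell
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q imaging
cd imaging

if git lfs version >/dev/null 2>&1; then
    # Set up the LFS hooks for this repository only
    git lfs install --local
    # Tag a file type as handled by LFS; the rule is written to .gitattributes
    git lfs track "*.nii.gz"
    # Stage .gitattributes so that the rule is committed and shared with other clones
    git add .gitattributes
    # List the patterns currently handled by LFS
    git lfs track
else
    echo "git-lfs does not appear to be installed yet" >&2
fi
```

Files matching a tracked pattern are then added and committed with the ordinary git add and git commit commands.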
If you are importing binaries into Git it is worthwhile reading about the '.gitattributes' file and how it can be used to optimise Git's handling of binary files, or even of text files that should be handled as binaries.
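As a small illustration of what a '.gitattributes' file can do, the sketch below marks two file types with Git's built-in binary attribute, which switches off text diffs, merges and line-ending conversion for them. The patterns (*.mat, *.csv) and file names are hypothetical examples, not a recommendation for every project.

```shell
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo

# Hypothetical rules: treat .mat files as binary, and do the same for
# .csv files even though they are technically text
printf '%s\n' '*.mat binary' '*.csv binary' > .gitattributes

# Ask Git how it will treat particular file names (the files need not exist)
git check-attr binary diff text -- results.mat table.csv notes.txt
```

The report should show the attributes as set or unset for the two matched names and as unspecified for notes.txt, which no rule covers.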
Recommended contents for '.gitignore'
In the root of your Git repository you should create a text file called .gitignore, containing a list of file name patterns that Git should not try to manage. We recommend that this contains (at least) the following:
    *.o
    *.pyc
    __pycache__
    ~*
    .DS_Store
This instructs Git to ignore any C/C++ object files, compiled Python scripts (and the __pycache__ folders that hold them), vi temporary files and macOS Finder database files.
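To check that the patterns behave as intended, you can ask Git directly. The sketch below writes the recommended list into a throwaway repository and creates a few empty files; the file names are invented for the test.

```shell
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo

# The recommended patterns, one per line
printf '%s\n' '*.o' '*.pyc' '__pycache__' '~*' '.DS_Store' > .gitignore

# One source file that should be managed, three files that should be ignored
touch main.c main.o script.pyc .DS_Store

# Only .gitignore and main.c should be reported as untracked
git status --short

# check-ignore -v names the pattern responsible for ignoring a file
git check-ignore -v main.o
```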