Recently, I had to spend some time (re)designing a fresh repository for the management of environment modules via Easybuild, building on the extensive work performed by Fotis prior to his departure. It was the occasion for me to become familiar with a new(ish) git feature called git-subtree, and I wanted to share some notes on it here.
Pre-requisite: Git
Of course, you should become familiar with Git if you are not already. Consider these resources:
For advanced users, you may like these slides from Anthony Baire
Nested Sharing of Git code
Often when you develop a project tracked under Git, you wish to include code/elements coming from another Git repository (let’s say a library project) as a dependency.
At this level, a classical option was to rely on Git submodules to refer to a specific commit within the library project, and to manage the updating of these commit pointers as the library project evolves. One issue with this approach is that a submodule does not automatically become a part of the main repository. Thus your collaborators have to take particular actions (git clone --recursive [...], git submodule init && git submodule update), not to mention the trouble that arises when you upgrade the submodule(s) to a newer version. If you don’t see what I mean, take a look at the code in my generic Makefile for the targets make {setup | update | upgrade}.
My personal feeling on git submodules is that they are sufficient if you don’t expect/foresee frequent changes in the commit you refer to (as submodule). You can form your own opinion by checking the following resources:
- Git Submodules: Core Concept, Workflows And Tips
- man page
- Why your company shouldn’t use Git submodules
In all other cases, a better alternative is to rely on git subtree. Subtree enables modules to be a part of both the parent repo and their own repo at the same time. A subtree module’s code is automatically merged in, allowing you to manage the main repo as a simple homogeneous project, yet you can commit/push/pull specific modules to their own repo. You can even keep a distinct history so the module maintains a log of all the changes that directly pertain to it.
Resources on Git subtree:
Why use git subtree rather than git submodules?
Why use subtree instead of submodule? Quoting Atlassian, there are several reasons why you might find subtree better to use:
- Management of a simple workflow is easy.
- Older versions of git are supported (even before v1.5.2).
- The sub-project’s code is available right after the clone of the super project is done.
- subtree does not require users of your repository to learn anything new, they can ignore the fact that you are using subtree to manage dependencies.
- subtree does not add new metadata files like submodules do (i.e. .gitmodules).
- Contents of the module can be modified without having a separate repository copy of the dependency somewhere else.
The drawbacks are:
- You must learn about a new merge strategy (i.e. subtree).
- Contributing code back upstream for the sub-projects is slightly more complicated.
- The responsibility of not mixing super and sub-project code in commits lies with you.
[TIP] In order to keep your commit messages clean, we recommend that people split their commits between the subtrees and the main project as much as possible. That is, if you make a change that affects both the library and the main application, commit it in two pieces. That way, when you split the library commits out later, their descriptions will still make sense. But if this isn’t important to you, it’s not necessary. git subtree will simply leave out the non-library-related parts of the commit when it splits it out into the subproject later.
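To illustrate the tip above, here is a minimal sketch on a throwaway repository. The directory names (vendor/lib, src) and commit messages are hypothetical, and no subtree command is involved: it only shows that committing the library part and the application part separately keeps the history of the library directory readable.

```shell
#!/bin/sh
# Sketch only: vendor/lib stands in for a subtree directory, src for the main project.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p vendor/lib src
echo "lib v1" > vendor/lib/lib.txt
echo "app v1" > src/app.txt
git add .
git commit -q -m "initial import"

# One logical change that touches both the library and the application...
echo "lib v2" > vendor/lib/lib.txt
echo "app v2" > src/app.txt

# ...committed in two pieces, one per side.
git add vendor/lib
git commit -q -m "lib: bump to v2"
git add src
git commit -q -m "app: adapt to lib v2"

# The history restricted to the library directory only lists library commits.
git log --format=%s -- vendor/lib
```

With this split, the message "app: adapt to lib v2" never shows up in the history of vendor/lib, which is what keeps a later split of that directory meaningful.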
IMPORTANT: There is not a ton of documentation about the subtree command. What makes things even more complex is that there is a somewhat similar concept called the Git subtree merge strategy. While similar in purpose, these techniques are different and it can make Googling for help a bit tricky.
The general workflow when operating with git subtree is depicted in the figure below:
- Maintain a remote tracking branch which tracks the remote shared library project.
- From this remote, create a subtree as a subdirectory within a branch of your main project. The new sub-directory contains a copy of the shared library source code.
- Once this is done, we can pull new change sets down from the shared library remote as needed, and merge them into our subtree using git subtree {pull | merge | push}.
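The three steps of this workflow can be tried end to end without touching any real project. The sketch below is an illustration under assumptions: it uses two throwaway local repositories, and every name in it (lib, app, libremote, vendor/lib, the main branch) is hypothetical. It also assumes the git subtree command is available with your Git installation.

```shell
#!/bin/sh
# Sketch of the remote -> subtree -> pull cycle on throwaway local repositories.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)

# A stand-in for the shared library project.
git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/main   # force a known branch name
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "lib: v1"

# The main project needs at least one commit before a subtree can be added.
git init -q "$work/app"
cd "$work/app"
git symbolic-ref HEAD refs/heads/main
echo "app" > README
git add README
git commit -q -m "app: initial commit"

# 1. a remote tracking the library, 2. a subtree created from it
git remote add -f libremote "$work/lib" >/dev/null 2>&1
git subtree add --prefix vendor/lib --squash libremote/main >/dev/null 2>&1

# The library evolves upstream...
cd "$work/lib"
echo "v2" > lib.txt
git commit -q -am "lib: v2"

# 3. ...and the new change set is pulled into the subtree.
cd "$work/app"
git subtree pull --prefix vendor/lib --squash -m "Update vendor/lib" \
    libremote main >/dev/null 2>&1

cat vendor/lib/lib.txt   # content of the library as seen from the main project
```

Passing -m to git subtree pull gives the merge commit an explicit message, so the command does not need to open an editor when run from a script.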
In what follows, to make things concrete, I will reproduce the configuration I set up when handling the various Easybuild repositories as subtrees.
Git subtree workflow
Step 1: create the remote(s)
Nothing here is specific to subtree: the idea is to set up git remote(s) pointing to the URL of the (future) subtree(s). You can list the currently defined remotes with
$> git remote -v
(normally, you should see at least the origin remote). To add (and immediately fetch) a new remote, use the following command:
$> git remote add -f <remote-name> <url>
Ex: we will add the forked Easybuild easyblocks, Easybuild easyconfigs and Easybuild framework as git remotes, together with the official Easybuild wiki:
$> git remote add -f easybuild-easyblocks https://github.com/ULHPC/easybuild-easyblocks.git
$> git remote add -f easybuild-easyconfigs https://github.com/ULHPC/easybuild-easyconfigs.git
$> git remote add -f easybuild-framework https://github.com/ULHPC/easybuild-framework.git
$> git remote add -f easybuild-wiki https://github.com/hpcugent/easybuild-wiki.git
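If this setup is scripted, re-running git remote add on an existing remote fails. A possible way around it is sketched below; add_remote is a hypothetical helper (not part of Git), and the demonstration runs against a throwaway local repository rather than the GitHub URLs above.

```shell
#!/bin/sh
# Sketch: add (and fetch) a remote only when it is not defined yet.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# add_remote <name> <url>  (hypothetical helper)
add_remote() {
    if git remote get-url "$1" >/dev/null 2>&1; then
        echo "remote '$1' already defined"
    else
        git remote add -f "$1" "$2" >/dev/null 2>&1
        echo "remote '$1' added"
    fi
}

work=$(mktemp -d)

# Throwaway upstream repository standing in for a real URL.
git init -q "$work/upstream"
cd "$work/upstream"
echo "hello" > file.txt
git add file.txt
git commit -q -m "upstream: initial commit"

git init -q "$work/app"
cd "$work/app"
add_remote upstream-lib "$work/upstream"   # first call adds and fetches
add_remote upstream-lib "$work/upstream"   # second call is a no-op
git remote -v
```

The check relies on git remote get-url, which returns a non-zero status when the remote does not exist.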
Step 2: create the subtree with git subtree add
The general format of the command is as follows:
$> git subtree add --prefix <path/to/subtree> --squash <remote>/<branch>
The --squash option is generally what you want, since it creates a single commit for the full commit history of the remote repository and thus avoids polluting your local commit history.
Ex: (using the previous remotes):
$> git subtree add --prefix easybuild/easyblocks --squash easybuild-easyblocks/develop
$> git subtree add --prefix easybuild/easyconfigs --squash easybuild-easyconfigs/uni.lu
$> git subtree add --prefix easybuild/framework --squash easybuild-framework/develop
$> git subtree add --prefix easybuild/wiki --squash easybuild-wiki/master
This way you end up with a clean directory layout where easybuild/* holds, in separate directories, the latest version of each corresponding repository.
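To get a feel for what --squash does to the history, the following sketch compares commit counts on throwaway local repositories. All names (lib, app, libremote, vendor/lib, main) are hypothetical, and the example assumes git subtree is installed.

```shell
#!/bin/sh
# Sketch: a library with several commits is imported as a squashed subtree.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)

# Library with three commits.
git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/main   # force a known branch name
for i in 1 2 3; do
    echo "v$i" > lib.txt
    git add lib.txt
    git commit -q -m "lib: v$i"
done

# Main project with a single commit.
git init -q "$work/app"
cd "$work/app"
git symbolic-ref HEAD refs/heads/main
echo "app" > README
git add README
git commit -q -m "app: initial commit"

git remote add -f libremote "$work/lib" >/dev/null 2>&1
git subtree add --prefix vendor/lib --squash libremote/main >/dev/null 2>&1

lib_count=$(git rev-list --count libremote/main)
app_count=$(git rev-list --count HEAD)
echo "library commits:      $lib_count"
echo "main project commits: $app_count"
git log --oneline
```

The library commits are not replayed one by one in the main project: the log only shows the initial commit plus the commits created by the subtree import itself.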
Step 3: pulling the latest changes with git subtree pull
If you later want to pull the latest changes made within each subtree, simply use:
$> git subtree pull --prefix <path/to/subtree> <remote> <branch> --squash
Ex:
$> git fetch easybuild-easyblocks # fetch latest changes of the remote before merging
$> git subtree pull --prefix easybuild/easyblocks easybuild-easyblocks develop --squash
Of course, you have to repeat this for each subtree remote:
$> git fetch easybuild-easyconfigs
$> git subtree pull --prefix easybuild/easyconfigs easybuild-easyconfigs uni.lu --squash
$> git fetch easybuild-framework
$> git subtree pull --prefix easybuild/framework easybuild-framework develop --squash
$> git fetch easybuild-wiki
$> git subtree pull --prefix easybuild/wiki easybuild-wiki master --squash
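Since the same command is repeated for every subtree, it can be driven from a small list. The sketch below is one possible way to do it: pull_subtrees is a hypothetical helper that reads "prefix remote branch" triples, and the demonstration uses throwaway local repositories (blocks, configs, app) instead of the Easybuild remotes.

```shell
#!/bin/sh
# Sketch: pull several subtrees from a list of "<prefix> <remote> <branch>" lines.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# pull_subtrees  (hypothetical helper, reads the list on stdin)
pull_subtrees() {
    while read -r prefix remote branch; do
        # </dev/null keeps git from consuming the list read by the loop
        git subtree pull --prefix "$prefix" --squash \
            -m "Update $prefix from $remote/$branch" \
            "$remote" "$branch" </dev/null >/dev/null 2>&1
        echo "updated $prefix"
    done
}

# make_repo <dir> <file> <content>: throwaway repository with one commit
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/main
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "initial: $3"
}

work=$(mktemp -d)
make_repo "$work/blocks"  blocks.txt  "blocks v1"
make_repo "$work/configs" configs.txt "configs v1"
make_repo "$work/app"     README      "app"

cd "$work/app"
for name in blocks configs; do
    git remote add -f "$name" "$work/$name" >/dev/null 2>&1
    git subtree add --prefix "deps/$name" --squash "$name/main" >/dev/null 2>&1
done

# Upstream repositories move on.
for name in blocks configs; do
    echo "$name v2" > "$work/$name/$name.txt"
    git -C "$work/$name" commit -q -am "$name v2"
done

pull_subtrees <<EOF
deps/blocks blocks main
deps/configs configs main
EOF
```

For the layout used in this post, the list would contain lines such as "easybuild/easyblocks easybuild-easyblocks develop".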
Step 4: pushing (split) commits
You can push changes committed in the subtree sub-directory back to their respective remotes using
$> git subtree push --prefix <path/to/subtree> <remote> <branch>
You might want to filter (split) the commit history related only to the subtree: see this blog for details.
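As a rough illustration of the split and push operations, the sketch below commits a local fix inside a subtree, extracts the subtree-only history into a branch, and pushes it upstream. Everything is hypothetical and local: lib and app are throwaway repositories, lib-export and from-app are made-up branch names, and pushing to a separate review branch rather than the upstream main branch is a choice made for this example.

```shell
#!/bin/sh
# Sketch: contribute a change made in a subtree back to its own repository.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)

git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/main   # force a known branch name
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "lib: v1"

git init -q "$work/app"
cd "$work/app"
git symbolic-ref HEAD refs/heads/main
echo "app" > README
git add README
git commit -q -m "app: initial commit"
git remote add -f libremote "$work/lib" >/dev/null 2>&1
git subtree add --prefix vendor/lib --squash libremote/main >/dev/null 2>&1

# A fix made from within the main project, committed on its own.
echo "v1 patched locally" > vendor/lib/lib.txt
git commit -q -am "lib: local fix"

# Extract the subtree-only history into a local branch...
git subtree split --prefix vendor/lib -b lib-export >/dev/null 2>&1
git ls-tree --name-only lib-export   # files sit at the root of that branch

# ...or split and push in one go, here to a review branch of the library.
git subtree push --prefix vendor/lib libremote from-app >/dev/null 2>&1
git -C "$work/lib" show from-app:lib.txt
```

In the split branch the files appear at the repository root, without the vendor/lib prefix, which is what makes it suitable for the library's own repository.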
Optional steps
To see the differences between the local subtree and the remote, just use:
$> git diff <remote>/<branch> <local-branch>:<path/to/subtree>
Ex (assuming you are in the master branch):
$> git diff easybuild-easyblocks/develop master:easybuild/easyblocks
$> git diff easybuild-easyconfigs/uni.lu master:easybuild/easyconfigs
$> git diff easybuild-framework/develop master:easybuild/framework
$> git diff easybuild-wiki/master master:easybuild/wiki
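When only a yes/no answer is needed (for instance in a script), git diff --quiet can be used on the same pair of arguments. The sketch below wraps it in a hypothetical subtree_status helper and exercises it on throwaway local repositories; all names are made up for the example.

```shell
#!/bin/sh
# Sketch: report whether a subtree directory matches its remote branch.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# subtree_status <prefix> <remote>/<branch>  (hypothetical helper)
subtree_status() {
    if git diff --quiet "$2" "HEAD:$1"; then
        echo "in sync"
    else
        echo "differs"
    fi
}

work=$(mktemp -d)

git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/main   # force a known branch name
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "lib: v1"

git init -q "$work/app"
cd "$work/app"
git symbolic-ref HEAD refs/heads/main
echo "app" > README
git add README
git commit -q -m "app: initial commit"
git remote add -f libremote "$work/lib" >/dev/null 2>&1
git subtree add --prefix vendor/lib --squash libremote/main >/dev/null 2>&1

before=$(subtree_status vendor/lib libremote/main)
echo "right after the import: $before"

# A local modification inside the subtree directory.
echo "v1 patched locally" > vendor/lib/lib.txt
git commit -q -am "lib: local fix"

after=$(subtree_status vendor/lib libremote/main)
echo "after a local change:   $after"
git diff --stat libremote/main HEAD:vendor/lib
```

Using HEAD instead of a branch name makes the check independent of the branch currently checked out.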
Concluding remarks
If you want to embed the first steps (corresponding to the main actions you wish to perform with subtrees) into a GNU Makefile, you may want to look at the code of my generic Makefile for the targets make subtree_{setup,up,diff}. It assumes you have filled in the variable GIT_SUBTREE_REPOS.