HPC @ Uni.lu

High Performance Computing in Luxembourg

Understanding Git Subtree

Recently, I spent some time [re]designing a fresh repository for managing environment modules via Easybuild, building on the extensive work performed by Fotis prior to his departure. It was the occasion for me to get familiar with a new(ish) Git feature called git-subtree, and I wanted to share some notes on it here.

Pre-requisite: Git

Of course, you should become familiar with Git (if you are not already). Consider these resources:

Advanced users may like these slides from Anthony Baire.

Nested Sharing of Git code

Often when you develop a project tracked under Git, you wish to include code/elements coming from another Git repository (let’s say a library project) as a dependency.

At this level, the classical option has been to rely on Git submodules to refer to a specific commit within the library project, and to manage the updating of these commit pointers as the library project evolves. One issue with this approach is that a submodule does not automatically become a part of the main repository. Thus your collaborators have to take particular actions (git clone --recursive [...], or git submodule init && git submodule update), not to mention the trouble that arises when you upgrade the submodule(s) to a newer version. If you don’t see what I mean, take a look at the code in my generic Makefile for the targets make {setup | update | upgrade}.
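To make the extra step concrete, here is a minimal sketch that can be run in a throwaway directory. The repository names (lib, project, plain-clone) and the vendor/lib path are purely hypothetical, and the protocol.file.allow setting is only there because recent Git releases refuse submodules that point to a local path by default.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all repository names and paths are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# A tiny "library" repository...
git init -q lib
echo 'v1' > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m 'lib: initial version'

# ...and a main project that references it as a submodule.
# Recent Git releases refuse local-path submodules unless the file
# protocol is explicitly allowed, hence the -c option.
git init -q project
git -C project -c protocol.file.allow=always \
    submodule --quiet add "$sandbox/lib" vendor/lib
git -C project commit -q -m 'Add lib as a submodule'

# A plain clone only creates an empty vendor/lib directory.
git clone -q project plain-clone
files_before=$(ls -A plain-clone/vendor/lib | wc -l)

# Collaborators must explicitly initialise and update the submodule.
git -C plain-clone -c protocol.file.allow=always \
    submodule --quiet update --init
content_after=$(cat plain-clone/vendor/lib/lib.txt)

echo "files before update: $files_before"
echo "content after update: $content_after"
```

The point of the sketch is only that the library files appear after the explicit update step, not after the clone itself.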

My personal feeling is that Git submodules are sufficient if you don’t expect frequent changes in the commit you refer to (as a submodule). You can form your own opinion by checking the following resources:

In all other cases, a better alternative is to rely on git subtree. Subtree enables modules to be a part of both the parent repo and their own repo at the same time. A subtree module’s code is merged directly into the main repository, so you can manage the main repo as a simple, homogeneous project, yet still commit, push and pull specific modules to and from their own repo. You can even keep a distinct history, so that the module maintains a log of all the changes that directly pertain to it.

Resources on Git subtree:

Why use git subtree rather than git submodules?

Quoting Atlassian, there are several reasons why you might find subtree better to use:

  • Management of a simple workflow is easy.
  • Older versions of Git are supported (even before v1.5.2).
  • The sub-project’s code is available right after the clone of the super project is done.
  • subtree does not require users of your repository to learn anything new; they can ignore the fact that you are using subtree to manage dependencies.
  • subtree does not add new metadata files as submodules do (i.e. .gitmodules).
  • Contents of the module can be modified without having a separate repository copy of the dependency somewhere else.

The drawbacks are:

  • You must learn about a new merge strategy (i.e. subtree).
  • Contributing code back upstream for the sub-projects is slightly more complicated.
  • The responsibility of not mixing super and sub-project code in commits lies with you.

[TIP] In order to keep your commit messages clean, we recommend that people split their commits between the subtrees and the main project as much as possible. That is, if you make a change that affects both the library and the main application, commit it in two pieces. That way, when you split the library commits out later, their descriptions will still make sense. But if this isn’t important to you, it’s not necessary. git subtree will simply leave out the non-library-related parts of the commit when it splits it out into the subproject later.

IMPORTANT: There is not a ton of documentation about the subtree command. What makes things even more complex is that there is a somewhat similar concept called the Git subtree merge strategy. While similar in purpose, these techniques are different, which can make Googling for help a bit tricky.

The general workflow when operating with git subtree is depicted in the figure below:

[Figure: git subtree workflow]

  • Maintain a remote tracking branch which tracks the remote shared library project
  • From this remote, create a subtree as a subdirectory within a branch of your main project. The new sub-directory contains a copy of the shared library source code.
  • Once this is done, we can pull new change sets down from the shared library remote as needed and merge them into our subtree, or send local changes back, using git subtree {pull | merge | push}.
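The three steps above can be sketched end to end in a local sandbox, without any network access. Everything named here (shared-lib, project, vendor/shared-lib, the main branch) is a hypothetical stand-in, and GIT_MERGE_AUTOEDIT=no is only set so that the merge performed by git subtree pull does not open an editor.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all repository names and paths are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
export GIT_MERGE_AUTOEDIT=no   # accept default merge messages

# The "shared library" project, with a branch explicitly named main.
git init -q shared-lib
git -C shared-lib symbolic-ref HEAD refs/heads/main
echo 'v1' > shared-lib/lib.txt
git -C shared-lib add lib.txt
git -C shared-lib commit -q -m 'lib: v1'

# The main project needs at least one commit before adding a subtree.
git init -q project
cd project
echo 'main project' > README
git add README
git commit -q -m 'project: initial commit'

# 1. Remote tracking the shared library.
git remote add -f shared-lib "$sandbox/shared-lib"

# 2. Subtree created as a subdirectory of the main project.
git subtree add --prefix=vendor/shared-lib shared-lib main --squash

# 3. The library evolves upstream; pull the change into the subtree.
echo 'v2' > "$sandbox/shared-lib/lib.txt"
git -C "$sandbox/shared-lib" commit -q -am 'lib: v2'
git subtree pull --prefix=vendor/shared-lib shared-lib main --squash

subtree_content=$(cat vendor/shared-lib/lib.txt)
echo "subtree now contains: $subtree_content"
```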

In what follows, to make things concrete, I reproduce the configuration I set up when handling the various Easybuild repositories as subtrees.

Git subtree workflow

Step 1: create the remote(s)

Nothing here is specific to subtree: the idea is simply to set up Git remote(s) pointing to the URL of the (future) subtree(s). You can list the currently defined remotes with

$> git remote -v   	

(normally, you should see at least the origin remote). To add (and immediately fetch) a new remote, use the following command:

Terminal

   git remote add -f {remote_name} {repository_url}

Ex: we will add the forked Easybuild easyblocks, Easybuild easyconfigs and Easybuild framework repositories as Git remotes, together with the official Easybuild wiki:

    $> git remote add -f easybuild-easyblocks  https://github.com/ULHPC/easybuild-easyblocks.git
    $> git remote add -f easybuild-easyconfigs https://github.com/ULHPC/easybuild-easyconfigs.git
    $> git remote add -f easybuild-framework   https://github.com/ULHPC/easybuild-framework.git
    $> git remote add -f easybuild-wiki        https://github.com/hpcugent/easybuild-wiki.git

Step 2: create the subtree with git subtree add

The general format of the command is as follows:

Terminal

   git subtree add --prefix={path/to/subdir} {remote} {branch} --squash

The --squash option is generally what you want, since it collapses the full commit history of the remote repository into a single commit and thus avoids polluting your local commit history.

Ex (using the previous remotes):

    $> git subtree add --prefix easybuild/easyblocks  --squash easybuild-easyblocks/develop
    $> git subtree add --prefix easybuild/easyconfigs --squash easybuild-easyconfigs/uni.lu
    $> git subtree add --prefix easybuild/framework   --squash easybuild-framework/develop
    $> git subtree add --prefix easybuild/wiki        --squash easybuild-wiki/master

This way you end up with a clean directory layout, where each subdirectory of easybuild/ holds the latest version of the corresponding repository.

Step 3: pulling the latest changes with git subtree pull

If you later want to pull the latest upstream changes for a subtree, simply use:

Terminal

   git fetch {remote}
   git subtree pull --prefix={path/to/subdir} {remote} {branch} --squash

Ex:

    $> git fetch easybuild-easyblocks    # fetch latest changes of the remote before merging
    $> git subtree pull --prefix easybuild/easyblocks easybuild-easyblocks develop --squash

Of course, you have to repeat this for each subtree remote:

    $> git fetch easybuild-easyconfigs
    $> git subtree pull --prefix easybuild/easyconfigs easybuild-easyconfigs uni.lu --squash
  		
    $> git fetch easybuild-framework
    $> git subtree pull --prefix easybuild/framework easybuild-framework develop --squash
  		
    $> git fetch easybuild-wiki
    $> git subtree pull --prefix easybuild/wiki easybuild-wiki master --squash
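Since the same two commands are repeated for every subtree, a small loop can do the job. Below is a rough sketch, not the code of my Makefile: the prefix/remote/branch triples are those of the examples above, while the run helper, the pull_all_subtrees function and the DRY_RUN switch are names invented for this illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
set -eu

# One "prefix remote branch" triple per line, as in the examples above.
subtrees='easybuild/easyblocks easybuild-easyblocks develop
easybuild/easyconfigs easybuild-easyconfigs uni.lu
easybuild/framework easybuild-framework develop
easybuild/wiki easybuild-wiki master'

# Hypothetical switch: 1 (the default here) prints the commands,
# anything else executes them for real.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        # Keep the command from consuming the loop's standard input.
        "$@" </dev/null
    fi
}

pull_all_subtrees() {
    echo "$subtrees" | while read -r prefix remote branch; do
        run git fetch "$remote"
        run git subtree pull --prefix "$prefix" "$remote" "$branch" --squash
    done
}

planned=$(pull_all_subtrees)
echo "$planned"
```

Run it with DRY_RUN=0 from the top of the main repository to actually perform the pulls.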

Step 4: pushing (split) commits

You can push changes committed in the subtree subdirectory back to their respective remotes using

Terminal

   git subtree push --prefix={path/to/subdir} {remote} {branch}

You might want to filter (split) the commit history related only to the subtree: see this blog for details.
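As an illustration of what pushing and splitting do, here is a sandboxed sketch. The repositories (shared-lib, project), the vendor/shared-lib prefix and the contrib and lib-only branch names are all hypothetical. The change is pushed to a separate contrib branch because Git refuses by default to push to the branch checked out in a non-bare repository.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all repository, path and branch names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

git init -q shared-lib
git -C shared-lib symbolic-ref HEAD refs/heads/main
echo 'v1' > shared-lib/lib.txt
git -C shared-lib add lib.txt
git -C shared-lib commit -q -m 'lib: v1'

git init -q project
cd project
echo 'main project' > README
git add README
git commit -q -m 'project: initial commit'
git remote add -f shared-lib "$sandbox/shared-lib"
git subtree add --prefix=vendor/shared-lib shared-lib main --squash

# A change made inside the subtree, committed in the main project.
echo 'fix' > vendor/shared-lib/fix.txt
git add vendor/shared-lib/fix.txt
git commit -q -m 'lib: add fix.txt'

# Push the subtree-only history to a contrib branch of the library.
git subtree push --prefix=vendor/shared-lib shared-lib contrib
pushed_subject=$(git -C "$sandbox/shared-lib" log -1 --format=%s contrib)

# Alternatively, split that history into a local branch for inspection.
split_tip=$(git subtree split --prefix=vendor/shared-lib -b lib-only)
split_listing=$(git ls-tree -r --name-only lib-only)

echo "pushed commit: $pushed_subject"
echo "files on lib-only:"
echo "$split_listing"
```

Note that the lib-only branch holds only the library files: the README of the main project is left out by the split.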

Optional steps

To see the differences between the local subtree and the remote, just use:

Terminal

   git diff {remote}/{branch} {current_branch}:{path/to/subdir}

Ex (assuming you are in the master branch):

    $> git diff easybuild-easyblocks/develop  master:easybuild/easyblocks
    $> git diff easybuild-easyconfigs/uni.lu  master:easybuild/easyconfigs
    $> git diff easybuild-framework/develop   master:easybuild/framework
    $> git diff easybuild-wiki/master         master:easybuild/wiki
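The same comparison can be used in a script to tell whether a subtree is in sync with its remote, relying on the exit status of git diff --quiet. The sketch below runs in a local sandbox; the names shared-lib, project and vendor/shared-lib are hypothetical, and HEAD is used in place of an explicit branch name.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all repository and path names are hypothetical.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

git init -q shared-lib
git -C shared-lib symbolic-ref HEAD refs/heads/main
echo 'v1' > shared-lib/lib.txt
git -C shared-lib add lib.txt
git -C shared-lib commit -q -m 'lib: v1'

git init -q project
cd project
git commit -q --allow-empty -m 'project: initial commit'
git remote add -f shared-lib "$sandbox/shared-lib"
git subtree add --prefix=vendor/shared-lib shared-lib main --squash

# Exit status 0 means the subtree matches the remote-tracking branch.
subtree_status() {
    if git diff --quiet shared-lib/main HEAD:vendor/shared-lib; then
        echo 'in sync'
    else
        echo 'differs'
    fi
}

before=$(subtree_status)

# The library moves on; after a fetch the subtree is out of date.
echo 'v2' > "$sandbox/shared-lib/lib.txt"
git -C "$sandbox/shared-lib" commit -q -am 'lib: v2'
git fetch -q shared-lib
after=$(subtree_status)

echo "before upstream change: $before"
echo "after upstream change:  $after"
```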

Concluding remarks

If you are interested in embedding the code of the first steps (corresponding to the main actions you wish to perform with subtrees) into a GNU Makefile, you might want to look at the code of my generic Makefile for the targets make subtree_{setup,up,diff}. It assumes you set the variable GIT_SUBTREE_REPOS.