Splicing git repositories together

I’ve developed a tool called splice-repos to combine our two parallel git repositories into one (I’ve separately described the background of our awkward repository setup). There are a few existing ways to do this (with some interesting discussions on Stack Overflow):

  • Just start afresh with a new repository, and commit a fresh copy of the source code for each branch. I hate losing history, so that’s a no-go.
  • A single merge point per branch (as used in subtree-merge): join the two git repositories into a single one (with a standard git process, or something like git-stitch-repo), and then create a merge commit for each branch that joins the two branches together. This gives a single point to work from in future, but doesn’t let you easily see what the state of the two repositories was at a particular date.

We’ve been developing these two projects in parallel, and every build takes the state of each project’s branch at the same point in time. They really are one project and should have been in one repository, so I wanted to recreate the git history so that it looks as though they always were.

I found a Python library called git_fast_filter from the jbosstools-gitmigration project which allows fairly intuitive processing of the formats produced by git-fast-export and git-fast-import. They even had an example script for splicing two repositories together – but using a predetermined order of four commits. So I’ve adapted it to do what I wanted:

  • Each branch contains a sequence of commits from both repositories, interleaved in the order they were originally committed
  • The process is repeatable, and generates the same hashes for commits when run again
  • The process can be done incrementally (so that we can start working on the new repository for some branches, while older branches are still maintained in the old repos)
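
To make the interleaving idea concrete, here is a minimal sketch using only the Python standard library. It is not the splice-repos implementation and does not use git_fast_filter: the `Commit` tuple, the `interleave` function and the sample ids are all hypothetical, and it assumes each repository's commits on a branch are already listed in commit-date order.

```python
import heapq
from typing import NamedTuple


class Commit(NamedTuple):
    # Hypothetical record; a real tool would read these from the export stream.
    timestamp: int   # commit date as seconds since the epoch
    repo: str        # label of the source repository
    commit_id: str   # original commit hash (placeholder values below)


def interleave(commits_a, commits_b):
    """Merge two date-ordered commit lists into one date-ordered list."""
    # Tuples compare by timestamp, then repo, then commit_id, so commits
    # with equal dates are ordered the same way on every run.
    return list(heapq.merge(commits_a, commits_b))


repo_a = [Commit(100, "a", "a1"), Commit(300, "a", "a2")]
repo_b = [Commit(200, "b", "b1"), Commit(300, "b", "b2")]
spliced = interleave(repo_a, repo_b)
```

Breaking ties on fixed fields rather than on input order is one way to keep the result deterministic, which matters if re-running the process is meant to produce the same commit hashes.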

Since we also need to rename branches and so on, I do this in two passes. The first exports the repositories, filters them, and renames or excludes branches, files etc. to prevent conflicts (using a renamer tool that I wrote with git_fast_filter). The second is the splice_repos program, which interweaves the history of the branches into the new target repository. Handling an incremental import was interesting, as it needs to be able to refer to previously spliced commits that come from the original repositories; this is currently handled through a large JSON file which records the state of the merge for reuse. We need incremental import because there’s lots of infrastructure to switch, so we plan to do that gradually (switching one branch at a time to be managed through the new repository, and allowing commits from the others to keep on being imported…)
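
As a rough illustration of keeping splice state in a JSON file, the sketch below maps original commit ids to their spliced counterparts so that a later run can look them up. The file layout and every function name here are assumptions made for the example; the real state file used by splice_repos may be structured quite differently.

```python
import json
import os


def load_state(path):
    """Return the saved mapping, or an empty one on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_state(path, state):
    """Write the mapping with sorted keys so the file is stable between runs."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2, sort_keys=True)


def record_commit(state, repo, original_id, spliced_id):
    """Remember which spliced commit an original commit became."""
    state.setdefault(repo, {})[original_id] = spliced_id


def lookup_commit(state, repo, original_id):
    """Return the spliced id for an original commit, or None if unseen."""
    return state.get(repo, {}).get(original_id)
```

An incremental run would call `load_state` first, skip or reuse any commit that `lookup_commit` already knows, and call `save_state` once the new commits have been imported.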

This is currently all living in the St James Software fork of the jbosstools-gitmigration repository; there’s a pull request to get it merged but I suspect it’s diverging from the maintainer’s purpose and may become a separate tool in its own right…
