Monday, February 08, 2010

Splitting a git repo

It's been almost a year since I last blogged, and what can I say: micro-blogging killed blogging for me. Even micro-blogging has reached the point where I don't do as much of it as I used to. Perhaps it's the end of web 2.0, or perhaps I'm getting too old for this :)

Anyhow, getting back to the subject of this quick post: splitting a git repo into two separate git repos turns out to be somewhat obscure and required a bit of googling around. Fortunately I came across this great blog post, but I wanted to summarize it in one place (the author made me look at several pages to put it together).

Say I have a git project called foo.repo with a subdirectory called bar, which I now want to turn into its own separate git project called bar.repo.

Current state

foo.repo/
    .git/
    bar/
    abc/
    xyz/


Target state

foo.repo/
    .git/
    abc/
    xyz/

bar.repo/
    .git/



Step 1: Clone the existing repo locally under the name of the desired new repo


$ git clone --no-hardlinks foo.repo bar.repo


Step 2: Filter-branch and reset to exclude other files, so they can be pruned:


$ cd bar.repo
$ git filter-branch --subdirectory-filter bar -- --all
$ git reset --hard
$ git gc --aggressive
$ git prune
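One thing worth knowing about Step 2: filter-branch keeps a backup of the pre-rewrite refs under refs/original/, and the reflog still points at the old commits too, so gc and prune may not reclaim much until both are gone. Here is a minimal sketch of steps 1 and 2 plus that cleanup, run against a throwaway repo. The scratch directory, file names and commit messages are made up for illustration. The filter-branch call uses the `-- --all` form and the cleanup commands follow the git-filter-branch manual. FILTER_BRANCH_SQUELCH_WARNING only matters on newer git versions, where it skips the warning pause.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git only; ignored otherwise

# Scratch area; every name below is a stand-in for illustration.
work=$(mktemp -d)
cd "$work"

# Build a tiny foo.repo: one commit touching bar/, one touching abc/.
git init -q foo.repo
cd foo.repo
git config user.email "you@example.com"
git config user.name "Example"
mkdir bar abc
echo one > bar/file.txt
git add bar && git commit -q -m "add bar"
echo two > abc/file.txt
git add abc && git commit -q -m "add abc"
cd ..

# Steps 1 and 2 from the post.
git clone -q --no-hardlinks foo.repo bar.repo
cd bar.repo
git filter-branch --subdirectory-filter bar -- --all

# Drop the refs/original/ backups and expire the reflog so the old
# objects become unreachable, then collect and prune them.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --aggressive --prune=now

# Only the commit that touched bar/ should be left, and its files
# now sit at the top level of the repo.
git log --oneline
ls
```

Try this on a scratch copy first: once the backups and reflog are gone, there is no easy way back to the pre-rewrite history in that clone.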


Step 3: Create new empty repo on git server


$ mkdir /var/git/bar.git
$ cd /var/git/bar.git
$ git --bare init



Step 4: Back on the local machine, replace remote origin to point to new repo


$ cd bar.repo
$ git remote rm origin
$ git remote add origin git@git-server:bar.repo
$ git push origin master
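Steps 3 and 4 can be rehearsed without touching a real server. A minimal sketch, assuming a local bare repo stands in for /var/git/bar.git and a one-commit repo stands in for the filtered bar.repo; the paths, the file name and the placeholder old origin are all made up. It pushes HEAD rather than master so it works whatever the default branch name happens to be.

```shell
#!/bin/sh
set -e

# Scratch area; every name below is a stand-in for illustration.
work=$(mktemp -d)
cd "$work"

# Step 3 stand-in: an empty bare repo playing the role of the server.
git init -q --bare bar.git

# Stand-in for the already filtered bar.repo, with one commit in it.
git init -q bar.repo
cd bar.repo
git config user.email "you@example.com"
git config user.name "Example"
echo one > file.txt
git add file.txt && git commit -q -m "add file"

# A clone would still have origin pointing at the old repo.
git remote add origin "$work/foo.repo"

# Step 4: swap origin for the new repo and push the current branch.
git remote rm origin
git remote add origin "$work/bar.git"
git push -q origin HEAD

# Show where origin points now and what the new repo received.
git remote -v
git ls-remote --heads origin
```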


Step 5: Remove bar directory from original foo.repo


$ cd foo.repo
$ git filter-branch --tree-filter "rm -rf bar" --prune-empty HEAD

Or, supposedly faster (I haven't tried it):
$ git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch bar" --prune-empty HEAD
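To check that Step 5 did what it should, you can ask git for every commit that still touches bar. Here is a minimal sketch on a throwaway repo, using the index-filter variant; the scratch directory, file names and commit messages are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING only matters on newer git versions.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git only; ignored otherwise

# Scratch area; every name below is a stand-in for illustration.
work=$(mktemp -d)
cd "$work"

# Three commits: only the middle one touches bar/.
git init -q foo.repo
cd foo.repo
git config user.email "you@example.com"
git config user.name "Example"
mkdir abc bar
echo one > abc/file.txt
git add abc && git commit -q -m "add abc"
echo two > bar/file.txt
git add bar && git commit -q -m "add bar"
echo three >> abc/file.txt
git add abc && git commit -q -m "update abc"

# Step 5 from the post (index-filter variant).
git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch bar" --prune-empty HEAD

# No remaining commit should touch bar/, so this lists nothing.
git log --oneline -- bar

# The "add bar" commit became empty and was pruned; two commits remain.
git log --oneline
```

Every commit from the first one that touched bar onwards gets a new hash, so a rewritten master no longer fast-forwards onto a copy that was pushed before the rewrite.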




Reference


13 comments:

Gaveen said...

When I started with Git, I had to look up instructions for this several times. A very useful guide for a new Git user.

Bud said...

Thanx. Yes I've kept putting this off myself but was compelled to finally do this. Documented it so I can find it in the future.

Lakshan said...

I assume you needed to split bar.repo from the main foo.repo as a project architectural decision.

If you only wanted to manage bar.repo as a separate repo from the main repo, you could've achieved that using Git Submodules.

Bud said...

Thanks Lakshan. Didn't know you could do that with git :) But my requirement was to split, and yes, it's due to early architectural decisions. You start something small as part of one project and then realize you want to spin that part off as a separate project without losing commit history.

driving lessons solihull said...

I have a git repository containing several modules, each in their own sub folder, and I'd like to split them into independent repositories, ideally preserving as much of their individual histories as possible. Is there a canonical way of doing this?

Benjamin Margolis said...

Thanks for posting this! I was trying to sort out exactly which combination of StackOverflow posts I needed to follow. Having the explicit commands and brief explanation was just what I needed!

Bud said...

@Benjamin I'm glad you found it useful

Anonymous said...

Why don't we just clone the repo and then do 'git rm' and 'git commit', i.e. split these repos going forward, thereby preserving as much history as possible?

Anonymous said...

Thanks for your instructions. Very clear and mostly working for me. I'm getting stuck at the final stage :)

How do I commit the changes which have been made to the original foo repo. This is what I get at home after cleaning it out:

$ git s
# On branch master
# Your branch and 'origin/master' have diverged,
# and have 473 and 487 different commits each, respectively.
#
nothing to commit (working directory clean)

Do I just push it back or do I have to do anything extra such as a commit?

Bud Siddhisena said...

@Anonymous Yup just push foo to origin. As you can see it says

"# Your branch and 'origin/master' have diverged"

So you should be able to

$ git push origin master

Hopefully no one else has committed to origin where you'd have to pull & possibly deal with a merge conflict.

Anonymous said...

Thanks Bud for your reply. I'm still having some problems though and wouldn't mind a bit more much appreciated advice.

I try the push as advised but get knocked back (see below). When I do a pull to try to fix the conflicts I just pull down the folder which has been just removed with filter-branch. I gather that I may be able to force the push with -f. Any comments?

$ git push origin master
To git@bitbucket.org:xxx/yyy.git
! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'git@bitbucket.org:xxx/yyy.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes (e.g. 'git pull') before pushing again. See the
'Note about fast-forwards' section of 'git push --help' for details.

Many thanks.

Bud Siddhisena said...

@Anonymous Well it looks like you'll have to pull before you push. Make a backup of the directory and try a pull. If there is a conflict you'll have to fix it locally and then commit those changes before pushing.

Don't force it!

Have a read of my other blog post where I talk about using fetch instead of pull

http://www.geekaholic.org/2012/05/peer-to-peer-collaborative-development.html

Miro Spönemann said...

The filter-branch command in step 5 prunes commits that are empty after removing unwanted files. However, it does not remove merge commits that have become useless:

C
|\
B|
|/
A

In order to eliminate the useless merge commit C, a filter-branch command with parent-filter is required.