As I wrote in git-pull-mishap-and-git-clean, the problem with git is that its operations are not atomic. For example, if a `git pull` is interrupted, or fails because of a lack of disk space or a network problem, the repository is left in an in-between state. This is problematic. It would be nice if git operations were atomic in this sense.
A solution could be to keep the git repository inside another git repository. For example, keep the linux-2.6.git repo inside another git repo, say git-o-git. When you do a pull in linux-2.6.git and it succeeds, do a `git commit -a` in the outer git-o-git. If it fails for some reason, you can go back to the previous version of the linux-2.6.git repo by doing a `git clean -d -f` and a `git checkout -f` in the outer git-o-git.
It should be possible to add wrapper scripts around `git` and do this automagically. Maybe it could be called git WC, as it is built on top of the git porcelain, which is built on top of the git plumbing commands. ;-)
Has anyone tried this already?
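A lighter stand-in for the wrapper idea can be sketched in shell. Note that this is not the git-o-git scheme itself: git does not track a nested `.git` directory, so an outer repository would need extra work to snapshot the inner one. Instead, the sketch records where HEAD pointed before the pull and resets to it if the pull fails. `pull_or_rollback` is a hypothetical function name, not a git command.

```shell
# Hypothetical wrapper, not part of git:
# pull, and go back to the old HEAD if the pull fails.
pull_or_rollback() {
    # Remember the commit we are on before touching anything.
    old_head=$(git rev-parse --verify HEAD) || return 1

    # Run the pull with whatever arguments were given and keep its exit status.
    status=0
    git pull "$@" || status=$?

    if [ "$status" -ne 0 ]; then
        # A half-finished merge may be pending; ignore the error if there is none.
        git merge --abort 2>/dev/null || true
        # Put the index and the tracked files back to the pre-pull commit.
        git reset --hard "$old_head"
    fi
    return "$status"
}
```

Usage would look like `pull_or_rollback origin master`. Two caveats: `git reset --hard` also throws away uncommitted changes to tracked files, and files that the failed pull left behind as untracked are not removed, so this is still far from truly atomic.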
08 July 2009
Atomic git
Labels: computer, English, git, idea, planetsuse
GIT pull mishap and git clean
When I was doing a `git pull` from a remote repo, I ran out of disk space, which resulted in errors like
Updating ce8a742..faf80d6
error: git checkout-index: unable to write file drivers/usb/gadget/s3c-hsotg.c
error: git checkout-index: unable to write file drivers/usb/gadget/u_audio.c
error: git checkout-index: unable to write file drivers/usb/gadget/u_audio.h
I freed some disk space and re-ran `git pull`, but it failed, saying
$ git pull
Updating ce8a742..faf80d6
error: Untracked working tree file 'Documentation/ABI/testing/sysfs-bus-pci-devices-cciss' would be overwritten by merge.
Some of the files had been created by the previous pull, but since that pull failed they were considered untracked. `git pull -f` didn't help, as git was reluctant to delete my untracked files.
Deleting a huge list of files one by one was a pain. I was thinking of running `git status` to get the list of untracked files and deleting them. But Jony rescued me by telling me about `git clean`, which can delete all the untracked files!
But I would really like to see a way to pull/checkout and overwrite only the untracked files that are in the way, so that other untracked files, which would not be overwritten, need not be deleted. Is there a way to do it?
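One way to approximate this is to delete only the untracked files that the incoming commit would overwrite, and leave every other untracked file alone. The sketch below does that by intersecting two lists: `git ls-files --others` (untracked files) and `git ls-tree -r` (files in the target commit). The function names are hypothetical, it assumes you run it from the top of the working tree, and it does not cope with file names containing newlines or characters that git quotes in its output.

```shell
# Print the paths that appear in both sorted list files (one path per line).
colliding_paths() {
    LC_ALL=C comm -12 "$1" "$2"
}

# Hypothetical helper: delete only the untracked files that <commit> would overwrite.
# Usage: remove_colliding_untracked <commit>
remove_colliding_untracked() {
    target=$1
    tmpdir=$(mktemp -d) || return 1

    # Untracked (and not ignored) files in the working tree.
    git ls-files --others --exclude-standard | LC_ALL=C sort > "$tmpdir/untracked"
    # Every file contained in the commit we are about to check out or merge.
    git ls-tree -r --name-only "$target" | LC_ALL=C sort > "$tmpdir/incoming"

    # Remove just the files that are in both lists.
    colliding_paths "$tmpdir/untracked" "$tmpdir/incoming" |
    while IFS= read -r path; do
        rm -f -- "$path"
    done

    rm -rf "$tmpdir"
}
```

For example, after a `git fetch origin`, running `remove_colliding_untracked origin/master` and then `git pull` should get past the "would be overwritten by merge" error; `origin/master` here is just an assumed remote and branch name. To preview what a plain `git clean -d -f` would delete, `git clean -n -d` does a dry run.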
13 May 2009
Cloning multiple git repos
Many of the maintainers of linux-kernel maintain a git repository. I usually have clones of various such repositories.
For example I have clones of
Linus Torvalds's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Block Maintainer, Jens Axboe's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
...
If I do individual clones of all these repositories, git downloads and maintains duplicate copies of the same objects, wasting disk space and network bandwidth.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git clone git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
So I was looking for a way to share the common objects so that duplicates won't waste disk and network. And no surprise, git has a way to do that. I was just unaware of a simple option, "--reference".
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git clone --reference linux-2.6/ git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
Here is the difference between cloning Jens's repo with and without --reference to Linus's repo:
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
Initialized empty Git repository in /home/knikanth/labs-sw/linus/linux-2.6-block/.git/
remote: Counting objects: 1180249, done.
remote: Compressing objects: 100% (295444/295444), done.
remote: Total 1180249 (delta 984716), reused 1073684 (delta 878311)
Receiving objects: 100% (1180249/1180249), 289.32 MiB | 496 KiB/s, done.
Resolving deltas: 100% (984716/984716), done.
Checking out files: 100% (27842/27842), done.
# du -sh linux-2.6-block/
714M linux-2.6-block/
# git clone --reference linux-2.6/ git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
Initialized empty Git repository in /home/knikanth/labs-sw/linus/linux-2.6-block/.git/
remote: Counting objects: 111061, done.
remote: Compressing objects: 100% (19021/19021), done.
remote: Total 100463 (delta 84138), reused 95679 (delta 79959)
Receiving objects: 100% (100463/100463), 23.21 MiB | 1209 KiB/s, done.
Resolving deltas: 100% (84138/84138), completed with 8189 local objects.
Checking out files: 100% (27842/27842), done.
# du -sh linux-2.6-block/
468M linux-2.6-block/
--reference automatically sets up .git/objects/info/alternates to obtain objects from the reference repository. Now I wonder whether it is possible to have circular references, multiple references, etc. The plural file name, "alternates", suggests it should be possible, but "git clone" ignores multiple --reference options on the command line!
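For what it's worth, the alternates file is plain text with one object-directory path per line, so a second reference can be added by hand. A minimal sketch, assuming non-bare repositories with the usual `.git` layout; `add_alternate` is a hypothetical helper name:

```shell
# Hypothetical helper: let <repo> borrow objects from <reference> as well.
# Usage: add_alternate <repo> <reference>
add_alternate() {
    repo=$1
    reference=$2

    # The alternates file wants the path of the reference's objects directory;
    # resolve it to an absolute path so it works from anywhere.
    objects=$(cd "$reference/.git/objects" && pwd) || return 1

    alternates="$repo/.git/objects/info/alternates"
    mkdir -p "$repo/.git/objects/info" || return 1

    # Skip the path if it is already listed.
    if [ -f "$alternates" ] && grep -Fxq -- "$objects" "$alternates"; then
        return 0
    fi

    # One objects directory per line.
    printf '%s\n' "$objects" >> "$alternates"
}
```

For example, `add_alternate linux-2.6-block linux-2.6`. Keep in mind that the borrowing repository depends on the reference staying intact: if objects are pruned from the reference, the borrower can end up corrupt.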
BTW, git uses SHA-1 digests to identify objects. I wonder what the chance of a SHA-1 collision is, and how git handles it? The SHA-1 digest has 40 hex digits == 160 bits. So at most, only 2^160 objects are possible. :-)
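As a rough estimate of the collision chance, the birthday bound can be worked out, under the assumption that SHA-1 behaves like a uniformly random 160-bit function (which ignores deliberately constructed collisions). With n objects in a repository:

```latex
P(\text{collision}) \;\approx\; 1 - e^{-n(n-1)/2^{161}} \;\approx\; \frac{n^{2}}{2^{161}}
\qquad (n \ll 2^{80})

% For the full clone above, n = 1180249 \approx 2^{20.2}:
P \;\approx\; \frac{2^{40.3}}{2^{161}} \;\approx\; 2^{-121}
```

So an accidental collision among the roughly 1.2 million objects of that clone is negligible; under this assumption a collision only becomes likely at around 2^80 objects.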