Git has become the most popular version control system in the open source world, and more and more companies are using it too.
Source code history managed by Git is supposed to be immutable, because Git uses a content-addressed database: Git objects are indexed by their SHA-1 hash.
When mistakes have been made, or to make some history-based features more useful or more reliable, it can nevertheless be worthwhile to transform the Git history. git replace is a good way to do that.
New Views on your History with git replace
1. New Views on your History
with git replace
Christian Couder, Murex
chriscool@tuxfamily.org
OSDC.fr 2013
October 5, 2013
2. About Git
A Distributed Version Control System
(DVCS):
● created by Linus Torvalds
● maintained by Junio Hamano
● since 2005
● preferred VCS among open source
developers
3. Git Design
Git is made of these things:
● “Objects”
● “Refs”
● config, indexes, logs, hooks,
grafts, packs, ...
Only “Objects” and “Refs” are
transferred from one repository to
another.
4. Git Objects
● Blob: content of a file
● Tree: content of a directory
● Commit: state of the whole source code
● Tag: stamp on an object
5. Git Objects Storage
● Git Objects are stored in a
content addressable database.
● The key to retrieve each Object is the
SHA-1 of the Object’s content.
● A SHA-1 is a 160-bit / 40-hex / 20-byte
hash value which is considered
unique.
6. Blob
SHA1: e8455...
blob = content of a file
blob
size
/* content of this blob, it can be
anything like an image, a video,
... but most of the time it is
source code like:*/
#include <stdio.h>
int main(void)
{
printf("Hello world!\n");
return 0;
}
7. Example of storing and
retrieving a blob
# echo "Whatever..." | git hash-object -w --stdin
aa02989467eea6d8e0bc68f3663de51767a9f5b1
# git cat-file -p aa02989467
Whatever...
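The key of a blob can be recomputed by hand, which shows what "content addressable" means in practice. The sketch below assumes GNU coreutils (sha1sum) and a repository using the default SHA-1 object format; the variable names are only illustrative. Git hashes a small header ("blob", the size in bytes, a NUL byte) followed by the content.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)" && git init -q         # scratch repository

content='Hello world'
# Size in bytes of the content, including the trailing newline
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# SHA-1 of "blob <size>\0<content>", computed without Git
manual=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } \
          | sha1sum | cut -d ' ' -f 1)

# The same key as computed by Git (nothing is written without -w)
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $git_id"
```

Both lines should print the same 40-hex value: the same content always gets the same key.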
11. Example of storing and
retrieving a commit (1)
# TREE=0625da548ef0a7038c44b480f10d5550b2f2f962
# ME="Christian Couder <chriscool@tuxfamily.org>"
# DATE=$(date "+%s %z")
# (echo -e "tree $TREE\nauthor $ME $DATE"; \
echo -e "committer $ME $DATE\n\nfirst commit") \
| git hash-object -t commit -w --stdin
37449e955443883a0a888ee100cfd0a7ba7927b3
12. Example of storing and
retrieving a commit (2)
# git cat-file -p 37449e9554
tree 0625da548ef0a7038c44b480f10d5550b2f2f962
author Christian Couder <chriscool@tuxfamily.org> 1380447450 +0200
committer Christian Couder <chriscool@tuxfamily.org> 1380447450 +0200
first commit
13. Git Objects Relations
[Diagram: two commits and the objects they reference]
● Commit e84c7... "Initial commit": tree 29c43..., parents (), author and committer Christian
● Tree 29c43...: blob hello.c 0de24..., tree doc 98ca9...
● Blob 0de24...: int main() { ... }
● Tree 98ca9... (doc): blob readme 677f4..., blob install 23ae9...
● Commit "Change hello.c": tree 5c11f..., parents (e84c7...), author and committer Arnaud
● Tree 5c11f...: blob hello.c bc789..., tree doc 98ca9...
● Blob bc789...: int main(void) { ... }
14. Git Refs
● Head: branch,
.git/refs/heads/
● Tag: lightweight tag,
.git/refs/tags/
● Remote: distant repository,
.git/refs/remotes/
● Note: note attached to an object,
.git/refs/notes/
● Replace: replacement of an object,
.git/refs/replace/
15. Example of storing and
retrieving a branch
# git update-ref refs/heads/master 37449e9554
# git rev-parse master
37449e955443883a0a888ee100cfd0a7ba7927b3
# git reset --hard master
HEAD is now at 37449e9 first commit
# cat whatever.txt
Whatever...
16. Result from previous examples
master
commit 37449e9554
tree 0625da548e
blob aa02989467
17. Commits in Git form a DAG
(Directed Acyclic Graph)
● history direction is from left to right
● new commits point to their parents
18. git bisect
B
● B introduces a bad behavior called "bug" or
"regression"
● red commits are called "bad"
● blue commits are called "good"
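A minimal sketch of a bisection, assuming a throwaway repository whose file names and commit messages are made up for the illustration: six commits are created, the fourth one introduces the "bug", and git bisect run finds it using a test command that exits with 0 for "good" and non-zero for "bad".

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)" && git init -q         # scratch repository
git config user.name "Example"
git config user.email "example@example.com"

# Six commits; state.txt says "bug" from the fourth commit onwards
for i in 1 2 3 4 5 6; do
    if [ "$i" -ge 4 ]; then echo "bug" > state.txt; else echo "ok" > state.txt; fi
    echo "$i" > counter.txt
    git add state.txt counter.txt
    git commit -q -m "commit $i"
done

first=$(git rev-list --max-parents=0 HEAD)   # root commit, known to be good
git bisect start HEAD "$first" > /dev/null   # HEAD is bad, root is good
git bisect run sh -c 'grep -q ok state.txt' > /dev/null

# When the bisection is over, refs/bisect/bad is the first bad commit
culprit=$(git rev-parse refs/bisect/bad)
subject=$(git log -1 --format=%s "$culprit")
echo "first bad commit: $subject"
git bisect reset > /dev/null 2>&1
```

Here the subject printed is that of the fourth commit, the one that changed state.txt to "bug".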
19. Problem when bisecting
Sometimes the commit that introduced a bug
will be in an untestable area of the graph.
For example:
W -- X -- X1 -- X2 -- X3 -- Y -- Z
Commit X introduced a breakage, later fixed
by commit Y.
20. Possible solutions
Possible solutions to bisect anyway:
● apply a patch before testing and remove it
afterwards (can be done using "git cherry-pick"), or
● create a fixed up branch (can be done with
"git rebase -i"), for example:
W -- X -- X1 -- X2 -- X3 -- Y -- Z -- Z1
 \
  X+Y -- X1' -- X2' -- X3' -- Z'
21. A good solution
The idea is that we will replace Z with Z' so that
we bisect from the beginning using the fixed up
branch.
$ git replace Z Z'
W -- X -- X1 -- X2 -- X3 -- Y -- Z
 \
  X+Y -- X1' -- X2' -- X3' -- Z' -- Z1
22. Grafts
Created mostly for projects like the Linux
kernel with old repositories.
● “.git/info/grafts” file
● each line describes the parents of a
commit
● <commit> <parent> [<parent>]*
● this overrides the content in the
commit
23. Problem with Grafts
They are neither objects nor refs, so
they cannot be easily transferred.
We need something that is either:
● an object, or
● a ref
24. Solution, part 1: replace ref
● It is a ref in .git/refs/replace/
● Its name is the SHA-1 of the
object that should be replaced.
● It contains, so it points to, the
SHA-1 of the replacement object.
25. Solution, part 2: git replace
● git replace [ -f ] <object> <replacement>:
to create a replace ref
● git replace -d <object>:
to delete a replace ref
● git replace [ -l [ pattern ] ]:
to list some replace refs
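The three forms above can be tried in a scratch repository. This is only a sketch: the file name, the commit messages and the variable names are made up, and the replacement commits are built with git commit-tree so that they have the same tree as the original but a different message.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)" && git init -q         # scratch repository
git config user.name "Example"
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "original message"
old=$(git rev-parse HEAD)

# Two candidate replacements: same tree, different messages
new=$(git commit-tree "$old^{tree}" -m "corrected message")
newer=$(git commit-tree "$old^{tree}" -m "corrected again")

git replace "$old" "$new"                # create the replace ref
git replace -f "$old" "$newer"           # -f overwrites an existing replace ref

# The ref is named after the replaced object and points to the replacement
target=$(git rev-parse "refs/replace/$old")
listed=$(git replace -l)                 # lists the replaced object's SHA-1
echo "replaced:    $listed"
echo "replacement: $target"

git replace -d "$old" > /dev/null        # delete the replace ref
remaining=$(git replace -l)              # now empty
```

Without -f, the second git replace would fail because a replace ref for that object already exists.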
26. Replace ref transfer
● as with heads, tags, notes, remotes
● except that there are no shortcuts and
you must be explicit
● refspec: refs/replace/*:refs/replace/*
● refspec can be configured (in .git/config),
or used on the command line (after git
push/fetch <remote>)
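A sketch of an explicit transfer, assuming two local scratch repositories named "source" and "copy" (hypothetical names): a plain clone does not bring the replace refs along, while a fetch with the refspec given on the command line does.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)"
git init -q source
cd source
git config user.name "Example"
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "original message"
old=$(git rev-parse HEAD)
new=$(git commit-tree "$old^{tree}" -m "corrected message")
git replace "$old" "$new"
cd ..

git clone -q source copy                 # heads are cloned, replace refs are not
cd copy
before=$(git replace -l)                 # empty: nothing was transferred

# Be explicit: fetch the replace refs with their own refspec
git fetch -q origin 'refs/replace/*:refs/replace/*'
after=$(git replace -l)                  # now lists the replaced commit
seen=$(git log -1 --format=%s "$old")    # message seen through the replace ref
echo "before: '$before'"
echo "after:  '$after'"
echo "seen:   '$seen'"
```

To avoid typing the refspec each time, it can be added to the remote's configuration, for example with git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'.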
27. Creating replacement objects
When it is needed the following commands
can help:
● git rebase [ -i ]
● git cherry-pick
● git hash-object
● git filter-branch
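As an illustration of the git hash-object route, the sketch below fixes a wrong author on a commit: the raw commit is printed with git cat-file, edited with sed, and stored as a new commit object that then replaces the original. The repository, the names and the email addresses are invented for the example.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)" && git init -q         # scratch repository
git config user.name "Example"
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q --author="Wrong Name <wrong@example.com>" -m "some change"
old=$(git rev-parse HEAD)

# Rewrite the author line of the raw commit and store the result
new=$(git cat-file commit "$old" \
      | sed 's/^author Wrong Name <wrong@example.com>/author Right Name <right@example.com>/' \
      | git hash-object -t commit -w --stdin)

git replace "$old" "$new"
author=$(git log -1 --format='%an <%ae>' "$old")
echo "author now shown: $author"
```

The original commit object is untouched; only the view through the replace ref shows the corrected author.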
28. What can it be used for?
Create new views of your history.
Right now only 2 views are possible:
● the view with all the replace refs enabled
● the view with all the replace refs disabled,
using --no-replace-objects or the
GIT_NO_REPLACE_OBJECTS
environment variable
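The two views can be compared side by side. In this sketch (scratch repository, made-up messages and variable names), the same commit is shown once with the replace refs enabled and twice with them disabled, first with the command-line option and then with the environment variable.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)" && git init -q         # scratch repository
git config user.name "Example"
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "original message"
old=$(git rev-parse HEAD)
new=$(git commit-tree "$old^{tree}" -m "corrected message")
git replace "$old" "$new"

# View 1: replace refs enabled (the default)
view_enabled=$(git log -1 --format=%s "$old")
# View 2: replace refs disabled, with the option or the environment variable
view_flag=$(git --no-replace-objects log -1 --format=%s "$old")
view_env=$(GIT_NO_REPLACE_OBJECTS=1 git log -1 --format=%s "$old")

echo "enabled:  $view_enabled"
echo "disabled: $view_flag"
echo "disabled: $view_env"
```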
29. Why new views?
● split old and new history or merge them
● fix bugs to bisect on a clean history
● fix mistakes in author, committer,
timestamps
● remove big files to have something lighter
to use, when you don’t need them
● prepare a repo cleanup
● mask/unmask some steps
● ...
30. Limitations
● everything is still in the repo
● so the repo is still big
● there are probably bugs
● confusing?
● ...
31. Current and future work
● a script to replace grafts
● fix bugs
● allow subdirectories in .git/refs/replace/
● maybe allow “views” as set of active
subdirectories
● ...
32. Considerations
● best of both worlds: immutability and
configurability of history
● no true view
● history is important for freedom
33. Many thanks to:
● Junio Hamano (comments, help, discussions,
reviews, improvements),
● Ingo Molnar,
● Linus Torvalds,
● many other great people in the Git and Linux
communities, especially: Andreas Ericsson,
Johannes Schindelin, H. Peter Anvin, Daniel
Barkalow, Bill Lear, John Hawley, ...
● OSDC/OWF organizers and attendees,
● Murex, the company I am working for.