Git — On Disk - odigity/academy GitHub Wiki
We're going to discuss how Git stores data, and in particular, the contents of the following Git files:
.git/HEAD
.git/objects/e6/
.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
.git/refs/heads/
.git/refs/heads/master
.git/refs/heads/foo
.git/refs/tags/
.git/refs/tags/foo_demo
The Object DB
Git stores all committed data as objects in an on-disk object database under .git/objects.
An object's name is the SHA-1 of its contents (prefixed with a short header giving the object's type and size), and determines where the object will be written:
- the first two digits are mapped to a subdirectory
- the last thirty-eight digits are used as the filename
Thus, an object that hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 will be stored in .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391.
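The naming rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a SHA-1 repository and a blob object; the helper names `hash_object` and `object_path` are made up for this example. Git hashes a short header (the type, the size in bytes, and a NUL byte) followed by the content, which is why the empty file produces the hash used above.

```python
import hashlib
from pathlib import PurePosixPath


def hash_object(data: bytes, obj_type: str = "blob") -> str:
    """Return the SHA-1 name Git assigns to an object with this content."""
    # Header is "<type> <size>" followed by a NUL byte, then the raw content.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def object_path(sha: str) -> PurePosixPath:
    """Map a 40-digit object name to its loose-object path under .git/objects."""
    # First two hex digits name the subdirectory, the remaining 38 the file.
    return PurePosixPath(".git/objects") / sha[:2] / sha[2:]


empty_blob = hash_object(b"")
print(empty_blob)               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(object_path(empty_blob))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```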
Object Types
There are four types of Git objects:
- Blob (Binary Large OBject) — Stores the contents of a file in your project. (Does not store filename, timestamp, or any other metadata.)
- Tree — The equivalent of a directory; contains a list of file names, each with some type bits and a reference to a blob or tree object.
- Commit — Contains:
  - author and committer name & email
  - timestamps
  - log message
  - zero or more references to parent commit objects
  - a reference to the root tree object
- Tag — Contains a reference to another object + some metadata. (Only used when creating an annotated tag, not a lightweight tag.)
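The type of an object is recorded in the object itself. As a rough sketch (assuming a loose object, that is, one stored as its own zlib-compressed file rather than inside a packfile), the file can be decoded as follows. The function name `decode_loose_object` is hypothetical.

```python
import zlib
from typing import Tuple


def decode_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Split the bytes of a loose object file into (type, body)."""
    data = zlib.decompress(raw)
    # The header "<type> <size>" is separated from the body by the first NUL.
    header, _, body = data.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


# Round trip on an in-memory object instead of a real file under .git/objects.
raw = zlib.compress(b"blob 6\0hello\n")
print(decode_loose_object(raw))  # ('blob', b'hello\n')
```

To try it on a real repository, pass the bytes read from a file such as .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391.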
A commit has:
- zero parents if it's the first commit in the repo
- one parent if it's a regular commit
- two or more parents if it's a merge commit
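The parent rule can be checked directly against the body of a commit object, which is a block of "key value" header lines, a blank line, and then the log message. The sketch below assumes that layout; the function names and the sample commit (names, email, timestamps) are invented for illustration.

```python
def parse_commit(body: bytes) -> dict:
    """Parse the decompressed body of a commit object into a dict."""
    headers, _, message = body.decode("utf-8").partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in headers.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # may occur zero or more times
        elif key in ("tree", "author", "committer"):
            commit[key] = value
        # Any other header lines are ignored in this sketch.
    return commit


def commit_kind(commit: dict) -> str:
    """Classify a parsed commit by its number of parents."""
    count = len(commit["parents"])
    if count == 0:
        return "root"
    return "regular" if count == 1 else "merge"


# Hypothetical commit body with a single parent.
sample = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"parent e102490b28eb803c482f167cf2cc0a974c92e963\n"
    b"author A U Thor <author@example.com> 1243122538 -0700\n"
    b"committer A U Thor <author@example.com> 1243122538 -0700\n"
    b"\n"
    b"Example log message\n"
)
print(commit_kind(parse_commit(sample)))  # regular
```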
Diagram
Commit       -> Tree (/) -> Tree (/images/) -> Blob (logo.png)
   v                     -> Blob (index.html)
ParentCommit             -> Blob (main.css)
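The arrows from a tree to its blobs and subtrees correspond to entries in the tree object's body. As a sketch, assuming the binary layout of a SHA-1 repository (each entry is an ASCII mode, a space, the name, a NUL byte, then the 20 raw bytes of the referenced object's hash), the entries can be listed like this. `parse_tree` and the sample hashes are illustrative only.

```python
from typing import List, Tuple


def parse_tree(body: bytes) -> List[Tuple[str, str, str]]:
    """Return (mode, name, sha) for each entry in a tree object's body."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode("ascii")        # the "type bits"
        name = body[space + 1:nul].decode("utf-8")
        sha = body[nul + 1:nul + 21].hex()            # 20 raw bytes -> 40 hex digits
        entries.append((mode, name, sha))
        pos = nul + 21
    return entries


# Hand-built tree body mirroring the diagram; the hashes are placeholders.
body = (
    b"40000 images\0" + bytes.fromhex("4b825dc642cb6eb9a060e54bf8d69288fbee4904")
    + b"100644 index.html\0" + bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
    + b"100644 main.css\0" + bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
)
for mode, name, sha in parse_tree(body):
    kind = "tree" if mode == "40000" else "blob"
    print(mode, kind, sha, name)
```

An entry with mode 40000 refers to another tree (a subdirectory); the 100644 entries here refer to blobs.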
Refs
Refs (branches and tags) are named references to a commit, and are stored in files named after the ref under .git/refs/heads/ (for branches) and .git/refs/tags/ (for tags).
Each file simply contains the SHA-1 of the object the ref points to (usually a commit).
For example:
$ cat .git/refs/heads/master
e102490b28eb803c482f167cf2cc0a974c92e963
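Reading a ref programmatically is just reading that file. The sketch below assumes the ref exists as a loose file under the Git directory; refs that Git has moved into .git/packed-refs are not handled. `read_ref` is a hypothetical helper, and the demo runs against a temporary directory rather than a real repository.

```python
import tempfile
from pathlib import Path


def read_ref(git_dir, ref_name: str) -> str:
    """Return the SHA-1 stored in a loose ref file such as refs/heads/master."""
    return (Path(git_dir) / ref_name).read_text(encoding="ascii").strip()


# Build a throwaway directory laid out like .git and read the ref back.
with tempfile.TemporaryDirectory() as tmp:
    heads = Path(tmp) / "refs" / "heads"
    heads.mkdir(parents=True)
    (heads / "master").write_text("e102490b28eb803c482f167cf2cc0a974c92e963\n")
    print(read_ref(tmp, "refs/heads/master"))
    # e102490b28eb803c482f167cf2cc0a974c92e963
```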
HEAD
The branches directory is named heads because a branch is the "head" of a chain of commits. This is not to be confused with HEAD, which is the ref name that points to the current branch and is stored in .git/HEAD:
$ cat .git/HEAD
ref: refs/heads/master
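Following HEAD to a commit therefore takes two reads: one for .git/HEAD and one for the branch file it names. A minimal sketch, assuming loose ref files only; it also covers the case (not discussed above) where HEAD holds a bare SHA-1 instead of a "ref:" line, known as a detached HEAD. `resolve_head` is a made-up name.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_head(git_dir) -> Tuple[Optional[str], str]:
    """Return (ref_name, sha); ref_name is None when HEAD is detached."""
    git_dir = Path(git_dir)
    content = (git_dir / "HEAD").read_text(encoding="ascii").strip()
    if content.startswith("ref: "):
        ref_name = content[len("ref: "):]
        # Raises FileNotFoundError if the branch has no commits yet.
        sha = (git_dir / ref_name).read_text(encoding="ascii").strip()
        return ref_name, sha
    return None, content  # detached: HEAD holds the SHA-1 itself


# Demo against a temporary directory laid out like .git.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "refs" / "heads").mkdir(parents=True)
    (root / "refs" / "heads" / "master").write_text(
        "e102490b28eb803c482f167cf2cc0a974c92e963\n")
    (root / "HEAD").write_text("ref: refs/heads/master\n")
    print(resolve_head(root))
    # ('refs/heads/master', 'e102490b28eb803c482f167cf2cc0a974c92e963')
```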
Diagram
HEAD -> master      -> commit4 -> tree4 (/) -> other trees and blobs
                          v
        some_tag    -> commit2 -> tree2 (/) -> other trees and blobs
                          v
                       commit1 -> tree1 (/) -> other trees and blobs
                          ^       (like commits)
        some_branch -> commit3 -> tree3 (/) -> other trees and blobs
Tags
You can tag any Git object.
There are two kinds of tags: lightweight and annotated.
Lightweight
A lightweight tag is just a ref that points to a commit.
$ cat .git/refs/tags/foo
9fbf088968f8c5668ea52bcc07f53042e33bd946
This is the kind of tag that git tag creates by default.
Annotated
If you create an annotated tag, Git creates a tag object with metadata about the tag, then a ref that points to that tag object.
The metadata is similar to a commit, and includes:
- timestamp
- tagger name & email
- log message
- a reference to the tagged object (usually a commit)
Example:
$ cat .git/refs/tags/bar
9585191f37f7b0fb9444f35a9bf50de191beadc2
$ git cat-file -p 9585191f37f7b0fb9444f35a9bf50de191beadc2
object 1a410efbd13591db07496601ebc7a059dd55cfe9
type commit
tag bar
tagger Ofer Nave <[email protected]> Sat May 23 16:48:58 2009 -0700
my log message for tag 'bar'
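The body of a tag object has the same shape as the output above: header lines, a blank line, then the message. The sketch below parses that layout. One difference to be aware of is that the stored tagger line records the time as seconds since the Unix epoch plus a UTC offset, not a formatted date. `parse_tag` and the tagger name and email in the sample are hypothetical.

```python
def parse_tag(body: bytes) -> dict:
    """Parse the decompressed body of an annotated tag object."""
    headers, _, message = body.decode("utf-8").partition("\n\n")
    tag = {"message": message}
    for line in headers.split("\n"):
        key, _, value = line.partition(" ")
        tag[key] = value  # object, type, tag, tagger
    return tag


# Illustrative tag body; the tagger identity is a placeholder.
sample = (
    b"object 1a410efbd13591db07496601ebc7a059dd55cfe9\n"
    b"type commit\n"
    b"tag bar\n"
    b"tagger Example Tagger <tagger@example.com> 1243122538 -0700\n"
    b"\n"
    b"my log message for tag 'bar'\n"
)
tag = parse_tag(sample)
print(tag["type"], tag["object"])  # commit 1a410efbd13591db07496601ebc7a059dd55cfe9
```

The "type" field is what allows a tag to point at any kind of object, not just a commit.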