Git — On Disk - odigity/academy GitHub Wiki


We're going to discuss how Git stores data, and in particular, the contents of the following Git files:

.git/HEAD
.git/objects/e6/
.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
.git/refs/heads/
.git/refs/heads/master
.git/refs/heads/foo
.git/refs/tags/
.git/refs/tags/foo_demo

The Object DB

Git stores all committed data as objects in an on-disk object database under .git/objects. An object's name is the SHA-1 hash of its contents prefixed with a short header (the object type and size), written as forty hexadecimal digits. The name determines where the object will be written:

  • the first two hex digits name a subdirectory
  • the remaining thirty-eight hex digits are used as the filename

Thus, an object that hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 will be stored in .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391.
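
As a rough illustration of the naming rule, the following Python sketch hashes an object and maps the result to a loose-object path. The function names git_object_id and loose_object_path are made up for this example; the header layout ("<type> <size>" followed by a NUL byte) is how Git frames an object before hashing it.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 hex name for an object of the given type and content."""
    # Git hashes "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Map a 40-digit object name to its path under .git/objects."""
    # First two hex digits: subdirectory. Remaining thirty-eight: filename.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


empty_blob = git_object_id("blob", b"")
print(empty_blob)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty_blob))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The object name used throughout this page, e69de29b..., is the name of the empty blob, which is why it shows up in almost every repository.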

Object Types

There are four types of Git objects:

  • Blob (Binary Large OBject) — Stores the contents of a file in your project. (Does not store filename, timestamp, or any other metadata.)
  • Tree — The equivalent of a directory; contains a list of file names, each with some type bits and a reference to a blob or tree object.
  • Commit — Contains:
    • author and committer name & email
    • timestamps
    • log message
    • zero or more references to parent commit objects
    • a reference to the root tree object
  • Tag — Contains a reference to another object + some metadata. (Only used when creating an annotated tag, not a lightweight tag.)
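
All four object types share the same framing when stored as individual ("loose") files: a "<type> <size>" header, a NUL byte, then the content, with the whole thing zlib-compressed. This page does not cover compression, so treat the sketch below as a supplementary illustration. The helper names encode_loose_object and decode_loose_object are invented for the example, and objects stored in packfiles are not handled.

```python
import zlib
from typing import Tuple


def encode_loose_object(obj_type: str, content: bytes) -> bytes:
    """Frame content with a Git-style header and zlib-compress it."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return zlib.compress(header + content)


def decode_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


raw = encode_loose_object("blob", b"hello\n")
print(decode_loose_object(raw))  # ('blob', b'hello\n')
```

To try this on a real repository, read the bytes of a file under .git/objects/ and pass them to decode_loose_object.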

A commit has:

  • zero parents if it's the first commit in the repo
  • one parent if it's a regular commit
  • two or more parents if it's a merge commit
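
The parent-count rule can be checked against the text of a commit object, which lists one "parent <id>" header line per parent. The sketch below is a minimal parser; the sample commit text, its object names, and the function names parent_ids and describe_commit are all hypothetical.

```python
from typing import List


def parent_ids(commit_text: str) -> List[str]:
    """Return the parent object names listed in a commit's header block."""
    # Headers end at the first blank line; the log message follows.
    headers = commit_text.split("\n\n", 1)[0]
    return [
        line.split(" ", 1)[1]
        for line in headers.splitlines()
        if line.startswith("parent ")
    ]


def describe_commit(commit_text: str) -> str:
    """Classify a commit by its number of parents."""
    count = len(parent_ids(commit_text))
    if count == 0:
        return "root commit"
    if count == 1:
        return "regular commit"
    return "merge commit"


# Hypothetical commit text with placeholder object names.
SAMPLE_MERGE = (
    "tree " + "a" * 40 + "\n"
    "parent " + "b" * 40 + "\n"
    "parent " + "c" * 40 + "\n"
    "author Example Author <author@example.com> 1243122538 -0700\n"
    "committer Example Author <author@example.com> 1243122538 -0700\n"
    "\n"
    "Merge branch 'foo'\n"
)

print(describe_commit(SAMPLE_MERGE))  # merge commit
```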

Diagram

   Commit -> Tree (/) -> Tree (/images/) -> Blob (logo.png)
      v               -> Blob (index.html)
ParentCommit          -> Blob (main.css)
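
The same shape can be modeled in memory to show how a commit reaches every file through its root tree. This is only a toy model with invented class names (Blob, Tree, Commit); it is not Git's on-disk format, where the links are object names rather than Python references.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Blob:
    data: bytes  # file contents only; the name lives in the parent tree


@dataclass
class Tree:
    entries: Dict[str, Union["Tree", Blob]] = field(default_factory=dict)


@dataclass
class Commit:
    tree: Tree
    parents: List["Commit"] = field(default_factory=list)


def list_files(tree: Tree, prefix: str = "") -> List[str]:
    """Walk a tree recursively and return the path of every blob."""
    paths = []
    for name, obj in sorted(tree.entries.items()):
        if isinstance(obj, Tree):
            paths.extend(list_files(obj, prefix + name + "/"))
        else:
            paths.append(prefix + name)
    return paths


root = Tree({
    "images": Tree({"logo.png": Blob(b"")}),
    "index.html": Blob(b""),
    "main.css": Blob(b""),
})
parent_commit = Commit(tree=Tree())
commit = Commit(tree=root, parents=[parent_commit])

print(list_files(commit.tree))  # ['images/logo.png', 'index.html', 'main.css']
```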

Refs

Refs (branches and tags) are named references, usually to a commit. Each one is stored in a file named after the ref, under .git/refs/heads/ (for branches) or .git/refs/tags/ (for tags). The file's content is simply the SHA-1 of the object the ref points to.

For example:

$ cat .git/refs/heads/master
e102490b28eb803c482f167cf2cc0a974c92e963
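
Reading a ref from code amounts to reading that one file. The sketch below builds a throwaway directory that mimics the layout so it runs without a real repository; read_loose_ref is a hypothetical helper name. It only handles refs stored as individual files, whereas Git may also keep refs in a single .git/packed-refs file, which this sketch ignores.

```python
import tempfile
from pathlib import Path


def read_loose_ref(git_dir: str, ref_name: str) -> str:
    """Return the object name stored in a loose ref file such as refs/heads/master."""
    return (Path(git_dir) / ref_name).read_text().strip()


# Build a throwaway directory that mimics the layout,
# so nothing touches a real repository.
with tempfile.TemporaryDirectory() as fake_git_dir:
    ref_file = Path(fake_git_dir) / "refs" / "heads" / "master"
    ref_file.parent.mkdir(parents=True)
    ref_file.write_text("e102490b28eb803c482f167cf2cc0a974c92e963\n")
    demo_sha = read_loose_ref(fake_git_dir, "refs/heads/master")

print(demo_sha)  # e102490b28eb803c482f167cf2cc0a974c92e963
```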

HEAD

The branches directory is named heads because a branch is the "head" of a chain of commits. This is not to be confused with HEAD, the ref that points to the current branch, which is stored in .git/HEAD:

$ cat .git/HEAD
ref: refs/heads/master
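
A small parser makes the format of this file explicit. The sketch below uses the hypothetical name parse_head. It also covers a case not shown above: when a commit is checked out directly rather than through a branch (a "detached HEAD"), the file holds a bare object name instead of a "ref: " line.

```python
from typing import Tuple


def parse_head(head_text: str) -> Tuple[str, str]:
    """Classify the contents of .git/HEAD.

    Returns ("symbolic", ref_name) when HEAD points at a branch,
    or ("detached", object_name) when it holds an object name directly.
    """
    text = head_text.strip()
    prefix = "ref: "
    if text.startswith(prefix):
        return ("symbolic", text[len(prefix):])
    return ("detached", text)


print(parse_head("ref: refs/heads/master\n"))
# ('symbolic', 'refs/heads/master')
print(parse_head("e102490b28eb803c482f167cf2cc0a974c92e963\n"))
# ('detached', 'e102490b28eb803c482f167cf2cc0a974c92e963')
```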

Diagram

HEAD  ->  master  ->  commit4  ->  tree4 (/)  ->  other trees and blobs
                         v
        some_tag  ->  commit2  ->  tree2 (/)  ->  other trees and blobs
                         v
                      commit1  ->  tree1 (/)  ->  other trees and blobs
                         ^ (like commits)
     some_branch  ->  commit3  ->  tree3 (/)  ->  other trees and blobs

Tags

You can tag any Git object.

There are two kinds of tags: lightweight and annotated.

Lightweight

A lightweight tag is just a ref that points directly to an object, usually a commit.

$ cat .git/refs/tags/foo
9fbf088968f8c5668ea52bcc07f53042e33bd946

This is the kind of tag that git tag creates by default, that is, when run without -a, -s, or -m.

Annotated

If you create an annotated tag, Git creates a tag object with metadata about the tag, then a ref that points to that tag object.

The metadata is similar to a commit's, and includes:

  • timestamp
  • tagger name & email
  • log message
  • a reference to the tagged object (usually a commit)

Example:

$ cat .git/refs/tags/bar
9585191f37f7b0fb9444f35a9bf50de191beadc2

$ git cat-file -p 9585191f37f7b0fb9444f35a9bf50de191beadc2
object 1a410efbd13591db07496601ebc7a059dd55cfe9
type commit
tag bar
tagger Ofer Nave <[email protected]> Sat May 23 16:48:58 2009 -0700

my log message for tag 'bar'
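
Output in this shape is easy to split into header fields and a message, since a blank line separates the two. The sketch below assumes the text format printed by git cat-file -p, as in the example above. The function name parse_tag_object and the tagger identity in the sample are placeholders.

```python
from typing import Dict, Tuple


def parse_tag_object(text: str) -> Tuple[Dict[str, str], str]:
    """Split pretty-printed tag object text into (headers, message)."""
    # Headers end at the first blank line; the message follows.
    header_block, _, message = text.partition("\n\n")
    headers = {}
    for line in header_block.splitlines():
        key, _, value = line.partition(" ")
        headers[key] = value
    return headers, message


# Sample text modeled on the output above; the tagger identity is a placeholder.
SAMPLE_TAG = (
    "object 1a410efbd13591db07496601ebc7a059dd55cfe9\n"
    "type commit\n"
    "tag bar\n"
    "tagger Example Tagger <tagger@example.com> Sat May 23 16:48:58 2009 -0700\n"
    "\n"
    "my log message for tag 'bar'\n"
)

headers, message = parse_tag_object(SAMPLE_TAG)
print(headers["object"])  # 1a410efbd13591db07496601ebc7a059dd55cfe9
print(headers["type"])    # commit
print(message, end="")    # my log message for tag 'bar'
```

The "object" and "type" fields are what distinguish an annotated tag from a lightweight one: the ref points at the tag object, and the tag object in turn points at the tagged object.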