Internals - GradedJestRisk/git-training GitHub Wiki
A great introduction, in video What about this ?
To allow offline work in a distributed system, every repository must be able to compute identifiers on its own, using the same strategy. Therefore, content-based identification is used, specifically a SHA-1 hash.
In Git, every time an object has to be specified, its reference is its SHA-1 hash (shortened to SHA1).
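As an illustration of content-based identification, here is a minimal sketch of how a BLOB reference can be computed. It assumes a repository using the default SHA-1 object format; the function name `git_blob_hash` is made up for this example.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a BLOB with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

empty = git_blob_hash(b"")
print(empty)      # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(empty[:7])  # e69de29 (an abbreviated reference)
```

Since the hash depends only on the content, two machines that never talked to each other get the same reference for the same file.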
If you save diffs, you use far less space. But you have high computational complexity: to check out version N of a file, you have to apply N-1 diffs.
If you save file content, you use far more space. But you have low computational complexity: to check out version N of a file, you have to read one file. Git saves file content and can still save space because:
- identical file contents are not duplicated (same BLOB);
- BLOBs can be compressed (packfiles).
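The two space-saving points above can be sketched with a toy content-addressed store. This is not Git's real code: `ObjectStore` is a hypothetical class, and it hashes the bare content without Git's object header.

```python
import hashlib
import zlib

class ObjectStore:
    """Toy content-addressed store (hypothetical, not Git's implementation)."""

    def __init__(self):
        self.objects = {}  # key (hex SHA-1) -> compressed bytes

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        # Identical content yields the same key, so it is stored only once.
        self.objects.setdefault(key, zlib.compress(content))
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self.objects[key])

store = ObjectStore()
v1 = store.put(b"line\n" * 1000)
v2 = store.put(b"line\n" * 1000)       # same content, e.g. under another filename
print(v1 == v2, len(store.objects))    # True 1
print(len(store.objects[v1]) < 5000)   # True: stored compressed
```

Reading any version back is a single lookup, whatever the number of versions stored.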
Objects are stored in .git/objects; use watch -n .5 tree .git to see them being created.
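To make the layout of .git/objects more concrete, here is a sketch that writes and reads a "loose" object (one zlib-compressed file per object). It works in a scratch directory, not a real repository, and the helper names are invented for this example; packed objects are not handled.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_loose_object(git_dir: Path, kind: str, content: bytes) -> str:
    data = kind.encode() + b" " + str(len(content)).encode() + b"\0" + content
    sha = hashlib.sha1(data).hexdigest()
    # The first 2 hex characters name the directory, the other 38 the file.
    path = git_dir / "objects" / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return sha

def read_loose_object(git_dir: Path, sha: str):
    path = git_dir / "objects" / sha[:2] / sha[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, content = raw.partition(b"\0")
    kind, _size = header.split(b" ")
    return kind.decode(), content

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"  # scratch directory, not a real repository
    sha = write_loose_object(git_dir, "blob", b"")
    print(sha[:2] + "/" + sha[2:])          # e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(read_loose_object(git_dir, sha))  # ('blob', b'')
```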
3 objects:
- BLOB;
- tree;
- commit.
- BLOB is raw data (no metadata, such as the filename)
- tree is a set of entries (hash tree), one per file or sub-folder:
  - for a file: filename + permissions + BLOB's reference
  - for a sub-folder: name + reference to the sub-folder's tree
- commit (aka snapshot, highest-level object in the repo):
  - root folder's reference (tree)
  - parent commit's reference (backward in time)
  - author and committer metadata (name, email, date)
  - commit message
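The tree structure described above can be sketched as follows. This is a simplified illustration: it only handles plain files with ASCII names (Git's real sort order has a special rule for sub-folders), and `git_hash` and `tree_body` are names chosen for this example.

```python
import hashlib

def git_hash(kind: str, body: bytes) -> str:
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries) -> bytes:
    """entries: iterable of (mode, name, hex_sha) for the files of one folder."""
    body = b""
    # Entries are sorted by name; each one is "<mode> <name>\0" + 20 raw hash bytes.
    for mode, name, sha in sorted(entries, key=lambda e: e[1]):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(sha)
    return body

empty_blob = git_hash("blob", b"")
entries = [("100644", "index.js", empty_blob), ("100644", "README.md", empty_blob)]
tree_sha = git_hash("tree", tree_body(entries))

print(git_hash("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty tree)
# The listing order does not matter, only the content does.
print(tree_sha == git_hash("tree", tree_body(reversed(entries))))  # True
```

The filename lives in the tree entry, not in the BLOB, which is why the BLOB itself carries no metadata.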
- 2 files with the same content, different filenames:
  - = 2 entries in the tree, each with its own filename, both with the same reference
  - = one BLOB (stored once), pointed to by that reference
  - this is surprising, because you really have two files
- 2 folders with the same content:
  - have the same tree reference, no matter the filesystem, because a tree reference depends only on content
  - their commits have the same reference only if all actions were committed by the same people, in the same sequence, with the same dates and messages
  - their commits have different references if, for example, one step of the sequence has been split into 2 commits
  - this is surprising, because everything looks the same if you look at the final state
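The second case can be sketched by hashing two commits that point to the same tree. The layout below follows the raw commit format for a single-parent, unsigned commit; the author identity and the second parent hash are placeholders invented for this example.

```python
import hashlib

def commit_id(tree: str, parent: str, who: str, when: str, message: str) -> str:
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who} {when}\n"
        f"committer {who} {when}\n"
        f"\n{message}\n"
    ).encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

tree = "5145704886f0914c21d7c6f93856accde5ee80a0"
who = "Jane Doe <jane@example.com>"  # placeholder identity
when = "1585207825 +0100"

a = commit_id(tree, "c5a6b0640bcc130a14490c31bb40129e2b224bb1", who, when, "Add index.js")
b = commit_id(tree, "1" * 40, who, when, "Add index.js")  # same tree, other history
same = commit_id(tree, "c5a6b0640bcc130a14490c31bb40129e2b224bb1", who, when, "Add index.js")

print(a == same)  # True: identical inputs, identical reference
print(a == b)     # False: the parent is part of what gets hashed
```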
A folder contains 2 files:
- README.md (empty)
- index.js, content
console.log("hello, world !");
BLOB 648dda
console.log("hello, world !");
+ 1 empty BLOB (README.md)
Tree
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 README.md
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 index.js <= why same ???
Commit 0bc16
commit 0bc16eaa1e1782e399b9cb069e41ff0224a99234
tree 5145704886f0914c21d7c6f93856accde5ee80a0
parent c5a6b0640bcc130a14490c31bb40129e2b224bb1
author Pierre TOP <[email protected]> 1585207825 +0100
committer Pierre TOP <[email protected]> 1585207825 +0100

Add index.js
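To show how the fields of such a raw commit relate to each other, here is a small parsing sketch. The function `parse_raw_commit` is hypothetical, and the author identity in the sample is a placeholder, since the real email is not available here.

```python
def parse_raw_commit(text: str) -> dict:
    # Header lines come first, then a blank line, then the message.
    head, _, message = text.partition("\n\n")
    fields = {"parent": []}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # a merge commit has several parents
        else:
            fields[key] = value
    fields["message"] = message.strip()
    return fields

sample = (
    "commit 0bc16eaa1e1782e399b9cb069e41ff0224a99234\n"
    "tree 5145704886f0914c21d7c6f93856accde5ee80a0\n"
    "parent c5a6b0640bcc130a14490c31bb40129e2b224bb1\n"
    "author Jane Doe <jane@example.com> 1585207825 +0100\n"     # placeholder identity
    "committer Jane Doe <jane@example.com> 1585207825 +0100\n"  # placeholder identity
    "\n"
    "    Add index.js\n"
)

fields = parse_raw_commit(sample)
print(fields["tree"])     # 5145704886f0914c21d7c6f93856accde5ee80a0
print(fields["parent"])   # ['c5a6b0640bcc130a14490c31bb40129e2b224bb1']
print(fields["message"])  # Add index.js
```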
List:
- see object content
git show <REF>
- see "pretty print" object content, e.g. for a commit
git show --pretty=raw <REF>
- see tree "pretty print"
git ls-tree <REF>
General:
- reference to a commit ("tip of the branch")
git diff shows the difference between the working directory and the staging area; git diff --staged shows the difference between the staging area and the repository (last commit).
A branch is a pointer to one commit, a kind of "named reference": a name (string) that maps to a reference (hash). This commit is known as the "tip of the branch".
HEAD is another pointer, to the commit currently checked out (usually through the current branch, so the last commit created on it). So git reset --hard HEAD restores the working directory and the staging area to that commit, discarding all uncommitted changes. When you reset to an earlier commit, the commits left unreferenced are discarded over time, but you can use git reflog to get their hash and reset again, or use the HEAD@{N} notation (the reference HEAD pointed to N steps ago).
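The "pointer" idea can be sketched by reading the files involved: .git/HEAD usually contains a line such as "ref: refs/heads/main", and the branch file contains the hash of the tip. The sketch below builds a fake .git directory for the demonstration; `resolve_head` is a hypothetical helper, the branch name `main` is an assumption, and refs stored in packed-refs are not handled.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a branch; the branch file holds the commit hash.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds the hash directly

tip = "0bc16eaa1e1782e399b9cb069e41ff0224a99234"

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"  # fake layout, not a real repository
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(tip + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    print(resolve_head(git_dir) == tip)  # True: HEAD -> branch -> commit

    (git_dir / "HEAD").write_text(tip + "\n")
    print(resolve_head(git_dir) == tip)  # True: detached HEAD
```

Moving a branch is therefore cheap: only the hash stored in the branch file changes.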
git rm <file> is the same as:
- rm <file>
- git add <file>