Git — Core Model - odigity/academy GitHub Wiki
Git is a revision control system (RCS) usually used for source code, though storing binary assets is acceptable and common (preferably small ones).
Git has a radically different architecture than RCSes that came before it, so be careful when trying to apply your understanding of another system (like CVS, Subversion, or MS SourceSafe) to learning Git. Even some commands that seem similar probably don't behave the way you expect.
It is Git's innovative architecture that makes possible Git's two "killer" features:
- Cheap Branches: In most previous RCSes, creating a branch was an expensive and slow operation, which discouraged casual branching. In Git, creating a branch takes almost zero effort, which has strongly shaped the workflows that Git users have evolved and adopted.
- Fully Peer-to-Peer: Most previous RCSes relied on a central repository to provide access control and to accept or reject commits. Git is completely peer-to-peer, which means no repository is more special than any other. All can operate independently, and all can communicate with each other to share data.
Git is fundamentally a content-addressable filesystem with an RCS user interface written on top of it.
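To make "content-addressable" concrete, here is a minimal sketch of the idea: the key under which something is stored is derived from the content itself. The blob_id function follows the way Git hashes file contents (a short header plus the raw bytes, run through SHA-1). The ObjectStore class is a hypothetical in-memory stand-in for illustration only, not Git's actual storage code.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID Git would assign to a file's content (a "blob")."""
    # Git hashes a short header, "blob <size>\0", followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory stand-in for Git's object database."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        key = blob_id(content)
        self._objects[key] = content  # identical content always gets the same key
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ObjectStore()
first = store.put(b"hello world\n")
second = store.put(b"hello world\n")

print(len(first))       # 40 hex digits
print(first == second)  # True: same content, same address
```

Because the address is a function of the content, storing the same file twice costs nothing extra, and any change to the content produces a different address.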
Commits
Git thinks about its data as a stream of snapshots rather than as a list of changes to individual files. A Git repository stores a representation of many snapshots of a tree of files (usually a tree of source code) which we will call the "project".
Every time you perform a commit, you create a new snapshot of your project as it exists at that point. Each snapshot (commit) has a timestamp, a commit message, and a pointer to the previous snapshot in the stream. This builds a chain of snapshots all the way back to your initial commit.
C1 <- C2 <- C3 (you are here)
You can view the history of commits, compare any two (or more) commits, jump back to a previous snapshot, etc.
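The chain of snapshots can be sketched as a linked list in which each commit points back at its parent. The Commit class and history function below are hypothetical names chosen for illustration; real Git stores more metadata (author, committer, and a reference to a tree object rather than an inline dictionary).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    message: str
    timestamp: int                      # seconds since the epoch
    snapshot: Dict[str, str]            # path -> content for the whole project
    parent: Optional["Commit"] = None   # None for the initial commit


def history(tip: Optional[Commit]) -> List[str]:
    """Walk parent pointers from a commit back to the initial commit."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages


c1 = Commit("C1", 1, {"README": "v1"})
c2 = Commit("C2", 2, {"README": "v2"}, parent=c1)
c3 = Commit("C3", 3, {"README": "v2", "main.py": "print('hi')"}, parent=c2)

print(history(c3))  # ['C3', 'C2', 'C1']
```

Note that the pointers run backwards in time, which is why the diagrams on this page draw the arrows from right to left.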
Branches
A branch is simply a lightweight movable pointer to a commit.
While commits are identified by a checksum (aka hash) called a SHA-1, branches are usually given nice, meaningful names like edge and feature7.
SHA-1 is a cryptographic hash function (no longer considered collision-resistant for security purposes) that turns any input into a 160-bit value, normally written as a 40-digit hexadecimal string that looks like this: 8bb42b072078fe86a1b7568393bcd47929d3a784
master
Every new repository starts with a default branch, which is called master unless configured otherwise (Git 2.28 and later support an init.defaultBranch setting).
There is nothing special or magical about this branch other than the fact that it's created for you, and that nearly every repository you encounter will have it.
HEAD and Current Branch
Your Git repository keeps track of your current branch.
(In a newly-created repository, that will be master.)
There is a special pointer called HEAD that always points to the current branch.
At any time, you can switch to another branch by checking out that branch.
When you do, HEAD will be re-pointed at the new branch.
C1 <- C2 <- C3 <- master <- HEAD
Committing on a Branch
Every time you perform a commit, the current branch pointer will be advanced (re-pointed) to that new commit.
C1 <- C2 <- C3                =>    C1 <- C2 <- C3 <- C4
            ^                                         ^
            master <- HEAD                            master <- HEAD
Creating Branches
You can create as many branches as you want, whenever you want, pointed at any commit you want. Just specify the commit when creating the branch. (The default is to create a branch pointed at your current commit.)
Let's say you're on the master branch at C3 and want to branch off from there to work on a feature, so you create a branch called feature1 and switch to it.
(These kinds of branches are often called "feature" or "topic" branches.)
C1 <- C2 <- C3 <- master
            ^
            feature1 <- HEAD
You can make commits on that branch:
C1 <- C2 <- C3 <- master
            ^
            C4 <- feature1 <- HEAD
You can switch back to master and make more commits there:
C1 <- C2 <- C3 <- C5 <- master <- HEAD
            ^
            C4 <- feature1
Then do some work on yet another feature:
                  C6 <- feature2 <- HEAD
                  v
C1 <- C2 <- C3 <- C5 <- master
            ^
            C4 <- feature1
And so on.
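The mechanics described so far (HEAD naming the current branch, a commit advancing only that branch, and branch creation being a single pointer write) can be sketched with a toy model. The Repo class and its method names are hypothetical and exist only for this illustration. Commit IDs are simple counters here rather than SHA-1 hashes.

```python
class Repo:
    """Toy model: commits record their parents, branches are names for commits."""

    def __init__(self):
        self.parents = {}                  # commit id -> list of parent ids
        self.branches = {"master": None}   # branch name -> commit id
        self.head = "master"               # HEAD names the current branch
        self._count = 0

    def commit(self) -> str:
        self._count += 1
        cid = f"C{self._count}"
        tip = self.branches[self.head]
        self.parents[cid] = [] if tip is None else [tip]
        self.branches[self.head] = cid     # only the current branch advances
        return cid

    def branch(self, name, start=None):
        # Creating a branch writes one pointer; no files are copied.
        self.branches[name] = start or self.branches[self.head]

    def checkout(self, name):
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name                   # re-point HEAD at the new branch


repo = Repo()
for _ in range(3):
    repo.commit()                          # C1, C2, C3 on master

repo.branch("feature1")
repo.checkout("feature1")
repo.commit()                              # C4 on feature1

repo.checkout("master")
repo.commit()                              # C5 on master

repo.branch("feature2")
repo.checkout("feature2")
repo.commit()                              # C6 on feature2

print(repo.branches)  # {'master': 'C5', 'feature1': 'C4', 'feature2': 'C6'}
```

The final state matches the last diagram above: three branch pointers, each resting on a different commit, with HEAD on feature2.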
Merging
Three-Way Merge
In general, when you merge branch B into branch A, Git:
- finds the nearest common ancestor, which is the commit at which the branches diverged (it is called a three-way merge because Git compares three snapshots: that ancestor and the tip of each branch)
- applies the changes made along branch B since that point to branch A
- if a conflict occurs, it pauses to let you resolve it
- finally performs a commit on branch A (called a merge commit) that has pointers to both previous commits (one on each branch)
If we merge feature1 into master:
                  C6 <- feature2
                  v
C1 <- C2 <- C3 <- C5 <- C7 <- master <- HEAD
            ^           │
feature1 -> C4 <────────┘
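Finding the nearest common ancestor can be sketched as a graph search over parent pointers. The functions below are a simplified illustration on the commit graph from the diagrams, with hypothetical names. Real Git handles cases this sketch ignores, such as histories with more than one candidate ancestor.

```python
from collections import deque

# Commit graph from the diagrams: commit id -> list of parent ids.
parents = {
    "C1": [],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C3"],   # feature1
    "C5": ["C3"],   # master
    "C6": ["C5"],   # feature2
}


def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent pointers."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_base(parents, a, b):
    """Breadth-first search back from a; the first commit also reachable from b wins."""
    reachable_from_b = ancestors(parents, b)
    queue, visited = deque([a]), {a}
    while queue:
        c = queue.popleft()
        if c in reachable_from_b:
            return c
        for p in parents[c]:
            if p not in visited:
                visited.add(p)
                queue.append(p)
    return None


print(merge_base(parents, "C5", "C4"))  # C3

# The merge commit records both branch tips as its parents.
parents["C7"] = ["C5", "C4"]
```

With C3 as the base, the three snapshots compared in the merge are C3, C5 (the tip of master), and C4 (the tip of feature1).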
Fast-Forward Merge
If we instead merge feature2 into master, it will result in a fast-forward merge, which is a special case that is much simpler to handle than the general case.
Because the branch to be merged in is directly ahead of master (there have been no changes on master since feature2 was created), Git can simply "fast-forward" master to point to the same commit as feature2:
                        feature2
                        v
C1 <- C2 <- C3 <- C5 <- C6 <- master <- HEAD
            ^
            C4 <- feature1
This results in a simple linear history, unlike the more general merge commit. Some people prefer that, which is why they use rebasing instead of merge commits, but we will not be covering that here. (It's a matter of taste, and rebasing is an advanced topic that we can safely skip for now.)
At this point, you can safely delete the feature2 branch without losing any work, or continue to make more commits on it and merge it again later.
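The fast-forward condition can be sketched as an ancestry check: if the target branch's commit is an ancestor of the source branch's commit, the target pointer can simply be moved. The function names below are hypothetical, and the sketch only reports when a merge commit would be needed rather than performing one.

```python
def is_ancestor(parents, older, newer):
    """True if `older` is reachable from `newer` by following parent pointers."""
    stack, seen = [newer], set()
    while stack:
        c = stack.pop()
        if c == older:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def merge(parents, branches, target, source):
    """Fast-forward `target` to `source` when possible; otherwise report why not."""
    if is_ancestor(parents, branches[target], branches[source]):
        branches[target] = branches[source]   # just move the pointer
        return "fast-forward"
    return "merge commit needed"


# Commit graph and branch pointers before either merge, as in the diagrams.
parents = {
    "C1": [], "C2": ["C1"], "C3": ["C2"],
    "C4": ["C3"], "C5": ["C3"], "C6": ["C5"],
}
branches = {"master": "C5", "feature1": "C4", "feature2": "C6"}

print(merge(parents, branches, "master", "feature2"))  # fast-forward
print(branches["master"])                              # C6
print(merge(parents, branches, "master", "feature1"))  # merge commit needed
```

No new commit is created in the fast-forward case, which is why the resulting history stays linear.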