Rebasing - aakash14goplani/FullStack GitHub Wiki

Introduction

Branching and merging are standard features of any revision control system, but rebasing is far less common. Only a handful of version control systems have it, and Git is by far the most popular of them. In a way, rebasing can be seen as Git's signature feature. Let's see how it works.

What a Rebase Looks Like

  • Here is our cookbook project again.
D:.
|   menu.txt
|
\---recipes
        apple_pie.txt
        README.txt

There are two branches: the master branch and the other branch, spaghetti.

  • We have two branches that diverged. To make the divergence easier to see, I also used different colors for the commits in the two branches. The apple pie commits are yellow, and the spaghetti commits are blue. Also, because spaghetti is the current branch, I drew it in green instead of drawing a separate HEAD pointer.
    image 1

  • Now, we are on the spaghetti branch and we want to put the content of the two branches together. We already know one way to do this: we can merge the two branches. We are already on the spaghetti branch, so we could easily merge it with master.

$ git merge master

I will not do this, however, but if I did, here is what would happen. We would have a new commit, and the parents of this new commit would be the former tips of the two branches. Also, the current branch would move to this new merge commit. This is the usual merge thing that we already know about. In this case, it should also be an easy merge because we're not expecting any conflicts.
image 2
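
The merge described above can be tried out in a throwaway repository. This is only a sketch: the file names, commit messages, and identity below are made up for illustration and are not the actual cookbook history.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"

git checkout -q -b spaghetti               # the "blue" branch
echo "spaghetti" > spaghetti.txt
git add spaghetti.txt && git commit -q -m "Add spaghetti recipe"

git checkout -q master                     # the "yellow" branch
echo "apple pie" > apple_pie.txt
git add apple_pie.txt && git commit -q -m "Add apple pie recipe"

git checkout -q spaghetti
git merge -q --no-edit master              # creates the merge commit

git log --oneline --graph                  # shows the two lines of history joining
# Count the "parent" header lines of the new commit object
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "parents of merge commit: $parent_count"
```

The merge commit lists both former branch tips as parents, and spaghetti (the current branch) now points at it, while master stays where it was.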

  • However, I will not complete this merge. Instead, I will use another way to put the two branches together. I will rebase the current branch over the other branch. If we rebase spaghetti over master, then here is what happens.
    • Git looks for the most recent commit in spaghetti's history that is also in master's history. It's this commit here. This is the base of the spaghetti branch.
    • All the history before this commit is already shared between the two branches, so it's not relevant here.
    • Now Git detaches the entire spaghetti branch from this commit and moves it on top of master, so it changes the base of this branch. That's why it's called a rebase.
ADMIN@AAKASH-PC MINGW64 /d/GitProjects/chapter-3 (spaghetti|REBASE 1/1)
$ git rebase master   (or, if the rebase stopped on a conflict that you have since resolved: git rebase --continue)
Applying: Boiled eggs added to menu.txt

image 3

  • Like in a merge, we might have to resolve conflicts to complete the rebase.
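
As a rough illustration of "changing the base", the sketch below rebuilds a small stand-in repository (hypothetical file names and commit messages) and compares the common ancestor of the two branches before and after the rebase, using `git merge-base`.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"
git checkout -q -b spaghetti
echo "spaghetti" > spaghetti.txt
git add spaghetti.txt && git commit -q -m "Add spaghetti recipe"
git checkout -q master
echo "apple pie" > apple_pie.txt
git add apple_pie.txt && git commit -q -m "Add apple pie recipe"
git checkout -q spaghetti

base_before=$(git merge-base spaghetti master)   # the shared commit ("Add menu")
git rebase -q master                             # replay spaghetti on top of master
# If a conflict had stopped the rebase, you would fix the files,
# `git add` them, and then run `git rebase --continue`.
base_after=$(git merge-base spaghetti master)    # now the tip of master

echo "base before: $base_before"
echo "base after:  $base_after"
```

After the rebase, the base of spaghetti is the tip of master, so spaghetti contains everything master has plus its own commits.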

  • Now the spaghetti branch contains all the commits from the master branch plus the spaghetti stuff, which is what we wanted. What happens if we want it to work the other way as well and we want the stuff from spaghetti in the master branch? Just like in a merge, we can check out master and rebase the other way. Let's check out master here. Master is now the current branch. It switched to green in the diagram. And now let's rebase. Actually, in this particular case I could either rebase or merge, and it would make no difference whatsoever.

ADMIN@AAKASH-PC MINGW64 /d/GitProjects/chapter-3 (spaghetti)
$ git checkout master
Switched to branch 'master'

ADMIN@AAKASH-PC MINGW64 /d/GitProjects/chapter-3 (master)
$ git rebase spaghetti
First, rewinding head to replay your work on top of it...
Fast-forwarded master to spaghetti.
  • In both cases, Git can just fast-forward the branch. A rebase can be fast-forwarded just like a merge.
    image 4
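
A minimal sketch of the fast-forward case, again in a stand-in repository with made-up file names and messages: once spaghetti has been rebased onto master, rebasing master onto spaghetti creates no new commits and simply moves the master pointer.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"
git checkout -q -b spaghetti
echo "spaghetti" > spaghetti.txt
git add spaghetti.txt && git commit -q -m "Add spaghetti recipe"
git checkout -q master
echo "apple pie" > apple_pie.txt
git add apple_pie.txt && git commit -q -m "Add apple pie recipe"

git checkout -q spaghetti
git rebase -q master                       # first rebase: spaghetti on top of master

git checkout -q master
git rebase -q spaghetti                    # second rebase: nothing to replay, just fast-forward

commit_count=$(git rev-list --count master)
echo "master:    $(git rev-parse master)"
echo "spaghetti: $(git rev-parse spaghetti)"
echo "commits in master: $commit_count"
```

Both branches end up on the same commit. In this situation `git merge --ff-only spaghetti` would be expected to leave master in the same place.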

  • So this is what we have now. Just like in a merge, we have all the commits that deal with the spaghetti and all the commits that deal with the pie in the same history. However, unlike a merge, we got that result not by letting multiple branches flow together, but by rearranging the branches so that they look like one single branch.

An Illusion of Movement

  • I didn't tell you the whole story about rebases. Let's take a small step back. I told you that when you rebase Git detaches the current branch from its base and moves it to the top of the target branch. But actually this process cannot happen literally like that. That would be impossible in Git.

  • You cannot detach a commit from another commit and move it elsewhere because commits are database objects, and database objects are immutable. If you change anything in a commit, then you get a different hash, a different SHA1, which means a different commit. And to move a commit, you would have to change at least one piece of data inside it, its parent, so you cannot move it without turning it into a different commit.

  • The parent's SHA1 is stored inside the commit, so the commit data must change, and the commit must get a new SHA1.
    image 5

  • Now that this commit has a new SHA1, this other commit also has to change because its own parent has changed, so it gets a new SHA1 as well, and so on for all the commits in the branch.
    image 6

  • So Git cannot just move the commits. The commits in the rebased branch must have different SHA1s, so they must be different objects in the database. In other words, new commits, and indeed that's what they are.

  • Here is how rebasing really works. When you rebase, Git makes copies of the commits. It creates new commits that carry over the same author, message, and changes as the originals, but with different parents (and, as a consequence, usually a different snapshot and committer date). So these new commits look almost exactly like the original commits, but they are new objects with new SHA1s, so they are new files with new file names in the database directory.
    image 7

  • And finally, Git moves the rebase branch to the new commits leaving the old commits behind. Keep this in mind because as we will see in the rest of this training sometimes rebases can be tricky, and you can avoid some confusion if you remember that rebasing is an operation that creates new commits.
    image 8
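
To see that a rebase copies commits rather than moving them, one option is to record the branch tip before and after the rebase and compare the two commit objects. The sketch below uses a stand-in repository with hypothetical content.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"
git checkout -q -b spaghetti
echo "spaghetti" > spaghetti.txt
git add spaghetti.txt && git commit -q -m "Add spaghetti recipe"
git checkout -q master
echo "apple pie" > apple_pie.txt
git add apple_pie.txt && git commit -q -m "Add apple pie recipe"
git checkout -q spaghetti

old=$(git rev-parse spaghetti)             # tip before the rebase
git rebase -q master
new=$(git rev-parse spaghetti)             # tip after the rebase

echo "--- old commit $old"
git cat-file -p "$old"                     # still in the database, with its old parent
echo "--- new commit $new"
git cat-file -p "$new"                     # same author and message, different parent

# Author name, email, and author timestamp of each commit
old_author=$(git log -1 --format='%an <%ae> %at' "$old")
new_author=$(git log -1 --format='%an <%ae> %at' "$new")
```

The two objects have different SHA1s and different parents, while the author information and the message are carried over unchanged.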

Taking out the Garbage

  • Rebasing copies the data in the old commits to create new commits, but what happens to the old commits then? That's an interesting question. It depends on the case. In this case here, these commits are not very useful. The only branch that was pointing at them has moved over to the new commits, so no branch points at them anymore. These old commits are almost impossible to reach, although there are a few ways to retrieve them.

  • For example, if you had written down their hashes, then you could still check them out, but it's more likely that you will just lose track of them. So, why would Git waste disk space to keep around commits that cannot even be reached?

  • In fact, Git doesn't keep them around forever. Every now and then, when you run a command that is likely to generate this kind of unreachable commit, Git takes some time to look at the objects in the database, identify unreachable objects (commits, but also blobs and trees in some scenarios), and delete them.
    image 9

  • So, if I keep working on this project and at some point in the future I look into the Git database, these commits might well have been deleted. This is a form of garbage collection.

  • In most modern programming languages, a value that can't be reached through any reference, for example an object that cannot be reached through any variable, is considered dead and removed by a garbage collector. Well, the same thing happens in Git.
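
One of the "few ways to retrieve" the old commits, before garbage collection gets to them, is the reflog, which records where a branch pointed previously. The sketch below assumes a non-bare repository with reflogs enabled (the usual default) and uses hypothetical content; the branch name `rescued` is also made up.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"
git checkout -q -b spaghetti
echo "spaghetti" > spaghetti.txt
git add spaghetti.txt && git commit -q -m "Add spaghetti recipe"
git checkout -q master
echo "apple pie" > apple_pie.txt
git add apple_pie.txt && git commit -q -m "Add apple pie recipe"
git checkout -q spaghetti

old=$(git rev-parse spaghetti)             # "write down" the hash before rebasing
git rebase -q master

# No branch contains the old commit any more, so this prints nothing
dangling=$(git branch --contains "$old")
echo "branches containing old commit: [$dangling]"

git reflog show spaghetti                  # history of where spaghetti has pointed
recovered=$(git rev-parse 'spaghetti@{1}') # where spaghetti pointed one update ago

git branch rescued "$old"                  # a new branch makes the commit reachable again
```

As long as the reflog entry or a branch such as `rescued` refers to the old commit, it counts as reachable; once nothing refers to it, a later garbage collection may delete it.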

The Trade-offs of Merges

  • Now we know what a rebase looks like and how it actually works under the hood; however, you might still wonder why rebases even exist. I mean, we already have merging. Rebasing and merging seem to do something very similar. They both bring existing commits into the history of a branch.

  • So, if I'm working on the apple pie recipe and I want to also get the spaghetti recipe, I can have both in the same history by merging or by rebasing. So why do we have two ways of doing something similar? The reason why we have both merging and rebasing is that they have different tradeoffs. Let's focus on merging first.

  • The whole point of merging is that it preserves history exactly as it happened. In this case, for example, you can clearly see that the yellow commits and the blue commits were created independently, possibly in parallel, and then they were merged into one single timeline. If there were any conflicts during the merge, then this merge commit would include fixes to the conflicts.
    image 10

The Trade-offs of Rebases

  • Now let's look at rebasing. A rebased history looks really simple and neat. Commands such as git log have no need to flatten parallel commits into a single list in some arbitrary order, because the commits are arranged in a single timeline already.
    image 11
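
A quick way to check that a rebased history really is a single timeline is to count merge commits and list the commit subjects in order. This sketch uses a stand-in repository with hypothetical file names and messages.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"
git checkout -q -b spaghetti
echo "spaghetti" > spaghetti.txt
git add spaghetti.txt && git commit -q -m "Add spaghetti recipe"
git checkout -q master
echo "apple pie" > apple_pie.txt
git add apple_pie.txt && git commit -q -m "Add apple pie recipe"

git checkout -q spaghetti
git rebase -q master

git log --oneline --graph spaghetti        # a straight line, no forks or joins
merges=$(git rev-list --merges --count spaghetti)      # commits with more than one parent
subjects=$(git log --reverse --format=%s spaghetti)    # oldest first
echo "merge commits: $merges"
echo "$subjects"
```

The listing shows the apple pie commit before the spaghetti commit, even though in this sketch the spaghetti commit was created first.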

  • So, a project that uses a lot of rebasing generally looks more streamlined and clean than a project that uses a lot of merging, history-wise. Essentially, rebasing helps you refactor your project history so that it's always nice to look at.

  • This neatness, however, comes at a cost. This nicely designed history is not real. It was forced by rebasing, which is a destructive operation.

  • Rebasing creates new commits and leaves behind existing commits that might get garbage collected. So a rebased history looks cleaner, but it is a lie in its own way. For example, in this case, it looks like the yellow commits were created first and the blue commits were created later on top of them, but this is not what really happened. The yellow and blue commits were created in parallel in different branches. So in contrast to merges, rebases change the project history.

  • This might not sound like a problem at all. You might say, who cares what the history looked like originally? Surely you only care about the final result. Well, actually there are a few situations when you do care about history. There are some advanced Git commands, for example, that become less useful if you tamper with project history. Also, changing history means creating new commits and moving branches, and there are some scenarios where all this trickery results in confusing situations, like multiple commits with the same commit message in the same branch.

  • Rebases make your history cleaner, but they can also cause unwanted side effects. If I had to condense the differences between merges and rebases in just a single recommendation, it would be this. When in doubt, just merge. Rebasing is a power tool. It is quite useful, but you should only use it if you know what you're doing and you understand the consequences.
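
The "multiple commits with the same commit message" situation mentioned above can be reproduced with a sketch like the following. It assumes a second branch (here called `backup`, a made-up name) still points at the original commit when spaghetti is rebased, and that the two are merged later. File names and messages are hypothetical.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"
git checkout -q -b spaghetti
echo "spaghetti" > spaghetti.txt
git add spaghetti.txt && git commit -q -m "Add spaghetti recipe"
git branch backup                          # another ref to the original spaghetti commit
git checkout -q master
echo "apple pie" > apple_pie.txt
git add apple_pie.txt && git commit -q -m "Add apple pie recipe"

git checkout -q spaghetti
git rebase -q master                       # spaghetti now points at a copy of the commit
git merge -q --no-edit backup              # brings the original commit back in

git log --oneline --graph
# Count commits in the current history whose subject is exactly the spaghetti message
dupes=$(git log --format=%s | grep -c '^Add spaghetti recipe$')
echo "commits named 'Add spaghetti recipe': $dupes"
```

Both the original commit and its rebased copy end up in the same history, which is one reason to be careful about rebasing commits that other branches or other people already refer to.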

Tags in Brief

  • Tags are one of the four types of objects in the database, together with commits, trees, and blobs.

  • In Git there are actually two types of tags.

    • annotated tags
    • The other kind of tag doesn't have a specific name, so people sometimes call them non-annotated tags or lightweight tags.
  • Let's create one. Let's say that I want to mark the current point in my project history. For example, let's look at the very latest commit.

$ git log -1

In this commit we have both spaghetti and an updated apple pie. Let's say that we want to tag this commit with a tag named dinner.

  • We could create an annotated tag. Maybe you still remember that we can do that with git tag -a.
$ git tag -a mytag -m "myTag"

$ git cat-file -p mytag
object eff82d1e671ffac429f504ccf69a9927ffa139cb
type commit
tag mytag
tagger aakash14goplani <[email protected]> 1514321131 +0530

myTag

This tag would contain a lot of useful information such as the date that the tag was created, who created it, a description, and so on. However, in some cases I could decide that there is no reason to have all that information.

  • I might just want to mark this commit with a simple label for my own use. If that is the case, then a lightweight tag is enough. I can create such a tag by skipping the -a option in the tag command.
$ git tag dinner

There, now we have a tag. There it is. I did not have to provide the message or anything.

  • Now, let's peek inside the .git/refs directory. There is a heads directory that contains the branches, and then there is a tags directory that contains the tags. There are two tags in there, the one we already had and the one we just created. They are two simple files that contain the SHA1 of an object in the database. See? A tag is a reference to an object, in this case a commit, just like a branch. I could actually turn this tag into a branch just by moving it to the refs/heads directory.

  • A lightweight tag contains the SHA1 of a commit. An annotated tag is similar, but it contains the SHA1 of a tag object in the database, and that object in turn is referencing a commit besides containing all the extra information like the tag description.
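
The difference between the two kinds of tags can be checked by asking Git what type of object each tag name resolves to. The sketch below uses a stand-in repository with a single hypothetical commit and the tag names from the text.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"

git tag -a mytag -m "myTag"                # annotated tag: creates a tag object
git tag dinner                             # lightweight tag: just a reference

# List every tag ref with the type and SHA1 of the object it points at
git for-each-ref --format='%(refname) %(objecttype) %(objectname)' refs/tags

light_type=$(git cat-file -t dinner)       # type of the object dinner points at
annot_type=$(git cat-file -t mytag)        # type of the object mytag points at
tagged_commit=$(git rev-parse 'mytag^{commit}')   # follow the tag object to its commit

echo "dinner -> $light_type"
echo "mytag  -> $annot_type -> commit $tagged_commit"
```

The lightweight tag resolves directly to the commit, while the annotated tag resolves to a separate tag object that in turn references the same commit.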

  • If tags look just like branches, then what's the difference between a tag and a branch? Simply enough, while branches move, tags don't. If I create a new commit right now, then master will move to track it because it's the current branch, but the tag will just stick to the same object forever. And that's all I had to tell you about tags.
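
A small sketch of "branches move, tags don't", using a stand-in repository with hypothetical content: after a new commit, master points at the new commit while the dinner tag still points at the old one.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Cook"        # placeholder identity
git config user.email "cook@example.com"

echo "menu" > menu.txt
git add menu.txt && git commit -q -m "Add menu"

git tag dinner                             # lightweight tag on the current commit
tagged=$(git rev-parse dinner)

echo "dessert" >> menu.txt
git commit -q -am "Add dessert to menu"    # master moves to this new commit

echo "dinner: $(git rev-parse dinner)"     # unchanged
echo "master: $(git rev-parse master)"     # one commit ahead of the tag
```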
