Finding your Workflow - aakash14goplani/FullStack GitHub Wiki

Topics Covered

Workflows and Pain

  • The Git toolset is so powerful and flexible that you sometimes have too many options, and that can cause a few headaches when multiple people come together to work on a shared project. To avoid those headaches, you and your team need to make a few decisions about how to work together, about three things in particular.

    • First, you should decide which kind of distribution model to adopt: how many repositories you have and how they interact. Do you have one shared repository that is visible to all developers, or maybe many shared repositories? Can all developers push their commits to the shared repository, or do some developers have read-only access, and so on.
    • Also, you need to make decisions about branches. Which branches do you have in your project and what do you use them for? Which branch do you use to integrate your work with the work of other developers? Which branch do you package a release from, and the like?
    • And finally, you need to define more general constraints for everybody who contributes to the project. These are all those additional rules that don't really belong to either the distribution model or the branching model. For example, when you get a bunch of new commits from a remote, should you merge those commits into your own repository or should you rebase them? Can you push your code to a remote if the tests are broken, or should you make sure that the code is stable and working before pushing? There are many other possible decisions like these.
  • So: distribution model, branching model, and constraints. These three things together define what you can call a distributed workflow, and that's what this module is all about: distributed workflows, or how to use Git in practice on real-world problems.

Distribution Models

  • The first thing that you should decide when you set up a Git workflow, the decision that influences most other decisions down the line, is your distribution model.

1. Peer to Peer model

  • The simplest distribution model is probably the peer to peer model.

image 1

  • Imagine that you are a developer in a team of three. All of these repos contain the same project, of course, because they were originally cloned from each other. But they diverge as the developers commit to their own copies of the repo.

  • The trick here is that each developer can see the other developers' repos as remotes. So if you are Nick in this diagram and you know that Ralph has some new commits that you want, then you can just go and pull those changes into your repo. And if you know that there is new stuff in Jane's repo, then you can pull in those changes as well. The other developers can do the same, so changes spread around like that.

  • Now it's true that a peer to peer model is simple, but it's not necessarily easy, specifically because no repo is any more important than the others. That's the definition of peer to peer, right?

  • It can become a challenge when you want to do something as simple as releasing the project, for example. Then you need to decide which repo to release from, and each repo might contain slightly different stuff, so it might be hard to decide which one is the right one, so to speak. And if that can be hard with three developers, it can be much harder with four or five.

  • For this reason, unless your project is literally two people in a room, you probably want to make a few additional decisions up front.
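
As a rough sketch of the peer to peer setup described above, the following script builds three repos in a throwaway directory and lets Nick pull from Ralph. The developer names come from the diagram; the directory layout, the branch name main, the commit messages, and the demo identity set through environment variables are assumptions made for illustration. A real team would use network URLs instead of local paths.

```sh
set -eu
work=$(mktemp -d) && cd "$work"
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Jane starts the project; Nick and Ralph clone it, so all three repos
# begin with the same history.
git init -q jane
git -C jane symbolic-ref HEAD refs/heads/main   # pin the branch name for the demo
git -C jane commit -q --allow-empty -m "Initial commit"
git clone -q jane nick
git clone -q jane ralph

# Ralph commits to his own copy, so the repos diverge.
git -C ralph commit -q --allow-empty -m "Ralph: add search"

# Nick registers Ralph's repo as a remote and pulls the new commit from it.
git -C nick remote add ralph "$work/ralph"
git -C nick pull -q --ff-only ralph main
git -C nick log --oneline
```

After this runs, Nick has Ralph's commit while Jane's repo still has only the initial one, which is exactly the "each repo might contain slightly different stuff" situation.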

2. Centralized Model

  • In particular, you might decide that one of the repos is special. It's blessed as they say. This blessed repo is accessible to everybody in the project, both for pulling and pushing. So you can commit data to your own local repo, but you also have a remote that is pointing at the blessed repo. Now you can push data to that remote.

image 2

  • Most developers call this remote origin by convention because that is the default name that Git gives to the remote that you clone the project from.

  • The blessed repo is often a bare repo, which in Git terms means that it's just a repository without a working area or an index. Nobody works on that machine directly. It's just used to share data, and maybe to host a build machine: the system that runs the unit tests, packages releases, and so on.

  • So let me put a robot in here to mean that this is the shared build machine repo, not a repo that a human works in day in and day out.

  • Now in this model, you don't access your teammates' repos and see their data like in the peer to peer model. Instead, you only care about the data on the blessed repo. That's the official state of the project, so to speak.

  • So everybody will pull his or her data from the blessed repo and the developer who has new commits, like Ralph in this case, must push those commits to the blessed repo so that the rest of the team can pull them.

  • Everything is centralized. Indeed, you could just call the blessed repo the server, and the model itself is often called the centralized model.

  • Essentially, the centralized model is the same model that people use with non-distributed versioning systems like Subversion or Team Foundation Server. Most companies these days have replaced those versioning systems. They switched to Git for a number of reasons, but they didn't necessarily change their distribution model. Many of them still use a pure centralized model.
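
The centralized model can be sketched with a bare repository standing in for the server. This is an illustration under assumptions, not a prescribed setup: the directory names, the branch name main, the commit messages, and the demo identity are all made up, and a local path plays the role of the server URL.

```sh
set -eu
work=$(mktemp -d) && cd "$work"
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# The blessed repo is bare: history only, no working area or index.
git init -q --bare blessed.git
git -C blessed.git symbolic-ref HEAD refs/heads/main

# Ralph seeds the project and publishes it to the blessed repo.
git init -q ralph
git -C ralph symbolic-ref HEAD refs/heads/main
git -C ralph commit -q --allow-empty -m "Initial commit"
git -C ralph remote add origin "$work/blessed.git"
git -C ralph push -q -u origin main

# Jane clones the blessed repo; Git names that remote "origin" for her.
git clone -q blessed.git jane

# Ralph pushes a new commit, and Jane picks it up from origin.
git -C ralph commit -q --allow-empty -m "Ralph: add login page"
git -C ralph push -q origin main
git -C jane pull -q --ff-only
git -C jane log --oneline
```

Note that Jane never talks to Ralph's repo. Everything she sees arrives through origin.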

3. Pull Request Model

  • Then there is yet another model that is a twist on the centralized model. You still have developer repos and you still have a central blessed repo, but most developers cannot write to the blessed repo directly. They cannot push to it. They can only pull from it. Only one person or a few people in the project have the power to push things to the blessed repo. They're usually called the maintainers.

image 3

  • In this example here only Nick can push to the blessed repo and the other developers, the ones who don't have push access to the blessed repo, are sometimes called the contributors. I made their arrows gray to mean that those are read only or pull only connections.

  • So the team still uses the blessed repo as a communication hub and when a commit is on the blessed repo, then it's official and everybody will pull it eventually.

  • Now let's say that Jane has new commits to contribute, but she's not a maintainer. She doesn't have the power to push to the central repo. So how can she contribute those commits to the project? The trick here is that the maintainer can also see the contributors' repos. So Jane can say: hey Nick, please pull my changes, see if you like them, and feel free to push them to the blessed repo. Maybe Jane can just walk to Nick's desk and tell him about the changes. Or maybe Jane and Nick don't even share the same office, so she must let him know in some other way, for example by sending him a message that says: look, I have new commits in my repo that you might want to pull. This message could be an email, or it could be managed by some kind of service such as GitHub, and it is usually called a pull request.

  • Now Nick has the pull request, so he knows about Jane's changes. If he is okay with those changes, he can pull them into his own repo, solve any merge conflicts he might have, and then push them to the blessed repo. Now Jane's changes are an official part of the project, and other contributors like Ralph can also pull them down.

  • The pull request is the defining mechanic of this model, so you can call it the pull request model. Git has a few features that make it easier to prepare a pull request. There is actually a git request-pull command. But actually sending the pull request is not a Git feature. You have to use old-fashioned mail or some other means of doing that. And that is one of the reasons why services such as GitHub are so popular: they automate the sending of pull requests, so much so that the pull request has become the defining feature of GitHub, and the pull request model has become the most popular model for open source development.

  • The most significant advantage of the pull request model is that it helps manage trust. You can get contributions from trusted sources like Nick, in this case, and also from less trusted sources like Ralph and Jane. This is necessary in open source projects, of course, because in open source anybody on the internet could be a contributor, and you probably don't trust the entire internet to push directly to your project's repository.

  • So you can have a few trusted maintainers and any number of contributors. But this trust management concept also comes in useful for closed source projects, whenever you have a project where you don't want to grant push access to the main repo to the entire development team. In that case, you can appoint an internal maintainer, maybe call her an integration manager, and the developers must ask this person to pull their changes.

  • One last thing about the pull request model before we move on. If you look at projects on GitHub, they use a slightly more complex variation of this model. The added complexity here is that on GitHub and similar services, all these repos are actually in the cloud, so the developers are not working directly on them.

  • Instead, each developer has two repos, a private repo on their own computer and the public repo in the cloud. So the concept is pretty much the same as the basic pull request model, but the mechanics of pushing, pulling, and the like are slightly more involved.

4. Dictator and Lieutenants Model

  • In the pull request model, you have one blessed repo for the entire project, with one or more maintainers who can access it.

image 4

  • Here, I'm showing the blessed repo, and Jane is the maintainer. In this other model, however, the project is split into subprojects, and each subproject has its own blessed repo and its own maintainer or maintainers. And then there are the general contributors.

  • In this model, anyone can pull data from pretty much anywhere, but only the maintainers can push data to the blessed repos. So here we have three levels. You might have even more if you want to have multiple layers of subprojects. And there are pull requests, of course.

  • Contributors generally send PRs (pull requests) to subprojects, and the subproject maintainers send PRs to the main project maintainer. Everybody pulls data into their own repos, so data spreads upwards in response to pull requests, and it spreads downwards as people pull it from the upper levels.

  • This model is sometimes used in very large projects that are too big for a single team of maintainers to handle. The classic example is the Linux kernel. In Linux, the subproject maintainers are called the lieutenants and the global maintainer is called the benevolent dictator. So you can call this model the dictator and lieutenants model. It's also popular in large enterprise companies that have huge projects, or that sometimes just really like hierarchies.


  • So to recap, we've seen four distribution models:

    • The peer to peer model has no blessed repo and no centralized control. It is purely distributed.
    • The centralized model is like traditional non-distributed configuration management. You have a central blessed repo, the server, and everybody pushes to that server.
    • The pull request model still has a centralized repo, but most people can only pull from it. Someone who has the rights to push to the central repo must pull changes from other repos in response to some kind of pull request.
    • And the most complicated model is dictator and lieutenants, where you have multiple subprojects, each one like a pull request model project, and then a higher level, or maybe even multiple higher levels, of integration across the subprojects, also based on pull requests.
  • One important point to close: remember that these are patterns. They're not recipes for how to structure your own project. So you might want to have a mixed approach in your own project; many projects do that. Maybe most developers in your project work in a centralized model, except for that offshore development team that follows the pull request model and sends pull requests to the internal developers. Or maybe you work centralized, except that sometimes two developers are working together on a feature, and then they choose to synchronize their repos as if they were working in a peer to peer model. That's fine. You don't need to stick with a model religiously. Use whatever works for you. These names I gave you are just labels. They are useful to conceptualize and discuss your options. So for example, you can easily say "we're using a centralized model in here" and your teammate can reply "no, let's go for a pull request model". That's the point of patterns.

Branching Models

  • Right after the distribution model, the second important element that will shape your Git workflow is some kind of policy to manage branches. Every project has such a policy, whether or not it's an explicit one. So let's look at a few common patterns for branches.

  • First, let me make a distinction between stable and unstable branches. A branch is stable when the tip of the branch always contains a working version of the project. That is, the tests are green, there are no known showstopper bugs, and so on. You could just package whatever is on a stable branch and release it. I will use this green checkmark to mean a working version of the project. So if someone adds new commits to the branch, for example by pushing them, the tip of the branch still contains a working system.

image 5

  • In an unstable branch, you don't necessarily have that. When somebody pushes one or more new commits to the branch, the tip of the branch might be working or it might be broken. There is no guarantee.

image 6

1. Integration Branch

  • Almost every project has a main branch that you use to put everything together. This is usually the branch that people consider the most important branch in the system. People might work on other branches, but the other branches tend to branch out from this main branch.

  • They tend to stay reasonably aligned with the main branch, and they ultimately tend to flow back into it. I'm using different colors for the commits here to make the separate branches stand out from each other. The colors don't have any specific meaning. This all-important branch, the one drawn in red in the picture, is usually called master. Whatever its name, you can call it the integration branch, because it's the place where things come together, and usually the place where you solve the conflicts that you might have when things come together.

  • People sometimes also call it the main branch, the main line, the development branch, or just the master branch. But we have to pick one name, so I will call it the integration branch.

  • Now is the integration branch stable or unstable? Well this depends on the project. In most cases, the more stable the integration branch is, the better. This is the central branch in the project after all. Mostly everybody is working on it and nobody likes to work on an unstable code base.

  • On the other hand, it's hard to keep a branch stable when you are constantly integrating your stuff into it, so in practice most projects aim for a mostly stable integration branch. Actually, that's what build machines are for. The primary job of a build machine running Jenkins or some other kind of automated build system is to check out whatever is on the integration branch, run the tests on it, and tell you whether the current build is working or broken.

2. Release Branch

  • Another important question on most projects is which branch do you release your software from? At some point, you have to deploy the software to a web server or maybe package it and distribute it on an app store or whatever your distribution method is. Some projects do that from the integration branch, maybe putting a tag on specific commits to mark the points where they are released from.

  • Other projects prefer to have a separate branch for releases, a release branch. What's the point of having a separate branch for releases? Well, there are a few advantages, but the most obvious one is that you can keep the code in that branch more stable than in the integration branch. For example, you can merge the integration branch into the release branch only after checking that it's stable.

image 8

  • Essentially, a separate release branch provides a buffer to keep releasable changes separate from not quite yet releasable changes. I talked about a release branch, but in some projects you have to maintain multiple releases at the same time, and in that case you might have multiple release branches. Maybe they branch out from the integration branch at the moment when a release happens, and then they proceed onwards.

  • If you have to add specific documentation or fix a bug on release 1.1, but not on release 1.2, well you have a specific release branch to work on then.

3. Feature Branch

  • There is another type of branch that is a staple of many, many projects. Let's say that you have two developers working on two different features. In some projects, they would both push directly to the integration branch. I'm using two different colors for the two features here.

image 9

  • With this way of working, you can have very frequent integration, which I like in general, but sometimes it can be hard to do right. Besides, your history becomes hard to make sense of, because the commits belonging to different features can be all interleaved, as in this case.

  • And with this approach, you also have to deal with half-developed features on the integration branch most of the time. So for example, right now maybe the green feature is done, but the blue feature is still ongoing, so you have half a blue feature on the integration branch.

  • One possible alternative is to create a new branch for each feature, so for example one branch for feature A and one for feature B. As people start pushing to those branches, they diverge and progress in parallel until they eventually flow back into the integration branch. In this case, we're integrating by merging, but you can also rebase if you wish. After this, you can delete those branches or leave them there for future reference. These branches are called feature branches.

  • Some people prefer the name topic branches, which is also good, but I'll use feature branches here just because it seems to be a tad more common.
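
A small sketch of the feature branch pattern, assuming an integration branch called main and two hypothetical features. The branch names, commit messages, and demo identity are illustrative only, and empty commits stand in for real work.

```sh
set -eu
work=$(mktemp -d) && cd "$work"
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q project && cd project
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "Initial commit"

# Each feature gets its own branch off the integration branch.
git checkout -q -b feature-a main
git commit -q --allow-empty -m "Feature A, part 1"
git commit -q --allow-empty -m "Feature A, part 2"

git checkout -q -b feature-b main
git commit -q --allow-empty -m "Feature B, part 1"

# A finished feature flows back into the integration branch with a merge.
# --no-ff forces a merge commit even when a fast-forward would be possible,
# so the feature's commits stay grouped together in the history.
git checkout -q main
git merge -q --no-ff -m "Merge feature-a" feature-a
git branch -d feature-a   # optional cleanup; -d refuses to delete unmerged work

git log --oneline --graph --all
```

Feature B is still unfinished at this point, and none of its commits are on main. That is the half-developed-feature problem the pattern is meant to avoid.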


  • A lot of projects use feature branches. We've seen a few styles of branches already, and you might think that once you have an integration branch, maybe one or more release branches, and a bunch of feature branches, you have all you need. But there is an additional issue that can result in even more branches for some projects.

  • Imagine that you have a situation like this with a couple of branches. Let me give them names so it's easier for us to reference them. It doesn't matter which branches they are exactly. What matters is that these branches have been diverging for a while, but branch 1 has one commit, this red commit here, that you also want to have on branch 2.

image 10

  • You don't want any of the other commits from branch 1 on branch 2, just this one commit. So how do you do that? How do you copy a single commit from one branch to the other?

  • Well, Git has a special command for just that, and it's called cherry-pick. You can cherry-pick a single commit or a few specific commits from a branch and copy them on top of another branch. Problem solved, except for one detail.

  • A cherry-pick is just like a tiny rebase. It's the rebase of a specific commit and some projects don't like rebases. They want to use merge everywhere instead. So cherry-picks are not an option for merge-based projects.
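
To make the cherry-pick concrete, here is a minimal sketch with two diverged branches. The branch names follow the picture; the file names, commit messages, and demo identity are assumptions for illustration.

```sh
set -eu
work=$(mktemp -d) && cd "$work"
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q project && cd project
git symbolic-ref HEAD refs/heads/branch1
echo base > base.txt
git add base.txt
git commit -q -m "Common base"
git branch branch2

# branch1 gains two commits; only the second one is wanted on branch2.
echo one > one.txt
git add one.txt
git commit -q -m "branch1: unrelated work"
echo fix > fix.txt
git add fix.txt
git commit -q -m "branch1: the wanted commit"
wanted=$(git rev-parse HEAD)

# Copy just that one commit on top of branch2.
git checkout -q branch2
git cherry-pick "$wanted" >/dev/null
git log --oneline --all --graph
```

The commit on branch2 carries the same change and message as the original, but it is a new commit with a different hash, because its parent is different. That rewriting is why the text compares a cherry-pick to a tiny rebase.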

4. Hotfix Branch

  • How can you still have the same commit in two separate branches without cherry-picking? Well there is another solution to the problem. When you want to have the same commit shared by two separate branches, just have a third branch and you place that shared commit on the third branch, and then you merge the third branch into both the first and the second branch like this and that's it.

  • Now you have one place to put shared commits and you have just merges, no rebases. This is actually a common situation, especially when you have a bug fix. Let me change the names of these branches. You have a release branch and an integration branch and they've been diverging. You just found a nasty bug in your latest release and you want to fix it immediately and prepare another release, but you also want the same bug fix to be on the integration branch so that you will have the bug fixed in the next releases as well.

  • Well, just put the bug fix on another branch called, say, hotfix, and then merge hotfix into both the release and the integration branch. Now this fix is in the history of both branches. That's about it. You kept the branches separate and you still shared data between them.

  • There are also other use cases for an extra branch like this, but bug fixes are a common one, so don't be surprised if you see projects that have so-called hotfix branches for this kind of stuff.
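
The hotfix pattern can be sketched like this. One detail is an assumption on top of the text: the hotfix branch is started from the last commit that the release and integration branches have in common, so that merging it brings along the fix and nothing else. File names, commit messages, and the demo identity are made up.

```sh
set -eu
work=$(mktemp -d) && cd "$work"
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q project && cd project
git symbolic-ref HEAD refs/heads/integration
echo v1 > app.txt
git add app.txt
git commit -q -m "Version 1"
git branch release

# The two branches diverge.
echo next > next.txt
git add next.txt
git commit -q -m "integration: start next feature"
git checkout -q release
echo notes > notes.txt
git add notes.txt
git commit -q -m "release: add release notes"

# Put the fix on a third branch that starts at the shared ancestor.
git checkout -q -b hotfix "$(git merge-base release integration)"
echo fixed > app.txt
git commit -q -am "Fix crash on startup"
fix=$(git rev-parse HEAD)

# Merge the very same commit into both branches: no cherry-pick, no rebase.
git checkout -q release
git merge -q --no-edit hotfix
git checkout -q integration
git merge -q --no-edit hotfix
git log --oneline --graph --all
```

Unlike a cherry-pick, both branches now contain the identical commit, and the release branch still has none of the unfinished integration work.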

Constraints

  • The third and last element of a distributed workflow after the Distribution Model and the Branching Model is that of Constraints.

  • Every project I worked on had its own rules, sometimes very specific and sometimes a bit surprising. So I cannot show you any specific patterns here, because there are so many possible constraints, but I can come up with a few examples of constraints that I've seen in real-life projects.

  • One of the most common examples is the choice between merges and rebases. In some projects, people prefer to always merge. So if you just finished working on a feature branch, for example, you merge it into the integration branch. On the other hand, some projects would rather rebase the entire feature branch on top of the integration branch. It's up to you whether to use merge or rebase, but you should probably try to be consistent about it. You probably don't want half the team using merge and the other half using rebase. That would be confusing. So this is usually a project-wide decision. You probably want it to be an official constraint.

  • Another question that is important to some projects is who can do what to which branches. Maybe some developers are expected to only commit to some branches and not others. Or maybe a project is using tags on the release branch to mark a new release, so only the person in charge of releases can tag the release branch. Other developers should avoid doing that, even though technically they could.

  • Here is yet another example. This one is not necessarily common, but it's interesting. I was working on a project with a lot of developers doing continuous integration, and the build machine ran the unit tests on the integration branch. Every now and then, the tests could be broken; they could be red. So we had a rule in place that said: if you see that the build is red, then stop pushing to the server. Wait until the person who broke the build realizes that there is a problem, so that she can fix the build and make it green again. Otherwise, if you keep pushing, you will make this person's job really hard. You will change the code under her feet and trigger more builds. So that was the constraint: don't push when the build is red. Wait for it to be green again instead.

  • The interesting part was that sometimes people forgot to check the build before pushing, so they violated this constraint even though they didn't really mean to. So the team did something smart. They used a feature of Git that is called hooks. It's basically a way of running a script whenever a specific event happens. So they had a hook that started a script whenever somebody started a push to the server. The script checked the build machine, and if the build was red, it gave you a warning and the opportunity to abort the push.
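
A hook along those lines might look like the sketch below. It is not the team's actual script: it assumes a client-side pre-push hook, it refuses the push outright instead of asking, and it reads the build status from a file named by a made-up BUILD_STATUS_FILE variable where a real hook would query the build server. All names and paths are hypothetical.

```sh
set -eu
work=$(mktemp -d) && cd "$work"
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/main
git init -q dev
git -C dev symbolic-ref HEAD refs/heads/main
git -C dev remote add origin "$work/server.git"
git -C dev commit -q --allow-empty -m "Initial commit"

# Hypothetical stand-in for the build machine: a file holding "green" or "red".
export BUILD_STATUS_FILE="$work/build-status"

# Git runs .git/hooks/pre-push before every push and cancels the push
# if the hook exits with a non-zero status.
mkdir -p dev/.git/hooks
cat > dev/.git/hooks/pre-push <<'EOF'
#!/bin/sh
status=$(cat "$BUILD_STATUS_FILE")
if [ "$status" = "red" ]; then
    echo "Build is red: wait for green before pushing." >&2
    exit 1
fi
EOF
chmod +x dev/.git/hooks/pre-push

echo red > "$BUILD_STATUS_FILE"
if git -C dev push -q origin main 2>/dev/null; then
    echo "pushed while red (unexpected)"
else
    echo "push blocked while the build is red"
fi

echo green > "$BUILD_STATUS_FILE"
git -C dev push -q origin main
echo "pushed once the build is green"
```

Someone who really needs to push while the build is red can still bypass the hook with git push --no-verify, so a hook like this works as a reminder rather than as enforcement.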

  • Here is another decision, which some teams choose to regulate. What kind of history refactoring should you do on feature branches before you merge them into the integration branch? Some projects like to squash the entire feature into one single commit, for example, while other projects prefer to keep the small highly granular commits in the main line history.
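
As one possible illustration of the squash option, the sketch below collapses a two-commit feature branch into a single commit on the integration branch with git merge --squash. Branch names, file names, commit messages, and the demo identity are assumptions.

```sh
set -eu
work=$(mktemp -d) && cd "$work"
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q project && cd project
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"

# The feature is developed in small, granular commits.
git checkout -q -b feature
echo a > a.txt
git add a.txt
git commit -q -m "WIP: first half"
echo b > b.txt
git add b.txt
git commit -q -m "WIP: second half"

# Squash: stage the combined result of the branch, then record it as one commit.
git checkout -q main
git merge -q --squash feature
git commit -q -m "Add feature (squashed)"
git log --oneline main
```

A project that prefers the granular history would run git merge --no-ff feature instead, which keeps both WIP commits reachable from main.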

  • The point I'm making is that most distributed workflows do have a few constraints, and there are so many diverse constraints that it's good to make your project's constraints explicit when you describe your workflow.

We Need to Talk About GitFlow

  • Details by Vincent Driessen

  • In a nutshell, Gitflow is based on the centralized distribution model. Every developer can push to a central blessed repository. But Gitflow also encourages developers to exchange data directly when appropriate, so this is a mixed distribution model: centralized with peer to peer elements.

  • The core of Gitflow is its branching model, which is very detailed. It defines a number of branches. To understand Gitflow branches, you can broadly partition them into two groups: the unstable branches used for development work, and the stable branches used mostly for releases. Among the unstable branches, you have an integration branch called develop and the feature branches, one per feature.

  • In the second category, stable branches, you have another integration branch called master. This is different from develop, because develop is not stable in general, while master must be stable. So you only merge develop into master when you know that you have a working system. Then you have the release branches, a separate one for each release, and you also have a hotfix branch.

  • And then there are the constraints, quite a few of them. They mostly define which branches can branch off from which other branches, and which branches can merge into which other branches. By the way, I say merge because it's always merging, never rebasing. Gitflow believes in maintaining a truthful and trackable history. You are not supposed to change that history by rebasing.

  • And then there are more rules about what to tag, and when, and so on.

  • Gitflow also gives you the naming conventions that you should use for some of your branches. All these constraints mean that Gitflow is pretty tightly defined, and this precision has some definite advantages. For example, you can find Git extensions on the internet that provide specific commands for Gitflow.

  • Operations like creating a feature branch or merging a hotfix are well-defined enough that they can be automated. At the very least, the precision makes for very good documentation. These are additional reasons why Gitflow became popular. In any case, it is very popular. It became the go-to Git workflow for many people, and many teams just adopted it as is.

  • Gitflow does not fit every project, though. Imagine that your product is a web app rather than a packaged application, so you only maintain one production release at a time. Then maybe you don't need multiple release branches. You can get away with one. Or maybe some of the rules in Gitflow might be counterproductive for you. Maybe you are on a cutting-edge project that does continuous deployment. It deploys to production every time someone integrates a stable feature, and all those branches introduce too much indirection from integration to deployment.

  • Some of the constraints in Gitflow can even encourage damaging behavior in projects that don't fit them. For example, Gitflow mandates feature branches and you are supposed to merge a feature branch into develop once the feature is done. But some big legacy projects can have dozens of features in development at any given time and each feature can take months to be implemented and it tends to contain a lot of code. So when this process is done and you finally merge a feature in the develop branch all at once in a single big merge, that can cause huge conflicts for other people in the team. In such projects, I would encourage people to integrate more often.

Growing a Workflow

  • How do you come up with your own workflow? Doing that is more art than science, so there are no hard and fast rules, no recipes, but I can give you one important guideline here and that is avoid the temptation to just sit down and design your workflow.

  • That approach tends to generate an overdesigned, overcomplicated workflow that still doesn't address the specific problems that your project might have in the future. Even if you are very smart, it's still hard to forecast every possible situation that your project will get into. And if you are quite experienced, that does help, but it might still not be enough, because every project is different, often in subtle ways, so your past experiences might actually mislead you.

  • So instead of designing a workflow, you should strive to grow your workflow. Start small. I'm thinking really small here. Something like this. The details don't really matter here. This is just an example of what I mean by small. A distribution model, centralized in this case, but pick whatever fits the size of your project. A simple branching model, with no complexities unless you know for a fact that those complexities are needed. And a handful of constraints. Three constraints like in this example could actually be enough. You can always set more constraints later. You might need more than this if you have a big project or if you work in a very structured, traditional organization. But in any case, try to stay on the small side.

  • You might find that your very simple workflow is almost all you need actually, and when it proves insufficient and you find out that you need something more complex, well just add that complexity as you go.

  • You can add constraints, you can add new branches, incorporate ideas from other distribution models. That's what I mean when I say grow your workflow. Make it evolve to fit your project.

  • Only add rules in response to real problems that become visible, and always be ready to remove rules that are not having a positive impact on the project.

  • If you want complex, intelligent behavior, then come up with simple principles, because simple principles give people the space they need to be flexible and smart, and to solve the unexpected problems that always arise in a complex environment. By contrast, complex rules and complex regulations create bureaucracy. They create an environment that is unfit for problem solving.
