Architecture - arxanas/git-branchless Wiki

Crate documentation is available on docs.rs.

git-branchless is written in Rust. The package is called git-branchless, and its code lives in a Rust crate called branchless.

Aside: in retrospect, the crate name branchless could be confused with some kind of library for high-performance branchless programming. Unfortunately, this library only aids high-velocity software development.

This page is intended for git-branchless developers or curious users. The main concepts are implemented in branchless::core.

Event log

git-branchless watches for events in the Git repo by installing various hooks (see branchless::hooks). These hooks add events to the event log, which is an ordered sequence of events stored on disk in a SQLite database. See the Event documentation for details about the types of events which can be recorded.

At present, on startup, git-branchless loads all events into memory, and then replays them to determine the current state of the repository (see EventReplayer). This could be slow if the user has done many operations and the event log is long.
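As a rough illustration of the replay step, here is a minimal sketch. The `Event` and `Visibility` types and the `replay` function are hypothetical simplifications invented for this example; they are not the crate's actual `Event` or `EventReplayer` API, and a plain slice stands in for the SQLite-backed log.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified event type (an assumption for illustration;
/// the real event type records more kinds of events and more fields).
#[derive(Clone, Debug, PartialEq)]
enum Event {
    /// A commit was created.
    Commit { id: String },
    /// `old` was rewritten into `new`, e.g. by an amend or a rebase.
    Rewrite { old: String, new: String },
    /// A commit was explicitly hidden.
    Hide { id: String },
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Visibility {
    Visible,
    Hidden,
}

/// Replay every event in order and return the resulting visibility
/// of each commit mentioned in the log.
fn replay(events: &[Event]) -> HashMap<String, Visibility> {
    let mut state = HashMap::new();
    for event in events {
        match event {
            Event::Commit { id } => {
                state.insert(id.clone(), Visibility::Visible);
            }
            Event::Rewrite { old, new } => {
                // In this sketch, the old version is hidden in favor of the new one.
                state.insert(old.clone(), Visibility::Hidden);
                state.insert(new.clone(), Visibility::Visible);
            }
            Event::Hide { id } => {
                state.insert(id.clone(), Visibility::Hidden);
            }
        }
    }
    state
}
```

The cost of `replay` is linear in the length of the log, which is why a long log could make startup slow.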

Undo implementation

The undo feature is implemented by taking recent events from the event log and then applying their inverses. For example, if a commit A was rewritten to B, then the inverse operation is to rewrite B to A.
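The inversion idea can be sketched as follows. The `Event` variants and the `inverse` and `undo_events` functions are hypothetical names chosen for this example, not the crate's real types; the sketch only shows the shape of "take recent events, newest first, and invert each one".

```rust
/// Hypothetical, simplified event type (an assumption for illustration).
#[derive(Clone, Debug, PartialEq)]
enum Event {
    /// `old` was rewritten into `new`.
    Rewrite { old: String, new: String },
    /// A commit was hidden.
    Hide { id: String },
    /// A previously hidden commit was made visible again.
    Unhide { id: String },
}

/// Return the event that reverses the effect of `event`.
fn inverse(event: &Event) -> Event {
    match event {
        // Rewriting A to B is undone by rewriting B to A.
        Event::Rewrite { old, new } => Event::Rewrite {
            old: new.clone(),
            new: old.clone(),
        },
        Event::Hide { id } => Event::Unhide { id: id.clone() },
        Event::Unhide { id } => Event::Hide { id: id.clone() },
    }
}

/// Build the events that undo the last `n` events in `log`.
/// The most recent event is inverted first.
fn undo_events(log: &[Event], n: usize) -> Vec<Event> {
    log.iter().rev().take(n).map(inverse).collect()
}
```

Note that `inverse` is purely mechanical: it has no way to tell whether the inverted event is sensible for the commits involved.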

This might not be the best implementation, since some inverses don't make sense. For example, if the user rewrites a draft commit A into its upstream version contained in the main branch, should the inverse really rewrite a main branch commit into a draft commit? That would leave main branch commits marked as obsolete.

It might be best to introduce a dedicated "undo" event type, rather than attempt to invert previous events.

Checkpoints

Not yet implemented: To avoid performance problems when the event log is long, it should be possible to add "checkpoints" to the event log. A checkpoint would be a synthetic event that contains a copy of the repository state. Rather than replay all events in the event log, we can find the most recent checkpoint, load the repository state, and replay events only from that point. In this way, we can arbitrarily bound the number of events that need to be read and replayed in the worst case.

So far, I haven't hit performance problems with a few thousand local events, so I haven't prioritized this.
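Since checkpoints are not implemented, the following is only a sketch of how the idea might look. All names here (`Event::Checkpoint`, `apply`, `replay_all`, `replay_from_checkpoint`) are hypothetical, and the repository state is reduced to a set of visible commit IDs for brevity.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified event type (an assumption for illustration).
#[derive(Clone, Debug)]
enum Event {
    Commit { id: String },
    Hide { id: String },
    /// Synthetic event carrying a full copy of the state at that point.
    Checkpoint { visible: HashSet<String> },
}

/// Apply a single event to the state (here: the set of visible commits).
fn apply(state: &mut HashSet<String>, event: &Event) {
    match event {
        Event::Commit { id } => {
            state.insert(id.clone());
        }
        Event::Hide { id } => {
            state.remove(id);
        }
        // A checkpoint replaces the state wholesale.
        Event::Checkpoint { visible } => {
            *state = visible.clone();
        }
    }
}

/// Replay the whole log from the beginning.
fn replay_all(events: &[Event]) -> HashSet<String> {
    let mut state = HashSet::new();
    for event in events {
        apply(&mut state, event);
    }
    state
}

/// Replay only from the most recent checkpoint (or from the start if
/// there is none). Also returns how many events were replayed.
fn replay_from_checkpoint(events: &[Event]) -> (HashSet<String>, usize) {
    let start = events
        .iter()
        .rposition(|event| matches!(event, Event::Checkpoint { .. }))
        .unwrap_or(0);
    let mut state = HashSet::new();
    for event in &events[start..] {
        apply(&mut state, event);
    }
    (state, events.len() - start)
}
```

Both functions produce the same state as long as each checkpoint faithfully copies the state at the point where it was written; the second one just does less work when a recent checkpoint exists.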

Comparison with the reflog

Git has a concept called "reference logs", or "reflogs" for short. A reflog is a history of events that happened to a single Git reference. This is pretty similar to our event log. In fact, the first version of git-branchless attempted to infer the repository history from the reflog for HEAD.

So why don't we use reflogs? Unfortunately, they have a number of shortcomings:

Related work

The reader might also be interested in Jujutsu, an experimental Git-compatible VCS which also has an "operation log".

I'm not aware of other source control systems which also use a general-purpose event log. Please update this section if you know of another one.

Commit evolution

git-branchless implements a basic version of Mercurial's Changeset Evolution feature.

For the implementation details, there's a good technical document here: https://www.mercurial-scm.org/doc/evolution/concepts.html

Normally, when a commit is amended or rebased, the result is an entirely new Git object, which has no direct relation to the old one. By leveraging the event log and the post-rewrite hook, we can record these relationships.
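To make the hook's role concrete: Git passes rewritten commits to the post-rewrite hook on standard input, one per line, in the form `<old-sha1> <new-sha1> [<extra-info>]`. The sketch below parses that input into old/new pairs. The `Rewrite` struct and `parse_post_rewrite` function are hypothetical names for this example and are not taken from the branchless crate.

```rust
/// One recorded relationship: `old` was rewritten into `new`.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
struct Rewrite {
    old: String,
    new: String,
}

/// Parse the text Git feeds to the post-rewrite hook on stdin.
/// Each line has the form `<old-sha1> <new-sha1> [<extra-info>]`;
/// lines with fewer than two fields are skipped.
fn parse_post_rewrite(input: &str) -> Vec<Rewrite> {
    input
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let old = fields.next()?;
            let new = fields.next()?;
            // Any remaining fields are the optional extra info and are ignored here.
            Some(Rewrite {
                old: old.to_string(),
                new: new.to_string(),
            })
        })
        .collect()
}
```

Each parsed pair could then be appended to the event log as a rewrite event.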

These are the important situations:

Recording these events allows us to update the smartlog with the latest version of the commit, as well as undo these operations in a principled manner.
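As an illustration of "the latest version of the commit", the sketch below follows a chain of recorded rewrites until it reaches a commit that was not rewritten further. The `latest_version` function and the map representation are assumptions made for this example, not the crate's actual data structures.

```rust
use std::collections::{HashMap, HashSet};

/// Follow recorded rewrites (old commit ID -> new commit ID) starting
/// from `commit` and return the most recent version found.
fn latest_version(rewrites: &HashMap<String, String>, commit: &str) -> String {
    let mut current = commit.to_string();
    let mut seen = HashSet::new();
    // Stop if we revisit a commit, so that a cycle in the recorded
    // rewrites (e.g. A -> B followed by B -> A) cannot loop forever.
    while seen.insert(current.clone()) {
        match rewrites.get(&current) {
            Some(next) => current = next.clone(),
            None => break,
        }
    }
    current
}
```

For a chain A -> B -> C, looking up A or B yields C, and a commit with no recorded rewrite is returned unchanged.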

Segmented changelog

As of https://github.com/arxanas/git-branchless/commit/f6c540fea8392223d604c4994b081b603b3df850, the commit graph is based on Eden SCM's segmented changelog data structure. See the thread at https://github.com/quark-zju/gitrevset/issues/1 for more details. Some resources: