Presentation - notify-rs/notify GitHub Wiki

~~Notify vNext's first alpha is out.~~ ~~Notify vN's beta is out.~~ ~~Notify vN is out.~~ None of these yet, this is a draft.

If you don't particularly care about what vNext is and why I think it's cool, then just know that it's easier for you to use, better for your users, and vastly more maintainable for me. However, it is a fairly severe break, so upgrading may not be simple. Details are in the repo and the extensive documentation.

🚨 This is the first alpha release, so you may try it, but I recommend not using it in published code yet, unless you a) like living on the edge, b) don't mind unfinished or broken parts, or c) want to help develop it. The backend interface is expected to be close to stable, and other components will get there as more alphas ship.

🚦 This is the beta release. All interfaces are considered to be ready, and unless critical defects are found, this is what will ship. Consider this the final trial. If you're starting a new project, I recommend using it.

~~💚 This is the stable release. Semver is now in effect, and confidence is high. If you were holding out before, now's the time! 🚀~~

If you're interested in more or merely curious, though, read on!

The background for this is that filesystem notification is a very diverse area. There exist about a dozen different kernel modules that fit the description, several ways to achieve it without special access, and several other ways to tap into special filesystems or filesystem-like structures. Each of these was designed for a (sometimes wildly) different purpose. Interestingly, very few were designed for the purpose of “watching a file tree and doing something when it changes.”

FSEvents, for example, is a tool for archival and indexing. It was designed and built for two macOS systems: Time Machine and Spotlight. Many of its features and behaviours are wholly unsuited for general file watching (and yet that is what is used by most filesystem notification libraries and tooling out there). Rather, it is meant to be queried or streamed at long-ish intervals and used as an indication that something somewhere has changed, and that the consumer should rescan, reindex, or re-backup the whole thing.

Fanotify, largely hailed as the “successor” to inotify, is only incidentally useful for filesystem notification tasks: its main purpose and design is to intercept access calls to files and let a userspace daemon allow or deny those accesses at its leisure. The Linux Audit system is also incidentally useful, and was designed for, well, auditing.

Kqueue and kevent and such are general kernel object watching mechanisms. To watch a tree, one opens a handle (which is a kernel object) for every single file and directory the tree contains and places a kevent watch mask on it.

Even systems designed and purposed for file tree watching are amazingly different in how they do it, how they behave in various interesting cases, and how they report back.

All of this makes it hard to abstract the systems into something remotely coherent. (Do we need something coherent? Of course we do. We want to do things when files change, not care about all this trivia.)

Fortunately, filesystems are mostly similar, from the end-user’s perspective. They have files and folders. Files can be read and written, and sometimes executed. Files and folders have names, and some amount of metadata. Files and folders are created, modified, deleted, accessed.

So the foundation of Notify vNext is recognising this truth and redesigning the event system from that standing. Notify events have a kind, which is a hierarchical classification of both what generally the event is, and what exactly it’s about. Three examples:

- `Modify(Data(Size))` tells us the data of an object was modified, and we know that because its size changed.
- `Create(Folder)` tells us a folder was created.
- `Remove(Any)` tells us an object was removed but we don’t know specifics.

That classification allows a consumer to quickly filter what they’re looking for, as grossly or precisely as needed, while allowing producers to describe events as precisely as they can… but no more precisely than that.
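To make the filtering idea concrete, here is a minimal sketch of how such a hierarchical kind could be modelled and matched. The enum and function names below are assumptions made for illustration; they are not claimed to be the crate's actual definitions.

```rust
// Illustrative sketch only: a hypothetical hierarchical event kind.
// Each level narrows the classification; `Any` means "no more detail known".

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataChange {
    Any,
    Size,
    Content,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModifyKind {
    Any,
    Data(DataChange),
    Metadata,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CreateKind {
    Any,
    File,
    Folder,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemoveKind {
    Any,
    File,
    Folder,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    Create(CreateKind),
    Modify(ModifyKind),
    Remove(RemoveKind),
}

/// Gross filter: any modification at all.
pub fn is_modify(kind: EventKind) -> bool {
    matches!(kind, EventKind::Modify(_))
}

/// Finer filter: only modifications known to affect the data.
pub fn is_data_change(kind: EventKind) -> bool {
    matches!(kind, EventKind::Modify(ModifyKind::Data(_)))
}

/// Precise filter: only data modifications detected through a size change.
pub fn is_size_change(kind: EventKind) -> bool {
    kind == EventKind::Modify(ModifyKind::Data(DataChange::Size))
}
```

A consumer that only cares whether something was modified stops at the outer level, while one that needs specifics keeps matching inward. A producer that cannot tell what changed simply reports `Any` at the level where its knowledge ends.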

Notify events also carry the path the event concerns, and an arbitrary metadata bin to store related rich information, where available (such as a reference to the process that made the change, how the event was collected, or additional known precisions to the event that don’t fit in the classification).
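As a rough sketch of that shape, assuming a string-keyed map for the metadata bin (the struct, its fields, and the attribute keys are all hypothetical, and the kind is reduced to a plain label to keep the example short):

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical event shape: a kind label, the path concerned, and a
/// free-form metadata bin. None of these names come from the crate itself.
#[derive(Debug, Clone)]
pub struct Event {
    pub kind: &'static str,
    pub path: PathBuf,
    /// Optional rich information; a missing key means the backend did not know.
    pub attrs: HashMap<String, String>,
}

impl Event {
    /// Create an event with an empty metadata bin.
    pub fn new(kind: &'static str, path: impl Into<PathBuf>) -> Self {
        Event {
            kind,
            path: path.into(),
            attrs: HashMap::new(),
        }
    }

    /// Builder-style helper that attaches one piece of metadata.
    pub fn with_attr(mut self, key: &str, value: &str) -> Self {
        self.attrs.insert(key.to_string(), value.to_string());
        self
    }

    /// Look up a piece of metadata, if the producer supplied it.
    pub fn attr(&self, key: &str) -> Option<&str> {
        self.attrs.get(key).map(String::as_str)
    }
}
```

For instance, `Event::new("modify", "/tmp/example.txt").with_attr("source", "inotify")` describes a modification along with how it was collected, and a consumer that does not care about the source never has to look at the bin.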

That’s the event problem solved. The second problem the current/previous Notify and many other such wrappers have is fallback. When the platform doesn’t have a native API to gather the relevant events, we must fall back to polling. That is simple enough to do. However, a related issue is runtime fallback: what if we know that the platform has a native API, but upon querying it we observe it’s not available, or at capacity, or some other thing makes it useless for us?

This is a frequent issue with inotify, because the number of watches is limited, and that limit is fairly low by default (to keep kernel memory manageable). Right now, consumers look for that error themselves and fall back to polling on their own. Often, they fall back for the entire set of paths they want to watch.

A cleverer approach, and the one Notify vNext takes, is to manage the selection of event sources (“backends”) internally and not bother the user unless it really is impossible to watch a path. Notify itself watches for that error and falls back.

And that opens up interesting avenues. For one, there’s no need to fall back for the entire set of paths we want to watch: if inotify has enough capacity for some of them, we can use it for that subset and use polling for the remainder. For another, we can use more than two backends at once. macOS has two kernel APIs. Linux has a staggering five. They all have different capabilities and restrictions, but if they’re available, nothing stops us from using them all at once. Lastly, inotify being at capacity now does not mean it always will be: we can check again later and switch some of the watch set back to the more efficient backend as it becomes available.
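The partial fallback and the later switch back can be sketched as follows. This is a simplified illustration under stated assumptions: the `Backend` enum, both functions, and the idea of a single numeric capacity are all hypothetical, and real backends report capacity problems through errors rather than a known number.

```rust
use std::path::PathBuf;

/// Hypothetical backend selection for a single watched path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Native,
    Poll,
}

/// Assign paths to the native backend until its (assumed) capacity is
/// exhausted, then fall back to polling for the remainder only.
pub fn assign_backends(paths: &[PathBuf], native_capacity: usize) -> Vec<(PathBuf, Backend)> {
    paths
        .iter()
        .enumerate()
        .map(|(index, path)| {
            let backend = if index < native_capacity {
                Backend::Native
            } else {
                Backend::Poll
            };
            (path.clone(), backend)
        })
        .collect()
}

/// When native capacity frees up later, move up to `freed` polled paths
/// back to the native backend. Returns how many paths were moved.
pub fn promote(assignments: &mut [(PathBuf, Backend)], freed: usize) -> usize {
    let mut promoted = 0;
    for (_, backend) in assignments.iter_mut() {
        if promoted == freed {
            break;
        }
        if *backend == Backend::Poll {
            *backend = Backend::Native;
            promoted += 1;
        }
    }
    promoted
}
```

With four paths and a native capacity of two, the first two paths are watched natively and the last two are polled. Calling `promote` with one freed slot then moves the third path over and leaves the fourth on polling.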

(Notify currently does some of the first and some of the second, and does not actively do the third, but it is all there to explore further in the future.)

The next two problems are solved together in my design. One: because different backends have different capabilities, we need some way of bridging the gap for the missing ones, in order to provide a coherent experience. One point five: because we might have several backends in play, whatever solution is used needs to apply only to those backends that need it! Two: event debouncing, where similar events close together are held back to avoid hammering effects.

For this, I introduced processors. They declare which capabilities they require, and which they supply, if any. They have access to some of the internal state. They can ask to add or remove watches. And they can pass through, modify, create, or discard events along the stream they're hooked onto.

A consumer can add in their own processors, or enable provided but optional ones, or bring in a third party’s. Notify manages the lot, weaving the streams correctly, splitting and recombining where needed, and maintaining the watch list.
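A minimal sketch of the processor idea might look like this. The trait, the capability names, and both example processors are assumptions made for illustration, not the design's actual interface. The debouncer here simply drops repeats inside a sliding window, which is a simplification of holding events back.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Hypothetical capabilities a backend may or may not have.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    RecursiveWatch,
    AccessEvents,
}

/// Simplified event: a path and the time it was observed.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub path: String,
    pub at: Duration,
}

/// Hypothetical processor interface.
pub trait Processor {
    /// Capabilities this processor requires from upstream.
    fn needs(&self) -> Vec<Capability>;
    /// Capabilities this processor supplies to backends that lack them.
    fn provides(&self) -> Vec<Capability>;
    /// Returns no events (discard), one (pass through or modify), or several (create).
    fn process(&mut self, event: Event) -> Vec<Event>;
}

/// Drops an event when the previous one on the same path is within `window`.
pub struct Debounce {
    window: Duration,
    last: Option<Event>,
}

impl Debounce {
    pub fn new(window: Duration) -> Self {
        Debounce { window, last: None }
    }
}

impl Processor for Debounce {
    fn needs(&self) -> Vec<Capability> {
        Vec::new()
    }
    fn provides(&self) -> Vec<Capability> {
        Vec::new()
    }
    fn process(&mut self, event: Event) -> Vec<Event> {
        let repeat = matches!(
            &self.last,
            Some(prev) if prev.path == event.path
                && event.at.saturating_sub(prev.at) < self.window
        );
        // The window slides: suppressed events also reset the timer.
        self.last = Some(event.clone());
        if repeat {
            Vec::new()
        } else {
            vec![event]
        }
    }
}

/// Placeholder processor that would emulate recursive watching.
pub struct RecursiveEmulation;

impl Processor for RecursiveEmulation {
    fn needs(&self) -> Vec<Capability> {
        Vec::new()
    }
    fn provides(&self) -> Vec<Capability> {
        vec![Capability::RecursiveWatch]
    }
    fn process(&mut self, event: Event) -> Vec<Event> {
        vec![event]
    }
}

/// A capability-filling processor only needs to sit on the stream of a
/// backend that lacks something the processor supplies.
pub fn fills_gap(processor: &dyn Processor, backend_caps: &HashSet<Capability>) -> bool {
    processor
        .provides()
        .iter()
        .any(|cap| !backend_caps.contains(cap))
}
```

The `fills_gap` check is what lets a gap-bridging processor apply only to the backends that need it: a backend that already watches recursively skips `RecursiveEmulation`, while one that does not gets it hooked onto its stream.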

That’s how it works. At the end of the day, it’s definitely a more complicated set-up, but it delivers:

- A more cohesive and less surprising filesystem notification interface.
- Lots of potential. There are many super-exciting things to explore, which the architecture encourages rather than stifling customisation or expansion.
- Maintainability. It is more modular, and the pieces are less complex. A key driver for the design was how it compartmentalises domain knowledge and contains the effect any piece, special or banal, has on the rest.

I hope that was an interesting overview, and that you enjoy using Notify vNext!