3. More on drift correction - MouseLand/Kilosort GitHub Wiki

NOTE: This Wiki is kept here for reference for previous versions of Kilosort. For updated information about Kilosort4, please refer to kilosort.readthedocs.io instead.

Here we get into the details of drift: why it happens, what it looks like, how you can characterize it for your own recordings, and how you can tell if Kilosort2 is fixing it. The examples in this section are from acute Neuropixels 1.0 recordings in head-fixed mice.

Why drift happens

Some amount of drift is unavoidable. The brain floats inside the skull, and moves when the animal moves, which can be very fast. At slower timescales, physiological changes happen that change the tissue or move it relative to the probe. Some of these physiological changes may be induced by the presence of the probe itself, for example due to an inflammatory response. Thus, there are two main types of drift, which we treat differently: slow (10s of minutes) and fast (10s of seconds).

Slow drift is not a huge problem if the recording is short (<10 minutes). Fast drift is not a huge problem if the animal is not behaving. However, neither is true in a typical neuroscience experiment: the animal is performing a task which involves some motor actions, and it takes at least 30 minutes to characterize the neural activity. In a typical recording of 1-2 hours, the amount of slow and fast drift we'd expect is comparable, and can be anywhere from 5 to 20um and more if your preparation is unstable.

What drift looks like

To recognize drift, look for changes in spike amplitude over time, or changes in feature amplitudes, where the feature can be the projection on the principal components of a channel. More dramatically, drift may be recognized by a cluster that "drops out" because all its spikes are lost when the drift is too large. Here is are three examples tracked by Kilosort2, all from the same recording. These clusters appear to have different amounts of drift, which is primarily because they have different sizes: the unit that drifts the most can only be seen on one channel. However, the moments at which drift happens for these units are similar.

Slow drift is generally easier to recognize and fix. A well-known version of slow drift happens after probe insertion in an acute experiment, which is why it is typical to wait 20-30 minutes for the tissue to relax before recording. However, even after that initial phase, there is a smaller amplitude, slow timescale drift that continues for at least a few hours, and has been reported to us even in chronic implants. Cumulative over time, slow drift can have a significant impact on spike sorting.

Fast drift is harder to diagnose but potentially more dangerous, because it may introduce behavior-dependent biases into the data. The bias may be produced if every time an animal performs a certain motor action, the probe moves a little, in which case the spikes from a small neuron may be completely lost. It will then appear as if this neuron was inhibited by movement. Conversely, a neuron may only come into the range of the electrodes during the movement, in which case it will appear as if that neuron is activated by movement. This behavior also makes fast drift harder to diagnose, because most neurons in the brain genuinely have movement-related spiking activity.

How to diagnose drift

The main way to diagnose drift is, in fact, to run the first step of Kilosort2, which produces an interpretable picture of the similarities between any two moments in the recording, based on the shape of all detected waveforms at those times. To do this, Kilosort2 splits the data into batches of size ops.NT, and for each batch performs a full spike sorting routine. This summarizes the spikes in that batch with a small (Nchan/2) set of templates, which we then compare with the templates in every other batch. The final result of this procedure is a large Nbatches by Nbatches matrix of similarities between batches. If the similarity between batches is high, that indicates little movement. If the templates vary a lot over batches, and in a structured manner, that is likely to indicate drift. Here is an example:

The spike sorting routine for each batch is a very fast, lite clustering algorithm, where the spikes are first detected as threshold crossings in PCA space, their PCA features are then extracted and the spikes are clustered with scaled k-means. To get enough information from a single batch, there should be many spikes inside that time interval. The default time interval is 2s, which may be too short for <32 channels, in which case we recommend increasing the batch size.

Did Kilosort2 fix my drift?

Kilosort2 does not explicitly register drift, the way you might register a calcium imaging movie for example. This is because the electrical fields are severely under-sampled by the channels on the probe. In a calcium imaging recording, you might have 10 pixels/dimension of an imaged neuron, but in electrophysiology, even on a Neuropixels probe you get a range between 1 and 5 vertical pixels per neuron (median ~2), and the finer-scale features of the waveforms are often even more spatially-localized.

To illustrate our approach to drift tracking, we first consider a simple situation, where all the drift is slow. In this case, we could just initialize the templates at one end of the recording, and then sweep the recording in time, adjusting the templates to keep track of the slightly-changing waveforms of each neuron. Even if the cumulative changes are large, it is easy to track the neurons if the changes are small everywhere, and there are no abrupt jumps.

But what if fast drift happens all of a sudden? We could reduce our timescale of tracking, but that strategy is brittle: a single mistake during a very fast drift and we risk switching tracks from one neuron to an adjacent one with similar waveform but offset in the direction of the drift. This problem is further compounded for low firing rate neurons, which may only fire a few spikes during an entire fast drift period.

The solution we found was to re-order the batches of data in time, so that consecutive batches represent similar absolute levels of drift. In other words, we wanted to order batches with similar drift positions close to each other, because that would ensure there is no single timepoint where a lot of drift happens very quickly. If the batch re-ordering is successful, we can then just go back to using our slow tracking strategy of incremental small updates to a cluster's template as we progress through the batches. Because the waveform changes smoothly as a function drift position, tracking failures can be dramatically reduced after time re-ordering. Here is what a successful re-ordering of the above matrix looks like.

Time re-ordering for fast drift

To find this re-ordering, we needed to develop a travelling-salesman like algorithm. Intuitively, the problem is easy to describe: can we reorder the rows and column of the matrix above, so that blue values (high similarity) are close to the diagonal while yellow values are far from the diagonal? One of the first things we tried was to just take the top principal component of the matrix and try to sort by its weights. However, it quickly became clear that the first PC only captured the mid-range of drift values in the data, likely because the drift was too large to be well approximated by a linear transformation throughout its range. We therefore needed to model quadratic and higher-order components, which is a significantly harder problem. Confirming the need for extra components, the similarity matrices had more than one significant principal component, and looked like a "curved manifold", in which similarities at long distances could not be linearly extrapolated from similarities at short distances.

Luckily, we were working on an algorithm to solve just this kind of problem, in a different context. That algorithm is called "rastermap", and it has its own repository here: https://github.com/MouseLand/rastermap. Long story short, we developed a version of rastermap that can correctly model the warped matrix of similarities between batches. The algorithm infers a single position for each batch along a 1D continuum, such that distances between batches are nonlinear functions of the Euclidian distance along that 1d continuum. Sorted by this position value, the similarity matrix typically becomes smooth, and the fast drift falls into place, like in the case above.

Notice that we have inferred a "position" variable without every explicitly modelling the probe geometry. This is great, because it doesn't depend on the specific configuration of sites you have. But it means we do not really have guarantees that this "position" variable has anything to do with drift. In practice, we think the inferred ordering is almost always drift, because drift contributes by far the most to the changing of the waveform shapes over time. Drift is just by far the most salient, coordinated modification to the shapes of the recorded waveforms. Even without guarantees that the inferred position is the absolute drift position, the re-ordering gets to the heart of the problem: nearby batches after ordering have similar waveforms. This lays the groundwork for our tracking algorithm to move through the ordered data and track the neurons.

Based on our experience, we believe the drift similarity matrix almost always clarifies what the drift was to begin with, and whether the re-ordering succeeded. Assuming a smooth re-ordering of the batches, the final results almost always look very good. If some jumps or discontinuities remain in the sorted similarity map, like in the example below, Kilosort2 may lose tracking over the jumps. Even jumps as abrupt as this example seem to be tracked reasonably.

Drift tracking improves cluster separation

As a result of drift tracking, units are no longer split into multiple small pieces like in Kilosort1 and other algorithms. Merging back together these small pieces was the most time consuming part of the manual curation in Kilosort1, which is no longer required. On top of the automation benefit, an additional benefit of drift correction is the overall improvement in cluster separation. Why would that be the case? Imagine two neurons that have very similar waveforms but are slightly shifted along the probe. As the probe drifts up, the bottom neuron starts taking the place of the top neuron, and since they have similar waveforms, it is impossible to tell apart spikes of the bottom neuron at this time from spikes of the top neuron before the drift started. Unless, that is, the spike sorting algorithm is aware of the time in the recording when the different spikes happened. Algorithms like Kilosort2, which track units as they drift, maintain a constant separation between two such units, because the separation is only for spikes at a particular drift position. Here is an example from a real recording.

First, the amplitude timecourses of two very similar units:

Now the PCA feature projections across all times, for the two channels where these units are biggest:

Finally, here are the projections on the time-varying templates of Kilosort2, showing that the units maintain a constant high separation throughout the recording: