Tracker Technology
There are some trackers out there, though not much better than my color tracker. We obviously want a motion tracker as well, and a purely touch-based tracker. Future trackers will include ML recognition such as faces, and I hope a configurable gesture tracker. Gestures may not involve motion at first. There may also be a fiducial tracker.
Each tracker takes in a frame from the Camera and returns a list of coherently tracked objects (see the interface sketch after this list). The metadata will describe:
- which tracker
- how many tracked objects
- debugging frame with tracking points, colored areas, labels, and more.
For each tracked object:
- ID (may be different from its index; this is to correlate objects that may disappear and reappear), easiest with fiducials, but also useful for multiple faces or multiple colors
- XY position if possible; the motion tracker will infer this
- relative motion from the last spot (more natural with the motion tracker)
- angle, size, motion velocity, size velocity
- confidence level?
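
As a point of reference, here's a minimal TypeScript sketch of that per-frame result; the type and field names are assumptions, not a settled interface:

```typescript
// Sketch only: names and shapes are assumptions, not a settled API.
type TrackerKind = "color" | "motion" | "touch" | "face" | "fiducial" | "gesture";

interface TrackedObject {
  id: number;             // stable across frames, to correlate objects that disappear and reappear
  x?: number;             // XY position, if the tracker can provide it
  y?: number;
  dx?: number;            // relative motion from the last spot
  dy?: number;
  angle?: number;
  size?: number;
  velocity?: number;      // motion velocity
  sizeVelocity?: number;  // size velocity
  confidence?: number;    // 0..1, optional
}

interface TrackerResult {
  tracker: TrackerKind;      // which tracker produced this
  count: number;             // how many tracked objects
  objects: TrackedObject[];
  debugFrame?: ImageData;    // debugging frame: tracking points, colored areas, labels, etc.
}
```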
Each tracker will need specialized parameters, which can change between tracks. These will need to be sent in a message (see the parameter sketch after this list).
- motion needs number of points, closeness, and sensitivity
- the color tracker needs the list of rgb4s of tracked colors, plus possible parameters for cleaning up the image or color fuzziness
- fiducial may need to limit which fiducials to detect
- face may limit the number of faces (?)
- face might go the other way: recognize faces and assign them IDs
- gesture detection needs to produce IDs and pseudo-coordinates. There is no "motion", but there may be proportions of confidence and the ability to create a pseudo-coordinate derived from the coordinate.
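
A hedged sketch of how those per-tracker parameters could travel in one message, as a tagged union (the field names are guesses based on the list above):

```typescript
// Sketch only: field names are assumptions based on the list above.
interface MotionParams {
  tracker: "motion";
  points: number;         // number of points
  closeness: number;
  sensitivity: number;
}

interface ColorParams {
  tracker: "color";
  colors: string[];       // rgb4 codes of the tracked colors
  cleanup?: boolean;      // clean up the image first
  fuzziness?: number;     // color fuzziness
}

interface FiducialParams {
  tracker: "fiducial";
  allowedIds?: number[];  // limit which fiducials to detect
}

interface FaceParams {
  tracker: "face";
  maxFaces?: number;      // limit number of faces (?)
  recognize?: boolean;    // or go the other way: recognize faces and assign IDs
}

interface GestureParams {
  tracker: "gesture";
  configId: string;       // which trained gesture set to use
}

type TrackerParamsMessage =
  | MotionParams
  | ColorParams
  | FiducialParams
  | FaceParams
  | GestureParams;
```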
Some trackers need a configuration phase (color, gesture, face recognition) which is different from realtime processing parameters. These configurations should be maintained in a database, like the color tracker does. The database needs full CRUD support and may be hosted in the cloud (?) or just locally in the cache. Tracker configs should be shareable and importable/exportable.
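
A minimal sketch of a shared config record and its CRUD surface, assuming a generic store that could sit on the local cache or in the cloud (all names are hypothetical):

```typescript
// Sketch only: a tracker configuration record plus a minimal CRUD surface.
interface TrackerConfig {
  id: string;
  tracker: string;     // "color", "gesture", "face", ...
  name: string;        // user-visible name, so configs can be shared
  data: unknown;       // tracker-specific payload (color set, gesture maps, face IDs)
  updatedAt: number;
}

interface TrackerConfigStore {
  create(config: TrackerConfig): Promise<void>;
  read(id: string): Promise<TrackerConfig | undefined>;
  update(config: TrackerConfig): Promise<void>;
  remove(id: string): Promise<void>;
  list(): Promise<TrackerConfig[]>;
  exportConfig(id: string): Promise<string>;       // JSON, for sharing
  importConfig(json: string): Promise<TrackerConfig>;
}
```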
Gesture tracker
I want to add a gesture tracker that would actually be simpler than the usual "detect limbs, train on limb movement" sequences. The goal is to associate a gesture (or position, at first) with a sound box id (which is basically the sound file of an instrument, but may be abstracted if we have round-robining or other sound production methods). Either that, or something even more basic: "associate it with the cursor being at X,Y". So:
- clear previous training (x on screen?) and get into configure mode
- double click a spot on the screen - variant: double click a sound box.
- screen gives a countdown clock to get into position
- when time runs out, 5 seconds of images are captured
- the roughly 50-150 captured frame images are transformed in two ways:
  - color is reduced to rgb4, like the color tracker. This might also be normalized to compensate for light levels.
  - the actual geometry is warped like a fisheye, with more info in the center, making a 64x64 (say) guide image
- the set of images is turned into 64x64 = 4096 (x, y, rgb4) bitmap strings (see the sketch after this list). If there's not much variation, or it's dark, that coordinate's value in the match matrix is set to 000. That said, surrounded 0s are filled in with a median value, or not even stored.
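
A minimal sketch of that cheapening step, assuming rgb4 means 4 bits per channel (the 4096-color system mentioned later on this page) and the 64x64 coordinate is packed as x<<6 + y, as in the storage format that follows:

```typescript
// Sketch only: quantize pixels to rgb4 (4 bits per channel, 4096 colors)
// and pack a 64x64 grid coordinate as x<<6 + y.
function toRgb4(r: number, g: number, b: number): number {
  // keep the top 4 bits of each 8-bit channel
  return ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4);
}

function packCoord(x: number, y: number): number {
  return (x << 6) + y;   // 0..4095 for a 64x64 guide image
}

// Cheapen one 64x64 guide image (already fisheye-warped) into a sparse map.
// The variation check across frames and the median fill-in are omitted here;
// cells that quantize to 000 (dark) are simply not stored.
function cheapen(guide: Uint8ClampedArray /* RGBA, 64*64*4 */): Map<number, number> {
  const out = new Map<number, number>();
  for (let y = 0; y < 64; y++) {
    for (let x = 0; x < 64; x++) {
      const i = (y * 64 + x) * 4;
      const code = toRgb4(guide[i], guide[i + 1], guide[i + 2]);
      if (code !== 0x000) out.set(packCoord(x, y), code);
    }
  }
  return out;
}
```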
The storage format could be:

  x,
  y,
  array: [
    64x64 coord (might be x<<6 + y), rgb4 bitmap [512 bytes],
    64x64 coord, rgb4 bitmap
  ]
or the color ranges may be small enough that simple rgb4-style code strings would do.
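
Read as TypeScript (a sketch; the names are assumptions), that first format might look like this, with the 512-byte bitmap marking which of the 4096 rgb4 codes are acceptable at a cell:

```typescript
// Sketch only: one trained entry maps a cursor coordinate (or sound box)
// to the per-cell color constraints learned from the captured frames.
interface GestureCell {
  coord: number;        // packed 64x64 coordinate, e.g. x<<6 + y
  colors: Uint8Array;   // 512-byte bitmap: one bit per rgb4 code (4096 bits)
}

interface GestureEntry {
  x: number;            // target cursor X
  y: number;            // target cursor Y
  cells: GestureCell[]; // only cells that carry useful information
}

// The whole trained set is an array of entries, saved under a name
// like the color sets.
interface GestureConfig {
  name: string;
  entries: GestureEntry[];
}
```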
Repeat to train on other coordinates. Thus there's an array of trained map -> coordinate pairs. Now, can this be indexed in a meaningful way? Possibly.
Triple click or a checkmark gesture ends training; this can be saved (with a name, though) like the color sets.
Real training would train on false images as well: no coordinate here, with high confidence!
At run time, the image needs to undergo the same cheapening process and then be run through the list of patterns to find the best match (a sketch follows). The trick is to do the pattern matching efficiently. Fuzziness is built into the cheapening and the model ranges, so the match itself can demand an exact hit. Poor matches will not move the cursor; that way you can move from one gesture to another. Confidence is based on matches vs. set size. Another trick would be to average the results based on confidence levels. That's something like Wekinator.
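
A hedged sketch of that run-time loop using the GestureConfig types sketched above: cheapen the live frame, count exact per-cell hits, and treat hits / set size as the confidence. Averaging the top few entries weighted by confidence, Wekinator-style, would be a small change to the last step.

```typescript
// Sketch only: match a cheapened live frame against the trained entries.
// `frame` maps packed 64x64 coords to rgb4 codes (output of cheapen()).
function bestMatch(
  frame: Map<number, number>,
  config: GestureConfig,
  minConfidence = 0.6   // below this, don't move the cursor
): { x: number; y: number; confidence: number } | null {
  let best: { x: number; y: number; confidence: number } | null = null;

  for (const entry of config.entries) {
    let hits = 0;
    for (const cell of entry.cells) {
      const code = frame.get(cell.coord);
      if (code === undefined) continue;
      // exact hit: the cell's bitmap has the bit for this rgb4 code set
      if (cell.colors[code >> 3] & (1 << (code & 7))) hits++;
    }
    const confidence = entry.cells.length ? hits / entry.cells.length : 0;
    if (!best || confidence > best.confidence) {
      best = { x: entry.x, y: entry.y, confidence };
    }
  }
  if (best && best.confidence >= minConfidence) return best;
  return null;
}
```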
Adaptive Color Tracker
This is a color tracker that may still use the rgb4 4096-color system, but analyzes the scene as a histogram and chooses the tracking color automatically, OR presents a choice, rather than the sometimes clumsy "pick a set of colors off the screen" method in use now. The histogram would replace the video image, which should have the tracking object in it. It would actually combine histograms from several seconds of video, because lighting on the tracking object is so inconsistent. The histogram will probably be 2D in pseudo-HSV space even though rgb4 is in RGB space. Selecting a peak (or blob) would automatically propagate the color set to the related colors in that peak. It might even have color names to click on that would seek the peak nearest that color and take the color set from it.
This may need a prototype to see if this makes sense. The color set could be in a kind of HSV4 mapping if that makes sense, although it didn't work too well last I tried it.
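
A minimal sketch of that histogram, assuming rgb4 codes are converted to hue/value buckets (the bin counts and conversion details are my guesses, not anything decided):

```typescript
// Sketch only: accumulate a 2D hue/value histogram from rgb4 codes over
// several seconds of frames, then pick the strongest peak.
const HUE_BINS = 32;
const VAL_BINS = 16;

function rgb4ToHueVal(code: number): { hue: number; val: number } {
  const r = (code >> 8) & 0xf, g = (code >> 4) & 0xf, b = code & 0xf;
  const max = Math.max(r, g, b), min = Math.min(r, g, b);
  let hue = 0;
  if (max !== min) {
    if (max === r) hue = ((g - b) / (max - min) + 6) % 6;
    else if (max === g) hue = (b - r) / (max - min) + 2;
    else hue = (r - g) / (max - min) + 4;
  }
  return { hue: hue / 6, val: max / 15 };   // both normalized to 0..1
}

// hist is a Float32Array of length HUE_BINS * VAL_BINS, shared across frames.
function accumulate(hist: Float32Array, codes: Iterable<number>): void {
  for (const code of codes) {
    const { hue, val } = rgb4ToHueVal(code);
    const h = Math.min(HUE_BINS - 1, Math.floor(hue * HUE_BINS));
    const v = Math.min(VAL_BINS - 1, Math.floor(val * VAL_BINS));
    hist[h * VAL_BINS + v]++;
  }
}

function strongestPeak(hist: Float32Array): { hueBin: number; valBin: number } {
  let bestIdx = 0;
  for (let i = 1; i < hist.length; i++) if (hist[i] > hist[bestIdx]) bestIdx = i;
  return { hueBin: Math.floor(bestIdx / VAL_BINS), valBin: bestIdx % VAL_BINS };
}
```

Selecting a peak would then mean collecting every rgb4 code whose hue/value lands in (or near) that bin and using those as the tracked color set.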
Technology links:
- Tensorflow in JS
- Optical Flow in JS
- Face tracking JS
- Another face tracker JS
- JS image tools
- Brain.js
- ML Web's Javascript
- Here is a hand tracking ML tracker:
- Visage Technologies face tracking; they also have a gaze tracker (multi-platform) but $$$