
========================================================================

Tracking

========================================================================

  1. Challenges with optical flow:

    1. Hard to compute in some regions (e.g. low-texture areas with little gradient information).
    2. The object might be moving fast, so we need a dynamics model to make predictions.
    3. Occlusion.
    4. Errors accumulate, so the track may drift.
  2. General Method

    1. Shi-Tomasi Tracking: "track the good features" and find their displacements (good for small displacements). TODO.
    2. Use RANSAC to find points globally that might be good candidates, then use SIFT descriptors to tell whether a point is a good match.
    3. Tracking with a dynamics model. Use the model to predict where the object might be in the next frame. That restricts the search and reduces noise, because trajectories are smooth. A minimal sketch follows this list.
      1. Get an estimate of the velocity and acceleration.
      2. Predict.
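
A minimal sketch of the dynamics-based prediction step (my own illustration; the state layout `(x, y, vx, vy)` and `dt` are assumptions, not from these notes):

```python
import numpy as np

def predict_constant_velocity(state, dt=1.0):
    """Constant-velocity prediction: shift the position by velocity * dt."""
    x, y, vx, vy = state
    return np.array([x + vx * dt, y + vy * dt, vx, vy])

# The predicted position restricts where we search for the target in the next frame.
print(predict_constant_velocity(np.array([100.0, 50.0, 3.0, -1.0])))  # [103. 49. 3. -1.]
```
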
  3. Tracking as inference (based on evidence and reasoning), Bayes Filter

    • hidden state x_t, observation (measurement): y_t (later written z_t).

    • old belief (prior), new belief (posterior)

    • goal: estimate the probability distribution of x_t

    • Assumptions: Markov assumptions on dynamics and observations (as in an HMM): only the immediate past matters

    • Prediction

    • Correction

      Model for all Bayes Filters
    • What each parameter is. Note the existence of the normalization scalar.

    • Kalman Filter:

      1. The first step, the deterministic shift, is the prediction of the mean.
      2. The second step, stochastic diffusion, spreads the distribution to account for the noise involved in the prediction.
      3. The third step is the blending of the observation and the prediction. The resulting variance should be smaller.

      Probability Density in Prediction and Correction
      1. Pros: really simple model.
      2. Cons: the model is a unimodal Gaussian, which is often too simple. The EKF partially addresses the nonlinearity problem by linearizing around the current estimate. A minimal 1D sketch follows.
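
A minimal 1D Kalman filter sketch of the shift / diffusion / blending steps above (my own illustration; the noise variances are made-up example values):

```python
def kalman_1d(mu, var, u, z, motion_var=1.0, sensor_var=1.0):
    """One predict + correct cycle for a scalar state."""
    # 1. Deterministic shift: prediction of the mean.
    mu_pred = mu + u
    # 2. Stochastic diffusion: the variance grows by the motion noise.
    var_pred = var + motion_var
    # 3. Blending of observation and prediction; the resulting variance shrinks.
    k = var_pred / (var_pred + sensor_var)        # Kalman gain
    return mu_pred + k * (z - mu_pred), (1 - k) * var_pred

mu, var = 0.0, 100.0                              # very uncertain initial belief
mu, var = kalman_1d(mu, var, u=1.0, z=1.2)
print(mu, var)                                    # mean pulled toward z, variance much smaller
```
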
  4. Particle Filter

Particle Filter can work with multi-modal distributions.

  1. Basic Idea: weighted particles represent the distribution. Note that observations are now written z_t.

    Weighted Particles can Represent The Distribution
    • The particle filter is a type of Bayes filter; its computational load is much higher than that of the Kalman filter.
    • A [good article for tracking and its implementation](https://blog.csdn.net/jinshengtao/article/details/30970733); the original filter paper is [here](https://www.cs.mcgill.ca/~fkaeli/publications/particle_filter.pdf).
  2. Extension of the Bayes Filter framework: a control input is added

    • Given:

      1. Previous state
      2. Action model P(x_t | u_t, x_t-1), a.k.a. the "perturbation"
      3. Sensor Model: the likelihood of z given x, i.e. the distribution p(z|x)
      4. Series of actions and observations (z_t, u_t ...)
    • Wanted:

      1. Current State Estimate
      2. The Belief (or posterior) bel(x_t) = P(x_t | u_1, z_1, ..., u_t, z_t)
    • Goal: we want to get the belief.

      1. The assumptions are: x_t depends only on x_t-1 and u_t, and z_t depends only on x_t.

      2. Simply put, it's an application of Bayes' rule. Note that P(z) is a normalization factor, because the numerator is P(Z=z, X), which yields P(Z=z) after summing over X.

      3. Writing the formulation out further, we see that the belief can be computed recursively (see the recursion written out below).
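
Written out, the standard Bayes filter recursion in this notation is:

```
bel^-(x_t) = ∫ P(x_t | u_t, x_t-1) * bel(x_t-1) dx_t-1     (prediction)
bel(x_t)   = η * P(z_t | x_t) * bel^-(x_t)                  (correction; η is the normalization scalar)
```
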

  3. Example: a blind robot pokes a stick out to see if there's a hole in the wall. Sometimes, the stick might be poked at a slight angle (noise).

    1. Initially, there's an even distribution of particles along the s axis.

    2. It has a sensor model, P(Z="hole" | x). Because there's a slight error in angle, we see small hills along s.

    3. To get started, multiply P(s) * P(O|s) to get a new distribution for the belief. Note that P(s) is the previous belief. The weighted particles represent the likelihood of each location being where the robot is. After this we need to normalize the weights so the total probability sums to 1.

    4. For the second iteration, before the robot moves, we resample the particles.
      • It's OK to resample the same particles.
      • All resampled particles are assigned equal weight.

    Resampling
    5. Then we compute the prediction prior bel^-(x_t) using the control input and some preset noise. Peaks in the resulting particle distribution are where we predict the robot will be.

    6. After we take in the observation, we multiply by the sensor model for the given z_t; the peak then represents the posterior, or belief, of where we think the robot is. This is like asking: where do you predict we are, and does that prediction correspond to where the observation we see is most likely? We use world coordinates here.

    Multiplication of Sensor Model and Prediction Prior, to get belief
    7. Then we can repeat the same procedure, steps 3-6, for the third iteration.

    Resampling for the next iteration
4. Algorithm
    <p align="center">
    <img src="https://user-images.githubusercontent.com/39393023/123560959-2bb9c700-d76b-11eb-9300-30799495a703.png" height="400" width="width"/>
    <figcaption align="center">One iteration - Algorithm is a simplified version of the example</figcaption>
    </p>
  
    - The states are ```<x_t, w_t(weight)>``` 
    - Step 3 is sampling from the previous belief set ```S_t-1```. **How do we sample? See the resampling discussion under "Tracking For Real" below.**  
    - Step 4 is to apply the control model: for example, the motion model plus Gaussian noise. That is, for every ```<x_t, w_t>```, do a deterministic shift + noise. 
    - Then apply the control, and get an observation.
    - Step 5 is: for the current ```<x_t, w_t>```, what's the probability of observing the given ```z_t```? Note that we use this weight directly as the belief, since we **simplify the weight of each resampled point to 1/n**. Resampling is a "computational trick" that goes well with the states: the states may duplicate, but with resampling their distribution is loosely preserved. 

    - Step 6 is to calculate that ```eta```, the scalar. 
    - Step 7 is to insert the point into the set. 
    - Finally, we normalize the weights. A minimal sketch of one iteration is below.
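
A minimal sketch of one iteration of the algorithm above (my own illustration; `motion_model` and `measurement_likelihood` are placeholder functions, not from these notes):

```python
import numpy as np

def pf_iteration(particles, weights, u, z, motion_model, measurement_likelihood):
    n = len(particles)
    # Step 3: resample n particles from the previous belief, proportional to weight.
    particles = particles[np.random.choice(n, size=n, p=weights)]
    # Step 4: apply the control model: deterministic shift + noise (inside motion_model).
    particles = np.array([motion_model(x, u) for x in particles])
    # Step 5: weight each resampled particle (weight 1/n) by the observation likelihood.
    weights = np.array([measurement_likelihood(z, x) for x in particles])
    # Steps 6-7: eta is 1 / sum of weights; normalize so the weights form a distribution.
    return particles, weights / np.sum(weights)
```
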

5. Implementation
    - Sensor Model: take measurements with the laser sensor; you may get a max-range value plus a distribution around the true range, maybe Gaussian. 
    - State: ```x_t = (x,y,theta)```
    - For a robot: 
        1. Initially there're particles everywhere. 
        <p align="center">
        <img src="https://user-images.githubusercontent.com/39393023/123562219-f1542800-d772-11eb-9439-8b0bc30b4d54.png" height="300" width="width"/>
        </p>
        2. As the robot moves around, its localization converges, but there might be multiple blobs too, because this environment is somewhat symmetric. 
        <p align="center">
        <img src="https://user-images.githubusercontent.com/39393023/123562222-f2855500-d772-11eb-9dfe-a641fb77f3bb.png" height="300" width="width"/>
        </p>
    - Optimization: 
        1. You can make the smallest weight 1, and scale all weights accordingly. Then for each particle, we can calculate **how many times it should be resampled**

6. Pros and cons: 
    1. Pros:
        - simple way to do multi-hypothesis tracking. 


7. Cool little example: a robot can read brightness of lights to localize itself, given the locations of the lights. 
    - The sensor model gives brightness as a function of distance to the light.  
    - This is vision + odometry; vision can improve the overall quality!
    1. Particle filter is really:
    - Where do you guess we are? Take a look, and calculate the probabilities of all those points.
    - Next iteration: keep the most likely points, take an action, guess where we are, then look around and calculate each candidate's probability...
  8. Particle Filter problems:
    1. We have only so many particles, so we may not cover every region. Particle impoverishment is when we miss a narrow high-likelihood region:

    2. One reason is that we keep placing particles far from the most likely area, so none land in that narrow region.

Tracking For Real

  1. Resampling matters: how do you resample?

    • A roulette wheel can be used: you search for the bin the spoke ends up in using binary search over the cumulative weights.

    You can generate samples one by one, but with n samples to sample from and n samples to generate, that's O(n log n)!
    • So a better method was invented: "stochastic universal sampling" (see the sketch below).

    Now you can just traverse through all the values once and count how many spokes have gone past each one.
    • Algorithm

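A minimal sketch of stochastic universal sampling (my own illustration; one pass over evenly spaced spokes instead of n binary searches):

```python
import numpy as np

def stochastic_universal_sampling(weights, n):
    """Place n evenly spaced spokes on the cumulative weight wheel; one pass, O(n)."""
    cumulative = np.cumsum(weights) / np.sum(weights)
    start = np.random.uniform(0.0, 1.0 / n)   # single random offset
    spokes = start + np.arange(n) / n
    indices, i = [], 0
    for s in spokes:
        while cumulative[i] < s:              # advance until the spoke falls in bin i
            i += 1
        indices.append(i)
    return np.array(indices)

weights = np.array([0.1, 0.6, 0.1, 0.2])
print(stochastic_universal_sampling(weights, 8))  # heavy particle 1 is picked most often
```
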
  2. Practical Considerations

    1. If weights are already uniform, we don't have to resample.
    2. A huge spike in the weight distribution means your observation model has a sharp peak, and the other weights are almost 0. Add some noise to the model, and make sure the initial proposal is evenly distributed, so there are no initial peaks.
      • Think of **a particle as a Gaussian on its own**.
      • **Overestimating noise is better than underestimating it.**
    3. Failure recovery: if in the next iteration the measurements suddenly jump to somewhere with no particles, you'll get zero weights everywhere.
      • The standard thing to do is to redistribute some particles evenly, or randomly.
    4. When the robot doesn't move, you should suspend resampling, because over time, purely due to randomness, only some samples will survive.
      • When weights are uniform, their variance is 0, and no resampling should be performed.
      • When weights are peaked, their variance is higher, so resample (one common criterion is sketched below).
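
One common way to formalize "the weights are peaked enough to resample" (an assumption; these notes only state the idea) is the effective sample size N_eff = 1 / sum(w_i^2): resample only when N_eff drops below some fraction of n.

```python
import numpy as np

def should_resample(weights, threshold_ratio=0.5):
    """Resample only when the effective sample size falls below threshold_ratio * n."""
    w = np.asarray(weights) / np.sum(weights)
    n_eff = 1.0 / np.sum(w ** 2)      # uniform weights -> n_eff == n (don't resample)
    return n_eff < threshold_ratio * len(w)

print(should_resample([0.25, 0.25, 0.25, 0.25]))  # False: weights uniform
print(should_resample([0.97, 0.01, 0.01, 0.01]))  # True: one big spike
```
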
  3. Tracking for real (Particle Filter Implementation https://answers.opencv.org/question/6985/syntax-for-particle-filter-in-opencv-243/)

    1. How to represent contours?
      1. What's the contour for tracking? (CONDENSATION paper, Isard & Blake 1998)
        • Contours are hand-initialized first.
        • A contour is parameterized as an affine deformation with 6 parameters (a 3-point to 3-point mapping): rotation, two scales, translation (2), and shear. So each particle is a 6-D vector.
    2. Action Model?
      • dynamics model: measured from the moving head?
    3. Sensor Model?
      • edge detector + PCA.
      • 12 states in total.
  4. Other Considerations: Head tracking ("Head tracking with contour models", 2002)

    • When there's occlusion, the speed is no longer being tracked, because you can't say where the object is. So over time the particles will disperse.
    • mean-shift filtering.
    • Simplest sensor model possible: mean squared difference of intensities. Or normalized correlation.
  5. Easy-to-implement example: An Adaptive Color-Based Particle Filter

    • A nice explanation in Mandarin

      1. The distribution is the "weighted" number of pixels that fall into each RGB range. Each of R, G, B has 8 bins.
      2. K is a "kernel function", i.e. the weight of each pixel in the distribution: the further from the center of the box, the smaller the weight. Pixels outside the circumscribed circle of the box are assigned 0.
      3. The Bhattacharyya distance between the ROI's histogram and the histogram of each particle's corresponding region gives the likelihood (see the sketch after this list).

    • Steps
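
A minimal sketch of the kernel-weighted 8x8x8 color histogram and Bhattacharyya-based likelihood described above (my own illustration; the bin count, the kernel k = 1 - r^2, and the exp((rho - 1)/sigma) likelihood follow these notes, while sigma = 0.1 and the helper names are assumptions):

```python
import numpy as np

def weighted_color_histogram(patch, bins=8):
    """Kernel-weighted RGB histogram of an image patch (H x W x 3, uint8)."""
    h, w, _ = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    b = np.sqrt(cx ** 2 + cy ** 2)                      # radius of the circumscribed circle
    hist = np.zeros((bins, bins, bins))
    for y in range(h):
        for x in range(w):
            r = np.hypot(y - cy, x - cx) / b
            k = 1.0 - r ** 2 if r < 1.0 else 0.0         # kernel weight, 0 outside the circle
            idx = tuple(patch[y, x] // (256 // bins))    # which 8x8x8 bin the pixel falls in
            hist[idx] += k
    return hist / hist.sum()

def bhattacharyya_coefficient(p, q):
    return np.sum(np.sqrt(p * q))                        # 1 means identical distributions

def particle_likelihood(target_hist, candidate_hist, sigma=0.1):
    """Likelihood of a particle from the Bhattacharyya coefficient, as in the notes."""
    rho = bhattacharyya_coefficient(target_hist, candidate_hist)
    return np.exp((rho - 1.0) / sigma)
```
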

Mean Shift

  1. Mean Shift algorithm: find the mode of a distribution.

    Finding the mode of the distribution
    • In the region of interest, is the center of mass the same as the window center? If not, move the window to the center of mass and check again. Repeat this process.

    • Most of the time it will converge
    • Tracking: you need the initial position, the location of the object, and the size of the region. A simple mean shift will iteratively do this.
    • Steps:
      1. choose your target
      2. choose your feature space; here the feature space is the color distribution of the region (usually in RGB)
      3. normalize the histogram
      4. in the next iteration, choose a candidate region, then get its histogram and normalize it
  2. Meanshift Algorithm theory

    1. Given a set of data points, convolve them with a kernel function H to generate a smooth function f(x). This is equivalent to superposing one kernel at each data point, and is called Kernel Density Estimation (KDE).

    Mean shift is the gradient ascent of a kernel smoothed filter
    2. KDE setup

    KDE set up
    • The bandwidth is h. If h is too small, the smoothed function barely changes (stays spiky); if h is too large, the image will be all blurred!
    3. Mean shift: the mean is the center of mass, if we consider each kernel value to be a weight.

    Mean shift is gradient ascent on a kernel-smoothed image
    4. Proof

    5. Use in image processing: choose K(r) = exp(-||r||^2/2), a Gaussian kernel. A minimal sketch is below.
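
A minimal sketch of mean shift on a set of 2-D points with the Gaussian kernel K(r) = exp(-||r||^2/2), scaled by a bandwidth h (my own illustration; the example data are made up):

```python
import numpy as np

def mean_shift(points, start, h=1.0, n_iters=50, tol=1e-5):
    x = np.array(start, dtype=float)
    for _ in range(n_iters):
        d2 = np.sum((points - x) ** 2, axis=1) / h ** 2
        w = np.exp(-d2 / 2.0)                                 # kernel weight of each point
        new_x = (w[:, None] * points).sum(axis=0) / w.sum()   # weighted center of mass
        if np.linalg.norm(new_x - x) < tol:                   # converged to a mode
            break
        x = new_x
    return x

# Two clusters; starting near the right one, mean shift climbs to its mode.
pts = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + [6, 0]])
print(mean_shift(pts, start=[5.0, 0.5]))                      # close to (6, 0)
```
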
  3. What kind of values do we get?

    • For a given ROI centered at a point, we calculate the color distribution, which gives the "likelihood" of each color in that region. Then we compute the Bhattacharyya coefficient between this distribution and that of the target. Those are the "values" we do mean shift on.
    • So the goal is to find the pixel with the maximum Bhattacharyya coefficient.
  4. A particle filter using the color distribution as the sensor model is still better than mean shift, since if a better-matching region comes up, the particle distribution will catch up to it.

    • But mean shift might be confused if a nearby region has a similar color distribution.
  5. Some practical questions:

    1. How do you initialize? Manually, with an object detector (which only fires when there's a clear view of something), or with background subtraction.
    2. How do you get the dynamics model? Learn it from real data (hard), or derive it.
    3. Does the tracker get stuck? Maybe. Adaptive trackers may get stuck: when the tracker is not confident, it stays right where it is.
  6. Multi-hypothesis tracking: the data association problem, i.e. which measurements go to which track. A particle is a hypothesis of the state.

Rico's Color Based Particle Filter Tracker

  1. Particle Filter (CUDA and CPU):

    • Interface:
      1. Initial Distribution (Prediction)
      2. resampling
      3. update control (shift states)
      4. calc_observation_likelihood: takes time to get observation.
      5. send belief
  2. Program:

    • Initialization:
      1. Select ROI: Select a rectangular region, (D)
      2. State (x, y, vx, vy, hx, hy, at_dot)
        - vx is effectively the delta x to the next frame
        - vx is 0 initially
        - at_dot is the scale change rate
      3. Count the ROI weighted histogram
        - need the ROI center and side lengths
        - calculate the ROI
      4. Initial control (D): assume the velocity to be 0? Yes.
    • Apply Control: initial control (D)
      1. State Transition: add gaussian noise
      2. Notes:
        • without correction, this will simply be the addition of random noise.
    • Update observation
      1. wait for a new frame
      2. Weighted histogram count; each pixel's BGR value is binned into an 8x8x8 histogram
        • find adjusted region? see code? TODO
        • Cases:
          1. x0, w0 is outside of the image (safety)
          2. We just count in a box. If the box is outside the boundary, crop it.
          • k value: b = sqrt(hx^2 + hy^2), r = dist(pixel, center) / b, k = 1 - r^2 if r < 1, else 0; finally, f = sum(k)
        • normalize it at the end.
      3. Bhattacharyya distance calculation: C++ implementation
        • the Bhattacharyya coefficient rho is then piped into exp((rho - 1)/sigma)
    • Update State (Aug 3)
      1. callback to python
      2. Visualize the state.
    • Python functions (D)
      • visualize particles
      • visualize boxes
      • Python Interface (D) -initialize(ROI, frame), pybind
        • state = run_one_iteration(frame)
  3. Post program (Aug 3)

    • add a camera check
    • make an illustration of this framework.

Actual Notes

  1. Future work

    • find a way to represent the weight better: 0 when the Bhattacharyya coefficient is 0, and larger weights for better matches.
  2. Particle filter may be confused by certain parts of the background, if they're similar to the targets.

    • One solution is background subtraction.
  3. If a target is lost for a long while, and there is not enough randomness introduced by control updates, then since all states have had very low weights and are constantly resampled, it is likely that most states will get filtered out and only a few states are left. In that case, the particle filter is stuck in "local minima".

    • Resetting states randomly when target is lost is very helpful, but it needs to be triggered at the right time. Otherwise we might never/always reset states!

Notes - To Delete

  1. We need to initialize with the states.
  2. The Bhattacharyya coefficient is zero?? It's totally legit to have 0 for some bins, and we're calculating the sum of products here.
  3. Add visualization, cv uses BGR
  4. sometimes it gets stuck at one place. Maybe need the pi_threashold
    • need to tell when we lose track (to think about particle filter's restrictions)
    • Also, for each particle, if the guess is outside of the valid range, we should set its histogram to all zero
    • TODO: if max_weight is too low, generate random states at each particle and reset weight to 1/n.
  5. background learning rate? for merging