Optic_Flow - RicoJia/notes GitHub Wiki

========================================================================

Motion Estimation Across Multiple Frames

========================================================================

  1. Feature Based

    • When motion is large and features are easy to track
  2. Direct, Dense Methods: on the pixel level, for smooth motions.

    • Motion Field: the projection of motion in the physical world onto the image
    • Optic Flow: find pixel correspondences. Assumption 1: same color (brightness constancy). Assumption 2: nearby pixels move similarly
  3. A common practice now is to use features for long-range motion and dense (gradient-based) methods for short-term motion estimation

========================================================================

Optic Flow:

========================================================================

  1. Brightness Constancy: Assumption 3: small motion, less than a pixel. A pixel keeps its brightness as it moves, so I(x+u, y+v, t+1) = I(x, y, t), and the gradient at corresponding pixels is the same, Ix(t+1) = Ix(t). A first-order Taylor expansion then gives the constraint Ix*u + Iy*v + It = 0
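A minimal numerical check of that linearized constraint; the synthetic pattern and motion values below are illustrative, not from the notes:

```python
import numpy as np

def make_frame(shift_x, shift_y, size=64):
    """Smooth intensity pattern, translated by (shift_x, shift_y)."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    return np.sin(0.2 * (x - shift_x)) + np.cos(0.15 * (y - shift_y))

u_true, v_true = 0.3, -0.2            # sub-pixel motion (Assumption 3)
I0 = make_frame(0.0, 0.0)
I1 = make_frame(u_true, v_true)       # I1(x, y) = I0(x - u, y - v)

# Central-difference spatial gradients and the temporal difference.
Iy, Ix = np.gradient(I0)
It = I1 - I0

# The residual of Ix*u + Iy*v + It should be ~0 for the true motion.
residual = Ix * u_true + Iy * v_true + It
print(np.abs(residual[5:-5, 5:-5]).max())   # tiny away from the borders
```

The residual only vanishes to first order; larger motions make the dropped Taylor terms dominate, which is exactly the failure mode discussed further down.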

  2. Aperture Problem: from a single pixel's constraint we can only determine the flow component along the image gradient direction; the component along the edge is unobservable

  3. One way to solve this is to minimize a global error term with a smoothness penalty on (u, v). This is called "smooth optic flow" (the Horn-Schunck approach)
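The error-minimization idea can be sketched with the classic Horn-Schunck iteration; the smoothness weight alpha, iteration count, and test pattern here are illustrative assumptions:

```python
import numpy as np

def horn_schunck(I0, I1, alpha=1.0, iters=300):
    """Jacobi-style iterations minimizing data error + alpha^2 * smoothness."""
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    u = np.zeros_like(I0)
    v = np.zeros_like(I0)
    for _ in range(iters):
        # 4-neighbor averages (wrap-around boundaries via np.roll)
        ub = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                     + np.roll(u, 1, 1) + np.roll(u, -1, 1))
        vb = 0.25 * (np.roll(v, 1, 0) + np.roll(v, -1, 0)
                     + np.roll(v, 1, 1) + np.roll(v, -1, 1))
        # Closed-form update pulling the averaged flow onto the constraint line
        t = (Ix * ub + Iy * vb + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = ub - Ix * t
        v = vb - Iy * t
    return u, v

y, x = np.mgrid[0:64, 0:64].astype(float)
I0 = np.sin(0.3 * x) * np.cos(0.25 * y)
I1 = np.sin(0.3 * (x - 0.3)) * np.cos(0.25 * (y - 0.2))  # shift (0.3, 0.2)
u, v = horn_schunck(I0, I1)
print(u[20:44, 20:44].mean(), v[20:44, 20:44].mean())    # near 0.3 and 0.2
```

The smoothness term fills in the unobservable along-edge component from the neighbors, which is how this formulation sidesteps the aperture problem.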

  4. Another way is Lucas-Kanade.

    1. It enforces a local constraint by assuming the motion (u,v) is constant within a window. A 5x5 window gives 25 equations in the 2 unknowns; solve them with ordinary least squares.

    2. In u-v space, the brightness constraint at one pixel is a line. More pixels give more lines, and least squares finds their (approximate) intersection!

    3. A point with good optic flow is usually a Harris corner, because both rely on the same second-moment matrix. For the system to be solvable, we don't want its condition number to be too large (edge-like, aperture problem) or both eigenvalues to be too small (flat region, dominated by noise)
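The windowed least-squares step above might look like this sketch; the window size, synthetic pair, and condition-number threshold are illustrative choices:

```python
import numpy as np

def lk_step(I0, I1, cx, cy, win=2):
    """One Lucas-Kanade estimate at (cx, cy) from a (2*win+1)^2 window."""
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    sl = np.s_[cy - win:cy + win + 1, cx - win:cx + win + 1]
    # Stack the 25 constraints Ix*u + Iy*v = -It into A [u v]^T = -b.
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)   # 25 x 2
    b = It[sl].ravel()
    ATA = A.T @ A                     # the Harris-style second-moment matrix
    if np.linalg.cond(ATA) > 1e4:
        raise ValueError("aperture problem: window is too edge-like")
    u, v = np.linalg.solve(ATA, -A.T @ b)
    return u, v

# Synthetic pair shifted by a known sub-pixel motion (u, v) = (0.4, -0.3).
y, x = np.mgrid[0:64, 0:64].astype(float)
I0 = np.sin(0.3 * x) * np.cos(0.25 * y)
I1 = np.sin(0.3 * (x - 0.4)) * np.cos(0.25 * (y + 0.3))
u, v = lk_step(I0, I1, 32, 32)
print(u, v)   # roughly 0.4, -0.3
```

The condition-number check is the corner-quality test from item 3 made explicit: it rejects windows where only the normal-flow component is observable.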

  5. Problems with Lucas-Kanade:

    1. Brightness constancy doesn't hold: match more robust descriptors (e.g., SIFT) instead
    2. A point doesn't move like its neighbors (violates the LK window assumption) - motion segmentation
    3. Motion is large! (more than one pixel)
      1. The Taylor expansion no longer holds, so a bigger step is needed. Run Lucas-Kanade iteratively: estimate, warp, and re-estimate to find the best (u,v)

      2. Large motion can also lead to local minima. Reducing the resolution makes the motion less than a pixel again

        • So we have "hierarchical LK", based on the Gaussian pyramid: run iterative LK on the small-scale image, then propagate the estimate to the larger one

    4. Too many points to compute! Use sparse LK: calculate (u,v) only at selected feature locations. This is what cv2.calcOpticalFlowPyrLK() does
    5. One inherent contradiction: neighboring pixels' windows overlap, yet each pixel is assumed to have its own constant (u,v)
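A toy coarse-to-fine version of items 3-5, assuming one global translation and an integer np.roll warp; real pyramidal LK (e.g. cv2.calcOpticalFlowPyrLK) does sub-pixel warping per feature:

```python
import numpy as np

def lk_global(I0, I1):
    """One least-squares (u, v) over the whole image (a single LK step)."""
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    return np.linalg.solve(A.T @ A, -A.T @ It.ravel())

def reduce2(I):
    """Halve resolution by 2x2 averaging (stand-in for Gaussian REDUCE)."""
    return I.reshape(I.shape[0] // 2, 2, I.shape[1] // 2, 2).mean(axis=(1, 3))

def hierarchical_lk(I0, I1, levels=3, iters=3):
    pyr = [(I0, I1)]
    for _ in range(levels - 1):
        pyr.append((reduce2(pyr[-1][0]), reduce2(pyr[-1][1])))
    uv = np.zeros(2)
    for J0, J1 in reversed(pyr):           # coarsest -> finest
        uv *= 2.0                          # flow doubles at each finer level
        for _ in range(iters):             # iterative LK at this level
            r = np.round(uv).astype(int)   # toy warp: undo integer estimate
            J1w = np.roll(J1, shift=(-r[1], -r[0]), axis=(0, 1))
            uv = r + lk_global(J0, J1w)
    return uv

def blob(cx, cy, size=128, sigma=10.0):
    y, x = np.mgrid[0:size, 0:size].astype(float)
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

I0, I1 = blob(64, 64), blob(70, 68)        # 6 px right, 4 px down
uv = hierarchical_lk(I0, I1)
print(uv)   # roughly [6, 4]
```

At the coarsest level the 6-pixel motion shrinks to 1.5 pixels, so the Taylor expansion holds again; each finer level then only has to correct a sub-pixel residual.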
  6. Image Pyramid: developed in 1983 (Burt & Adelson); also includes "Laplacian images"

    1. Separable filters can be applied horizontally, then vertically. REDUCE: apply the 5-tap filter [1,4,6,4,1]/16 on 5 pixels, then move the filter 2 pixels to the right (blur, then downsample by 2):

    2. REDUCE is easy. EXPAND: use a 3-tap filter [1/8, 3/4, 1/8] on coarse pixels to generate the fine pixels aligned with coarse samples, and a 2-tap [1/2, 1/2] on neighboring coarse pixels to generate the in-between fine pixels (they have no coarse correspondence, so it's pure interpolation)

    3. Apple-orange blending: the 1983 ACM Transactions cover. 1. Build the Laplacian pyramid of each image. 2. Take half the apple and half the orange at every level. 3. Collapse the combined pyramid. The halves blend over a much wider region at low frequencies than at high frequencies (if you just put the two halves together, the sharp seam itself is high frequency.) So reduce-expand differencing in a Laplacian pyramid is a band-pass operation!
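A 1D sketch of REDUCE/EXPAND with the filters quoted above; the boundary handling (zero padding, wrap-around) and the test signal are simplifications:

```python
import numpy as np

def reduce_1d(f):
    """REDUCE: blur with [1,4,6,4,1]/16, then keep every other sample."""
    k = np.array([1, 4, 6, 4, 1]) / 16.0
    g = np.convolve(f, k, mode="same")
    return g[::2]

def expand_1d(c):
    """EXPAND back to the fine grid with the 3-tap and 2-tap filters."""
    f = np.zeros(2 * len(c))
    # Fine samples aligned with a coarse sample: 3-tap [1/8, 3/4, 1/8]
    f[::2] = 0.75 * c + 0.125 * (np.roll(c, 1) + np.roll(c, -1))
    # In-between fine samples: plain 2-tap interpolation [1/2, 1/2]
    f[1::2] = 0.5 * (c + np.roll(c, -1))
    return f

signal = np.sin(np.linspace(0, np.pi, 32))
coarse = reduce_1d(signal)            # 16 samples
fine = expand_1d(coarse)              # back to 32 samples
laplacian = signal - fine             # the band-pass detail layer
print(np.abs(laplacian[4:-4]).max())  # small: low frequencies survived
```

Stacking one such `laplacian` layer per pyramid level, splicing the apple/orange halves level by level, and collapsing is exactly the blending recipe in step 3.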

  7. Similarity = translation + rotation + uniform scale; perspective = projection of one plane onto another

    • The velocity under a general rigid motion can be written as the sum of linear velocity and angular velocity. Note the cross product can be expanded into a matrix product:

    • The computer-vision scenario: the camera (on your body) has relative motion with respect to a scene object.

    • The goal is to evaluate the image pixel displacement (velocity) (u, v), i.e., (vx, vy). Take the derivative of the world-frame coordinates X, Y

    • So expand the matrix-form velocity along the x and y axes

    • Then two tricks give a clean version of this: 1. rotation about the z axis is affected by neither Z nor f; 2. rewrite X = x*Z/f.

    • So relative translation depends on the depth Z, while relative rotation does not. That's why, when taking a panorama, you should rotate in place rather than walk around

    • Rearranging leaves 8 parameters instead of 10 (d below is the distance from the plane to the body-frame origin; x, y are image coordinates)

    • If a group of pixels belongs to the same plane, they all share the same motion model above!

    • One more simplification: if the plane's extent in the image is small, we can omit the x^2, xy, y^2 terms. This leaves 6 parameters: an affine transform

    • Therefore we can apply brightness constancy over a larger window, with shared parameters instead of a single shared (u,v). A looser condition

    • To solve for the parameters, we minimize the error using least squares
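A sketch of this last step, fitting the 6-parameter affine flow u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y by least squares; the synthetic translation-only pair and all names are illustrative:

```python
import numpy as np

def affine_flow(I0, I1):
    """Fit (a1..a6) from Ix*u + Iy*v + It = 0 at every pixel."""
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    y, x = np.mgrid[0:I0.shape[0], 0:I0.shape[1]].astype(float)
    # One row per pixel: [Ix, Ix*x, Ix*y, Iy, Iy*x, Iy*y] @ a = -It
    A = np.stack([Ix, Ix * x, Ix * y, Iy, Iy * x, Iy * y],
                 axis=-1).reshape(-1, 6)
    a, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return a

# Pure sub-pixel translation, i.e. the true a is (u, 0, 0, v, 0, 0).
y, x = np.mgrid[0:64, 0:64].astype(float)
I0 = np.sin(0.3 * x) * np.cos(0.25 * y)
I1 = np.sin(0.3 * (x - 0.4)) * np.cos(0.25 * (y - 0.2))
a = affine_flow(I0, I1)
print(a)   # roughly [0.4, 0, 0, 0.2, 0, 0]
```

Compared with windowed LK, the 25-equation/2-unknown system becomes one whole-window system in 6 unknowns: the same brightness constancy, under the looser shared-parameter condition described above.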
