Features - RicoJia/notes GitHub Wiki

========================================================================

General Goal

========================================================================

  1. Features are there to find matching points.

  2. We need:

    1. Interest points identification
      • Precisely repeatable on multiple similar images.
      • Compact: covers a small area
      • Much fewer features than pixels
    2. a descriptor. so that each corner, edge is unique!
  3. Most feature descriptors are based on edge, corner formation. Colors are rarely used.

    • Generally, image gradient on RGB planes are quite correlated to each other. So doing on one plane might be enough ========================================================================

Harris Corner Detection

========================================================================

  1. When you have a window, a region that may lie the corner, if there's truly a corner, the "Summed Square Error" of an image should be large, along all 8 directions of shifting.

    • window function w(x,y): in real life we may want to weigh each pixel, (like Gaussian, the further the pixel, the lesser weight), since simple wondow doesn't work too well.

    • Note: For corner detection, need smoothing prior to taking derivative to reduce noise

  2. Since we're doing small shifts (like by one pixel), we can approximate the SSD difference in this small window. The Summed Square Error in Taylor Expansion

  3. In 2D, it can be written in Matrix form

  4. Derivation of each term - note u, v are the same directions as x,y

  5. Then finally, we find that it's an ellipse!

    • recap, a generic ellipse centered at origin:

  6. What does this mean? This is E(u,v) of a pixel, i.e., the summed squared error of the pixel along one of the 8 directions. In 1, we argued that if all 8 of them are high, then this is a corner.

    1. So, after the Taylor Expansion approximation, we find that M with large eigen-values will yield large E(u,v) for all directions. If one eigen value is small, then its E(u,v) along the corresponsing eigenvector of M will be small.

      • M really is the summation of each invididual pixel's Squared Error, e(u,v), after second order taylor expansion. And e(u,v) turns out to be in this ellipsoid shape

    2. For edges, moving the window along the direction perpendicular to the edge will yield large E(u,v), but moving along the other direction doesn't. So one eigen-values should be smaller.

    3. For "flat" regions, both eigen-values should be small

    4. Explanation:

      caption
      • For all the summed terms of M, if you plot (Ix, Iy) distribution of pixels in the window, find an enclosing ellipse around it, you may find it ROUGHLY (I haven't found its relationship with the summed error term yet) the same ellipse as well
  7. So, the "Harris Corner Response Function", or R, measures if both eigen values are large:

    • recall det(M) = lambda1 * lambda2
    • ```alpha = (0.04, 0.06)``
    • R is small for flat region, R is negative for an edge (one lambda is large, the other is small)
  8. Harris Detector Algorithm:

    1. Compute Gaussian Derivative at each pixel (Sobel)
    2. For each pixel, compute M matrix, for a small window
    3. Compute R for each pixel
    4. Threshold R
    5. Find local maxima of response function (nonmaximum-suppression)
  9. Other corners: shi-tomasi, 1994, basically using the same M matrix, but cornerness is to find min(lambda1, lambda2), find local maximums. called Good Features to Track

  10. Properties

    1. Invariant to rotation:

    2. Invariant to additive intensity changes, but not multiplicative changes:

    3. Not Invariant to scale changes!

  11. Lambda Expression: function without name. The real value is calling an anonymous function inside another function.

    b = lambda x:x+1	#input is x, return x+1
    print b(1)

Non-Maximal-Suppression

  1. non-maximal-suppresion: select non-overloapping objects. Algorithm:

    1. Come up with a bunch of windows and evaluate score
    2. Select the window with the highest score
    3. Remove windows with high IOU (Overlapping Union) scores with the selected window
    4. Put the window into the final set, so it becomes part of the output
    5. Repeat 2
    6. In Harris Corner Detection, you parse thru the corner list, and keep the one corner within any given window. This way, it looks more sparse. Link to Example
  2. YOLO (You Only Look Once) also uses this

CV2

  1. CV2 Harris Corner: R=det(M)−k(trace(M))2, when R<0, one lambda is much larger than the other. when R>0 and R is large, that's a corner.
    • ksize means the size of sobel operator kernel
    • If you need gaussian kernel, use
      def get_gaussian_kernel(size:int): 
          """
          size needs to be odd
          We're not sure the padding rule of cv2.GaussianBlur - Technically applying the identity image with 
          the same kernel size should yield the Kernel, But it doesn't
          """
          #identity image here
          image = np.zeros((size+2, size+2))
          image[int(size/2) + 1, int(size/2) + 1] = 1
          kernel = cv2.GaussianBlur(image, (size, size), 0)[1:1+size, 1:1+size]
          return kernel
        ```
    • copy an image: output_img = cv2.cvtColor(input_img.copy(), cv2.COLOR_GRAY2RGB)
    • reverse an array on a single dimension:
      arr = np.array([1,2,3,4])
      arr[::-1]
  2. Error:
    • Unsupported combination of source format (=4), and buffer format (=5) in function 'getLinearRowFilter' means the input type needs to be changed (e.g., int -> float)

========================================================================

Edge Detectors

========================================================================

  1. First Order Edge Detector - why kernel works

    • When the gradient exceeds an amount, we get a smear.

  2. Second Order Edge Detector gives a clear location of the edge, instead of a smear from first order. We need to find zero-crossing

    • Effect of first order edge detector and second order

    • How to find Zero Crossing, and Laplacian Detector

    • Discrete form of Laplacian

    • LoG Laplacian of Gaussian - we need to smooth the image first, since LoG is sensitive to noise!

    • 3 ways to get Kernel

    • DoG is simply an approximation of LoG

    • DoG, LoG can be used for detecting blobs (size similar to the kernel): LoG itself looks like a blob. When it's convoluting/cross-correlating with another blob, it will get a local maxima. Of course, there should be a halo around the blob, which comes from the impulse response. A dark blob will yield local maxima, A light-colored blob yields local minima

    • Effect of LoG in Blob Detection

  3. Resources:

========================================================================

Harris - Laplacian

========================================================================

  1. get some Gaussian Pyramid (Gaussian-blurred images with different sigmas), then the LoG Pyramid

    1. get subsampling images: gaussian blur, then downsample (every other row, cln)

    2. Expand the image, insert every other row and cln with zeros

    3. In the expanded image, do gaussian blur to reduce high frequencies. 2,3 are cv2.pyrup.

      • why this work? gaussian is actually interpolating. Its effect is approximately gaussian blurring on the original image
  2. Find Harris Corners on each image.

    • Note: skip the harris corners's own smoothing
  3. Non-Maximal Suppression

  4. On the LoG pyramid, find the extremas compare against its 8 neighbors, 9 upper and lower neighbors.

    • use LoG on scale space, even tho it detects blobness, is it gives the most consistent presentation of features.
    • Subpixel level?
    • do we need multiple GP for HL (yes, but just on the neighboring scales, so we can have three octaves, each octave has one interval, and use the middle octave for comparison)
  5. represent the features in circles: r=sqrt(2)*sigma

Gaussian Pyramid

  1. Base image of one octave is 1/4 size of the previous one.

    • Of course, Gaussian Blurring should be applied BEFORE subsampling, to reduce the high frequency noise.
    • Gaussian Blur's sigma is 2 times larger
  2. Within an octave, there're k intervals. Size is the same, sigma = 2^(n/k)

    • in openCV, the default kernel is this: whose sigma is between (1, sqrt(2))
  3. LoG is approximated by difference b/w two interval images (with two sigmas).

    • LoG is a high pass filter, so we can see edges.

Resources

  1. Harris-Laplace Detection

========================================================================

SIFT

========================================================================

  1. effect - can find feature matches

    • Overview: step 2 and step 3 can be replaced by Harris Laplace
  2. Scale-Invariant Extrema Detection

    • just on neighboring DoGs, find local extremas in its neighbors.
  3. Keypoint Localization

    • second order approximation for localization?
  4. Orientation Assignment

    • find the most dominant direction in a local feature region:

    • rotate all image gradients within the region

  5. Descriptor

    • each region is 4x4 pixels, each pixel has 8 directions, so there are 16x8 bins in the descriptor, each value is the magnitude * gaussian window function
    • high magnitudes will be clipped, then normalize the descriptor?

Sub-Pixel Localization

  1. Motivation

    • LoG has zero for edges, has a minima for a matching blob

    • But LoG response decays, as sigma increases

    • Because the gradient of gaussian becomes smaller as sigma increases

    • The laplacian has sigma2 in it, so we need to multiply it by sigma2? Called Scale Normalization

    • Explanation

    • so comparing the corresponding keypoints on different LoGs is not as stable as this interpolation method, Lowe explained

  2. Method: Find keypoints in the z=(x, y, sigma) space, by approximating the D(z), then finding Derivative = 0 points.

Resources

  1. sift implementation
  2. Nice lecture
  3. on Subpixel:

========================================================================

HoG

========================================================================

========================================================================

Fitting and Alignment

========================================================================

  1. Given two images of the same scene, let's find the transformation between the two images that gives us the best matching features

  2. Transforms:

    • All these are "affine transforms": translation, rotation, shear, and scaling. Because it can be written as follows:

    • Affine transformations:

      1. Parallel lines remain parallel
      2. Line mapping are the same
      3. Ratios remain the same
    • in 2D, rotation has 2 dof, affine is 1+2+3 (translation, rotation, scaling, shearing, aspect). Homography is a projective transform. 8 dof.

========================================================================

ORB Features (Oriented FAST and Rotated BRIEF)

Principle

Paper - ORB: an efficient alternative to SIFT or SURF - By Rublee, Gary Bradski, et al. Willow Garage

  1. Fast Feature Identification

    1. Find a pixel. Draw a circle with a radius of 3 pixels (16 pixels)
    2. If you can find more than N consecutive pixels with an intensity greater than T, then this pixel is a keypoint.
      • N=12 is FAST-12. For that, you can check pixel 1,5,9,13 to rule out.
    3. Repeat this on other pixels
    4. Apply non-maximal suppresion to avoid keypoint clustering
  2. Orientation Identification (Not in Fast)

    • Compute the image moments, then the intensity centroid. Using the center of the 3x3 patch as the origin $O$ $$ m_{pq} = \sum_{x,y} x^{p} y^{q} I(x,y) \ C = [\frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}}] $$
      • So, $x, y \in [-3, 3]$ for the entire image patch. $p, q \in [0,1]$ for the zeroth and first order image moments.
    • Calculate orientation of $\vec{OC}$ using the quadrant-aware atan2 $$ \theta = atan2 (m_{01}, m_{10}) $$
  3. Represent the descriptor in steered BRIEF

    • Pre-select 256 pairs of pixels.
    • Steer the descriptor. From the "normalized image" to the rotated one (the patch we have): $$ S = R(\theta)[x_1,y_1; ... x_n, y_n] $$
    • Compare their intensities. If $I(p1) &gt; I(p2)$ then you get a 1, else 0. Put it in a 256-bit array
  4. Repeat the same process in an image pyramid for scale invariance. This is needed because some features are not ORB features anymore if you zoom in/out.

  • Example C++ Application (OpenCV)

Find Nearest Neighbor FLANN (Fast Nearest Neighbor )?

  • An alternative is called LSH (Locality Sensitive Hashing), also a probablility datastructure for finding an approximate nearest neighbor

========================================================================

RANSAC - random sample consensus

========================================================================

  1. Main Idea:

    1. Draw N sets, each set has s points
    2. Calculate the transform from this set
    3. Apply the transform, see how many point are within a threshold D from the model. They are called inliers.
      • Points outside this threshold are called outliers
    4. If there're enough inliers,
      • if the inlier ratio is rly high, then terminate
      • else, refit from 2.
  2. N, s, e

    • s=1 for a line, s=3 for affine, s=4 for Homography

    • N: depends on e, probability of outliers of the best-fitting model. we solve for N, by setting K = 99%. Below is the probability of at least one set is good

    • You can set e to a fixed value to begin with, but you can do "adaptive e" as well:

      1. Assume e = 1
      2. Find the first trasformation, find e' = outliers/total
      3. if e' < e , update e.
      4. N=N+1 and Repeat
  3. D: Using "half normal distribution"

  4. After RANSAC, take the set of inliers of the best fitting model, then use least square to finalize the model. use the move the line up and down one more time, using Least Squares

  5. Great things about RANSAC,

    1. Is the percentage of outliers remain pretty low

    2. N has nothing to do with the number of putative matches (the candidates)

    3. TODO: how to find the desk plane from the objects using RANSAC? Image Segmentation

    1. Works well with large number of outliers

    2. Applications:

      1. Most robot vision problems
      2. Homography
      3. Fundamental Matrix Estimation
  6. Bad things about RANSAC:

    1. Computational time grows a almost exponentially with the # of params.
    2. Not good for fitting multiple fits.
    3. REALLY NOT GOOD if your model is not what you think of (a plane, etc.)

K Means

Example:Given a list of points: (2,3),(5,4),(3,8),(8,8),(7,2),(6,3)(2,3),(5,4),(3,8),(8,8),(7,2),(6,3), with K=2 clusters

  1. Randomly pick two centroids: (2,3), (8, 8)

  2. For each point, compare its distance to each centroid. Get

    (2, 3) → Cluster 1
    (5, 4) → Cluster 1
    (3, 8) → Cluster 2
    (8, 8) → Cluster 2
    (7, 2) → Cluster 1
    (6, 3) → Cluster 1
    
  3. Find new centroids of each centroid, by calculating the new averages:

    (2+5+7+6)/4,(3+4+2+3)/4=(5,3)
    (3+8)/2,(8+8)/2=(5.5,8)(3+8)/2,(8+8)/2=(5.5,8)
    
  4. Repeat Assignment. If the result is the same, then the algorithm converges.

There's a chance kmeans could alternate. In that case, need to initialize differently.

KD tree

  1. Example:
    • Insert (7, 2)
    • Insert (5, 4). (x) 5<7, so left of (7,2)
    • Insert (9, 6). (x) 9>7, so right of (7,2)
    • Insert (4, 7). (x) 4<7, so left of (7,2). (y): 7 >4, so right of (5,4)
             (7, 2)
            /       \
        (5, 4)      (9, 6)
        /    \        /
    (2, 3) (4, 7) (8, 1)
    
    • Find the nearest neighbor of (4.7, 7.1)
      1. Find each area's best possible distance:

      Go down the tree, can find each region's best possible distance
⚠️ **GitHub.com Fallback** ⚠️