Features - RicoJia/notes GitHub Wiki

========================================================================

General Goal

========================================================================

Features exist so that we can find matching points between images.
We need:
1. Interest point identification
  - Precisely repeatable across multiple similar images.
  - Compact: each one covers a small area.
  - Far fewer features than pixels.
2. A descriptor, so that each corner or edge can be told apart from the others.
Most feature descriptors are based on edge and corner structure. Colors are rarely used.
- Generally, the image gradients on the R, G and B planes are strongly correlated with each other, so computing them on one plane might be enough.

========================================================================

Harris Corner Detection

========================================================================

Non-Maximal-Suppression

Non-maximal suppression (NMS): select non-overlapping objects. Algorithm:
1. Come up with a bunch of windows and evaluate a score for each
2. Select the window with the highest score
3. Put that window into the final set, so it becomes part of the output
4. Remove the windows that have a high IoU (Intersection over Union) with the selected window
5. Repeat from 2 until no windows are left
In Harris Corner Detection, you parse through the corner list and keep only one corner (the strongest) within any given window. This way, the result looks more sparse. Link to Example
YOLO (You Only Look Once) also uses this
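
The steps above can be sketched in plain Python. This is a minimal illustration, not the linked example: the `(x1, y1, x2, y2)` box layout, the function names and the 0.5 IoU threshold are all assumptions made here.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns the indices of the kept boxes, best score first."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)          # highest remaining score
        kept.append(best)                # it becomes part of the output
        # drop every window that overlaps the selected one too much
        remaining = [i for i in remaining
                     if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119, so it is suppressed and only boxes 0 and 2 survive.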

CV2

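Reverse an array along a single dimension:
```
import numpy as np

arr = np.array([1, 2, 3, 4])
arr[::-1]  # array([4, 3, 2, 1]); for a 2-D image, img[:, ::-1] flips it horizontally
```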

========================================================================

Edge Detectors

========================================================================

First Order Edge Detector - why kernel works
- When the gradient exceeds a threshold, we get a smear (a thick band) rather than a thin edge.
Second Order Edge Detector: gives a clear location of the edge, instead of the smear from the first order detector. We need to find the zero-crossing
- Effect of first order edge detector and second order
- How to find Zero Crossing, and Laplacian Detector
- Discrete form of Laplacian
- LoG (Laplacian of Gaussian): we need to smooth the image with a Gaussian first, since the Laplacian is sensitive to noise!
- 3 ways to get Kernel
- DoG is simply an approximation of LoG
- DoG and LoG can be used for detecting blobs (of a size similar to the kernel): the LoG kernel itself looks like a blob. When it is convolved/cross-correlated with another blob, the response has a local extremum. Of course, there should be a halo around the blob, which comes from the impulse response. A dark blob yields a local maximum; a light-colored blob yields a local minimum
- Effect of LoG in Blob Detection
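
A small NumPy sketch of the LoG and DoG kernels and of the blob response described above. It is only an illustration under assumptions made here: the kernel radius, the synthetic 41x41 image, the helper names and the DoG ratio of 1.6 are not from these notes.

```python
import numpy as np


def _grid(radius):
    ax = np.arange(-radius, radius + 1, dtype=float)
    return np.meshgrid(ax, ax)


def gaussian_kernel(sigma, radius):
    xx, yy = _grid(radius)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()


def log_kernel(sigma, radius):
    """Sampled Laplacian of Gaussian, shifted so that it sums to zero."""
    xx, yy = _grid(radius)
    r2 = xx**2 + yy**2
    k = (r2 - 2.0 * sigma**2) / sigma**4 * np.exp(-r2 / (2.0 * sigma**2))
    return k - k.mean()


def correlate_valid(image, kernel):
    """Plain cross-correlation over the positions where the kernel fits."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


radius, blob_r = 10, 3
sigma = blob_r / np.sqrt(2.0)            # blob radius r = sqrt(2) * sigma
log_k = log_kernel(sigma, radius)
dog_k = gaussian_kernel(1.6 * sigma, radius) - gaussian_kernel(sigma, radius)

# synthetic image: a dark disk on a bright background, centred at (20, 20)
yy, xx = np.mgrid[0:41, 0:41]
image = np.ones((41, 41))
image[(yy - 20) ** 2 + (xx - 20) ** 2 <= blob_r**2] = 0.0

response = sigma**2 * correlate_valid(image, log_k)   # scale-normalized
peak = tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
similarity = np.corrcoef(log_k.ravel(), dog_k.ravel())[0, 1]
```

`peak` is the position of the strongest response in the valid-correlation output, and `similarity` measures how closely the DoG kernel follows the LoG kernel's shape.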
Resources:
- Implementation of Harris-Laplace feature detector

========================================================================

Harris - Laplacian

========================================================================

Get a Gaussian pyramid (Gaussian-blurred images with different sigmas), then the LoG pyramid
1. Get the subsampled images: Gaussian blur, then downsample (keep every other row and column). This is cv2.pyrDown.
2. Expand the image: insert zeros in every other row and column
3. In the expanded image, do a Gaussian blur to reduce the high frequencies. Steps 2 and 3 are cv2.pyrUp.
  - Why does this work? The Gaussian is effectively interpolating. The result is approximately a Gaussian-blurred version of the original image
Find Harris corners on each image.
- Note: skip the Harris corner detector's own smoothing
Non-Maximal Suppression
On the LoG pyramid, find the extrema: compare each point against its 8 neighbors on the same level, and the 9 neighbors on each of the levels above and below (26 in total).
- We use LoG in scale space, even though it detects blobness, because it gives the most consistent representation of features.
- Subpixel level?
- Do we need multiple GP levels for HL? (Yes, but just the neighboring scales, so we can have three octaves, each octave has one interval, and use the middle octave for comparison)
Represent the features as circles: r = sqrt(2) * sigma
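
The 26-neighbor comparison can be sketched as below. It assumes the LoG responses of three or more neighboring scales are stacked into one `(scale, row, col)` array of equal-sized images and that the tested point is not on the border; the function name is made up for this sketch.

```python
import numpy as np


def is_scale_space_extremum(stack, s, y, x):
    """True if stack[s, y, x] is strictly above (or below) all 26 neighbours."""
    cube = stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = stack[s, y, x]
    others = np.delete(cube.ravel(), 13)   # index 13 is the centre of a 3x3x3 cube
    return bool(np.all(center > others) or np.all(center < others))


stack = np.zeros((3, 5, 5))
stack[1, 2, 2] = 1.0
alone = is_scale_space_extremum(stack, 1, 2, 2)

stack[0, 2, 2] = 2.0                        # a stronger response one scale below
beaten = is_scale_space_extremum(stack, 1, 2, 2)

sigma = 2.0
radius = np.sqrt(2.0) * sigma               # circle drawn for a feature found at this sigma
```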

Gaussian Pyramid

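The base image of one octave is 1/4 the size (half the width and half the height) of the previous one.
- Of course, Gaussian blurring should be applied BEFORE subsampling, to remove the high frequencies that would otherwise alias.
- The Gaussian blur's sigma is 2 times larger in each successive octave
Within an octave, there are k intervals. The image size is the same, and the n-th interval has sigma = sigma_0 * 2^(n/k), where sigma_0 is the octave's base sigma
- In OpenCV, the default kernel has a sigma between 1 and sqrt(2)
LoG is approximated by the difference between two interval images (blurred with two different sigmas).
- LoG suppresses the low frequencies (strictly, it is a band-pass filter), so we can see edges.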
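
A quick sketch of the sigma schedule inside one octave. The base sigma of 1.6 and the choice of 3 intervals are assumed values for illustration only.

```python
def octave_sigmas(sigma0, intervals):
    """Sigmas of the images inside one octave: sigma0 * 2**(n / intervals)."""
    return [sigma0 * 2.0 ** (n / intervals) for n in range(intervals + 1)]


sigmas = octave_sigmas(1.6, 3)
```

The last entry is exactly twice the first, which is why the next octave can start from that image after downsampling.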

Resources

Harris-Laplace Detection

========================================================================

SIFT

========================================================================

effect - can find feature matches
- Overview: step 2 and step 3 can be replaced by Harris Laplace
Scale-Invariant Extrema Detection
- Just on neighboring DoG levels, find the local extrema of each point among its neighbors.
Keypoint Localization
- second order approximation for localization?
Orientation Assignment
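- Find the most dominant gradient direction in a local feature region
- Rotate all image gradients within the region relative to that direction
Descriptor
- The region around the keypoint is split into 4x4 cells, and each cell has an 8-direction orientation histogram, so there are 16x8 = 128 bins in the descriptor. Each contribution is the gradient magnitude * a Gaussian window function
- The descriptor is normalized to unit length, high values are clipped (at 0.2 in Lowe's paper), then the descriptor is normalized again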
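
The normalize, clip, renormalize step can be sketched as follows. The function name and the toy histogram are made up; the 0.2 clip value is the one used in Lowe's paper.

```python
import math


def normalize_descriptor(hist, clip=0.2):
    """Unit-normalize, clip large entries, then unit-normalize again."""
    norm = math.sqrt(sum(v * v for v in hist))
    if norm == 0.0:
        return list(hist)
    clipped = [min(v / norm, clip) for v in hist]   # limit dominant gradients
    norm2 = math.sqrt(sum(v * v for v in clipped))
    return [v / norm2 for v in clipped]


# toy 4x4x8 = 128-bin histogram with one dominant bin
raw = [10.0] + [1.0] * 127
desc = normalize_descriptor(raw)
```

Clipping reduces the influence of a single very strong gradient, so the dominant bin ends up much smaller than it would after plain normalization.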

Sub-Pixel Localization

Motivation
- LoG is zero at edges and has an extremum at a matching blob
- But the LoG response decays as sigma increases
- Because the gradient of the Gaussian becomes smaller as sigma increases
- The Laplacian of Gaussian has a 1/sigma² factor in it, so we multiply it by sigma² to compensate. This is called Scale Normalization
- Explanation
- Lowe explained that simply comparing the corresponding keypoints on different LoG levels is not as stable as this interpolation method
  - Original Paper
Method: find keypoints in the z = (x, y, sigma) space by approximating D(z) with a second-order Taylor expansion, then finding the points where the derivative is 0.
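
Written out, the scale normalization and the interpolation step look roughly like this. The symbols g (gradient of D), H (Hessian of D) and the offset \delta are names chosen here; see the original paper for the exact formulation.

```latex
% Scale normalization: the raw LoG response shrinks like 1/\sigma^2, so multiply it back
\nabla^2_{\mathrm{norm}} G = \sigma^2 \, \nabla^2 G

% Second-order Taylor expansion of D around a sampled extremum z_0, with offset \delta = z - z_0
D(z_0 + \delta) \approx D(z_0) + g^{\top} \delta + \tfrac{1}{2}\, \delta^{\top} H \, \delta,
\qquad g = \frac{\partial D}{\partial z}\Big|_{z_0}, \quad
H = \frac{\partial^2 D}{\partial z^2}\Big|_{z_0}

% Setting the derivative with respect to \delta to zero gives the sub-pixel offset
\hat{\delta} = -H^{-1} g, \qquad
D(z_0 + \hat{\delta}) = D(z_0) + \tfrac{1}{2}\, g^{\top} \hat{\delta}
```

If any component of \hat{\delta} is larger than 0.5, the extremum lies closer to a neighboring sample, so the fit is repeated around that sample instead.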

Resources

sift implementation
Nice lecture
on Subpixel:
- good intro. Note the formula of D might be wrong
- good short question

========================================================================

HoG

========================================================================

Fitting and Alignment

========================================================================

Given two images of the same scene, let's find the transformation between the two images that gives us the best matching features
Transforms:
- Translation, rotation, shear, and scaling are all "affine transforms", because each can be written as x' = A x + t, where A is a 2x2 matrix and t is a translation vector
- Affine transformations:
  1. Parallel lines remain parallel
  2. Lines map to lines
  3. Ratios of lengths along a line remain the same
- In 2D, translation has 2 DOF and rotation has 1 DOF. Affine has 2+1+3 = 6 DOF (translation, rotation, then scaling, shearing, aspect). Homography is a projective transform with 8 DOF.
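
A tiny check of the affine properties listed above, using x' = A x + t. The matrix, the translation and the points are arbitrary values chosen for this sketch.

```python
def apply_affine(A, t, p):
    """Map point p = (x, y) to A @ p + t, with A given as a 2x2 nested list."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y + t[0],
            A[1][0] * x + A[1][1] * y + t[1])


A = [[2, 1], [0, 3]]     # scaling plus shear
t = (5, -1)

# two parallel segments: directions (1, 2) and (2, 4)
p0, p1 = apply_affine(A, t, (0, 0)), apply_affine(A, t, (1, 2))
q0, q1 = apply_affine(A, t, (3, 1)), apply_affine(A, t, (5, 5))

d1 = (p1[0] - p0[0], p1[1] - p0[1])
d2 = (q1[0] - q0[0], q1[1] - q0[1])
cross = d1[0] * d2[1] - d1[1] * d2[0]    # 0 means the segments are still parallel

# the midpoint of a segment maps to the midpoint of the mapped segment
mid = apply_affine(A, t, (0.5, 1.0))
```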

========================================================================

RANSAC - random sample consensus

========================================================================

Main Idea:
1. Draw N sets, each set has s points
2. For each set, come up with a proposed transform (model)
3. Apply the transform, and see how many points are within a threshold D from the model. They are called inliers.
  - Points outside this threshold are called outliers
4. If there are enough inliers,
  - if the inlier ratio is really high, then terminate
  - else, refit from 2.
N, s, e
- s=2 for a line, s=3 for affine, s=4 for Homography
- N: depends on e, the outlier ratio (the probability that a point is an outlier under the best-fitting model). We solve for N by setting K = 99%, where K is the probability that at least one set is good (all inliers): K = 1 - (1 - (1-e)^s)^N, so N = log(1-K) / log(1 - (1-e)^s)
- You can set e to a fixed value to begin with, but you can do "adaptive e" as well:
  1. Assume e = 1 (worst case, so N is infinite)
  2. Find the first transformation, and compute e' = outliers/total
  3. If e' < e, update e and recompute N.
  4. Increment the sample count and repeat until the count reaches N
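
The number of sets N follows from requiring that at least one set is all inliers with probability 0.99. A minimal sketch of that calculation (the function name is made up here):

```python
import math


def ransac_iterations(e, s, p=0.99):
    """Smallest N with 1 - (1 - (1 - e)**s)**N >= p."""
    good = (1.0 - e) ** s            # probability that one set is all inliers
    if good <= 0.0:
        return math.inf              # e = 1: no finite N is enough
    if good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - good))


n_line = ransac_iterations(e=0.5, s=2)
n_homography = ransac_iterations(e=0.5, s=4)
n_worst = ransac_iterations(e=1.0, s=4)
```

With the adaptive scheme, e starts at 1 (so N is infinite) and N is recomputed with this formula every time a model with a lower outlier ratio is found.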
D: Using "half normal distribution"
After RANSAC, take the set of inliers of the best-fitting model, then use least squares to finalize the model. For a line, this nudges the line one more time so that it fits the inliers best.
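
Putting the pieces together for a line (s = 2): a minimal pure-Python sketch with a final least-squares refit. The data, the threshold, the iteration count and the fixed seed are assumptions made for this example, and it keeps the largest inlier set instead of terminating early.

```python
import math
import random


def line_through(p, q):
    """Normalized line a*x + b*y + c = 0 through two distinct points."""
    a, b = q[1] - p[1], p[0] - q[0]
    c = -(a * p[0] + b * p[1])
    n = math.hypot(a, b)
    return a / n, b / n, c / n


def ransac_line(points, n_iters=50, threshold=0.1, seed=0):
    """Return the largest inlier set found over n_iters random 2-point models."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iters):
        p, q = rng.sample(points, 2)
        a, b, c = line_through(p, q)
        # perpendicular distance to the line, since (a, b) has unit length
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    return best


def least_squares_line(points):
    """Fit y = slope * x + intercept to the points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


on_line = [(x, 2 * x + 1) for x in range(20)]                 # y = 2x + 1
outliers = [(2, 30), (5, -10), (8, 40), (12, 0), (15, 60)]
inliers = ransac_line(on_line + outliers)
slope, intercept = least_squares_line(inliers)
```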
Great things about RANSAC:
1. The percentage of outliers that remain is pretty low
2. N has nothing to do with the number of putative matches (the candidates)
3. Works well with a large number of outliers
4. Applications:
  1. Most robot vision problems
  2. Homography
  3. Fundamental Matrix Estimation
TODO: how to find the desk plane from the objects using RANSAC? Image Segmentation
Bad things about RANSAC:
1. Computational time grows almost exponentially with the number of parameters.
2. Not good for finding multiple fits.
3. REALLY NOT GOOD if your model is not what you think it is (a plane, etc.)

KD tree

Example:
- Insert (7, 2)
- Insert (5, 4). (x) 5<7, so left of (7,2)
- Insert (9, 6). (x) 9>7, so right of (7,2)
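- Insert (4, 7). (x) 4<7, so left of (7,2). (y): 7>4, so right of (5,4)
- Insert (2, 3). (x) 2<7, so left of (7,2). (y): 3<4, so left of (5,4)
- Insert (8, 1). (x) 8>7, so right of (7,2). (y): 1<6, so left of (9,6)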
```
         (7, 2)
        /       \
    (5, 4)      (9, 6)
    /    \        /
(2, 3) (4, 7) (8, 1)
```
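- Find the nearest neighbor of (4.7, 7.1)
  1. Go down the tree toward the region that contains the query. At each node, we can compute the best possible distance from the query to the region on the other side of the split
  2. Only search that other region if its best possible distance is smaller than the best distance found so far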
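
The example tree and the nearest-neighbor query can be sketched as below. The dict-based node layout and the function names are assumptions made for this sketch; the split axis alternates x, y, x, ... with depth.

```python
def insert(node, point, depth=0):
    """Insert a 2-D point; smaller values on the split axis go left."""
    if node is None:
        return {"point": point, "left": None, "right": None}
    axis = depth % 2
    side = "left" if point[axis] < node["point"][axis] else "right"
    node[side] = insert(node[side], point, depth + 1)
    return node


def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best

    def dist2(p):
        return (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2

    if best is None or dist2(node["point"]) < dist2(best):
        best = node["point"]
    axis = depth % 2
    diff = target[axis] - node["point"][axis]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    best = nearest(node[near], target, depth + 1, best)
    # the far region's best possible distance is |diff|: search it only if it can win
    if diff ** 2 < dist2(best):
        best = nearest(node[far], target, depth + 1, best)
    return best


root = None
for pt in [(7, 2), (5, 4), (9, 6), (4, 7), (2, 3), (8, 1)]:
    root = insert(root, pt)

closest = nearest(root, (4.7, 7.1))
```

For this query the search descends (7, 2), then (5, 4), then (4, 7), and both far regions are pruned because their best possible distance is already larger than the distance to (4, 7).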
