VSLAM

========================================================================

Basics

========================================================================

  1. Sparse vs Dense SLAM: SLAM simultaneously builds a map while, in parallel, localizing within it. Systems differ in sensor choice and map density:

    • Xiaomi's Mi floor-cleaning robot uses a 2D LiDAR; Dyson's uses a 3D camera
    • DJI: stereo cameras
    • Waymo: high-precision 3D LiDAR
    • Tesla, Mobileye: cheap cameras
    • A sparse map is just a set of feature points; in a dense map you can almost see the environment

  2. Basic principle: recover camera motion and structure from 3D-to-2D projections. Challenges:

    1. Loop Closure (can I recognize the loop?)

    2. Fast motion and strong rotation.

  3. ENFT-SfM (Efficient Non-consecutive Feature Tracking for SfM): loop closure

    • Matches common features among different subsequences, which is important for loop closure
    • Eliminates global drift by optimizing over all frames
    • Framework: non-consecutive feature tracking.
      1. Input frames

      2. Consecutive feature tracking (ORB, SIFT)

        • "first pass": match SIFT features by dividing differences of the first and second candidate

        • Why feature matching fails: 1. there is no guarantee a feature can be extracted in every frame 2. there might be other, repetitive features that cause ambiguous matches
      3. Non-consecutive track matching (second pass)

        • Estimate the homography between two frames iteratively, using the inlier matches from the first pass (RANSAC); the search can also be constrained to epipolar lines.
        • How to match partially overlapping non-consecutive frames? (loop closure!)
          1. Fast matrix (coarse estimate; see the sketch after this list)
            1. Each track has a group of feature descriptors; average them into a "track descriptor"
            2. Use hierarchical k-means to cluster the track descriptors, then estimate the distance between frames from shared clusters

          2. Start from the best entries in the fast matrix, then refine the matching matrix.

      4. SfM

      5. Localize

      6. Flowchart (pipeline figure in the original notes)
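A minimal sketch of the fast-matrix idea above. This is only my reading of it: it simplifies the paper's hierarchical k-means to a single-level k-means, and all names here are illustrative, not ENFT-SfM's actual code.

```cpp
// Coarse frame-similarity ("fast matrix") sketch: average each track's SIFT
// descriptors into a track descriptor, cluster the track descriptors, and
// score frame pairs by how many clusters they share.
#include <Eigen/Dense>
#include <limits>
#include <random>
#include <vector>

using Descriptor = Eigen::VectorXf;  // e.g. 128-D SIFT

struct Track {
    std::vector<int> frame_ids;           // frames where this track appears
    std::vector<Descriptor> descriptors;  // one descriptor per observation
};

// 1. Average a track's descriptors into a single "track descriptor".
Descriptor trackDescriptor(const Track& t) {
    Descriptor mean = Descriptor::Zero(t.descriptors.front().size());
    for (const auto& d : t.descriptors) mean += d;
    return mean / static_cast<float>(t.descriptors.size());
}

// 2. Cluster track descriptors (single-level k-means for brevity; the paper
//    uses hierarchical k-means).
std::vector<int> kmeansLabels(const std::vector<Descriptor>& pts, int k,
                              int iters = 10) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(pts.size()) - 1);
    std::vector<Descriptor> centers(k);
    for (auto& c : centers) c = pts[pick(rng)];
    std::vector<int> label(pts.size(), 0);
    for (int it = 0; it < iters; ++it) {
        for (size_t i = 0; i < pts.size(); ++i) {  // assignment step
            float best = std::numeric_limits<float>::max();
            for (int c = 0; c < k; ++c) {
                const float d = (pts[i] - centers[c]).squaredNorm();
                if (d < best) { best = d; label[i] = c; }
            }
        }
        std::vector<Descriptor> sum(k, Descriptor(Descriptor::Zero(pts[0].size())));
        std::vector<int> cnt(k, 0);
        for (size_t i = 0; i < pts.size(); ++i) {  // update step
            sum[label[i]] += pts[i];
            ++cnt[label[i]];
        }
        for (int c = 0; c < k; ++c)
            if (cnt[c] > 0) centers[c] = sum[c] / static_cast<float>(cnt[c]);
    }
    return label;
}

// 3. Fast matrix: frames whose tracks fall into many common clusters likely
//    overlap, so they are tried first in the refined second pass.
Eigen::MatrixXf fastMatrix(const std::vector<Track>& tracks,
                           const std::vector<int>& label,
                           int num_frames, int k) {
    Eigen::MatrixXf hist = Eigen::MatrixXf::Zero(num_frames, k);
    for (size_t i = 0; i < tracks.size(); ++i)
        for (int f : tracks[i].frame_ids) hist(f, label[i]) += 1.0f;
    return hist * hist.transpose();  // (i, j) = shared-cluster score
}
```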
  4. Sensors

    1. RGBD cameras: structured light (Intel RealSense, Microsoft Kinect v1). It is still stereo vision, but instead of passively receiving light, you project light with a projector; the camera then finds the pattern in the image, takes the difference of the image coordinates (the disparity), and recovers the depth of the object. The projected patterns are usually light stripes so they can be identified easily (see the triangulation sketch below).

    2. Lidars

      • Solid-state vs. mechanical LiDARs: solid-state units have a small field of view, while mechanical ones cover 360°. Most are mechanical.
      • Multi-beam LiDARs, e.g., Velodyne with 16 simultaneous beams
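The triangulation behind both passive stereo and structured light boils down to depth from disparity. A minimal sketch; the focal length, baseline, and disparity values below are made-up examples:

```cpp
// Depth from disparity, z = f * b / d (similar triangles): f = focal length
// in pixels, b = camera/projector baseline in meters, d = disparity in pixels.
#include <iostream>

double depthFromDisparity(double f_px, double baseline_m, double disparity_px) {
    return f_px * baseline_m / disparity_px;
}

int main() {
    // Example numbers only: f = 525 px, b = 7.5 cm, d = 20 px -> ~1.97 m.
    std::cout << depthFromDisparity(525.0, 0.075, 20.0) << " m\n";
}
```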
  5. Resources

    1. General overview on Zhihu (知乎总概论)

========================================================================

Gmapping

========================================================================

  1. Check the C++ parameters: https://www.guyuehome.com/14967
    • Save the map: rosrun map_server map_saver -f map

Theory

Summary

  1. The problem is to estimate the joint posterior p(x_{1:t}, m | z_{1:t}, u_{1:t-1}), which can be decomposed into two problems: p(x_{1:t}, m | z_{1:t}, u_{1:t-1}) = p(m | x_{1:t}, z_{1:t}) · p(x_{1:t} | z_{1:t}, u_{1:t-1}), i.e., first estimate the trajectory, then build the map conditioned on it.

    • So this is the Rao-Blackwellized particle filter: the map m is marginalized out and computed analytically given each sampled trajectory, and the Rao-Blackwell theorem says the estimator's performance will not degrade.
  2. Bayes filter recursion for the pose: p(x_t | z_{1:t}, u_{1:t-1}) = η · p(z_t | x_t) · ∫ p(x_t | x_{t-1}, u_{t-1}) · p(x_{t-1} | z_{1:t-1}, u_{1:t-2}) dx_{t-1}, i.e., normalizer × measurement likelihood × transition × prior.

  3. However, in a traditional particle filter we may not have enough particles to represent the true distribution (a.k.a. the target distribution), so we might suffer from:

    1. Particle impoverishment
      • The target distribution is the true distribution; the proposal distribution is our estimate that we sample from.
    2. Loss of particle diversity.
    3. FastSLAM was the first such method for LiDAR mapping, but it used the method above (raw odometry as the proposal). And remember: each particle carries its own map and its own pose.
  4. So gmapping uses scan matching to find the most likely pose under p(z_t | m, x_t) · p(x_t | x_{t-1}, u_{t-1}) ∝ p(x_t | z_{1:t}, u_{1:t-1}), given each particle's map from the previous step. For each particle, the scan matcher returns the most likely pose x̂_t and the likelihood of that pose.

  5. The proposal distribution is then assumed to be Gaussian: sample K poses x_j around x̂_t, evaluate p(z_t | m, x_j) · p(x_j | x_{t-1}, u_{t-1}) at each, then compute the normalizer η and, from the weighted samples, the mean μ and covariance Σ.

  6. At the end of this time step, we update the particle's weight (w ← w · η) and the map associated with the particle.

  7. At the end of the whole iteration, after all particles are processed, we resample (gmapping resamples only when the effective sample size N_eff falls below a threshold). See the sketch of steps 4-7 below.

  8. Whole algorithm
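A sketch of steps 4-7 for a single particle, under the assumptions above (Gaussian proposal fitted from K samples around the scan-matched pose). The scan matcher and the observation/motion models are stubbed out, so this shows the structure of the update, not gmapping's actual implementation:

```cpp
// One time step of the improved-proposal update for one particle.
#include <Eigen/Dense>
#include <random>
#include <vector>

using Pose2D = Eigen::Vector3d;  // (x, y, theta)

// Stubs: the real versions use the particle's map m.
Pose2D scanMatch(const Pose2D& init) { return init; }                  // argmax p(z|m,x)
double obsLikelihood(const Pose2D&) { return 1.0; }                    // p(z_t | m, x)
double motionLikelihood(const Pose2D&, const Pose2D&) { return 1.0; }  // p(x | x_prev, u)

void improvedProposalStep(Pose2D& pose, double& weight, const Pose2D& prev,
                          std::mt19937& rng, int K = 30) {
    const Pose2D x_hat = scanMatch(prev);  // mode found by the scan matcher

    // Sample K poses around the mode and evaluate the target density there.
    std::normal_distribution<double> jitter(0.0, 0.05);
    std::vector<Pose2D> xs(K);
    std::vector<double> ps(K);
    Pose2D mu = Pose2D::Zero();
    double eta = 0.0;  // normalizer; also the weight-update factor
    for (int j = 0; j < K; ++j) {
        xs[j] = x_hat + Pose2D(jitter(rng), jitter(rng), jitter(rng));
        ps[j] = obsLikelihood(xs[j]) * motionLikelihood(xs[j], prev);
        eta += ps[j];
        mu += ps[j] * xs[j];
    }
    mu /= eta;

    // Fit the Gaussian proposal N(mu, Sigma) from the weighted samples.
    Eigen::Matrix3d sigma = Eigen::Matrix3d::Zero();
    for (int j = 0; j < K; ++j)
        sigma += ps[j] * (xs[j] - mu) * (xs[j] - mu).transpose();
    sigma /= eta;

    // Draw the particle's new pose from N(mu, Sigma) via Cholesky.
    const Eigen::Matrix3d L = sigma.llt().matrixL();
    std::normal_distribution<double> stdn(0.0, 1.0);
    pose = mu + L * Pose2D(stdn(rng), stdn(rng), stdn(rng));

    weight *= eta;  // w_t = w_{t-1} * eta; the map is updated with `pose` next
}
```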

========================================================================

Scan Matching

========================================================================

  1. Summary

    1. Mainstream scan matching uses LiDAR; cameras are affected by lighting
    2. Direct matching: mainstream SLAM systems do not use plain ICP, since it needs many iterations and is sensitive to the initial value
  2. gmapping Resources

========================================================================

ICP & NDT - The Scan Matching Step

========================================================================

  1. Summary
    1. SVD
      • Eigen to tf transform (see the conversion sketch after this list)
    2. Gauss-Newton
    3. Installation (see the linked article)
      • fmt: you need to link the fmt library as well
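For the "Eigen to tf transform" point above, a minimal sketch using tf2_eigen (ROS 1). The frame names are assumptions, and the exact header/API may differ across distros:

```cpp
// Convert an Eigen pose (e.g. an ICP result) into a tf2 transform message.
#include <tf2_eigen/tf2_eigen.h>
#include <Eigen/Geometry>

geometry_msgs::TransformStamped toTfMsg(const Eigen::Isometry3d& T) {
    // tf2::eigenToTransform fills in translation + rotation (as a quaternion).
    geometry_msgs::TransformStamped msg = tf2::eigenToTransform(T);
    msg.header.frame_id = "map";       // assumed frame names
    msg.child_frame_id = "base_link";
    return msg;
}
```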

SVD Method

  1. Minimize the SSE. First, realize we can decompose this problem into two: get R, then get t. Errata: the term in the figure should be μ_x − Rμ_y − t!!

  2. For getting R, we can simplify the problem: substitute t = μ_x − Rμ_y and center the points (x_i' = x_i − μ_x, y_i' = y_i − μ_y); expanding Σ ||x_i' − R·y_i'||², only the cross term Σ x_i'^T R y_i' depends on R.

    Now we have only 1 term!
  3. To get the argmax of the above term, rewrite it as a trace and aim for the Tr(AA^T) form, where the trace is maximal. Then we're golden. However, we also need the cyclic property Tr(AB) = Tr(BA): Σ x_i'^T R y_i' = Tr(R Y'X'^T).

  4. Yes: with H = Y'X'^T and its SVD H = UΣV^T, we can pick R = VU^T, which makes Tr(RH) take the maximal Tr(AA^T) form. Note X' = [ (x_1 − μ_x) | (x_2 − μ_x) | ... ]. Then t = μ_x − R·μ_y (errata: this corrects the formula in the figure). See the code sketch at the end of this section.
  5. Advantages, Disadvantages

    • SVD alignment does work, but on its own it is not good enough for scan matching.

      (Figures: point clouds before and after ICP-SVD matching.)
    • Advantages:

      1. Simple calculation.
    • Disadvantages:

      1. The initial positions matter, because we only minimize over matched points; if the two point clouds start too far apart, the matches are wrong and we may not be able to align them.
      2. Point matching (nearest-neighbor search) is expensive, and multiple points may be matched to the same target point (which is wrong)
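A minimal sketch of one ICP-SVD alignment step, assuming correspondences x_i ↔ y_i are already established (the expensive matching step from disadvantage 2 is not shown):

```cpp
// Solve R, t minimizing sum ||x_i - (R y_i + t)||^2 for known correspondences.
#include <Eigen/Dense>
#include <vector>

void alignSVD(const std::vector<Eigen::Vector3d>& X,
              const std::vector<Eigen::Vector3d>& Y,
              Eigen::Matrix3d& R, Eigen::Vector3d& t) {
    // Centroids mu_x, mu_y.
    Eigen::Vector3d mu_x = Eigen::Vector3d::Zero();
    Eigen::Vector3d mu_y = Eigen::Vector3d::Zero();
    for (size_t i = 0; i < X.size(); ++i) { mu_x += X[i]; mu_y += Y[i]; }
    mu_x /= static_cast<double>(X.size());
    mu_y /= static_cast<double>(Y.size());

    // H = Y' X'^T accumulated from the centered points.
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (size_t i = 0; i < X.size(); ++i)
        H += (Y[i] - mu_y) * (X[i] - mu_x).transpose();

    // H = U S V^T  =>  R = V U^T; flip the last column of V if det(R) < 0
    // (guards against a reflection instead of a rotation).
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d V = svd.matrixV();
    R = V * svd.matrixU().transpose();
    if (R.determinant() < 0.0) {
        V.col(2) *= -1.0;
        R = V * svd.matrixU().transpose();
    }
    t = mu_x - R * mu_y;  // as derived above
}
```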

Gauss-Newton Method

  1. We are trying to minimize f(ξ) = 1/2 Σ ||x_i − T(ξ)·y_i||² (the sum of squared errors, e) in terms of the parameters ξ = [t_x, t_y, t_z, θ_x, θ_y, θ_z]
  2. For a single residual e_i = x_i − (R·y_i + t), take the derivative w.r.t. ξ to get J_i = [−I | R[y_i]_×], using the derivative of R under a right perturbation
  3. But we are minimizing the sum of squared errors, so accumulate H = Σ J_i^T J_i and b = Σ J_i^T e_i, then solve H·Δξ = −b
  4. Update t and R using the right perturbation: t ← t + Δt, R ← R·Exp(Δθ)
  5. Full algorithm (see the sketch below)
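A sketch of one Gauss-Newton iteration implementing steps 2-4, with correspondences given; a real implementation re-associates points and iterates to convergence:

```cpp
// One Gauss-Newton step for point-to-point ICP: residual e_i = x_i - (R y_i + t),
// perturbation [dt, dtheta] applied on the right of R.
#include <Eigen/Dense>
#include <vector>

Eigen::Matrix3d skew(const Eigen::Vector3d& v) {
    Eigen::Matrix3d S;
    S <<     0, -v.z(),  v.y(),
         v.z(),      0, -v.x(),
        -v.y(),  v.x(),      0;
    return S;
}

void gaussNewtonStep(const std::vector<Eigen::Vector3d>& X,
                     const std::vector<Eigen::Vector3d>& Y,
                     Eigen::Matrix3d& R, Eigen::Vector3d& t) {
    Eigen::Matrix<double, 6, 6> H = Eigen::Matrix<double, 6, 6>::Zero();
    Eigen::Matrix<double, 6, 1> b = Eigen::Matrix<double, 6, 1>::Zero();
    for (size_t i = 0; i < X.size(); ++i) {
        const Eigen::Vector3d e = X[i] - (R * Y[i] + t);
        // J_i = de/d[dt, dtheta] = [-I | R*[y_i]_x] for the right perturbation
        // R <- R * Exp(dtheta).
        Eigen::Matrix<double, 3, 6> J;
        J.block<3, 3>(0, 0) = -Eigen::Matrix3d::Identity();
        J.block<3, 3>(0, 3) = R * skew(Y[i]);
        H += J.transpose() * J;  // sum J_i^T J_i
        b += J.transpose() * e;  // sum J_i^T e_i
    }
    const Eigen::Matrix<double, 6, 1> dx = H.ldlt().solve(-b);

    // Apply the right perturbation: t <- t + dt, R <- R * Exp(dtheta).
    t += dx.head<3>();
    const Eigen::Vector3d dtheta = dx.tail<3>();
    const double ang = dtheta.norm();
    const Eigen::Matrix3d dR = (ang < 1e-12)
        ? Eigen::Matrix3d::Identity()
        : Eigen::AngleAxisd(ang, dtheta / ang).toRotationMatrix();
    R = R * dR;
}
```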

NDT - Normal Distributions Transform

  1. Assumption: if we divide the map into cells, the reference point cloud follows a normal distribution within each cell. The objective is then to find a transform T such that the transformed points y_i' = T·y_i fit those per-cell distributions of the reference cloud well (see the sketch below)

  2. Reference Principle
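A sketch of the NDT scoring step, assuming the reference cloud has already been preprocessed into per-cell (mean, inverse covariance) statistics; the voxel hashing here is just one simple way to look up cells:

```cpp
// Score a candidate transform T = (R, t) against the reference cloud's
// per-cell Gaussians; an optimizer (e.g. Newton on -score) maximizes this.
#include <Eigen/Dense>
#include <cmath>
#include <functional>
#include <unordered_map>
#include <vector>

struct Cell { Eigen::Vector3d mean; Eigen::Matrix3d cov_inv; };

// Integer cell index at a given resolution.
struct CellIdx {
    int x, y, z;
    bool operator==(const CellIdx& o) const { return x == o.x && y == o.y && z == o.z; }
};
struct CellHash {
    size_t operator()(const CellIdx& c) const {
        return std::hash<long long>()(((long long)c.x * 73856093LL) ^
                                      ((long long)c.y * 19349663LL) ^
                                      ((long long)c.z * 83492791LL));
    }
};

using Grid = std::unordered_map<CellIdx, Cell, CellHash>;

double ndtScore(const std::vector<Eigen::Vector3d>& current,
                const Eigen::Matrix3d& R, const Eigen::Vector3d& t,
                const Grid& reference, double res) {
    double score = 0.0;
    for (const auto& y : current) {
        const Eigen::Vector3d p = R * y + t;  // y' = T y
        const CellIdx idx{static_cast<int>(std::floor(p.x() / res)),
                          static_cast<int>(std::floor(p.y() / res)),
                          static_cast<int>(std::floor(p.z() / res))};
        const auto it = reference.find(idx);
        if (it == reference.end()) continue;  // no model in this cell
        const Eigen::Vector3d d = p - it->second.mean;
        score += std::exp(-0.5 * d.dot(it->second.cov_inv * d));
    }
    return score;  // higher = transformed points fit the reference Gaussians better
}
```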
