Understanding ARKit Tracking and Detection
WWDC18
What's Tracking?
- Tracking provides the camera's viewing position and orientation in your physical environment, which then allows you to augment virtual content into the camera's view.
- Virtual content always appears visually correct: correct placement, correct size, and correct perspective.
Technologies in ARKit 1
- Orientation Tracking
- World Tracking
- Plane Detection

Technologies in ARKit 2
- Saving and Loading Maps
- Image Tracking
- Object Detection
Basics of ARKit
- Create ARSession
  - ARSession is the object that handles everything from configuring and running the AR technologies to returning their results.
- Create ARConfiguration
  - ARConfiguration describes what kind of technologies you actually want to run, and which tracking technologies and features should be enabled (such as Plane Detection).
  - Take the specific ARConfiguration and call the run method on the instance of ARSession: run(_ configuration); see the sketch after this list.
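A minimal sketch of these two steps, using the standard ARKit API (plane detection is enabled here purely as an example feature):

```swift
import ARKit

// Create the session and a configuration describing what to run.
let session = ARSession()
let configuration = ARWorldTrackingConfiguration()
configuration.planeDetection = [.horizontal]  // enable an optional feature

// Hand the configuration to the session to start the AR technologies.
session.run(configuration)
```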
Built-in input system for ARKit
- Inside ARSession
  - Starts configuring an AVCaptureSession to begin receiving the camera images.
  - Calls CMMotionManager to begin receiving the motion sensor data.
ARFrames
- After the processing, the results are returned in ARFrames at 60 fps (see the sketch after this list).
- An ARFrame is a snapshot in time that provides everything needed to render the AR scene:
  - The captured camera image, which is rendered as the background of the AR scene
  - The tracked camera motion, which is applied to the virtual camera so virtual content is rendered from the same perspective as the physical camera
  - Environment information, like detected planes
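A minimal sketch of receiving ARFrames through the standard ARSessionDelegate callback; the FrameReceiver class name is hypothetical:

```swift
import ARKit

// Hypothetical receiver class; ARSessionDelegate delivers each new ARFrame.
class FrameReceiver: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let image: CVPixelBuffer = frame.capturedImage  // background camera image
        let cameraPose = frame.camera.transform         // tracked camera motion for the virtual camera
        let planes = frame.anchors.compactMap { $0 as? ARPlaneAnchor }  // environment info
        _ = (image, cameraPose, planes)
    }
}
```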
Orientation Tracking
- Features
  - Tracks rotation only (3 degrees of freedom)
  - Spherical virtual environment
  - Augmentation of far-away objects
  - Not suited for augmenting the physical world from different viewpoints
- Process
  - Only uses the rotation data from Core Motion, which applies sensor fusion to the motion sensor data
  - Motion data is provided at a higher frequency than the camera image
  - Once the camera image is available, orientation tracking takes the latest motion data from Core Motion
  - Returns both rotation and image data in the resulting ARFrame
  - The camera feed is not processed in orientation tracking
- Results (see the sketch after this list)
  - Results are returned in an ARCamera object provided by the ARFrame
  - ARCamera class
    - transform property: with orientation tracking, it only contains the rotation data of the tracked physical camera (coordinates in world space)
    - eulerAngles property: the rotation as Euler angles
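A minimal sketch of running orientation-only tracking and reading the rotation from the resulting ARCamera:

```swift
import ARKit

// Run orientation tracking (3DoF: rotation only).
let session = ARSession()
session.run(AROrientationTrackingConfiguration())

// Read the results from the current frame's ARCamera.
if let camera = session.currentFrame?.camera {
    let transform = camera.transform  // 4x4 matrix; only the rotation part changes in 3DoF tracking
    let angles = camera.eulerAngles   // rotation as (pitch, yaw, roll) in radians
    _ = (transform, angles)
}
```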
World Tracking
- Features
  - World Tracking tracks the camera's viewing orientation, and also the change in its position in the physical environment.
- Motion Sensor
  - Accelerometer: translation
  - Gyroscope: orientation
  - Provides information in correct meter scale
- Inertial Odometry
  - Uses the accelerometer and gyroscope data to compute the change in the device's position and orientation, at a high frequency.
- Visual Odometry
  - Processes the camera images to compute the change in the camera's position and orientation.
- Visual Inertial Odometry
  - Fuses the inertial and visual odometry results into a single tracking result.
  - With Visual Inertial Odometry, World Tracking can skip the computer vision processing for some frames, while still keeping tracking efficient and responsive.
  - Within the computer vision process, the device extracts regions from every image of the same environment.
  - Those regions are called features. Features are then matched between multiple images over the camera stream based on their similarity and appearance.
  - ARKit computes depth information from the parallax between the features and creates a 3D map (see the note after this list).
  - For this reconstruction to succeed, the camera position must have changed by a translation to provide enough parallax.
  - Sideways movement is recommended; pure rotation does not give enough information here.
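As a rough intuition for why translation is required (a standard stereo-parallax relation, not spelled out in the session): for a sideways camera translation $b$ (the baseline) and a feature observed with image disparity $d$ by a camera with focal length $f$, the recovered depth is approximately

$$ z \approx \frac{f\,b}{d} $$

With pure rotation, $b = 0$, so there is no disparity to triangulate from and no depth can be recovered.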
World Map
- The origin of the first camera among the triangulated frames becomes the origin of the world map. The world origin is gravity-aligned.
- The camera positions and orientations of subsequent frames are computed in the world map relative to the world origin.
- ARAnchors are needed to place virtual content correctly in an AR session.
ARAnchors
- ARAnchors are reference points within the world map, relative to the world coordinate system.
- World Tracking might update the anchors during tracking, which means that all the virtual content assigned to an anchor will be updated and stay correctly augmented in the view (see the sketch after this list).
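A minimal sketch of adding an anchor; placing it one meter in front of the current camera is a hypothetical choice for illustration:

```swift
import ARKit

// Place an ARAnchor one meter in front of the current camera pose.
// Content attached to it stays correctly placed even if World Tracking
// later refines the anchor's transform.
func placeAnchor(in session: ARSession) {
    guard let camera = session.currentFrame?.camera else { return }
    var translation = matrix_identity_float4x4
    translation.columns.3.z = -1.0  // 1 m along the camera's viewing direction
    let transform = camera.transform * translation
    session.add(anchor: ARAnchor(transform: transform))
}
```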
Tracking
- New features are extracted, matched, and triangulated, which extends the world map. In other words, ARKit is learning the environment.
- This allows the continuous computation of tracking updates for the current camera's position and orientation.
- Small offsets and small errors could become noticeable when accumulated over time.
- To solve this, when the device comes back to a similar view that was already explored before, ARKit can perform another optimization step.
Visual Inertial SLAM System
- World Tracking checks how well the tracking information and world map of the current view align with past views, then performs an optimization step to align the current information and current world map with the real physical environment.
- The SLAM system's optimization step is executed when the camera comes back to its initial position.
- During the SLAM optimization step, ARAnchors are also updated (see the sketch below).
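A minimal sketch of observing those anchor updates through the standard ARSessionDelegate callback, so attached virtual content can be repositioned; the AnchorObserver class name is hypothetical:

```swift
import ARKit

// Hypothetical observer; ARKit reports refined anchor transforms here,
// e.g. after a SLAM optimization step.
class AnchorObserver: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for anchor in anchors {
            print("Anchor \(anchor.identifier) updated to \(anchor.transform)")
        }
    }
}
```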
World Tracking API
- To run world tracking, configure the ARSession with an ARWorldTrackingConfiguration.
- Results are returned in an ARCamera object of the ARFrame (see the sketch after this list).
- ARCamera
  - transform
    - Contains the rotation and translation data of the tracked camera.
  - trackingState
    - Information about the tracking quality.
  - trackingStateReason
    - The reason why the tracking state is limited.
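A minimal sketch of the API above; note that in Swift the tracking-state reason is carried by the .limited case rather than a separate property:

```swift
import ARKit

// Run world tracking (6DoF: rotation and translation).
let session = ARSession()
session.run(ARWorldTrackingConfiguration())

if let camera = session.currentFrame?.camera {
    print(camera.transform)                   // rotation + translation of the tracked camera
    switch camera.trackingState {
    case .normal:
        print("Tracking is healthy")
    case .limited(let reason):
        print("Tracking limited: \(reason)")  // e.g. .insufficientFeatures, .excessiveMotion, .initializing
    case .notAvailable:
        print("Tracking has not started yet")
    }
}
```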
World Tracking Quality
- Uninterrupted sensor data
  - If the constant stream of camera images and sensor data is interrupted for too long, tracking will become limited.
- Textured environments
  - Enough visual complexity in the environment is important.
  - If the environment is too dark, or you are looking at a plain white wall, tracking quality might be poor.
- Static scenes
  - If too much of what the camera sees is moving, the visual data won't correspond with the motion data, which might result in drift.
  - The device should not be on a moving platform like a bus or an elevator.
  - There, the motion sensor would actually sense motion, while visually the environment has not changed.
Tracking Quality Notifications
- Applied machine learning
  - Used to train a classifier that tells how well tracking performs
  - Trained on annotations like:
    - Number of visible features tracked in the image
    - Current velocity of the device
- Tracking State
  - Simplified information about the health of the tracking state:
    - Normal: healthy state
    - Limited: limited state; comes along with a reason, such as insufficient features, excessive motion, or currently being in the initialization phase.
    - Not available: tracking has not started yet.
  - Whenever the tracking state changes, a delegate method is called (see the sketch after this list).
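A minimal sketch of that delegate callback, from the standard ARSessionObserver protocol; the TrackingStateWatcher class name is hypothetical:

```swift
import ARKit

// Hypothetical delegate; called whenever the tracking state changes.
class TrackingStateWatcher: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
        switch camera.trackingState {
        case .normal:              print("Tracking is healthy")
        case .limited(let reason): print("Tracking limited: \(reason)")
        case .notAvailable:        print("Tracking not available yet")
        }
    }
}
```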