Depth Estimation with Stereo Pair - GreycLab/gmic-community GitHub Wiki

Conceptual Overview

This stub is about retrieving depth information from a pair of images taken from different points of view. An aligned stereogram shows only displacement from parallax, which is inversely proportional to distance, but in general two photographs may also be offset in other ways that require correction.
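
As a rough illustration of the inverse relation between parallax displacement and distance, here is a minimal Python sketch. It assumes an idealised, already aligned (rectified) pinhole stereo pair. The function name and the `focal_px` and `baseline_m` parameters are hypothetical and are not taken from the original discussion.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Estimate distance (metres) from horizontal parallax (pixels).

    Assumes an idealised rectified pinhole stereo pair:
    depth = focal length (px) * camera baseline (m) / disparity (px).
    """
    if disparity_px <= 0:
        # No measurable parallax: treat the point as infinitely far away.
        return float("inf")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical numbers: 1000 px focal length, cameras 10 cm apart.
    for d in (10, 20, 40):
        print(d, "px ->", depth_from_disparity(d, 1000, 0.1), "m")
```

Under these assumptions, doubling the displacement halves the estimated distance, which is why parallax is most informative for near and mid-field subjects.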

This task is related to the "optical-flow" techniques used for in-betweening frames of a motion sequence. Here we assume that just the camera, and not the subject, has moved.

The inverse is also possible: to create a clean stereo pair from a single image, you would apply warps inversely proportional to a depth map. The resulting pair could then be processed to produce a single red-green anaglyph, for instance - see Tom Keil's filters.
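
The inverse process can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by any existing filter: images are lists of rows of `(r, g, b)` tuples, `depth` holds positive distances, the warp is a crude backward lookup that uses the depth at the output pixel, and the names `shift_row`, `stereo_pair`, `anaglyph` and the `strength` parameter are hypothetical. The sign convention for the left and right views is also an assumption.

```python
def shift_row(row, depth_row, strength):
    """Backward-warp one row horizontally.

    Each output pixel samples the input at an offset inversely
    proportional to its depth, so nearer pixels move further.
    """
    n = len(row)
    out = []
    for x in range(n):
        offset = int(round(strength / depth_row[x]))
        src = min(max(x + offset, 0), n - 1)  # clamp at the image border
        out.append(row[src])
    return out


def stereo_pair(image, depth, strength):
    """Build two views by warping the image half the strength each way."""
    left = [shift_row(r, d, +strength / 2) for r, d in zip(image, depth)]
    right = [shift_row(r, d, -strength / 2) for r, d in zip(image, depth)]
    return left, right


def anaglyph(left, right):
    """Red channel from the left view, green channel from the right view."""
    return [[(l[0], r[1], 0) for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]


if __name__ == "__main__":
    image = [[(x * 10, x * 10, x * 10) for x in range(6)]]
    depth = [[2.0] * 6]
    left, right = stereo_pair(image, depth, strength=4.0)
    print(anaglyph(left, right))
```

A production version would need to handle occlusions and the holes that open up behind foreground objects, which this sketch ignores.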

Human depth perception combines many cues to place objects in 3D space:

parallax between the eyes
knowledge of size / relative scale
contrast / colour in far distance
perspective and position on a plane
surface angle implied by global lighting
focus / depth of field

The discussion referenced here focused on mid-field distance estimation using parallax.

Related GMIC Commands

-displacement[dest_image] [source_image], smoothness 0.1, precision 5, scales auto, max iteration 10000, is_backward true

Optimises a 2D warp to minimise an "energy" between one input image and a warped version of the other.

$$E(U) = \int_\Omega \left[ (I_1(X) - I_2(X+U))^2 + \alpha |\nabla U|^2 \right] dX$$

Related: the Horn & Schunck optical-flow method.

The formula has two terms: first, a total 'fitting error' between the values of one image and the warped other image; second, a 'smoothness constraint' on the warp, weighted by α.
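
To make the two terms concrete, here is a minimal discrete 1-D analogue of E(U) in Python. It is a sketch for illustration only and not the G'MIC implementation: the function name `energy`, the linear interpolation, the clamping at the borders and the forward-difference gradient are all assumptions.

```python
def energy(i1, i2, u, alpha):
    """Discrete 1-D analogue of E(U).

    i1, i2 : lists of sample values (same length, at least 2)
    u      : per-sample displacement, so i1[x] is compared with i2 at x + u[x]
    alpha  : weight of the smoothness term
    """
    n = len(i1)

    def sample(signal, pos):
        # Linear interpolation, clamped to the valid range.
        pos = min(max(pos, 0.0), n - 1.0)
        k = min(int(pos), n - 2)
        t = pos - k
        return (1 - t) * signal[k] + t * signal[k + 1]

    # Fitting error: squared difference between i1 and the warped i2.
    fit = sum((i1[x] - sample(i2, x + u[x])) ** 2 for x in range(n))
    # Smoothness: squared forward difference of the displacement.
    smooth = sum((u[x + 1] - u[x]) ** 2 for x in range(n - 1))
    return fit + alpha * smooth


if __name__ == "__main__":
    i1 = [0, 0, 1, 0, 0, 0]
    i2 = [0, 0, 0, 1, 0, 0]  # i1 moved one sample to the right
    print(energy(i1, i2, [0.0] * 6, 0.1))  # no warp: fitting error remains
    print(energy(i1, i2, [1.0] * 6, 0.1))  # constant warp of 1: perfect fit
```

A constant displacement costs nothing in the smoothness term, so the optimiser is free to pick the shift that best explains the data. Only variations in the displacement are penalised.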

The algorithm solves the PDE derived from E(U) via the Euler-Lagrange equations. It refines the warp incrementally, starting from a low-resolution (reduced-size) version of the images and working up to full scale.
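
For reference, the PDE can be written out as follows. This is the textbook derivation from the energy as stated above, with the Laplacian applied to each component of U. Whether the G'MIC implementation discretises exactly this form is an assumption, and the artificial time variable t is introduced here only to describe the iteration.

```latex
\begin{align}
% Integrand of E(U)
L(X, U, \nabla U) &= \bigl(I_1(X) - I_2(X+U)\bigr)^2 + \alpha\,|\nabla U|^2 \\
% Euler-Lagrange condition: dL/dU - div(dL/d(grad U)) = 0
0 &= -2\,\bigl(I_1(X) - I_2(X+U)\bigr)\,\nabla I_2(X+U) - 2\alpha\,\Delta U \\
% Equivalent gradient-descent evolution, iterated until U stops changing
\frac{\partial U}{\partial t} &= \bigl(I_1(X) - I_2(X+U)\bigr)\,\nabla I_2(X+U) + \alpha\,\Delta U
\end{align}
```

The first term pulls each point of the warp along the image gradient to reduce the fitting error, while the α ΔU term diffuses the displacement field. That diffusion is consistent with the listed limitation that discontinuities at edges are not preserved.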

advantages	disadvantages
smooth estimated displacement using texture	low sensitivity to cues from shading
anisotropic smoothness	point based, not line based
sub-pixel resolution	requires images to have similar intensities
works well with smaller displacements	doesn't allow for discontinuities at edges
	confused by specular highlights

-phase_correlation[dest_image,source_image]

Estimates a single translation vector (x, y) for the whole image by detecting the dominant frequency and direction of the phase difference between the Fourier transforms of the two images.
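
The idea can be sketched in pure Python as follows. This is an illustrative toy and not the G'MIC implementation: it uses a naive O(N^4) DFT that is only practical for tiny images, assumes the shift is circular (wraps around), and recovers whole-pixel shifts only. The names `dft2` and `phase_correlation` and the sign convention of the returned vector are assumptions.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform of a list of rows."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = 2 * cmath.pi * (u * x / w + v * y / h)
                    acc += img[y][x] * cmath.exp(sign * 1j * angle)
            out[v][u] = acc / (h * w) if inverse else acc
    return out


def phase_correlation(a, b):
    """Return (dx, dy) such that b is roughly a shifted by (dx, dy)."""
    h, w = len(a), len(a[0])
    fa, fb = dft2(a), dft2(b)
    cross = [[0j] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            c = fb[v][u] * fa[v][u].conjugate()
            mag = abs(c)
            # Keep only the phase difference; discard the magnitude.
            cross[v][u] = c / mag if mag > 1e-12 else 0j
    corr = dft2(cross, inverse=True)
    # The correlation surface peaks at the translation.
    _, px, py = max((corr[y][x].real, x, y)
                    for y in range(h) for x in range(w))
    # Map peaks in the upper half of the range to negative shifts.
    dx = px if px <= w // 2 else px - w
    dy = py if py <= h // 2 else py - h
    return dx, dy


if __name__ == "__main__":
    a = [[0.0] * 8 for _ in range(8)]
    a[2][3], a[2][4], a[3][3] = 1.0, 0.5, 0.25
    # b is a moved 2 pixels right and 1 pixel up, with wrap-around.
    b = [[a[(y + 1) % 8][(x - 2) % 8] for x in range(8)] for y in range(8)]
    print(phase_correlation(a, b))
```

Because the magnitudes are normalised away, a uniform brightness or contrast change between the two images leaves the peak position unchanged. To obtain a depth map rather than one vector, the same estimate would have to be repeated on separate patches of the image.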

advantages	disadvantages
sensitive to edges	one vector for whole image
sub-pixel resolution	need to break images into discrete patches
robust to intensity differences	assumes no rotational or trapezoidal distortion

References

http://www.flickr.com/groups/gmic/discuss/72157626199490827