Depth Estimation with Stereo Pair - GreycLab/gmic-community GitHub Wiki

Conceptual Overview

This stub is about retrieving depth information from a pair of images taken from different points of view. In an aligned stereo pair, the only displacement between the two views comes from parallax, and that displacement is inversely proportional to distance. In general, however, two photographs may also be offset in other ways (rotation, vertical shift, perspective) that must be corrected first.
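For an aligned pair taken with an ideal pinhole camera, that inverse relation can be written Z = f·B/d, where d is the disparity (parallax) in pixels, f the focal length in pixels and B the baseline between the two camera positions. The sketch below assumes exactly that idealised model; the function name and the rig values in the usage note are illustrative, not part of G'MIC.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline: float) -> float:
    """Distance Z = f * B / d for an ideal, aligned (rectified) pinhole stereo rig.

    The result is in the same unit as `baseline`. Zero disparity would mean a
    point at infinity, so only positive disparities are accepted.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px
```

With a hypothetical 1000 px focal length and a 0.065 m baseline, a disparity of 65 px corresponds to 1 m and 6.5 px to 10 m: dividing the parallax by ten multiplies the distance by ten, which is why distant objects quickly become hard to resolve.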

This task is related to "optical-flow" techniques used for in-betweening frames of a motion sequence. Here we assume that just the camera, and not the subject, has moved.

The inverse is also possible: to create a stereo pair from a single image, apply horizontal warps inversely proportional to the distances in a depth map, so that near objects shift more than far ones. The two views could then be combined into a single red-green anaglyph, for instance; see Tom Keil's filters.
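A minimal sketch of that inverse process, assuming grayscale images stored as nested Python lists and a depth map holding positive distances. `synthesize_view`, `red_green_anaglyph` and the `strength` parameter are hypothetical names chosen for illustration; this is not how Tom Keil's filters or any G'MIC command are implemented. Each pixel is shifted left by `strength / depth`, rounded to whole pixels, and the holes that open up behind near objects are filled naively.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale: one list of intensities per row


def synthesize_view(image: Image, depth: Image, strength: float) -> Image:
    """Render the view from a camera shifted to the right of the original one.

    Each pixel moves left by round(strength / depth): nearer pixels move further.
    When two pixels land on the same spot, the nearer one (smaller depth) wins.
    Holes opened by disocclusion take the last valid pixel to their left
    (or the row's first original pixel at the left border): a naive fill.
    """
    out: Image = []
    for img_row, depth_row in zip(image, depth):
        width = len(img_row)
        row: List[Optional[float]] = [None] * width
        zbuf = [float("inf")] * width  # depth of the pixel currently drawn at each x
        for x, (value, z) in enumerate(zip(img_row, depth_row)):
            tx = x - int(round(strength / z))
            if 0 <= tx < width and z < zbuf[tx]:
                row[tx] = value
                zbuf[tx] = z
        last = img_row[0]
        filled: List[float] = []
        for v in row:
            if v is not None:
                last = v
            filled.append(last)
        out.append(filled)
    return out


def red_green_anaglyph(left: Image, right: Image) -> List[List[Tuple[float, float, float]]]:
    """Put the left view in the red channel and the right view in green (blue = 0)."""
    return [
        [(lv, rv, 0.0) for lv, rv in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]
```

For example, `red_green_anaglyph(image, synthesize_view(image, depth, 2.0))` treats the original image as the left view. A more careful implementation would interpolate sub-pixel shifts and inpaint the disoccluded holes.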

Human vision combines many cues to perceive objects in 3D space, only the first of which is truly stereoscopic:

  • parallax between the eyes
  • knowledge of size / relative scale
  • reduced contrast / shifted colour at far distances
  • perspective and position on a plane
  • surface angle implied by shading under the global lighting
  • focus / depth of field

The discussion referenced here focused on mid-field distance estimation using parallax.

Related G'MIC Commands

-displacement[dest_image] [source_image], with optional arguments (defaults in parentheses): smoothness (0.1), precision (5), number of scales (auto), maximum iterations (10000), is_backward (true)

Estimates a 2D warp (displacement field) that minimises an "energy" measuring the mismatch between one input image and a warped version of the other.

E(U) = \int_\Omega \left[ \left( I_1(X) - I_2(X+U) \right)^2 + \alpha \, |\nabla U|^2 \right] dX

This is closely related to the Horn & Schunck optical-flow method.

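The integrand has two terms: first, a 'fitting error' that penalises intensity differences between I_1 and the warped I_2; second, a 'smoothness constraint' that penalises rapid variation of the warp U, weighted by α.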
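To make the two terms concrete, here is a sketch of a discretised E(U) for the simplest case: 1D signals (one image row) with a purely horizontal displacement, as for an aligned stereo pair. This is a simplification of the 2D functional made for illustration, not G'MIC's code: I_2(X+U) is sampled by linear interpolation with clamped borders, ∇U becomes a forward difference, and the function names are hypothetical.

```python
from typing import Sequence


def sample_linear(signal: Sequence[float], x: float) -> float:
    """Linearly interpolate `signal` at real position x, clamping x to the valid range.

    Requires len(signal) >= 2.
    """
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(x), len(signal) - 2)  # left neighbour, kept inside the array
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]


def energy(i1: Sequence[float], i2: Sequence[float], u: Sequence[float], alpha: float) -> float:
    """Discrete E(U) = sum (I1(x) - I2(x + U(x)))^2 + alpha * sum (U(x+1) - U(x))^2."""
    fitting = sum((i1[x] - sample_linear(i2, x + u[x])) ** 2 for x in range(len(i1)))
    smoothness = sum((u[x + 1] - u[x]) ** 2 for x in range(len(u) - 1))
    return fitting + alpha * smoothness
```

A lower energy means a better compromise between matching intensities and keeping the warp smooth; raising `alpha` favours smoother displacement fields at the cost of a worse fit.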

The algorithm solves the PDE given by the Euler-Lagrange equation of E(U). It works coarse-to-fine: the warp is first estimated on reduced-size, low-resolution images and then progressively refined while zooming back up to full scale.
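For reference, the equations involved can be written out as follows. Setting the variation of E(U) to zero gives the Euler-Lagrange equation below (ΔU is the Laplacian, applied to each component of U). The gradient-descent flow is one standard way of solving it, with the constant factor 2 absorbed into the time step; the factor-2 zoom between scales is an assumption made for illustration, and the exact scheme used inside G'MIC may differ.

```latex
% Euler-Lagrange equation of E(U) (necessary condition for a minimum):
\left( I_1(X) - I_2(X+U) \right) \nabla I_2(X+U) + \alpha \, \Delta U = 0

% Gradient-descent flow, evolved in t until it reaches a steady state:
\frac{\partial U}{\partial t} = \left( I_1(X) - I_2(X+U) \right) \nabla I_2(X+U) + \alpha \, \Delta U

% Coarse-to-fine initialisation, assuming a factor-2 zoom from scale s+1 (coarser) to scale s:
U^{(s)}_{t=0}(X) = 2 \, U^{(s+1)}\!\left( X / 2 \right)
```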

Advantages:
  • smooth estimated displacement, using texture
  • anisotropic smoothness
  • sub-pixel resolution
  • works well with smaller displacements
Disadvantages:
  • low sensitivity to cues from shading
  • point-based, not line-based
  • confused by specular highlights
  • requires the two images to have similar intensities
  • doesn't allow for discontinuities at edges

-phase_correlation[dest_image,source_image]

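Estimates a single translation vector (x, y) between the two images by detecting the dominant frequency and direction of the phase difference between their Fourier transforms. Equivalently, the inverse transform of the normalised cross-power spectrum has a sharp peak at the offset.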
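The principle can be sketched in 1D with only the Python standard library. This is an illustrative sketch, not G'MIC's implementation: it uses a naive O(n²) DFT instead of an FFT, recovers only integer circular shifts (no sub-pixel peak interpolation), and the function names are hypothetical. In 2D the same steps apply with a 2D transform, and the peak position gives the (x, y) vector.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (unnormalised forward, 1/n on the inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * f / n) for k in range(n))
        for f in range(n)
    ]
    return [v / n for v in out] if inverse else out


def phase_correlation_1d(reference: Sequence[float], moved: Sequence[float]) -> int:
    """Return the shift s such that moved[i] == reference[(i - s) % n] (circularly)."""
    fa, fb = dft(reference), dft(moved)
    cross = []
    for a, b in zip(fa, fb):
        c = b * a.conjugate()
        mag = abs(c)
        # Keep only the phase difference; bins with no energy are dropped.
        cross.append(c / mag if mag > 1e-12 else 0j)
    corr = dft(cross, inverse=True)  # ideally a single spike at the shift
    n = len(corr)
    peak = max(range(n), key=lambda i: corr[i].real)
    return peak if peak <= n // 2 else peak - n  # map to a signed shift
```

Because only the phase of each frequency is kept, a global change of brightness or contrast between the two signals leaves the peak in place, which is why the method is robust to intensity differences.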

Advantages:
  • sensitive to edges
  • sub-pixel resolution
  • robust to intensity differences
Disadvantages:
  • one vector for the whole image
  • images must be broken into discrete patches to get local estimates
  • assumes no rotational or trapezoidal distortion

References

http://www.flickr.com/groups/gmic/discuss/72157626199490827