Elastic Bunch Graph Matching - davidar/scholarpedia GitHub Wiki

Elastic Bunch Graph Matching is an algorithm in computer vision for recognizing objects or object classes in an image based on a graph representation extracted from other images. It has been prominently used in face recognition and analysis but also for gestures and other object classes.

Matching in frontal pose Matching at -45^\circ

Table of Contents

Introduction

Elastic Graph Matching (EGM) is a biologically inspired algorithm for object recognition in the field of computer vision. It draws its biological inspiration from two sources: (i) The visual features used are based on Gabor wavelets, which have been found to be a good model of early visual processing in the brain, more precisely simple cells in primary visual cortex. (ii) The matching algorithm itself is an algorithmic version of dynamic link matching (DLM) (Wiskott and von der Malsburg, 1996), which is a model of invariant object recognition in the brain.

Visual objects in EGM are represented as labeled graphs, where the nodes represent local textures based on Gabor wavelets and the edges represent distances between the node locations on an image. Thus an image of an object is represented as a collection of local textures in a certain spatial arrangement. If a new object in an image shall be recognized, the labeled graphs of stored objects, so-called model graphs, are matched onto the image. For each model graph the locations for the nodes in the image are optimized such that local texture of the image fits the local texture of the model and distances between the locations fit the distances between the nodes of the model. The model graph with the best fit constitutes the recognized object, and with its node locations in the image an image graph can be created.

Elastic Bunch Graph Matching (EBGM) is an extension to elastic graph matching for object classes with a common structure, such as faces in identical pose. All instances of such a class are represented by the same type of graph. From these graphs a bunch graph of same structure is created, with the nodes representing local textures of any object in the class, e.g. all variants of a left eye, and the edges represent average distances between the node locations, e.g. the average distance between the two eyes. This permits taking advantage of the combinatorics of the local textures to represent instances of the object class not seen before. For instance, the textures of the eyes could be taken from one face and the textures of the mouth from another face to represent a new face that shares features with the two stored faces. Thus, a bunch graph is an abstraction for representing object classes rather than individual objects.

EBGM can only be applied to objects with a common structure, such as faces in frontal pose, sharing a common set of landmarks like the tip of the nose or the corner of an eye. For the recognition of arbitrary objects, in the absence of landmarks, the graphs are required to be dynamic with respect to both shape and attributed features. To this end, Westphal and Würtz (2009) have proposed a graph dynamics that lets generic object representations, model or bunch graphs, emerge from a collection of arbitrary objects. The idea is to extract typical local texture arrangements from the objects and provide the rules to compose them as needed to represent new objects.

Algorithm

Gabor Wavelets

Cosine Gabor wavelet, which is the real part of the complex wavelet. Sine Gabor wavelet, which is the imaginary part of the complex wavelet. The most often used elementary features in EBGM are based on Gabor wavelets, having the shape of a (co)sine wave multiplied with a Gaussian envelope function (<figref>SpaceKernelRealPart.jpg</figref>, <figref>SpaceKernelImaginaryPart.jpg</figref>), written as \[ \psi_{\vec{k}}(\vec{x})]=

 Move all nodes locally to adapt to local distortions or differences.''' Finally one can randomly move the nodes locally to find an even better match.  This should be done for all nodes repeatedly, e.g. in a random order, because one node cannot be optimally positioned if the others are still misplaced.

The result of such matching is an image graph, which could be integrated into the gallery as a model graph, and a similarity value between the model graph matched to the image and the resulting image graph, which can be used to decide whether the model object is in the image or not.

Bunch Graphs

Schema of a bunch graph. The jets of many model graphs are combined. During matching, different jets become the local experts, here shaded in gray.

Elastic graph matching solves the problem of finding a known object in an image by matching a model graph to the image. However, it would be advantageous if one could also find an unknown object in an image and create an image graph for it in a reproducible manner. This would have at least two major advantages: Firstly, model graphs for new objects could be created without manual assistance. Secondly, there would be no need to match every single model in the gallery to the image. An image graph could be generated independently of a concrete model and then only the graph comparison would be done for each model, which would save a lot of computation.

Such a generic creation of an image graph is difficult for two reasons: Firstly, one has to segment the object from the background, and secondly, one has to decide where to place the nodes on the object. However, when processing an object class where all objects have a common structure, much is known about a new object of that class, even if we have not seen it before. Faces are a prominent example. Even if we have not seen a particular face before, we know its structure and can easily find the eyes, nose, etc., because we have seen many other faces before.

The structural knowledge of faces and their variants can be represented in a so-called bunch graph <math>G^B</math> (Wiskott, 1997, Wiskott et al, 1997) (<figref>BunchGraph.png</figref>). Assume model graphs of identical structure are given for 100 frontal views of different faces. Since the graphs have identical structure, one can easily calculate the average distance between two nodes across all 100 model graphs. This yields the labels of the edges in the bunch graph. The nodes are simply labeled with all hundred jets from the different faces at one node. Thus, a bunch graph differs from a normal object graph in that each node is labeled with not just one but many jets, and the edges are labeled with average distances rather than distances from one concrete face.

When comparing a bunch graph with a normal graph, one takes advantage of the combinatorics and always picks the best fitting jet at each node independently of the other nodes, realized by the <math>\max</math> operation in the bunch graph similarity function \eqref{similarityBunchGraph}. \[\label{similarityBunchGraph} S_{BG}(G^I,G^B)] Thus, the eye jets might come from one person, the nose jet from another one, and the mouth jets from a third one. This process resembles the process by which police creates an identikit picture for an unknown person they are searching for. The best fitting jet at one node is referred to as the local expert.

Elastic Bunch Graph Matching

Matching a bunch graph to an image (<figref>my_pose_45-animated.gif</figref>, <figref>my_pose_-45-animated.gif</figref>, <figref>my_pose_-45-animated.gif</figref>) to create an image graph works exactly as elastic graph matching except that the local expert, as determined by the <math>\max</math> operation, is used for each node and might actually change during matching as the position of the node changes (Wiskott, 1997,Wiskott et al, 1997). The result of the matching is not only the image graph but also the identity of the local experts, which can be used to further analyze the object, see below.

Applications

Face Recognition

Once we have the means to generate and compare image graphs, recognition of faces in identical pose is relatively straight-forward. Matters become more complicated when trying to recognize faces across different poses.

Recognition in Identical Pose

Assume we are given 1000 facial images of identical pose, e.g. all looking straight into the camera, and correctly labeled with the name of the person they show. This constitutes the model gallery. For face recognition we would proceed as follows:

  • Step 1: Building a face graph. The first step to bootstrap the system is to define the graph structure for the given pose. Thus, we take the first image and manually define node locations on the face that are easy to localize, such as the corners of the eyes or mouth, the center of the eyes, the tip of the noise, some points on the outline etc. We also define edges between the nodes. This constitutes our first face graph.
  • Step 2: Building a face bunch graph. The single face graph defined above can be viewed as a bunch graph with just one instance in it. It can be matched onto the second face image, but if the first two face images are not very similar, the match is of poor quality. For instance, the tip-of-the-nose node might by placed at the cheek, so we need to move the node onto the tip of the nose by hand. After some such manual correction the graph is acceptable and constitutes the second instance in the bunch graph. The bunch graph with two instances is then matched onto the third image, and after some manual correction we have a third instance for the bunch graph. By repeating this process, the bunch graph grows, and as it grows the match onto new images gets more and more reliable. If we are satisfied with the quality of the matches and only little manual correction is needed, we are done with building the bunch graph. Let say this happens after having processed the first 100 images.
  • Step 3: Building the model gallery of graphs. Since we now have a bunch graph that provides sufficient quality in finding the node locations in a new face, we can process the remaining 900 images fully automatically. To avoid the distinction between manually corrected and fully automatically generated model graphs, one can create the bunch graph on an extra set of 100 images distinct from the 1000 model images. Then all 1000 model graphs can be created automatically. We are now in the position to perform face recognition on a new probe image.
  • Step 4: Building the probe graph. Assume we are given a new image and shall find the depicted person in the gallery. First we need to create a graph for the probe image. This process works exactly as for the model images, just that we use the probe image.
  • Step 5: Comparison with all model graphs. The image graph is compared with all model graphs, resulting in 1000 similarity values. These form the basis of the recognition decision. Notice that this does not require EBGM anymore, only the graphs are compared according to the similarity function \eqref{similarityGraphs}.
  • Step 6: Recognition. For recognition it is obvious that the model graph with the highest similarity with the image graph is the candidate to be recognized. However, if the best similarity value is relatively low, the system might decide that the person on the probe image is not in the model gallery at all; and if there are more than one very high similarity values, the system might decide that the person is most likely in the model gallery but that there are several possible candidates. Only if the highest similarity is high and the next one is low can the system recognize the face on the probe image with high reliability.
This system has achieved 98% correct recognition rates on a gallery of 250 frontal view faces, 57% on half profiles, and 84% on profiles (Wiskott et al, 1997). Half profiles were presumably most difficult, because the pose was least well controlled.

Recognition Across Different Poses

EBGM is robust to small pose changes. Larger differences in pose modify the local features too much for corresponding points to be found, some features are occluded in some poses. This difficulty can be overcome by creating bunch graphs for different poses and matching all of them to the probe image. As small pose differences are tolerated by the matching process, only coarse sampling of pose is necessary. Only if the poses matched are roughly equal, high similarity values and good correspondences will result.

On the other hand, a single frontal image per person should contain enough information for successful recognition. It is therefore desirable to reduce the gallery to single pose and learn the pose variation from examples. Müller et al (2013) have done this by setting aside a model gallery of graphs of <math>N_M</math> people known in all poses. The actual gallery of all people only contains frontal poses. Each gallery graph can be matched to the models in frontal pose. From the matching result <math>G^0</math> the rank list <math>\gamma^0</math>of jet similarity to all models is calculated for each node. These define the similarity relations of the gallery image to the frontal pose images. For a probe image, EBGM can be applied to the model gallery, with the best match belonging to the correct pose but not necessarily to the correct identity, because that identity need not be part of the model gallery. For the found pose again similarity rank lists <math>\pi^v</math>of length <math>N_M</math> can be calculated for each graph location. These are a representation of the probe image in terms of similarities to the various model identities in the appropriate pose. While no direct similarity between the probe image and the gallery images can be calculated, it can be expected that the rank lists <math>\pi_v</math> and <math>\gamma_v</math> will be similar for the same person. The underlying assumption is that faces that are similar in one pose will also be similar in other poses. With the help of the rank list similarity function \[ S_{\text{rank}}(\pi,\gamma_g)] which is averaged over all graph nodes present in both pose graphs

\[
&#61;&#61;&#61;&#61; Recognition in different illuminations &#61;&#61;&#61;&#61;


The same method can be applied to recognition in different illumination
situations. With a model gallery of 100 identities from the CAS&#45;PEAL 
database[[#cas-peal-smc| (Gao et al., 2008)]] and 9 different illuminations
the recognition rate for standard jet similarity was 82.7%, with combined
similarities 88.9%.


&#61;&#61;&#61; Face Analysis &#61;&#61;&#61;



&#61;&#61;&#61;&#61; Facial Features &#61;&#61;&#61;&#61;


[[File:Wiskott-1997-PatternRecognition-Fig02.png|300px|thumb|right|Face analysis with EBGM. A face bunch graph (middle) is matched onto a new face image (left). For each node (small black dots) there is a local expert in the bunch graph (pointed at by the arrow heads), from which a so called phantom face can be synthesized and abstract features such as gender, glasses, or beard can be inferred. In this case the phantom face looks Caucasian even though the original face is half Asian, because the bunch graph only includes Caucasian faces. Reproduced from [[#phantomfaces|(Wiskott, 1997)]] with permission from &lt;a href&#61;&quot;http&#58;//www.elsevier.com/journal&#45;authors/author&#45;rights&#45;and&#45;responsibilities&quot; target&#61;&quot;_blank&quot;&gt;Elsevier&lt;/a&gt;.&#93;&#93;

As a result, EBGM yields 
&#42; an overall similarity value, which can be used to decide whether the image actually contains an object of the given class, e.g. a face,
&#42; locations of the nodes in the image, which form the basis for creating an image graph used for storage or recognition, and
&#42; for any node the best fitting instance in the bunch graph, referred to as local expert.
For faces, the latter can be used for an analysis of some abstract properties of the face.  For instance, if most local experts come from female faces, it is likely that the face on the new image is female as well&#59; if most local experts in the lower half of the graph come from bearded faces, it is likely that the person has a beard&#59; if most local experts in the upper half of the graph come from faces with glasses, it is likely that the person wears glasses (&amp;lt&#59;figref&amp;gt&#59;Wiskott&amp;&#35;45&#59;1997&amp;&#35;45&#59;PatternRecognition&amp;&#35;45&#59;Fig02.png&amp;lt&#59;/figref&amp;gt&#59;). On a mixed set of 111 faces this method has achieved 92% correct discrimination between male and female faces, 94% correct detection of a beard vs no beard, and 96% correct detection of glasses vs no glasses [[#phantomfaces|(Wiskott, 1997)]].


&#61;&#61;&#61;&#61; Medical Analysis &#61;&#61;&#61;&#61;


The image graph found by EBGM in a facial image is a representation of that image. It can be converted to a feature vector by simply listing all
vertex and edge labels. This vector can be used as input for classification systems. 
Such classifiers can provide suggestions for
medical diagnoses for ailments which influence the appearance of the face. 
[[#syndrome-clin|Böhringer et al. (2011)]] have used this approach to distinguish 
among eleven genetically caused syndromes. Five of them could be identified in
100% of the cases, the overall correct recognition rate was 60%. The classifier
used was Local Discriminant Analysis after Principal Component Analysis and dimension 
reduction. These high classification rates
could, however, only be  achieved with some manual corrections on the placement 
of the graph nodes on the image. Finding good landmark locations automatically remains
a research challenge.
In a further study [[#acromegaly-2011|(Schneider et al, 2011)]] the method was applied to distinguishing 
patients with acromegaly from a control group  with correct decisions for patients in 72% and
controls in 92% of the cases. 
[[#cushing|Kosilek et al  (2013)]] used the image graph features for 
diagnosing Cushing&#45;syndrome in female patients with an overall accuracy
of 92%. Both studies also required manual corrections of landmark placement.


&#61;&#61;&#61; Object Recognition &#61;&#61;&#61;



&#61;&#61;&#61;&#61; Cluttered Scenes &#61;&#61;&#61;&#61;



Further challenges in recognizing objects are cluttered background and partial occlusions.  EGM can deal with that due to the localized feature representation in the
graphs, which limits the influence of background on the object representation and permits
to ignore the occluded parts.  However, it is still necessary to find the object in the
scene despite the clutter and to decide which parts might be occluded. It has been shown that
EGM is fairly robust to clutter and occlusions in finding the object, thus localizing a
known object in a scene is possible with relatively high reliability.  Deciding which
regions are occluded can be done based on the local similarity values of the model jets
with the corresponding image jets, visible regions have high similarity values while
occluded regions have low similarity values.  Thus a simple thresholding operation can
already be suggestive as to which nodes are occluded.  This can be further refined by
regularization, e.g. one can consider a node as occluded if at least three out of four of
its neighbors are occluded as well, even if its similarity value is above threshold. An object is then considered recognized with confidence if a certain minimal area of it is considered visible and if the average similarity within that area is high enough.
This algorithm gave a performance of 80% correct recognition rate in cluttered scenes
with extensive occlusions [[#clutteredScenes|(Wiskott and von der Malsburg, 1993)]].  

Further improvement can be achieved if one uses matches of other objects.  For instance,
if the system has recognized a known object in the foreground, it is clear that any other
candidate object matched to the scene must be occluded in the area of the known object in
the foreground.  With that logic one can work a scene from front to back with high
reliability.  An algorithm using this synergy between object matches in a scene yielded a
performance of 21 correctly interpreted scenes out of 30 scenes tested.  For 3 scenes the
objects were recognized correctly but the estimated occlusion order was wrong.  In the
remaining 6 scenes, 3 models were accepted although the corresponding object was not
present, and 4 objects that were present were not recognized.  Overall 96.7% of the
objects were recognized correctly [[#clutteredScenes|(Wiskott and von der Malsburg, 1993)]].


&#61;&#61;&#61;&#61; Gesture Recognition &#61;&#61;&#61;&#61;


[[#triesch-gestures|Triesch and von der Malsburg (2001)]] have extended the method of EBGM 
to &#39;&#39;compound features&#39;&#39; and have applied it to hand gesture recognition. Compound features
augment the jet vectors by extra dimensions provided by color features and Gabor functions
applied to color channels. The similarity functions are adjusted accordingly.
An additional optimization step accounts for rotation in the image plane. Bunches are 
built from graphs of the same gesture in front of uniform backgrounds 
of varying brightness. The bunch graph similarity function selects the best
local match in the bunch graph. This leads to considerable invariance against structured
backgrounds, because the local experts may match to different brightness values
across nodes. The system has been used successfully to guide pick&#45;and&#45;place behavior 
by a robot [[#gripsee|(Becker et al., 2001)]].


&#61;&#61;&#61;&#61; Generalized Object Graphs &#61;&#61;&#61;&#61;


[[File:ElasticBunchGraphMatching_GeneralizedObjectGraphs_CompositionOfModelAndBunchGraphs2.jpg|450px|thumb|right|Composition of parquet graph features into model or bunch graphs: Upon presentation of an image (first column) a model or bunch graph (fourth column) is composed (arrows) out of memorized model parquet graph features (third column) that have a superthreshold similarity to the image features (second column). The best model candidate is given in the fifth column. The reconstruction from the model graph (column six) demonstrates that the model graph describes the object in the input image well. Reconstruction and model candidate contain the same object in the same pose, which is slightly different from the one in the input image. The nodes of the image and model parquet graphs that are known to reside in the background are excluded from calculation and are painted in grey.]]

EGM and EBGM have proven to perform well when the objects to be recognized have a common structure, such as faces in frontal pose. These share a common set of fiducial points, so&#45;called &#39;&#39;landmarks&#39;&#39;, such as the corners of the eyes, of the mouth, the tip of the nose and so forth. However, for the recognition of a larger repertoire of 
arbitrary objects such a set of landmarks cannot be defined. Therefore, EGM and EBGM, in their pure form, encounter problems when applied to the task of invariant visual object recognition. Here, object models are required to be dynamic with respect both to shape and attributed features in order to cope with object variations like changes in identity, pose, illumination and so on. Graph&#45;like structures, and model graphs in particular, inherently fulfill this requirement as they allow for &#39;&#39;compositionality&#39;&#39;, the ability to construct mental representations, hierarchically, in terms of parts and their relations. What needs to be specified are the elementary parts, the &#39;&#39;constituents&#39;&#39;, and the rules of composition. [[#westphal-feat|Westphal and Würtz (2009)]] have employed small regular graphs of 3x3 Gabor jets on a ten times ten pixel grid as constituents (&amp;lt&#59;figref&amp;gt&#59;ElasticBunchGraphMatching_GeneralizedObjectGraphs_CompositionOfModelAndBunchGraphs2.jpg&amp;lt&#59;/figref&amp;gt&#59;). These are termed &#39;&#39;parquet graphs&#39;&#39;, inspired by the look of ready&#45;to&#45;lay parquet tiles. They can be used as local image features and can be aggregated to larger graph entities that can represent entire objects. As a useful initialization of E(B)GM [[#westphal-feat|Westphal and Würtz (2009)]] have proposed that fast feature&#45;based preprocessing is applied as far as it goes by excluding as many objects as possible and that only a manageable number of ambiguous cases really worth their while, they have called them &#39;&#39;model candidates&#39;&#39;, are subjected to the more accurate correspondence&#45;based processing of EGM. In the course of this, model graphs emerge that represent the object in the input image well.

The desired preselection of model candidates is achieved by a neural network, called the &#39;&#39;preselection network&#39;&#39;. Given an input image, it computes an &#39;&#39;activation&#39;&#39; for each model image based on the number of feature coincidences. The activations scale proportionally with the probability that the input and the respective model image contain the same object under similar object variations, i.e., the output neurons respond to particular object views. The pre&#45;selection of model candidates is achieved through picking a manageable number of highly activated models. As a second result the preselection network provides a set of coincidental features for each model candidate. The rules to compose these parquet graphs into larger graph entities are provided by their memorized spatial arrangements. Therefore, for each model candidate, the coincidental parquet graph features can be aggregated to an image and a model graph, respectively. These are matched onto each other using EGM. The model graph that yields the highest similarity value is considered to be the best approximation of the object contained in the input image. The method is also capable of aggregating model parquet graph features to &#39;&#39;bunch graphs&#39;&#39;. To this end, the model parquet graphs are not just taken from one but from a carefully chosen selection of highly activated model candidates, e.g., from model candidates that belong to the same semantic category. Thus, in a similar way, the method exploits the combinatorics of model parquet graph features like EBGM exploits the combinatorics of the jets in the bunches.

The method has achieved high recognition rates on identity, object category, pose, and illumination type, provided that the individual object variations are sufficiently covered by learning examples. Unlike many other models this technique can also cope with varying background, multiple objects, and partial occlusion [[#westphal-feat| (Westphal and Würtz, 2009)]].


&#61;&#61; References &#61;&#61;


&#42;
&#42;
&#42;
&#42;
&#42;
&#42;
&#42;
&#42;
&#42;
&#42;
&#42;
&#42;
&#42;
&#42;&lt;span id&#61;&quot;WiskottVonDerMalsburg1996&quot;&gt;&lt;/span&gt;Wiskott, L and von der Malsburg, C (1996) &lt;a href&#61;&quot;http&#58;//www.cs.utexas.edu/users/nn/web&#45;pubs/htmlbook96/&quot; target&#61;&quot;_blank&quot;&gt;Face Recognition by Dynamic Link Matching&lt;/a&gt;. Chapter 11 in Lateral Interactions in the Cortex&#58; Structure and Function. Eds. Sirosh, J and Miikkulainen, R and Choe, Y. Publ. The UTCS Neural Networks Research Group, Austin, TX.&lt;/span&gt;
Category:Computer Vision Category:Biometrics garbage]
The same method can be applied to recognition in different illumination
situations. With a model gallery of 100 identities from the CAS&amp;&#35;45&#59;PEAL 
database[[#cas-peal-smc| (Gao et al., 2008)]] and 9 different illuminations
the recognition rate for standard jet similarity was 82.7%, with combined
similarities 88.9%.


&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59; Face Analysis &amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;



&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59; Facial Features &amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;


[[File:Wiskott-1997-PatternRecognition-Fig02.png|300px|thumb|right|Face analysis with EBGM. A face bunch graph (middle) is matched onto a new face image (left). For each node (small black dots) there is a local expert in the bunch graph (pointed at by the arrow heads), from which a so called phantom face can be synthesized and abstract features such as gender, glasses, or beard can be inferred. In this case the phantom face looks Caucasian even though the original face is half Asian, because the bunch graph only includes Caucasian faces. Reproduced from [[#phantomfaces|(Wiskott, 1997)]] with permission from &amp;lt&#59;a href&amp;&#35;61&#59;&amp;quot&#59;http&amp;&#35;58&#59;//www.elsevier.com/journal&amp;&#35;45&#59;authors/author&amp;&#35;45&#59;rights&amp;&#35;45&#59;and&amp;&#35;45&#59;responsibilities&amp;quot&#59; target&amp;&#35;61&#59;&amp;quot&#59;_blank&amp;quot&#59;&amp;gt&#59;Elsevier&amp;lt&#59;/a&amp;gt&#59;.&amp;&#35;93&#59;&amp;&#35;93&#59;

As a result, EBGM yields 
&amp;&#35;42&#59; an overall similarity value, which can be used to decide whether the image actually contains an object of the given class, e.g. a face,
&amp;&#35;42&#59; locations of the nodes in the image, which form the basis for creating an image graph used for storage or recognition, and
&amp;&#35;42&#59; for any node the best fitting instance in the bunch graph, referred to as local expert.
For faces, the latter can be used for an analysis of some abstract properties of the face.  For instance, if most local experts come from female faces, it is likely that the face on the new image is female as well&amp;&#35;59&#59; if most local experts in the lower half of the graph come from bearded faces, it is likely that the person has a beard&amp;&#35;59&#59; if most local experts in the upper half of the graph come from faces with glasses, it is likely that the person wears glasses (&amp;amp&#59;lt&amp;&#35;59&#59;figref&amp;amp&#59;gt&amp;&#35;59&#59;Wiskott&amp;amp&#59;&amp;&#35;35&#59;45&amp;&#35;59&#59;1997&amp;amp&#59;&amp;&#35;35&#59;45&amp;&#35;59&#59;PatternRecognition&amp;amp&#59;&amp;&#35;35&#59;45&amp;&#35;59&#59;Fig02.png&amp;amp&#59;lt&amp;&#35;59&#59;/figref&amp;amp&#59;gt&amp;&#35;59&#59;). On a mixed set of 111 faces this method has achieved 92% correct discrimination between male and female faces, 94% correct detection of a beard vs no beard, and 96% correct detection of glasses vs no glasses [[#phantomfaces|(Wiskott, 1997)]].


&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59; Medical Analysis &amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;


The image graph found by EBGM in a facial image is a representation of that image. It can be converted to a feature vector by simply listing all
vertex and edge labels. This vector can be used as input for classification systems. 
Such classifiers can provide suggestions for
medical diagnoses for ailments which influence the appearance of the face. 
[[#syndrome-clin|Böhringer et al. (2011)]] have used this approach to distinguish 
among eleven genetically caused syndromes. Five of them could be identified in
100% of the cases, the overall correct recognition rate was 60%. The classifier
used was Local Discriminant Analysis after Principal Component Analysis and dimension 
reduction. These high classification rates
could, however, only be  achieved with some manual corrections on the placement 
of the graph nodes on the image. Finding good landmark locations automatically remains
a research challenge.
In a further study [[#acromegaly-2011|(Schneider et al, 2011)]] the method was applied to distinguishing 
patients with acromegaly from a control group  with correct decisions for patients in 72% and
controls in 92% of the cases. 
[[#cushing|Kosilek et al  (2013)]] used the image graph features for 
diagnosing Cushing&amp;&#35;45&#59;syndrome in female patients with an overall accuracy
of 92%. Both studies also required manual corrections of landmark placement.


&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59; Object Recognition &amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;



&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59; Cluttered Scenes &amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;



Further challenges in recognizing objects are cluttered background and partial occlusions.  EGM can deal with that due to the localized feature representation in the
graphs, which limits the influence of background on the object representation and permits
to ignore the occluded parts.  However, it is still necessary to find the object in the
scene despite the clutter and to decide which parts might be occluded. It has been shown that
EGM is fairly robust to clutter and occlusions in finding the object, thus localizing a
known object in a scene is possible with relatively high reliability.  Deciding which
regions are occluded can be done based on the local similarity values of the model jets
with the corresponding image jets, visible regions have high similarity values while
occluded regions have low similarity values.  Thus a simple thresholding operation can
already be suggestive as to which nodes are occluded.  This can be further refined by
regularization, e.g. one can consider a node as occluded if at least three out of four of
its neighbors are occluded as well, even if its similarity value is above threshold. An object is then considered recognized with confidence if a certain minimal area of it is considered visible and if the average similarity within that area is high enough.
This algorithm gave a performance of 80% correct recognition rate in cluttered scenes
with extensive occlusions [[#clutteredScenes|(Wiskott and von der Malsburg, 1993)]].  

Further improvement can be achieved if one uses matches of other objects.  For instance,
if the system has recognized a known object in the foreground, it is clear that any other
candidate object matched to the scene must be occluded in the area of the known object in
the foreground.  With that logic one can work a scene from front to back with high
reliability.  An algorithm using this synergy between object matches in a scene yielded a
performance of 21 correctly interpreted scenes out of 30 scenes tested.  For 3 scenes the
objects were recognized correctly but the estimated occlusion order was wrong.  In the
remaining 6 scenes, 3 models were accepted although the corresponding object was not
present, and 4 objects that were present were not recognized.  Overall 96.7% of the
objects were recognized correctly [[#clutteredScenes|(Wiskott and von der Malsburg, 1993)]].


&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59; Gesture Recognition &amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;


[[#triesch-gestures|Triesch and von der Malsburg (2001)]] have extended the method of EBGM 
to &amp;&#35;39&#59;&amp;&#35;39&#59;compound features&amp;&#35;39&#59;&amp;&#35;39&#59; and have applied it to hand gesture recognition. Compound features
augment the jet vectors by extra dimensions provided by color features and Gabor functions
applied to color channels. The similarity functions are adjusted accordingly.
An additional optimization step accounts for rotation in the image plane. Bunches are 
built from graphs of the same gesture in front of uniform backgrounds 
of varying brightness. The bunch graph similarity function selects the best
local match in the bunch graph. This leads to considerable invariance against structured
backgrounds, because the local experts may match to different brightness values
across nodes. The system has been used successfully to guide pick&amp;&#35;45&#59;and&amp;&#35;45&#59;place behavior 
by a robot [[#gripsee|(Becker et al., 2001)]].


&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59; Generalized Object Graphs &amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;&amp;&#35;61&#59;


[[File:ElasticBunchGraphMatching_GeneralizedObjectGraphs_CompositionOfModelAndBunchGraphs2.jpg|450px|thumb|right|Composition of parquet graph features into model or bunch graphs: Upon presentation of an image (first column) a model or bunch graph (fourth column) is composed (arrows) out of memorized model parquet graph features (third column) that have a superthreshold similarity to the image features (second column). The best model candidate is given in the fifth column. The reconstruction from the model graph (column six) demonstrates that the model graph describes the object in the input image well. Reconstruction and model candidate contain the same object in the same pose, which is slightly different from the one in the input image. The nodes of the image and model parquet graphs that are known to reside in the background are excluded from calculation and are painted in grey.]]

EGM and EBGM have proven to perform well when the objects to be recognized have a common structure, such as faces in frontal pose. These share a common set of fiducial points, so&amp;&#35;45&#59;called &amp;&#35;39&#59;&amp;&#35;39&#59;landmarks&amp;&#35;39&#59;&amp;&#35;39&#59;, such as the corners of the eyes, of the mouth, the tip of the nose and so forth. However, for the recognition of a larger repertoire of 
arbitrary objects such a set of landmarks cannot be defined. Therefore, EGM and EBGM, in their pure form, encounter problems when applied to the task of invariant visual object recognition. Here, object models are required to be dynamic with respect both to shape and attributed features in order to cope with object variations like changes in identity, pose, illumination and so on. Graph&amp;&#35;45&#59;like structures, and model graphs in particular, inherently fulfill this requirement as they allow for &amp;&#35;39&#59;&amp;&#35;39&#59;compositionality&amp;&#35;39&#59;&amp;&#35;39&#59;, the ability to construct mental representations, hierarchically, in terms of parts and their relations. What needs to be specified are the elementary parts, the &amp;&#35;39&#59;&amp;&#35;39&#59;constituents&amp;&#35;39&#59;&amp;&#35;39&#59;, and the rules of composition. [[#westphal-feat|Westphal and Würtz (2009)]] have employed small regular graphs of 3x3 Gabor jets on a ten times ten pixel grid as constituents (&amp;amp&#59;lt&amp;&#35;59&#59;figref&amp;amp&#59;gt&amp;&#35;59&#59;ElasticBunchGraphMatching_GeneralizedObjectGraphs_CompositionOfModelAndBunchGraphs2.jpg&amp;amp&#59;lt&amp;&#35;59&#59;/figref&amp;amp&#59;gt&amp;&#35;59&#59;). These are termed &amp;&#35;39&#59;&amp;&#35;39&#59;parquet graphs&amp;&#35;39&#59;&amp;&#35;39&#59;, inspired by the look of ready&amp;&#35;45&#59;to&amp;&#35;45&#59;lay parquet tiles. They can be used as local image features and can be aggregated to larger graph entities that can represent entire objects. As a useful initialization of E(B)GM [[#westphal-feat|Westphal and Würtz (2009)]] have proposed that fast feature&amp;&#35;45&#59;based preprocessing is applied as far as it goes by excluding as many objects as possible and that only a manageable number of ambiguous cases really worth their while, they have called them &amp;&#35;39&#59;&amp;&#35;39&#59;model candidates&amp;&#35;39&#59;&amp;&#35;39&#59;, are subjected to the more accurate correspondence&amp;&#35;45&#59;based processing of EGM. In the course of this, model graphs emerge that represent the object in the input image well.

The desired preselection of model candidates is achieved by a neural network, called the &amp;&#35;39&#59;&amp;&#35;39&#59;preselection network&amp;&#35;39&#59;&amp;&#35;39&#59;. Given an input image, it computes an &amp;&#35;39&#59;&amp;&#35;39&#59;activation&amp;&#35;39&#59;&amp;&#35;39&#59; for each model image based on the number of feature coincidences. The activations scale proportionally with the probability that the input and the respective model image contain the same object under similar object variations, i.e., the output neurons respond to particular object views. The pre&amp;&#35;45&#59;selection of model candidates is achieved through picking a manageable number of highly activated models. As a second result the preselection network provides a set of coincidental features for each model candidate. The rules to compose these parquet graphs into larger graph entities are provided by their memorized spatial arrangements. Therefore, for each model candidate, the coincidental parquet graph features can be aggregated to an image and a model graph, respectively. These are matched onto each other using EGM. The model graph that yields the highest similarity value is considered to be the best approximation of the object contained in the input image. The method is also capable of aggregating model parquet graph features to &amp;&#35;39&#59;&amp;&#35;39&#59;bunch graphs&amp;&#35;39&#59;&amp;&#35;39&#59;. To this end, the model parquet graphs are not just taken from one but from a carefully chosen selection of highly activated model candidates, e.g., from model candidates that belong to the same semantic category. Thus, in a similar way, the method exploits the combinatorics of model parquet graph features like EBGM exploits the combinatorics of the jets in the bunches.

The method has achieved high recognition rates on identity, object category, pose, and illumination type, provided that the individual object variations are sufficiently covered by learning examples. Unlike many other models this technique can also cope with varying background, multiple objects, and partial occlusion [[#westphal-feat| (Westphal and Würtz, 2009)]].


&amp;&#35;61&#59;&amp;&#35;61&#59; References &amp;&#35;61&#59;&amp;&#35;61&#59;


&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;
&amp;&#35;42&#59;&amp;lt&#59;span id&amp;&#35;61&#59;&amp;quot&#59;WiskottVonDerMalsburg1996&amp;quot&#59;&amp;gt&#59;&amp;lt&#59;/span&amp;gt&#59;Wiskott, L and von der Malsburg, C (1996) &amp;lt&#59;a href&amp;&#35;61&#59;&amp;quot&#59;http&amp;&#35;58&#59;//www.cs.utexas.edu/users/nn/web&amp;&#35;45&#59;pubs/htmlbook96/&amp;quot&#59; target&amp;&#35;61&#59;&amp;quot&#59;_blank&amp;quot&#59;&amp;gt&#59;Face Recognition by Dynamic Link Matching&amp;lt&#59;/a&amp;gt&#59;. Chapter 11 in Lateral Interactions in the Cortex&amp;&#35;58&#59; Structure and Function. Eds. Sirosh, J and Miikkulainen, R and Choe, Y. Publ. The UTCS Neural Networks Research Group, Austin, TX.&amp;lt&#59;/span&amp;gt&#59;

Category:Computer Vision Category:Biometrics

⚠️ **GitHub.com Fallback** ⚠️