Latent Semantic Indexing - HestiaProject/PAxSPL GitHub Wiki

Definition:

An indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.

Variations:

Latent semantic analysis (LSA):

An alternative name for the same technique.

Semantic hashing:

Documents are mapped to memory addresses by means of a neural network in such a way that semantically similar documents are located at nearby addresses.

Priority Order:

Extraction > Categorize > Group

Inputs:

Outputs:

  • Occurrence matrix
  • Rank lowering
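
The two outputs above can be sketched as follows. This is a minimal illustration, not the PAxSPL implementation: it assumes NumPy is available, and the toy corpus, the function names (`occurrence_matrix`, `lower_rank`, `cosine`) and the choice of `k=2` are all hypothetical. Each "document" stands for the text extracted from one program element.

```python
import numpy as np

# Hypothetical toy corpus: text extracted from four program elements.
docs = {
    "LoginController": "user login password authenticate session",
    "SessionManager": "session user timeout authenticate",
    "InvoicePrinter": "invoice print total tax",
    "TaxCalculator": "tax total invoice rate",
}

def occurrence_matrix(docs):
    """Return (terms, names, A), where A[i, j] counts term i in document j."""
    names = list(docs)
    terms = sorted({t for text in docs.values() for t in text.split()})
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(names)))
    for j, name in enumerate(names):
        for t in docs[name].split():
            A[index[t], j] += 1
    return terms, names, A

def lower_rank(A, k):
    """Rank-k approximation of A using a truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # One k-dimensional row per document in the reduced space.
    doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T
    return A_k, doc_vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

terms, names, A = occurrence_matrix(docs)
A_k, doc_vectors = lower_rank(A, k=2)

# Compare every pair of program elements in the reduced space.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        sim = cosine(doc_vectors[i], doc_vectors[j])
        print(f"{names[i]} vs {names[j]}: {sim:.2f}")
```

In this toy corpus the two login-related elements share vocabulary, as do the two invoice-related ones, so each pair ends up close together in the reduced space while elements from different pairs stay apart. In practice the counts are often weighted (for example with tf-idf) before the decomposition, and `k` is tuned to the corpus.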

Examples:

Tools:

Related Techniques:

Recommended situations:

Latent Semantic Indexing is recommended when program elements (such as classes and methods) have meaningful names ("attribute" instead of "atr", or "home" instead of "hm"). It is also highly recommended for well-documented products.
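
To see why meaningful names matter, the sketch below splits identifiers into terms and checks which of them also occur in a documentation vocabulary. It is only an illustration under stated assumptions: `split_identifier`, the sample identifiers and the `doc_terms` set are hypothetical, and real preprocessing would usually also apply stop-word removal and stemming.

```python
import re

def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    terms = []
    # First break on underscores and other non-alphanumeric characters.
    for part in re.split(r"[_\W]+", identifier):
        # Then break on case changes: acronyms, capitalized words, digits.
        terms += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in terms]

# Hypothetical vocabulary taken from the product documentation.
doc_terms = {"home", "attribute", "user"}

meaningful = set(split_identifier("getHomeAttribute"))
abbreviated = set(split_identifier("getHmAtr"))

# Terms that the identifier shares with the documentation.
print(sorted(meaningful & doc_terms))
print(sorted(abbreviated & doc_terms))
```

The meaningful identifier contributes terms that also appear in the documentation, so the occurrence matrix can relate the code to the text. The abbreviated one contributes terms that match nothing, which leaves the technique with no signal to work with.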

Not recommended situations:

An information retrieval technique cannot achieve quality results when applied to products with no documentation and no meaningful identifier names. For that reason, we do not recommend Latent Semantic Indexing, or any other information retrieval technique, in those situations.