Latent Semantic Indexing - HestiaProject/PAxSPL GitHub Wiki
Definition:
An indexing and retrieval method that uses singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.
Variations:
Latent semantic analysis (LSA):
Different name for the same technique.
Semantic hashing:
Documents are mapped to memory addresses by means of a neural network in such a way that semantically similar documents are located at nearby addresses.
Priority Order:
Extraction > Categorize > Group
Inputs:
Outputs:
- Occurrence matrix
- Rank lowering
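The two outputs listed above can be illustrated with a minimal sketch. It assumes NumPy is available and uses a hypothetical toy corpus in which each "document" is the bag of terms taken from one program element. The names `docs`, `occurrence_matrix`, `lower_rank`, and `cosine` are invented for this illustration and are not part of PAxSPL or of any specific tool.

```python
import numpy as np

# Hypothetical toy corpus: one entry per program element (assumption).
docs = {
    "AccountService": "account balance deposit withdraw account",
    "LoanService": "loan interest account balance",
    "ReportPrinter": "print report page format",
    "PdfExporter": "export report page format print",
}

def occurrence_matrix(docs):
    """Return (terms, names, A) where A[i, j] counts term i in document j."""
    names = list(docs)
    terms = sorted({t for text in docs.values() for t in text.split()})
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(names)))
    for j, name in enumerate(names):
        for t in docs[name].split():
            A[index[t], j] += 1
    return terms, names, A

def lower_rank(A, k):
    """Return one k-dimensional vector per document from a truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values (rank lowering).
    return (np.diag(s[:k]) @ Vt[:k]).T

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

terms, names, A = occurrence_matrix(docs)
doc_vecs = lower_rank(A, k=2)

sim_same = cosine(doc_vecs[2], doc_vecs[3])  # ReportPrinter vs PdfExporter
sim_diff = cosine(doc_vecs[0], doc_vecs[2])  # AccountService vs ReportPrinter
```

In this toy corpus the two reporting elements share most of their terms, so `sim_same` should be close to 1, while `sim_diff` should be close to 0 because the account and reporting elements share no vocabulary. The choice of `k=2` is arbitrary here; on real products the number of retained dimensions has to be tuned.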
Examples:
- (AL-msie'deen et al. 2013)
- (Xue et al. 2012)
- (AL-msie'deen et al. 2012)
- (Eyal-Salman et al. 2013a)
- (Eyal-Salman et al. 2013b)
- (Eyal-Salman et al. 2013c)
- (Maazoun et al. 2014a)
- (Maazoun et al. 2014b)
- (Alves et al. 2008)
Tools:
Related Techniques:
Recommended situations:
Latent Semantic Indexing is recommended when program elements (such as classes and methods) have meaningful names ("attribute" instead of "atr", or "home" instead of "hm"). It is also highly recommended for well-documented products.
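To see why naming matters, consider how identifiers are typically turned into terms before indexing. The following is a rough sketch only: `split_identifier` and `doc_vocabulary` are hypothetical names chosen for illustration, and real tools may tokenize identifiers differently.

```python
import re

def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    terms = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        terms += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in terms]

# Hypothetical vocabulary taken from the product documentation (assumption).
doc_vocabulary = {"home", "attribute", "address"}

# Terms from a meaningful name overlap with the documentation vocabulary...
good = set(split_identifier("getHomeAttribute")) & doc_vocabulary
# ...while terms from an abbreviated name do not.
poor = set(split_identifier("getHmAtr")) & doc_vocabulary
```

Here `good` contains "home" and "attribute", whereas `poor` is empty: the abbreviated identifier contributes no terms that can be related to the documentation, which leaves the technique little to work with.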
Not recommended situations:
An information retrieval technique cannot achieve quality results when applied to products with no documentation and no meaningful identifier names. For that reason, we do not recommend Latent Semantic Indexing or any other information retrieval technique in those situations.