Positional Index vs Inverted Index - nglthu/infoRetrieval GitHub Wiki
Positional index:
Explain:
•Each term in the vocabulary, store postings of the form docID (position1, position2,..) •Store each term in the form of
<number of docs containing term;
Doc1: position1, position2…;
Doc2: position1, position2…;
etc.>
Example:
To, 993:
(1,6:(7,18,33,72,..))
The word to has a document frequency 993 and occurs 6 times in document 1 at positions 7,18,33, 72 and etc.
- Phrase queries, use a merge algorithm recursively at the document level.
Pros/Cons:
Fast indexing
Less efficient query's
Inverted index
Explain
- A data structure that maps terms back to the parts of a document in which they occur
- For each term T, store a list of all documents containing term T. It is a list of all unique word that appears in any document.
Example:
Document 1: The quick brown fox jumped
Document 2: Quick brown foxes
Term Doc1 Doc2
The x
quick x
brown x x
fox x
jumped x
Quick x
foxes x
Pros/Cons:
Fast query
Slower indexing