Token - A sequence of characters grouped together as a useful unit. Tokenization chops a character stream into tokens, possibly throwing away certain characters such as punctuation
Type - Equivalence class of Tokens
Term - A (possibly normalized) type that is included in the IR system's dictionary
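The difference between tokens, types, and terms is easiest to see by counting them. The sketch below is only an illustration: the regex tokenizer, the sample sentence, and lowercasing as the normalization step are assumptions, not something this post prescribes.

```python
import re

def tokenize(text):
    # Split on runs of non-alphanumeric characters, discarding punctuation.
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]

text = "To sleep, perchance to dream. To dream!"

tokens = tokenize(text)                  # every occurrence counts
types = set(tokens)                      # distinct character sequences
terms = {tok.lower() for tok in tokens}  # normalized types kept in the dictionary

print(len(tokens), len(types), len(terms))
```

Here "To" and "to" are two types but collapse into one term once case is normalized.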
Term Frequency - Number of times term t appears in document d
Log Frequency Weight - 1 + log(tf) if tf > 0, otherwise 0
Document Frequency - Number of documents in the collection in which the term appears
Inverse Document Frequency - idf_t = log(N / df_t), where N is the number of documents in the collection and df_t is the number of documents that contain term t
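The frequency definitions above can be combined into a tf-idf weight. This is a minimal sketch: the three-document toy collection, the function names, and the choice of log base 10 are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy collection: docID -> list of terms.
docs = {
    "d1": ["new", "york", "times"],
    "d2": ["new", "york", "post"],
    "d3": ["los", "angeles", "times", "times"],
}

def log_tf(tf):
    # Log frequency weight: 1 + log(tf) if tf > 0, otherwise 0.
    return 1 + math.log10(tf) if tf > 0 else 0.0

def idf(term, docs):
    # df = number of documents that contain the term.
    df = sum(1 for terms in docs.values() if term in terms)
    return math.log10(len(docs) / df) if df else 0.0

def tf_idf(term, doc_id, docs):
    tf = Counter(docs[doc_id])[term]  # raw term frequency in this document
    return log_tf(tf) * idf(term, docs)

print(idf("post", docs), tf_idf("times", "d3", docs))
```

A rare term such as "post" gets a higher idf than "times", which appears in two of the three documents.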
Stemming - Crude heuristics that chop off the ends of words, collapsing derivationally related words. Stemming increases recall because morphological variants of a word are collapsed into a single term, which raises the chance that a relevant document is retrieved
Lemmatization - Returns the base or dictionary form of a word (the lemma), collapsing the different inflectional forms of a word
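To contrast the two ideas, here is a deliberately tiny sketch. The suffix list and the lemma table are made up for this example; this is not the Porter stemmer or any real lemmatizer, just an illustration of "chop the ending" versus "look up the dictionary form".

```python
# Hypothetical suffix list, checked in order (longest first).
SUFFIXES = ("ions", "ion", "ing", "ed", "s")

def crude_stem(word):
    # Chop the first matching suffix, keeping a stem of at least 3 characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table standing in for a real dictionary of lemmas.
LEMMAS = {"am": "be", "are": "be", "is": "be", "was": "be", "mice": "mouse"}

def lemmatize(word):
    # Return the dictionary form if known, otherwise the word unchanged.
    return LEMMAS.get(word, word)

for w in ["connected", "connecting", "connection", "connections"]:
    print(w, "->", crude_stem(w))
print("mice ->", lemmatize("mice"))
```

The stemmer cannot relate "mice" to "mouse" because no suffix rule applies; that kind of mapping needs dictionary knowledge, which is what lemmatization relies on.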
Skip Pointers - For a postings list of length N, use about Sqrt(N) evenly spaced skip pointers
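The sketch below shows how skip pointers can speed up the intersection of two sorted postings lists. It assumes the lists are plain Python lists of docIDs and simulates a skip pointer at every Sqrt(N)-th position with index arithmetic; the function name is hypothetical.

```python
import math

def intersect_with_skips(p1, p2):
    """Intersect two sorted docID lists, skipping ahead where it is safe."""
    skip1 = max(1, int(math.sqrt(len(p1))))
    skip2 = max(1, int(math.sqrt(len(p2))))
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Take the skip only if its target does not overshoot p2[j].
            if i % skip1 == 0 and i + skip1 < len(p1) and p1[i + skip1] <= p2[j]:
                i += skip1
            else:
                i += 1
        else:
            if j % skip2 == 0 and j + skip2 < len(p2) and p2[j + skip2] <= p1[i]:
                j += skip2
            else:
                j += 1
    return answer

print(intersect_with_skips([2, 4, 8, 16, 19, 23, 28, 43],
                           [1, 2, 3, 5, 8, 41, 51, 60, 71]))
```

More skip pointers mean shorter jumps that succeed more often, fewer mean longer jumps that succeed less often; Sqrt(N) is the usual compromise.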
Positional Index - For each term, stores the documents it occurs in and the positions within each document. Term: DocId <Pos1, Pos2, ...>
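A positional index makes phrase queries possible, because adjacent positions can be checked. The following is a small sketch under simple assumptions: whitespace tokenization, lowercasing, and two-word phrases only. The function names and sample documents are invented for the example.

```python
from collections import defaultdict

def build_positional_index(docs):
    # term -> docID -> list of positions where the term occurs
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_query(index, first, second):
    """Return docIDs where `second` occurs immediately after `first`."""
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    hits = []
    for doc_id, positions in first_postings.items():
        following = set(second_postings.get(doc_id, []))
        if any(pos + 1 in following for pos in positions):
            hits.append(doc_id)
    return sorted(hits)

docs = {1: "to be or not to be", 2: "be kind to others"}
index = build_positional_index(docs)
print(dict(index["to"]))
print(phrase_query(index, "to", "be"))
```

Both documents contain "to" and "be", but only document 1 contains them next to each other in that order.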
Inverted Index - A dictionary mapping each term to the set of documents (for example, file names) that contain it
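As a minimal sketch of an inverted index, and of how Boolean AND, OR, and NOT queries reduce to set operations on its postings, consider the following. The file names and contents are invented, and tokenization is a plain lowercase whitespace split.

```python
from collections import defaultdict

def build_inverted_index(files):
    # term -> set of file names containing the term
    index = defaultdict(set)
    for name, text in files.items():
        for term in text.lower().split():
            index[term].add(name)
    return index

# Hypothetical collection: file name -> contents.
files = {
    "a.txt": "brutus killed caesar",
    "b.txt": "caesar praised calpurnia",
    "c.txt": "brutus praised calpurnia",
}
index = build_inverted_index(files)
all_docs = set(files)

def postings(term):
    # Unknown terms have an empty postings set.
    return index.get(term, set())

brutus_and_caesar = postings("brutus") & postings("caesar")  # AND
brutus_or_caesar = postings("brutus") | postings("caesar")   # OR
not_calpurnia = all_docs - postings("calpurnia")             # NOT

print(sorted(brutus_and_caesar), sorted(brutus_or_caesar), sorted(not_calpurnia))
```

Every result is an unordered set of matching documents: a document either satisfies the query or it does not.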
Boolean Retrieval (AND, OR, NOT)
- Easy to implement
- Computationally efficient
- Expressive and clear queries
- No ranking: a document either matches the query or it does not
- No term weighting
NDCG (Normalized Discounted Cumulative Gain) = DCG / IDCG, where IDCG is the DCG of the ideal ranking
- Highly relevant docs are more useful when they appear earlier in the search results list
- Highly relevant docs are more useful than marginally relevant docs
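A short sketch of the NDCG computation follows. It assumes one common DCG variant, where the graded relevance at each rank is divided by log2(rank + 1); other variants exist, and the relevance grades used here are made up.

```python
import math

def dcg(relevances):
    # Gain at each rank is discounted by log2(rank + 1); ranks start at 1.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # IDCG is the DCG of the same grades sorted from most to least relevant.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Relevance grades of the results in the order the system returned them.
print(ndcg([3, 2, 1, 0]))  # already in ideal order
print(ndcg([0, 1, 2, 3]))  # same documents, worst order
```

The two lists contain the same documents, so only the ordering changes the score, which reflects the point that relevant results are worth more near the top.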
HITS - Hyperlink-Induced Topic Search
- Authority - A page that directly answers the information need, e.g. the homepage of microsoft.com
- Hub - A page with good links to pages that answer the information need
- Wikipedia is a good example of both a hub and an authority
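Hub and authority scores reinforce each other: a good hub links to good authorities, and a good authority is linked to by good hubs. The sketch below iterates that idea on a tiny link graph. The graph, the page names, the iteration count, and the normalization by Euclidean length are all assumptions for illustration.

```python
import math

def hits(graph, iterations=50):
    """graph maps each page to the list of pages it links to."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for target in targets:
                auth[target] += hub[src]
        # Hub score: sum of authority scores of the pages it links to.
        hub = {n: sum(auth[t] for t in graph.get(n, [])) for n in nodes}
        # Normalize so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical link graph: "portal" and "blog" link out, "a" and "b" are linked to.
graph = {"portal": ["a", "b"], "blog": ["a"], "a": [], "b": []}
hub, auth = hits(graph)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

In this graph "portal" ends up as the strongest hub and "a" as the strongest authority, since "a" is linked to by both hubs.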
Happy Learning!!!