Token - A sequence of characters grouped together as a unit; tokenization chops the text into tokens and may throw away certain characters, such as punctuation
Type - An equivalence class of tokens
Term - A type that is included in the IR system's dictionary
Term Frequency (tf) - Number of times term t appears in document d
Log Frequency - 1 + log(tf) if tf > 0, otherwise 0
Document Frequency (df_t) - Number of documents in the collection in which term t appears
Inverse Document Frequency (IDF) - log(N / df_t), where N is the number of documents in the collection and df_t is the number of documents that contain term t
Stemming - Crude heuristics that chop off the ends of words, collapsing derivationally related words. Stemming increases recall because morphological variants of a word are collapsed into a single token, which raises the chance of retrieval
Lemmatization - Returns the base or dictionary form of a word, collapsing the different inflectional forms of the word
Skip Pointers - For a postings list of length N, use sqrt(N) evenly spaced skip pointers
Positional Index - Term: DocId <Pos1, Pos2>, i.e. each document ID is stored with the positions at which the term occurs
Inverted Index - A dictionary mapping each word token to the set of file names that contain it
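To make the definitions above concrete, here is a minimal sketch that builds an inverted index and computes log frequency and IDF for a toy collection. The file names, the whitespace tokenizer, and the choice of base-10 logarithms are all assumptions made for illustration; the notes only say "log".

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy collection: file name -> text.
docs = {
    "d1.txt": "new home sales top forecasts",
    "d2.txt": "home sales rise in july",
    "d3.txt": "increase in home sales in july",
}

def tokenize(text):
    # Simplest possible tokenizer (an assumption): lowercase, split on whitespace.
    return text.lower().split()

# Inverted index: term -> set of file names containing the term.
inverted_index = defaultdict(set)
# Term frequencies: file name -> Counter of term counts.
term_freq = {}
for name, text in docs.items():
    tokens = tokenize(text)
    term_freq[name] = Counter(tokens)
    for token in tokens:
        inverted_index[token].add(name)

def log_tf(term, doc):
    # Log frequency: 1 + log(tf) if tf > 0, otherwise 0.
    tf = term_freq[doc][term]
    return 1 + math.log10(tf) if tf > 0 else 0.0

def idf(term):
    # Inverse document frequency: log(N / df_t); 0 for unseen terms.
    df = len(inverted_index.get(term, ()))
    return math.log10(len(docs) / df) if df else 0.0

def tf_idf(term, doc):
    # Weight of a term in a document: log frequency times IDF.
    return log_tf(term, doc) * idf(term)
```

A term such as "home" that occurs in every document gets an IDF of 0, so it contributes nothing to the weight, while a rarer term such as "new" gets a positive IDF.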
Boolean Retrieval (AND, OR, NOT)
Pros of Boolean Retrieval
Easy to Implement
Computationally efficient
Expressiveness and Clarity
Cons of Boolean Retrieval
No ranking of results
No term weighting
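As a rough illustration of Boolean retrieval, the three operators map directly onto set operations over postings. The terms and document IDs below are made up for the example. The result is an unordered set, which is exactly the "no ranking" limitation listed above.

```python
# Hypothetical postings: term -> set of document IDs containing the term.
postings = {
    "brutus": {1, 2, 4, 11},
    "caesar": {1, 2, 4, 5, 6},
    "calpurnia": {2, 31},
}

# NOT needs the universe of documents; here it is every ID seen in the postings.
all_docs = set().union(*postings.values())

def docs_for(term):
    # Unknown terms match no documents.
    return postings.get(term, set())

# brutus AND caesar AND NOT calpurnia
brutus_caesar_not_calpurnia = (docs_for("brutus") & docs_for("caesar")) - docs_for("calpurnia")

# brutus OR calpurnia
brutus_or_calpurnia = docs_for("brutus") | docs_for("calpurnia")

# NOT caesar
not_caesar = all_docs - docs_for("caesar")
```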
Discounted Cumulative Gain (DCG)
Highly relevant docs are more useful when they appear earlier in search results list
Highly relevant docs are more useful than marginally relevant docs
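Both ideas can be seen in a small sketch of DCG. It assumes one common formulation, in which the graded relevance at rank i is divided by log2(i + 1); other variants exist, and the relevance grades below are invented for the example.

```python
import math

def dcg(relevances):
    # relevances: graded relevance scores in ranked order (rank 1 first).
    # Each score is discounted by log2(rank + 1), so later ranks count less.
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )

# Same documents, two orderings: best-first versus best-last.
best_first = dcg([3, 2, 0])
best_last = dcg([0, 2, 3])
```

The best-first ordering scores higher than the best-last one, even though both lists contain the same documents.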
Student's t-test
- Developed in 1908 by William Gosset
- The t-test is also referred to as Student's t-test
- Mu and Sigma denote the population parameters (mean and standard deviation)
- X-bar and S denote the mean and standard deviation of the sample
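Using the symbols above, a one-sample t statistic can be sketched as t = (X-bar - Mu0) / (S / sqrt(n)). The function name, the hypothesized mean `mu0`, and the sample values are assumptions for illustration; the sketch uses only the Python standard library and stops at the statistic without computing a p-value.

```python
import math
import statistics

def t_statistic(sample, mu0):
    # One-sample t statistic: (x_bar - mu0) / (s / sqrt(n)).
    n = len(sample)
    x_bar = statistics.mean(sample)   # sample mean (X-bar)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical sample, tested against a hypothesized population mean of 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
t = t_statistic(sample, mu0=5.0)
```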
After the exams I understood where my learning needs to improve. These are the crucial chapters:
- P test using R Programming
- P test using Python Programming
- Hypothesis test using R Programming
- Hypothesis test using Python Programming
I glanced through a couple of sites and am bookmarking some pointers here.
Normal Distribution Properties
Key Pointers
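- The normal distribution is unimodal and symmetric
- Mean (Mu)
- Standard Deviation (Sigma)
- 99.7% of observations lie within 3 Sigma of the mean
- 95% of observations lie within 2 Sigma of the mean
- |Z| > 2 (unusual observation)
- pnorm (percentile of an observation)
- qnorm (quantile or cutoff value for a given percentile)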
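The pointers above can be checked with Python's standard library. This is a small sketch in which `pnorm` and `qnorm` are hypothetical helper names chosen to mirror the R functions; they wrap `statistics.NormalDist`.

```python
from statistics import NormalDist

def pnorm(x, mu=0.0, sigma=1.0):
    # Percentile of an observation: P(X <= x) for a normal distribution.
    return NormalDist(mu, sigma).cdf(x)

def qnorm(p, mu=0.0, sigma=1.0):
    # Cutoff value below which a fraction p of the distribution lies.
    return NormalDist(mu, sigma).inv_cdf(p)

# Share of observations within 2 and 3 standard deviations of the mean.
within_2_sigma = pnorm(2) - pnorm(-2)   # roughly 0.95
within_3_sigma = pnorm(3) - pnorm(-3)   # roughly 0.997

# Cutoff for the 97.5th percentile of the standard normal, roughly 1.96.
cutoff = qnorm(0.975)
```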
Key Pointers
- Formulating the null and alternative hypotheses
- Identifying the sample size, standard error, population mean, and standard deviation from the input question
- Computing P value
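The steps above can be sketched as a z-test, assuming the population standard deviation is known and the sample mean is approximately normal. The function name, its parameters, and the numbers in the example are all hypothetical.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    # Standard error of the mean under the null hypothesis.
    se = sigma / math.sqrt(n)
    # Test statistic: how many standard errors the sample mean is from mu0.
    z = (sample_mean - mu0) / se
    cdf = NormalDist().cdf
    if alternative == "greater":      # H1: mu > mu0
        p = 1 - cdf(z)
    elif alternative == "less":       # H1: mu < mu0
        p = cdf(z)
    else:                             # H1: mu != mu0
        p = 2 * (1 - cdf(abs(z)))
    return z, p

# Hypothetical question: H0: mu = 50, sample of n = 100 with mean 52, sigma = 10.
z, p = z_test_p_value(sample_mean=52, mu0=50, sigma=10, n=100)
```

The null hypothesis is rejected when the p-value falls below the chosen significance level, for example 0.05.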