Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): December 2015

December 31, 2015

R and Datascience

I found the site datascienceplus very interesting.

The author has organized the R material into these categories:

Data Loading
Data Management
Visualization
Stats

This really helps to structure R learning in the same order. I am trying to follow the same pattern for my own R learning.

Happy Learning and Happy New Year 2016!!!

December 28, 2015

Information Retrieval Notes

Token - A sequence of characters treated as a unit; tokenization chops the text into tokens and may throw away certain characters such as punctuation
Type - Equivalence class of tokens (all tokens with the same character sequence)
Term - A type that is included in the IR system's dictionary
Term Frequency (tf) - Number of times term t appears in document d
Log Frequency - 1 + log(tf) if tf > 0, otherwise 0
Document Frequency (df) - Number of documents in the collection in which the term appears
Inverse Document Frequency (IDF) - log(N / df), where N is the total number of documents in the collection and df is the number of documents containing term t
Stemming - Crude heuristic that chops off the ends of words and collapses derivationally related words. Stemming increases recall because morphological variants of a word are collapsed into a single token, raising the chances of retrieval
Lemmatization - Returns the base or dictionary form of a word. Collapses the different inflectional forms of a word
Skip Pointers - For a postings list of length N, use about sqrt(N) evenly spaced skip pointers
Positional Index - Term: DocId <Pos1, Pos2>
Inverted Index - A dictionary mapping each term to the set of documents (file names) that contain it
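
To make the term frequency, log frequency and IDF definitions concrete, here is a minimal Python sketch. The toy documents, the whitespace tokenizer and the choice of log base 10 are my assumptions for illustration; the notes above do not fix a log base.

```python
import math
from collections import Counter

# Hypothetical toy collection, made up for illustration only.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog sat",
    "d3": "cats and dogs",
}

def tokenize(text):
    # Crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()

# Term frequency table: doc id -> Counter of term counts.
term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
n_docs = len(docs)

def log_tf(tf):
    # Log frequency weight: 1 + log10(tf) if tf > 0, otherwise 0.
    return 1 + math.log10(tf) if tf > 0 else 0.0

def document_frequency(term):
    # Number of documents in the collection that contain the term.
    return sum(1 for counts in term_counts.values() if term in counts)

def idf(term):
    # Inverse document frequency: log10(N / df); 0 for unseen terms.
    df = document_frequency(term)
    return math.log10(n_docs / df) if df else 0.0

def tf_idf(term, doc_id):
    # Combined weight of a term in one document.
    return log_tf(term_counts[doc_id][term]) * idf(term)

print(tf_idf("the", "d1"), tf_idf("cat", "d1"))
```

A rare term such as "cat" gets a higher IDF than a common term such as "the", which is the point of the weighting.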

Boolean Retrieval (AND, OR, NOT)

Pros of Boolean Retrieval

Easy to Implement
Computationally efficient
Expressiveness and Clarity

Cons of Boolean Retrieval

No ranking of results
No term weighting
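
A small Python sketch of an inverted index, a positional index and Boolean queries over them. The file names, their contents and the helper names (query_and, query_or, query_not) are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical files, made up for illustration only.
files = {
    "a.txt": "data science with r",
    "b.txt": "data retrieval with python",
    "c.txt": "information retrieval notes",
}

def build_inverted_index(files):
    # Term -> set of file names containing the term.
    index = defaultdict(set)
    for name, text in files.items():
        for token in text.lower().split():
            index[token].add(name)
    return index

def build_positional_index(files):
    # Term -> {file name: [positions of the term in that file]}.
    index = defaultdict(dict)
    for name, text in files.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(name, []).append(pos)
    return index

index = build_inverted_index(files)
positional = build_positional_index(files)
all_files = set(files)

def query_and(a, b):
    # Files containing both terms (set intersection).
    return index.get(a, set()) & index.get(b, set())

def query_or(a, b):
    # Files containing either term (set union).
    return index.get(a, set()) | index.get(b, set())

def query_not(a):
    # Files that do not contain the term (set complement).
    return all_files - index.get(a, set())

print(query_and("data", "retrieval"))
```

Every query returns a plain set of matching files, which shows both the simplicity and the main limitation: nothing says which match is better.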

Discounted Cumulative Gain (DCG)

Highly relevant docs are more useful when they appear earlier in search results list
Highly relevant docs are more useful than marginally relevant docs

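DCG = sum over result positions i of (2^rel_i - 1) / log2(i + 1), where rel_i is the relevance grade of the result at position i
NDCG = DCG / IDCG, where IDCG is the DCG of the ideal ordering (results sorted by decreasing relevance)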
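
A minimal Python sketch of DCG and NDCG, using the common gain form (2^rel - 1) / log2(i + 1) with positions starting at 1. The relevance grades in the example are invented.

```python
import math

def dcg(relevances):
    # Sum of (2^rel - 1) / log2(i + 1) over positions i = 1, 2, ...
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(relevances, start=1)
    )

def ndcg(relevances):
    # Normalize by the DCG of the ideal ordering (highest relevance first).
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance of a ranked result list (0 = not relevant).
ranking = [0, 2, 3]
print(dcg(ranking), ndcg(ranking))
```

The ranking [0, 2, 3] scores below 1 because the most relevant document sits last; the same documents ordered [3, 2, 0] would give an NDCG of 1.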

HITS - Hyperlink-Induced Topic Search

Authorities - Pages that directly answer the information need, e.g. the microsoft.com homepage
Hubs - Pages with good links to pages that answer the information need
Wikipedia is a good example of both a hub and an authority
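
The hub and authority idea can be sketched as a simple iterative computation in Python. This is a bare-bones version that assumes a fixed number of iterations and Euclidean normalization; the tiny link graph and page names are hypothetical.

```python
import math

def hits(graph, iterations=20):
    # graph: page -> list of pages it links to.
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {n: sum(auth[t] for t in graph.get(n, ())) for n in nodes}
        # Normalize both score vectors so the values stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth

# Hypothetical link graph, made up for illustration only.
links = {"portal": ["home", "docs"], "blog": ["home"], "home": []}
hub, auth = hits(links)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Here "portal" comes out as the strongest hub because it links to the most useful pages, and "home" as the strongest authority because the hubs point to it.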

Happy Learning!!!

December 24, 2015

T-Test

- Developed in 1908 by William Gosset
- The t-test is also referred to as Student's t-test, after Gosset's pen name "Student"
- Mu and Sigma denote the population parameters (mean and standard deviation)
- X-bar and s denote the mean and standard deviation of the sample

Hypothesis Tests in R

One Sample T-Test

Function - t.test in R (for example, t.test(x, mu = 0) runs a one-sample test of the sample mean against 0)
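
For comparison with R's t.test, the one-sample t statistic can be computed by hand. This is a sketch in Python with only the standard library; the function name and sample values are hypothetical, and it stops at the statistic and degrees of freedom because the standard library has no t distribution for the p value.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    # t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom.
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)
    return (x_bar - mu0) / standard_error, n - 1

# Hypothetical measurements, made up for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0]
t_stat, dof = one_sample_t(sample, mu0=5.0)
print(t_stat, dof)
```

The resulting t statistic is then compared against the t distribution with n - 1 degrees of freedom to get the p value, which is the step t.test performs for you.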

Happy Learning!!!

December 23, 2015

Hypothesis Testing Basics

After the exams I understood where my learning needs to improve. These are the crucial topics:

- P test using R Programming
- P test using Python Programming
- Hypothesis test using R Programming
- Hypothesis test using Python Programming

I glanced through a couple of sites and am bookmarking some pointers.

Normal Distribution Properties

Key Pointers
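- The normal distribution is unimodal and symmetric
- Mean (Mu)
- Standard Deviation (Sigma)
- About 99.7% of observations fall within 3 Sigma of the mean
- About 95% of observations fall within 2 Sigma of the mean
- |Z| > 2 (observation is considered unusual)
- pnorm gives the percentile (cumulative probability) of an observation
- qnorm gives the quantile (cutoff value) for a given percentile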
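
The pointers above can be checked numerically. This Python sketch uses statistics.NormalDist from the standard library; the pnorm and qnorm wrappers are my own helper names chosen to mirror the R functions, and the example score is invented.

```python
from statistics import NormalDist

def pnorm(x, mean=0.0, sd=1.0):
    # Cumulative probability (percentile) of x, like R's pnorm.
    return NormalDist(mean, sd).cdf(x)

def qnorm(p, mean=0.0, sd=1.0):
    # Cutoff value below which a fraction p of observations fall, like R's qnorm.
    return NormalDist(mean, sd).inv_cdf(p)

def within_k_sigma(k):
    # Probability of landing within k standard deviations of the mean.
    return pnorm(k) - pnorm(-k)

def z_score(x, mean, sd):
    # Number of standard deviations x lies from the mean.
    return (x - mean) / sd

print(within_k_sigma(2), within_k_sigma(3))
print(qnorm(0.975))
# Hypothetical example: a score of 130 when the mean is 100 and the sd is 15.
print(z_score(130, 100, 15), pnorm(130, 100, 15))
```

The printed values show that the 95% and 99.7% figures are roundings, and that the exact two-sided 95% cutoff is a little under 2 Sigma.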

Key Pointers
- Creating Null and Alternate Hypothesis conditions
- Identifying the sample size, standard error, population mean and standard deviation from the question
- Computing P value
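
Putting these steps together, here is a sketch of a one-sample z-test in Python. It assumes the population standard deviation is known and the sampling distribution of the mean is normal; the function name and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n, alternative="two-sided"):
    # H0: the true mean equals pop_mean. HA depends on `alternative`.
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if alternative == "two-sided":
        p = 2 * (1 - cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - cdf(z)
    elif alternative == "less":
        p = cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p

# Hypothetical question: a sample of 100 has mean 103; H0 says the mean is 100
# and the population standard deviation is 15.
z, p = z_test(sample_mean=103, pop_mean=100, pop_sd=15, n=100)
print(z, p)
```

A small p value (commonly below 0.05) is taken as evidence against the null hypothesis; the threshold itself is a convention chosen before running the test.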

Happy Learning!!!
