May 20, 2016
Day #24 - Python Code Examples
Examples covering for loops, while loops, dictionaries, functions, and plotting graphs (see the sketch below)
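Below is a minimal, self-contained sketch of the kinds of examples listed above; the sample values and the matplotlib plot are illustrative choices, not from the original post.

# Minimal sketches of the topics listed above (assumes matplotlib is installed)
import matplotlib.pyplot as plt

# for loop - iterate over a list
for fruit in ["apple", "banana", "cherry"]:
    print(fruit)

# while loop - count down from 3
count = 3
while count > 0:
    print(count)
    count -= 1

# dictionary - store and look up key/value pairs
prices = {"apple": 10, "banana": 5}
prices["cherry"] = 20          # add a new key
print(prices.get("banana"))    # -> 5

# function - reusable block that returns a value
def square(x):
    return x * x

print(square(4))               # -> 16

# plotting - simple line graph of squares
xs = list(range(10))
ys = [square(x) for x in xs]
plt.plot(xs, ys)
plt.xlabel("x")
plt.ylabel("x squared")
plt.show()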
Happy Learning!!
Labels:
Data Science Tips
Day #23 - Newton Raphson - Gradient Descent
Newton Raphson
- Optimization Technique
- Newton's method tries to find a point x satisfying f'(x) = 0
- Stop the iteration when the difference between two successive approximations x(n+1) and x(n) is close to zero
- For minimization the update is x(n+1) = x(n) - f'(x(n)) / f''(x(n))
- Choose a suitable starting value x0
- Works for convex functions
- Gradient descent update: x(n+1) = x(n) - a * f'(x(n))
- a - learning rate
- Gradient descent tries to find such a minimum x by using information from the first derivative of f
- Gradient descent and Newton-Raphson are similar; only the update rule differs (a comparison sketch in Python follows these notes)
Optimal Solutions
- Strategy to get to the bottom of the valley: go down along the steepest slope
- Measures the local gradient of the error function with respect to the parameter vector
- Once the gradient is zero you have reached a minimum
- The learning rate determines the step size and how many steps it takes to converge to a minimum
- Convergence may reach a local minimum rather than the global minimum
- Gradient descent only guarantees a local minimum
- Whether the cost function contours are elongated or circular affects the speed of convergence
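As a rough illustration of the two update rules above, here is a small Python sketch minimizing an example convex function f(x) = (x - 3)^2; the function, starting point, and learning rate are assumptions chosen for the demo.

# Sketch comparing the two update rules on a simple convex function
# f(x) = (x - 3)**2, whose minimum is at x = 3 (example chosen for illustration)

def f_prime(x):
    return 2 * (x - 3)      # first derivative

def f_double_prime(x):
    return 2.0              # second derivative (constant for this f)

def newton_raphson(x0, tol=1e-8, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_new = x - f_prime(x) / f_double_prime(x)   # x(n+1) = x(n) - f'(x)/f''(x)
        if abs(x_new - x) < tol:                     # stop when successive values are close
            break
        x = x_new
    return x

def gradient_descent(x0, lr=0.1, tol=1e-8, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_new = x - lr * f_prime(x)                  # x(n+1) = x(n) - a * f'(x)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

print(newton_raphson(x0=10.0))     # converges in one step for a quadratic
print(gradient_descent(x0=10.0))   # converges gradually; speed depends on the learning rate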
Happy Learning!!!
Labels:
Data Science,
Data Science Tips
May 14, 2016
Day #22 - Data science - Maths Basics
Eigen Vector - a vector whose direction does not change when the linear transformation is applied
Eigen Value - the amount of scaling applied along the corresponding eigenvector
Eigen Value Decomposition - can be performed only on square matrices
Trace - Sum of Eigen Values
Rank of A - Number of Non-Zero Eigen Values (holds for diagonalizable matrices; in general, the number of non-zero singular values)
SVD - Singular Value Decomposition
- Swiss Army Knife of Linear Algebra
- SVD - for Stock market Prediction
- SVD - for Data Compression
- SVD - to model sentiments
- SVD is Greatest Gift of Linear Algebra to Data Science
- The Singular Values of A are the square roots of the Eigen Values of AtA (A Transpose A) - see the sketch below
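A small NumPy sketch to check the relationships above: singular values versus square roots of the eigenvalues of AtA, trace as the sum of eigenvalues, and rank as the count of non-zero singular values. The example matrix is made up for illustration.

# Verify: singular values of A = square roots of eigenvalues of A-transpose A
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])          # any m x n matrix works here

singular_values = np.linalg.svd(A, compute_uv=False)   # singular values, descending

eig_AtA = np.linalg.eigvalsh(A.T @ A)          # eigenvalues of AtA (symmetric matrix)
sqrt_eig = np.sqrt(np.sort(eig_AtA)[::-1])     # sort descending to match the SVD order

print(singular_values)                         # singular values of A
print(sqrt_eig)                                # matches the singular values
print(np.trace(A.T @ A), np.sum(eig_AtA))      # trace equals the sum of the eigenvalues (for the square matrix AtA)
print(np.linalg.matrix_rank(A))                # rank = number of non-zero singular values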
Happy Learning!!! (Revise - Relearn - Practice)
Labels:
Data Science Tips,
Maths
May 09, 2016
Day #21 - Data Science - Maths Basics - Vectors and Matrices
Matrix - Combination of rows and columns
Check for Linear Dependence - row-reduce (e.g., R2 = R2 - 2R1); when one of the rows reduces to all zeros the rows are linearly dependent
Span - Linear combination of vectors
Rank - Linearly Independent set
Good Related Read - Span
Vector Space - Space of vectors, collection of many vectors
If v, w belong to the space, v + w also belongs to the space, and any scalar multiple of a vector stays in the space
If the determinant is non-zero, then the vectors are linearly independent. Otherwise, they are linearly dependent
Vector space properties
- Commutative x+y = y+x
- Associative (x+y)+z = x+(y+z)
- Origin vector - vector with all zeros, 0+x = x+0 = x
- Additive (Inverse) - For every x there exists -x such that x+(-x) = 0
- Distributivity of scalar sum, (r+s)x = rx+sx
- Distributivity of vector sum, r(x+y) = rx+ry
- Identity multiplication, 1*x = x
Let V be a vector space and W a subset of V; W is called a subspace of V
Properties
W is a subspace when the following conditions hold
- Zero vector belongs to W
- if u and v are vectors, u+v is in W (closure under +)
- if v is any vector in W, and c is any real number, c.v is in W (closure under scalar multiplication)
A vector v is a linear combination of vectors from S when v = r1v1 + r2v2 + ... + rkvk, where v1, ..., vk are distinct vectors from S and each ri belongs to R
Basis - a linearly independent spanning set. A set is called a basis if every vector in the vector space is a linear combination of the set. All bases for a vector space V have the same cardinality
Null Space, Row Space, Column Space
Let A be m x n matrix
- Null Space - the null space of A, denoted Null A, is the set of all solutions of the homogeneous equation Ax = 0
- Row Space - the subspace of R^n spanned by the row vectors of A
- Column Space - the subspace of R^m spanned by the column vectors of A
- L1 norm - sum of absolute values; for (1,-1,2): |1| + |-1| + |2| = 4
- L2 norm - Euclidean length; for (5,2): sqrt(5*5 + 2*2) = sqrt(29)
- Lp norms generalize this pattern to other powers p
- L infinity norm - maximum absolute component; for (5,2): max(|5|, |2|) = 5 (see the NumPy sketch below)
Orthogonality - mutually orthogonal (perpendicular) non-zero vectors are linearly independent
Orthogonal matrix will always have determinant +/-1
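A short NumPy sketch of the norm examples and the determinant/rank check for linear dependence noted above; the example vectors and matrices are made up for illustration.

# Norms, linear dependence check, and orthogonal matrix determinant
import numpy as np

v = np.array([1.0, -1.0, 2.0])
print(np.linalg.norm(v, 1))        # L1 norm: |1| + |-1| + |2| = 4
w = np.array([5.0, 2.0])
print(np.linalg.norm(w, 2))        # L2 norm: sqrt(25 + 4) = sqrt(29)
print(np.linalg.norm(w, np.inf))   # L-infinity norm: max(|5|, |2|) = 5

# Rows of M as vectors: non-zero determinant -> linearly independent
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # second row = 2 * first row
print(np.linalg.det(M))            # 0 -> linearly dependent
print(np.linalg.matrix_rank(M))    # rank 1 < 2 rows -> dependent

Q = np.array([[0.0, 1.0],
              [-1.0, 0.0]])         # an orthogonal matrix (a rotation)
print(np.linalg.det(Q))            # determinant is +1 or -1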
Map of Mathematics.
— Cliff Pickover (@pickover) August 22, 2022
Enlarge the figure to see all the wonderful areas for exploration and imagination. Which topic might you find most fascinating?
By Dominic Walliman, @DominicWalliman, Source: https://t.co/mNu0hWzFGW, Used with permission. pic.twitter.com/kx1azWIhle
Differential Equations - Notes - Link
Lectures - Link
Course Notes - Link
Happy Learning!!!
Labels:
Data Science Tips,
Maths
May 08, 2016
Day #20 - PCA basics
Machine learning algorithms adjust themselves based on the input data set, which is very different from traditional rules-based / logic-based systems. The capability to tune itself and work with a changing data set makes it a self-learning / self-updating system. Obviously, the inputs / updated data would be supplied by humans.
Basics
- Line is one-dimensional (1D), Square is 2D, Cube is 3D
- Fundamentally shapes are just set of points
- An N-dimensional space is represented by an N-dimensional hypercube
Feature Extraction
- Converting a feature vector from a higher to a lower dimension
PCA (Principal Component Analysis)
- Input is a large number of correlated variables; we perform an orthogonal transformation to convert them into uncorrelated variables, and identify principal components based on the highest variation
- Orthogonal vectors - the dot product equals zero; the components are perpendicular to each other
- This is achieved using SVD (Singular Value Decomposition)
- SVD internally solves the matrix and identifies the Eigen Vectors
- Eigen vector does not change direction when linear transformation is applied
- PCA is used to explain variations in data: find the principal component with the largest variation, then the direction with the next highest variation (orthogonal to the first component) - see the sketch after this list
- Rotation or reflection is referred to as an orthogonal transformation
- PCA - Use components with high variations
- SVD - Express Data as a Matrix
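As a rough sketch of the PCA-via-SVD idea above, the snippet below centers some made-up correlated 2D data, applies NumPy's SVD, and reads off the principal components and their variance; the data and seed are illustrative assumptions.

# PCA via SVD on made-up 2D data (NumPy only)
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2D data: second column is roughly twice the first plus noise
x = rng.normal(size=100)
X = np.column_stack([x, 2 * x + 0.3 * rng.normal(size=100)])

X_centered = X - X.mean(axis=0)            # center each feature
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

principal_components = Vt                  # rows are orthogonal directions
explained_variance = S**2 / (len(X) - 1)   # variance along each component

print(principal_components[0])                          # direction of largest variation
print(explained_variance / explained_variance.sum())    # variance ratio per component

# Project onto the first principal component (dimensionality reduction 2D -> 1D)
X_reduced = X_centered @ Vt[0]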
More Reads
Happy Learning!!!
Labels:
Data Science Tips
May 03, 2016
Day #19 - Probability Basics
Concepts
- Events - Subset of Sample Space
- Sample Space - Set of all possible outcomes
- Random Variable - a variable that captures (maps to a number) the outcome of an experiment
- Permutation - Ordering matters
- Combination - Ordering does not matter
- Binomial - each trial has only two outcomes; counts successes over a fixed number of independent trials
- Poisson - events that take place over and over again; the rate of events is denoted by lambda
- Geometric - the number of attempts needed until the first success, when the probability of success is the same for each trial and the trials are independent of each other
- Conditional Probability - P(A given B) = probability that A will occur assuming B has already occurred
- Normal Distribution - Appears because of central limit theorem (Gaussian and Normal Distribution both are same)
"Consider a binomial distribution with parameters n and p. The distribution is underlined by only two outcomes in the run of an independent trial- success and failure. A binomial distribution converges to a Poisson distribution when the parameter n tends to infinity and the probability of success p tends to zero. These extreme behaviours of the two parameters make the mean constant i.e. n*p = mean of Poisson distribution "
Read Michael Lamar's answer to Probability (statistics): What is difference between binominal, poisson and normal distribution? on Quora
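A small SciPy sketch of the convergence described in the quote: keeping the mean n*p fixed while n grows and p shrinks, the binomial probabilities approach the Poisson probabilities. The chosen mean and the value k = 3 are arbitrary illustrative assumptions.

# Binomial converging to Poisson when n grows, p shrinks, and n*p stays fixed
from scipy.stats import binom, poisson

mean = 4.0                       # fixed n * p
for n in (10, 100, 1000, 10000):
    p = mean / n
    # probability of observing exactly 3 events under each distribution
    print(n, binom.pmf(3, n, p), poisson.pmf(3, mean))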
Happy Learning!!!!
Labels:
Data Science Tips
May 01, 2016
Day #18 - Linear Regression , K Nearest Neighbours
Linear Regression
KNN
- Fitting straight line to set of data points
- Create line to predict new values based on previous observations
- Uses OLS (Ordinary Least Squares). Minimize squared error between each point and line
- Maximum likelihood estimation
- R squared - fraction of total variation in Y explained by the model
- R squared near 0 - poor fit
- R squared near 1 - good fit
- High R squared means a good fit
- ML Model to predict continuous variables based on set of features
- Used where target variable is continuous
- Minimize residuals of points from the line
- Find line of best fit
- y = mx + c
- Sum of squared residuals = sum (y - mx - c)^2
- Reduce residuals
- Assumptions in LR
- Linearity, residuals following a Gaussian (normal) distribution, independence of errors (a worked OLS sketch follows this list)
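A minimal NumPy sketch of OLS on made-up noisy data, fitting y = mx + c by minimizing the squared residuals and reporting R squared; the slope, intercept, and noise level are illustrative assumptions.

# Ordinary least squares for y = m*x + c on made-up data
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=50)   # noisy line, true m = 2.5, c = 1.0

# Design matrix with a column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
m, c = coeffs

y_hat = m * x + c
ss_res = np.sum((y - y_hat) ** 2)        # sum of squared residuals
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation in y
r_squared = 1 - ss_res / ss_tot          # fraction of variation explained

print(m, c, r_squared)                   # slope, intercept, R squared close to 1 for a good fit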
Updated May 28, 2020
- Supervised Machine Learning Technique
- A new data point is classified based on its distance to existing points
- Choice of K - small enough to pick meaningful neighbours
- Determine the value of K based on trial tests
- Plot the K nearest neighbours on a scatter plot to identify the neighbours (a from-scratch sketch follows)
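A from-scratch K nearest neighbours sketch in NumPy with made-up 2D points and two class labels; the clusters, K value, and query points are illustrative assumptions.

# K nearest neighbours classification by majority vote among the closest points
import numpy as np
from collections import Counter

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],    # class 0 cluster
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])   # class 1 cluster
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(x_new, k=3):
    # distance from the new point to every existing point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    votes = Counter(y_train[nearest])            # majority vote among the neighbours
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([2.0, 2.0])))   # -> 0 (close to the first cluster)
print(knn_predict(np.array([6.2, 6.8])))   # -> 1 (close to the second cluster)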
Recommendation Algo Analysis
Linear Regression
Linear Regression - Concept and Theory
Linear Regression Problem 1
Linear Regression Problem 2
Linear Regression Problem 3
Happy Learning!!!
Labels:
Data Science Tips