"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

February 06, 2019

Day #207 - Dimensionality Reduction Notes

SVD - The sum of the squares of the singular values equals the total variance in A (equivalently, the squared Frobenius norm of A, when A is mean-centered); a quick check follows
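A minimal numpy sketch of this identity (the 3 × 2 matrix is an arbitrary example, not from the notes):

import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
s = np.linalg.svd(A, compute_uv=False)  # singular values only
print(np.sum(s**2))                     # sum of squared singular values
print(np.linalg.norm(A, 'fro')**2)      # squared Frobenius norm - same value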

A matrix A can be expressed as
A = U S Vt, where Vt is the transpose of V

U, V - Orthogonal
U - Its columns are the left singular vectors
V - Its columns are the right singular vectors

A is an m × n matrix
U is an m × n matrix with orthonormal columns (in the thin/economy SVD)
S is an n × n diagonal matrix
V is an n × n orthogonal matrix

Since an m × n matrix with m > n has at most n singular values, the thin SVD needs only the first n columns of U rather than the full m × m matrix; the shape check below makes this concrete.
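A short sketch of the shape difference (the 6 × 4 random matrix is an assumption for illustration):

import numpy as np

A = np.random.rand(6, 4)                         # m = 6 > n = 4
U, S, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, S.shape, Vt.shape)                # (6, 6) (4,) (4, 4)
U, S, Vt = np.linalg.svd(A, full_matrices=False) # thin/economy SVD
print(U.shape, S.shape, Vt.shape)                # (6, 4) (4,) (4, 4)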
  • Dimensionality reduction is done by neglecting the small singular values in the diagonal matrix S
  • This reduction can only be exploited in the decomposed form: it is the factors, not A itself, that shrink
Output - Store the truncated forms of U, S, and V in place of A (see the sketch below)
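A minimal sketch of the truncation, assuming an illustrative 6 × 4 matrix and an arbitrary choice of k = 2:

import numpy as np

A = np.random.rand(6, 4)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                        # number of singular values kept
U_k, S_k, Vt_k = U[:, :k], S[:k], Vt[:k, :]  # truncated factors stored instead of A
A_approx = U_k @ np.diag(S_k) @ Vt_k         # best rank-k approximation of A
print(np.linalg.norm(A - A_approx))          # reconstruction error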

Reference - Link

Eigenvectors
  • Satisfy Av = λv, where v is the eigenvector and λ the eigenvalue
  • They are the special directions that A only stretches; they do not change direction (a quick check follows)
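A minimal numpy check of this (the 2 × 2 matrix is an arbitrary example):

import numpy as np

A = np.array([[2., 1.], [1., 2.]])
eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are the eigenvectors
v, lam = eigvecs[:, 0], eigvals[0]
print(A @ v)                         # A only stretches v...
print(lam * v)                       # ...by the factor lambda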
Linear Dimensionality Reduction (PCA, SVD)
  • High-dimensional data (images, text, vectors of stock data)
  • Describe the data with only a few values
# https://gist.github.com/addisonhuddy/8a9e682259c9dca1f61672b4027863dc
import numpy as np
from sklearn.decomposition import PCA

a = np.array([[1, 1, 1, 0, 2],
              [2, 1, 3, 5, 0],
              [1, 3, 5, 6, 2],
              [1, 3, 5, 6, 9],
              [2, 3, 4, 5, 6]])

# Set printing options for readable output
np.set_printoptions(suppress=True, precision=3)

print('FULL')
U, S, Vt = np.linalg.svd(a, full_matrices=True)
print('U')
print(U)
print('S')
print(S)
print('Vt')
print(Vt)

# Economy (reduced) SVD; for this square 5x5 matrix the shapes match the
# full SVD - the difference only shows up for rectangular matrices
print('REDUCED (economy SVD)')
U, S, Vt = np.linalg.svd(a, full_matrices=False)
print('U')
print(U)
print('S')
print(S)
print('Vt')
print(Vt)

# PCA: project the rows of a onto the top 2 principal components
pca = PCA(n_components=2)
a_transformed = pca.fit_transform(a)
print('pca')
print(a_transformed)
print(pca.explained_variance_)


How Many Singular Values Should We Retain? - A useful rule of thumb is to retain enough singular values to make up 90% of the energy in Σ, where the energy is the sum of the squares of the singular values (Link); a sketch of the rule follows.
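A minimal sketch of the rule, assuming an illustrative random matrix and the 90% threshold from above:

import numpy as np

A = np.random.rand(10, 6)
s = np.linalg.svd(A, compute_uv=False)      # singular values, sorted descending

energy = s**2
ratio = np.cumsum(energy) / np.sum(energy)  # cumulative energy fraction
k = int(np.searchsorted(ratio, 0.90)) + 1   # smallest k covering 90% of the energy
print(k, ratio)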

SVD - (Application in NLP) - Latent Semantic Analysis Notes
  • LSA applies singular value decomposition (SVD) to the term-document matrix
  • In SVD, a rectangular matrix is decomposed into the product of three other matrices
  • One component matrix describes the original row entities as vectors of derived orthogonal factor values
  • Another describes the original column entities in the same way
  • The third is a diagonal matrix of scaling values such that matrix-multiplying the three components reconstructs the original matrix (see the sketch below)
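A hedged sketch of LSA with scikit-learn - the three toy documents are made up, and TruncatedSVD stands in for the SVD step applied to a TF-IDF term-document matrix:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

X = TfidfVectorizer().fit_transform(docs)  # document-term matrix
lsa = TruncatedSVD(n_components=2)
doc_vectors = lsa.fit_transform(X)         # rows (documents) as 2 derived factors
print(doc_vectors)
print(lsa.components_.shape)               # columns (terms) described by the same factors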
LDA
  • The Dirichlet distribution takes a concentration parameter (usually called alpha) for each topic (or category); each draw from it is a probability distribution over the topics (see the sketch below)
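A minimal illustration of the Dirichlet prior with numpy (the alpha values are arbitrary):

import numpy as np

alpha = [0.1, 0.1, 0.1]                       # one alpha per topic; small values give sparse mixtures
samples = np.random.dirichlet(alpha, size=3)  # three topic-mixture draws
print(samples)                                # each row is a distribution over the 3 topics
print(samples.sum(axis=1))                    # rows sum to 1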
Further reading - Link

Happy Mastering DL!!!
