"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

February 06, 2019

Day #207 - Dimensionality Reduction Notes

SVD - The sum of the squares of the singular values equals the total variance in A (equivalently, the squared Frobenius norm of A, when A is mean-centered); a quick check follows
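A minimal numpy sketch of this identity (the 3 × 2 matrix is an arbitrary example, not from the notes):

import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
s = np.linalg.svd(A, compute_uv=False)  # singular values only
print(np.sum(s**2))                     # sum of squared singular values
print(np.linalg.norm(A, 'fro')**2)      # squared Frobenius norm - same value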

A matrix A can be expressed as
A = U S Vt, where Vt is the transpose of V

U, V - Orthogonal
U - Its columns are the left singular vectors
V - Its columns are the right singular vectors

A is an m × n matrix
U is an m × n matrix with orthonormal columns (in the thin/economy SVD)
S is an n × n diagonal matrix
V is an n × n orthogonal matrix

Since an m × n matrix with m > n has at most n singular values, the thin SVD needs only the first n columns of U rather than the full m × m matrix; the shape check below makes this concrete.
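A short sketch of the shape difference (the 6 × 4 random matrix is an assumption for illustration):

import numpy as np

A = np.random.rand(6, 4)                         # m = 6 > n = 4
U, S, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, S.shape, Vt.shape)                # (6, 6) (4,) (4, 4)
U, S, Vt = np.linalg.svd(A, full_matrices=False) # thin/economy SVD
print(U.shape, S.shape, Vt.shape)                # (6, 4) (4,) (4, 4)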
  • Dimensionality reduction is done by neglecting the small singular values in the diagonal matrix S
  • This reduction can only be exploited in the decomposed form: it is the factors, not A itself, that shrink
Output - Store the truncated forms of U, S, and V in place of A (see the sketch below)
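A minimal sketch of the truncation, assuming an illustrative 6 × 4 matrix and an arbitrary choice of k = 2:

import numpy as np

A = np.random.rand(6, 4)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                        # number of singular values kept
U_k, S_k, Vt_k = U[:, :k], S[:k], Vt[:k, :]  # truncated factors stored instead of A
A_approx = U_k @ np.diag(S_k) @ Vt_k         # best rank-k approximation of A
print(np.linalg.norm(A - A_approx))          # reconstruction error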

Reference - Link

Eigenvectors
  • Satisfy Av = λv, where v is the eigenvector and λ the eigenvalue
  • They are the special directions that A only stretches; they do not change direction (a quick check follows)
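A minimal numpy check of this (the 2 × 2 matrix is an arbitrary example):

import numpy as np

A = np.array([[2., 1.], [1., 2.]])
eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are the eigenvectors
v, lam = eigvecs[:, 0], eigvals[0]
print(A @ v)                         # A only stretches v...
print(lam * v)                       # ...by the factor lambda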
Linear Dimensionality Reduction (PCA, SVD)
  • High-dimensional data (images, text, vectors of stock data)
  • Describe the data with only a few values
# https://gist.github.com/addisonhuddy/8a9e682259c9dca1f61672b4027863dc
import numpy as np
from sklearn.decomposition import PCA

a = np.array([[1, 1, 1, 0, 2],
              [2, 1, 3, 5, 0],
              [1, 3, 5, 6, 2],
              [1, 3, 5, 6, 9],
              [2, 3, 4, 5, 6]])

# Set printing options for readable output
np.set_printoptions(suppress=True, precision=3)

print('FULL')
U, S, Vt = np.linalg.svd(a, full_matrices=True)
print('U')
print(U)
print('S')
print(S)
print('Vt')
print(Vt)

# Economy (reduced) SVD; for this square 5x5 matrix the shapes match the
# full SVD - the difference only shows up for rectangular matrices
print('REDUCED (economy SVD)')
U, S, Vt = np.linalg.svd(a, full_matrices=False)
print('U')
print(U)
print('S')
print(S)
print('Vt')
print(Vt)

# PCA: project the rows of a onto the top 2 principal components
pca = PCA(n_components=2)
a_transformed = pca.fit_transform(a)
print('pca')
print(a_transformed)
print(pca.explained_variance_)


How Many Singular Values Should We Retain? - A useful rule of thumb is to retain enough singular values to make up 90% of the energy in Σ, where the energy is the sum of the squares of the singular values (Link); a sketch of the rule follows.
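A minimal sketch of the rule, assuming an illustrative random matrix and the 90% threshold from above:

import numpy as np

A = np.random.rand(10, 6)
s = np.linalg.svd(A, compute_uv=False)      # singular values, sorted descending

energy = s**2
ratio = np.cumsum(energy) / np.sum(energy)  # cumulative energy fraction
k = int(np.searchsorted(ratio, 0.90)) + 1   # smallest k covering 90% of the energy
print(k, ratio)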

SVD - (Application in NLP) - Latent Semantic Analysis Notes
  • LSA applies singular value decomposition (SVD) to the term-document matrix
  • In SVD, a rectangular matrix is decomposed into the product of three other matrices
  • One component matrix describes the original row entities as vectors of derived orthogonal factor values
  • Another describes the original column entities in the same way
  • The third is a diagonal matrix of scaling values such that matrix-multiplying the three components reconstructs the original matrix (see the sketch below)
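A hedged sketch of LSA with scikit-learn - the three toy documents are made up, and TruncatedSVD stands in for the SVD step applied to a TF-IDF term-document matrix:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

X = TfidfVectorizer().fit_transform(docs)  # document-term matrix
lsa = TruncatedSVD(n_components=2)
doc_vectors = lsa.fit_transform(X)         # rows (documents) as 2 derived factors
print(doc_vectors)
print(lsa.components_.shape)               # columns (terms) described by the same factors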
LDA
  • The Dirichlet distribution takes a concentration parameter (usually called alpha) for each topic (or category); each draw from it is a probability distribution over the topics (see the sketch below)
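A minimal illustration of the Dirichlet prior with numpy (the alpha values are arbitrary):

import numpy as np

alpha = [0.1, 0.1, 0.1]                       # one alpha per topic; small values give sparse mixtures
samples = np.random.dirichlet(alpha, size=3)  # three topic-mixture draws
print(samples)                                # each row is a distribution over the 3 topics
print(samples.sum(axis=1))                    # rows sum to 1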
Further reading - Link

Happy Mastering DL!!!
