"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

February 23, 2019

XGBoost on Windows, Python 3

#Install XGBoost on Windows, Python 3
#Step 1
anaconda search -t conda xgboost
#Step 2
conda install -c mikesilva xgboost
#Step 3
pip install xgboost
from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(x1,y1)
#https://towardsdatascience.com/fine-tuning-xgboost-in-python-like-a-boss-b4543ed8b1e
Happy Mastering DL!!!

February 21, 2019

SVD Summary

Recommendations

Happy Learning!!!

Analysis of MIT Deep Learning Projects

I spent some time analyzing the MIT Deep Learning projects. Very inspiring, especially the healthcare projects. They span broad categories and different domains - a good read to understand use cases and architectures.


Updated link

Happy Mastering DL!!!

Segmentation of Data Scientists

Data Scientists from the stats world - This cluster has PhDs from the 2000s and has been working in vision and analytics since around 2000. Conversations with them were useful for handcrafting features for image processing problems. They know the algorithms, the underlying math, the intuition, and the limitations of techniques.

Data Scientists with domain expertise - Laterals who upskilled with data science skills; the data science practitioner world. They can bridge the domain and data science use cases. Their strength lies in identifying data, building the pipeline, and envisioning the end-to-end flow.

Rookies - With MOOCs (Coursera, Udemy, online sessions), data science has a lot of visibility and attention as an entry-level career choice. A lot of entry-level folks are getting deeper into building models, getting good at model building and feature engineering.

Kaggle Experts - The go-to people for feature engineering, parameter tuning, experimenting with models, applying ensemble techniques, and building the most accurate models from anonymized data.

My journey has been through databases, BI, and analytics. I use databases primarily for data analysis, the BI perspective helps me understand data in its business context, and domain knowledge helps me quickly extract key data and build models. All this experience helps to find use cases, engineer features for data models, build the model, and sell it to the business. I am still getting better at the *selling part*. I keep learning from my interactions with all these segments of Data Scientists.

Updated - 2022 - Feb 21


Ref - Link


Happy Mastering DL!!!


February 19, 2019

Day #215 - Deep Dive OpenCV


#cv2.getStructuringElement
#===============================
#Generate Different Kernel Combinations
kernel1 = cv2.getStructuringElement(cv2.MORPH_RECT,(5,5))
kernel2 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5))
kernel3 = cv2.getStructuringElement(cv2.MORPH_CROSS,(5,5))
#cv2.filter2D
#==============
#OpenCV provides a function cv2.filter2D() to convolve a kernel with an image
#Example1
import cv2
import numpy as np
original_image = cv2.imread(r'E:\Opencv_Examples\goalkeeper.jpg')
kernel1 = cv2.getStructuringElement(cv2.MORPH_RECT,(5,5))
kernel2 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5))
kernel3 = cv2.getStructuringElement(cv2.MORPH_CROSS,(5,5))
conv1 = cv2.filter2D(original_image ,-1,kernel1)
conv2 = cv2.filter2D(original_image ,-1,kernel2)
conv3 = cv2.filter2D(original_image ,-1,kernel3)
cv2.imshow('conv1',conv1)
cv2.imshow('conv2',conv2)
cv2.imshow('conv3',conv3)
cv2.waitKey(0)
cv2.destroyAllWindows()
#cv2.calcHist
#=============
#cv2.calcHist() function to find the histogram
#cv2.calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]])
#channels - [0] for a grayscale image; for a color image, [0], [1] or [2] to calculate the histogram of the blue, green or red channel
#mask : mask image. To find histogram of full image, it is given as "None"
#histSize : this represents our BIN count. Need to be given in square brackets. For full scale, we pass [256].
#ranges : this is our RANGE. Normally, it is [0,256].
##Example2
import cv2
original_image = cv2.imread(r'E:\Opencv_Examples\goalkeeper.jpg',cv2.IMREAD_GRAYSCALE)
hist = cv2.calcHist([original_image],[0],None,[256],[0,256])
original_image = cv2.imread(r'E:\Opencv_Examples\goalkeeper.jpg',cv2.IMREAD_COLOR)
hist_b = cv2.calcHist([original_image],[0],None,[256],[0,256])
hist_g = cv2.calcHist([original_image],[1],None,[256],[0,256])
hist_r = cv2.calcHist([original_image],[2],None,[256],[0,256])
from matplotlib import pyplot as plt
plt.plot(hist_b)
plt.ylabel('Values')
plt.show()
#cv2.threshold
#===============
#If a pixel value is greater than the threshold value, it is assigned one value (for example white),
#otherwise it is assigned another value (for example black)
#Minor changes from https://docs.opencv.org/3.4/d7/d4d/tutorial_py_thresholding.html
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread(r'E:\Opencv_Examples\goalkeeper.jpg',cv2.IMREAD_GRAYSCALE)
ret,thresh1 = cv2.threshold(img,127,255,cv2.THRESH_BINARY)
ret,thresh2 = cv2.threshold(img,127,255,cv2.THRESH_BINARY_INV)
ret,thresh3 = cv2.threshold(img,127,255,cv2.THRESH_TRUNC)
ret,thresh4 = cv2.threshold(img,127,255,cv2.THRESH_TOZERO)
ret,thresh5 = cv2.threshold(img,127,255,cv2.THRESH_TOZERO_INV)
titles = ['Original Image','BINARY','BINARY_INV','TRUNC','TOZERO','TOZERO_INV']
images = [img, thresh1, thresh2, thresh3, thresh4, thresh5]
for i in range(6):
    plt.subplot(2,3,i+1),plt.imshow(images[i],'gray')
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
plt.show()
#cv2.calcBackProject
#===================
#cv2.calcBackProject() - its parameters are almost the same as cv2.calcHist()
#the object histogram should be normalized before passing it to the backproject function
#one of the inputs is the histogram of the object we want to find
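#A minimal illustrative sketch (not from the original post); the crop below is an arbitrary region of interest
original_image = cv2.imread(r'E:\Opencv_Examples\goalkeeper.jpg',cv2.IMREAD_COLOR)
roi = original_image[100:200, 100:200]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
hsv_img = cv2.cvtColor(original_image, cv2.COLOR_BGR2HSV)
#hue histogram of the region of interest, normalized before back projection
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
#back-project the histogram onto the full image; bright pixels are likely to belong to the object
back_proj = cv2.calcBackProject([hsv_img], [0], roi_hist, [0, 180], 1)
cv2.imshow('back_proj', back_proj)
cv2.waitKey(0)
cv2.destroyAllWindows()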
#cv2.merge
#=========
import cv2
img = cv2.imread(r'E:\Opencv_Examples\goalkeeper.jpg',cv2.IMREAD_COLOR)
b,g,r = cv2.split(img)
cv2.imshow('b',b)
cv2.imshow('g',g)
cv2.imshow('r',r)
img2 = cv2.merge((b,g,g))
cv2.imshow('bgg',img2)
img3 = cv2.merge((b,g,r))
cv2.imshow('bgr',img3)
cv2.waitKey(0)
cv2.destroyAllWindows()
#cv2.bitwise_and
#===============
#This includes bitwise AND, OR, NOT and XOR operations.
#They will be highly useful while extracting any part of the image
#cv2.bitwise_and
#cv2.bitwise_not
#https://docs.opencv.org/3.2.0/d0/d86/tutorial_py_image_arithmetics.html
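#A minimal masking sketch (not in the original post); the rectangle below is an arbitrary region to keep
import cv2
import numpy as np
img = cv2.imread(r'E:\Opencv_Examples\goalkeeper.jpg',cv2.IMREAD_COLOR)
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[50:200, 50:200] = 255
#pixels outside the mask become black; bitwise_not inverts every pixel value
kept = cv2.bitwise_and(img, img, mask=mask)
inverted = cv2.bitwise_not(img)
cv2.imshow('kept', kept)
cv2.imshow('inverted', inverted)
cv2.waitKey(0)
cv2.destroyAllWindows()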
Happy Mastering DL!!!

February 18, 2019

Day #214 - Python Working with Arrays / Data Collection


#Working with arrays
import numpy as np
a = np.array([[1,2,3],[3,4,5]])
print('a')
print(a)
#Reshape as three rows 2 columns
b = a.reshape(3,2)
print('b')
print(b)
#Reshape as two rows and three columns
c = a.reshape(2,3)
print('c')
print(c)
#Transpose
d = a.T
print('d')
print(d)
#Can only specify one unknown dimension
#unknown dimension for rows but two columns
e = a.reshape(-1,2)
print('e')
print(e)
#2 rows one unknown dimension for columns
f = a.reshape(2,-1)
print('f')
print(f)
#parse the array
i = 0
print('print values of a')
for row in a:
    for value in row:
        print('position',str(i))
        i = i+1
        print(value)
#list = []
#tuple = ()
#sets = {}
#dictionary = {}
sportslist = ['cricket','hockey','tennis','badminton']
vegetablestuple = ('brinjal','tomato','beetroot','drumstick')
foodmenuset = {'biryani','roti','rice','curd'}
dicthotels = {'india':'Delhi','china':'beijing','srilanka':'colombo','pakistan':'islamabad'}
dicthotelscities = {'india':['Delhi','chennai','mumbai'],'china':['beijing','shangai']}
print('sportslist')
print('_______________')
for name in sportslist:
    print(name)
print('vegetablestuple')
print('_______________')
for name in vegetablestuple:
    print(name)
print('dicthotels')
print('_______________')
for key,value in dicthotels.items():
    print(key)
    print(value)
print('dicthotelscities')
print('_______________')
for key,value in dicthotelscities.items():
    print(key)
    for city in value:
        print(city)
Happy Mastering DL!!!

February 17, 2019

Day #213 - Working with Sound and Python


#pip install librosa
import librosa
import matplotlib.pyplot as plt
import librosa.display
import numpy as np
path = r'D:\PetProject\Audio_Analytics\UrbanSound.tar\UrbanSound\data\air_conditioner\204240.wav'
#waveform `y`
#Store the sampling rate as `sr`
#By default, this uses resampy’s high-quality mode (‘kaiser_best’).
y,sr = librosa.load(path,res_type='kaiser_fast')
plt.figure(figsize=(10,5))
librosa.display.waveplot(y,sr=sr)
#Mel frequency cepstral coefficients (MFCCs)
#small set of features
mfcc = librosa.feature.mfcc(y=y, sr=sr)
print(mfcc.shape)
#numpy.ndarray of size (n_mfcc, T)
mfccs=np.mean(librosa.feature.mfcc(y=y,sr=sr,n_mfcc=40).T,axis=0)
print(mfccs)
print(mfccs.shape)
#Load 5 seconds of a wav file, starting 15 seconds in
y,sr = librosa.load(path,res_type='kaiser_fast', offset=15.0, duration=5.0)
plt.figure(figsize=(10,5))
librosa.display.waveplot(y,sr=sr)
#Load a wav file and resample to 11 KHz
y,sr = librosa.load(path,res_type='kaiser_fast',sr=11025)
plt.figure(figsize=(10,5))
librosa.display.waveplot(y,sr=sr)
Happy Mastering DL!!!

February 14, 2019

Day #212 - OpenCV based Object Tracking Learning's


#https://docs.opencv.org/3.1.0/db/df8/tutorial_py_meanshift.html
#Yolo + Meanshift - OpenCV
#Write comment for each line that you don't understand :)
import numpy as np
import cv2
cap = cv2.VideoCapture(r'E:\Optical_Flow\slow.flv')
ret, frame = cap.read()
#frame = cv2.resize(old_frame, (500, 400))
#set up initial location window
r,h,c,w = 250,90,400,125 #assign it based on Yolo
track_window = (c,r,w,h)
#set up roi for tracking
roi = frame[r:r+h,c:c+w]
#Converts an image from one color space to another.
hsv_roi = cv2.cvtColor(roi,cv2.COLOR_BGR2HSV)
#The cv2.inRange - three arguments
#first is the image to perform color detection
#second - lower limit of the color you want to detect
#third argument - upper limit of the color you want to detect.
mask = cv2.inRange(hsv_roi,np.array((0.,60.,32.)),np.array((180.,255.,255.)))
#cv2.calcHist to calculate the histogram of an image
#cv2.calcHist(images, channels, mask, bins, ranges)
#For grayscale images there is only one channel, [0]; for color images use [0], [1] or [2]
#bins - is a list containing the number of bins to use for each channel
#ranges - is the range of the possible pixel values, which is [0, 256] in case of RGB color space (where 256 is not inclusive).
roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180])
#its normalized values will be R/S, G/S and B/S (where, S=R+G+B).
#when normType=NORM_MINMAX (for dense arrays only).
#The optional mask specifies a sub-array to be normalized.
#This means that the norm or min-n-max are calculated over the sub-array, and then this sub-array is modified to be normalized
cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX)
#Set up termination criteria: either 10 iterations or move by at least 1 pt
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,10,1)
while(1):
    ret, frame = cap.read()
    if ret == True:
        hsv = cv2.cvtColor(frame,cv2.COLOR_BGR2HSV)
        #Calculates the back projection of a histogram.
        #images, channels, hist, ranges, scale[, dst]
        dst = cv2.calcBackProject([hsv],[0],roi_hist,[0,180],1)
        #apply mean shift to get the new location
        #move that window to the area of maximum pixel density (or maximum number of points)
        #the movement is reflected in the histogram backprojected image.
        #As a result, the meanshift algorithm moves our window to the new location with maximum density
        ret, track_window = cv2.meanShift(dst,track_window,term_crit)
        #draw it on the image
        x,y,w,h = track_window
        img2 = cv2.rectangle(frame,(x,y),(x+w,y+h),255,2)
        cv2.imshow('img2',img2)
        k = cv2.waitKey(0) & 0xff
        if k==27:
            break
    else:
        break
cv2.destroyAllWindows()
cap.release()
Happy Mastering DL!!!

Voice Powered SQL Assistant

SQLBot - I am your query assistant. What do you want me to do?
User - I want a query to join a few tables

SQLBot - Tell me the tables
User - Employee, Payment, JobDetails tables

SQLBot - Based on my analysis, these are the join columns: EmployeeId for Employee-JobDetails, JobId for JobDetails-Payment
User - Give me the query

SQLBot - There are four indexes available. Which indexes do you want me to use? Any inputs?
User - Give me the best possible query

SQLBot - I tried this query on 10K records and it took 2.3 seconds. Is that fine? Do you want me to populate 100K records and try again?
User - I will do it in the next sprint; until then this is fine

SQLBot - Thank you. A small stat - other users who worked on a similar query spent 40% more time analyzing than you did
User - Time to go, bye. Check in the code

Use Technology to add value on top of human intelligence :)

Happy Learning!!!

February 13, 2019

Day #211- OpenCV based Optical Flow Example


#Modified and updated opencv example based on my requirements
#https://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_lucas_kanade.html
import numpy as np
import cv2
cap = cv2.VideoCapture(r'E:\Optical_Flow\Demo.mp4')
#params for Shi-Tomasi corner detection
feature_params = dict(maxCorners=100,qualityLevel=0.3,minDistance=7,blockSize=7)
#parameters for lucas kanade optical flow
lk_params = dict(winSize=(15,15),maxLevel=2,criteria=(cv2.TERM_CRITERIA_EPS|cv2.TERM_CRITERIA_COUNT,10,0.03))
#Create some random colors
color = np.random.randint(0,255,(100,3))
#Take first frame and find corners in it
ret, old_frame = cap.read()
image = cv2.resize(old_frame, (500, 400))
old_gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray,mask=None,**feature_params)
#create mask for drawing purpose
mask = np.zeros_like(image)
final_frame = np.zeros_like(image)
while(1):
    flag, frame1 = cap.read()
    if flag:
        frame = cv2.resize(frame1, (500, 400))
        frame_gray = cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)
        #calculate optical flow
        p1,st,err = cv2.calcOpticalFlowPyrLK(old_gray,frame_gray,p0,None,**lk_params)
        #select good points
        good_new = p1[st==1]
        good_old = p0[st==1]
        print(good_new)
        print(good_old)
        #draw the tracks
        for i,(new,old) in enumerate(zip(good_new,good_old)):
            a,b = new.ravel()
            c,d = old.ravel()
            mask = cv2.line(mask,(a,b),(c,d),color[i].tolist(),2)
            frame = cv2.circle(frame,(a,b),5,color[i].tolist())
        img = cv2.add(frame,mask)
        cv2.imshow('Intermediate frame',img)
        final_frame = img.copy()
        k = cv2.waitKey(30) & 0xff
        if k==27:
            break
        #now update previous frame and previous points
        old_gray = frame_gray.copy()
        p0 = good_new.reshape(-1,1,2)
    else:
        break
cv2.destroyAllWindows()
#Hough Transformation
#Line Count
gray = cv2.cvtColor(final_frame,cv2.COLOR_BGR2GRAY)
img_gaussian = cv2.GaussianBlur(gray,(3,3),0)
img_sobelx = cv2.Sobel(img_gaussian,cv2.CV_8U,1,0,ksize=5)
img_sobely = cv2.Sobel(img_gaussian,cv2.CV_8U,0,1,ksize=5)
#Compute lines using Hough Transformation
xlines = cv2.HoughLines(img_sobelx,1,np.pi/180,200)
ylines = cv2.HoughLines(img_sobely,1,np.pi/180,200)
if(ylines is not None):
    if(xlines is not None):
        print('sobel - HoughLines')
        c = ylines.size/xlines.size
        print(c)
        print('X Line Counts')
        print(xlines)
        print('Y Line Counts')
        print(ylines)
cv2.imshow("Final Frame",final_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
cap.release()
Happy Mastering DL!!!

February 12, 2019

Day #210 - NLP Coding Snippets

Samples of entity extraction, keyword extraction, and sentiment analysis for evaluating sentences.

#pip install spacy
#python -m spacy download en
#pip install multi-rake
from multi_rake import Rake
import spacy
nlp = spacy.load('en')
from textblob import TextBlob
def computekeywords(sentence):
    print(sentence)
    doc = nlp(sentence)
    rake = Rake()
    print('using spacy')
    for ent in doc.ents:
        print(ent.text, ent.label_)
    keywords = rake.apply(sentence)
    print('Keywords using Rake')
    print(keywords)
    print('Sentiment of the sentence')
    analysis = TextBlob(sentence)
    if analysis.sentiment[0] > 0:
        intent = 'Positive'
    elif analysis.sentiment[0] < 0:
        intent = 'Negative'
    else:
        intent = 'Neutral'
    print(intent)
sentence = "I need car insurance"
computekeywords(sentence)
sentence2 = "I lost my credit card"
computekeywords(sentence2)
#I need car insurance
#using spacy
#Keywords using Rake
#[('car insurance', 4.0)]
#Sentiment of the sentence
#Neutral
#I lost my credit card
#using spacy
#Keywords using Rake
#[('credit card', 4.0), ('lost', 1.0)]
#Sentiment of the sentence
#Neutral
Happy Mastering DL!!!!

February 11, 2019

Day #209 - Pandas DateTime Coding Snippets

Lessons working on Pandas DateTime columns

import numpy as np
from dateutil.parser import parse
a = parse('2012-11-01 02:48:30')
print(a)
import time
print(int(time.mktime(a.timetuple())))
import pandas as pd
data = [['Alex','2012-11-01 02:48:30'],['Bob','2011-11-01 02:48:30'],['Clarke','2013-11-01 02:48:30']]
#Create a dataframe
df = pd.DataFrame(data,columns=['Name','EmploymentDate'])
#convert str to date time
df['Date_time1'] = pd.to_datetime(df['EmploymentDate'])
#convert date time to int value
df['date_time_int'] = df.Date_time1.astype(np.int64)
print(df)


Happy Mastering DL!!!

Deep Life

Life is a form of reinforcement learning. I believe the growth-oriented mindset reflects reinforcement learning. Learn the lesson when you fail, re-apply the lesson when you succeed. Add a bit of randomness to evaluate newer unexplored territories. Keep Learning!!! #ArtificialIntelligence #DeepLearning #rl

February 10, 2019

Day #208 - OpenAI - Spinning Up in Deep RL Workshop - Part 1

Key Lessons
  • AGI - Artificial General Intelligence
  • Do Most Economically Valuable work
  • Deep Reinforcement Learning trains Deep Networks with Trial and Error
  • Function approximators - Deep Networks 


Reinforcement Learning
  • Good for Sequential Learning
  • Good when we do not know optimal behavior
  • RL is useful when evaluating behaviors is easier than generating them
Deep Learning
  • Good for High Dimensional Data
  • Approximate a function
Deep RL
  • Video Games
  • For Decision Rules 


Recap of DL Patterns
  • Finding a model that gives the right output for given inputs
  • The output of each layer is a re-arrangement of its input with a non-linearity applied
  • The loss function is differentiable with respect to the model parameters
  • Compute how the loss changes with respect to changes in parameters
  • Function composition is the core of the model
  • Function topology varies across architectures
  • The non-linearity does a lot of the work
  • Successive layers represent more complex features
  • LSTM (RNN) - Accepts a time series of inputs and produces a time series of outputs
  • Transformer - Allows the network to apply attention over several inputs
  • Attention Neural Networks - Select the most meaningful details from data, make decisions based on a lot of data
  • Regularizers - Trade off the loss against something that is not dependent on the task; they do a better job at generalization
  • Adaptive Optimizers




Formulate RL Problem
  • Agent that interacts with environment
  • Agent picks and executes action
  • With the new environment state, the agent proceeds further
  • What decision maximizes rewards
  • Attains the goal with Trial and Error

Observations And Actions
  • Observations are continuous
  • Actions may be discrete or continuous

Policy
  • Randomly (Stochastic)
  • Deterministic (Map directly with no randomness)
  • Randomness is helpful
  • Logits (scores that define the probability of each action)
  • Action probabilities come from a softmax over the logits
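A minimal sketch of this idea (my own illustration, with made-up logits and plain numpy): a softmax turns the logits into action probabilities and the stochastic policy samples an action from them.

import numpy as np
logits = np.array([2.0, 0.5, -1.0])               #hypothetical network outputs for 3 actions
probs = np.exp(logits - logits.max())             #softmax, shifted for numerical stability
probs /= probs.sum()
action = np.random.choice(len(probs), p=probs)    #stochastic policy: sample an action
print(probs, action)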
Trajectory - Sequence of states and actions in an environment

Reward function - Measures how good or bad things are; more positive is better


Value functions - How much reward we expect to get
Value functions satisfy the Bellman equation
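A minimal sketch of the Bellman equation for a fixed policy, V = R + γ P V, on a made-up two-state problem (the transition probabilities and rewards below are arbitrary):

import numpy as np
gamma = 0.9
P = np.array([[0.8, 0.2],                         #hypothetical transition matrix P[s, s'] under the policy
              [0.3, 0.7]])
R = np.array([1.0, 0.0])                          #hypothetical expected reward per state
V = np.zeros(2)
for _ in range(100):                              #repeated Bellman backups converge to the value function
    V = R + gamma * P @ V
print(V)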


Types of RL Algos
Model-based (learn a model of the environment) and model-free methods


Try - Evaluate - Improve the policy
Policy Optimization


  • Run the policy for complete trajectories
  • Represent policy with Neural Network
Derive Policy Gradient
  • The policy parameters appear inside the trajectory distribution
  • Bring the gradient inside the expectation
  • The expectation is over trajectories

  • Starting state drawn from some distribution
  • Markov property: the next state depends only on the current state, not on previous states

  • Every action will get some update
  • Reward to go Policy Gradient


  • Advantage-function form
  • How much better action is than average


  • N-Step Advantage Estimates
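A minimal sketch of reward-to-go returns and a simple advantage estimate (made-up rewards, plain numpy; in practice a learned value function serves as the baseline):

import numpy as np
rewards = np.array([1.0, 0.0, 2.0, 1.0])          #hypothetical rewards along one trajectory
gamma = 0.99
rtg = np.zeros_like(rewards)                      #reward-to-go: discounted sum of future rewards from each step
running = 0.0
for t in reversed(range(len(rewards))):
    running = rewards[t] + gamma * running
    rtg[t] = running
baseline = rtg.mean()                             #crude baseline
advantages = rtg - baseline                       #how much better each step was than average
print(rtg, advantages)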



  • Initial assignments (weights) do matter while setting up the system
Next is Part II


Happy Mastering DL!!!

February 07, 2019

Startup Idea - Incentivised Learning for Kids - Paid Online Courses

Taking a cue from cabs and food delivery, we can apply the same approach to online classrooms / teaching kids / MOOC courses.

Students buy a course; a certain portion (X%) of the fee is used to surprise them with desserts / a good meal / toys, or something that motivates them on a daily basis. "If I complete this I might get an ice cream, so I will do this." (A little bribe to read.)

Setting up a Customer Base (Cabs) - OLA and Uber rides initially had heavy discounts for users and incentives for cab drivers. I remember applying coupons heavily and the satisfaction of saving X% on every ride.
Stabilization Phase - Since June 2018, I haven't seen any more offers after they achieved their user base.
Setting up a Customer Base (Food Delivery) - Since October / November 2018 I have observed the same trend of offers in food delivery. I have mostly observed/used UberEats.

It's human psychology to go for offers / feel happy with the x% savings from the offer.

Learning (Incentivised Approach)
Taking the same idea and applying it to the learning methodology: every learning course can be evaluated based on
  • Consistency in attending classes, Consistency of Learning
  • Solving problems, Concept Understanding
  • Explaining the concept in own terminology 
  • Using AI to evaluate theoretical/experimental aspects of knowledge
  • Grade them against themselves; instead of a comparative grade, provide individual progress from time to time based on collected data
  • Based on progress provide incentive points
  • These incentive points can be claimed with a special meal/toy something for that day
Call up and follow up when they are not participating regularly. These personalized reminders will also give them the motivation to continue the course. The scheme of offers/promotions has a psychological impact at the individual level; the same concept can be used to motivate kids in their learning and provide more personalized incentives so they stay committed and encouraged.

Everything in life is connected through multiple aspects of actions, perspectives, and thoughts. I hope this idea is implemented in learning apps to make them more encouraging for kids. Present marks as something they are currently good at in a particular subject; do not give the impression that good scores mean they know everything.

All these courses should aid in creating long-term learning interest and a consistent learning / creative thinking / experimenting mindset.

Happy Mastering DL!!!

February 06, 2019

Day #207 - Dimensionality Reduction Notes

SVD - The sum of the squares of the singular values should be equal to the total variance in A

Matrix A can be expressed as
A = U S V^T

U,V - Orthogonal
U - Left Singular Vector
V - Right Singular Vector

A is an m × n matrix
U is an m × n matrix with orthonormal columns
S is an n × n diagonal matrix
V is an n × n orthogonal matrix

Since an m × n matrix with m > n has only n singular values, SVD lets us work with those n singular values instead of the full m × m problem.
  • Dimensionality reduction is done by neglecting small singular values in the diagonal matrix S
  • Feature of dimensionality reduction is only exploited in the decomposed version
Output -  Storing the truncated forms of U, S, and V in place of A

Reference - Link

Eigen Vectors
  • Satisfy A v = λ v (the matrix times an eigenvector equals the eigenvalue times that eigenvector)
  • Certain directions only get stretched; they don't change direction
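A quick numpy check of that relation on an illustrative matrix:

import numpy as np
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)
v = eigvecs[:, 0]                                 #eigenvector corresponding to eigvals[0]
print(np.allclose(A @ v, eigvals[0] * v))         #A v equals lambda v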
Linear Dimensionality Reduction (PCA, SVD)
  • High Dimensional Data (Images, Text, Vector of Stock Data)
  • Describe the data with only few values
#https://gist.github.com/addisonhuddy/8a9e682259c9dca1f61672b4027863dc
import numpy as np
a = np.array([[1,1,1,0,2],[2,1,3,5,0],[1,3,5,6,2],[1,3,5,6,9],[2,3,4,5,6]])
#set printing options
np.set_printoptions(suppress=True)
np.set_printoptions(precision=3)
print('FULL')
U,S,Vt = np.linalg.svd(a,full_matrices=True)
print('U')
print(U)
print('S')
print(S)
print('Vt')
print(Vt)
print('Reduced - Ignore small values')
U,S,Vt = np.linalg.svd(a,full_matrices=False)
print('U')
print(U)
print('S')
print(S)
print('Vt')
print(Vt)
from sklearn.decomposition import PCA
from sklearn.decomposition import TruncatedSVD
pca = PCA(n_components=2)
pca.fit(a)
a_transformed = pca.transform(a)
print('pca')
print(a_transformed)
print(pca.explained_variance_)


How Many Singular Values Should We Retain? - A useful rule of thumb is to retain enough singular values to make up 90% of the energy in Σ, Link
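A small sketch of that rule of thumb on the same matrix `a` from the snippet above: keep the smallest k whose squared singular values cover 90% of the energy, then store the truncated U, S, Vt.

import numpy as np
a = np.array([[1,1,1,0,2],[2,1,3,5,0],[1,3,5,6,2],[1,3,5,6,9],[2,3,4,5,6]])
U,S,Vt = np.linalg.svd(a, full_matrices=False)
energy = np.cumsum(S**2) / np.sum(S**2)           #fraction of total energy covered by the first k values
k = int(np.searchsorted(energy, 0.90)) + 1        #smallest k reaching 90%
a_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]       #rank-k approximation stored via truncated U, S, Vt
print(k)
print(a_k)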

SVD - (Application in NLP) - Latent Semantic Analysis Notes
  • LSA applies singular value decomposition (SVD) to the matrix
  • In SVD, a rectangular matrix is decomposed into the product of three other matrices
  • One component matrix describes the original row entities as vectors of derived orthogonal factor values
  • Another describes the original column entities in the same way
  • Third is a diagonal matrix containing scaling values such that when the three components are matrix-multiplied, the original matrix is reconstructed
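A minimal LSA-style sketch (illustrative only; the tiny document-term count matrix below is made up) using scikit-learn's TruncatedSVD, which the snippet above already imports:

import numpy as np
from sklearn.decomposition import TruncatedSVD
#rows = documents, columns = term counts (hypothetical 4 documents x 6 terms)
X = np.array([[2, 1, 0, 0, 1, 0],
              [1, 2, 1, 0, 0, 0],
              [0, 0, 1, 2, 1, 1],
              [0, 0, 0, 1, 2, 2]])
svd = TruncatedSVD(n_components=2)                #keep 2 latent factors ("topics")
doc_vectors = svd.fit_transform(X)                #documents as vectors of derived factor values
print(doc_vectors)
print(svd.components_)                            #terms described in the same latent space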
LDA
  • The Dirichlet distribution takes a number (called alpha in most places) for each topic (or category)
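A tiny numpy illustration of alpha (the values are arbitrary): small alpha gives topic mixtures concentrated on a few topics, large alpha gives near-uniform mixtures.

import numpy as np
print(np.random.dirichlet(alpha=[0.1]*5, size=3))   #sparse-looking topic mixtures
print(np.random.dirichlet(alpha=[10.0]*5, size=3))  #near-uniform topic mixtures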
More Read - Link

Happy Mastering DL!!!

February 05, 2019

Day#206 - AI for Social Cause

A few more ideas based on recent reads / patterns
  • Segment / Predict crime network based on telephone signals, vehicle movements, face recognition, card transaction, crime activity
  • Auto-detect vehicle details from Video for violating Traffic Patterns
  • Predict child learning issues with AI - attention, focus, writing, reading, interpretation and micro skills
  • Spot early depression signs based on activity patterns
  • Predict drought based on patterns
  • Map missing children with Face similar Search global database, Predict child trafficking
  • AI for post medication follow-up and drop out prediction
  • AI for course study, drop out prediction and follow up
  • Early intervention to detect / prevent obesity / diabetes
AI for Social Cause, AI for better humanity - Found this talk interesting 
Key Lessons
  • Direct Advances for Society Benefit
  • Health, Safety and Wildlife Conservation
  • Optimize resources
  • Wildlife - Use past poaching incidents to predict future ones
  • Health - Homeless shelters, influence maximization, awareness of HIV, TB, obesity, health challenges
Safety and Security
Case #1 - Schedule Checkpoints and Patrols in Airport
  • Game Theory for Security Resource Optimization
  • Stackelberg Security Games Model
  • Randomness / Deterministic
  • Defender commits to randomized strategy
  • Probability at different points at time
  • These samples are used to generate schedule
  • Randomized checkpoints, detections
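A toy sketch (not from the talk) of committing to a mixed strategy and sampling a randomized schedule from it; the checkpoint names and coverage probabilities are made up:

import numpy as np
checkpoints = ['gate_A', 'gate_B', 'cargo', 'terminal']
coverage_probs = [0.4, 0.3, 0.2, 0.1]               #hypothetical mixed strategy from the game solver
schedule = np.random.choice(checkpoints, size=7, p=coverage_probs)   #one sampled week of checkpoints
print(schedule)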





Case #2 - Assign Air Marshals to Flights
  • Assigning marshals to flights
  • Support set size is small
  • Solve the Game Matrix
  • Incremental Strategy generation
  • Randomization of Scheduling


Case #3 - Patrols using Graphs
  • Different ways of patrol boat movements
  • Optimize and Schedule for patrol
Conservation / Wildlife
  • Snares and traps to kill animals
  • Divide into grid squares
  • Mixed-integer programming to generate patrols
  • Multiple boundedly rational poachers
  • Learn responses based on past poaching data
  • Finding a missing item in a grid cell
  • Ensemble of classifiers to predict
  • Classify into high risk / low risk areas based on historical data
  • Strategic signalling
  • Optimal deceptive signalling
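A hedged sketch of the high-risk / low-risk prediction idea above, with made-up grid-cell features and labels (the notes mention ensembles such as random forests):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
#hypothetical grid-cell features: distance to road, animal density, past patrol count
X = np.random.rand(200, 3)
y = (X[:, 1] > 0.6).astype(int)                     #made-up labels: high animal density ~ high poaching risk
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict_proba(X[:5]))                     #predicted risk for the first few grid cells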




Health
  • Awareness to reduce rates of HIV
  • Peer Leader Campaign
  • Peer Leaders
  • Homeless Shelters
Prevent TB in India
  • Low resource community
  • Non-adherence of TB treatment
  • Digital Adherence tracking technology
  • Calling and reminding
  • Predict adherence from calling patterns
  • Everwell 
  • Predict High-Risk using SVM / RF Algo
  • Mixed Strategy (Randomization with multiple predictors)
  • Decision Focused Learning
Prevent Suicides
  • Choose K gatekeepers
  • Solving this game
More Reads - http://teamcore.usc.edu/lecture.htm


IAAI Robert S. Engelmore Award Lecture: Milind Tambe (USC) - AI and Multiagent Systems for Social Good (AAAI Livestreaming on Vimeo).

Happy Mastering DL!!!