Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): December 2019

December 31, 2019

Day #308 - Display on Rasberry PI

Finally got a 2.8 Inch LCD display for Rasberry PI. Next step is to experiment with the models with real-time situations. This LCD will help to visualize the output. Package installation commands and reference.

	#https://www.waveshare.com/wiki/2.8inch_RPi_LCD_(A)

	pi Display
	============
	git clone https://github.com/waveshare/LCD-show.git
	cd LCD-show/
	/LCD28-show

	To Get back to monitor display
	================================
	cd LCD-show/
	./LCD-hdmi

	Virtual Keyboard
	===================
	#https://raspberrypi.stackexchange.com/questions/41150/virtual-keyboard-activation
	sudo apt-get install matchbox-keyboard
	sudo reboot
	Menu >> Accessories >> Keyboard

view raw RPI_Display.txt hosted with ❤ by GitHub

Happy Learning!!!

December 30, 2019

Social Media Responsibilities

How do we measure social media impact? What are the pros and cons of Social Media
Pros

Information Sharing
Connect with a larger set of population

Cons

Sharing information without Authenticity
Motive / Authenticity of people sharing information
Smiles / Selfies vs Reality of Life
Manipulative / Biased / Personalized targeted ads
Exploit human tendencies of feedback / Sensitized to bias

Social Media Accountability

Freedom of speech vs Hurting Sentiments
Consequences of blackmail/threat/bullying through social media contacts
Consequences of any form of violence instigated through messages
Validating the facts/claims shared?
Endorsing political advertisement claims without a moral stand?
Biased targeting of users?
Depression / Suicide due to excess usage of social media?
Freedom of speech vs Authenticity of speech vs Intentions of information shared?

When we can’t even agree on what is real

Keep Thinking!!!

December 26, 2019

Analyzing top 25 AI companies in 2019

Analyzing top 25 AI companies listed in Link

Happy Learning AI Landscape!!!

December 24, 2019

Data Analysis vs Forensic Science

Before AI / BI it's about #exploring the data to uncover the #DataInsights. #DataAnalysis is similar to #ForensicScience. Side by Side comparison of both perspectives.

#Data and #Insights sets the #direction for successful #AI / #BI usecases #datascience #bigdata #analytics

Happy Learning!!!

December 23, 2019

Difference between SQL and NOSQL Systems

Reposting from my two-year-old Quora answer

The Key differences between them lies in the understanding CAP theorem

Consistency
Availability
Partition Tolerance

In layman terms. SQL systems ex-RDBMS will adhere ACID properties (Atomicity, Consistency, Isolation, Durability).

The datatypes, schema are predefined, You cannot store non-matching datatypes
To avoid dirty data, systems enforce isolation levels that govern only committed data is read (Consistency)
Only latest records are available, records at that point in time are not available
Banking Systems, ordering systems where data needs to consistent will be mostly SQL based systems where consistency is important

No-SQL systems (Not Only SQL)

The schema is not tightly governed, its flexible you can store different datatypes in same columns
These may be geographically distributed where data may be synced and eventually be consistent end of day not realtime
They also support point in time data, data values at a point in time can also be looked up
Where there is no requirement for consistency we can achieve other 2 Availability and partition tolerance
Since some of the ACID properties are compromised you will have high availability of this systems

It is more to do with business need to decided SQL or SQL based storage.

Happy Learning!!!

December 17, 2019

Improving Women Safety

To reduce crime against women more than strengthening laws we need to get to the root cause of issues. We need to analyze the crime data and fix the source of the problem.

We need to analyze the crime patterns based on different aspects to find the underlying patterns.

Pattern vs Solutions

Correlation with alcohol - How to reduce/limit alcohol consumption
Correlation with education - How to reduce dropouts and improve education
Correlation with income category - Sustainable jobs
Correlation with marital status - Family aspects
Correlation to caste - Driven by caste / Unemployment / Dropouts
Correlation to age group - Social media, porn impact
Correlation to social behavior - Drugs / partying / Addiction
Correlation to job type - Government vs Private jobs vs Daily vs Organized Crimes

Education is not limited to a few years. Education is not about degrees. The real purpose of education is to unlearn and relearn things from morality/humanity perspective.

It needs complete society change, not just laws. Let's prepare a safer tomorrow by making the required changes.

Keep Questioning!!!

December 16, 2019

Day #307 - Porting Keras to Tensorflow Lite Version

Next Task is to run all the developed models in Pi using Tensorflow Lite. I am using google colab to convert the models into lite version.

	#Step 1
	from google.colab import drive
	drive.mount('/content/drive')
	#Upload data to drive in directory ColabData

	#Step 2
	import tensorflow as tf
	print(tf.__version__)
	#tensorflow.contrib is not there in 2.0
	#!pip install tensorflow==2.0.0
	#Downgrade tensorflow to 1.13.2
	#!pip install tensorflow==1.13.2

	#Step 3
	from tensorflow.contrib import lite
	converter = tf.lite.TFLiteConverter.from_keras_model_file(r'/content/drive/My Drive/ColabData/model_landmark_vgg16.h5')
	tflite_model = converter.convert()
	open(r'/content/drive/My Drive/ColabData/model.tflite','wb').write(tflite_model)

view raw TFLiteConversion.py hosted with ❤ by GitHub

The ported models we will attempt to run in Rasberry PI as next steps

Happy Learning!!!

Day#306 - Express the SQL in pandas, TSQL in Pandas

I wanted to mimic joins, aggregation, sum whatever we do in Database with pandas. A simple storyline of Data Analysis between Employee, Department and Salary using pandas dataframes.

	import pandas as pd

	#Define Data Frames
	Employee = {'name': ['Raj', 'Siva', 'Mike', 'Gopi','New_Joinee'],
	'age': [22,38,26,35,22]}

	dfEmployee = pd.DataFrame(Employee)
	print(dfEmployee)

	#Salary
	Salary = {'name': ['Raj', 'Siva', 'Mike', 'Gopi','Raj', 'Siva'],
	'salary': [2200,3800,2600,3500,7000,5000],
	'Month': ['Jan','Feb','Jan','Jan','Feb','Mar']
	}
	dfSalary = pd.DataFrame(Salary)
	print(dfSalary)

	#Department
	Department = {'name': ['Raj', 'Siva', 'Mike', 'Gopi','NOCODE'],
	'dept': ['IT','AI','HR','DB','NOCODE']}
	dfDepartment = pd.DataFrame(Department)
	print(dfDepartment)

	#Inner Join, Employee and Dept
	print('Outer Join')
	print(pd.merge(dfEmployee, dfDepartment, on='name', how='outer'))

	#Left Join
	print('Left Join')
	print(pd.merge(dfEmployee, dfDepartment, on='name', how='left'))

	#Right Join
	print('Right Join')
	print(pd.merge(dfEmployee, dfDepartment, on='name', how='right'))

	#Inner Join
	print('Inner Join')
	print(pd.merge(dfEmployee, dfDepartment, on='name', how='inner'))

	#Group by
	#Total Salary Group by Employee
	#Do a join
	salarydata = pd.merge(dfEmployee, dfSalary, on='name', how='inner')
	print(salarydata)
	print('Total Paid by Employee')
	#Perform Group By
	print(salarydata.groupby(['name']).sum())

	#Sum
	#Total Salary Paid
	salarydata = pd.merge(dfEmployee, dfSalary, on='name', how='inner')
	Total = salarydata['salary'].sum()
	print('Total Salary Paid')
	print(Total)

	#Min
	#Min Salary paid by employee
	MinimumSalary = salarydata['salary'].min()
	print('MinimumSalary')
	print(MinimumSalary)

	#Max
	#Max Salary paid by employee
	MaxSalary = salarydata['salary'].max()
	print('MaxSalary')
	print(MaxSalary)

view raw Pandas_Joins.py hosted with ❤ by GitHub

Everything can be done in SQL. This is a different approach to it using pandas.

Happy Learning!!!

Day #305 - Loading from Weights file HDF, saved models H5 files

We will look at

Vanilla Model
Load preexisting weights HDF5 and Continue
Load preexisting model H5 and Continue

	from __future__ import print_function
	import keras
	from keras.datasets import mnist
	from keras.models import Sequential
	from keras.layers import Dense, Dropout, Flatten
	from keras.layers import Conv2D, MaxPooling2D
	from keras import backend as K
	from keras.callbacks import ModelCheckpoint, CSVLogger, EarlyStopping
	import os

	batch_size = 128
	num_classes = 10
	epochs = 5

	log_file_path = r'E:\Landmark\mnist_training_log.log'
	model_save_path = r"E:\Landmark\\mnist.h5"
	weights_filepath="E:\\Landmark\\mnist-weights-improvement-{epoch:02d}.hdf5"
	pre_weights_path = "E:\\Landmark\\mnist-weights-improvement-04.hdf5"
	pre_model_h5_path = "E:\\Landmark\\mnist.h5"

	# input image dimensions
	img_rows, img_cols = 28, 28

	# the data, split between train and test sets
	(x_train, y_train), (x_test, y_test) = mnist.load_data()

	if K.image_data_format() == 'channels_first':
	x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
	x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
	input_shape = (1, img_rows, img_cols)
	else:
	x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
	x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
	input_shape = (img_rows, img_cols, 1)

	x_train = x_train.astype('float32')
	x_test = x_test.astype('float32')
	x_train /= 255
	x_test /= 255
	print('x_train shape:', x_train.shape)
	print(x_train.shape[0], 'train samples')
	print(x_test.shape[0], 'test samples')

	# convert class vectors to binary class matrices
	y_train = keras.utils.to_categorical(y_train, num_classes)
	y_test = keras.utils.to_categorical(y_test, num_classes)

	from keras.models import load_model

	def CreateModel():
	model = Sequential()
	model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',input_shape=input_shape))
	model.add(Conv2D(64, (3, 3), activation='relu'))
	model.add(MaxPooling2D(pool_size=(2, 2)))
	model.add(Dropout(0.25))
	model.add(Flatten())
	model.add(Dense(128, activation='relu'))
	model.add(Dropout(0.5))
	model.add(Dense(num_classes, activation='softmax'))
	return model

	def LoadModelfromH5(model_h5_path):
	if os.path.exists(model_h5_path):
	print('Loading Definitions')
	model = load_model(model_h5_path)
	return model

	def LoadModelWeights(pre_weights_path):
	model = CreateModel()
	model.load_weights(pre_weights_path)
	return model

	#Option 1
	#model = CreateModel()

	#Option 2
	#model = LoadModelWeights(pre_weights_path)

	#Option#3
	model = LoadModelfromH5(pre_model_h5_path)

	model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(),metrics=['accuracy'])

	#Add Early Stop and Checkpoint
	early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
	checkpoint = ModelCheckpoint(weights_filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
	csv_logger = CSVLogger(log_file_path, append=False)
	callbacks_list = [checkpoint,early_stop,csv_logger]

	model.fit(x_train, y_train,
	batch_size=batch_size,
	epochs=epochs,
	verbose=1,
	validation_data=(x_test, y_test), callbacks=callbacks_list)

	score = model.evaluate(x_test, y_test, verbose=0)
	print('Test loss:', score[0])
	print('Test accuracy:', score[1])
	model.save(model_save_path)

	import pandas as pd
	import matplotlib.pyplot as plt
	# Plot the Loss
	file_name = log_file_path
	df = pd.DataFrame.from_csv(file_name)
	print(df.head())
	training_loss = df['loss']
	test_loss = df['val_loss']
	print(training_loss)
	print(test_loss)

	epoch_count = range(1, len(training_loss) + 1)
	plt.plot(epoch_count, training_loss, 'r--')
	plt.plot(epoch_count, test_loss, 'b-')
	plt.legend(['Training Loss', 'Test Loss'])
	plt.xlabel('Epoch')
	plt.ylabel('Loss')
	plt.show();

view raw Mnist_load_weights.py hosted with ❤ by GitHub

Results

Option #1 - Vanilla Model

Option #2 - Continue from Saved Weights

Option #3 - Continue from Saved Model H5 File

Happy Learning!!!

Day #304 - Analysis of Deep Fashion Dataset - LandmarkDetection

Three different poses

8 localization points only 4 is not null in all columns
Different visibility value options (0,1,2) - visibility: v=2 visible; v=1 occlusion; v=0 not labeled

One model for each category we need to do
The top-level has 3 generic categories:

1: “top” (upper-body clothes such as jackets, sweaters, tees, etc.)
2: “bottom” (lower-body clothes such as jeans, shorts, skirts, etc.)
3: “long” (full-body clothes such as dresses, coats, robes, etc.)

The implementation is defined in paper - Link
Data - Link

Architecture Implementation

Data Analysis of the Dataset for Non-Zero Columns

image_name 0
landmark_visibility_1 0
landmark_location_x_1 0
landmark_location_y_1 0
landmark_visibility_2 0
landmark_location_x_2 0
landmark_location_y_2 0
landmark_visibility_3 0
landmark_location_x_3 0
landmark_location_y_3 0
landmark_visibility_4 0
landmark_location_x_4 0
landmark_location_y_4 0
landmark_visibility_5 30972
landmark_location_x_5 30972
landmark_location_y_5 30972
landmark_visibility_6 30972
landmark_location_x_6 30972
landmark_location_y_6 30972
landmark_visibility_7 73003
landmark_location_x_7 73003
landmark_location_y_7 73003
landmark_visibility_8 73003
landmark_location_x_8 73003
landmark_location_y_8 73003

Non-zero columns

landmark_visibility_1 0
landmark_location_x_1 0
landmark_location_y_1 0
landmark_visibility_2 0
landmark_location_x_2 0
landmark_location_y_2 0
landmark_visibility_3 0
landmark_location_x_3 0
landmark_location_y_3 0
landmark_visibility_4 0
landmark_location_x_4 0
landmark_location_y_4 0

Happy Learning!!!

Day #303 - Model Training Guidelines - Part II

Here we will look at two more additions on top of the previous post

Save model h5 file after every run/epoch
Add Data batching to run in smaller iterations, Leverage Sequencer

	from __future__ import print_function
	import keras
	from keras.datasets import mnist
	from keras.models import Sequential
	from keras.layers import Dense, Dropout, Flatten
	from keras.layers import Conv2D, MaxPooling2D
	from keras import backend as K
	from keras.callbacks import ModelCheckpoint, CSVLogger, EarlyStopping
	import os
	import numpy as np
	import math

	batch_size = 128
	num_classes = 10
	epochs = 5

	log_file_path = r'E:\Landmark\mnist_training_log.log'
	model_checkpoint_path = r"E:\Landmark\\mnist.h5"
	model_save_path = r"E:\Landmark\\mnist_model_{}.hd5.h5"
	weights_filepath="E:\\Landmark\\mnist-weights-improvement-{epoch:02d}.hdf5"

	# input image dimensions
	img_rows, img_cols = 28, 28

	# the data, split between train and test sets
	(x_train, y_train), (x_test, y_test) = mnist.load_data()

	if K.image_data_format() == 'channels_first':
	x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
	x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
	input_shape = (1, img_rows, img_cols)
	else:
	x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
	x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
	input_shape = (img_rows, img_cols, 1)

	x_train = x_train.astype('float32')
	x_test = x_test.astype('float32')
	x_train /= 255
	x_test /= 255
	print('x_train shape:', x_train.shape)
	print(x_train.shape[0], 'train samples')
	print(x_test.shape[0], 'test samples')

	# convert class vectors to binary class matrices
	y_train = keras.utils.to_categorical(y_train, num_classes)
	y_test = keras.utils.to_categorical(y_test, num_classes)

	#Data Batching
	class Generator(keras.utils.Sequence):
	# Class is a dataset wrapper for better training performance
	def __init__(self, x_set, y_set, batch_size, datacount):
	self.x = x_set
	self.y = y_set
	self.batch_size = batch_size
	self.indices = np.arange(self.x.shape[0])
	self.idx = 0
	self.datacount = datacount

	def __len__(self):
	print('length')
	print(math.ceil(self.datacount/ self.batch_size))
	return math.ceil(self.datacount/ self.batch_size)

	def __getitem__(self, idx):
	print('idx')
	print(idx)
	i1 = idx*self.batch_size
	i2 = (idx+1)*self.batch_size
	print('Start-' + str(i1) + '- End-' + str(i2))
	if(i2 > self.datacount):
	i2 = self.datacount
	batch_x = self.x[i1:i2]
	batch_y = self.y[i1:i2]
	return batch_x, batch_y

	def on_epoch_end(self):
	np.random.shuffle(self.indices)

	#Save Model after every Epoch
	#https://stackoverflow.com/questions/54323960/save-keras-model-at-specific-epochs
	class CustomSaver(keras.callbacks.Callback):
	def on_epoch_end(self, epoch, logs={}):
	#if epoch == 2: # or save after some epoch, each k-th epoch etc.
	self.model.save(model_save_path.format(epoch))

	batch_size = 500
	print('x_train')
	print(len(x_train))
	print('x_test')
	print(len(x_test))

	training_generator = Generator(x_train, y_train, batch_size, len(x_train))
	validation_generator = Generator(x_test, y_test, batch_size, len(x_test))

	from keras.models import load_model

	#Load and Continue Training
	# load weights if it exists
	if os.path.exists(model_checkpoint_path):
	print('Loading Definitions')
	model = load_model(model_checkpoint_path)
	else:
	model = Sequential()
	model.add(Conv2D(32, kernel_size=(3, 3),
	activation='relu',
	input_shape=input_shape))
	model.add(Conv2D(64, (3, 3), activation='relu'))
	model.add(MaxPooling2D(pool_size=(2, 2)))
	model.add(Dropout(0.25))
	model.add(Flatten())
	model.add(Dense(128, activation='relu'))
	model.add(Dropout(0.5))
	model.add(Dense(num_classes, activation='softmax'))
	model.compile(loss=keras.losses.categorical_crossentropy,
	optimizer=keras.optimizers.Adadelta(),
	metrics=['accuracy'])

	#Add Early Stop and Checkpoint
	early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
	checkpoint = ModelCheckpoint(weights_filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
	csv_logger = CSVLogger(log_file_path, append=False)
	saver = CustomSaver()
	callbacks_list = [checkpoint,early_stop,csv_logger,saver]

	model.fit_generator(training_generator, validation_data = validation_generator, epochs = 10, callbacks=callbacks_list)

	score = model.evaluate(x_test, y_test, verbose=0)
	print('Test loss:', score[0])
	print('Test accuracy:', score[1])
	model.save(model_save_path)

	import pandas as pd
	import matplotlib.pyplot as plt
	# Plot the Loss
	file_name = log_file_path
	df = pd.DataFrame.from_csv(file_name)
	print(df.head())
	training_loss = df['loss']
	test_loss = df['val_loss']
	print(training_loss)
	print(test_loss)

	epoch_count = range(1, len(training_loss) + 1)
	plt.plot(epoch_count, training_loss, 'r--')
	plt.plot(epoch_count, test_loss, 'b-')
	plt.legend(['Training Loss', 'Test Loss'])
	plt.xlabel('Epoch')
	plt.ylabel('Loss')
	plt.show();

view raw Mnist_Guidelines.py hosted with ❤ by GitHub

This is a template code. This can be customized for larger datasets

Happy Learning!!!

December 15, 2019

Project Learning Notes

Tracking, Counting has always been quite interesting topic for sometime. Explored this codebase link

	#https://github.com/poojavinod100/People-Counting-Crowd-Density-Detection/blob/master/people_counter.py
	#pip install imutils
	#pip install CMake
	#pip install dlib

	python people_counter.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt --model mobilenet_ssd/MobileNetSSD_deploy.caffemodel --output output/webcam_output.avi

view raw commands.txt hosted with ❤ by GitHub

I liked the approach of directionality based tracking. This is very needed for directionality based counting. Hoping to reuse / implement it in people counting scenarios.

My perspective is

Tracking by Sampling Frames (Reduce Load)
Use Euclidean and other attributes to track/match
Evaluate existing tracking built in OpenCV (Again these need frame by frame tracking)

Happy Learning!!!

December 13, 2019

Day #302 - Keras Best Practices during Training

In this post we take the raw version of code and add below features in code

Adding Checkpoint
Adding Logging
Plot Results
Restart Training from Checkpoint
Early Stopping

	#Base code - https://keras.io/examples/mnist_cnn/

	#Added Features
	#========================
	# Adding Checkpoint
	# Adding Logging
	# Plot Results
	# Restart Training from Checkpoint

	from __future__ import print_function
	import keras
	from keras.datasets import mnist
	from keras.models import Sequential
	from keras.layers import Dense, Dropout, Flatten
	from keras.layers import Conv2D, MaxPooling2D
	from keras import backend as K
	from keras.callbacks import ModelCheckpoint, CSVLogger, EarlyStopping
	import os

	batch_size = 128
	num_classes = 10
	epochs = 5

	log_file_path = r'E:\Landmark\mnist_training_log.log'
	model_checkpoint_path = r"E:\Landmark\\mnist.h5"
	model_save_path = r"E:\Landmark\\mnist.h5"
	weights_filepath="E:\\Landmark\\mnist-weights-improvement-{epoch:02d}.hdf5"

	# input image dimensions
	img_rows, img_cols = 28, 28

	# the data, split between train and test sets
	(x_train, y_train), (x_test, y_test) = mnist.load_data()

	if K.image_data_format() == 'channels_first':
	x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
	x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
	input_shape = (1, img_rows, img_cols)
	else:
	x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
	x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
	input_shape = (img_rows, img_cols, 1)

	x_train = x_train.astype('float32')
	x_test = x_test.astype('float32')
	x_train /= 255
	x_test /= 255
	print('x_train shape:', x_train.shape)
	print(x_train.shape[0], 'train samples')
	print(x_test.shape[0], 'test samples')

	# convert class vectors to binary class matrices
	y_train = keras.utils.to_categorical(y_train, num_classes)
	y_test = keras.utils.to_categorical(y_test, num_classes)

	from keras.models import load_model

	#Load and Continue Training
	# load weights if it exists
	if os.path.exists(model_checkpoint_path):
	print('Loading Definitions')
	model = load_model(model_checkpoint_path)
	else:
	model = Sequential()
	model.add(Conv2D(32, kernel_size=(3, 3),
	activation='relu',
	input_shape=input_shape))
	model.add(Conv2D(64, (3, 3), activation='relu'))
	model.add(MaxPooling2D(pool_size=(2, 2)))
	model.add(Dropout(0.25))
	model.add(Flatten())
	model.add(Dense(128, activation='relu'))
	model.add(Dropout(0.5))
	model.add(Dense(num_classes, activation='softmax'))
	model.compile(loss=keras.losses.categorical_crossentropy,
	optimizer=keras.optimizers.Adadelta(),
	metrics=['accuracy'])

	#Add Early Stop and Checkpoint
	early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1)
	checkpoint = ModelCheckpoint(weights_filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
	csv_logger = CSVLogger(log_file_path, append=False)
	callbacks_list = [checkpoint,early_stop,csv_logger]

	model.fit(x_train, y_train,
	batch_size=batch_size,
	epochs=epochs,
	verbose=1,
	validation_data=(x_test, y_test), callbacks=callbacks_list)

	score = model.evaluate(x_test, y_test, verbose=0)
	print('Test loss:', score[0])
	print('Test accuracy:', score[1])
	model.save(model_save_path)

	import pandas as pd
	import matplotlib.pyplot as plt
	# Plot the Loss
	file_name = log_file_path
	df = pd.DataFrame.from_csv(file_name)
	print(df.head())
	training_loss = df['loss']
	test_loss = df['val_loss']
	print(training_loss)
	print(test_loss)

	epoch_count = range(1, len(training_loss) + 1)
	plt.plot(epoch_count, training_loss, 'r--')
	plt.plot(epoch_count, test_loss, 'b-')
	plt.legend(['Training Loss', 'Test Loss'])
	plt.xlabel('Epoch')
	plt.ylabel('Loss')
	plt.show();

view raw Keras_Best_Practices.py hosted with ❤ by GitHub

Run #1 Output (10 Epochs)

Run #2 Continue with Existing Weights (5 Epochs)

Happy Learning!!!

December 11, 2019

Day #301 - Data Batching in Keras

This post is about custom data batching using Keras. Here we override the methods of inbuilt sequence. The below example is with dummy data generation, data splitting and fetching the batch of records.

	#Generate dummy data
	import pandas as pd
	import numpy as np

	#Generate 250 Records and Split into 6 Columns
	df = pd.DataFrame(np.random.randint(0,100,size=(250, 6)), columns=list(['X1','X2','X3','X4','Y1','Y2']))
	print(df.head())
	print(df.count())

	#Split into X and Y
	X = df[['X1','X2','X3','X4']]
	Y = df[['Y1','Y2']]

	print(X.head())
	print(Y.head())

	#Split data into train and test
	from sklearn.cross_validation import train_test_split
	x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=2)

	import keras
	import math
	class Generator(keras.utils.Sequence):
	# Class is a dataset wrapper for better training performance
	def __init__(self, x_set, y_set, batch_size, datacount):
	self.x = x_set
	self.y = y_set
	self.batch_size = batch_size
	self.indices = np.arange(self.x.shape[0])
	self.idx = 0
	self.datacount = datacount

	def __len__(self):
	print('length')
	print(math.ceil(self.datacount/ self.batch_size))
	return math.ceil(self.datacount/ self.batch_size)

	def __getitem__(self, idx):
	print('idx')
	print(idx)
	i1 = idx*self.batch_size
	i2 = (idx+1)*self.batch_size
	print('Start-' + str(i1) + '- End-' + str(i2))
	if(i2 > self.datacount):
	i2 = self.datacount
	batch_x = self.x[i1:i2]
	batch_y = self.y[i1:i2]
	return batch_x, batch_y

	def on_epoch_end(self):
	np.random.shuffle(self.indices)

	batch_size = 10
	print('x_train')
	print(x_train['X1'].count())
	print('x_test')
	print(x_test['X1'].count())
	training_generator = Generator(x_train, y_train, batch_size, x_train['X1'].count())
	validation_generator = Generator(x_test, y_test, batch_size, x_test['X1'].count())

	print('training_generator')
	#Compute batches for training data
	num_batches_train = x_train['X1'].count()/batch_size
	for batch_id in range(int(num_batches_train)):
	print('batch_id')
	print(batch_id)
	print(training_generator.__getitem__(batch_id))

	print('validation_generator')
	#Compute batches for test data
	num_batches_test = x_test['X1'].count()/batch_size
	for batch_id in range(int(num_batches_test)):
	print('batch_id')
	print(batch_id)
	print(validation_generator.__getitem__(batch_id))

	#Model Architecture, Layers, Compile
	#Model fitgenerator
	#Model Save Checkpoint

view raw Data_Batching.py hosted with ❤ by GitHub

Other strategies

Databases -> CSV 50K Data Chunks Records -> Training and Save Checkpoint
Checkpoint to Save for Each run and reuse for next 50K Chunk of Data

This is a classic data fetching solution. Database can store millions of records. We can fetch each batch, export it to a CSV and use each chunk, train and save checkpoint and continue further for next run.

Happy Learning!!!

December 01, 2019

Day #300 - Lessons Learnt from Multi-Label Classification

Today is 300th Post on Data Science. It has been a long journey. Still I feel there is a lot more to catchup. Keep Learning, Keep Going.

There are different tasks involved

1. Data Collection - Fatkun Batch Download Image chrome extension to download images
2. Script to reshape images and store in a standard format
3. Simple DB script to update and prepare data

	import pyodbc
	import os

	files = os.listdir(r'E:\Multi_Label\Input_Data\T-Shirt_Jeans')
	base_filepath = r'E:\Multi_Label\Input_Data\T-Shirt_Jeans'
	cnxn = pyodbc.connect(r'DRIVER={SQL SERVER};SERVER=XXXXX\SQLEXPRESS;DATABASE=DataGeneration;Trusted_Connection=yes;')
	cursor = cnxn.cursor()
	for file in files:
	filepath = base_filepath + '\\' + file
	cursor.execute("insert into Dataset([FileName],[Cap],[Jeans],[Jacket],[TShirt],[Shirt],[Pants]) values (?,0,1,0,1,0,0)",filepath)
	cnxn.commit()
	cnxn.close()

view raw pyodbc_insert.sql hosted with ❤ by GitHub

4. This base implementation was useful for model implementation link
5. Data Test Results

	#Test Code
	import cv2
	import os
	import numpy as np
	from keras.models import save_model, load_model

	test_dataset = r'E:\Multi_Label\Test'
	model = load_model(r'E:\Multi_Label\\model_multi_label.h5')
	print(model.summary())

	test_images = []
	arr = os.listdir(test_dataset)
	files = []
	for file in arr:
	path = test_dataset + '\\'+file
	img = cv2.imread(path,1)
	img = cv2.resize(img,(256,256))
	test_images.append([np.array(img)])
	files.append(path)

	i = 0
	for data in test_images:
	test_img_data = np.array(data).reshape(-1,256,256,3)
	result = model.predict(test_img_data)
	#print(result)
	item = ['1-Cap','2 Jeans','3 - Jacket','4-TShirt','5-Shirt','6-Pants']
	values = []
	values.append(result[0][0])
	values.append(result[0][1])
	values.append(result[0][2])
	values.append(result[0][3])
	values.append(result[0][4])
	values.append(result[0][5])
	print(files[i])
	if(float(result[0][0])>0.5):
	print('Cap')
	if(float(result[0][1])>0.5):
	print('Jeans')
	if(float(result[0][2])>0.5):
	print('Jacket')
	if(float(result[0][3])>0.5):
	print('Tshirt')
	if(float(result[0][4])>0.5):
	print('Shirt')
	if(float(result[0][5])>0.5):
	print('Pants')
	#print(values)
	i = i+1

	#E:\Multi_Label\Test\Cap_Jean.jpeg
	#Cap
	#Jeans
	#E:\Multi_Label\Test\Cap_Jean_2.jpeg
	#Cap
	#Jeans
	#E:\Multi_Label\Test\Cap_Jean_3.jpeg
	#Cap
	#Jeans
	#E:\Multi_Label\Test\Jacket_5.jpg
	#Cap
	#Jeans
	#E:\Multi_Label\Test\Jacket_WM2.jpg
	#Cap
	#Jeans
	#Jacket
	#E:\Multi_Label\Test\Jacket_WM3.jpg
	#Cap
	#Jeans
	#Jacket
	#E:\Multi_Label\Test\Jacket_WM4.jpg
	#Cap
	#Jeans
	#E:\Multi_Label\Test\Jacket_WML1.jpg
	#Cap
	#Jeans
	#E:\Multi_Label\Test\T_Shirt_Target.jpg
	#Cap
	#Jeans
	#E:\Multi_Label\Test\T_Shirt_Target1.jpg
	#Cap
	#Jeans

view raw Test_Results.py hosted with ❤ by GitHub

Happy Learning!!!

December 31, 2019

December 30, 2019

December 26, 2019

December 24, 2019

December 23, 2019

December 17, 2019

December 16, 2019

December 15, 2019

December 13, 2019

December 11, 2019

December 01, 2019

Git Code Repository

About Me

What is your Expertise

Search This Blog

Translate

About Me and Disclaimer

Labels

Data Science Good Reads

Cloud, Datacentre, BigData and NOSQL Blogs

SQL Links

Archecture Blog List

Programming Problems

Startup - Reads

Perl-Python-Ruby-Linux-Oracle

Management + Leadership Blogs

Research Papers & Podcasts

My Wordpress

Interesting Reads

Useful Links - C# and .NET

Java, Selenium, QTP and Test Tools Learning

Agile Testing

Reverse Logistics Reads

Biztalk Blogs

MS BI Links

Process - Learnt it :)

Usability Guidelines - Building Better Sites

.NET Test Tools and Other Interesting Reads

Review Checklist

Blog Archive

Live Traffic

Total Pageviews

Popular Posts