"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

February 28, 2020

Career Lessons

I have gone through cycles of ideas and initiatives being rejected or put down as they flowed through leadership levels. Many times they got delayed, but there were a few memorable successes in the #rewrite #warranty work for XBOX, filing #retail patents, and pushing ideas through. I have observed myself going through the frustration and rejection cycle. I was able to realize those ideas in my next role, at the same or the next company. It takes time to prove and demonstrate that our ideas have potential. Sometimes we need to wait, further sharpen our skills, or move to the next role or the next company to make an idea successful.
  • Sometimes your best ideas will have no future, they will get killed. Keep going
  • Keep versioning all your ideas and add techniques to improve upon them
  • Build domain knowledge + AI to solve it optimally
  • When people kill ideas, find a place to grow if you believe in your ideas
  • 'Go' where you 'Grow', 'Grow' where you 'Go'
More Reads - Link

Accidental Leaders - (July 16th, 2020)

IT has a lot of accidental leaders. Years of experience may not reflect competency. Some categories of leaders:

Idea Killers - They kill any idea you bring to the table. The intention is to play a safe game. They view IT projects as sailing in smooth weather. Very little interest in innovation; pure 9-to-6, safe-side players.

Jargon Gurus - Highly qualified, well connected up the ladder. For any idea you take up, they will provide a counter idea. Ultimately their goal is to prove the idea is not good enough to pursue.

Enthusiast Leaders - They don't know the technical aspects but get carried away by wow factors. Their shortsightedness keeps them from a long-term perspective.

Passionate Leaders - They talk to your face and encourage the idea. They warn you when things may fail, but really back you when they do. They are hard to work with, but they take the organization to the next level.

It is very hard to spot true leaders in the IT industry. Leaders who speak well may not be good at execution. Leaders who deliver may not be good presenters. Keep going. Build the true leader in your inner self - Siva


P and L in Career
Yes, you guessed it right, It's Profit and Loss. Naaaa. My perspective is Passion and Learning.

Profit and Loss Perspective
  • Goal is to reach a title / position / salary
  • What is in it for me
  • What's my next role
  • Sell better
Passion and Learning
  • Build expertise
  • Continuously find new ways to implement ideas
  • My vision will succeed one day
  • Share credit and learning
  • Customer needs to be happy
  • It's okay to fail 
#HappyLearning
#Keep Thinking!!!!

February 27, 2020

Evolution of Data Storage, Analysis, Analytics - Database - DW - DataLake

2000 - 2010
  • Stage 1 - Papers, Ledgers
  • Stage 2 - Excel, Access
  • Stage 3 - Databases for OLTP / DW for OLAP (ACID Properties)
Knowledge -  I know how my business is performing as of today. I know the past 6 months of historical data and performance

2010 - 2015
  • Stage 4 - Hadoop for large scale DW (Velocity, Volume, Veracity)
  • Stage 5 - NoSQL (CAP Theorem)
Knowledge - I store all possible data without strict data type validations and can run ad hoc queries on large-scale data. Schema on read; finding insights from unstructured data (logs, text, records, events)

2015 - 2020
  • Stage 6 - AI to extract interpretable data from image, video, and sound (numbers, object classification, counts, etc.)
  • Stage 7 - Lakehouse = Hadoop + RDBMS + NoSQL + AI for data extraction from unstructured sources
Knowledge - I have insights from structured, unstructured, vision, audio and every type of data. Handling all types of data and finding meaningful insights.

CAP Properties and Databases


More Reads
Spanner Reads

Happy Learning!!!

February 26, 2020

IoT Architecture

  • Edge - Lightweight protocols for machine-to-cloud communication - MQTT (lightweight pub/sub) - immutable, read-only data that cannot be modified (see the sketch after this list)
  • Cloud Entry - Large-scale data ingestion to consume data - Kafka (distributed log processing) - immutable data, cannot be modified
  • Cloud Streaming - Real-time data analysis to report and alert - Spark (RDDs / ML) - immutable data, cannot be modified
  • Store and Analyze - Reports on data, completed transactions - Postgres, RDBMS - do the remaining CRUD
  • ML on edge, ML on streaming data, ML on stored data (completed transactions)
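A minimal sketch of the edge publishing step above, assuming the paho-mqtt (1.x API) client library; the broker address, topic, and payload are illustrative placeholders, not part of the original post.

# Minimal MQTT publisher sketch for an edge device (assumes: pip install paho-mqtt)
import json
import time
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.local", 1883)  # hypothetical broker endpoint
client.loop_start()

for i in range(10):
    reading = {"device_id": "sensor-01", "ts": time.time(), "temperature": 25.0 + i * 0.1}
    # QoS 1 = at-least-once delivery; the published payload is treated as immutable
    client.publish("factory/line1/temperature", json.dumps(reading), qos=1)
    time.sleep(1)

client.loop_stop()
client.disconnect()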
Happy Learning!!!

Queues vs Logs

Queues - A producer publishes a message to the broker and consumers read from the queue. Long back I worked on this with SQL Server Service Broker.
Logs - Transactions in SQL databases are implemented with a WAL (write-ahead log). When a checkpoint is reached, the logged transactions are applied and the data changes are saved to disk.

How can logs be used?
  • Read the transaction and replay it elsewhere (Allows multiple consumers without blocking each other)
  • Keep logs read-only and let everyone read it (persist it as long as needed)
  • Read information in logs in sequence (maintain sequence to replay it in order)
So, Logs can be read across multiple readers and it enables scaling :)
Tools have evolved but the fundamentals are the same. Kafka is similar to a log playback system (distributed log processing) which helps to scale, publish and consume data.
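A toy sketch of that idea using only the Python standard library: an append-only log where each consumer keeps its own offset, so readers never block each other and can replay records in order.

# Toy append-only log: writers only append, readers keep their own offsets,
# so multiple consumers can replay the same records independently and in order.
class AppendOnlyLog:
    def __init__(self):
        self._records = []              # persisted records, never modified in place

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1   # position of the record in the log

    def read_from(self, offset):
        return self._records[offset:]   # everything at or after the offset

log = AppendOnlyLog()
for event in ["order_created", "payment_received", "order_shipped"]:
    log.append(event)

# Two consumers with independent offsets; neither blocks the other.
offsets = {"billing": 0, "analytics": 0}
for name in offsets:
    for record in log.read_from(offsets[name]):
        print(name, "processed", record)
    offsets[name] = len(log._records)   # remember where this consumer stopped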

Happy Learning!!!

Telecom ML Use cases

  • Data - massive amount of network performance data 
  • Broad Areas - Network optimization, Preventive maintenance, Virtual Assistants and Customer Experience
  • AI Focus Areas - handling the increase in data traffic, identifying potential problems in the network, best customer experience, AI bots, virtual assistants, smart-home customer experience, current health of the network, predicting battery failure in telecommunications equipment
  • AI for security management - detect the spread of viruses, the activation of unknown attacks as well as data and information exfiltration.
Machine Learning for Networking: Workflow, Advances and Opportunities

Happy Learning!!!

February 25, 2020

MQTT vs Kafka Notes

MQTT (Message Queue Telemetry Transport)
  • MQ Telemetry Transport
  • The choice for wireless networks
  • Publish / Subscribe system
Key Concepts
  • MQTT Session - Connection, Authentication, Communication and Termination
  • Client Operations - Publish, Subscribe, Unsubscribe, ping
  • Multiple implementations of client libraries and brokers (Mosquitto, JoramMQ, ...) exist and are largely compatible with each other
  • MQTT just specifies the transport, and vaguely the application part (i.e. how data is handled and possibly stored, how clients are authorized...)
  • Standard pub/sub protocol (with multiple implementations)
  • MQTT is a communication protocol between several applications. It was designed to be extremely lightweight to fit IoT and resource-constrained environments
Competing Tools
  • Constrained Application Protocol (CoAP) 
  • Simple Media Control Protocol (SMCP) 
MQTT Recommendations
  • Machine-to-Machine (M2M) communication
  • MQTT is designed for low-power devices
  • MQTT's purpose is to keep a communication channel alive on the client side without draining the battery, and to provide reliable messaging
  • The edge devices speak MQTT protocol (for the benefits it has in edge environments). 
  • Very easy to configure and use with open-source tools, lightweight with a relatively small data footprint, varying levels of Quality of Service to fit a range of scenarios
Kafka
  • The main motive behind Kafka is scalability.
  • Apache Kafka is a message broker based on an internal "commit log": its focus is storing massive amounts of data on disk, and allowing consumption in real-time or later (as long as data is still available on disk)
  • It's designed to be deployed as a cluster of multiple nodes, with good scalability properties. Kafka uses its own network protocol.
  • Kafka has no built-in message priority, a weaker security story, and a comparatively heavy protocol
  • Apache Kafka may deal with high-velocity data ingestion
  • Kafka depends on Zookeeper in order to work properly
  • Kafka is better suited for microservices
  • Kafka is a messaging broker with a transient store which consumers can subscribe and listen to. It's an append-only log which consumers can pull from.
  • Specific message storing/distributing software, vaguely of the same family, with its own protocol
  • Kafka is a broker that can store large volumes of data for a long time (or forever). It was designed to be scalable and to provide the best performance.
  • High-throughput, Distributed, Scalable, High-Performance, Durable, Publish-Subscribe, Simple-to-use
  • Takes advantage of Kafka's strengths (replayability, based on an event-sourcing architecture)
IoT environments combine both MQTT and Apache Kafka.
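On the Kafka side, a minimal producer/consumer sketch assuming the kafka-python package and a broker on localhost; the topic name and message are made up for illustration.

# Minimal Kafka sketch (assumes: pip install kafka-python, broker at localhost:9092)
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("iot-telemetry", b'{"device_id": "sensor-01", "temperature": 25.4}')
producer.flush()

# Consumers pull from the append-only log; older records can be replayed
# as long as they are still retained on disk.
consumer = KafkaConsumer(
    "iot-telemetry",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # start from the beginning of the retained log
    consumer_timeout_ms=5000,
)
for message in consumer:
    print(message.offset, message.value)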


Apache Kafka is the New Black at the Edge in Industrial IoT, Logistics and Retailing
MQTT Overview
IoT Data Platform
MQTT vs Kafka Stackoverflow
MQTT vs Kafka Cloudera
MQTT vs Kafka Stack Share

Happy Learning!!!

AI Magic - AI Use Cases


The stories and magic are the AI use cases we witness today
  • Recommendations we provide in eCommerce sites (Bought together, Sold together)
  • Chatbot assistance we provide with NLP, Sentiment Analysis of Product Reviews
  • Forecasting to stock up requested inventory levels
  • Analytics provides insights on who shops in which stores, which brings efficiency to the supply chain and improves customer service
Keep Thinking!!!

Day #329 - Face Re-Identification

First Approach (Blog Posted)
1. Haar based face extraction
2. Dlib for comparison
3. Custom models for Age / Gender

Second Approach (Blog posted)
Custom OpenVINO models

Third Approach 
This link was useful

pip install easyfacenet

Key Concepts
1. Alignment
2. Feature Extraction
3. Comparison (a generic sketch follows below)
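The easyfacenet API itself is not reproduced here; below is a generic sketch of the comparison step, assuming some embed() function from whichever alignment and feature-extraction model you use produces the vectors (the threshold is illustrative).

# Generic re-identification sketch: compare two face embeddings with cosine similarity.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(embedding_a, embedding_b, threshold=0.7):   # threshold is illustrative
    return cosine_similarity(embedding_a, embedding_b) >= threshold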
Happy Learning!!!

Day #328 - Simple masking function

The next task is segmentation: a basic masking function.
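The original snippet is not shown here; a minimal sketch of such a masking function, assuming OpenCV and NumPy, with an illustrative color range:

# Basic masking sketch: keep only pixels whose values fall inside a range (assumes OpenCV + NumPy).
import cv2
import numpy as np

def mask_image(image_bgr, lower=(0, 0, 100), upper=(80, 80, 255)):
    """Return the image with everything outside [lower, upper] blacked out."""
    lower = np.array(lower, dtype=np.uint8)
    upper = np.array(upper, dtype=np.uint8)
    mask = cv2.inRange(image_bgr, lower, upper)           # 255 where the pixel is inside the range
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)

# Usage (path is illustrative):
# masked = mask_image(cv2.imread("frame.jpg"))
# cv2.imwrite("masked.jpg", masked)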

Happy Learning!!!

Day #327 - Keras basic experiments

Revising the basics again. Example code of MNIST with the Sequential vs Functional vs multi-input approaches (a sketch follows below). We have already posted a few best-practices/guidelines notes; these are a few more handy experiments.
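A compact sketch of the Sequential vs Functional comparison, assuming TensorFlow 2.x / Keras; the layer sizes and hyperparameters are illustrative:

# MNIST: the same small classifier written with the Sequential and the Functional API.
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Sequential API: a plain stack of layers.
sequential_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Functional API: layers are called on tensors, which also allows multiple inputs/outputs.
inputs = layers.Input(shape=(28, 28))
x = layers.Flatten()(inputs)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(10, activation="softmax")(x)
functional_model = models.Model(inputs=inputs, outputs=outputs)

for model in (sequential_model, functional_model):
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, batch_size=128, validation_data=(x_test, y_test))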

Happy Learning!!!

February 23, 2020

Interesting Data Science Questions and Answers from Data Science Stack Exchange

Question #1 When is a Model Underfitted?
Answer

Question #2 What makes columnar databases suitable for data science?
Answer

Question #3 Is it necessary to standardize your data before clustering?
Answer

Question #4 The difference of Activation Functions in Neural Networks in general
Answer

Question #5 Why Is Overfitting Bad in Machine Learning?
Answer

Question #6 Why do convolutional neural networks work?
Answer
ConvNets work because they exploit feature locality. They do it at different granularities, therefore being able to model hierarchically higher level features. They are translation invariant thanks to pooling units. They are not rotation-invariant per se, but they usually converge to filters that are rotated versions of the same filters, hence supporting rotated inputs. I know of no other neural architecture that profits from feature locality in the same sense as ConvNets do.

Question #7 Why are Machine Learning models called black boxes?
Answer

Question #8 How do you visualize neural network architectures?
Answer

Question #9 Is there any domain where Bayesian Networks outperform neural networks?
Answer
One of the areas where Bayesian approaches are often used, is where one needs interpretability of the prediction system. You don't want to give doctors a Neural net and say that it's 95% accurate.

Question #10 What are deconvolutional layers?
Answer
Yes, a deconvolution layer also performs convolution! That is why transposed convolution is a much better fitting name, and the term deconvolution is actually misleading.

Question #11 Does batch_size in Keras have any effects in results' quality?
Answer
Batch size impacts learning significantly. What happens when you put a batch through your network is that you average the gradients. The concept is that if your batch size is big enough, this will provide a stable enough estimate of what the gradient of the full dataset would be. By taking samples from your dataset, you estimate the gradient while reducing computational cost significantly. The lower you go, the less accurate your estimate will be; however, in some cases these noisy gradients can actually help escape local minima.
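A tiny NumPy illustration of the gradient-averaging point above, on synthetic linear-regression data (everything here is made up for illustration):

# Gradient of the MSE loss for a linear model, estimated from batches of different sizes.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=10000)

w = np.zeros(5)

def batch_gradient(X_batch, y_batch, w):
    # Average of per-sample gradients of 0.5 * (x.w - y)^2
    residual = X_batch @ w - y_batch
    return X_batch.T @ residual / len(y_batch)

full_grad = batch_gradient(X, y, w)
for batch_size in (8, 64, 512):
    idx = rng.choice(len(y), size=batch_size, replace=False)
    est = batch_gradient(X[idx], y[idx], w)
    print(batch_size, np.linalg.norm(est - full_grad))   # larger batches -> closer to the full gradient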

Question #12 How does Keras calculate accuracy?
Answer

Question #13 What is the significance of model merging in Keras?
Answer

Deep learning basics

Happy Learning!!!

ML Design Perspectives - Code / Infra - Video Analytics

Performance
  • Capture Model Execution Time (timing sketch after this list)
  • Running Parallel instances of models
  • Validating Max frames per second support
  • Logging Results in DB, Designing for reports
  • Real-time notifications vs Batch Reports based design
  • Having sufficient cloud / inhouse infra, Running beta tests real-time
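For the first point (capturing model execution time), a small standard-library sketch; the model call is a placeholder:

# Timing decorator to capture per-call model execution time; names are illustrative.
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)

def timed(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        fps = (1.0 / elapsed) if elapsed > 0 else float("inf")
        logging.info("%s took %.1f ms (%.1f frames/sec)", fn.__name__, elapsed * 1000, fps)
        return result
    return wrapper

@timed
def run_model(frame):
    time.sleep(0.02)   # placeholder for the real inference call
    return "detections"

run_model(None)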
Debugging
  • Logging between API calls and Model Execution
  • Infra to capture true positives / false positives to act accordingly
Keep Thinking!!!


February 22, 2020

Git Walk through Examples

1. Create a repository after logging into Git
2. Add the required code to the locally cloned repository


3. Add example code file with function and update the code


4. Create branch and checkout code


5. Update code in branch

6. Create Another directory and update code in another branch
7. Overall, two branches and two versions of the same file


Merge to master using the Git GUI and resolve the conflicts. A simple walkthrough of Git code management.

Large Commits

Useful tip - Link
1- git stash
2- git add .
3- git commit -m "your commit message"
Force push - git push -f 

More to do

GIT Project Cycle

Git base project
branch 1 dev
branch 2 test
branch 3 uat
branch 4 prod

Checkin Dev code
Move code to QA Branch
Do a Bug fix in QA
Merge QA to Dev

Move QA code to UAT Branch
Do a Bug fix in UAT
Merge UAT to QA, Dev

Move UAT to Prod
Deploy in Prod

1. //pull the latest changes of current development branch if any
git pull (current development branch)

2. //switch to master branch
git checkout master 

3. //pull all the changes if any
git pull

4. //Now merge development into master    
git merge development

5. //push the master branch
git push origin master

Good read along the same lines - Link



Happy Learning!!!

February 20, 2020

ML one liners - Examples

List of Common Machine Learning Algorithms

Unsupervised Models
  • K-Means Clustering Algorithm - Clustering similar groups based on selected features (Unsupervised)
  • Nearest Neighbors - Anomaly detection, identifying outliers (Unsupervised)
Supervised Models
  • Naïve Bayes Classifier Algorithm - Generative; example - text mining, probability that an article belongs to a particular category (Supervised)
  • Support Vector Machine Algorithm - Classification (Supervised)
  • Linear Regression - Predictive analytics (Supervised)
  • Logistic Regression - Classification / Discriminative (Supervised), e.g., predicting whether theft will happen in a store
  • Decision Trees - Tree-based splits on features (Supervised)
  • Random Forests - Feature bagging; a forest of decision trees (Supervised)
  • Apriori Algorithm - Frequent itemsets (market basket analysis - data mining)
Happy Learning!!!

Day #326 - Plumbing my old Notes - Selling Pitch for Retail AI Use Cases (2016 Notes)

How I tried to influence my Ex-Organization to implement Analytics.
  • Problem Statement
  • Technicality
  • Business Value
  • Next Logical Steps 





This was done a few years back (2016). I liked the way I tried to educate, suggest, and sell. Those were the learning / transformational moments in my life!!!

Happy Learning!!!

Day #325 - Time Series Forecasting Models

Models learn the features below from the input dataset and predict accordingly
  • Trend
  • Seasonality
  • Smoothing Factor
Models (a pandas sketch of the first few follows the list)
  • Moving Average - Leverage recent weeks' data
  • Weighted Moving Average - Give varying weights to recent data
  • Single Exponential - Smoothing factor added in the cost function
  • Double Exponential - Smoothing & Trend
  • Triple Exponential - Smoothing, Trend & Seasonality
  • ARIMA - Autoregressive Integrated Moving Average
  • Linear & Polynomial Models - Based on features collected
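A minimal pandas sketch of the first three models on a toy series; the window sizes, weights, and smoothing factor are illustrative:

# Moving average, weighted moving average, and single exponential smoothing on a toy series.
import pandas as pd

sales = pd.Series([100, 120, 130, 125, 140, 150, 160, 155], name="weekly_sales")

moving_avg = sales.rolling(window=3).mean()                      # plain moving average
weights = [0.2, 0.3, 0.5]                                        # more weight on recent weeks
weighted_avg = sales.rolling(window=3).apply(lambda w: (w * weights).sum(), raw=True)
single_exp = sales.ewm(alpha=0.3, adjust=False).mean()           # single exponential smoothing

print(pd.DataFrame({"sales": sales, "ma3": moving_avg, "wma3": weighted_avg, "ses": single_exp}))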
Happy Learning!!!

Day #324 - Plumbing Old 'R' Code

What I like in my Approach - an R example with (sketch below):
  • Data Normalization
  • Cluster Data
  • Build Regression on top of the Clustered Data
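The original R code is not reproduced here; a rough Python/scikit-learn sketch of the same normalize-cluster-regress idea, with synthetic data and illustrative parameters:

# Normalize -> cluster -> fit a regression per cluster (assumes scikit-learn + NumPy).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_scaled = StandardScaler().fit_transform(X)        # data normalization
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

for cluster in np.unique(labels):                   # regression on top of each cluster
    mask = labels == cluster
    model = LinearRegression().fit(X_scaled[mask], y[mask])
    print(f"cluster {cluster}: R^2 = {model.score(X_scaled[mask], y[mask]):.3f}")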


Happy Learning!!!

Interesting Stats

Why Israel Tops Innovation?

The number of researchers per million people is about 8K.
Which nationalities spend more time in Social Media


Happiest Countries


Insights
  • The top 10 happiest countries are not the same as the top 10 countries spending the most time on social media
  • Happiness is not about spending time on social media or watching others' lives
  • Education quality is not determined by multiple-choice questions but by how it motivates the whole population and takes them to the next level
Keep Going!!!



February 18, 2020

Things to do during vacation before next move

  • Feels like a summer vacation. We don't forget things when we take a break; it's time to unlearn, relearn, and rejuvenate ourselves
  • No deadlines is freedom. You are able to work on things you wanted to do: clearing technical debt, meeting people, spending time with your parents
  • Complete your pet projects
  • Airplane mode - reduce small talk and gossip. Focus on making the time meaningful
  • Prepare your health and mind for the next role. Since it's going to be a learning curve and execution again, focus on keeping your morale and health high
Keep Going!!!

Day #323 - Tech Talk Series - Data Science

Session #1 - Feature Engineering for Tabular Data

Key Notes
  • Column Aggregates
  • Independent Columns
  • Derive New Features
  • Target Encoding (Categorical Features)
  • Global Feature Encoding (Categorical Features)
Derive Features (a pandas sketch follows the list)
  • Time - Months, Years, Days, WeekDays, Periods, Distance
  • Missing or Not Missing
  • Numerical Feature - Scale change, log, exp (Feature Transformations)
  • Integer Value, Decimal Value, Mod, Dividend
  • Categorical - Merge, One Hot Encoding
  • Group Features, Time them, divide them
  • Ratio Conversions
  • Binning Columns
  • Remove Outliers
  • Cluster Data and Perform Regression on it
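A few of these derivations in pandas, on a made-up transactions table (all column names and values are illustrative):

# Tabular feature engineering sketch (assumes pandas + NumPy).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "order_ts": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-02", "2020-02-15"]),
    "amount": [120.0, 85.0, np.nan, 300.0],
    "category": ["toys", "grocery", "toys", "electronics"],
})

# Time-derived features
df["month"] = df["order_ts"].dt.month
df["weekday"] = df["order_ts"].dt.weekday

# Missing or not missing
df["amount_missing"] = df["amount"].isna().astype(int)

# Numerical transformation and binning
df["log_amount"] = np.log1p(df["amount"])
df["amount_bin"] = pd.cut(df["amount"], bins=[0, 100, 200, 1000], labels=["low", "mid", "high"])

# One-hot encoding of a categorical column
df = pd.get_dummies(df, columns=["category"])

print(df.head())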

Talk #2 - ML for Optimization Problems

Key Lessons
  • Maximize something (reward), minimize something (cost)
Solution Approach
  • Linear Optimization
  • Linear Objective and set of Linear constraints
  • Dynamic Optimization (Reinforcement Learning)
  • Non-Linear Optimization (Genetic Algorithms, Simulations)
Simulation Optimization
  • Build a simulation of the real-life problem
ML
  • Simulate with decision variables

Happy Learning!!!

February 17, 2020

Datascience Perspectives

I have worked in #database (transactional data), #businessintelligence (historical data), and then added science on top of data with #datascience (future predictions). Everything is #connectedknowledge. #datascience sits on top of domain and data knowledge. Any #discipline can step up into #datascience with adequate #domain, #dataknowledge, and #upskilling.

#ROI in data science is #valueaddition and #preparedness for the future. For example, in the case of #salespredictions, it is the preparedness to address and meet the required sales numbers. #Keepthinking #Moreperspectives

Keep Learning!!!

Day #322 - Timeseries 101

Key Notes
  • Data points indexed in time order form a time series
  • Observations measured over time (at regular intervals)
Properties of Time Series
  • Level - Average value (mean) of the series
  • Trend - Gradual upward or downward movements of data over time
  • Seasonality - Variation that repeats itself over time (Holidays, Promos)
  • Cycles - Business Cycles, Economic Cycles, etc
  • Randomness - Variation that cannot be explained by trend/seasonality / caused by chance

Time Series Decomposition
  • Level + Trend + Seasonality
  • (Level + Trend) * Seasonality

Stationarity
  • A time series is stationary when its mean, variance, and seasonality remain constant over time

Forecasting (Questions to Ask)
  • What are you trying to predict
  • Do you know how the measurements were taken
  • Handling missing values (Null, Moving Average)
  • Seasonality / Trend
  • Shape of Data
  • Assumptions being made




Feature Engineering
  • Mean Every 2 Weeks
  • Incremental Differences Everyday
  • Mean / Variance based Features
  • Rolling Window with adjusting training size




ARIMA - Autoregressive Integrated Moving Average; hard to fine-tune
Single Exponential Smoothing, Double Exponential Smoothing, Holt-Winters Exponential Smoothing

Building Data (a pandas sketch follows the list)
  • Time-based values - Hour of Day
  • Week based values - Week Count
  • Adding Seasonality  
  • Adding Promo
  • Adding Averages / Means - Weekly Average, Daily Average
  • Artificial X (Index from 0 to N)
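A pandas sketch of a few of these constructed features on a toy hourly series (all values are illustrative):

# Building time-series features: hour of day, week count, rolling averages, differences, artificial index.
import pandas as pd

ts = pd.DataFrame({
    "timestamp": pd.date_range("2020-01-01", periods=24 * 14, freq="H"),
})
ts["sales"] = (ts.index % 24) + 10                       # toy signal with a daily pattern

ts["hour_of_day"] = ts["timestamp"].dt.hour              # time-based value
ts["week_count"] = ts["timestamp"].dt.isocalendar().week.astype(int)   # week-based value
ts["daily_avg"] = ts["sales"].rolling(window=24).mean()  # rolling daily average
ts["weekly_avg"] = ts["sales"].rolling(window=24 * 7).mean()
ts["diff_1"] = ts["sales"].diff()                        # incremental difference
ts["x_index"] = range(len(ts))                           # artificial X from 0 to N

print(ts.tail())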

Feature Generation List
  • Level + Trend + Seasonality
  • (Level + Trend) * Seasonality
  • Mean Every 2 Weeks
  • Incremental Differences Everyday
  • Time-based values - Hour of Day
  • Week based values - Week Count
  • Adding Seasonality  
  • Adding Promo
  • Adding Averages / Means - Weekly Average, Daily Average

Happy Learning!!!