"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 21, 2018

Day #133 - Data sets, Data Challenges in Machine Learning

Google blogs and research papers are the mother of all data analysis work. Rather than jumping straight into executing pieces of code, it is very interesting to understand the perspective and practices behind data collection and maintenance. Listed below is a good summary of my readings from Google papers / blogs.

Practical advice for analysis of large, complex data sets

Technical - Ideas to Analyse Data
  • Look at distributions within data
  • Look at examples to validate understanding
  • Consider outliers
  • Check for consistency over time (validity over a period of time); a minimal pandas sketch of these checks follows this list
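A minimal pandas sketch of these checks, assuming a hypothetical DataFrame with a timestamp column and a numeric metric column (the file and column names are made up for illustration):

import pandas as pd

# Hypothetical data: a timestamp column and a numeric metric column
df = pd.read_csv("data.csv", parse_dates=["timestamp"])

# Look at distributions within the data
print(df["metric"].describe())
print(df["metric"].quantile([0.01, 0.25, 0.5, 0.75, 0.99]))

# Look at concrete examples to validate understanding
print(df.sample(5))

# Consider outliers (simple IQR rule)
q1, q3 = df["metric"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["metric"] < q1 - 1.5 * iqr) | (df["metric"] > q3 + 1.5 * iqr)]
print(len(outliers), "potential outliers")

# Check consistency over time (does the metric drift month over month?)
print(df.set_index("timestamp")["metric"].resample("M").mean())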
Process - Recommendations for Data Collection
  • Data collection setup
  • Reproducible
  • Exploratory Data Analysis
Social - Communicating your insights
  • Data Analysis starts with questions not with code or data
  • Accept ignorance and mistakes
  • Be skeptical
  • Educate Consumers
Crawling the internet: data science within a large engineering system
  • Identify and model each page's refresh-rate pattern, and schedule data re-crawls accordingly
Machine Learning: The High-Interest Credit Card of Technical Debt
A very interesting article on data-related risks / challenges.
  • Unstable Data Dependencies
  • Underutilized Data Dependencies
  • Legacy Features
  • Correction Cascades
  • When Correlations No Longer Correlate
Happy Learning!!!

September 18, 2018

Day #132 - Sequence Learning Paper

Understanding NLP and deep learning requires understanding the research papers behind them. Listed below are readings and important points for my reference (copied from the papers).

Paper #1 - Sequence Learning 
Captured below is a summary of key points from the Sequence Learning paper.

Introduction
  • Multilayered Long Short-Term Memory (LSTM)
  • DNNs can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality
  • We are mapping a sequence of words representing the question to a sequence of words representing the answer
  • LSTM learns to map an input sentence of variable length into a fixed-dimensional vector representation
Model (a minimal sketch follows this list)
  • Map the input sequence to a fixed-sized vector using one RNN
  • The goal of the LSTM is to estimate the conditional probability
  • First, we used two different LSTMs: one for the input sequence and another for the output sequence
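A minimal PyTorch sketch of the setup described above (two separate LSTMs, the encoder's final state acting as the fixed-dimensional summary of a variable-length input); the sizes and names are illustrative, not taken from the paper:

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Two different LSTMs: one for the input sequence, another for the output sequence
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # The encoder maps the variable-length input to a fixed-dimensional state
        _, state = self.encoder(self.src_emb(src))
        # The decoder estimates the conditional probability of the target sequence
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # logits over the target vocabulary

logits = Seq2Seq(src_vocab=10000, tgt_vocab=10000)(
    torch.randint(0, 10000, (2, 7)), torch.randint(0, 10000, (2, 5)))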
Paper #2 - Massive Exploration of Neural Machine Translation Architectures
https://arxiv.org/pdf/1703.03906.pdf

NMT - an end-to-end approach to automated translation
  • Based on an encoder-decoder architecture consisting of two recurrent neural networks (RNNs) and an attention mechanism that aligns target with source tokens
  • Shortcoming - amount of compute required to train them
NMT
  • Encoder-decoder architecture with attention mechanism
  • An encoder function fenc takes as input a sequence of source tokens x and produces a sequence of states h 
  • Decoder is an RNN that predicts the probability of a target sequence y
  • Decoder RNN also uses context vector - called the attention vector and is calculated as a weighted average of the source states
Attention Mechanism (a rough sketch follows this list)
  • Commonly used attention mechanisms are the additive and multiplicative variants
  • Given an attention key h (an encoder state) and attention query s (a decoder state), the attention score for each pair is calculated
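A rough sketch of this calculation (additive-style scoring between the encoder states and the decoder state, followed by a weighted average of the source states as the attention/context vector); the shapes and parameter names are assumptions for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

hid = 512
W1, W2 = nn.Linear(hid, hid, bias=False), nn.Linear(hid, hid, bias=False)
v = nn.Linear(hid, 1, bias=False)

h = torch.randn(1, 9, hid)   # encoder states (attention keys), one per source token
s = torch.randn(1, hid)      # current decoder state (attention query)

# score(h_j, s) = v^T tanh(W1 h_j + W2 s), one score per source position
scores = v(torch.tanh(W1(h) + W2(s).unsqueeze(1))).squeeze(-1)   # (1, 9)
weights = F.softmax(scores, dim=-1)                              # attention weights
context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)          # weighted average of source states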
Paper #3 - Neural Machine Translation by Jointly Learning to Align and Translate - https://arxiv.org/pdf/1409.0473.pdf

Happy Learning!!!

September 17, 2018

Day #131 - Dataset collection and Standardization process

A very important paper I came across today - Datasheets for Datasets

This paper provides a key checklist for data collection. Some of the important sections to refer to are:
  • Dataset Composition
  • Data Collection Process
  • Data Preprocessing
  • Dataset Distribution
  • Dataset Maintenance
  • Legal & Ethical Considerations
A great reference to check back on and use.

Happy Learning!!!

Day #130 - Chatbot Architecture

Goal-Oriented Bots
  • Narrow Domain
  • Specific tasks
  • Example Call Center
  • Model - Retrieval Based
  • Use Predefined responses
General Chatbots
  • General Conversation
  • Generative Models
  • For general entertainment
  • Generate new responses
Sequence to Sequence
  • Incoming message - Encoder
  • Decoder for response
  • Attention or least reserved input
  • Inputs have a fixed length, so padding is done
Padding
  • EOS - End of Sentence
  • PAD - Filler
  • GO - Start of decoding
  • UNK - Unknown word not in vocabulary (usage of these tokens is sketched below)
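A minimal sketch of how these special tokens pad a batch to a fixed length; the token spellings and the helper function are hypothetical:

PAD, GO, EOS, UNK = "<pad>", "<go>", "<eos>", "<unk>"

def pad_batch(sentences, max_len):
    """Append EOS, then fill with PAD so every sequence has the same length."""
    batch = []
    for tokens in sentences:
        tokens = tokens[: max_len - 1] + [EOS]
        batch.append(tokens + [PAD] * (max_len - len(tokens)))
    return batch

# The decoder input would start with GO; UNK replaces out-of-vocabulary words
print(pad_batch([["how", "are", "you"], ["hi"]], max_len=6))
# [['how', 'are', 'you', '<eos>', '<pad>', '<pad>'], ['hi', '<eos>', '<pad>', '<pad>', '<pad>', '<pad>']]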
Bucketizing
  • Opportunity to reduce padding by bucketizing (see the sketch below)
  • Place sequences of similar length in different batches for the RNN
  • RNN keeps track of the intent of the conversation
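A rough sketch of bucketing, assuming a hypothetical list of tokenized sentences; sequences of similar length land in the same bucket, so each batch needs little padding:

def bucketize(sequences, buckets=(5, 10, 20, 40)):
    """Group sequences into the smallest bucket that fits their length."""
    grouped = {b: [] for b in buckets}
    for seq in sequences:
        for b in buckets:
            if len(seq) <= b:
                grouped[b].append(seq)
                break
        # Sequences longer than the largest bucket would be truncated or dropped in practice
    return grouped

grouped = bucketize([["hi"], ["how", "are", "you"], ["word"] * 15])
# Bucket 5 holds the two short sentences, bucket 20 holds the 15-token one;
# each bucket is then batched separately for the RNN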
Cons
  • Too dramatic responses
  • Based on Domain of data
Intents Clustering
  • Graph of different responses
  • Labels to cluster them
  • Propagate the knowledge to other nodes of the graph
  • The Expander library is used for this purpose (a rough stand-in is sketched below)
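Expander is not a public library, so as a stand-in here is a rough sketch of the same idea (spreading a few known intent labels across a similarity graph) using scikit-learn's LabelPropagation; all data and names below are made up for illustration:

import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Hypothetical sentence embeddings for user messages (e.g. averaged word vectors)
X = np.random.rand(100, 50)
# Only a handful of messages carry an intent label; -1 marks unlabelled messages
y = np.full(100, -1)
y[:5] = [0, 0, 1, 1, 2]

model = LabelPropagation(kernel="rbf")
model.fit(X, y)                          # labels spread through the similarity graph
predicted_intents = model.transduction_  # inferred intent for every message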
Updated - July 2022

Interesting Reads - LaMDA: Language Models for Dialog Applications


Key Notes
  • Language Models for Dialog Applications
  • Metrics - (sensibleness, specificity, and interestingness)
  • Like BERT and GPT-3, it's built on the Transformer architecture
  • Meena, a 2.6 billion parameter end-to-end trained neural conversational model
  • At its heart lies the Evolved Transformer seq2seq architecture, a Transformer architecture discovered by evolutionary neural architecture search to improve perplexity.

Happy Learning!!!

Day #129 - Neural Network for Words

Bag of Words
  • Vectorize each word with one hot encoder vector
  • Bag of Words representation - Sum of the individual One hot encoded vectors
  • BOW - Sum of sparse one-hot encoded vectors (see the sketch below)
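A tiny sketch of this representation (each word is a one-hot vector; the sentence is the sum of those vectors); the vocabulary and sentence are made up:

import numpy as np

vocab = {"good": 0, "movie": 1, "not": 2, "a": 3, "did": 4, "like": 5}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1
    return v

# Bag of words = sum of the sparse one-hot vectors of the tokens
sentence = ["not", "a", "good", "movie"]
bow = sum(one_hot(w) for w in sentence)
print(bow)  # [1. 1. 1. 1. 0. 0.]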
Neural Network for words
  • Dense Representation
  • Each word represented by Dense Vector
  • Word2Vec Embedding - Done in unsupervised manner
  • Sum of word2vec is feature representation
  • Convolutional filters to compute 2-gram words
  • Similar words have similar cosine distance in word2vec
  • With a good embedding + convolution we can extract higher-level meaning
  • Maximum pooling over time (Just like we do in images) - Input Sequence - Convolutional filter - Slide in one direction - Maximum Activation - Select Output
Architecture
  • 3,4,5 gram window - For each ngram we learn 100 filters
  • Obtain embedding of input sequence
  • Apply a multilayer perceptron on those 300 features (3 window sizes × 100 filters each; sketched below)
Paper - https://arxiv.org/pdf/1408.5882.pdf
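A minimal PyTorch sketch in the spirit of this architecture (100 filters each for 3/4/5-gram windows over word embeddings, max pooling over time, then a classifier on the 300 pooled features); the hyper-parameters are illustrative:

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # 100 filters per n-gram window size (3, 4, 5)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, 100, kernel_size=k) for k in (3, 4, 5)])
        self.fc = nn.Linear(300, num_classes)       # 3 window sizes x 100 filters

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)        # (batch, emb_dim, seq_len)
        # Max pooling over time: keep the strongest activation of each filter
        pooled = [conv(x).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))    # class logits

logits = TextCNN(vocab_size=20000)(torch.randint(0, 20000, (4, 50)))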

Apply Convolutions for Text
  • One hot encoded characters
  • 1000 Kernels, 1000 filters
  • Apply same pattern Convolution - Pooling - Convolution - Pooling
  • Moving window with a stride of two to obtain the pooling output (see the sketch below)
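A rough sketch of this character-level pattern (one-hot encoded characters, repeated convolution + pooling with a stride of two); the alphabet size, sequence length, and filter counts are assumptions:

import torch
import torch.nn as nn

alphabet_size, seq_len = 70, 256          # one-hot encoded characters
char_cnn = nn.Sequential(
    nn.Conv1d(alphabet_size, 1000, kernel_size=7), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),          # pooling with a stride of two
    nn.Conv1d(1000, 1000, kernel_size=7), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
)
features = char_cnn(torch.randn(4, alphabet_size, seq_len))  # (batch, 1000, reduced length)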
Encoder-Decoder Architecture
  • Attention Mechanism
  • Encoder - Hidden representation of input sentence (Encodes thought of sentence)
  • Types of encoders (RNN, CNN, Hierarchical structures)
  • Decoder - Decode the task / sequence from other language
  • LSTM / RNN encodes input sentence (End of Sentence token)
  • Decoding - Conditional Language Modelling
  • Feed output of previous state as input for next state
  • Stack several layers of LSTM model
  • Every state of the decoder has three incoming connections (from the previous state, from the context vector, and from the current input)
Encoder - Maps the source sequence to a hidden vector (RNN)
Decoder - Performs language modelling conditioned on that vector (RNN), but with the extra incoming connections noted above
Prediction - Conditional probability over the vocabulary (Softmax)

Attention Mechanism
  • Powerful Technique in Neural Networks
  • Encoder has H states, Decoder - X states
  • Helps focus on different parts of sentence
Compute Similarities (formulas below)
  • Additive Attention
  • Multiplicative Attention
  • Dot Product
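Written out, with h_j an encoder state, s_i a decoder state, and W, W_1, W_2, v learned parameters:

e_{ij} = v^{\top} \tanh(W_1 s_i + W_2 h_j)            (additive)
e_{ij} = s_i^{\top} W h_j                             (multiplicative)
e_{ij} = s_i^{\top} h_j                               (dot product)
\alpha_{ij} = \exp(e_{ij}) / \sum_{k} \exp(e_{ik})    (attention weights via softmax)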
Local Attention 
  • Predict the best position to attend to
Happy Learning!!!

September 14, 2018

Day #128 - NLP Basics - Demo and Notes

Found some code snippets for quick reference. Adding code examples and basic concepts to the NLP learning kit; a small sketch of the basics follows.
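A small, generic sketch of the kind of NLP basics referred to here (tokenization, stopword removal, stemming with NLTK); the exact snippets aren't listed above, so everything below is an assumed illustration:

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

text = "Natural Language Processing makes machines understand human language."
tokens = word_tokenize(text.lower())                                  # tokenization
tokens = [t for t in tokens if t.isalpha()]                           # drop punctuation
tokens = [t for t in tokens if t not in stopwords.words("english")]   # stopword removal
stems = [PorterStemmer().stem(t) for t in tokens]                     # stemming
print(stems)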



Happy Learning!!!

September 09, 2018

Working on Research papers / GATE

It is absolutely important to follow your passion and immerse yourself in your areas of interest, improving your skills and balancing your work and learning.

A PhD / title / tag need not be attached to pursuing such things. Small amounts of continuous, focused learning effort are very important and key to moving forward. Going forward I will post my research / GATE learning notes.

Reads - Ref1, Ref2

Happy GATE and Happy learning!!!

September 06, 2018

Day #127 - SSD, Yolo Paper Reading Notes

Only the key summary points. These are selected lines (copied) for my quick reference and understanding

Yolo Notes
  • Resize Image 
  • Run CNN (A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes)
  • Non-max Suppression
Alternative Techniques
  • Sliding window and region proposal-based techniques
Implementation Details
  • YOLO sees the entire image during training and test time so it encodes contextual information about classes as well as their appearance
  • Our system divides the input image into an S × S grid
  • If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
  • Each bounding box consists of 5 predictions: x, y, w, h, and confidence
  • Network architecture is inspired by the GoogLeNet model for image classification
  • YOLO predicts multiple bounding boxes per grid cell
Limitations of YOLO
  • Struggles to generalize to objects in new or unusual aspect ratios or configurations
Other Detection Systems
  • Haar, SIFT, HOG, convolutional features
What is Non-max Suppression?

All modern object detectors follow a three step recipe:
(1) proposing a search space of windows (exhaustive by sliding window or sparser using proposals),
(2) scoring/refining the window with a classifier/regressor, and
(3) merging windows that might belong to the same object.

Non-Max Suppression - The algorithm greedily selects high scoring detections and deletes close-by less confident neighbours since they are likely to cover the same object
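A minimal numpy sketch of this greedy procedure (boxes as [x1, y1, x2, y2] with a score each; the IoU threshold is a typical illustrative value):

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedily keep the highest-scoring boxes and delete close-by, less confident neighbours."""
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Overlap of the selected box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop neighbours that likely cover the same object
        order = order[1:][iou <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # [0, 2] - the overlapping second box is suppressed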

R-CNN [10] - Replaced feature extraction and classifiers with a neural network

Related work - Viola&Jones, deformable parts model (DPM), clustering algorithms, mean-shift clustering, agglomerative clustering, affinity propagation clustering

Deformable parts models. Deformable parts models (DPM) use a sliding window approach to object detection

R-CNN. Region proposals instead of sliding windows to find objects in images. Selective Search [34] generates potential bounding boxes, a convolutional network extracts features, an SVM scores the boxes, a linear model adjusts the bounding boxes, and non-max suppression eliminates duplicate detections.

SSD Notes

  • Discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location
  • The core of SSD is predicting category scores and box offsets for a fixed set of default bounding boxes using small convolutional filters applied to feature maps
  • Based on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes
  • Ground truth information needs to be assigned to specific outputs in the fixed set of detector outputs
I need to revisit these papers over the next couple of months to understand them better.

Happy Learning!!!

Day #126 - Deep Learning Class Notes


Lesson 8: Deep Learning Part 2 2018 - Single object detection

Advice
  • If you have not come across something, it doesn't mean it's hard
  • Type out all the code yourself every time
  • Don't wait to be perfect before you start communicating
Neural Network Architecture
  • Dataset
  • Network Architecture (Number of convolution layers, pooling, dropouts, activation functions)
  • Loss Function
Flow of Architecture for Single Object
1. I/p Image
2. ConvNet
3. Output Tensor vector

Flow of Architecture for Multiple Objects
1. I/p Image
2. ConvNet
3. Output Tensor vector
4. 16 sets of outputs

Notes
  • Bounding Boxes
  • Take Labelled data and generate classes
  • Labeling is expensive 
  • Pascal VOC Dataset
  • Bounding Box with coordinates, category, image_id
Steps (Pytorch coding)
  • Build Classifier
  • Find the biggest object in each image and classify it
  • Go through each bounding box in image
  • Get Largest One
  • Using ResNet to classify
  • Model with 4 activations, mean squared error loss function
  • Multiple label classification
  • Add Rotations, Flips, Contrast Changes
Architecture (the head is sketched after this list)
  • Flatten
  • RELU
  • Dropout
  • Linear
  • Batch Normalization
  • Dropout
  • Linear
  • Loss functions
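A minimal PyTorch sketch of the custom head described in the Steps and Architecture lists above (a ResNet backbone, then Flatten / ReLU / Dropout / Linear / BatchNorm / Dropout / Linear, outputting 4 bounding-box activations plus class scores); the layer sizes are illustrative, not the lesson's exact values:

import torch
import torch.nn as nn
from torchvision import models

num_classes = 20                                             # e.g. Pascal VOC categories

backbone = models.resnet34(pretrained=True)
backbone = nn.Sequential(*list(backbone.children())[:-2])    # keep only the convolutional feature maps

head = nn.Sequential(
    nn.Flatten(),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512 * 7 * 7, 256),
    nn.BatchNorm1d(256),
    nn.Dropout(0.5),
    nn.Linear(256, 4 + num_classes),                         # 4 bounding-box coordinates + class scores
)

x = torch.randn(2, 3, 224, 224)
out = head(backbone(x))
bbox, class_logits = out[:, :4], out[:, 4:]
# Train with an L1/MSE loss on bbox and cross-entropy on class_logits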
SSD
  • Single Shot Detection
  • Conv2D
  • Number of anchor boxes
Analysis
  • Transfer learning is done on top of it 
  • Identify the highlighted sections
My Thoughts
  • Perform Segmentation
  • Pick the objects
  • Train and Classify them
To Learn Items
  • Python Debugger pdb.set_trace()
  • Detail Specific Code Walkthrough
  • lambda functions in Python
Adam
  • Momentum on gradient
  • Past squared gradients (the update formulas are written out below)
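The standard Adam update, combining exactly these two ideas (a momentum-like running mean of the gradient and a running mean of past squared gradients):

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t                                (momentum on the gradient)
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2                              (running mean of past squared gradients)
\hat{m}_t = m_t / (1 - \beta_1^t),  \hat{v}_t = v_t / (1 - \beta_2^t)    (bias correction)
\theta_t = \theta_{t-1} - \eta \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)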
Happy Learning!!!