"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 11, 2021

BERT QnA Example

Some examples are great for picking up ideas that we can customize as needed: BERT-based QnA example



Text Clustering - Did a decent job of clustering by JD type - Cloud, Server, ML, etc.



Unsupervised NER using BERT

Document search with fragment embeddings

Finbert

Keep Exploring!!!

October 10, 2021

Forecast - Planning - Recommendations - Paper Reads

Paper #1 - Maximizing Store Revenues using Tabu Search for Floor Space Optimization

Key Notes

  • Floor space is a valuable and scarce asset for retailers
  • Connected multi-choice knapsack problem with an additional global constraint; the authors propose a tabu search based metaheuristic that exploits multiple special neighborhood structures
  • Over the last decade, the number of products competing for limited space increased by up to 30%
  • The product mix of categories, merchandising rules, sales patterns and characteristics of display furniture 
  • (1) develop a statistical model to measure the space elasticity; and 
  • (2) formulate and solve an optimization problem for each store to determine the optimal assignment of planograms to maximize total revenue subject to certain business constraints
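
To make the knapsack-plus-tabu-search idea above concrete, here is a minimal sketch. The categories, planogram options, capacity and the single "swap one category's planogram" neighborhood are all hypothetical; the paper uses richer neighborhood structures and business constraints that these notes do not capture.

```python
from collections import deque

# Hypothetical data: for each category, candidate planograms as (space, revenue).
OPTIONS = [
    [(2, 10), (4, 18), (6, 23)],  # category 0
    [(1, 6), (3, 15), (5, 20)],   # category 1
    [(2, 8), (3, 13), (4, 15)],   # category 2
]
CAPACITY = 10  # total floor space available in the store (assumed)


def total(solution, index):
    """Sum space (index=0) or revenue (index=1) over the chosen options."""
    return sum(OPTIONS[c][o][index] for c, o in enumerate(solution))


def tabu_search(iterations=50, tenure=3):
    # Start from the smallest planogram in every category (assumed feasible).
    current = [0] * len(OPTIONS)
    best = list(current)
    tabu = deque(maxlen=tenure)  # recently abandoned (category, option) pairs

    for _ in range(iterations):
        candidates = []
        for c, opts in enumerate(OPTIONS):
            for o in range(len(opts)):
                if o == current[c] or (c, o) in tabu:
                    continue
                neighbor = current[:c] + [o] + current[c + 1:]
                if total(neighbor, 0) <= CAPACITY:
                    candidates.append((total(neighbor, 1), c, o, neighbor))
        if not candidates:
            break
        # Take the best admissible neighbor even if it is worse than current;
        # this is what lets the search climb out of local optima.
        _, c, o, neighbor = max(candidates)
        tabu.append((c, current[c]))  # forbid moving straight back
        current = neighbor
        if total(current, 1) > total(best, 1):
            best = list(current)
    return best, total(best, 1), total(best, 0)


best, revenue, space = tabu_search()
print(best, revenue, space)
```

The tabu list is what separates this from plain hill climbing: the search is allowed to accept a worse assignment, and the short memory stops it from immediately undoing that move.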

Paper #2 - Reversing ShopView analysis for planogram creation

Key Notes

  • ShopView can build the planogram without the need of manually creating it in software
  • OCR in the identification of products
  • A planogram specifies the absolute physical locations of the products and the amount of space each type of product should occupy
  • Planogram compliance using template images
  • Vision - Object Recognition based on attributes, Template and Feature Matching, Optical Character Recognition (OCR)
  • Custom Dictionary - Implementing a custom dictionary for the OCR engine seemed a good strategy since at first glance it would improve the performance of the OCR algorithm

Paper #3 - Deep Learning based Recommender System: A Survey and New Perspectives

Key Notes

  • Collaborative filtering makes recommendations by learning from user-item historical interactions, either explicit (e.g. user’s previous ratings) or implicit feedback (e.g. browsing history)
  • Content-based recommendation is based primarily on comparisons across items’ and users’ auxiliary information
  • Hybrid model refers to recommender system that integrates two or more types of recommendation strategies
  • Strengths of deep learning based recommendation models - Nonlinear Transformation, Sequence Modelling
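
As a point of reference for the collaborative filtering note above, here is a minimal sketch of the classical user-based (neighborhood) variant. It is not one of the deep learning models from the survey, and the users, items and ratings are made up for illustration.

```python
import math

# Hypothetical explicit feedback: user -> {item: rating}.
RATINGS = {
    "ann": {"shoes": 5, "jacket": 4, "scarf": 1},
    "bob": {"shoes": 4, "jacket": 5, "hat": 4},
    "cat": {"scarf": 5, "hat": 2, "jacket": 1},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user, ratings=RATINGS):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = {i: scores[i] / weights[i] for i in scores if weights[i] > 0}
    return sorted(ranked, key=ranked.get, reverse=True)


print(recommend("ann"))
```

A content-based recommender would instead compare item or user attributes, and a hybrid model would combine both signals.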

Paper #4 - Fashion Retail: Forecasting Demand for New Items

Key Notes

  • Merchandising Factors - Discount, Visibility, Promotion
  • Derived Features - Age of Style, Trend and Seasonality, Cannibalisation

Paper #5 - Time Series Forecasting With Deep Learning: A Survey

More Reads

Keep Exploring!!!

October 09, 2021

Technology - Consulting - Coding - Domain Learning

What can we accomplish in a senior role? I agree with the perspective and the work that is called out here: Link

Technical Work

  • Review the technical design/architecture
  • Analyze the design for security/scalability
  • Collaborate with other technical teams to agree on interfaces and common APIs

People Work

  • 1-1s on a weekly basis
  • Regular feedback

Plus my own additions

  • Patenting / Knowledge Sharing
  • Building your point of view
  • Be on top of tech - Code as you need

Ongoing

  • Teaching, mentoring and coaching
  • Technical conversations and reviewing designs

Plus a perspective on mastering technology vs domain. I like this article

Adding my top reasons to solve problems and not to master tech - Work On Interesting problems not Technologies

  • Ideas take time and need refinement
  • As you keep coding, you keep building perspectives
  • A working prototype creates more interest/excitement, and you keep improving it
  • Your interest will not die down as you are solving newly discovered challenges
  • You will balance scope and features when you spot the unknowns
  • It's your idea, you will not kill it :)

On WLB - Link

  • We collectively create the culture we live in; change comes from healthy WLB
  • 20% of your work produces 80% of your value. Prioritize over priorities

Myth of super performers

I loved the lines below; I have seen this specific behavior: people who deliver but do not share or collaborate within the team. Adding my own perspectives

  • X is the only developer who gets anything done
  • Does not actively share knowledge with peers
  • Good at communication but bad at collaboration
  • Explains simple things in a complicated way
  • Connects well at the leadership level: over-communication at the leadership level, limited collaboration at ground level
  • Instead, making more people productive will reap the greatest benefits
  • Turn our attention from individuals to groups of people
  • Don’t mistake humility for ignorance - There are a lot of software engineers out there who won’t express opinions unless asked

Agile principles alternative definition

  • Empathy for customer needs
  • Actually getting stuff done
  • A bird’s-eye view of the product vs market 
  • Able to balance the bird’s-eye view against the product view and the component view

More Reads

20 Things I’ve Learned in my 20 Years as a Software Engineer

Keep Thinking!!!

Dark side of profits

  • The dark side of  analytics - mobile apps - facebook - youtube - amplify the #engagement for the sake of profit 
  • If you aren't the paying customer, you are the product. #google #facebook #android
  • Anger/ Hate / Excitement / Drugs creates dopamine addiction and keeps the conversation going
  • For cab-sharing and delivery partners - the illusion of guaranteed income: the variable incentives seem attractive initially, but the mental and physical costs soon take a big toll
  • High-dopamine, low-effort entertainment (video games, drugs, porn, Netflix) becomes the default way to spend leisure time really quickly

The question to ask: would I let this happen to my own kid, or do I treat everyone outside my home as an untapped market to leverage?

As an end-user think

  • How much time does Zuck spend on FB every day
  • Would they let their kids spend as much time on FB as a typical teen does?

Until we recognize that we are in this trap of low-cost internet and virtual addiction, we will never come out of it - low-cost mobile phones, free internet, and engagement at the cost of leaving your life goals behind.

The same happens in every other domain. Why don't restaurants hesitate to use outdated/expired products in their food? It boils down to one's own integrity vs profits.

How I Met & Surpassed My Career Goals While Following One Actionable Rule

Keep Thinking!!!

Perspective of Learning

During school days

  • Why should I learn?
  • Life will be the same, I will become a driver / cleaner?
  • What are the ways to quit education? All my friends have started working
  • I don't apply anything I learn, so why should I learn?
  • My dad's work is not connected to what he learned

Now

  • Learn to know the state of the art
  • Learn to design things
  • Solve business problems in your own way
  • Learn to review others' work
  • Learn to have good domain knowledge
  • Learn to be employable and make better contributions

Now I learn more sincerely than my school days :) :) :)


NLP - NER - Papers

Paper #1 - Recent Trends in Named Entity Recognition (NER)

Key Notes

  • ‘Named Entity Recognition’ refers to identifying person, organization, location
  • NER belongs to a general class of problems in NLP called sequence tagging 
  • Prominent supervised learning methods - Hidden Markov Models (HMM), Decision Trees, Maximum Entropy Models (ME)

  • Unsupervised clustering method using lexical resources eg. Wordnet

Paper #2 - A Survey on Deep Learning for Named Entity Recognition

Key Notes

  • Rule-based approaches, which do not need annotated data as they rely on hand-crafted rules
  • Unsupervised learning approaches, which rely on unsupervised algorithms
  • Feature-based supervised learning approaches, which rely on supervised learning algorithms

  • 71% of search queries contain at least one named entity
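
The rule-based approach in the first bullet above can be sketched in a few lines. The gazetteer entries, the title-based person rule and the labels below are hypothetical examples, not taken from the survey; real rule-based systems rely on much larger curated resources.

```python
import re

# Hypothetical gazetteer; real systems use much larger curated lists.
GAZETTEER = {
    "ORG": ["Facebook", "Amazon Web Services", "Stanford"],
    "LOC": ["London", "New York"],
}
# Hand-crafted rule: a title followed by capitalised words is a person.
PERSON_RULE = re.compile(r"\b(?:Mr|Ms|Dr)\.? ((?:[A-Z][a-z]+ ?)+)")


def tag_entities(text):
    """Return (entity, label) pairs in order of appearance."""
    entities = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                entities.append((m.start(), m.group(), label))
    for m in PERSON_RULE.finditer(text):
        entities.append((m.start(1), m.group(1).strip(), "PER"))
    return [(name, label) for _, name, label in sorted(entities)]


print(tag_entities("Dr. Linda Wilkinson joined Facebook in London"))
```

No annotated data is needed, which is the appeal noted above, but every new entity type or naming pattern requires another hand-written rule or list.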



  • Word-level representation - continuous bag-of-words (CBOW) and continuous skip-gram models
  • Commonly used word embeddings include Google Word2Vec, Stanford GloVe, Facebook fastText and SENNA.
  • CharNER considers a sentence as a sequence of characters and utilizes LSTMs to extract character-level representations.
  • Besides word-level and character-level representations, some studies also incorporate additional information (e.g., gazetteers [18], [108], lexical similarity [109], linguistic dependency [110] and visual features [111]) into the final representations of words



Paper #3 - Document Ranking for Curated Document Databases using BERT and Knowledge Graph Embeddings: Introducing GRAB-Rank

Key Notes

  • Curated Document Databases (CDD) play an important role in helping researchers find relevant articles in scientific literature
  • Document ranking has been extensively used in the context of document retrieval
  • Recent work on Learning to Rank (LETOR) has used word embeddings of various kind as the input
  • Word embeddings can be learnt from scratch or a pre-trained embedding model can be adopted
  • A popular algorithm for generating vector representations of words is GloVe (Global Vectors for Word Representation), an unsupervised learning algorithm that operates by aggregating global word-word co-occurrence statistics
  • Semantic document ranking models take into account the context of terms in relation to their neighbouring terms
  • Context of the word “bank”, either as: (i) an organisation for investing and borrowing money, (ii) the side of a river or lake, (iii) a long heap of some substance
  • A popular choice of pre-trained contextual model is Bidirectional Encoder Representations from Transformers (BERT)
  • An alternative contextual model that can be used is Embeddings from Language Models (ELMo)
  • A knowledge graph is a collection of vertices and edges where the vertices represent entities or concepts, and the edges represent a relationship between entities and/or concepts. 

  • OIE4KGC (Open Information Extraction for Knowledge Graph Construction)
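
The vertex-and-edge definition of a knowledge graph above maps naturally onto (head, relation, tail) triples. Here is a minimal sketch; the class name, method names and sample triples are illustrative assumptions, and this is not how GRAB-Rank or OIE4KGC store their graphs.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Entities are vertices; each labelled edge is a (relation, tail) pair."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, head, relation, tail):
        self.edges[head].add((relation, tail))

    def neighbors(self, entity, relation=None):
        """Entities reachable from `entity`, optionally via one relation."""
        return sorted(
            tail
            for rel, tail in self.edges.get(entity, ())
            if relation in (None, rel)
        )


kg = KnowledgeGraph()
kg.add("BERT", "is_a", "language model")
kg.add("ELMo", "is_a", "language model")
kg.add("BERT", "developed_by", "Google")

print(kg.neighbors("BERT"))
print(kg.neighbors("BERT", "is_a"))
```

Knowledge graph embedding methods then learn a vector for each entity and relation from triples of this shape.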

More Reads

October 05, 2021

Time series variables and Insights

Good read - Link

  • Discrete variables - Discrete data can only take certain values and refers to individual, countable items, such as point-in-time data (bank balance). It plots as clusters or points. Examples: the number of customers who bought different items, the number of computers in each department, the number of items you buy at the grocery store each week
  • Continuous variables - Continuous data can take any measured value within a specific range. Height, weight, temperature and length are all examples. Some continuous data changes over time. It plots as continuous line graphs.
  • Univariate analysis is the simplest form of data analysis where the data being analyzed contains only one variable
  • Bivariate data – This type of data involves two different variables. The analysis deals with causes and relationships and is done to find out the relationship between the two variables.
  • Multivariate analysis is the analysis of three or more variables.
    • Multiple linear regression
    • Multiple logistic regression
    • Multivariate analysis of variance (MANOVA)
    • Factor analysis
    • Cluster analysis
    • The aim of multivariate analysis is to find patterns and correlations between several variables simultaneously
    • Simple regression pertains to one dependent variable and one independent variable
    • Multiple regression (aka multivariable regression) pertains to one dependent variable and multiple independent variables
    • Multivariate regression pertains to multiple dependent variables and multiple independent variables
  • A stationary (time) series is one whose statistical properties such as the mean, variance and autocorrelation are all constant over time. Hence, a non-stationary series is one whose statistical properties change over time.
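
The stationarity definition above can be illustrated with a rough check that compares the mean and variance of the two halves of a series. This is only a heuristic sketch with arbitrarily chosen thresholds, not a formal test such as the augmented Dickey-Fuller test; the function names are my own.

```python
import statistics


def looks_stationary(series, mean_tol=0.5, var_ratio_max=2.0):
    """Heuristic: do mean and variance stay similar across both halves?"""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    # Measure the mean shift in units of the overall standard deviation.
    scale = statistics.pstdev(series) or 1.0
    mean_shift = abs(statistics.mean(first) - statistics.mean(second)) / scale
    v1, v2 = statistics.pvariance(first), statistics.pvariance(second)
    var_ratio = max(v1, v2) / max(min(v1, v2), 1e-12)
    return mean_shift < mean_tol and var_ratio < var_ratio_max


def difference(series):
    """First-order differencing, a common way to remove a trend."""
    return [b - a for a, b in zip(series, series[1:])]


seasonal = [1, 2, 3, 2] * 10   # repeating pattern, constant mean and variance
trend = list(range(40))        # mean keeps increasing over time

print(looks_stationary(seasonal))
print(looks_stationary(trend))
print(looks_stationary(difference(trend)))
```

The last call shows why differencing is often applied before modelling: the trending series fails the check, while its first difference passes it.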

Keep Exploring!!!

October 04, 2021

Dark Side of Social Media

I was expecting such evidence-based insights to understand social media manipulation.

Key Notes from Link1, Link2

  • There were conflicts of interest between what was good for the public and what was good for Facebook. And Facebook, over and over again, chose to optimize for its own interests, like making more money
  • Facebook has realised that if they change the algorithm to be safer, people will spend less time on the site, they'll click on less ads, they'll make less money
  • “The version of Facebook that exists today is tearing our societies apart and causing ethnic violence around the world,” says former Facebook employee Frances Haugen.

Paper #1 - THE WELFARE EFFECTS OF SOCIAL MEDIA

  • Adverse outcomes such as suicide and depression appear to have risen sharply over the same period that the use of smartphones and social media has expanded
  • Social media may create ideological “echo chambers” among like-minded friend groups, thereby increasing political polarization
  • Deactivating Facebook freed up 60 minutes per day for the average person in our Treatment group
  • Facebook deactivation significantly reduced news knowledge and attention to politics. 

More Reads

POVERTY, DEPRESSION, AND ANXIETY: CAUSAL EVIDENCE AND MECHANISMS

Social media can be an addiction, and it can make you feel lonely. We are not monitored for mental health. Talk more, Walk more. Quit Social Media!!!!



Data Curation paper - Reads

Paper #1 - A Survey on Data Cleaning Methods for Improved Machine Learning Model Performance

  • Two aspects of data cleaning: what to clean and how to clean

Key Notes

  • SampleClean: Simulated Clean Data Instances - SampleClean suggests a solution to sampling the raw data that can better present clean data instances.
  • Approximate Query Processing (AQP). The AQP consists of two steps: first, in Direct Estimate (DE), a set of k rows is sampled randomly and cleaned, and the training result is returned independently of the dirty data. The correction step is used to reweight the sample based on the contribution of the cleaned data
  • ActiveClean: Incremental Data Cleaning in Convex Models. ActiveClean gradually cleans a dirty dataset to learn a convex-loss model, such as Logistic Regression and Support Vector Machine (SVM).
  • HoloClean: Holistic Data Repairs With Probabilistic Inference
  • AlphaClean: Generate-Then-Search Parallel Data Cleaning
  • CPClean: Reusable Computation in Data Cleaning

ML Papers - Learning-with-Label-Noise

Paper #2 - Advancing Data Curation With Metadata and Statistical Relational Learning

Key Notes

  • We refer to data science as an umbrella term gathering algorithms and techniques from several disciplines, such as statistics, software engineering, and machine learning
  • Data is inconsistent, duplicated, stale, incomplete, and/or inaccurate. Data errors, such as outliers, duplicates, missing values, and inconsistencies.
  • Mapping Metadata to Data Quality Issues
  • Error Detection
  • Joint Error Detection and Repair Suggestion


Data Quality fundamentals

  • The Consistency dimension refers to the validity and integrity of values and tuples with respect to defined inter- and intra-relational constraints that exist within either single or multiple relations
  • The accuracy dimension identifies correct and true values of the entities presented by data.
  • Completeness is a degree to which values are included in a data collection
  • Timeliness dimension reflects the change and update of data by identifying the most current value of an entity in a database
  • Core data quality dimensions: violations of Accuracy, Consistency, Uniqueness, Completeness and Timeliness lead to data quality issues

  • Metadata is "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource"


Single-Column Profiling Tasks

  • Cardinalities refers to the counts of values
  • Number of rows: the number of entities which are available in the table;
  • Distinctness: the number of distinct values of the single attribute;
  • Uniqueness: the ratio of the number of distinct values to the number of rows

Value Distribution refers to the distribution of values on the column. This category includes:

  • Constancy: the ratio between the most frequent value count and the number of rows;
  • Extreme values: minimum and maximum values in numeric columns; shortest and longest strings in categorical, alphanumeric or text columns;
  • Histogram: values distribution summary on an attribute
  • Quartiles: three points that divide numeric distribution into four equal groups;
  • Inverse distribution: an inverse frequency distribution (a distribution of the frequency distribution);
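
The cardinality and value-distribution measures listed above are simple to compute for one column. A minimal sketch follows, assuming the column is a non-empty Python list with no missing values; the function name and dictionary keys are my own choices.

```python
from collections import Counter
import statistics


def profile_column(values):
    """Single-column profile: cardinalities plus value distribution."""
    counts = Counter(values)
    rows = len(values)
    distinct = len(counts)
    numeric = all(isinstance(v, (int, float)) for v in values)
    profile = {
        "rows": rows,
        "distinct": distinct,
        "uniqueness": distinct / rows,                       # distinct / rows
        "constancy": counts.most_common(1)[0][1] / rows,     # top count / rows
        "histogram": dict(counts),
        # Distribution of the frequency distribution.
        "inverse_distribution": dict(Counter(counts.values())),
    }
    if numeric:
        profile["extremes"] = (min(values), max(values))
        profile["quartiles"] = statistics.quantiles(values, n=4)
    else:
        # For text columns the extremes are the shortest and longest strings.
        profile["extremes"] = (min(values, key=len), max(values, key=len))
    return profile


print(profile_column([1, 2, 2, 3, 3, 3, 4, 4]))
print(profile_column(["ML", "Cloud", "Server", "ML"]))
```

A uniqueness of 1.0 suggests a candidate key, while a constancy close to 1.0 flags a column that carries almost no information.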

Patterns

  • Patterns refers to the syntactic properties on the values of the individual column.
  • Lengths, which specifies the descriptive statistics of the column value lengths
  • Decimals, which determines the number of decimals in numeric columns

Multi-Column Profiling Tasks

  • Functional dependencies
  • What. The first dimension captures common data quality issues and typical data cleaning tasks, which had been found in the literature.
  • How. The second dimension reflects differently focused data cleaning approaches.
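
For the functional dependencies item above, here is a minimal sketch of checking whether one set of columns determines another column. The rows and column names are hypothetical, and the check is a plain scan rather than a dependency discovery algorithm.

```python
def holds(rows, lhs, rhs):
    """True if the columns in `lhs` functionally determine column `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # The first value seen for a key must match every later value.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical table: zip should determine city, but one row is dirty.
clean = [
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "10001", "city": "New York"},
]
dirty = clean + [{"zip": "60601", "city": "Chicgo"}]

print(holds(clean, ["zip"], "city"))
print(holds(dirty, ["zip"], "city"))
```

A dependency that holds on most rows but fails on a few is a useful signal for error detection, since the violating rows are candidates for repair.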

Rule-Based Approaches

  • Data cleaning rules or integrity constraints to detect and repair various error types in the dataset.

Statistical Approaches

  • DEC (Detect-Explore-Clean) framework [22] uses statistical and other analytical techniques, such as the Fleiss’ kappa measure, to compute the glitch score, which identifies and scores the data glitches

Probabilistic and Machine Learning-Based Approaches

  • The BoostClean system [141] addresses the domain value violations while cleaning training data for predictive models
  • The HoloClean system [202] considers error detection as a black-box component and expects the specification of integrity constraints-aligned data quality rules to make probabilistic suggestions on how to repair erroneous data values.
  • Interactive Data Cleaning
  • Numerous data cleaning systems use crowdsourcing for duplicate detection and resolution






Supervised Error Detection with Metadata


1) an Error Detection Suite, which includes pluggable error detection systems that function as black boxes to our system.

2) a Metadata Profiler Suite, which extracts various metadata categories, and 

3) an Aggregation Suite, which combines the output of the error detection suite and the profiler.

Keep Exploring!!!

Probabilistic Forecasting Reads

Paper - Master's Thesis : Comparison of probabilistic forecasting deep learning models in the context of renewable energy production

  • DeepAR
  • Wavenet
  • Transformer
  • Temporal Fusion Transformer
  • Prophet

Awesome Reads

Timeseries ML

Code - Link

  • Naive forecasting models (Naive, Seasonal Naive, Moving Average, etc)
  • MXNet [10], developed by Amazon Web Services
  • GluonTS has been developed by an Amazon Web Services team to fill the gap in time series modeling toolkits
  • MQCNN, MQRNN, NBEATS and Wavenet do not output samples of a distribution function, but quantiles of the distribution itself
  • NPTS is the implementation of the “Non-Parametric Time Series Forecaster” model
  • MQCNN is the implementation of one variant of the model described in paper ”A Multi-Horizon Quantile Recurrent Forecaster”
  • The model Transformer is the implementation of “Transformer” model architecture, as it was defined in paper [22]. It is described in this paper as ”The first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention”
  • The model Wavenet is the implementation of ”Wavenet” model architecture, as it was defined in paper [23], with a quantized target. This model network is composed of dilated causal convolutional layers. Both residual and parameterised skip connections are used throughout the network, to speed up convergence and enable training of much deeper models
  • DeepAR - learns a global model from the historical data of all time series. It applies an LSTM-based recurrent neural network architecture to the probabilistic forecasting problem
  • Binomial distribution - Two possible outcomes (the prefix “bi” means two, or twice)
  • Assumptions - Each trial is independent. The probability of success (tails, heads, fail or pass) is exactly the same for each trial
  • Poisson distribution - Gives us the probability of a given number of events happening in a fixed interval of time
  • Continuous distribution - data can take on any value within a specified range
  • Discrete distribution is one in which the data can only take on certain values, for example integers
  • RNN architecture for probabilistic forecasting, incorporating a negative Binomial likelihood
  • Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution.
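
The binomial and Poisson definitions above can be written out directly from their standard probability mass functions. This is a small stdlib sketch; the function names are my own, and the quantile helper is included only to illustrate what "outputting quantiles of the distribution" means for a discrete distribution.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success prob p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(exactly k events in a fixed interval with average rate lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_quantile(q, lam):
    """Smallest k whose cumulative probability reaches q (for 0 < q < 1)."""
    k, cdf = 0, poisson_pmf(0, lam)
    while cdf < q:
        k += 1
        cdf += poisson_pmf(k, lam)
    return k


# Two fair coin flips: probability of exactly one head.
print(binomial_pmf(1, 2, 0.5))
# Average of 2 events per interval: probability of seeing none.
print(poisson_pmf(0, 2.0))
# Median number of events for the same rate.
print(poisson_quantile(0.5, 2.0))
```

Both distributions are discrete, so they suit count data; a sample-based forecaster would draw many values from such a distribution, while a quantile forecaster reports values like the median directly.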