Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): January 2022

January 31, 2022

January 30, 2022

With every new technology, I sometimes feel I need to understand it out of FOMO (fear of missing out). Metaverse: "meta" means beyond and "verse" stands for universe, so the word means beyond the universe. The different components of the metaverse existed earlier: we already had AR, VR, AI, robotics and computer vision, and their adoption has been successful or is ongoing. A system that encompasses all of these and lays out a virtual platform is a metaverse.

Today we collaborate with Mural boards, Teams and Whiteboard. Oculus, HoloLens and Google Glass all tried to project some form of information into the real world. Something similar that also provides engagement, experience and emotions is what the metaverse is all about.

Microsoft, Facebook and Google have all been trying to get a breakthrough here. We know what happened to Google Glass and what happened to Nokia. Some projects end up as kitchen-sink experiments that never make it to production. Although these components have benefits, the first-mover cost, experimentation and achieving the technical breakthrough may involve much higher costs.

Essentially, when it becomes affordable, it may end up as another commodity device with a Play Store kind of ecosystem where apps get published.

5G may provide the required network speed, but AR, VR, robotics, vision and blockchain may still need a few iterations before they support use cases beyond what each already offers individually.

Keep Thinking!!!

January 28, 2022

Interesting JD - DataScience Roles - Evaluation Software Engineer

I have come across several #datascience #jobs; this one, Evaluation Software Engineer, is very interesting.

The JD states the following:

Organize neural network challenge scenarios and route them to the appropriate evaluation suites
Collaborate with engineers and program managers to identify which neural network challenge cases are the highest priority to improve
Investigate if the challenge persists in newer versions of models

My understanding

For a vision model, when a use case fails (pedestrian not detected, vehicle not detected), they triage, prioritize and address it:

Why does the use case fail, and how can the dataset be enriched?
Are the key regions activated when we interpret feature activations across layers?
Prioritize / add data / customize the network if needed / train / validate that it is fixed

This reflects how thoroughly every scenario is validated and prioritized to ensure the models reflect real-world scenarios. Most of the time we see ML and DL jobs, but not with this level of detail and clarity.

The JD link

This is the difference between prototype, production and updates, and it shows how forward-looking they are in handling all scenarios :). Behind every #autopilot model there would be tons of #scenarios, with multiple Evaluation Software Engineers and automated suites validating them.

I have never seen a similar type of JD anywhere except Tesla :)

Keep Exploring!!

January 27, 2022

Darts-based forecasting

A few experiments: Transformer-based / exponential smoothing-based

Time Series Made Easy in Python

More reads

Key concepts

Past covariates are time series whose past values are known at prediction time.
Future covariates are time series whose future values are known in advance. They can, for instance, represent known future holidays or weather forecasts.
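To make the past/future covariate distinction concrete, here is a minimal, library-free sketch of how one training row for a one-step-ahead forecast could be assembled. It does not use the Darts API; the function name `make_row`, the lag count and the sample series are hypothetical.

```python
from typing import List, Sequence


def make_row(target: Sequence[float],
             past_cov: Sequence[float],
             future_cov: Sequence[float],
             t: int,
             lags: int = 3) -> List[float]:
    """Build the feature vector used to predict target[t].

    - target and past covariates contribute only values strictly before t,
      because past covariates are only known up to the prediction time.
    - future covariates (e.g. a holiday flag or weather forecast) are known
      in advance, so the value at time t itself can be used.
    """
    if t < lags:
        raise ValueError("not enough history for the requested number of lags")
    past_target = list(target[t - lags:t])
    past_covariates = list(past_cov[t - lags:t])
    known_future = [future_cov[t]]
    return past_target + past_covariates + known_future


# Hypothetical daily series: sales, store footfall (past covariate),
# and a holiday flag known in advance (future covariate).
sales = [10.0, 12.0, 11.0, 15.0, 14.0]
footfall = [100.0, 120.0, 110.0, 150.0, 140.0]
holiday = [0, 0, 1, 0, 1]

print(make_row(sales, footfall, holiday, t=4))  # [12.0, 11.0, 15.0, 120.0, 110.0, 150.0, 1]
```

The key design point is the slicing: footfall for day t is only observed after day t, so it is cut off at t - 1, while the holiday flag for day t is already on the calendar.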

Multi-step Time Series Forecasting with Machine Learning
Multivariate-Time-series-Analysis-using-LSTM-ARIMA
Multivariate-Time-Series-Forecasting-with-LSTMs-in-Keras-for-CORN-SWEET-Terminal-Market-Price

DeepETA

Keep Exploring!!!

surgeahead - Airavana, BambooBox, Blend, #hellomida - Perspectives with ML Lens

What potential ML use cases could power these products? Let's discuss and explore.

Keep Exploring!!

January 26, 2022

Career, Growth, Trust

Why do we switch careers?

Better Salary
Better Job profiles
Relocation
Brand

What drives you?

Unsure of growth in the current place
Not getting recognition

What does "unsure of growth" mean?

The trust deficit between you and your manager
I may not grow here or I don't sense my growth here

How might you grow?

When you get the right projects
When your work is appreciated
It has both internal and external factors
Your skills, projects, Contributions

Does it end with a Job Change?

In a new job, the new team may be the same story
Match of skills
Culture: being able to question and ask with clarity
Contribute, question, collaborate

In the long term, you are your own competition. Titles, jobs and salary are points in time, but what you learn and build, and the uniqueness and strength you develop, are the key.

Good read - link

Transparently sharing it without request
Make it clear that it's just opinions and not decisions

Keep Thinking!!!

Hiring perspectives

I have two candidates

Candidate #1 - Cleared Data Science Test, Good Deep Learning, Has not worked in Pharma domain but good at ML
Candidate #2 - Just passed ML Test, No Deep Learning Expertise but worked in Pharma, Has domain and insights, Data knowledge

Both performed well in the other behavioral rounds. If we end up hiring several candidates like #1, we effectively end up with a #duplicate skill set. I would prefer a mix of both in a team: skills need to complement each other and add different perspectives to the problem. We need different #proportions of #skills and assessments to get a good mix of talent that can look at the #same #problem from multiple #perspectives.

Keep Thinking!!

January 25, 2022

Questions we need to ask Facebook

What affirmative action is being taken to reduce extensive user amplification/engagement?
What plans do you have to engage kids towards creativity/thinking, i.e. a platform for self-development vs content engagement?
With 50% of the world's population still offline, how has FB made a positive impact, vs migrating/working on the #metaverse?

Keep Thinking!!

Setting up CockroachDB

Sign up for CockroachDB
Install DBeaver
From the CockroachDB signup, get the user, host, database and port details
Set a password for the user; it must be at least 12 characters long
Took the existing table-creation SQL scripts and ported them to PostgreSQL with an online converter - link
The data generation script does not work as is; it needs tweaking for PostgreSQL.
Tables created; data needs to be populated, then run some load tests :)
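Because CockroachDB speaks the PostgreSQL wire protocol, the user, host, database and port details from the signup page can be combined into a standard connection URI. This is a sketch: port 26257 (CockroachDB's default SQL port) and `sslmode=verify-full` are assumed defaults, and the user, host and password below are placeholders.

```python
from urllib.parse import quote

MIN_PASSWORD_LENGTH = 12  # minimum password length noted in the setup steps


def build_dsn(user: str, password: str, host: str, database: str,
              port: int = 26257, sslmode: str = "verify-full") -> str:
    """Assemble a PostgreSQL-style connection URI for a CockroachDB cluster."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError(f"password must be at least {MIN_PASSWORD_LENGTH} characters")
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot break the URI structure.
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}?sslmode={sslmode}")


# Placeholder values; copy the real ones from the CockroachDB signup page.
dsn = build_dsn("demo_user", "s3cret-p@ssword", "example-host.cockroachlabs.cloud", "defaultdb")
print(dsn)
```

The resulting string can be handed to a PostgreSQL driver, for example `psycopg2.connect(dsn)`, before running the ported table-creation scripts.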

Keep Exploring!!!

January 23, 2022

30K, 10K feet to Building Algo perspectives

Get the 30K Feet Big Picture of Algo
Know the functions/methods it implements
Map it to applicable domains / use cases
Apply first principles to learn the basics
Find gaps between fundamental assumptions and algos, and intermediate gaps in learning
Start filling the gaps iteratively
Build your own version of algo with first principles + domain + data + algo implementation

#KnowledgeBuilding #MLBytes

Keep Exploring!!!

January 22, 2022

A / B Testing Revisions

Session #1

Notes

#A/BTesting = #Randomizedcontrolledtrials of two versions of the same application

Running experiments
Used in testing search and pricing algorithms
A/B testing originated about 100 years ago
Testing two fertilizers
Commit to a sample size (number of pots to use) in advance
More pots mean a larger sample size and a more honest assessment
Analyze with a hypothesis test
The difference between two options

Analogy with binary classification: false positives
A false positive: what might have happened if there were no real difference

Current state
Adjust test lengths in real time
Adjust duration of test

How long the test runs
Variations of the page
Views / clicks, for example
Comparison against the baseline
A/A testing: both variations are identical
5% false-positive probability
Number of experiments, sample size, 50% of the actual size of the customer base
Optimal procedure: monitoring and stopping early

Statistical significance of confidence level

Staging trials
Wait to see p-value < 0.05
Secondary diagnostics
Sample size calculation
How many samples to wait for
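The sample size calculation can be sketched with the standard normal-approximation formula for comparing two conversion rates, using a 5% false-positive rate and 80% power. This is one common textbook formula, not necessarily the one used in the session; the 10% baseline and 12% target rates are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline: float, p_variant: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH arm to detect p_baseline -> p_variant
    with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the false-positive rate
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_baseline + p_variant) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_baseline * (1 - p_baseline)
                                      + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_baseline - p_variant) ** 2)


# Baseline conversion of 10%, hoping to detect a lift to 12%.
print(sample_size_per_group(0.10, 0.12))
```

This comes out at roughly 3,840 users per variant. Halving the detectable lift roughly quadruples the requirement, which is why committing to the sample size before starting matters.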

Session #2

Conversions / clicks as the measure
Prior belief + update the prior belief with observed data to get a posterior probability distribution
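The prior-to-posterior update above can be sketched with the conjugate Beta-Binomial model for conversion rates. The uniform Beta(1, 1) prior, the visitor counts and the Monte Carlo comparison are illustrative assumptions, not details from the session.

```python
import random


def posterior(prior_a: float, prior_b: float, conversions: int, visitors: int):
    """Beta(prior_a, prior_b) prior + binomial data -> Beta posterior parameters."""
    return prior_a + conversions, prior_b + (visitors - conversions)


def prob_b_beats_a(post_a, post_b, draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under the two posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(*post_a)
        rate_b = rng.betavariate(*post_b)
        wins += rate_b > rate_a
    return wins / draws


# Uniform Beta(1, 1) prior; illustrative counts.
post_a = posterior(1, 1, conversions=100, visitors=1000)
post_b = posterior(1, 1, conversions=130, visitors=1000)
print(post_a, post_b)  # (101, 901) (131, 871)
print(prob_b_beats_a(post_a, post_b))
```

The output is a direct probability statement ("B is better than A with probability about 0.98"), which is often easier to communicate than a p-value.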

Sample sizes

Some marketing experts even recommend sample sizes of up to 5,000 people.
One way is to run A/B tests separately for specific devices and browsers.
95% confidence level (significance level α = 0.05)

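The p-value is the probability of observing a difference at least as large as the one measured, assuming the null hypothesis (no difference between A and B) is true. It is not the probability that the null hypothesis is true.

P ≤ 0.05 means the result is statistically significant at the 5% level, so we reject the null hypothesis; P > 0.05 means we fail to reject it, which does not prove the two versions are equal.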
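A minimal sketch of the hypothesis test behind that decision: a two-sided, pooled two-proportion z-test on conversion counts. The counts are illustrative, and the normal approximation assumes reasonably large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # common rate under H0: no difference
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
print("reject H0 at alpha = 0.05" if p <= 0.05 else "fail to reject H0")
```

Here z is about 2.10 and p about 0.036, so at the 5% level the difference is treated as real. Running the same check repeatedly while the test is live inflates the false-positive rate, which is why early stopping needs its own procedure.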

A/B test sample size calculator

How to Calculate A/B Testing Sample Sizes?

On the Complexity of A/B Testing

An architecture for enabling A/B experiments in automotive embedded software

Offline A/B testing for Recommender Systems

Keep Exploring!!!!

AI Solutions in SaaS model

AI solutions in the SaaS model: #Data as a #service, #Insights as a #service, #Forecasting and #Recommendations as a #service #quantumics.ai #abacus.ai #graphext.com #deepstack.cc

Keep Exploring!!!

January 20, 2022

Research Reads - Markdown pricing

Paper #1 - Markdowns in E-Commerce Fresh Retail: A Counterfactual Prediction and Multi-Period Optimization Approach

Key Notes

Due to the limited shelf life of perishable products and the limited opportunity for price changes, it is difficult to predict the sales of a product at a counterfactual price, and therefore hard to determine the optimal discount price to control inventory and maximize future revenue.
Formulates a sequential pricing strategy as a Markov decision process and designs a two-stage algorithm to solve it
Many perishable products, such as vegetables, meat, milk, eggs and bread, have a limited shelf life; promotional markdown is a common approach in e-commerce fresh retail
The normal channel, where goods are sold at the no-discount retail price
Markdown channel, where customers can buy goods at a discount on condition that their total purchase has reached a certain amount

Key Questions

First, can goods be sold out with the retail price before its expiry date?
Second, if not, what is the optimal discount price for the promotional markdown that ensures the goods are sold out while maximizing the profit?
The first problem is about sales forecasting
The second problem is about price-demand curve fitting
We observe the sales of a product at prices A and B and aim to predict its sales at price C, which is counterfactual
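A simple way to picture the counterfactual question is a constant-elasticity demand curve, q = a · p^e, fitted by least squares on log price and log sales, then evaluated at the unseen price C. This is a deliberately simpler swapped-in model, not the paper's counterfactual prediction method, and the observed numbers are made up.

```python
import math
from typing import Sequence, Tuple


def fit_constant_elasticity(prices: Sequence[float],
                            quantities: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(q) = log(a) + e * log(p); returns (a, e)."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct observed prices")
    e = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx  # price elasticity
    a = math.exp(y_mean - e * x_mean)
    return a, e


def predict_demand(a: float, e: float, price: float) -> float:
    return a * price ** e


# Observed sales at prices A = 10 and B = 5; query the unseen price C = 8.
a, e = fit_constant_elasticity([10.0, 5.0], [10.0, 40.0])
print(round(e, 3), round(predict_demand(a, e, 8.0), 3))  # -2.0 15.625
```

With only two observed prices any two-parameter curve fits exactly, which is why the paper pools products by category and updates elasticities as new transactions arrive.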

To avoid price discrimination, the discounts of the same product in different stores within the same region should be all equal
To optimize the discount price, we need to take all stores in a region into consideration
We collect a set of observable covariate features x_i ∈ ℝ^d, including categories, holidays, event information, inherent properties and historical sales of products and shops
The key of pricing decision making is to accurately predict the demand of products at different discount prices
We aggregate data of all products by using the category information and learn the causal effect of each product jointly
The price elasticity is updated daily once new transaction data is collected

Paper #2 - Markdown Pricing Under Unknown Demand

Unimodal Multi-Armed Bandit problem where the goal is to find the optimal price under an unknown unimodal reward function
“optimal” solutions exist under numerous variations on (a) the set of demand functions allowed, on (b) how inventory is treated, and on (c) the frequency at which prices are allowed to change, just to name a few.
A Markdown Policy and Performance Guarantee: We introduce a policy which satisfies the markdown constraint
Optimality via a Minimax Lower Bound: We prove that our policy is in fact order-optimal by showing a matching minimax lower bound

Paper #3 - Markdown Pricing Under Unknown Parametric Demand Models

Markdown Policies with Theoretical Guarantees
Tight Minimax Lower Bound
Impact of Smoothness
In the Discrete Multi-armed Bandit problem, the player is offered a finite set of arms, with each arm providing a random revenue from an unknown probability distribution specific to that arm. The objective of the player is to maximize the total revenue earned by pulling a sequence of arms
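The discrete bandit setting can be sketched with the classic UCB1 policy, treating each candidate price as an arm and revenue per customer as the reward. This is plain UCB1, not the markdown policies from the papers: it does not enforce the markdown constraint that prices may only decrease. The prices and purchase probabilities are a hypothetical simulator hidden from the policy.

```python
import math
import random
from typing import List, Sequence


def ucb1_prices(prices: Sequence[float], buy_prob: Sequence[float],
                rounds: int, seed: int = 0) -> List[int]:
    """UCB1 over a finite set of candidate prices (arms).

    Each round one customer is offered the chosen price; the reward is the
    revenue (price if they buy, else 0), scaled to [0, 1] by the max price.
    buy_prob is the simulator's hidden demand model, unknown to the policy.
    Returns how many times each price was offered.
    """
    rng = random.Random(seed)
    k = len(prices)
    max_price = max(prices)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # offer every price once to initialise the estimates
        else:
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        sold = rng.random() < buy_prob[arm]
        reward = prices[arm] / max_price if sold else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts


prices = [10.0, 8.0, 6.0, 4.0]
buy_prob = [0.1, 0.3, 0.9, 0.95]  # expected revenue: 1.0, 2.4, 5.4, 3.8
counts = ucb1_prices(prices, buy_prob, rounds=20000)
print(counts)
```

The confidence bonus shrinks as a price is tried more often, so the policy keeps exploring early and then concentrates on the revenue-maximizing price (6.0 in this simulation).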

Keep Exploring!!!

January 17, 2022

Validation Loss vs Accuracy

I see the two a bit differently; they represent different aspects.

Loss

A loss function is used to optimize a machine learning algorithm.
Validation loss measures how much our predictions differ from what they should be before we put them through the threshold.

Accuracy

An accuracy metric is used to measure the algorithm’s performance (accuracy) in an interpretable way.
Empirically, accuracy is quite a limited measure of prediction quality. To predict whether an example belongs to some class, our model outputs a number between 0 and 1 (whatever we put through sigmoid or softmax), and accuracy only checks which side of the threshold that number falls on.
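A small illustration of why the two can disagree: two hypothetical binary classifiers get every example right at a 0.5 threshold (identical accuracy), but the hesitant one has a much higher log loss because its probabilities sit close to the threshold. The predictions below are made up for the example.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], p_pred: Sequence[float], threshold: float = 0.5) -> float:
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y = [1, 1, 0, 0]
confident = [0.95, 0.90, 0.05, 0.10]
hesitant = [0.55, 0.60, 0.45, 0.40]
print(accuracy(y, confident), accuracy(y, hesitant))  # 1.0 1.0
print(log_loss(y, confident) < log_loss(y, hesitant))  # True
```

This is why validation loss can keep rising (the model becomes less certain or overconfident on wrong examples) while accuracy stays flat.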

Ref - Link

Keep Exploring!!!

Vision Lessons

Some use cases show how we can simplify the implementation through the setup/environment


Plate as a base with a black background
The black background reduces external noise
Items are spread uniformly
Easy to identify/report
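To show why this setup simplifies the implementation, here is a sketch assuming the task is counting items on the plate: with a black background a single global intensity threshold separates items from the background, and because items are spread apart each connected bright region is one item. The threshold value and the toy frame are assumptions, and a real pipeline would use an image library rather than nested lists.

```python
from typing import List


def count_objects(image: List[List[int]], threshold: int = 50) -> int:
    """Count bright, 4-connected regions on a dark background."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and not seen[r][c]:
                count += 1  # a new, not yet visited bright region
                stack = [(r, c)]
                seen[r][c] = True
                while stack:  # iterative flood fill over the object
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# Toy 5x6 grayscale frame: two items spread apart on a black background.
frame = [
    [0,   0,   0, 0,   0,   0],
    [0, 200, 210, 0,   0,   0],
    [0, 190,   0, 0, 180,   0],
    [0,   0,   0, 0, 220, 230],
    [0,   0,   0, 0,   0,   0],
]
print(count_objects(frame))  # 2
```

If items touched or the background were cluttered, this simple threshold-and-count approach would break, which is exactly the complexity the physical setup removes.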

Keep Exploring!!!

January 16, 2022

#keras #experiments #ParallelNetworks #Merge

Experiments to build a hybrid approach from multiple models, leveraging different convolutions and activation functions.

For custom vision training tasks, get features from both VGGNet and ResNet

ResNet input - 224 x 224 x 3
VGG16 input - 224 x 224 x 3

Feature Vectors

VGG16 feature shape: (1, 7, 7, 512)
VGG19 feature shape: (1, 7, 7, 512)
InceptionV3 feature shape: (1, 5, 5, 2048)
ResNet50 feature shape: (1, 1, 1, 2048)

Inputs

Sobel, Laplace Transformations
Sharpened X / Y axis edges
Multiple inputs
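The Sobel and Laplace inputs can be produced with small 3x3 kernels before being stacked as extra input channels. Below is a dependency-free sketch of a "valid" 3x3 filter (no padding) on a toy image with one vertical edge. In practice OpenCV's `cv2.Sobel` / `cv2.Laplacian` would do this; the pure-Python version just makes the arithmetic visible.

```python
from typing import List

Kernel = List[List[int]]

SOBEL_X: Kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to vertical edges (x gradient)
SOBEL_Y: Kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to horizontal edges (y gradient)
LAPLACE: Kernel = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]    # second derivative along both axes


def filter2d(image: List[List[float]], kernel: Kernel) -> List[List[float]]:
    """'Valid' 3x3 cross-correlation (no padding), as CNN layers compute it."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            row.append(sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
                           for i in range(3) for j in range(3)))
        out.append(row)
    return out


# 4x4 image with a vertical edge between columns 1 and 2.
img = [[0, 0, 9, 9] for _ in range(4)]
print(filter2d(img, SOBEL_X))  # [[36, 36], [36, 36]]
print(filter2d(img, SOBEL_Y))  # [[0, 0], [0, 0]]
print(filter2d(img, LAPLACE))  # [[9, -9], [9, -9]]
```

The X kernel fires on the vertical edge while the Y kernel stays silent, which is why feeding both gives the network explicit, axis-specific edge cues.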

Further Techniques

Apply different convolution filters
Apply different activation functions
Append different weights and analyze

Keep Exploring!!!

January 15, 2022

Food Research Papers - Food Science = Data + Data Science = Spend more $$ - Optimize Supply chain

Paper #1 - AI-enabled Efficient and Safe Food Supply Chain

Key Notes

Predicting plant growth and tomato yield in greenhouses
Optimizing energy consumption across large networks of food retail refrigeration systems
Optical recognition and verification of food consumption expiry date in automatic inspection of retail packaged food
Long Short-Term Memories (LSTM) are a variation of the Recurrent Neural Network (RNN) architecture
Networks composed of LSTM units have been able to mitigate the vanishing gradient problem encountered in long-term time series analysis
To achieve this, the LSTM structure contains three modules: the forget gate, the input gate and the output gate
LSTM-based encoder-decoder models
Attention mechanisms help to focus on feature segments of high significance
Output Predictions can be derived using the conditional probability distribution of the input signal and of the previous samples of the output.
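The three gates listed above can be sketched as a single step of a scalar LSTM cell. This is a minimal illustration with made-up weights, not the architecture used in the paper; real layers use weight matrices and vector states.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              w: Dict[str, Tuple[float, float, float]]) -> Tuple[float, float]:
    """One step of a scalar LSTM cell.

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(gate: str) -> float:
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))      # how much of the old cell state to keep
    i = sigmoid(pre("input"))       # how much of the new candidate to write
    o = sigmoid(pre("output"))      # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value
    c = f * c_prev + i * g          # additive update eases gradient flow over long spans
    h = o * math.tanh(c)
    return h, c


# Hypothetical weights: forget gate almost fully open, input gate almost closed.
weights = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
           "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=2.0, w=weights)
print(round(c, 3))  # cell state is carried over almost unchanged
```

With the forget gate open and the input gate closed, the cell state passes through nearly untouched, which is the mechanism that lets LSTMs remember over long horizons.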

Yield Prediction

Tomato crop growing in greenhouse environments is a dynamic and complex system
A linear relationship between flowering rate and fruit growth
Weekly yield fluctuations in terms of fruit size and harvest rate.
The environmental data were collected on an hourly basis, while the yield on a weekly basis.

Food Retailing Refrigeration Systems

Nemesyst system [60] has been capable of predicting which refrigerators to select and how long to turn them off, whilst maintaining food quality and safety
In the experimental study, the target was to predict the time (in seconds) from the moment the refrigerator is switched off until its temperature rises enough to breach a food safety threshold

Quality Control in Retail Food Packaging

Incorrectly labelled product information on food packages, such as the expiry date, can cause food safety incidents, like food poisoning.
The Food Packaging Image dataset used next consists of more than 30,000 images classified in two categories (existing valid date and non-existing or non-valid date)

Paper #2 - Food Supply Chain and Business Model Innovation

Food supply chain (FSC) consists of a chain of activities elaborating how a product is produced and delivered to the final consumers
Farmers, processors, distributors, and retailers

Four main aspects of a business:

value proposition, which refers to the products and services the business is providing
value delivering, which implies the mechanisms the business is connected with its final customers to deliver the products and services to them
value creation, points out the main activities which are necessary to create and deliver the values to the customer
value capturing, which indicates the ways a business makes money through the value creation and delivering processes

Five strategies to innovate their business model:

1) innovating the value proposition,
2) reconsidering the value delivering mechanisms,
3) innovating the value creation processes,
4) providing new value capturing models, and
5) proposing a quite new business model.

Value Delivering - One of the most important issues in the FSC is food distribution, where cold chain management plays a vital role. Distributors in the food industry face a dilemma: frozen storage carries the risk of high energy consumption, while cool storage carries the threat of bacterial decay

The flight kitchen business model is quite similar to the convenience store (CVS) indirect delivery business model; the only differences are the lower supply volume and fewer supply spots

Paper #3 - Demand forecasting in supply chain: The impact of demand volatility in the presence of promotion

We decompose demand into baseline and promotional demand and propose a hybrid model to forecast demand.
CoV is used to measure the volatility of demand and to propose appropriate forecasting models
Pearson’s correlation between demand uplift only due to promotions and price
Coefficient of variations (CoV) where promotion causes volatility over the entire demand series
CoV by definition is the sample standard deviation divided by the sample mean
Low volatility demand where CoV is smaller than 0.5
Moderate volatility where CoV is greater than 0.5 and smaller than one
High volatility where CoV is greater than one.
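The CoV definition and the three volatility bands can be sketched directly with the standard library. How values exactly equal to 0.5 or 1 are classified is an assumption here, since the bands above leave the boundaries open; the demand series are illustrative.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(demand: Sequence[float]) -> float:
    """Sample standard deviation divided by the sample mean (mean must be non-zero)."""
    return stdev(demand) / mean(demand)


def volatility_class(demand: Sequence[float]) -> str:
    cov = coefficient_of_variation(demand)
    if cov < 0.5:
        return "low"
    if cov <= 1.0:  # boundary handling at 0.5 and 1.0 is an assumption
        return "moderate"
    return "high"


print(volatility_class([10, 11, 9, 10]))  # low      (CoV ~ 0.08)
print(volatility_class([1, 2, 3, 4, 5]))  # moderate (CoV ~ 0.53)
print(volatility_class([0, 0, 10]))       # high     (CoV ~ 1.73)
```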

Paper #4 - Mathematical modeling on tomato plants: A review

Crop variables
Climatic conditions (air temperature, CO2 concentration, humidity and photosynthetically active radiation (PAR))

Paper #5 - Using Deep Learning to Predict Plant Growth and Yield in Greenhouse Environments

The environmental data were collected on an hourly basis, while the yield on a weekly basis. To deal with these data characteristics, we performed data augmentation, through interpolation of weekly data, resulting in daily data measurements
Plant density was approximately 15 pots per m², where every pot contained 3 cuttings
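The weekly-to-daily augmentation can be sketched as linear interpolation between consecutive weekly observations. The notes only say "interpolation", so the linear scheme, and treating each weekly value as a level rather than a weekly total to be split across days, are assumptions.

```python
from typing import List, Sequence


def weekly_to_daily(weekly: Sequence[float]) -> List[float]:
    """Linearly interpolate weekly values (one per 7 days) to daily values.

    Day 0 is the first weekly observation and the last value returned is the
    final weekly observation, so n weeks yield 7 * (n - 1) + 1 daily points.
    """
    daily: List[float] = []
    for start, end in zip(weekly, weekly[1:]):
        for day in range(7):
            daily.append(start + (end - start) * day / 7)
    daily.append(float(weekly[-1]))
    return daily


days = weekly_to_daily([70.0, 84.0])
print(len(days), days[0], days[3], days[-1])  # 8 70.0 76.0 84.0
```

The daily series can then be aligned with the hourly environmental data aggregated to days, giving both inputs a common resolution.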

More Reads

Keep Exploring!!!

January 14, 2022

Socialmedia - Mentalwellness - Pornography - Privacy

10-minute quick summary of the collective impact of

Smartphones
Social media
Tracking
Privacy
Pornography
MentalHealth

Keep Thinking!!!

January 13, 2022

Data Science Skills / Challenges

Ideate - Domain Knowledge, Contextual AI / Data Knowledge
Design - Algorithms, Data, Features
Develop - Cloud, Data, ML
Implement AI - Deploy, End to End Architecture

Observations / Challenges

Desire to satisfy their intellectual curiosity rather than because a project or technique stands
People with business context are those who think about how to scale their algorithms in production
Passionate about building faster, more efficient data pipelines
Experienced DS managers who serve as product managers to guide/develop / envision
The greater the ambiguity of a situation or project, the more important it is to hire people with a commercial and strategic understanding
Hire talent that has both specialized expertise in AI and business acumen

Keep Exploring!!!

January 11, 2022

Flipping roles - Different Learning Situations

Learning for Finding Ideas / Inspirations
Learning to implement / Coding up Ideas / Evaluate
Learning to understand and Review work
Learning to unblock potential issues
Learning to keep up with Tech around my work

Keep Thinking!!!

Hiring thoughts

Hire for generalists - Domain Expertise / Products, Fintech, Agriculture, Automotive, Digital Twins - Telematics
Hire for specialists - MLOps, Kubernetes, Stats
Hire for interests - Outside work, domains/product observations, How you see tech and business future
Code for Solutions and Products - Build a comprehensive solution, and also consider aspirations, team mix and demonstrated past contributions outside work
Map the charter - Inhouse products / SaaS products / Competency building

Keep Exploring!!!

January 10, 2022

Research Papers Reads - RPA - Robotic Process Automation

Paper #1 - Robotic Process Automation - A Systematic Literature Review and Assessment Framework

Key Notes

Robotic Process Automation (RPA) is the automation of rule-based routine processes to increase efficiency and to reduce cost
‘robotic process automation’ OR ‘intelligent process automation’ OR ‘tools process automation’ OR ‘artificial intelligence in business process’ OR ‘machine learning in business process’ OR ‘cognitive process automation’

What is RPA and what are the differences between RPA and related technologies

Software-based solution
Mimics human behaviour
Routine tasks with structured data
RPA is a novel technology starting to emerge in 2015.

Paper #2 - From Robotic Process Automation to Intelligent Process Automation

Business Process Automation (BPA)
RPA is an emerging technology in BPA that creates software robots that perform tasks previously done by humans
All these algorithms rely on humans in the loop.
Learned process activities from text documents using supervised machine learning, namely feature extraction using WordNet

Chatbots - Reducing the need for direct human involvement with the business process is one of the main goals of automation
Popular topics revolve around process mining and automation (particularly from natural language), automated synthesis and composition of processes

Paper #3 - A Conversational Digital Assistant for Intelligent Process Automation

RPAs have leveraged diverse technological advancements in the fields of artificial intelligence and software development
optical character recognition (OCR) and classification models in document flow automation within a debt collector business process
RPAs identified relationships between tasks from user behavior

Conversational digital assistant paradigm
Dialog agents provide more human-like interactions. They are composed of an understand skill and a respond skill to answer user queries.
Information retrieval agents query information sources to achieve their goal. They perform advanced reasoning to respond to user queries or information retrieval tasks in a process.
Task execution agents perform tasks within a business process that change the state of the world by moving the business process forward. Examples include submitting applications, filling in information in forms
Alerting agents allow users to conversationally customize alerts and notifications triggered by the occurrence of specific events

Paper #4 - Automated Discovery of Data Transformations for Robotic Process Automation

Robotic Process Automation (RPA) is a technology for automating repetitive routines consisting of sequences of user interactions with one or more applications.
Two broad types of RPA use cases are generally distinguished: attended and unattended.
Unattended bots are used for back-office functions.
A UI log is a chronologically ordered sequence of actions performed by a user during interactions with various applications
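To make the UI log definition concrete, here is a sketch that orders events chronologically and counts recurring consecutive action sequences as candidate routines for an RPA bot. This simple n-gram count is only an illustration, not the paper's method for discovering data transformations, and the event layout (timestamp, application, action) is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# A UI log entry: (timestamp, application, action). Field layout is hypothetical.
Event = Tuple[str, str, str]


def frequent_routines(log: List[Event], length: int = 3, min_count: int = 2):
    """Count consecutive action sequences of a given length in a UI log.

    Sequences that recur often are candidate routines to automate with RPA.
    """
    ordered = sorted(log, key=lambda e: e[0])  # a UI log is chronologically ordered
    steps = [f"{app}:{action}" for _, app, action in ordered]
    grams = Counter(tuple(steps[i:i + length]) for i in range(len(steps) - length + 1))
    return [(seq, n) for seq, n in grams.most_common() if n >= min_count]


log = [
    ("2022-01-10T09:00:00", "Excel", "copy_cell"),
    ("2022-01-10T09:00:05", "CRM", "paste_field"),
    ("2022-01-10T09:00:09", "CRM", "click_save"),
    ("2022-01-10T09:01:00", "Excel", "copy_cell"),
    ("2022-01-10T09:01:04", "CRM", "paste_field"),
    ("2022-01-10T09:01:08", "CRM", "click_save"),
]
print(frequent_routines(log))
# [(('Excel:copy_cell', 'CRM:paste_field', 'CRM:click_save'), 2)]
```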

More Reads

Keep Exploring!!!

January 09, 2022

Anomaly Detection

Paper #1 - A review on outlier/anomaly detection in time series data

Key Notes

An observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism

(Univariate time series) A univariate time series X = {x_t}, t ∈ T, is an ordered set of real-valued observations, where each observation is recorded at a specific time
(Multivariate time series) A multivariate time series X = {x_t}, t ∈ T, is defined as an ordered set of k-dimensional vectors, each of which is recorded at a specific time

Outliers Type

Point outliers. A point outlier is a datum that behaves unusually in a specific time instant
Subsequence outliers. This term refers to consecutive points in time whose joint behavior is unusual, although each observation individually is not necessarily a point outlier

A multivariate time series is composed of more than one time-dependent variable; a univariate analysis can be performed on each variable to detect univariate point outliers
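The per-variable approach can be sketched with a robust univariate point-outlier detector applied to each variable independently. The modified z-score based on the median absolute deviation (with the commonly used 0.6745 constant and 3.5 cutoff) is one simple choice assumed here, not a method prescribed by the review; the sensor readings are made up.

```python
from statistics import median
from typing import Dict, List, Sequence


def point_outliers(series: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices whose modified z-score (median absolute deviation) exceeds threshold."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return []  # no spread to compare against
    return [i for i, x in enumerate(series)
            if abs(0.6745 * (x - med) / mad) > threshold]


def per_variable_outliers(mts: Dict[str, Sequence[float]]) -> Dict[str, List[int]]:
    """Apply the univariate detector to each variable of a multivariate series."""
    return {name: point_outliers(values) for name, values in mts.items()}


readings = {
    "temperature": [21.0, 21.2, 20.9, 21.1, 35.0, 21.0, 20.8],
    "humidity":    [40.0, 41.0, 39.5, 40.5, 40.2, 39.8, 40.1],
}
print(per_variable_outliers(readings))  # {'temperature': [4], 'humidity': []}
```

This ignores correlations between variables, so a joint anomaly where each variable looks normal on its own would be missed; that is the gap multivariate methods address.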

Paper #2 - A Survey on GANs for Anomaly Detection

Notes

GANs are a framework for the estimation of generative models via an adversarial process in which two models, a discriminator D and a generator G, are trained simultaneously
The generator G aims to capture the data distribution, while the discriminator D estimates the probability that a sample came from the training data rather than from G

Paper #3 - Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder

Notes

Time series anomaly detection can in general be divided into two settings:
i) subsequence or whole-sequence level anomaly, whereby a subsequence x_{m,t1:t2} is labeled as an anomaly;
ii) point level anomaly, for which a measurement x_{m,t} at time t in sequence m is treated as an anomaly.

Note - Overview of GAN Structure

Notes

The generator learns to generate plausible data. The generated instances become negative training examples for the discriminator.
The discriminator learns to distinguish the generator's fake data from real data. The discriminator penalizes the generator for producing implausible results.

More Reads

Keep Exploring!!!

Fraud Detection Research Papers

Paper #1 - Credit Card Fraud Detection in e-Commerce: An Outlier Detection Approach

Notes

No prior knowledge of outliers or inliers is needed
The proposed algorithm is easy to scale as it can easily be implemented in a distributed manner
Proposed algorithm is general in nature and does not require k-means algorithm as the only base clustering algorithm.
If we can estimate a measure of consistent (good) behavior for each data point, we can identify outliers as data points with a low consistency score.
Attempt the problem of outlier detection by estimating a consistency score
In our experiments we found that incrementally increasing k with a fixed step works just as well as the ensemble created by carefully selecting k using a principled approach such as the Silhouette Score
For #Fraud #detection with a limited dataset, algorithms to get started finding potential fraudulent transactions: #IsolationForest, #OneClassSVM, #Clusteringbasedoutlierdetection

Paper #2 - A Comparison Study of Credit Card Fraud Detection: Supervised versus Unsupervised

Notes

6 supervised classification models, i.e., Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGB)
4 unsupervised anomaly detection models, i.e., One-Class SVM (OCSVM), Auto-Encoder (AE), Restricted Boltzmann Machine (RBM), and Generative Adversarial Networks (GAN)
Supervised Learning Methods
Logistic regression allows us to estimate the probability of a categorical response based on one or more predictor variables x.
KNN algorithm essentially boils down to forming a majority vote between the K most similar instances to a given unseen observation
SVM is to derive an optimal hyperplane that maximizes the margin between two classes
Decision trees are simple but intuitive models that use a top-down approach in which the root node creates binary splits until a certain criterion is met
XGB uses gradient descent for optimization to improve the predictive accuracy at each optimization step by following the negative of the gradient, as we are trying to find the sink in an n-dimensional plane

Unsupervised Learning Methods

OneclassSVM - The algorithm learns a soft boundary in order to embrace the normal data instances using the training set, and then, using the testing instance, it tunes itself to identify the abnormalities that fall outside the learned region
RBM model consists of visible and hidden layers, which are connected through symmetric weights. The objective of the generative training in RBM is to learn the unknown (h) iteratively using the input (x).
An auto-encoder (AE) learns to map from input to output through a pair of encoding and decoding phases
GAN: building on AnoGAN, the GAN-based detector simultaneously learns an encoder E that maps input samples x to a latent representation z, along with a generator G and discriminator D during training.

Paper #3 - xFraud: Explainable Fraud Transaction Detection

Key Notes

Fraudster user detection
Fraud transaction detection
Methods that do not need to define meta-paths a priori, instead are able to automatically learn these patterns using a GNN.

xFraud detector. We are inspired by Transformer [39] and HGT [18], when designing the xFraud detector incl. heterogeneous mutual attention and heterogeneous message passing with key, value, and query vector operations (self-attention mechanism).

Paper #4 - TitAnt: Online Real-time Transaction Fraud Detection in Ant Financial

Key Notes

Rule-based methods have been extensively studied over the years [46] for fraud detection problem
several unsupervised learning and anomaly detection methods are introduced
Recurrent neural network to exploit temporal information of account behavior
Anomaly detection methods, such as isolation forest, shed light on fraud detection tasks

Paper #5 - A Comprehensive Survey on Machine Learning Techniques and User Authentication Approaches for Credit Card Fraud Detection

Key Notes

Combination of Hidden Markov Model (HMM) and K-Means algorithms was used in (Kumari and Choubey, 2017) to identify the fraudulent activities on credit cards
A transaction is considered suspicious if its distance to the center of the cluster exceeds a pre-set threshold
Self-Organizing Map (SOM) is an unsupervised neural network learning model, which has been used to form customer profiles and visualize fraudulent patterns
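The cluster-distance rule described above can be sketched as follows: a transaction is flagged when its distance to the nearest cluster centre exceeds a preset threshold. The centroids are assumed to come from K-Means fitted on a customer's past legitimate transactions; the HMM part of the cited approach is not covered, and the feature encoding and threshold are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def suspicious(transactions: List[Point], centroids: List[Point], threshold: float) -> List[int]:
    """Flag transactions whose distance to the nearest cluster centre exceeds threshold."""
    flagged = []
    for idx, tx in enumerate(transactions):
        nearest = min(math.dist(tx, c) for c in centroids)  # Euclidean distance
        if nearest > threshold:
            flagged.append(idx)
    return flagged


# Features (hypothetical): [amount in thousands, hour of day / 24]
centroids = [[0.5, 0.5], [2.0, 0.8]]
incoming = [[0.6, 0.45], [1.9, 0.85], [9.0, 0.1]]
print(suspicious(incoming, centroids, threshold=1.0))  # [2]
```

Features should be on comparable scales before computing distances; otherwise the largest-valued feature (here, the amount) dominates the rule.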

Paper #6 - A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective

Key Notes

A Hidden Markov Model is a double embedded stochastic process which is applied to model much more complicated stochastic processes as compared to a traditional Markov model
Genetic algorithms have been used in data mining tasks mainly for feature selection.
A Bayesian network is a graphical model that represents conditional dependencies among random variables. The underlying graphical model is in the form of directed acyclic graph
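The "double embedded" structure of an HMM (a hidden state process plus an observation process) is easiest to see in the forward algorithm, which computes the likelihood of an observed sequence by summing over all hidden-state paths. The spending-profile states, amount bands and probabilities below are hypothetical; using a low likelihood as a fraud signal is an assumption, not a claim about any specific surveyed system.

```python
from typing import Dict, Sequence


def forward(obs: Sequence[str], states: Sequence[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> float:
    """Likelihood P(obs) under an HMM, summing over all hidden-state paths."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        # alpha[s] = P(observations so far, current hidden state = s)
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states}
    return sum(alpha.values())


# Hypothetical cardholder model: hidden spending profile, observed amount band.
states = ["low_spender", "high_spender"]
start = {"low_spender": 0.7, "high_spender": 0.3}
trans = {"low_spender": {"low_spender": 0.9, "high_spender": 0.1},
         "high_spender": {"low_spender": 0.2, "high_spender": 0.8}}
emit = {"low_spender": {"small": 0.8, "large": 0.2},
        "high_spender": {"small": 0.3, "large": 0.7}}

typical = forward(["small", "small", "small"], states, start, trans, emit)
unusual = forward(["small", "large", "large"], states, start, trans, emit)
print(typical > unusual)  # True
```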

Indian #Fintechstartups #MLOpportunities #Navi #Credavenue #Lendingkart

Indian #Fintechstartups #MLOpportunities #Navi #Credavenue #Lendingkart - ML Opportunities, Use cases, Domain Specific Features

Keep Exploring!!!

Fintech Research papers

Paper #1 - Credit risk prediction in an imbalanced social lending environment

Key Notes

Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan
Borrowers benefit from lower interest rates; lenders receive a higher return than they would from a bank
Class imbalance is a common problem in loan default prediction
The under-sampling approaches include random under-sampling (RUS) and instance hardness threshold (IHT) algorithms
For the over-sampling approach, random over-sampling (ROS), synthetic minority over-sampling technique (SMOTE), and adaptive synthetic sampling (ADASYN) are studied
Uses publicly available datasets released by Lending Club, a well-known P2P lending platform (lendingclub.com)
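The core SMOTE idea can be sketched in a few lines: pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic point at a random position on the segment between them. This simplified version omits details of the original algorithm; in practice the imbalanced-learn package provides `SMOTE` and `ADASYN`. The toy minority points are made up.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 2, seed: int = 0) -> List[Point]:
    """Simplified SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours (needs at least two minority samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Minority-class (defaulted) loans in a toy 2-feature space, all on the line y = x.
minority = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new_points = smote(minority, n_new=5)
print(new_points)
```

Unlike random over-sampling, which duplicates existing defaults, every synthetic point lies between real minority samples, which tends to reduce overfitting to exact duplicates.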

Paper #2 - Machine Learning in Finance: Emerging Trends and Challenges

Key Notes

The so-called “black-box” nature of such models creates an inevitable trust deficit when deploying them in critical and privacy-sensitive applications
Risk modeling: operational risk management, compliance, and fraud management
Portfolio management: Portfolios are designed based on the recommendations of smart algorithms that optimize various parameters, with return and risk being the two most important ones
Algorithmic trading: Algorithmic trading uses algorithms to carry out stock trading autonomously with minimal human intervention
Fraud detection and analysis: Fraud detection and analysis is one of the most critical machine learning applications in the finance industry
Financial chatbots
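"Return and risk" in portfolio management usually means the portfolio's expected return w·μ and its standard deviation sqrt(wᵀΣw). A minimal sketch, assuming three assets with made-up annualised statistics, that compares an equal-weight portfolio with the global minimum-variance portfolio w = Σ⁻¹1 / (1ᵀΣ⁻¹1) (weights sum to 1, short selling allowed). The function names are hypothetical; real systems estimate μ and Σ from data and add constraints.

```python
import numpy as np

def portfolio_stats(weights, mean_returns, cov):
    """Expected return (w . mu) and risk as standard deviation sqrt(w^T Sigma w)."""
    exp_return = float(weights @ mean_returns)
    risk = float(np.sqrt(weights @ cov @ weights))
    return exp_return, risk

def min_variance_weights(cov):
    """Global minimum-variance weights: Sigma^{-1} 1 / (1^T Sigma^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # Sigma^{-1} 1 without forming the inverse explicitly
    return raw / raw.sum()

# Hypothetical annualised expected returns and covariance matrix for three assets
mean_returns = np.array([0.08, 0.12, 0.05])
cov = np.array([[0.040, 0.006, 0.002],
                [0.006, 0.090, 0.004],
                [0.002, 0.004, 0.010]])

equal = np.full(3, 1 / 3)
w_min = min_variance_weights(cov)
print("equal weight:", portfolio_stats(equal, mean_returns, cov))
print("min variance:", w_min.round(3), portfolio_stats(w_min, mean_returns, cov))
```

The minimum-variance portfolio gives up some expected return for lower risk; an optimizer that "balances" the two would instead trade them off with a risk-aversion parameter.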

Paper #3 - Recommendation Engine for Lower Interest Borrowing on Peer to Peer Lending (P2PL) Platform

Key Notes

A recommendation framework for borrowers to help them borrow at lower interest rates
Bidding loan: first and foremost, borrowers themselves decide the maximum interest rate they are willing to pay.
Machine learning models to classify whether a given borrower will succeed on the bidding loan platform
Machine learning models to predict the interest rate payable for bidding and traditional loans

The dataset contains 12,006 loans (both funded and non-funded) with 12 features and 2 response variables: the borrower’s interest rate and the status of the bidding loan

Predicting the success rate of funding bidding loans
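The notes don't list which classifier or which of the 12 features the paper uses, so this is only a hedged sketch of the classification step: a plain logistic regression trained by batch gradient descent on synthetic data with two hypothetical, standardised features. The labels come from a made-up linear rule, so the learned weights say nothing about real bidding loans.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)              # predicted probability that the loan gets funded
        w -= lr * (X.T @ (p - y)) / n       # gradient of mean log-loss w.r.t. w
        b -= lr * float(np.mean(p - y))     # gradient w.r.t. the bias
    return w, b

# Synthetic, hypothetical data: two standardised borrower/loan features,
# label 1 = bidding loan funded (generated from a made-up rule)
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
true_logits = 3.0 * X[:, 0] + 4.0 * X[:, 1]
y = (rng.random(500) < sigmoid(true_logits)).astype(float)

w, b = train_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == (y == 1))
print("weights:", w.round(2), "bias:", round(b, 2), "train accuracy:", round(accuracy, 3))
```

Logistic regression is a natural baseline here because its coefficients are directly interpretable; the interest-rate prediction task would swap in a regression model and a squared-error loss.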

Paper #4 - Determinants of Interest Rates in the P2P Consumer Lending Market: How Rational are Investors?

Key Notes

(1) The loan-specific view analyzes elements such as loan volume and loan period by investigating their effects on the interest rate of P2P consumer loans
(2) borrower-specific factors focus on aspects that affect a borrower's credit rating

Paper #4 - Deep Learning for Financial Applications: A Survey

Key Notes

Paper #5 - Machine Learning Algorithms for Financial Asset Price Forecasting

Investment professionals often refer to this non-traditional data as “alternative data” [12]. Examples of alternative data include the following:

Satellite imagery to monitor economic activity. Example applications: Analysis of spatial car park traffic to aid the forecasting of sales and future cash flows of commercial retailers. Classifying the movement of shipment containers and oil spills for commodity price forecasting [13]. Forecasting real estate prices directly from satellite imagery [14].
Social-media data streams to forecast equity prices [15], [16] and potential company acquisitions [17].
E-commerce and credit card transaction data [18] to forecast retail stock prices [19].
ML algorithms for patent analysis to support the prediction of Mergers and Acquisitions (M&A)

Capital Asset Pricing Model (CAPM)

The CAPM makes the following main assumptions:

One-period investment model: All investors invest over the same one-period time horizon.
Risk-averse investors: This assumption was initially developed by Markowitz and asserts that all investors are rational, risk-averse actors in the sense that, when choosing between financial portfolios, they aim to:
(a) Minimize the variance of the portfolio returns.
(b) Maximize the expected returns given the variance.
Zero transaction costs: There are no taxes or transaction costs.
Homogeneous information: All investors have homogeneous views and information regarding the probability distributions of all security returns.
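Under these assumptions the CAPM prices an asset as E[R_i] = R_f + β_i (E[R_m] − R_f), where β_i = Cov(R_i, R_m) / Var(R_m). A minimal sketch that estimates beta from return series and plugs it into that formula; the monthly returns, the 2% risk-free rate and the 8% expected market return are hypothetical numbers, and the function names are illustrative.

```python
import numpy as np

def capm_beta(asset_returns, market_returns):
    """beta_i = Cov(R_i, R_m) / Var(R_m), using sample (ddof=1) estimates."""
    cov = np.cov(asset_returns, market_returns, ddof=1)
    return cov[0, 1] / cov[1, 1]

def capm_expected_return(risk_free, beta, expected_market_return):
    """CAPM: E[R_i] = R_f + beta_i * (E[R_m] - R_f)."""
    return risk_free + beta * (expected_market_return - risk_free)

# Hypothetical monthly market returns; the asset is built as an exact linear function of them
market = np.array([0.01, -0.02, 0.03, 0.015, -0.005, 0.02])
asset = 1.5 * market + 0.001

beta = capm_beta(asset, market)
expected = capm_expected_return(risk_free=0.02, beta=beta, expected_market_return=0.08)
print("beta:", beta, "CAPM expected return:", expected)
```

Because the hypothetical asset is exactly 1.5 times the market plus a constant, the estimated beta is 1.5 (up to floating-point rounding) and the CAPM expected return is 0.02 + 1.5 × 0.06 = 0.11. With real data, beta is a noisy estimate and depends on the sample window.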

In the context of financial asset price forecasting, the information processing problem we are trying to solve is the prediction of an asset price t time steps in the future; we are effectively trying to solve a non-linear multivariate time series problem.
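A common way to frame "predict the price t steps ahead from several past variables" as supervised learning is a sliding window: each input is the last few observations of all variables, and the target is the price `horizon` (the t above) steps later. A minimal sketch; `make_supervised_windows`, `n_lags` and `horizon` are hypothetical names, and the toy price/volume series is made up. Any non-linear regressor could then be fitted on the resulting X and y.

```python
import numpy as np

def make_supervised_windows(series, n_lags, horizon):
    """Turn a (T, F) multivariate series into (X, y) pairs for direct multi-step forecasting.

    X[k] is the flattened window of the n_lags most recent rows (all F features) ending at time s;
    y[k] is feature 0 (e.g. the asset price) at time s + horizon.
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]
    X, y = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        X.append(series[end - n_lags:end].ravel())  # rows end-n_lags .. end-1, flattened
        y.append(series[end + horizon - 1, 0])      # price `horizon` steps after the last row
    return np.array(X), np.array(y)

# Hypothetical toy data: column 0 = price, column 1 = traded volume
prices = np.arange(10, dtype=float)
volume = np.arange(10, 20, dtype=float)
series = np.column_stack([prices, volume])

X, y = make_supervised_windows(series, n_lags=3, horizon=2)
print(X.shape, y.shape)
print(X[0], "->", y[0])
```

When evaluating a model trained on these windows, the split should be chronological (train on earlier windows, test on later ones); a random shuffle would leak future information into training.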

Paper #6 - FinBrain: When Finance Meets AI 2.0
