"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 25, 2018

Day #144 - Uber Engineering - NLP, Machine Learning Notes

The Uber Engineering blog provides great insights into Uber's ML, Big Data and NLP initiatives and guidelines. Sharing some key learnings from its blog posts.

Link #1 - Forecasting at Uber

Need for Forecasting
  • Identify High Demand Areas
  • Predict User Supply and Demand
  • Identify Trends, Seasonality, Competition, Pricing
  • Data-Driven Marketing Decisions
Visualization Key Points
  • Daily Trends
  • Hourly Trends
  • Weekly Trends
  • Weekend Trends
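As a rough illustration of the trend views listed above, here is a minimal sketch that groups hourly trip counts by hour of day, calendar day, day of week and weekend vs. weekday. The data is synthetic and hypothetical (busier evenings and weekends), not Uber data; in practice these averages are what you would plot.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_by(records, key):
    """Average trip counts grouped by key(timestamp)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, trips in records:
        k = key(ts)
        sums[k] += trips
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Hypothetical hourly trip counts for two weeks starting on a Monday:
# +50 trips in the 17:00-20:00 evening peak, +30 trips on weekends.
start = datetime(2018, 10, 1)
records = []
for h in range(24 * 14):
    ts = start + timedelta(hours=h)
    base = 100 + (50 if 17 <= ts.hour <= 20 else 0)
    records.append((ts, base + (30 if ts.weekday() >= 5 else 0)))

hourly = average_by(records, lambda ts: ts.hour)             # hour-of-day trend
daily = average_by(records, lambda ts: ts.date())            # day-by-day trend
weekly = average_by(records, lambda ts: ts.weekday())        # day-of-week trend
weekend = average_by(records, lambda ts: ts.weekday() >= 5)  # weekend vs. weekday

print(round(weekend[True] - weekend[False], 2))  # 30.0
```

With real data the same grouping would reveal the daily, hourly, weekly and weekend patterns that feed the forecasting models.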
Forecasting Techniques
  • Statistical - ARIMA, Holt-Winters
  • ML Based - RNN, Quantile Regression Forest (QRF), Gradient Boosting Trees (GBM), Support Vector Regression (SVR), Gaussian process regression (GP)
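To make the statistical side concrete, here is a minimal sketch of additive Holt-Winters (triple exponential smoothing), which tracks a level, a trend and a repeating seasonal offset. The smoothing factors are fixed by hand for illustration rather than fitted, the seasonal update uses the common y_t - level_t form, and the weekly demand numbers are hypothetical.

```python
def holt_winters_additive(y, season_len, alpha=0.5, beta=0.1, gamma=0.1, horizon=1):
    """Additive Holt-Winters forecast.

    y: observed series (needs at least two full seasons for initialisation).
    season_len: period length, e.g. 7 for daily data with a weekly cycle.
    alpha, beta, gamma: level, trend and seasonal smoothing factors (fixed, not fitted).
    """
    m = season_len
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Initial level: mean of the first season.
    level = sum(y[:m]) / m
    # Initial trend: average per-step change between the first two seasons.
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    # Initial seasonal offsets: deviation of each first-season point from the level.
    seasonals = [y[i] - level for i in range(m)]

    for t, obs in enumerate(y):
        s = seasonals[t % m]
        prev_level = level
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (obs - level) + (1 - gamma) * s

    n = len(y)
    # h-step forecast: extrapolate level + trend and add the matching seasonal offset.
    return [level + (h + 1) * trend + seasonals[(n + h) % m] for h in range(horizon)]


# Hypothetical daily demand with a weekly cycle, repeated for four weeks.
week = [120, 100, 105, 110, 140, 190, 170]
forecast = holt_winters_additive(week * 4, season_len=7, horizon=7)
print([round(v, 1) for v in forecast])
```

Because the toy series repeats exactly, the forecast reproduces the weekly shape; on real data alpha, beta and gamma would be tuned, for example by minimising one-step-ahead error.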
Summary
  • Understand Trends and External Factors such as Weather and Concerts
  • Quantile Regression Forest (QRF) - provides predictions at chosen percentiles (quantiles), not just a single point estimate
  • Gradient Boosting Trees (GBM) - a prediction model in the form of an ensemble of weak prediction models, typically decision trees 
  • Support Vector Regression (SVR) - the regression counterpart of SVMs; errors inside an ε-insensitive margin are ignored
  • Gaussian process regression (GP) - The prediction is probabilistic (Gaussian)
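  • Central Limit Theorem - the sum (or mean) of many independent random variables with finite variance tends toward a normal distribution, whatever their individual distributions; this is why so many real-world measurements look roughly normal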
  • Binomial - number of successes in n independent trials, each with only two outcomes (success/failure)
  • Poisson - number of events in a fixed interval when events occur independently at a constant average rate, denoted by λ (lambda)
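The probability notes above can be checked numerically. This minimal sketch (standard library only, Python 3.8+) computes binomial and Poisson probabilities and simulates the Central Limit Theorem by averaging samples from a uniform distribution, which is itself far from normal. The sample sizes and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics


def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent trials, each a success with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k): k events in an interval where events arrive at average rate lam (lambda)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def sample_means(n_samples, sample_size, rng):
    """Means of repeated samples from a uniform [0, 1) distribution."""
    return [statistics.fmean(rng.random() for _ in range(sample_size))
            for _ in range(n_samples)]


rng = random.Random(42)
means = sample_means(n_samples=2000, sample_size=50, rng=rng)
# CLT: the means cluster around 0.5 in a bell shape whose standard deviation
# is close to sqrt(1/12) / sqrt(50), roughly 0.041.
print(statistics.fmean(means), statistics.stdev(means))

print(binomial_pmf(3, 10, 0.5))  # 0.1171875 (= 120 / 1024)
print(poisson_pmf(2, 4.0))       # P(2 events) when the rate is 4 per interval
```

A histogram of `means` would show the bell curve even though every individual draw is uniform.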
Link #2 - COTA: Improving Uber Customer Care with NLP & Machine Learning

NLP models are built to interpret:
  • Phonology
  • Morphology
  • Grammar
  • Syntax
  • Semantics
Language modelling can be done at the character, word, phrase, sentence or document level.
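To show what changing the granularity means in practice, here is a minimal sketch that splits one hypothetical support ticket into character-, word- and sentence-level units and fits a simple count-based bigram model over any of them. The regex splitting rules are naive assumptions, phrase- and document-level units are not shown, and the bigram model is only a toy stand-in for the language models discussed in the post.

```python
import re
from collections import Counter, defaultdict


def split_units(text, level):
    """Split text into modelling units at the requested granularity (naive regex rules)."""
    if level == "char":
        return list(text.lower())
    if level == "word":
        return re.findall(r"[a-z']+", text.lower())
    if level == "sentence":
        return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    raise ValueError(f"unsupported level: {level}")


def bigram_model(units):
    """Count-based bigram model: P(next unit | previous unit)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(units, units[1:]):
        counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {unit: c / total for unit, c in followers.items()}
    return model


ticket = "My driver took a longer route. I was charged too much. Please refund the fare."
for level in ("char", "word", "sentence"):
    print(level, len(split_units(ticket, level)))

word_model = bigram_model(split_units(ticket, "word"))
print(word_model["was"])  # {'charged': 1.0}
```

The same `bigram_model` works on any unit sequence; only the vocabulary and sparsity change with the level.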

Key Concepts in Implementation
  • Fetch Trip Data
  • Fetch Ticket Text
  • Preprocessing (Stemming, Lowercasing, Stop-word removal, Lemmatization)
  • Feature Engineering (LSI / TF-IDF)
  • Cosine Similarity to Compare Ticket Vectors
  • ML Algorithm to Rank Predicted Issue Types and Solutions
  • Topic modeling transformation is carried out on the bag-of-words representation
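The preprocessing, feature and similarity steps above can be sketched end to end. This is a minimal, hypothetical version: it lowercases and drops stop words (stemming and lemmatization are omitted), uses plain TF-IDF instead of the LSI topic model, and simply sorts issue types by their best cosine match instead of training an ML ranking model. The tickets, labels and stop-word list are invented for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not COTA's actual list).
STOP_WORDS = {"a", "an", "the", "i", "my", "me", "was", "is", "in",
              "on", "to", "for", "and", "of", "it"}


def preprocess(text):
    """Lowercase, tokenize and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def fit_idf(docs):
    """Inverse document frequency, +1 so terms present in every doc keep some weight."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms never seen in the historical tickets are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values()) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical historical tickets labelled with an issue type.
history = [
    ("I was charged twice for my trip", "billing"),
    ("Refund the extra charge on my card", "billing"),
    ("I left my phone in the car", "lost_item"),
    ("Lost my wallet in the car", "lost_item"),
    ("Driver took a very long route", "route"),
]
docs = [preprocess(text) for text, _ in history]
idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]


def rank_issue_types(ticket):
    """Score each issue type by its best cosine match, highest first."""
    query = tfidf(preprocess(ticket), idf)
    scores = {}
    for vec, (_, issue) in zip(vectors, history):
        scores[issue] = max(scores.get(issue, 0.0), cosine(query, vec))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(rank_issue_types("I think I left my phone in the car yesterday"))
```

In a fuller system these similarity scores would become features for a learned ranker rather than the final answer.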
Link #3 - Applying Customer Feedback: How NLP & Deep Learning Improve Uber’s Maps

Key Concepts
  • Tickets encoded using one-hot encoding
  • Classification of the ticket type
  • Alternatively, a word2vec-based representation was used
  • WordCNN- and LSTM-based models
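A minimal sketch of the first two steps: each ticket becomes a binary one-hot (word presence) vector over the training vocabulary, and the ticket type is predicted by the closest class centroid. The nearest-centroid classifier is a deliberately simple stand-in, not the WordCNN/LSTM models from the post, and the map-related tickets and labels are hypothetical.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    """Map each word seen in training to a fixed vector index."""
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(vocab)}


def one_hot(text, vocab):
    """Binary vector: 1 if the vocabulary word appears in the ticket, else 0."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] = 1
    return vec


def train_centroids(texts, labels, vocab):
    """Average the one-hot vectors of each ticket type."""
    sums, counts = {}, {}
    for text, label in zip(texts, labels):
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, x in enumerate(one_hot(text, vocab)):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def classify(text, centroids, vocab):
    """Pick the ticket type whose centroid has the largest dot product with the ticket."""
    vec = one_hot(text, vocab)
    return max(centroids, key=lambda lab: sum(a * b for a, b in zip(vec, centroids[lab])))


# Hypothetical map-feedback tickets.
train_texts = [
    "the pickup pin was at the wrong address",
    "wrong address shown for pickup",
    "road is closed for construction",
    "the app routed me onto a closed road",
]
train_labels = ["wrong_address", "wrong_address", "road_closure", "road_closure"]
vocab = build_vocab(train_texts)
centroids = train_centroids(train_texts, train_labels, vocab)
print(classify("pickup address is wrong", centroids, vocab))  # wrong_address
```

One-hot vectors treat every word as unrelated to every other, which is the limitation that word2vec-style embeddings address.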
More Reads
  • Uber’s Big Data Platform: 100+ Petabytes with Minute Latency
  • Databook: Turning Big Data into Knowledge with Metadata at Uber

Happy Learning!!!
