Link #1 - Forecasting at Uber
Need for Forecasting
- Identify High Demand Areas
- Predict user demand and driver supply
- Identify Trends, Seasonality, Competition, Pricing
- Data-driven marketing decisions
- Daily Trends
- Hourly Trends
- Weekly Trends
- Weekend Trends
- Statistical - ARIMA, Holt-Winters
- ML-based - RNN, Quantile Regression Forest (QRF), Gradient Boosting Trees (GBM), Support Vector Regression (SVR), Gaussian process regression (GP)
- Understand trends and external factors such as weather and concerts
- Quantile Regression Forest (QRF) - provides predictions at chosen percentiles (quantiles), giving a range rather than a single point estimate
- Gradient Boosting Trees (GBM) - a prediction model in the form of an ensemble of weak prediction models, typically decision trees
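- Support Vector Regression (SVR) - regression variant of support vector machines that ignores errors within a margin (epsilon) around the fitted function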
- Gaussian process regression (GP) - The prediction is probabilistic (Gaussian)
- Central Limit Theorem - The sum (or average) of many independent random variables tends toward a normal distribution, which is why many measurable quantities in the real world look approximately normal
- Binomial - Number of successes in a fixed number of independent trials, each with only two outcomes
- Poisson - Number of events occurring independently at a constant average rate over a fixed interval; the rate is denoted by lambda (λ)
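The notes list Holt-Winters among the statistical methods, and the hourly, daily, and weekly patterns above are the kind of seasonality it models. Below is a minimal, from-scratch sketch of additive Holt-Winters (triple exponential smoothing). It assumes evenly spaced observations and a known season length. The function name and the default smoothing constants `alpha`, `beta`, `gamma` are illustrative assumptions, not Uber's implementation, and the sketch does no parameter fitting.

```python
from typing import List


def holt_winters_additive(
    series: List[float],
    season_length: int,
    alpha: float = 0.5,
    beta: float = 0.1,
    gamma: float = 0.1,
    horizon: int = 1,
) -> List[float]:
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    if len(series) < 2 * season_length:
        raise ValueError("need at least two full seasons of data")

    # Initial level: mean of the first season.
    level = sum(series[:season_length]) / season_length
    # Initial trend: average per-step change between the first two seasons.
    trend = (
        sum(series[season_length:2 * season_length]) - sum(series[:season_length])
    ) / (season_length ** 2)
    # Initial seasonal offsets: deviation of each first-season point from the level.
    seasonals = [x - level for x in series[:season_length]]

    for t, value in enumerate(series):
        s = seasonals[t % season_length]
        prev_level = level
        # Smooth the deseasonalized value into the level.
        level = alpha * (value - s) + (1 - alpha) * (level + trend)
        # Smooth the change in level into the trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Smooth what remains after removing the level into this season slot.
        seasonals[t % season_length] = gamma * (value - level) + (1 - gamma) * s

    n = len(series)
    # Step h ahead: extrapolate level + trend and add the matching seasonal offset.
    return [
        level + h * trend + seasonals[(n + h - 1) % season_length]
        for h in range(1, horizon + 1)
    ]


if __name__ == "__main__":
    # Hypothetical repeating pattern with period 4, observed for 6 cycles.
    history = [10.0, 20.0, 30.0, 20.0] * 6
    print(holt_winters_additive(history, season_length=4, horizon=4))
    # prints [10.0, 20.0, 30.0, 20.0]
```

For hourly trip counts, `season_length=24` would capture the daily cycle and `season_length=168` the weekly one. In practice the smoothing constants would be tuned on held-out data rather than fixed.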
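The distribution notes above can be made concrete with a short sketch that uses only the Python standard library. `binomial_pmf` and `poisson_pmf` are the textbook probability mass functions. `sample_means` is an illustrative simulation of the Central Limit Theorem: each draw is uniform, but averages of many independent draws cluster in a bell shape. The uniform source distribution, the sample sizes, and the seed are arbitrary assumptions chosen for illustration.

```python
import math
import random
import statistics
from typing import List


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent two-outcome trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): k events in an interval whose average count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def sample_means(num_samples: int, sample_size: int, seed: int = 0) -> List[float]:
    """Means of repeated uniform(0, 1) samples; by the CLT they are roughly normal."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(num_samples)
    ]


if __name__ == "__main__":
    means = sample_means(num_samples=2000, sample_size=30)
    # Theory: mean 0.5, standard deviation sqrt(1/12) / sqrt(30), about 0.053.
    print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

For example, suppose a zone averaged a hypothetical 3 ride requests per minute. `poisson_pmf(5, 3.0)` gives the probability of exactly 5 requests in one minute (roughly 0.10). This holds under the Poisson assumption that requests arrive independently at a constant rate.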
NLP Models built to interpret
- Phonology
- Morphology
- Grammar
- Syntax
- Semantics
Key Concepts in Implementation
- Fetch Trip Data
- Fetch Ticket Text
- Preprocessing (Stemming, Lowercasing, Stop-word removal, Lemmatization)
- Feature Engineering (LSI / TF-IDF)
- Cosine Similarity
- ML Algorithm for Ranking (Predictions - Issues / Solutions)
- Topic modeling transformation is carried out on the bag-of-words representation
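The implementation steps above can be sketched in plain Python: preprocess the ticket text, build TF-IDF features, score them with cosine similarity, and rank. This is a minimal illustration under several assumptions:

- The stop-word list and the suffix-stripping "stemmer" are toy stand-ins for real tools such as a Porter stemmer.
- Lemmatization and the LSI topic-model step are omitted.
- All function names and ticket texts are hypothetical.
- Past tickets are ranked purely by similarity rather than by a trained ML ranking model.

The IDF uses the smoothed formula ln((1 + n) / (1 + df)) + 1, which is also scikit-learn's `TfidfVectorizer` default.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "my", "i", "was", "is", "to", "for", "of", "and", "on", "me", "it"}


def preprocess(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens, drop stop words, strip a few suffixes (toy stemmer)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    result = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        for suffix in ("ing", "ed", "s"):
            # Only strip when a reasonably long stem remains.
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        result.append(tok)
    return result


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def vectorize(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    counts = Counter(doc)
    total = len(doc) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_past_tickets(new_ticket: str, past_tickets: List[str]) -> List[Tuple[float, str]]:
    """Return (similarity, ticket) pairs, most similar past ticket first."""
    docs = [preprocess(t) for t in past_tickets]
    idf = fit_idf(docs)
    vectors = [vectorize(d, idf) for d in docs]
    query = vectorize(preprocess(new_ticket), idf)
    scored = [(cosine(query, v), t) for v, t in zip(vectors, past_tickets)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    past = [
        "I was charged twice for my trip",
        "Driver took a longer route and the fare was too high",
        "I left my phone in the car",
    ]
    for score, ticket in rank_past_tickets("Charged two times for the same trip", past):
        print(f"{score:.3f}  {ticket}")
```

In a real system, the top-ranked past tickets would point to the solutions that resolved them. Inserting LSI would mean projecting these TF-IDF vectors onto a lower-dimensional topic space before computing cosine similarity.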
Key Concepts
- Tickets encoded using one-hot encoding
- Classification of the ticket type
- Alternatively, a word2vec-based representation was used
- WordCNN and LSTM-based models
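The one-hot encoding and ticket-type classification steps can be sketched as follows. The notes do not say which classifier was used, so this example swaps in a simple nearest-centroid classifier over binary (one-hot) token vectors as an assumption. The whitespace tokenizer, the function names, and the ticket texts and labels are all hypothetical. The word2vec, WordCNN, and LSTM alternatives would replace these sparse vectors with learned dense representations and are not shown.

```python
from typing import Dict, List, Sequence


def build_vocabulary(tickets: Sequence[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to a column index, in first-seen order."""
    vocab: Dict[str, int] = {}
    for text in tickets:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(text: str, vocab: Dict[str, int]) -> List[int]:
    """Binary vector: 1 if the vocabulary token appears in the ticket, else 0."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        idx = vocab.get(tok)
        if idx is not None:  # tokens outside the vocabulary are ignored
            vec[idx] = 1
    return vec


def train_centroids(vectors: Sequence[List[int]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the one-hot vectors of each ticket type (nearest-centroid training)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def classify(vec: List[int], centroids: Dict[str, List[float]]) -> str:
    """Pick the ticket type whose centroid is closest in squared Euclidean distance."""
    def dist(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


if __name__ == "__main__":
    tickets = ["charged twice for trip", "refund my fare", "lost phone in car", "left my bag in car"]
    labels = ["payment", "payment", "lost_item", "lost_item"]
    vocab = build_vocabulary(tickets)
    centroids = train_centroids([one_hot(t, vocab) for t in tickets], labels)
    print(classify(one_hot("lost my wallet in the car", vocab), centroids))
```

Nearest-centroid is used here only because it fits in a few lines without dependencies. Any standard classifier could consume the same one-hot vectors.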
Databook: Turning Big Data into Knowledge with Metadata at Uber
Happy Learning!!!