- Data-based decisions rather than opinions/perspectives
- Data-Driven Thinking / Measure what you can collect/interpret
- Staying unbiased / finding missing data
- Apply logical reasoning to skewness / relationships / trends
- Data is everywhere, but interpreting it correctly is a skill; conveying facts without overselling or missing the point is also a key skill
- Use data with caution and be proactive about making changes as conditions change
- Agile, Observe, Adapt, Change and Monitor
August 29, 2020
My Perspectives on Interpreting Data
Labels:
My Perspectives
August 27, 2020
Technology / Job Trends
It looks like a boom, but many skills will converge in the next 5 years.
10 Years Back
- OLTP - Real-time
- OLAP - BI
- Real-time OLAP - Columnstore Databases - Vertica
- Data Aggregation - Across SQL, NoSQL, Data lake
- In-memory Real-time machine learning - Spark
- Data Science - Forecasting, Clustering, Anomaly detection, Churn, CLV, Recommendations - Built on top of data lakes
- Feature Stores - Evolving, increasingly adopted products - Feast, Hopsworks
- Real-time analytics - Translytics (Microservices + Sharded Data) - Newer forms of data stores / analytics
- Dockerized + Kubernetes + KFServing - Everything as an API
- Leverage more analytics at every stage of the data pipeline - KSQL (Kafka SQL), Spark ML
- Unified feature stores to access real-time data, trends and ML features; more and more tools will automate everything
Interesting Read - ML feedback
Keep Thinking!!!
Labels:
My Perspectives
August 22, 2020
Research Paper read - Display Advertising with Real-Time Bidding (RTB) and Behavioural Targeting
Key Notes
- RTB - Real-time bidding, the mechanism to buy and sell ads
- Key components - Demand-side platform, Supply-side platform, Real-time bidding
- Input Signals - Image, Video, Audio
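The paper frames RTB as a per-impression auction between demand-side platforms. A minimal sketch of the exchange side, assuming the sealed-bid second-price rule that RTB exchanges have traditionally used (some exchanges have since moved to first-price auctions); the DSP names, CPM prices and reserve price are made up for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    dsp: str          # hypothetical demand-side platform name
    price_cpm: float  # bid price per thousand impressions


def run_second_price_auction(
    bids: List[Bid], reserve_cpm: float = 0.0
) -> Tuple[Optional[Bid], Optional[float]]:
    """Sealed-bid second-price auction for a single impression.

    The highest bid at or above the reserve wins; the winner pays the
    second-highest eligible bid, or the reserve if it is the only one.
    Returns (None, None) when no bid meets the reserve.
    """
    eligible = sorted(
        (b for b in bids if b.price_cpm >= reserve_cpm),
        key=lambda b: b.price_cpm,
        reverse=True,
    )
    if not eligible:
        return None, None
    winner = eligible[0]
    runner_up = eligible[1].price_cpm if len(eligible) > 1 else reserve_cpm
    return winner, runner_up


bids = [Bid("dsp_a", 2.50), Bid("dsp_b", 3.10), Bid("dsp_c", 1.20)]
winner, price = run_second_price_auction(bids, reserve_cpm=1.00)
print(winner.dsp, price)  # dsp_b 2.5
```

Under a second-price rule, bidding one's true value for the impression is a dominant strategy, which keeps the DSP's bidding logic simple.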
The different components and their interactions are shown in the picture below
Realtime behavioral targeting
- Collect all traits
- Monitor and Alert
- Bid and reach out with relevant ads
A user is typically identified by an HTTP cookie, which is designed to allow websites to remember the status of an individual user, such as remembering items added to the shopping cart in an online store or recording the user’s previous browsing activities to generate personalized and dynamic content
Personalized workflow
This is an interesting picture showing how many cookies are present on an NYT page. Cookie syncing is done to keep track of (sync) all the cookies of a particular user.
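Cookie syncing boils down to each platform keeping a match table between its own cookie ID and its partners' IDs for the same browser. A minimal in-memory sketch of such a table; the class, method and ID names are hypothetical, and the redirect/pixel exchange that delivers both IDs in one request is not shown:

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class CookieMatchTable:
    """Hypothetical match table kept by one ad platform.

    Maps a partner's cookie ID for a browser to our own cookie ID, so bid
    requests that carry only the partner's ID can be joined to our profile.
    """

    def __init__(self) -> None:
        self._to_ours: Dict[Tuple[str, str], str] = {}
        self._to_partners: Dict[str, Dict[str, str]] = defaultdict(dict)

    def record_sync(self, our_cookie: str, partner: str, partner_cookie: str) -> None:
        # Called when a sync request carries both IDs for the same browser.
        self._to_ours[(partner, partner_cookie)] = our_cookie
        self._to_partners[our_cookie][partner] = partner_cookie

    def resolve(self, partner: str, partner_cookie: str) -> Optional[str]:
        # Our cookie ID for the partner's ID, or None if never synced.
        return self._to_ours.get((partner, partner_cookie))

    def partners_of(self, our_cookie: str) -> Dict[str, str]:
        # Copy, so callers cannot mutate the internal table.
        return dict(self._to_partners.get(our_cookie, {}))


table = CookieMatchTable()
table.record_sync("u-123", partner="ssp_x", partner_cookie="abc")
print(table.resolve("ssp_x", "abc"))  # u-123
```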
ML Use Case for Click-through rate prediction
Look-alike modeling - on the basis of the learned user profiles, identify and target unknown users who have similar interests and commercial intents to the known (converted) customers
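A minimal sketch of the simplest look-alike approach: represent each user as an interest vector and score an unknown user by the highest cosine similarity to any converted seed user. The three interest dimensions and all vectors are hypothetical and hard-coded here, whereas a real system would learn the profiles; taking the maximum over seeds (rather than, say, similarity to the seed average) is also just one illustrative choice:

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def lookalike_scores(seeds: List[Sequence[float]],
                     candidates: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # Score = similarity to the closest seed user.
    return {uid: max(cosine(vec, s) for s in seeds) for uid, vec in candidates.items()}


def expand_audience(seeds, candidates, threshold: float) -> List[str]:
    # Candidates at or above the threshold, most similar first.
    scores = lookalike_scores(seeds, candidates)
    selected = [uid for uid, score in scores.items() if score >= threshold]
    return sorted(selected, key=lambda uid: scores[uid], reverse=True)


# Hypothetical interest dimensions: [sports, travel, tech]
seeds = [[1, 0, 1], [1, 1, 0]]
candidates = {"u1": [1, 0, 1], "u2": [0, 1, 0], "u3": [0, 0, 0]}
print(expand_audience(seeds, candidates, threshold=0.5))  # ['u1', 'u2']
```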
Conversion over multiple touchpoints
Key Concepts
CTR, Click-Through Rate - the probability of a specific user in a specific context clicking a specific ad
CVR, Conversion Rate - the probability that a user conversion is observed after the ad impression is shown
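Both definitions translate into ratios of counted events. A minimal sketch, assuming per-ad counts of impressions, clicks and conversions are available; the optional additive smoothing toward a prior rate is an extra assumption (not from the paper) that keeps ads with very few impressions from being estimated at exactly 0% or 100%:

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    impressions: int
    clicks: int
    conversions: int


def rate(events: int, trials: int, prior_rate: float = 0.0, prior_weight: float = 0.0) -> float:
    """events / trials, optionally smoothed toward prior_rate.

    prior_weight acts like that many pseudo-trials observed at prior_rate;
    with the default 0 this is the plain empirical rate.
    """
    denominator = trials + prior_weight
    if denominator == 0:
        return prior_rate  # no data at all: fall back to the prior
    return (events + prior_rate * prior_weight) / denominator


def ctr(s: AdStats, **smoothing) -> float:
    # Click-through rate: clicks per impression.
    return rate(s.clicks, s.impressions, **smoothing)


def cvr(s: AdStats, **smoothing) -> float:
    # Conversion rate per impression, following the definition above.
    return rate(s.conversions, s.impressions, **smoothing)


stats = AdStats(impressions=10_000, clicks=150, conversions=6)
print(ctr(stats), cvr(stats))  # 0.015 0.0006
new_ad = AdStats(impressions=0, clicks=0, conversions=0)
print(ctr(new_ad, prior_rate=0.01, prior_weight=100))  # 0.01
```

Many systems instead report post-click CVR (conversions per click), so it is worth checking which denominator a given report uses.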
Keep Thinking!!!
Labels:
Advertising,
Research Papers
Day #334 - Exploring - Featuretools
I have been hearing a lot about feature generation and feature management. There are a couple of tools/frameworks in this space.
For a typical ML product level use case
- Who defines the problem - Domain Expert / Product Manager
- Who knows the data sources - BA / Database Developer / Product Manager
- Raw Data -> Processed data - DB Developer
- Data Exploration / Analysis / Feature Creation - BI / DB / ML Developer
- Model Development / Validation - ML Developer
- Deployment / Monitoring / Improvement - DevOps / ML Developer
Installing Featuretools
Analysis - Once you define the entities and their relationships, joining a few tables and computing aggregates such as unique counts and averages is all taken care of. It is like prebuilt analysis based on the identified associations.
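Featuretools' `dfs` (Deep Feature Synthesis) function generates these aggregates automatically from an entity set and stacks them across relationships. The sketch below does not use Featuretools; it hand-rolls the core idea for a single hypothetical parent-child relationship (customers and their transactions), producing count, mean and unique-count features whose names only mimic Featuretools' naming style:

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List


def aggregate_children(
    parents: List[Dict[str, Any]], parent_key: str,
    children: List[Dict[str, Any]], foreign_key: str,
    child_name: str, column: str,
) -> Dict[Any, Dict[str, Any]]:
    """Per-parent COUNT / MEAN / NUM_UNIQUE of one child column (DFS-like, one level)."""
    # Group child values by the parent they point to.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[foreign_key]].append(row[column])

    features = {}
    for parent in parents:
        values = grouped.get(parent[parent_key], [])
        features[parent[parent_key]] = {
            f"COUNT({child_name})": len(values),
            f"MEAN({child_name}.{column})": mean(values) if values else None,
            f"NUM_UNIQUE({child_name}.{column})": len(set(values)),
        }
    return features


customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 5.0},
]
feats = aggregate_children(customers, "customer_id", transactions,
                           "customer_id", "transactions", "amount")
print(feats[2])
# {'COUNT(transactions)': 1, 'MEAN(transactions.amount)': 5.0, 'NUM_UNIQUE(transactions.amount)': 1}
```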
Experimenting with this on Colab - Colab notebook link
From Link, a feature comparison between different feature stores
Paper - Link
Key Notes
- Handling Data Ingestion
- Aggregating data from diverse sources
- Access controlled and versioned
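To make the "versioned" part concrete, here is a toy in-memory store that keeps every value of a feature with its event timestamp and answers point-in-time ("as of") lookups, which is what prevents training rows from seeing feature values from the future. The class and method names are hypothetical; ingestion pipelines and access control are left out:

```python
import bisect
from collections import defaultdict
from typing import Any, Hashable, List, Optional, Tuple


class TinyFeatureStore:
    """Toy append-only feature store keyed by (entity_id, feature_name)."""

    def __init__(self) -> None:
        self._times = defaultdict(list)   # key -> sorted event timestamps
        self._values = defaultdict(list)  # key -> values aligned with _times

    def ingest(self, entity_id: Hashable, feature: str, ts: float, value: Any) -> None:
        # Keep timestamps sorted even if events arrive out of order; a later
        # write with an equal timestamp is placed after the earlier one.
        key = (entity_id, feature)
        pos = bisect.bisect_right(self._times[key], ts)
        self._times[key].insert(pos, ts)
        self._values[key].insert(pos, value)

    def get_as_of(self, entity_id: Hashable, feature: str, ts: float) -> Optional[Any]:
        """Latest value with timestamp <= ts, or None if nothing existed yet."""
        key = (entity_id, feature)
        times = self._times.get(key, [])
        idx = bisect.bisect_right(times, ts) - 1
        return self._values[key][idx] if idx >= 0 else None

    def history(self, entity_id: Hashable, feature: str) -> List[Tuple[float, Any]]:
        # Every stored version, oldest first.
        key = (entity_id, feature)
        return list(zip(self._times.get(key, []), self._values.get(key, [])))


store = TinyFeatureStore()
store.ingest("user_1", "avg_order_value", ts=1, value=20.0)
store.ingest("user_1", "avg_order_value", ts=5, value=35.0)
store.ingest("user_1", "avg_order_value", ts=3, value=25.0)  # late-arriving event
print(store.get_as_of("user_1", "avg_order_value", ts=4))  # 25.0
```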
Key Offerings
- Automated Feature Generation
- Access to generated feature
- Data Privacy / Data Governance
- Data Visualization
My Thoughts
- Today, with the cloud trend, all the data (OLTP, OLAP, SQL, NoSQL) sits next to each other
- Generating reports that aggregate all sources in near real time is possible
- Some features/variables can be pulled from OLTP tables
- In a Data Lake / DW, Some of the insights would be already present in computed reports
- Metadata management, which handles data quality aspects, would already be available in the system
- ML systems will work together as part of a larger data ecosystem comprising OLTP, OLAP, SQL and NoSQL systems. A lot of feature store workloads are already handled by other pieces.
Keep Thinking!!
Labels:
Data Science,
Data Science Tips
August 21, 2020
August 19, 2020
Research paper read - Serverless inferencing on Kubernetes
Serverless inferencing on Kubernetes
Key Notes
- Uses the Knative serverless paradigm to provide a serverless machine learning inference solution
- Frameworks - MLflow, Kubeflow
- Handling multiple machine learning frameworks in a consistent manner.
- Updating running models with new versions.
- Scaling models appropriately with constraints.
- Monitoring models.
- Canaries allow users to split a small percentage of traffic to their new model
- KFServing is a project that was created within the Kubeflow project
- Transformers allow focused data transformations of the request and response from the model
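Canary rollouts split traffic by percentage between the current model and the new one. KFServing handles that split in its serving and networking layer rather than in user code, so the router below is only an illustration of the idea with made-up names: hashing the request ID gives a deterministic, roughly uniform split, so the same request (or user, if a user ID is hashed instead) always lands on the same version:

```python
import hashlib


def route(request_id: str, canary_percent: float) -> str:
    """Return "canary" for roughly canary_percent % of request IDs, else "default".

    The same request_id always maps to the same version.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, i.e. 0.01% steps
    return "canary" if bucket < canary_percent * 100 else "default"


counts = {"canary": 0, "default": 0}
for i in range(10_000):
    counts[route(f"req-{i}", canary_percent=10)] += 1
print(counts)  # roughly 1,000 canary and 9,000 default
```

A per-request random draw would also work; hashing trades that randomness for reproducibility when debugging canary results.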
Example #1
Provide Inference Location
- Create a storage initializer to download the artifacts from any popular storage (Google Cloud Storage, Amazon S3, Azure, local disk) and load them onto the server.
- Wire up networking so an endpoint is made available for inference requests
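The storage initializer's job is essentially to resolve a model URI and fetch the artifacts into a local directory that the model server then loads from. A minimal sketch that dispatches on the URI scheme; only local paths and `file://` URIs are actually handled, while remote schemes are stubs because real downloads would need the respective cloud SDKs. The function name and the set of schemes are assumptions, not KFServing's implementation:

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse

# Assumed remote schemes; real support would need cloud SDKs or an HTTP client.
REMOTE_SCHEMES = {"gs", "s3", "https"}


def _copy_local(src: Path, dest_dir: Path) -> Path:
    """Copy a local file or directory of model artifacts into dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    if src.is_dir():
        shutil.copytree(src, target, dirs_exist_ok=True)  # Python 3.8+
    else:
        shutil.copy2(src, target)
    return target


def initialize_storage(uri: str, dest_dir: str) -> Path:
    """Hypothetical storage initializer: make the artifacts at `uri` available in dest_dir.

    Assumes POSIX-style local paths.
    """
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        src = Path(parsed.path if parsed.scheme == "file" else uri)
        if not src.exists():
            raise FileNotFoundError(src)
        return _copy_local(src, Path(dest_dir))
    if parsed.scheme in REMOTE_SCHEMES:
        raise NotImplementedError(f"download from {parsed.scheme}:// is not shown in this sketch")
    raise ValueError(f"unsupported storage URI: {uri}")
```

For example, calling it with a hypothetical local directory `/models/churn/v2` and destination `/tmp/models` would copy the `v2` directory to `/tmp/models/v2` and return that path.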
Canary Location
Monitoring and explainability of models in production
Success Metrics for ML Model
1. Monitoring model performance
2. Monitoring metrics related to incoming data
3. Detecting outliers and drift
4. Explaining model predictions
Key aspects
A monitoring system requires functionality to determine when significant changes in the data and prediction distributions happen
Seldon Core provides a dedicated /send-feedback API endpoint accepting labels and performing user-defined metric calculations
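The class below is not Seldon's API; it is a hypothetical sketch of the kind of user-defined metric that labelled feedback could drive (the method name only echoes the endpoint name). It keeps both a cumulative accuracy and a sliding-window accuracy, so that a recent performance drop is not hidden by a long history:

```python
from collections import deque
from typing import Any, Optional


class FeedbackAccuracy:
    """Hypothetical feedback handler: accuracy over all feedback and over a recent window."""

    def __init__(self, window: int = 1000) -> None:
        self.total = 0
        self.correct = 0
        self.recent = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def send_feedback(self, prediction: Any, truth: Any) -> None:
        # Record whether an earlier prediction matched the label that arrived later.
        hit = int(prediction == truth)
        self.total += 1
        self.correct += hit
        self.recent.append(hit)

    def cumulative_accuracy(self) -> Optional[float]:
        return self.correct / self.total if self.total else None

    def window_accuracy(self) -> Optional[float]:
        return sum(self.recent) / len(self.recent) if self.recent else None


metric = FeedbackAccuracy(window=3)
for pred, label in [(1, 1), (0, 1), (1, 1), (1, 1)]:
    metric.send_feedback(pred, label)
print(metric.cumulative_accuracy(), metric.window_accuracy())  # 0.75 0.6666666666666666
```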
Drift Detector - The goal of the drift detector is to identify when the distribution of the requests to the deployed model starts to diverge from the training data and model predictions
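A common building block for such a drift detector is the two-sample Kolmogorov-Smirnov (KS) test, applied per feature to a reference window (for example, training data) and a window of recent requests; libraries such as Alibi Detect ship KS-based detectors. A minimal stdlib sketch for one numeric feature, using the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2). With many features you would also correct for multiple tests (e.g. Bonferroni):

```python
import math
from typing import Sequence, Tuple


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(reference), sorted(current)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both ECDFs past every value equal to x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def detect_drift(reference, current, alpha: float = 0.05) -> Tuple[bool, float, float]:
    """Return (drift?, KS statistic, critical value) using the asymptotic KS threshold."""
    n, m = len(reference), len(current)
    d = ks_statistic(reference, current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.36 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d > critical, d, critical


train = [i / 100 for i in range(100)]       # reference window
live = [0.5 + i / 100 for i in range(100)]  # shifted recent requests
print(detect_drift(train, live)[0])  # True
```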
Model Monitoring - request and response payloads can be sent to a Knative broker, which can farm them out as desired via programmable triggers to serverless components such as outlier, drift and adversarial detectors
More Reads - MinIO - High-performance object storage
Keep Thinking!!!
Labels:
KFServing,
Research Papers
Research Paper Reads - MODELING USERS FOR ONLINE ADVERTISING
Paper #1 - MODELING USERS FOR ONLINE ADVERTISING
Key Notes
- Contribution - a neural network model (app2vec) to vectorize mobile apps by studying how users employ these apps
- User activity data
- User behaviors
- Logging user activities
- Contents consumed by users
- Anonymous browser cookie syncing technique
- Targeting audiences
- User profiling
- Ads based on their activity history across the web
- Users watching polymorphic videos are likely to have similar interests
- US mobile users download more than eight apps per month on average
- 90% of the time spent on mobile devices was spent using apps
- Data - users' browsing, app usage and other activities on the Internet
- Targeting - site/page context, placement size, user behavior and geolocation
Publishers, Advertisers, Ad-networks, Online users
Research Directions
- Cross-device user tracking - Users access online content through multiple devices
- Value of user profiles - different costs are associated with them; ad targeting based on user profiles
Do ads target user profiles in the field?
What are the ads shown to different users?
How do ads impact users' profiles?
Data - The capability to gather display ads and video ads from across the web is central to our work
Profile-driven crawling - Enables each crawler instance to interact with the ad ecosystem as though it were a unique user with particular characteristics.
The Anatomy of Online Advertising
- Advertisers - Advertisers reach out to potential customers.
- Publisher View - premium campaigns (specific advertisers, ad networks, ad exchanges)
Video ads - Pre-roll, mid-roll, post-roll, Overlay ads, Sponsored videos
User Modeling on Mobile
- app2vec to represent apps in a vector space without a priori knowledge of their semantics
- app2vec to cluster apps based on app distances in their vector space
- App similarity can also be computed with the bag-of-words method using app meta-information
- A simple similarity-based look-alike system can use direct user-to-user similarity to search for users that look like (in other words, are similar to) the seeds
- Another type of look-alike audience systems for online advertising is built with Logistic Regression (LR)
- User segments can be user characteristics such as user interest categories.
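A minimal sketch of the logistic-regression flavour of look-alike modelling: train a classifier to separate seed users (label 1) from non-seed users (label 0) on user-segment features, then rank unknown users by predicted probability. The binary segment features and the tiny data set are hypothetical, and plain batch gradient descent stands in for whatever solver a production system would use:

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_lookalike_lr(X: List[Sequence[float]], y: List[int], lr: float = 0.5,
                       epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log loss (no regularisation)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # d(log loss) / d(logit)
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Predicted probability that x "looks like" a seed user.
    return sigmoid(dot(w, x) + b)


# Hypothetical binary segments: [sports_fan, frequent_traveller, finance_reader]
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0],   # seeds (converted users)
     [0, 0, 1], [0, 1, 1], [0, 0, 1]]   # non-seeds
y = [1, 1, 1, 0, 0, 0]
w, b = train_lookalike_lr(X, y)
print(round(score(w, b, [1, 0, 0]), 2), round(score(w, b, [0, 1, 1]), 2))
```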
Real-time Attention Based Look-alike Model for Recommender System
Key Notes
- Real-time attention based look-alike model (RALM) for recommender systems
- Deep neural networks (DNNs) and recurrent neural networks (RNNs) are more and more popular on recommendation task
- "Matthew effect" - already-popular content keeps getting recommended, leading to low quality and poor diversity of recommended content.
RALM
- RALM is a similarity-based look-alike model, which consists of user representation learning and look-alike learning
- Deep interest network for multi-field user interest representation learning
- Local representation of seeds should be processed online in real-time
- k-means clustering to partition seeds into k clusters
- Similarity-based methods determine the similarity between seeds and users based on a distance measurement.
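RALM's actual attention units and embeddings are learned end to end; the sketch below keeps only the shape of the idea. Seed users are grouped with k-means (initialised from the first k seeds, an assumption made for reproducibility), and a target user's look-alike score is the cosine similarity between the user and an attention-weighted mix of the seed centroids, where the weights are a softmax over dot products with the user. All vectors are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def kmeans(points: List[Sequence[float]], k: int, iters: int = 10) -> List[List[float]]:
    """Plain Lloyd's k-means, initialised with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_seed_vector(user: Sequence[float],
                          centroids: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Weight each seed centroid by softmax(user . centroid) and mix them."""
    weights = softmax([dot(user, c) for c in centroids])
    mixed = [sum(w * c[d] for w, c in zip(weights, centroids)) for d in range(len(user))]
    return mixed, weights


def lookalike_score(user: Sequence[float], centroids: List[List[float]]) -> float:
    seed_vec, _ = attention_seed_vector(user, centroids)
    return cosine(user, seed_vec)


# Hypothetical 2-d seed embeddings forming two interest groups.
seeds = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
centroids = kmeans(seeds, k=2)
print(lookalike_score([1.0, 0.0], centroids), lookalike_score([-1.0, 0.0], centroids))
```

The point of the attention step is that each user is compared mostly against the seed clusters closest to their own interests, rather than against one global seed average.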
System Architecture
Offline Training
- User representation learning - the user representation model is based on a deep neural network
- Look-alike learning is based on attention model and clustering algorithm
- User feedback monitor: The audience extension system updates the seeds of candidates through monitoring the click behaviors of all WeChat users in real-time
- Online serving - the look-alike model predicts the global embedding of seeds through a global attention unit
Metrics
- CTR (Click-through Rate): As the audience grows, many new users who share the same interests as the seeds are reached. Therefore, CTR is expected not to decrease
- Category & Diversity. One of our purposes is enriching users' interests in our system, so we define a metric named diversity. It is represented by the number of content categories or tags a user has read in a day. With a more comprehensive user representation, more kinds of content will be reached, and category and tag diversity is expected to increase
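The diversity metric as defined here is a per-user, per-day count of distinct categories or tags. A minimal sketch, assuming read events arrive as (user, day, tags) tuples (a hypothetical format):

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Hashable, Iterable, Sequence, Tuple

ReadEvent = Tuple[Hashable, date, Sequence[str]]  # (user_id, day, tags of the item read)


def daily_diversity(events: Iterable[ReadEvent]) -> Dict[Tuple[Hashable, date], int]:
    """Distinct categories/tags read by each user on each day."""
    seen = defaultdict(set)
    for user_id, day, tags in events:
        seen[(user_id, day)].update(tags)
    return {key: len(tags) for key, tags in seen.items()}


def mean_diversity(events: Iterable[ReadEvent]) -> float:
    # Average over all (user, day) pairs that have at least one read event.
    per_user_day = daily_diversity(events)
    return sum(per_user_day.values()) / len(per_user_day) if per_user_day else 0.0


d1, d2 = date(2020, 8, 19), date(2020, 8, 20)
events = [
    ("u1", d1, ["tech", "sports"]),
    ("u1", d1, ["tech"]),
    ("u1", d2, ["finance"]),
    ("u2", d1, ["travel", "food", "tech"]),
]
print(daily_diversity(events)[("u1", d1)], mean_diversity(events))  # 2 2.0
```

Comparing this average before and after audience expansion is one way to check whether diversity actually increases.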
Comprehensive Audience Expansion based on End-to-End Neural Prediction
Keep Thinking!!!
Labels:
Advertising,
Research Papers
August 14, 2020