"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 18, 2021

Good Reads - Building Usecases - NLP & ML

Sometimes we need the right set of tools and patterns to build our ideas. Bookmarking some interesting links.

Keep Thinking!!!


Daily observations on Model building and Feature engineering

Reality is far from a well-structured list of features. Some things we discuss on a daily basis:

  1. The feature is not available in the transaction table, so we ask business users to provide the data
  2. The data currently lives in Excel, and we need a process to streamline it
  3. The feature data is not captured today but will be in the future
  4. The model overfits because this feature data is not currently captured. Can we leverage or infer it from existing features?
  5. The spike in sales has a cause that is not currently captured

The cycle of #Why? #Find? #Add feature data, then iterate, is the real crux of learning. Before we talk about feature stores, we need to know, collect, and understand all features and get them under one roof.
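
One way to start getting features under one roof is a plain inventory that records where each wanted feature lives today. The sketch below is only an illustration: the `FeatureRecord` class, the status labels, and the feature names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical status labels mirroring the daily observations above.
IN_TABLE = "in_transaction_table"
IN_EXCEL = "in_excel"
NOT_CAPTURED = "not_captured_yet"


@dataclass
class FeatureRecord:
    name: str
    status: str
    owner: str  # who can provide or explain the data


def summarize(inventory):
    """Count features per status so gaps are visible before modeling."""
    return Counter(record.status for record in inventory)


def blockers(inventory):
    """Names of features that cannot be read straight from the transaction table."""
    return [record.name for record in inventory if record.status != IN_TABLE]


# Made-up example inventory.
inventory = [
    FeatureRecord("unit_price", IN_TABLE, "data team"),
    FeatureRecord("promo_calendar", IN_EXCEL, "business users"),
    FeatureRecord("store_footfall", NOT_CAPTURED, "business users"),
]

print(summarize(inventory))
print(blockers(inventory))
```

Reviewing the blocker list every month keeps the conversation about missing data concrete.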

#datascience #features #perspectives #learning

Domain + Data + Algo: only connecting and collecting all of them in a consistent, repeatable way, with the right data every month, gets consistent results :)

December 17, 2021

What do I do in my work?

Do I code all day? No

When do I code? When I pick up an idea to build, put together a prototype, or analyze issues and data-related observations

What do I do in my work? Clear red flags, discuss, review, and recommend based on literature reads, ML ideas, and techniques relevant to the context

What are my strengths? Data, Domain, and then ML. Seeing everything through a blended lens matters when we need to map customers, solutions, and timelines against each other

What do I read? You need a lot of ideas to give quality review comments: competitive products, algorithms at work, arXiv papers, domain-related reads, tech blogs. You need to build an idea repository

Sometimes I feel I am busy but not productive. Sometimes weekends provide the window to learn things. Product, tech, and customer perspectives come with empathy, understanding, and technical acumen.

Keep going!!!

December 14, 2021

TIME Person of the Year 2021

Success in one industry vs success in every industry

Strategy, Guts, Hard work, Consistency, Vision = Elon


December 10, 2021

Career Stages, Perspectives, Growing together

  • Sprints are very time-bound, and coming up with an MVP needs quick experiments, spotting the blocks between vision and implementation, and working as a team to stay connected from Day 1 to MVP to building blocks.
  • This needs a good mix of experienced folks to connect domain and data, and to communicate the technical vision to the development team.
  • It is a balancing act between engineering aspects such as functionality and scalability in early design and making it work operationally in minimal time.
  • It is easier said than done, but that is where the crux of experience lies: working with clarity in a chaotic situation.
  • Communication is essential with both stakeholders and the team.
  • Bringing the best, being able to contribute, clearing red flags within the team, and collaborating effectively with clients are the mark of a mature leader.

From link

Key points I like

  • Don't fall for the hype without first conducting a production-grade proof of concept
  • There should be no single point of failure in an app; always have a fallback and thoroughly test it
  • Seniority is defined by collaboration, not technical expertise.
  • Measure everything - each and everything
  • Create your own dashboard for operational and technological excellence.
  • Make sure tech is building with business metrics in mind.

#EngineeringLeadership



December 05, 2021

Data Science Skills

  • Domain - When we had the best algos but accuracy was poor, we figured out which domain aspects we had missed
  • Algos - Some advanced algos, such as Transformer-based models, did give us a good baseline, and their latest improvements do help. The best algo plus domain knowledge to find relevant variables is key to success.
  • Data Engineering - Incorporating, designing, and managing features takes time and has to be planned for the future, not done one feature at a time
  • Communication - Explaining why a feature is needed and how it connects to the big picture matters

A data science use case has multiple stakeholders. Every point listed above helps bring everyone along and address the clarifications that arise during implementation.

Keep Thinking!!!

November 28, 2021

Not everything is the same - Perspectives and Clarity matter

20 years of __________________________________________

  • 20 years of experience = 20 years of the same project / different projects?
  • 20 years of experience = 20 years of the same role / multiple roles?
  • 20 years of experience = 20 years of services / product building?
  • 20 years of experience = 20 years of 9-6 or 9-12?
  • 20 years of experience = How many endless weekends / production go-lives?
  • 20 years of experience = How many learning migrations across skills / domain / data?
  • Titles vs experience vs expertise vs being aware of your true self: all of it matters

With experience

  • Balance both journey and current tasks
  • Code to convince someone this is what I meant
  • Code to unblock/find next steps
  • Code to validate this idea works
  • Prototype to share this is feasible
Young folks need time to trust. More than experience alone, connecting with them through skills, code, and experience matters.

Keep Thinking!!

November 10, 2021

Zillow Machine Learning Fallout

Good read - Link

Machine learning is no silver bullet if you do not consider domain, data, and changing environmental factors. This story flags a classic case of missing domain knowledge.

  • Zillow does real estate - selling, buying, renting, and financing
  • Zillow's home value estimation models failed.
  • Assumption - housing prices would continue to climb without interruption at a stable rate
  • The domain experts warned of issues with the predictions.
  • The business went ahead anyway. In the end, it bombed

Lessons

  • Treat domain expert warnings as a Go / No-go input for production, not just model accuracy
  • Learn from and incorporate data changes to understand changing trends
  • Run A/B experiments to understand customer behavior and pick optimal values based on outcomes
  • Manage models and features better: keep improving features and incorporate external factors based on domain expert perspectives

#machinelearning #technology #datascience #domainknowledge

Another good read - Zillow, Prophet, Time Series, & Prices


WHY IS INTERMEDIATING HOUSES SO DIFFICULT? EVIDENCE FROM IBUYERS

  • Predict that households’ willingness to pay for liquidity is highest in those markets
  • Sophisticated algorithmic pricing

My Perspectives

  • I love the housing.com approach to rank an area based on amenities, wellness, connectivity
  • Plus a pricing range based on amenities and facilities provided
  • Plus growth potential / Availability
  • Demand vs Supply

A combination of these would suggest a recommended price that a domain expert could adjust based on other external factors. ML is a guideline, not a blind predictor.
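
A rough sketch of how such a combination could look in code. Everything here is an assumption made for illustration: the function name, the equal weighting of area and growth scores, and the 10% cap on the demand/supply nudge are not taken from housing.com or from any real pricing model.

```python
def recommend_price(base_low, base_high, area_score, growth_score,
                    demand_supply_ratio, expert_adjustment=0.0):
    """Blend hypothetical signals into a suggested price.

    base_low, base_high: price range implied by amenities and facilities.
    area_score, growth_score: values in [0, 1]; higher is better.
    demand_supply_ratio: demand divided by supply; 1.0 means balanced.
    expert_adjustment: fractional override from a domain expert, e.g. -0.05.
    """
    # Position inside the amenity-based range (equal weights are an assumption).
    position = 0.5 * area_score + 0.5 * growth_score
    model_price = base_low + position * (base_high - base_low)
    # Nudge by market tightness, capped at +/-10% (an arbitrary cap).
    pressure = max(-0.10, min(0.10, 0.10 * (demand_supply_ratio - 1.0)))
    model_price *= 1.0 + pressure
    # The domain expert has the final say on top of the model guideline.
    return model_price * (1.0 + expert_adjustment)


# Made-up inputs: a balanced market and a cautious expert.
print(recommend_price(100.0, 200.0, 0.5, 0.5, 1.0, expert_adjustment=-0.1))
```

The expert adjustment is applied last on purpose, so the model output stays a guideline that a person can override.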

Keep Thinking!!!

November 09, 2021

Leaf Classification

Paper #1 - Plant identification using deep neural networks via optimization of transfer learning parameters

Key Notes

  • ImageNet - 1.2 million labeled images across 1,000 categories, roughly 1,200 per class
  • LifeCLEF 2015 - 91,758 labeled images of different plant organs (e.g. flowers, fruits, leaves, and stems) from 1,000 classes, about 92 per class
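
The per-class figures are simple averages. A quick check, using only the totals quoted in the notes above (the 1.2 million figure is the rounded one from the notes):

```python
# (total labeled images, number of classes) as quoted in the notes above.
datasets = {
    "ImageNet": (1_200_000, 1_000),
    "LifeCLEF 2015": (91_758, 1_000),
}


def images_per_class(total_images, num_classes):
    """Average number of labeled images available per class."""
    return total_images / num_classes


for name, (total, classes) in datasets.items():
    print(f"{name}: {images_per_class(total, classes):.1f} images per class")

ratio = images_per_class(*datasets["ImageNet"]) / images_per_class(*datasets["LifeCLEF 2015"])
print(f"ImageNet has about {ratio:.0f}x more images per class")
```

With an order of magnitude fewer images per class, training from scratch is hard, which is the motivation for transfer learning in this paper.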

Parts of Plant

  • Branch 
  • Entire 
  • Flower 
  • Fruit 
  • Leaf 
  • LeafScan 
  • Stem 
  • Overall

  • Increasing the batch size from 20 to 60 improves the overall accuracy
  • 80 patches for data augmentation

Paper #2 - Multi-Organ Plant Classification Based on Convolutional and Recurrent Neural Networks

Key Notes

  • Feature engineering approaches such as Scale-Invariant Feature Transform (SIFT), Bag of Words (BoW), Speeded-Up Robust Features (SURF), Gabor, and Local Binary Pattern (LBP) are the most generally used features to distinguish leaves of different species
  • Hybrid generic-organ convolutional neural network, abbreviated HGO-CNN
  • Three different sizes: 256, 384 and 512
  • Crop 256 × 256 center pixels
  • Multi-Scale Plant Images Generation
  • During network training, 224 × 224 pixels are randomly cropped from the rescaled images and fed into the network
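
The multi-scale generation and random cropping described in the notes can be sketched as follows. This is a minimal illustration with NumPy, not the paper's pipeline: the nearest-neighbor square resize and the helper names are assumptions, and a random array stands in for a plant image.

```python
import numpy as np

SCALES = (256, 384, 512)  # rescaled sizes from the notes
CROP = 224                # training crop size from the notes


def resize_nearest(image, size):
    """Resize an H x W x C array to size x size with nearest-neighbor sampling.

    Square resizing that ignores aspect ratio is a simplifying assumption.
    """
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols]


def random_crop(image, crop, rng):
    """Cut a crop x crop window at a uniformly random position."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return image[top:top + crop, left:left + crop]


def multi_scale_crops(image, rng):
    """One random 224 x 224 crop per rescaled copy of the image."""
    return [random_crop(resize_nearest(image, s), CROP, rng) for s in SCALES]


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)  # stand-in image
crops = multi_scale_crops(image, rng)
print([c.shape for c in crops])
```

The larger the rescaled image, the smaller the fraction of the plant a 224 × 224 crop covers, so the network sees the same organ at several levels of detail.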


Keep Exploring!!!

November 04, 2021

Face Swapping - Research Reads

Paper #1 - FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping

Key Notes

  • Early replacement-based works simply replace the pixels of the inner face region
  • GAN-based works have illustrated impressive results
  • FaceShifter uses a GAN-based network, named Adaptive Embedding Integration Network (AEI-Net), to generate a high fidelity face swapping result
  • DeepFakes and FSGAN both follow the strategy of first synthesizing the inner face region, then blending it into the target face

Paper #2 - Face Swapping: Automatically Replacing Faces in Photographs



Paper #3 - Face Detection, Extraction, and Swapping on Mobile Devices

The Face Swap algorithm consists of five main steps:

  • Viola-Jones face detection using Haar-like features [1]
  • Active Shape Model fitting [4]
  • Face rotation
  • Skin-tone matching
  • Smoothing using Laplacian Pyramids [2]

The Viola-Jones face detection uses an OpenCV library [5] to detect faces from a frontal view.

Figures: Laplacian Pyramid for face 1; Laplacian Pyramid for face 2; Laplacian Pyramid after swapping; final collapsed pyramid; image blending example.

Related tools

  • faceswap-GAN
  • FaceSwap
  • Faceswap Dev
  • Deepfake Faceswap
  • DeepFake Tools
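
The Laplacian pyramid smoothing step from Paper #3 (build a pyramid per face, swap or mix the levels, then collapse) can be sketched with NumPy. This is a simplified stand-in, not the paper's implementation: it uses 2x2 averaging and pixel repetition in place of the Gaussian filtering of a classic Laplacian pyramid, works on grayscale arrays whose sides are divisible by `2 ** levels`, and all function names are made up for illustration.

```python
import numpy as np


def downsample(img):
    """Halve height and width by averaging 2x2 blocks (stand-in for blur + subsample)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double height and width by repeating pixels."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def laplacian_pyramid(img, levels):
    """Detail layers from fine to coarse, followed by the coarsest residual."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    pyramid.append(current)
    return pyramid


def collapse(pyramid):
    """Rebuild an image by upsampling from the coarsest level and adding details back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current


def blend(face_a, face_b, mask, levels=3):
    """Mix two images level by level; mask is 1.0 where face_a should show."""
    pyr_a = laplacian_pyramid(face_a, levels)
    pyr_b = laplacian_pyramid(face_b, levels)
    masks = [mask.astype(float)]
    for _ in range(levels):
        masks.append(downsample(masks[-1]))  # mask at each pyramid resolution
    mixed = [m * a + (1 - m) * b for a, b, m in zip(pyr_a, pyr_b, masks)]
    return collapse(mixed)


rng = np.random.default_rng(0)
face_a = rng.random((16, 16))  # stand-ins for two aligned grayscale faces
face_b = rng.random((16, 16))
mask = np.zeros((16, 16))
mask[:, :5] = 1.0              # take the left strip from face_a
blended = blend(face_a, face_b, mask)
print(blended.shape)
```

Blending per level means coarse structure is mixed over a wider area than fine detail, which is what hides the seam in the collapsed result.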

More Reads

Keep Exploring!!!