"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 28, 2023

The Phase of Be Aware - When Enough Might be Enough

Motivation and Consistency

1. Identifying Your Most Powerful Levers: This involves understanding your strengths and weaknesses and identifying the strategies that most effectively push you towards your goals. A lever could be a particular skill, a set of practices, a mindset, or a network of supportive people. Identifying these levers is crucial because they offer the greatest potential for advancement and improvement.

2. Doubling Down on What Moves the Needle: Once you have identified your most effective lever, double down on it to maximize your performance. This means investing your time, effort, and resources in the activities, tasks, or strategies that have proven to produce significant progress.

Willingness to go the extra mile

3. Never Stop Learning: Learning is an ongoing process, and with the rapid advancement in every field, it swiftly becomes evident that there is always something new to learn. Keeping up with the latest advances in your area of interest, attending webinars or seminars, and studying new skills can maintain the brain's resilience and adaptability, contributing to your ongoing success.

4. Anchor Your Self-Assessment in Reality, Not Assumption: It is easy to deceive oneself about personal progress or skills. Instead of making assumptions based on personal perceptions, always refer to concrete data or feedback for self-assessment. This could involve seeking out other people’s opinions, comparing your performance metrics to benchmarks, or looking for measurable signs of improvement or progression. It helps keep a grounded and realistic view of your capabilities and progress, allowing for more effective planning and decision-making.

Caught in the Loop of people/process

5. Be Aware When Enough Might be Enough: Always striving for more and setting higher goals can be motivating, but you should also be aware of your capacity and avoid burnout. This may involve managing your time efficiently, knowing when to say no, and setting realistic boundaries and expectations. Also, it’s necessary to recognize when a specific strategy or project has reached its maximum potential and any more investment of time or resources would give diminishing returns. 

  • Sometimes you have to stop earning, to start learning
  • Clocking 10 hours online vs. Reading/coding 4 hours without interruptions
  • Spending your energy on learning vs. convincing people to learn
  • Sharing information with, and convincing, people at every level
  • Getting into conversations that drain both energy and purpose
  • Not aligned with the right set of values / Things built on purpose or built to work with limited functionality

Keep Exploring!!!

August 25, 2023

GPT to improve Product Search

Prompt - Return attribute, value coreference for below query <Return all black shoes with prices less than 50 dollar>

  • Attribute: Color, Value: Black
  • Attribute: Price, Value: Less than 50 dollars
  • Attribute: Product type, Value: Shoes

Return attribute, value coreference for below query <Cricket is interesting game, India is in Top 10 cricket playing countries>

  • Attribute: Game, Value: Cricket
  • Attribute: Rank, Value: Top 10
  • Attribute: Country, Value: India
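
The prompt and responses above can be wired into a search filter with a small amount of glue code. This is a minimal sketch, assuming the model keeps answering in the "Attribute: X, Value: Y" line format shown above; build_prompt and parse_pairs are hypothetical helper names, and the actual call to the model is left out.

```python
import re

# Prompt wording copied from the note above; only the query changes.
PROMPT_TEMPLATE = "Return attribute, value coreference for below query <{query}>"

# Matches lines such as "• Attribute: Color, Value: Black".
_PAIR = re.compile(r"Attribute:\s*(?P<attr>[^,]+),\s*Value:\s*(?P<value>.+)")


def build_prompt(query: str) -> str:
    """Wrap a user search query in the extraction prompt."""
    return PROMPT_TEMPLATE.format(query=query)


def parse_pairs(response: str) -> dict[str, str]:
    """Turn the model's bullet list into an attribute -> value mapping."""
    pairs: dict[str, str] = {}
    for line in response.splitlines():
        match = _PAIR.search(line)
        if match:
            pairs[match.group("attr").strip()] = match.group("value").strip()
    return pairs


# Response text taken from the first example above.
sample_response = """\
  • Attribute: Color, Value: Black
  • Attribute: Price, Value: Less than 50 dollars
  • Attribute: Product type, Value: Shoes
"""
filters = parse_pairs(sample_response)
print(build_prompt("Return all black shoes with prices less than 50 dollar"))
print(filters)
```

The parsed dictionary can then be mapped onto whatever facets the product catalogue supports. Lines that do not follow the expected format are skipped rather than guessed at.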

Keep Exploring!!!

August 24, 2023

Vision - Virtual Try on - Tryonexample with InstructPix2Pix

InstructPix2Pix: Learning to Follow Image Editing Instructions

  • To obtain training data for this problem, we combine the knowledge of two large pretrained models: a language model (GPT-3) and a text-to-image model (Stable Diffusion)
  • Instruction-based image editing as a supervised learning problem
  • An approach that combines two large pretrained models, a large language model and a text-to-image model, to generate a dataset for training a diffusion model to follow written image editing instructions

Input Image

Prompt for Background
Prompt for Clothing



Keep Exploring!!!

August 23, 2023

Extract Reviews with Curl Command

 Google Chrome + Inspect Option 


Keep Exploring!!!

August 21, 2023

GenAI - Lessons getting things to production

  • Exploratory Analysis: Use case identification
  • Investigative Analysis: Data Exploration and Use Case Identification
  • Leadership Buy-in: Presentation, Discussion, Understanding the Prospects and Advantages of GenAI
  • Impactful Use Cases: Prototype use cases with available data. Select a handful of use cases for immediate progression to production
  • Hands-on Session: Educate business teams on GenAI and its opportunities
  • Development: Develop the identified use cases and fast-track them to production
  • Evaluate: Test in a pre-production environment and get aligned on the results
  • Move selected use cases into production
  • Automating Current Manual Workflows: Implement the full end-to-end production architecture, automating some of the current manual processes

Continue the Iteration Process for Constant Progress!

Keep Exploring!!!


August 20, 2023

GPT and token count

One token is roughly 4 characters of common English text.
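
The 4-characters rule of thumb is enough for a quick budget estimate before calling the API. A minimal sketch, assuming the rule holds for the text in question; estimate_tokens and estimate_cost are hypothetical helpers, and price_per_1k_tokens is a placeholder to be filled in from the pricing page. A real tokenizer will give different counts, especially for code or non-English text.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the note above, not an exact figure


def estimate_tokens(text: str) -> int:
    """Rough token count: characters divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(text: str, price_per_1k_tokens: float) -> float:
    """Rough cost for a text, given a price per 1,000 tokens (placeholder value)."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens


prompt = "Return all black shoes with prices less than 50 dollar"
print(estimate_tokens(prompt))
```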

Tokenizer


Pricing costs Links


Keep Exploring!!!

Plant Identification Using Convolution Neural Network and Vision Transformer-Based Models

Recently, my team published a vision paper, providing valuable insights and lessons which will benefit our future work. Here I highlight those key experiences and challenges: 

  • First, we grappled with open-ended questions in our problem statement, requiring us to think critically and flexibly.
  • Second, we used past experiences, research approaches, and current vision models to craft our unique approach for this paper.
  • Third was the experimentation phase, which we had to analyze, timebox, and finalize.
  • We also faced data challenges, drawing inspiration from similar research papers to overcome this hurdle.
  • An important achievement for us was reaching state-of-the-art accuracy in our findings.
  • We considered the scalability of our approach, contemplating how it can be implemented as we include multiple categories/classes.
  • Focus was directed toward developing a repeatable architecture and effectively capturing feedback for continuous improvement.
  • A significant portion of our time was dedicated to extensive documentation, conducting numerous experiments, and evaluating metrics.
  • We navigated through the publication process, ensuring our work reached the right platforms.
  • Lastly, we sought collaboration with like-minded clients, with whom we could work on making our learning reusable.

This experience has been thoroughly enriching for our team, and we remain excited about the journey ahead.

Keep Learning!!!

August 19, 2023

Aspect-based sentiment analysis - NLP

Aspect-based sentiment analysis identifies the different aspects of a given topic and the sentiment expressed toward each one. TextBlob can help calculate the polarity of those sentiments.
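
To make the idea concrete, here is a minimal sketch of the aspect-then-polarity flow. It is an assumption-heavy toy: the aspect keywords and the polarity lexicon are invented for illustration, and the hand-written polarity function only stands in for a real scorer such as TextBlob so that the example runs without any dependencies.

```python
import re

# Hypothetical aspect keywords and polarity lexicon, invented for illustration.
ASPECTS = {"battery": ["battery", "charge"], "screen": ["screen", "display"]}
LEXICON = {"great": 1.0, "good": 0.5, "poor": -0.5, "terrible": -1.0}


def polarity(sentence: str) -> float:
    """Average lexicon score of the words in a sentence (stand-in for a real scorer)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def aspect_sentiment(review: str) -> dict[str, float]:
    """Score each sentence, then average the scores per aspect it mentions."""
    results: dict[str, list[float]] = {}
    for sentence in re.split(r"[.!?]+", review):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        for aspect, keywords in ASPECTS.items():
            if words & set(keywords):
                results.setdefault(aspect, []).append(polarity(sentence))
    return {aspect: sum(s) / len(s) for aspect, s in results.items()}


review = "The battery is great. The screen is terrible!"
scores = aspect_sentiment(review)
print(scores)
```

Splitting by sentence is the simplest way to attach a polarity to an aspect. It breaks down when one sentence talks about two aspects with opposite sentiment, which is where the other items in this note (co-reference resolution, context exclusion) come in.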






  • context exclusion
  • intent extraction
  • co-reference resolution
  • sentiment analysis

Keep Exploring!!!

August 12, 2023

Google QnA - vs OpenAI + Pinecone

Thanks to Pradeep's blog, which is well documented. I had to make a few changes to get things working.

Changing the index dimension to 1536 to match the size of the OpenAI embedding results
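
The dimension matters because a vector index expects every vector to have the length it was configured with. A minimal sketch of that check, assuming plain Python lists stand in for embeddings; validate_embedding and cosine_similarity are hypothetical helpers written for illustration, not functions from the OpenAI or Pinecone clients.

```python
import math

INDEX_DIMENSION = 1536  # value from the note above


def validate_embedding(vector: list[float], dimension: int = INDEX_DIMENSION) -> list[float]:
    """Fail early if an embedding does not match the index dimension."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(vector)}")
    return vector


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, a common metric for comparing embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# A dummy vector of the right length passes; a shorter one would raise ValueError.
validate_embedding([0.0] * INDEX_DIMENSION)
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```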



I was looking for a similar offering from Google; this post was useful.





More Reads - Link

Seems everyone is catching up in the LLM race!!!

The LLM Survey Report from the MLOps Community points out the issues clearly

Landscape of GenAI

The challenges in LLMs


What options to leverage and how can we build solutions?


Ref - Link

Keep Exploring!!!


Google Vertex Image Classification Step by Step

Summary notes from Vertex AI Classification task

1. Setting up Experiment Classification - Multiple Labels


2. Upload data and label the dataset. This needs to be automated; for learning purposes, I experimented with the upload option


3. Huge files do take time to upload



4. Once uploaded, create labels for the files. Label creation and mapping are done here, and the dataset is now labelled




5. Training model, Setting up configuration options



6. Accuracy vs Latency Tradeoff 

7. Hardware allocation 

8. Dataset Mapping



9. Training Results and Summary



Happy Learning!!!

What is the difference between Experience vs Expertise

To differentiate them, we need to understand the biases below in our decisions, perspectives, and views

Confirmation Bias - This refers to interpreting new information in a way that confirms our pre-existing beliefs. For instance, if you have knowledge of SQL Server, you may believe that the index storage patterns in NoSQL are similar. While there can be similarities, it's crucial to be aware of the differences as well.

Misconceptions of Skills - Often, we mistake awareness of technology as expertise. Merely being able to compile and produce output doesn't necessarily mean understanding how it works or its intricate mechanics.

Halo Effect - This is when you either completely like or dislike everything about a person or thing, with no middle ground. Judging a technology without a thorough understanding of it is an example of this effect.

Intellectual Humility - Acknowledging what you don't know is the dawning of wisdom.

WYSIATI (What You See Is All There Is) - Here, you cannot consider what you do not know. It's about having a balanced view versus a mindset of 'I know it'.

Consequences of this

  • We have a larger number of data scientists who possess a basic understanding of a variety of domains, rather than deep expertise in a few domains. 
  • Moreover, we have more data scientists who are algorithm-oriented compared to those who can skillfully blend algorithms, data, and common sense. 
  • The absence of awareness in domains and data will only result in solutions similar to those found on platforms like Kaggle, which don't necessarily meet business requirements.
  • Preparation - No amount of preparation is enough to face the customer; every possible scenario needs to be tested. The only thing that matters is the correctness of your analysis

Keep Thinking!!!

August 08, 2023

Bridging Data Science Theory vs Practice, Practice Domain + Data blended learning

  • Spending more time on data, fields, insights
  • Break down the problem statement into multiple smaller, manageable parts
  • Provide deeper details of each sub-task/approach
  • Verbal communication with actionable outputs to customers
  • Outlining problems vs. probing the root cause, implications, and possible solutions
  • Be proactive in calling out issues and discussing options rather than waiting for them to surface
  • Track work planned vs. actuals, and spot where plans are missing
  • Own up to mistakes; plan to avoid repeating patterns of issues

Theory - I know this algorithm will solve the problem
Skills - I can generate output, and I have a view of the solution
Expertise - This will work for these cases and fail for these cases; these patterns need to be considered

Sometimes - The person who tells the opinion is more important than the opinion



Keep Exploring!!!

August 06, 2023

Myth of productivity / Effort / Deadlines / Creativity

Loved the key points in this write-up. The ones that align with my views are reposted here

  • Doing “something” is not necessarily better than doing nothing.
  • The pressure of deliverables kills creativity and time
  • Lead someone on a project and be led by someone on some other projects. 
  • If there is one person who always dominates and talks over you, I actively avoid that person
  • Hours do not correlate with productivity, Meaningless calls/status reports are only energy drainers
  • Set your boundaries and make them clear to everyone. Take breaks and focus on your life
  • Aspire to maintain the highest standard in everything: better presentations, crisp answers, and one-liner replies
  • Talk slowly and clearly. Use precise words
  • Take on some challenging work/teaching; it will help you get crisper on concepts
  • Becoming a better presenter only comes from trying it out: practicing with larger audiences and accepting harsh criticism
  • Accept you are not special; people can hate you for no reason
  • Do something mindless to detach
  • Stop the bad habit of checking for notifications after hours.
  • Collect ideas, Read more relevant content, Revise, Relook at options, Read - Pause - Reflect for solutions
Good reads - Link

Keep Walking!!!





August 05, 2023

Good vs Bad prompting practices

Okayish one

Your task is to - Write a one liner to summarize batman movie

You should write in positive tone

You should write about the positive intent of movie

You should make it inspiring note

You should write in less than 400 characters

Result - "Batman: A captivating and inspiring tale of resilience and justice, portraying a fearless vigilante driven to protect his city and battle the forces of corruption and evil, reminding us that everyone can make a difference."


Good one

Your task is to - Write a one liner to summarize batman movie

You should write in positive tone, positive intent with inspiring words and total length less than 200 words

Result - "An inspiring and thrilling journey of Bruce Wayne's transformation into Batman, a symbol of justice and hope that courageously battles the pervasive darkness in Gotham City."

  • Word limits work better than character limits
  • A single condensed instruction works much better than a series of additive instructions
I observe a lot of fluffy descriptions


Ref - Link

Keep Prompting!!!


August 01, 2023

Code Review pointers

  • Purpose of the code, the approach taken, and any specific areas that need attention
  • Error and Exception Handling: Look for proper error and exception handling for robustness. The application should be able to survive erroneous situations.
  • Separating source code, tests, and configuration files neatly.
  • Separate Data Preprocessing, Modeling, and Evaluation Code: It’s crucial to separate stages of the development process into distinct steps. This allows both modularity and an easier debugging experience, as issues can be located more swiftly in well-defined sections.
  • Design a Configurable Pipeline: A pipeline that can be configured allows for easy adjustments to be made and allows various models, data, and preprocessing steps to be swapped out with one another efficiently.
  • Breaking up a configuration into separate classes can increase modularity. Each config class can be responsible for one part of the application's configuration. This makes the code easier to read and maintain.
  • How easy is it to test? How standalone is it?
  • Key approaches applied / Do's and don'ts
  • Is it at a level where people can come in and contribute? Is it open to the DS forecast tracking people?
  • Basics needed to work on it
  • Basics needed to understand it / Any references/patterns to cross-check
  • Does it have good enough documentation/testing 
  • Have we implemented for one DB, or have we implemented for others to follow the pattern?
  • Have we tested end-to-end in one flow?
  • How do we manage configurations across data, ML, jobs, and results? Are these separate classes? 
  • The current work covers fetching results, running the pipeline, and fetching status. Has equivalent work been done elsewhere?
  • What flow do we need for porting-type work? Do we mimic the same patterns?
  • How to test standalone?
  • What is the minimal knowledge needed to operate on this?
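
The pointers above on separate config classes and a configurable pipeline can be sketched with dataclasses. This is only an illustration under assumptions: the class names, field names, and default values are all hypothetical, chosen to mirror the data / ML / jobs split mentioned in the list.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class DataConfig:
    source_table: str = "sales_history"  # hypothetical table name
    target_column: str = "units_sold"    # hypothetical column name


@dataclass(frozen=True)
class ModelConfig:
    name: str = "baseline"
    learning_rate: float = 0.01


@dataclass(frozen=True)
class JobConfig:
    schedule: str = "daily"
    max_retries: int = 3


@dataclass(frozen=True)
class PipelineConfig:
    """Top-level config composed of one small class per concern."""
    data: DataConfig = field(default_factory=DataConfig)
    model: ModelConfig = field(default_factory=ModelConfig)
    job: JobConfig = field(default_factory=JobConfig)


config = PipelineConfig()
# Swap only the model section; data and job settings stay untouched.
experiment = replace(config, model=ModelConfig(name="candidate", learning_rate=0.001))
print(experiment)
```

Frozen dataclasses keep a running pipeline from mutating its own settings, and each section can be reviewed and tested on its own, which also helps with the "how to test standalone" question.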

Keep Exploring!!!

Prompt Examples - Zero shot - Few Shot

Zero-Shot

Classify each of the foods below. Label it vegetarian or non-vegetarian. The foods to classify are

  1. Carrot
  2. Coconut
  3. Brinjal
  4. Chicken
  5. Egg

Few shot

Given a mapping of foods the person prefers

  1. <carrot, coconut, brinjal - likes>
  2. <Chicken, egg - does not like>

Suggest whether the person likes potatoes or meatballs. Give the answer as

1. item - like or don't like
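
Both prompt styles above are easy to generate from data, which helps when the item list changes often. A minimal sketch; zero_shot_prompt and few_shot_prompt are hypothetical helper names, and the wording is a lightly tidied version of the prompts in this note rather than a recommended template.

```python
def zero_shot_prompt(items: list[str]) -> str:
    """Zero-shot: task description plus the items, with no worked examples."""
    lines = [
        "Classify each food below as vegetarian or non-vegetarian.",
        "The foods to classify are:",
    ]
    lines += [f"{i}. {item}" for i, item in enumerate(items, start=1)]
    return "\n".join(lines)


def few_shot_prompt(examples: list[tuple[list[str], str]], candidates: list[str]) -> str:
    """Few-shot: labelled examples first, then the new items to judge."""
    lines = ["Given a mapping of foods the person prefers:"]
    for i, (foods, label) in enumerate(examples, start=1):
        lines.append(f"{i}. <{', '.join(foods)} - {label}>")
    lines.append(f"Suggest whether the person likes {' or '.join(candidates)}.")
    lines.append("Give the answer as: item - like or don't like")
    return "\n".join(lines)


print(zero_shot_prompt(["Carrot", "Coconut", "Brinjal", "Chicken", "Egg"]))
print()
print(few_shot_prompt(
    [(["carrot", "brinjal"], "likes"), (["chicken", "egg"], "does not like")],
    ["potatoes", "meatballs"],
))
```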

Keep Exploring!!!

PDF Data Extraction

  • Evaluate whether there is a problem in data extraction. unstructured - https://pypi.org/project/unstructured/
  • PyMuPDF, also known as Fitz, is a Python binding for the MuPDF library
  • pdfplumber - https://pypi.org/project/pdfplumber/
  • Camelot - https://camelot-py.readthedocs.io/en/master/
  • img2table - https://github.com/xavctn/img2table
Keep Exploring!!!