"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;
Showing posts with label Adoption. Show all posts

January 14, 2025

Optimizing AI Models for Low Latency: Techniques and Best Practices

GenAI product building has three key components: consistency, accuracy, and latency. All three rest on a solid data foundation and should be addressed in stages:

  • Build a solid data foundation.
  • Develop an approach that ensures consistent results.
  • Ensure the results are accurate.
  • Optimize for latency.

In every real-time implementation, once consistency and accuracy are achieved, latency plays a key role.

Techniques for Low Latency Optimization

After achieving accuracy, focus on these techniques to optimize latency:

  • Semantic Cache Implementation for similar questions.
  • Disable Logging in the production environment.
  • Database Optimization: Ensure proximity to the model serving region.
  • Multi-Prompt Steps in messaging.
  • Low Latency Models: GPT-4o-mini.
  • Text Optimization: Balance cost and performance (e.g., Claude 3.5 Sonnet).
  • Complex Reasoning: Use Gemini 1.5 Pro (gemini-1.5-pro).
  • Optimize Values: Fine-tune input tokens, output tokens, temperature, and max tokens.
  • Prompt Optimization: Leverage model context support.
  • Utilize Larger Context Windows: Implement multitask prompts.
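To make the semantic cache idea from the list above concrete, here is a minimal sketch. It assumes a toy bag-of-words embedding in place of a real embedding model, and the names `SemanticCache`, `answer_with_cache`, `call_model`, and the 0.8 threshold are hypothetical choices for illustration, not part of any specific library.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical cut-off; tune per use case
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, question):
        """Return the answer of the most similar cached question, or None."""
        query = embed(question)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, question, answer):
        self.entries.append((embed(question), answer))


def answer_with_cache(question, cache, call_model):
    """Serve from the cache when possible; otherwise call the model and store."""
    cached = cache.get(question)
    if cached is not None:
        return cached
    answer = call_model(question)  # the slow, costly step we want to skip
    cache.put(question, answer)
    return answer
```

In a real system the linear scan would likely be replaced by a vector index, and the threshold needs testing so that questions that only look similar do not get the wrong cached answer.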

Infrastructure and Cost Considerations

  • Quantization Effects: Using reduced precision (e.g., int8 instead of float32) shrinks the model and usually speeds up inference, but the quantization and dequantization steps may add a small, predictable overhead.
  • Fine-Tuned GPT Models: Require high-quality data for implementation.
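The quantization and dequantization steps mentioned above can be sketched in plain Python. This is a simplified illustration that assumes symmetric int8 quantization with a single scale per list of values; production frameworks use more elaborate schemes, and the function names here are hypothetical.

```python
def quantize(values):
    """Map floats to the int8 range [-127, 127] using one symmetric scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats; the rounding error is at most scale / 2."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)
```

Each conversion is extra work on top of the actual computation, which is where the small overhead comes from, while the rounding error shows the precision that is traded away.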

Top 5 Practices to Master GenAI Product Development

  • Solve the GenAI Aspect: Focus on prompt engineering and model versioning.
  • Scale for Multiple Formats: Use prompt catalogs and maintain prompt versions.
  • Optimize for Low Latency: Implement caching for key data, reuse existing data, and leverage retrieval-augmented generation (RAG) over documents, graphs, and summarized data.
  • Ensure Accuracy Across the Board: Preprocess, normalize, and organize data effectively for the use case, using RAG for enhanced results.
  • Focus on Safe Usage: Enforce guardrails to ensure responsible and secure deployments.

Entry of Agents

  • Once the foundational aspects are achieved, you can migrate to an agentic approach. Ensure robust controls for seamless transitions.

Personal Note

My focus has been on solving and solutioning diverse product use cases. Being an independent consultant has allowed me to concentrate on solutioning aspects of GenAI, LLMs, unstructured data, prompt optimization, and latency reduction. It’s a tradeoff between working on focused areas versus engaging across different layers of implementation.

Happy to collaborate if you are working on GenAI product building or Enterprise GenAI adoption!

Happy Learning!!!


February 23, 2023

Disruptive, Fast Forward GPT Days

News #1 - Bain & OpenAI Line Up to Solve More Use Cases

Key use cases they target with ChatGPT are:

  • Building next-generation contact centers for retail banks, telco and utility companies to support sales and service agents with automated, personalized, and real-time scripts, and to improve customer experience.
  • Boosting turn-around time for leading product and service marketers by using ChatGPT and DALL·E to develop highly personalized ad copy, rich imagery, and targeted messaging.
  • Helping financial advisors improve their productivity and responsiveness to clients through the analysis of client dialogues and financial literature, and the generation of digital communication.

News #2 - AWS and Hugging Face Team up for more adoption, focused solutions

Hugging Face has become the central hub for machine learning, with more than 100,000 free and accessible machine learning models. Expect more solutions on the AWS platform.

News #3 - You can also leverage ChatGPT to build your own model

OpenAI’s Foundry will let customers buy dedicated compute to run its AI models

Instances won’t be cheap, though. Running a lightweight version of GPT-3.5 will cost $78,000 for a three-month commitment or $264,000 over a one-year commitment. To put that into perspective, one of Nvidia’s recent-gen supercomputers, the DGX Station, runs $149,000 per unit.

News #4 - Coca-Cola Signs As Early Partner for OpenAI’s ChatGPT, DALL-E Generative AI

Coca-Cola will team with OpenAI and Bain & Company to use OpenAI’s ChatGPT and DALL-E platforms to craft personalized ad copy, images, and messaging, the companies announced in a press release. 

Expect more taglines, one-liners, and eye-catching images.

Three Phased Approach

  • Custom NER, Parsing, Intent Extraction (Level 1 - Inhouse)
  • Leverage other lightweight fine-tuned models for summarization and topics (Level 2 - Finetuned)
  • Leverage LLMs to rank/evaluate results (Level 3 - LLM Inputs)
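One way to picture the three levels working together is a small pipeline. This is only a sketch under my own assumptions: the regex rules stand in for in-house intent extraction, and `summarize` and `rank` are hypothetical callables standing in for a fine-tuned model and an LLM.

```python
import re

# Hypothetical in-house rules, for illustration only (Level 1).
INTENT_PATTERNS = {
    "refund": re.compile(r"\b(refund|money back)\b", re.IGNORECASE),
    "cancel": re.compile(r"\b(cancel|unsubscribe)\b", re.IGNORECASE),
}


def extract_intent(text):
    """Level 1: cheap rule-based intent extraction."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            return intent
    return "unknown"


def run_pipeline(text, summarize, rank):
    """Chain the levels; summarize (Level 2) and rank (Level 3) are injected."""
    intent = extract_intent(text)
    summary = summarize(text)  # e.g. a lightweight fine-tuned model
    score = rank(summary)      # e.g. an LLM used as an evaluator
    return {"intent": intent, "summary": summary, "score": score}
```

Passing the Level 2 and Level 3 steps in as functions keeps the in-house layer testable on its own and makes it easy to swap models later.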

Keep Exploring!!!

February 19, 2023

ChatGPT, LLM, Adoption

Tech Adoption and Other Challenges

From the post, with my own perspectives added.

An interesting summary. Some key points I liked from the list:

  • They are useful as writing aids.
  • Better systems will come
  • Current LLMs should be used as writing aids - Content Writing Systems
  • People will use them for what they are helpful with

I hope a few more things can be added to the list:

  • Transparency around which sources of data were used
  • Explainable Answers - transparency around how the model arrived at an answer
  • Regulating models for fair use and against intentional misuse to change facts/relevance

Products are already on the market:

  • Tome - Generative storytelling
  • runwayml - GAN Vision / Image / Text Models
  • jasper.ai - Get high-quality copy written fast with AI

The future could be "Bring your own data, build your own model". In every domain (Healthcare, Banking, Fintech, History, Geography, and Aerospace) we could build custom models for assistance, learning, and generating code, facts, or formulas. AI assistants as teaching assistants :)

"In a few years, people will be comfortable interacting with models and even trust them as their main source of guidance / know by questioning"

Another interesting post covers the current shortcomings. I would apply the same thinking to bridging the gap with AI.

Bridging the Divided Population - The Big Gap

  • Around 6,900 languages are currently alive
  • Just 291 have a Wikipedia
  • That is just 4.2% of all languages

The Big Opportunity

  • Use Vision / OCR to Digitize all the literature
  • Use Neural Machine Translation to translate and increase more Digital Artifacts / Promote more reading / diverse topics
  • Create new artifacts, Reach broader population
  • Embrace AI tech to aid education in developing countries

In Long Term

  • Finetune models to understand social biases such as gender, race, religion

To go far, it's important to go and grow together :)

Most LLMs come from the set of companies below


Ref - Link


Keep Exploring!!!

January 25, 2023

AI vs Natural Intelligence

AI - any device that does things that we associate with human intelligence
Natural Intelligence - Imagination, creativity, fantasy, intuition, problem-solving. AI and natural intelligence are two different things; each makes up for what the other lacks
Human Interpretation is leaps ahead of Machine Interpretation

Right Use Case

With the right context and a mature model, AI can perform well. Tesla Safety Stats, Q3 2022: "In the 3rd quarter, we recorded one crash for every 6.26 million miles driven in which drivers used Autopilot technology."


Purposeful Bound to Fail 

In some cases machines can fail, for example in a classification problem.


Selecting the Right Use Cases


ML is more of a collaborative effort. On business needs: you spot the opportunity, and you demonstrate competency to get buy-in to implement it. Alignment is critical to understand the short-term and long-term impact. You may not be right immediately, but you may be right eventually.

Spotting what others miss needs all three views - domain, data, and ML opportunities.

Datasets Search

  • https://datasetsearch.research.google.com/
  • https://github.com/awesomedata/awesome-public-datasets
  • https://archive.ics.uci.edu/ml/index.php
  • https://www.kaggle.com/datasets
  • https://msropendata.com/

  • Customer related
  • Operations related
  • Risk Avoidance
Ref - Link

Ref - Link

Keep Exploring!!!

January 20, 2023

Business of AI is hard

The business of AI is hard

Ref Article

Key Challenges

  • Heavy Cloud Infra Training needs
  • Ongoing human support
  • Problem of edge cases
  • Commoditization of AI (you build it, then someone else offers it as a free API)

Observation - AI companies resemble traditional IT services

Hidden debts

  • Retraining is increasingly recognized as an ongoing cost
  • AI applications often operate on rich media like images, audio, or video
  • Human-in-the-loop systems take two forms, contributing to lower gross margins for many AI startups.
  • Social media companies, for example, employ thousands of human reviewers to augment AI-based moderation systems. 

Accuracy vs Cost

  • Financial resources are tied up until model accuracy reaches an acceptable level.
  • Visibility into the distribution of the customer’s data eliminates some edge cases prior to deployment

While there is no silver bullet to reaching this ideal state, one key is to understand as much as possible about your customers – and their data – before agreeing to a deal

Keep Exploring!!!