"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 05, 2024

Prompt Caching Analysis

Caching is enabled automatically for prompts that are 1,024 tokens or longer; the cached prefix then grows in 128-token increments.

Prompt Caching is enabled for the following models:

  • gpt-4o (excludes gpt-4o-2024-05-13 and chatgpt-4o-latest)
  • gpt-4o-mini
  • o1-preview
  • o1-mini
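
To confirm caching is kicking in, inspect the usage block of the API response. A minimal sketch, assuming the official openai Python SDK and the usage.prompt_tokens_details.cached_tokens field it reports (the long system prompt here is placeholder padding to cross the 1,024-token threshold):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Static preamble, padded well past the 1,024-token caching threshold.
LONG_SYSTEM_PROMPT = "You are a helpful assistant. " + "Review the policy below carefully. " * 300

def ask(question: str):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": LONG_SYSTEM_PROMPT},  # static content first
            {"role": "user", "content": question},              # dynamic content last
        ],
    )
    details = resp.usage.prompt_tokens_details
    print(f"prompt_tokens={resp.usage.prompt_tokens}, cached_tokens={details.cached_tokens}")
    return resp

ask("Summarize the policy.")  # first call: cached_tokens is typically 0
ask("List the exceptions.")   # same prefix: cached_tokens should jump on a cache hit
```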

Usage Guidelines

1. Place static or frequently reused content at the beginning of prompts: caching matches on exact prompt prefixes, so keeping dynamic data towards the end of the prompt improves cache efficiency (as in the sketch above, where the long system message precedes the changing user question).

2. Maintain consistent usage patterns: prompts that aren't used regularly are automatically evicted from the cache, so send cacheable prompts on a consistent schedule to prevent evictions.

3. Monitor key metrics: regularly track cache hit rates, latency, and the proportion of cached tokens, and use these insights to fine-tune your caching strategy (a small tracking sketch follows this list).
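
To act on point 3, here is a rough tracking sketch (the class and names are illustrative, not part of the API) that accumulates cached versus total prompt tokens across calls:

```python
from dataclasses import dataclass

@dataclass
class CacheStats:
    prompt_tokens: int = 0
    cached_tokens: int = 0

    def record(self, usage) -> None:
        # `usage` is the usage object from a chat.completions response.
        self.prompt_tokens += usage.prompt_tokens
        details = getattr(usage, "prompt_tokens_details", None)
        self.cached_tokens += details.cached_tokens if details else 0

    @property
    def hit_ratio(self) -> float:
        # Proportion of prompt tokens served from cache across recorded calls.
        return self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0

stats = CacheStats()
# After each API call: stats.record(resp.usage)
# Periodically log stats.hit_ratio alongside latency to tune prompt layout.
```

A persistently low hit ratio usually points at a shifting prompt prefix or irregular traffic, which ties back to points 1 and 2.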

Ref - Link1, Link2

Keep Exploring!!!

