Prompt Caching Analysis
Caching is enabled automatically for prompts that are 1024 tokens or longer.
Prompt caching is enabled for the following models:
- gpt-4o (excludes gpt-4o-2024-05-13 and chatgpt-4o-latest)
- gpt-4o-mini
- o1-preview
- o1-mini
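Because caching is automatic, the simplest way to confirm it is working is to inspect the `usage` object on each response. Below is a minimal sketch, assuming the official `openai` Python SDK (v1.x): the `usage.prompt_tokens_details.cached_tokens` field reports how many prompt tokens were served from the cache (0 on a cache miss).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder static prefix; repeated text just to get past the 1024-token
# minimum, since shorter prompts are never cached.
LONG_STATIC_INSTRUCTIONS = "You are a helpful assistant. " * 300

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": LONG_STATIC_INSTRUCTIONS},
        {"role": "user", "content": "What is prompt caching?"},
    ],
)

usage = response.usage
cached = usage.prompt_tokens_details.cached_tokens
print(f"prompt tokens: {usage.prompt_tokens}, served from cache: {cached}")
```

Run the same request twice in quick succession: the first call should report 0 cached tokens, and the second should report a large nonzero count for the shared prefix.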
Usage Guidelines
1. Place static or frequently reused content at the beginning of prompts: cache hits require an exact match on the leading tokens of a prompt, so putting static instructions first and dynamic data toward the end maximizes the shared prefix across requests (see the first sketch after this list).
2. Maintain consistent usage patterns: prompts that aren't used regularly are evicted from the cache automatically, so keep traffic to cacheable prompts steady to avoid evictions.
3. Monitor key metrics: regularly track cache hit rates, latency, and the proportion of cached tokens, and use these insights to fine-tune your caching strategy (a minimal tracking helper is sketched after this list).
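To make guideline 1 concrete, here is a hypothetical sketch of how a message list might be assembled so that the static portion forms a stable prefix. `STATIC_SYSTEM_PROMPT`, `FEW_SHOT_EXAMPLES`, and `build_messages` are illustrative names, not part of any API.

```python
# Static content that is reused verbatim on every call.
STATIC_SYSTEM_PROMPT = "You are a support assistant for ExampleCo. ..."
FEW_SHOT_EXAMPLES = [
    {"role": "user", "content": "Example question 1"},
    {"role": "assistant", "content": "Example answer 1"},
]

def build_messages(user_query: str) -> list[dict]:
    # Static prefix first, dynamic content last: identical leading tokens
    # across requests are what make a cache hit possible.
    return (
        [{"role": "system", "content": STATIC_SYSTEM_PROMPT}]
        + FEW_SHOT_EXAMPLES
        + [{"role": "user", "content": user_query}]
    )
```

Every call built this way shares the same leading tokens, which is exactly what the cache matches on; putting the user query first would break the shared prefix on every request.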
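And for guideline 3, a hypothetical helper for tracking the proportion of cached tokens across calls. `CacheStats` is an illustrative name; it relies only on the `usage` object shown earlier.

```python
class CacheStats:
    """Accumulates usage stats and reports the fraction of cached prompt tokens."""

    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.cached_tokens = 0

    def record(self, usage) -> None:
        # `usage` is the usage object from a chat.completions response.
        self.prompt_tokens += usage.prompt_tokens
        details = getattr(usage, "prompt_tokens_details", None)
        self.cached_tokens += getattr(details, "cached_tokens", 0) or 0

    @property
    def cached_ratio(self) -> float:
        return self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0

stats = CacheStats()
# After each API call: stats.record(response.usage)
print(f"cached token ratio: {stats.cached_ratio:.1%}")
```

A consistently low ratio on prompts you expect to be cached is a signal that the prefix is varying between requests or that traffic is too infrequent to keep the entry warm.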
Keep Exploring!!!