Key Notes
- Create endpoint
- Expose as a REST API
- Realtime / Batch inference
- Async inference for images
- Deploy on CPU / GPU
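To make the "create endpoint" and "expose as a REST API" notes concrete, here is a minimal sketch of a real-time inference endpoint using only the Python standard library. The `predict` function, the `/invocations` path, and the `{"features": [...]}` request shape are all assumptions for illustration, not any particular platform's API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in model: returns the mean of the input numbers."""
    return sum(features) / len(features)


def handle_invocation(body: bytes):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        features = json.loads(body)["features"]
        result = {"prediction": predict(features)}
        return 200, json.dumps(result).encode()
    except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
        # Malformed JSON, missing key, wrong types, or empty input.
        return 400, json.dumps({"error": str(exc)}).encode()


class InvocationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed route name; real platforms define their own paths.
        if self.path != "/invocations":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_invocation(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the endpoint on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), InvocationHandler).serve_forever()
```

Calling `serve()` starts the endpoint; a client would then POST a body such as `{"features": [1, 2, 3]}` to `/invocations`. Keeping the request logic in `handle_invocation` means the same function could be reused for batch inference by looping over stored records.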
Factors for Deployment
- Model Complexity
- Payload
- Complex Workflow
- Compute, Storage, Networking Cost
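The compute, storage, and networking cost factor can be estimated with simple arithmetic. The sketch below assumes a linear pricing model; the class name, field names, and any rates passed in are hypothetical placeholders, not real prices.

```python
from dataclasses import dataclass


@dataclass
class DeploymentCost:
    # All rates are hypothetical inputs supplied by the caller.
    instance_hourly_rate: float    # compute: price per instance-hour
    storage_gb_month_rate: float   # storage: price per GB-month
    network_gb_rate: float         # networking: price per GB transferred

    def monthly(self, instances, hours, storage_gb, transfer_gb):
        """Return a per-category breakdown and the total for one month."""
        compute = instances * hours * self.instance_hourly_rate
        storage = storage_gb * self.storage_gb_month_rate
        network = transfer_gb * self.network_gb_rate
        return {
            "compute": compute,
            "storage": storage,
            "network": network,
            "total": compute + storage + network,
        }
```

For example, `DeploymentCost(0.5, 0.25, 0.125).monthly(instances=2, hours=100, storage_gb=40, transfer_gb=80)` returns the breakdown for two instances running 100 hours each. Comparing the breakdown for a CPU rate and a GPU rate is one way to weigh model complexity against cost.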
Large Language Models Supported
- Parallelize the model
- Low-millisecond latency
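One common way to parallelize a large model is to split a layer's weights across workers and combine the partial outputs. The sketch below illustrates that idea on a single matrix-vector product, sharding the weight columns; the function names are hypothetical, and threads stand in for separate devices, so it shows the partitioning only, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec(x, weight_cols):
    """Multiply input vector x by a list of weight columns."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in weight_cols]


def shard(columns, n_shards):
    """Split the weight columns into contiguous chunks, one per worker."""
    size = -(-len(columns) // n_shards)  # ceiling division
    return [columns[i:i + size] for i in range(0, len(columns), size)]


def parallel_layer(x, columns, n_shards=2):
    """Compute each shard's outputs on its own worker, then concatenate."""
    shards = shard(columns, n_shards)
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # map() preserves shard order, so outputs line up with the columns.
        parts = list(pool.map(lambda s: matvec(x, s), shards))
    return [y for part in parts for y in part]
```

The sharded result matches the unsharded `matvec(x, columns)`, which is the property that lets each worker hold only part of the model. The sketch assumes a non-empty list of columns.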
Async / Vision
- Input is uploaded to a bucket
- Output is written to a bucket
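The async flow above (input goes into a bucket, output comes back in a bucket) can be sketched with an in-memory dictionary standing in for the object store and a queue standing in for the request backlog. The `input/` and `output/` key prefixes, the `model` function, and all other names are assumptions for illustration.

```python
import queue
import threading

bucket = {}            # stand-in for an object store: key -> bytes
jobs = queue.Queue()   # pending inference requests (input keys)


def model(data: bytes) -> bytes:
    """Hypothetical stand-in for a vision model: reports the payload size."""
    return f"{len(data)} bytes processed".encode()


def submit(key: str, data: bytes) -> str:
    """Upload the input to the bucket, enqueue it, and return the output key."""
    bucket[f"input/{key}"] = data
    jobs.put(key)
    return f"output/{key}"


def worker():
    """Pull queued keys, run the model, and write results back to the bucket."""
    while True:
        key = jobs.get()
        if key is None:          # sentinel: stop the worker
            jobs.task_done()
            break
        bucket[f"output/{key}"] = model(bucket[f"input/{key}"])
        jobs.task_done()


def run_demo():
    """Submit one request, wait for it to finish, and read the output."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    out_key = submit("cat.png", b"fake image bytes")
    jobs.join()                  # block until the queued request is processed
    jobs.put(None)
    thread.join()
    return bucket[out_key]
```

The caller gets the output key back immediately and does not wait on the model, which is why this pattern suits large payloads such as images: the client can poll the output location or be notified when the result lands.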
Keep Exploring!!!