- Published on
Learn how we scaled a RAG pipeline to 10M embeddings with sub-100ms latency using Qdrant, cutting infrastructure costs by 67% while achieving 99.9% uptime.
Learn how we scaled a RAG pipeline to 10M embeddings with sub-100ms latency using Qdrant, cutting infrastructure costs by 67% while achieving 99.9% uptime.
A comprehensive exploration of AI model optimization techniques, focusing on distillation, fine-tuning and retrieval-augmented generation (RAG)