RAG systems that show great promise in development environments can be particularly challenging to deploy in production. One major hurdle is securing access. In production, RAG systems must handle a large volume of user requests while maintaining the security and integrity of the data. This requires robust access controls, encryption, and monitoring, all of which can be difficult to implement and maintain. Development environments, in contrast, often have more relaxed security settings, which makes it easier to test and iterate on RAG systems without the added complexity of security protocols.
“The case for production-grade RAG systems in enterprises warrants much deeper scrutiny over system design, given performance, cost and security considerations.”
In this nine-part series, we discuss system design considerations that directly affect RAG system performance, cost and security. The series is intended as a guide for CTOs, CISOs and AI engineers.
Download the complete guide on Advanced RAG for Enterprises
Choice of vector databases based on types of supported search algorithm and security options
When selecting a vector database, it is crucial to assess your specific use cases and technical requirements before making a decision. Vector databases differ in the search algorithms they support (such as exact K-Nearest Neighbors (KNN) and Approximate Nearest Neighbors (ANN)) and in the distance metrics they offer (such as L2 and cosine). They also differ in scalability, security, and options for performance optimization.
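The choice of distance metric can change which item counts as "nearest". The following is a minimal sketch in plain Python, not tied to any particular vector database; the two document vectors and their names are made up for illustration. It shows L2 distance and cosine distance picking different nearest neighbors for the same query.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 1.0]
# Hypothetical toy vectors, chosen so the two metrics disagree.
docs = {
    "same_direction_far": [10.0, 10.0],      # same direction as the query, large magnitude
    "different_direction_near": [1.0, 0.0],  # close in space, different direction
}

nearest_l2 = min(docs, key=lambda name: l2_distance(query, docs[name]))
nearest_cosine = min(docs, key=lambda name: cosine_distance(query, docs[name]))
print("L2 picks:", nearest_l2)
print("Cosine picks:", nearest_cosine)
```

If embeddings are normalized to unit length, the two metrics produce the same ranking, so the distinction matters mostly when vector magnitudes vary.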
By evaluating your use cases and technical needs, you can determine which database best aligns with your requirements. For example, if your application requires exact nearest-neighbor results, you may prioritize a database that supports exact KNN search, whereas if you are working with large-scale datasets, you may opt for a database that offers efficient ANN search and horizontal scaling. Choosing on this basis lets you balance performance, security, and scalability for your application and supports a robust and efficient development process.
A common starting point is to assess whether KNN or ANN search is most relevant. Sensitivity to the choice of K, query speed, database size and the need for exact results are typical considerations when choosing between KNN and ANN search.
Examples of instances favouring ANN search
Unknown Optimal K Value – You’re building a recommendation system, and you’re unsure what the optimal K value should be. ANN search is often the more practical choice here. Most ANN indexes still accept a K (or a similarity threshold) at query time, but approximate queries are cheap enough that you can request a generous K and trim the results afterwards. With exact KNN search, a K that is too low may miss relevant items, and every exploratory query is a full scan of the dataset.
Large-Scale Dynamic Data – You’re working with a massive, constantly evolving dataset (e.g., user behavior, sensor data), and you need to perform frequent nearest neighbor searches. ANN search is more relevant here because its indexes typically answer queries without scanning the whole dataset and, in many implementations, accept incremental inserts. Exact KNN search compares the query against every stored vector, so it becomes computationally expensive and slow as the dataset grows.
In both scenarios, ANN search provides a more efficient solution at the cost of a small, usually tunable, loss in recall, especially when the optimal K value is unknown or the dataset is large and dynamic.
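To make the speed-versus-accuracy trade-off concrete, here is a minimal sketch of one ANN technique, random-hyperplane hashing, compared against an exact scan. This is an illustration only: production vector databases typically use other index types (HNSW and IVF are common), and the dataset, dimensions and function names below are assumptions made for the example.

```python
import math
import random
from collections import defaultdict

random.seed(0)
DIM, N_PLANES = 16, 4  # toy sizes chosen for illustration

def random_vector():
    return [random.gauss(0, 1) for _ in range(DIM)]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

data = [random_vector() for _ in range(2000)]
planes = [random_vector() for _ in range(N_PLANES)]

def bucket_of(vec):
    """Hash a vector to a bit pattern: which side of each hyperplane it falls on."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes)

# Build the index once: bucket -> list of row ids.
index = defaultdict(list)
for row_id, vec in enumerate(data):
    index[bucket_of(vec)].append(row_id)

def exact_search(query, k):
    """Exact KNN: compare the query against every stored vector."""
    return sorted(range(len(data)), key=lambda i: cosine_distance(query, data[i]))[:k]

def approx_search(query, k):
    """ANN: rank only the vectors that share the query's bucket."""
    candidates = index.get(bucket_of(query), [])
    ranked = sorted(candidates, key=lambda i: cosine_distance(query, data[i]))[:k]
    return ranked, len(candidates)

query = data[0]
exact = exact_search(query, 10)
approx, scanned = approx_search(query, 10)
recall = len(set(exact) & set(approx)) / len(exact)
print(f"scanned {scanned} of {len(data)} vectors, recall@10 = {recall:.2f}")
```

The approximate search scans only a fraction of the data, and the recall figure shows how many of the true nearest neighbors it still found. Note that both functions take a K; the difference lies in how many vectors are compared, not in whether K is needed.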
Examples of instances favouring KNN search
Exact Matching – You’re building a fraud detection system that needs to identify the closest matches between transaction patterns. In this case, KNN search is more relevant because you need the true nearest neighbors (in the extreme case, identical transaction patterns) to determine if a new transaction is fraudulent. ANN search may skip a true nearest neighbor and return a slightly more distant one instead, which can lead to false positives or false negatives.
Small Dataset – You’re working with a small dataset of product features (e.g., color, size, material) and need to find the most similar products to a given query product. Because the dataset is small, exact distances to every item can be computed efficiently, so KNN search gives accurate results at a manageable cost and an ANN index adds little benefit.
In both scenarios, KNN search provides exact nearest neighbors, which is crucial for accurate results, whereas ANN search may introduce approximations that could lead to errors.
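For a small dataset, exact KNN can be as simple as scoring every item and keeping the K closest. The sketch below assumes a hypothetical product catalogue whose features have already been encoded as numbers between 0 and 1; the product names and encodings are invented for illustration.

```python
import heapq
import math

def exact_knn(query, items, k):
    """Brute force: compute the distance to every item, keep the k smallest."""
    scored = ((math.dist(query, vec), name) for name, vec in items.items())
    return heapq.nsmallest(k, scored)

# Hypothetical encoding: (colour hue, size, material code), each scaled to 0-1.
products = {
    "red_small_cotton": (0.0, 0.2, 0.1),
    "red_medium_cotton": (0.0, 0.5, 0.1),
    "blue_small_wool": (0.6, 0.2, 0.8),
    "red_small_linen": (0.0, 0.2, 0.3),
}

query_product = (0.0, 0.25, 0.1)
top2 = [name for _, name in exact_knn(query_product, products, k=2)]
print(top2)
```

Every item is compared, so the result is guaranteed to contain the true nearest neighbors; the cost grows linearly with the number of items, which is why this approach suits small collections.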
Authentication
The adoption of RAG systems has been popularized by many open-source RAG frameworks, including some that are low-code or no-code. Many of these frameworks and tools prioritize ease of switching between models and handling upstream processes (e.g. document processing) but lack robust, secure means of interfacing with vector databases.
Even if a RAG system is intended for internal use within a private network, it is important to recognize that bad actors can exist within local area networks and that even the most secure systems can be backdoored. Organizations working with vector databases as part of production RAG systems should therefore approach security the same way they would with any SQL database, paying close attention to authentication, encryption and access control.
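As one illustration of putting authentication and access control in front of a vector database, the sketch below checks an API key and a per-collection permission before a query is allowed to run. Everything here is an assumption made for the example: the key store, the key values, the collection names and the `search_fn` callable (a stand-in for whatever client your vector database provides). In practice, keys would come from a secrets manager rather than source code, and the database's own authentication features should be enabled as well.

```python
import hashlib
import hmac

def hash_key(api_key: str) -> str:
    """Store and compare hashes so raw keys never sit in the key store."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

# Hypothetical key store: hashed key -> collections that key may query.
KEY_STORE = {
    hash_key("analyst-key-123"): {"public_docs"},
    hash_key("hr-key-456"): {"public_docs", "hr_policies"},
}

def authorize(api_key: str, collection: str) -> None:
    """Raise PermissionError unless the key is known and allowed on the collection."""
    presented = hash_key(api_key)
    allowed = None
    for stored_hash, collections in KEY_STORE.items():
        # Constant-time comparison avoids leaking information through timing.
        if hmac.compare_digest(presented, stored_hash):
            allowed = collections
    if allowed is None:
        raise PermissionError("unknown API key")
    if collection not in allowed:
        raise PermissionError(f"key is not permitted to query {collection!r}")

def secure_search(api_key, collection, query_vector, search_fn):
    """Run search_fn (a stand-in for the vector database client) only after checks pass."""
    authorize(api_key, collection)
    return search_fn(collection, query_vector)
```

Routing every query through a single checked entry point like `secure_search` also gives a natural place to add logging and encryption-in-transit requirements later.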
Overall, production RAG systems raise the following key design questions. Dive into each of the subtopics through the links below:
- API model access vs hosted model on self-managed instances
- Choice of model and precision as a trade-off between performance and running cost
- Choice of vector databases based on types of supported search algorithm and security options
- Data pipeline design for reliability, content safety and performance-focused data pre-processing
- Choice of chunking approach based on type of content: length, sentences or logical chunks
- Pre-retrieval filters and transformations for security and retrieval performance optimization
- Post-retrieval ranking and stacking approaches for performance and cost optimization
- Guardrail implementation with consideration for different modalities of inputs and outputs
- Logging mechanisms to facilitate performance, cost and security analyses