Retrieval Augmented Generation (RAG) has become a standard approach for enhancing LLM responses with domain-specific knowledge. While basic RAG implementations can be straightforward, building production-grade solutions often requires more sophisticated techniques. Spring AI provides several advanced RAG features that can help you build more robust and effective applications.
## Getting started with documents

Before diving into advanced features, let's look at how to load documents into the `VectorStore`:
```java
@Service
public class DocumentLoader {

    private final VectorStore vectorStore;

    public DocumentLoader(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void addDocument(MultipartFile file) throws IOException {
        var resource = new InputStreamResource(file.getInputStream());
        // Parse the file with Tika, split it into chunks, and store the chunks
        vectorStore.write(new TokenTextSplitter().apply(
                new TikaDocumentReader(resource).read()));
    }
}
```
This basic setup uses Spring AI's `TikaDocumentReader` to handle various document formats and `TokenTextSplitter` to break documents into manageable chunks.
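Conceptually, a text splitter slides a fixed-size window over the token stream, with some overlap so that context isn't lost at chunk boundaries. Here is a simplified word-based sketch of that idea; the real `TokenTextSplitter` counts model tokens rather than words, so this is an illustration, not its actual algorithm:

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSplitter {

    /**
     * Splits text into chunks of at most chunkSize words, overlapping
     * by overlap words so context at chunk boundaries is preserved.
     */
    public static List<String> split(String text, int chunkSize, int overlap) {
        String[] words = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", List.of(words).subList(start, end)));
            if (end == words.length) {
                break; // last chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```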
## The RetrievalAugmentationAdvisor

Spring AI recently introduced the `RetrievalAugmentationAdvisor`, an experimental feature that provides a modular approach to building RAG pipelines. While the classic `QuestionAnswerAdvisor` works well for basic scenarios, the `RetrievalAugmentationAdvisor` offers more control over the RAG process.
Here's a basic example:
```java
Advisor advisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50)
                .vectorStore(vectorStore)
                .build())
        .build();

String answer = chatClient.prompt()
        .advisors(advisor)
        .user(question)
        .call()
        .content();
```
## Advanced query processing
One common challenge in business applications is handling imprecise or contextual queries. Spring AI provides several query transformation techniques to address this:
### Query expansion

When users might phrase the same question differently, the `MultiQueryExpander` can help:
```java
MultiQueryExpander queryExpander = MultiQueryExpander.builder()
        .chatClientBuilder(chatClientBuilder)
        .numberOfQueries(3)
        .build();

List<Query> queries = queryExpander.expand(
        new Query("What's our refund policy?"));
```
This generates multiple variations of the query, increasing the chances of finding relevant documents. This is particularly useful for customer service applications where users might use different terminology than your documentation.
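Once the variants have been run against the vector store, their result lists need to be merged, and a chunk retrieved by several variants should appear only once. A minimal merge sketch, using plain strings as document IDs rather than Spring AI types:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ResultMerger {

    /**
     * Unions the retrieval results of several query variants,
     * keeping first-seen order and dropping duplicate document IDs.
     */
    public static List<String> merge(List<List<String>> resultsPerQuery) {
        LinkedHashSet<String> merged = new LinkedHashSet<>();
        for (List<String> results : resultsPerQuery) {
            merged.addAll(results); // set semantics deduplicate for us
        }
        return new ArrayList<>(merged);
    }
}
```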
### Context-aware queries

For applications that need to maintain conversation context, the `CompressionQueryTransformer` can help:
```java
QueryTransformer transformer = CompressionQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder)
        .build();

Query query = Query.builder()
        .text("What about premium users?")
        .history(previousMessages) // previous conversation context
        .build();

Query transformedQuery = transformer.transform(query);
```
## Document processing pipeline

The `RetrievalAugmentationAdvisor` allows you to customize how documents are processed before being used for generation:
### Document selection
For large document collections, you can implement more sophisticated document selection strategies:
```java
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
        .vectorStore(vectorStore)
        .similarityThreshold(0.73)
        .topK(5)
        .filterExpression(new FilterExpressionBuilder()
                .eq("department", "legal")
                .build())
        .build();
```
This example shows how to filter documents by metadata and adjust similarity thresholds, which is useful when working with documents from different departments or categories.
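Under the hood, similarity search scores every candidate chunk against the query embedding, drops everything below the threshold, and returns the top K of what remains. A minimal cosine-similarity sketch of that selection step, with plain arrays standing in for real embeddings:

```java
import java.util.List;
import java.util.Map;

public class SimilaritySearch {

    /** Cosine similarity between two equal-length vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /**
     * Returns the IDs of the top-k documents whose similarity
     * to the query meets the threshold, best match first.
     */
    public static List<String> search(double[] query, Map<String, double[]> docs,
                                      double threshold, int k) {
        return docs.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), cosine(query, e.getValue())))
                .filter(e -> e.getValue() >= threshold)     // similarityThreshold
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)                                   // topK
                .map(Map.Entry::getKey)
                .toList();
    }
}
```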
## Error handling and edge cases

In production environments, you need to handle cases where relevant documents might not be found. The `ContextualQueryAugmenter` can be configured to handle these situations:
```java
Advisor advisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(retriever)
        .queryAugmenter(ContextualQueryAugmenter.builder()
                .allowEmptyContext(true)
                .build())
        .build();
```
This configuration allows the system to fall back to general responses when no relevant documents are found, rather than failing completely.
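The fallback logic amounts to branching on whether retrieval returned anything: with context, the prompt is augmented with the retrieved chunks; without it, the bare question goes through so the model can answer from general knowledge. A simplified sketch of that branching (the prompt wording here is made up, not Spring AI's template):

```java
import java.util.List;

public class PromptAugmenter {

    /** Builds the final prompt: augmented when context exists, a plain fallback otherwise. */
    public static String augment(String question, List<String> retrievedChunks) {
        if (retrievedChunks.isEmpty()) {
            // No relevant documents found: fall back to the bare question
            // instead of failing the request.
            return question;
        }
        return "Answer using only this context:\n"
                + String.join("\n", retrievedChunks)
                + "\n\nQuestion: " + question;
    }
}
```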
## Implementing guardrails
When deploying AI applications in a business context, it's important to establish appropriate boundaries and controls to ensure that the chat is on topic and safe. Spring AI provides several advisor implementations to help implement these guardrails.
### Safety and moderation

The `SafeguardAdvisor` helps keep conversations within your application's boundaries by blocking requests that contain sensitive words:
```java
Advisor safeguardAdvisor = SafeguardAdvisor.builder()
        .sensitiveWords(List.of(
                "confidential",
                "secret",
                "internal",
                "proprietary",
                "classified"))
        .build();

ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(safeguardAdvisor)
        .build();
```
You can combine this with other advisors to create layered protection:
```java
ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(List.of(
                safeguardAdvisor,
                retrievalAugmentationAdvisor,
                // a custom advisor you implement yourself to keep the chat on topic
                new TopicBoundaryAdvisor("technical documentation")))
        .build();
```
This approach allows you to:
- Filter inappropriate content
- Keep discussions on-topic
- Implement custom validation logic
- Chain multiple validation steps
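The word-based guardrail itself is simple: before a request reaches the model, the prompt is scanned for any of the configured terms, and matching requests are rejected. A minimal sketch of such a check; case-insensitive substring matching is an assumption here, and the real `SafeguardAdvisor` may match differently:

```java
import java.util.List;
import java.util.Locale;

public class SensitiveWordCheck {

    /** Returns true if the prompt contains any of the configured sensitive words. */
    public static boolean isBlocked(String prompt, List<String> sensitiveWords) {
        String lower = prompt.toLowerCase(Locale.ROOT);
        return sensitiveWords.stream()
                .anyMatch(word -> lower.contains(word.toLowerCase(Locale.ROOT)));
    }
}
```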
## Conclusion

Spring AI's advanced RAG features make it easier to build robust, production-ready applications that combine the power of LLMs with your business's specific knowledge and requirements. The modular architecture of the `RetrievalAugmentationAdvisor` allows you to start simple and add sophistication as your needs evolve.