Retrieval Augmented Generation (RAG) has become a standard approach for enhancing LLM responses with domain-specific knowledge. While basic RAG implementations can be straightforward, building production-grade solutions often requires more sophisticated techniques. Spring AI provides several advanced RAG features that can help you build more robust and effective applications.
## Getting started with documents

Before diving into advanced features, let's look at how to load documents into the `VectorStore`:
```java
@Service
public class DocumentLoader {

    private final VectorStore vectorStore;

    public DocumentLoader(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void addDocument(MultipartFile file) throws IOException {
        var resource = new InputStreamResource(file.getInputStream());
        // Parse the file with Tika, split it into chunks, and store the chunks
        vectorStore.write(new TokenTextSplitter().apply(
                new TikaDocumentReader(resource).read()));
    }
}
```
This basic setup uses Spring AI's `TikaDocumentReader` to handle various document formats and `TokenTextSplitter` to break documents into manageable chunks.
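Conceptually, a text splitter slides a fixed-size window over the token stream, with some overlap so that context isn't lost at chunk boundaries. Here is a simplified word-based sketch of that idea; the real `TokenTextSplitter` counts model tokens rather than words, so this is an illustration, not its actual algorithm:

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSplitter {

    /**
     * Splits text into chunks of at most chunkSize words, overlapping
     * by overlap words so context at chunk boundaries is preserved.
     */
    public static List<String> split(String text, int chunkSize, int overlap) {
        String[] words = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", List.of(words).subList(start, end)));
            if (end == words.length) {
                break; // last chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```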
## The RetrievalAugmentationAdvisor

Spring AI recently introduced the `RetrievalAugmentationAdvisor`, an experimental feature that provides a modular approach to building RAG pipelines. While the classic `QuestionAnswerAdvisor` works well for basic scenarios, the `RetrievalAugmentationAdvisor` offers more control over the RAG process.
Here's a basic example:
```java
Advisor advisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50)
                .vectorStore(vectorStore)
                .build())
        .build();

String answer = chatClient.prompt()
        .advisors(advisor)
        .user(question)
        .call()
        .content();
```
## Advanced query processing
One common challenge in business applications is handling imprecise or contextual queries. Spring AI provides several query transformation techniques to address this:
### Query expansion

When users might phrase the same question differently, the `MultiQueryExpander` can help:
```java
MultiQueryExpander queryExpander = MultiQueryExpander.builder()
        .chatClientBuilder(chatClientBuilder)
        .numberOfQueries(3)
        .build();

List<Query> queries = queryExpander.expand(
        new Query("What's our refund policy?"));
```
This generates multiple variations of the query, increasing the chances of finding relevant documents. This is particularly useful for customer service applications where users might use different terminology than your documentation.
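Once the variants have been run against the vector store, their result lists need to be merged, and a chunk retrieved by several variants should appear only once. A minimal merge sketch, using plain strings as document IDs rather than Spring AI types:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ResultMerger {

    /**
     * Unions the retrieval results of several query variants,
     * keeping first-seen order and dropping duplicate document IDs.
     */
    public static List<String> merge(List<List<String>> resultsPerQuery) {
        LinkedHashSet<String> merged = new LinkedHashSet<>();
        for (List<String> results : resultsPerQuery) {
            merged.addAll(results); // set semantics deduplicate for us
        }
        return new ArrayList<>(merged);
    }
}
```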
### Context-aware queries

For applications that need to maintain conversation context, the `CompressionQueryTransformer` can help:
```java
QueryTransformer transformer = CompressionQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder)
        .build();

Query query = Query.builder()
        .text("What about premium users?")
        .history(previousMessages) // previous conversation context
        .build();

Query transformedQuery = transformer.transform(query);
```
## Document processing pipeline

The `RetrievalAugmentationAdvisor` allows you to customize how documents are processed before being used for generation:
### Document selection
For large document collections, you can implement more sophisticated document selection strategies:
```java
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
        .vectorStore(vectorStore)
        .similarityThreshold(0.73)
        .topK(5)
        .filterExpression(new FilterExpressionBuilder()
                .eq("department", "legal")
                .build())
        .build();
```
This example shows how to filter documents by metadata and adjust similarity thresholds, which is useful when working with documents from different departments or categories.
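Under the hood, similarity search scores every candidate chunk against the query embedding, drops everything below the threshold, and returns the top K of what remains. A minimal cosine-similarity sketch of that selection step, with plain arrays standing in for real embeddings:

```java
import java.util.List;
import java.util.Map;

public class SimilaritySearch {

    /** Cosine similarity between two equal-length vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /**
     * Returns the IDs of the top-k documents whose similarity
     * to the query meets the threshold, best match first.
     */
    public static List<String> search(double[] query, Map<String, double[]> docs,
                                      double threshold, int k) {
        return docs.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), cosine(query, e.getValue())))
                .filter(e -> e.getValue() >= threshold)     // similarityThreshold
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)                                   // topK
                .map(Map.Entry::getKey)
                .toList();
    }
}
```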
## Error handling and edge cases

In production environments, you need to handle cases where relevant documents might not be found. The `ContextualQueryAugmenter` can be configured to handle these situations:
```java
Advisor advisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(retriever)
        .queryAugmenter(ContextualQueryAugmenter.builder()
                .allowEmptyContext(true)
                .build())
        .build();
```
This configuration allows the system to fall back to general responses when no relevant documents are found, rather than failing completely.
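The fallback logic amounts to branching on whether retrieval returned anything: with context, the prompt is augmented with the retrieved chunks; without it, the bare question goes through so the model can answer from general knowledge. A simplified sketch of that branching (the prompt wording here is made up, not Spring AI's template):

```java
import java.util.List;

public class PromptAugmenter {

    /** Builds the final prompt: augmented when context exists, a plain fallback otherwise. */
    public static String augment(String question, List<String> retrievedChunks) {
        if (retrievedChunks.isEmpty()) {
            // No relevant documents found: fall back to the bare question
            // instead of failing the request.
            return question;
        }
        return "Answer using only this context:\n"
                + String.join("\n", retrievedChunks)
                + "\n\nQuestion: " + question;
    }
}
```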
## Implementing guardrails
When deploying AI applications in a business context, it's important to establish appropriate boundaries and controls to ensure that the chat is on topic and safe. Spring AI provides several advisor implementations to help implement these guardrails.
### Safety and moderation

The `SafeguardAdvisor` helps keep conversations within your application's boundaries by blocking requests that contain sensitive words:
```java
Advisor safeguardAdvisor = SafeguardAdvisor.builder()
        .sensitiveWords(List.of(
                "confidential",
                "secret",
                "internal",
                "proprietary",
                "classified"))
        .build();

ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(safeguardAdvisor)
        .build();
```
You can combine this with other advisors to create layered protection:
```java
ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(List.of(
                safeguardAdvisor,
                retrievalAugmentationAdvisor,
                // a custom advisor you implement yourself to keep the chat on topic
                new TopicBoundaryAdvisor("technical documentation")))
        .build();
```
This approach allows you to:
- Filter inappropriate content
- Keep discussions on-topic
- Implement custom validation logic
- Chain multiple validation steps
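The word-based guardrail itself is simple: before a request reaches the model, the prompt is scanned for any of the configured terms, and matching requests are rejected. A minimal sketch of such a check; case-insensitive substring matching is an assumption here, and the real `SafeguardAdvisor` may match differently:

```java
import java.util.List;
import java.util.Locale;

public class SensitiveWordCheck {

    /** Returns true if the prompt contains any of the configured sensitive words. */
    public static boolean isBlocked(String prompt, List<String> sensitiveWords) {
        String lower = prompt.toLowerCase(Locale.ROOT);
        return sensitiveWords.stream()
                .anyMatch(word -> lower.contains(word.toLowerCase(Locale.ROOT)));
    }
}
```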
## Conclusion

Spring AI's advanced RAG features make it easier to build robust, production-ready applications that combine the power of LLMs with your business's specific knowledge and requirements. The modular architecture of the `RetrievalAugmentationAdvisor` allows you to start simple and add sophistication as your needs evolve.