In my previous article, I showed how to use an LLM to perform sentiment analysis, draft responses to customer feedback, and build a chat-with-documents experience based on your own content.
In this blog post, I continue exploring business use cases for AI by summarizing an uploaded document. I use Apache Tika to extract the file contents, which lets the app handle almost any file type. I then pass the extracted text to OpenAI through Spring AI to generate a summary.
Here are the main steps for summarizing text with AI in Java. You can find the full source code on my GitHub.
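At a high level, the app chains three stages: extract text from the file, ask the LLM for a summary, and render the Markdown it returns. The sketch below models that flow with plain java.util.function.Function stages so each one can be stubbed; the class name SummaryPipeline and its constructor are hypothetical, and the lambdas in main only stand in for Tika, Spring AI, and CommonMark.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/**
 * Hypothetical sketch of the three stages in this post:
 * file bytes -> plain text -> Markdown summary -> HTML.
 */
public class SummaryPipeline {
    private final Function<byte[], String> extractText; // e.g. backed by Apache Tika
    private final Function<String, String> summarize;   // e.g. backed by a Spring AI ChatClient
    private final Function<String, String> renderHtml;  // e.g. backed by CommonMark

    public SummaryPipeline(Function<byte[], String> extractText,
                           Function<String, String> summarize,
                           Function<String, String> renderHtml) {
        this.extractText = extractText;
        this.summarize = summarize;
        this.renderHtml = renderHtml;
    }

    /** Runs the stages in order, feeding each result into the next stage. */
    public String run(byte[] fileContents) {
        return extractText.andThen(summarize).andThen(renderHtml).apply(fileContents);
    }

    public static void main(String[] args) {
        // Stub stages that stand in for the real libraries
        var pipeline = new SummaryPipeline(
                bytes -> new String(bytes, StandardCharsets.UTF_8),
                text -> "- " + text.split("\\.")[0],
                markdown -> "<p>" + markdown + "</p>");
        byte[] file = "Tika reads files. LLMs summarize them.".getBytes(StandardCharsets.UTF_8);
        System.out.println(pipeline.run(file));
    }
}
```

Keeping the stages separate like this makes it easy to test the parsing and rendering logic without calling a paid API.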
Add dependencies
This tutorial assumes you have a Spring Boot project. If you don't have one but want to follow along, download a Vaadin Flow project starter.
The three dependencies we need are:
- Apache Tika for parsing documents
- Spring AI for calling the LLM
- CommonMark for parsing the returned Markdown to HTML
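Add the following dependencies to your pom.xml. The Spring AI starter has no explicit version because it is expected to be managed by the Spring AI BOM (spring-ai-bom) imported in your dependencyManagement section: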
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>2.9.2</version>
</dependency>
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers-standard-package</artifactId>
    <version>2.9.2</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.commonmark</groupId>
    <artifactId>commonmark</artifactId>
    <version>0.22.0</version>
</dependency>
Initialize the Spring AI ChatClient
Before using Spring AI, you need to add the following to your application.properties:
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini
The example above reads the API key from the OPENAI_API_KEY environment variable. For quick local testing, you can paste the key inline instead, but don't commit it to version control. The second property selects the model to use, in this case gpt-4o-mini.
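Spring fills in the ${OPENAI_API_KEY} placeholder from its Environment, which includes OS environment variables. The stdlib-only sketch below illustrates that kind of substitution; it is a simplified stand-in, not Spring's actual resolver, and the names PlaceholderDemo and resolve are hypothetical. Unlike Spring, it does not support defaults such as ${NAME:fallback}.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified illustration of ${NAME} placeholder substitution (not Spring's implementation). */
public class PlaceholderDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String replacement = env.get(name);
            if (replacement == null) {
                // This sketch fails fast, so a missing key surfaces at startup
                // rather than as an authentication error from the API.
                throw new IllegalStateException("Missing environment variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // In a real app you could pass System.getenv() as the map
        System.out.println(resolve("${OPENAI_API_KEY}", Map.of("OPENAI_API_KEY", "sk-test")));
    }
}
```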
Inject a ChatClient.Builder into your Vaadin Flow view (or any Spring component) and configure the ChatClient:
public SummarizeView(ChatClient.Builder chatClientBuilder) {
    chatClient = chatClientBuilder
            .defaultSystem("""
                Summarize the following text into a concise paragraph that captures the main points and essential details without losing important information.
                The summary should be as short as possible while remaining clear and informative.
                Use bullet points or numbered lists to organize the information if it helps to clarify the meaning.
                Focus on the key facts, events, and conclusions.
                Avoid including minor details or examples unless they are crucial for understanding the main ideas.
                """)
            .build();
}
Here, we define a system message that instructs the LLM how to behave. Modify the prompt as needed. Save the chatClient to a field for later use.
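Conceptually, configuring defaultSystem means every call made through this chatClient starts with the same system message, followed by the user message for that call. The sketch below is a plain-Java model of that two-message request, assuming OpenAI-style "system" and "user" roles; ChatRequestSketch, Message, and buildRequest are hypothetical names, not part of the Spring AI API.

```java
import java.util.List;

/** Conceptual model (not the Spring AI API) of a chat request with a default system prompt. */
public class ChatRequestSketch {
    /** One chat message: a role plus its text. */
    record Message(String role, String content) {}

    /** The system prompt comes first and is identical on every call; the user text varies. */
    static List<Message> buildRequest(String systemPrompt, String userText) {
        return List.of(
                new Message("system", systemPrompt),
                new Message("user", userText));
    }

    public static void main(String[] args) {
        var request = buildRequest("Summarize the following text...", "Text to summarize: ...");
        request.forEach(m -> System.out.println(m.role() + ": " + m.content()));
    }
}
```

Because the instructions live in the system message, the user message can carry nothing but the document text.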
Read the uploaded file with Apache Tika
We have a Vaadin Upload component hooked up to a FileBuffer for receiving the file.
var fb = new FileBuffer();
var upload = new Upload();
upload.setReceiver(fb);
upload.addSucceededListener(e -> {
    var tmpFile = fb.getFileData().getFile();
    parseFile(tmpFile);
    tmpFile.delete();
});
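In the listener above, tmpFile.delete() only runs if parseFile returns normally; when parsing fails, parseFile rethrows a RuntimeException and the temporary file is left behind. One way to guarantee cleanup is a try/finally wrapper. The helper processThenDelete below is a hypothetical, stdlib-only sketch of that idea.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

public class TempFileCleanup {
    /** Hypothetical helper: runs the action, then deletes the file even if the action throws. */
    static void processThenDelete(Path file, Consumer<Path> action) {
        try {
            action.accept(file);
        } finally {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                // Don't throw from finally: that would hide the original exception
                System.err.println("Could not delete " + file + ": " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("upload-", ".txt");
        try {
            processThenDelete(tmp, p -> { throw new IllegalStateException("parse failed"); });
        } catch (IllegalStateException expected) {
            System.out.println("Parsing failed, file still exists? " + Files.exists(tmp));
        }
    }
}
```

In the view, the listener body could then become processThenDelete(fb.getFileData().getFile().toPath(), p -> parseFile(p.toFile())).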
Once a file is uploaded, we parse it with Apache Tika before summarizing it. The AutoDetectParser automatically detects the file type.
private void parseFile(File tmpFile) {
    // AutoDetectParser picks a suitable parser based on the detected file type
    var parser = new AutoDetectParser();
    // With the default constructor, the handler keeps at most 100,000 characters
    var handler = new BodyContentHandler();
    try (InputStream stream = TikaInputStream.get(tmpFile)) {
        parser.parse(stream, handler, new Metadata());
        summarizeFile(handler.toString());
    } catch (WriteLimitReachedException ex) {
        // The document exceeded the limit: summarize the text extracted so far
        Notification.show(ex.getMessage());
        summarizeFile(handler.toString());
    } catch (Exception ex) {
        output.add(new H2("Parsing Data failed: " + ex.getMessage()));
        throw new RuntimeException(ex);
    }
}
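The WriteLimitReachedException branch exists because Tika's BodyContentHandler caps the extracted text (100,000 characters with the no-argument constructor; new BodyContentHandler(-1) disables the cap, at the risk of exceeding the model's context window). The class below is a hypothetical, stdlib-only illustration of how such a write limit behaves: it appends text until the limit, keeps the partial chunk that fits, and records that truncation happened.

```java
/** Hypothetical illustration of a write limit similar to the one in Tika's BodyContentHandler. */
public class BoundedText {
    private final StringBuilder buffer = new StringBuilder();
    private final int limit;
    private boolean truncated;

    public BoundedText(int limit) {
        this.limit = limit;
    }

    /** Appends as much of the chunk as fits; returns false once the limit has been reached. */
    public boolean append(String chunk) {
        int room = limit - buffer.length();
        if (chunk.length() > room) {
            buffer.append(chunk, 0, room); // keep only the part that fits
            truncated = true;
            return false;
        }
        buffer.append(chunk);
        return true;
    }

    public boolean isTruncated() {
        return truncated;
    }

    @Override
    public String toString() {
        return buffer.toString();
    }

    public static void main(String[] args) {
        var text = new BoundedText(10);
        for (String chunk : new String[] {"Hello, ", "world! ", "More text."}) {
            if (!text.append(chunk)) {
                break; // stop reading, like Tika aborting the parse
            }
        }
        System.out.println(text + " (truncated: " + text.isTruncated() + ")");
    }
}
```

As with the handler in parseFile, the partial text is still usable, which is why the post summarizes it instead of failing.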
Summarize the document with OpenAI
Once we have the contents of the file, all we need to do is call the chatClient we configured earlier with the content and display the returned summary. Here, I created a small Markdown component, which uses CommonMark to render the returned Markdown as HTML.
private void summarizeFile(String content) {
    var markdown = chatClient.prompt()
            .user("Text to summarize: " + content)
            .call()
            .content();
    output.removeAll();
    output.add(new Markdown(markdown));
}
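summarizeFile sends whatever text Tika extracted, even if it is empty. Image-only files such as scanned PDFs may yield little or no text unless OCR is set up, and an empty prompt only wastes an API call. A hypothetical guard like userMessageFor below could build the user message only when there is text to summarize; the name is an assumption, and the message prefix mirrors the one used in summarizeFile.

```java
import java.util.Optional;

public class PromptGuard {
    /**
     * Hypothetical helper: returns the user message to send, or an empty Optional
     * when the extracted text is missing or blank, so no API call is made.
     */
    static Optional<String> userMessageFor(String content) {
        if (content == null || content.isBlank()) {
            return Optional.empty();
        }
        return Optional.of("Text to summarize: " + content.strip());
    }

    public static void main(String[] args) {
        System.out.println(userMessageFor("   \n\t"));
        System.out.println(userMessageFor("  Quarterly report  "));
    }
}
```

In the view, you could call chatClient only inside ifPresentOrElse and show a short notice in the other branch.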
Conclusion
In this tutorial, you learned how to summarize a document in Java using Spring AI, Vaadin Flow, and Apache Tika. You can find the full source code for the example on GitHub.