As AI models have become better at handling multimodal input like images, new opportunities have opened up for building business applications. In this blog post, I'll examine how we can use OpenAI and Spring AI to extract information from an image into a Java record that we can use in our application.
The blog post only includes the relevant code highlights. You can find a link to the complete source code on GitHub at the end of the post.
Configuring Spring AI in our project
To extract data from images, we need an LLM that accepts image input. In this example, I'll be using OpenAI's gpt-4o model together with Spring AI.
Add the Spring AI dependency to your Spring Boot project:
<repositories>
    <!-- Other repositories -->
    <repository>
        <id>Spring Milestones</id>
        <url>https://repo.spring.io/milestone</url>
    </repository>
</repositories>
<dependencies>
    <!-- Other dependencies -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
</dependencies>
Then, configure the model and API key in application.properties:
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o
Case 1: Extracting data from a receipt and displaying it in a Grid
The first thing we need to do is ensure Spring Boot is configured to accept files large enough to accommodate images. Add the following to application.properties:
# Configure as needed
spring.servlet.multipart.max-file-size=10MB
With that in place, let's define the data structure we want for the output as Java records. You can also use POJOs if you prefer.
public record LineItem(String name, int quantity, BigDecimal price) {}
public record Receipt(String merchant, BigDecimal total, List<LineItem> lineItems) {}
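Because the receipt carries both a total and the individual line items, the extracted data can be cross-checked against itself. The following is a minimal sketch in plain Java, not part of the original example. It assumes that price holds the unit price of an item (so a line costs quantity times price); the class and method names are hypothetical, and the sample values are made up for illustration.

```java
import java.math.BigDecimal;
import java.util.List;

final class ReceiptCheck {

    // Same shape as the records in the post, repeated so the sketch compiles on its own.
    record LineItem(String name, int quantity, BigDecimal price) {}
    record Receipt(String merchant, BigDecimal total, List<LineItem> lineItems) {}

    // Assumption: price is the unit price, so one line costs quantity * price.
    static BigDecimal sumOfLineItems(Receipt receipt) {
        return receipt.lineItems().stream()
                .map(item -> item.price().multiply(BigDecimal.valueOf(item.quantity())))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // compareTo ignores scale, so 9.5 and 9.50 are treated as equal.
    static boolean totalMatchesLineItems(Receipt receipt) {
        return receipt.total().compareTo(sumOfLineItems(receipt)) == 0;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for what the model might return.
        var receipt = new Receipt("Corner Cafe", new BigDecimal("9.50"), List.of(
                new LineItem("Coffee", 2, new BigDecimal("3.00")),
                new LineItem("Bagel", 1, new BigDecimal("3.50"))));
        System.out.println("Total matches line items: " + totalMatchesLineItems(receipt));
    }
}
```

Real receipts often include tax, tips, or discounts, so a mismatch is better treated as a hint to review the data than as a reason to reject it.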
Next, we define an upload component and hook it up to call Spring AI's ChatClient when an image is uploaded:
public ReceiptView(ChatClient.Builder builder) {
    var client = builder.build();
    // Keep the uploaded file in memory
    var buffer = new MemoryBuffer();
    var upload = new Upload(buffer);
    upload.setAcceptedFileTypes("image/*");
    upload.addSucceededListener(e -> {
        // Send the prompt text and the image in a single user message
        var receipt = client.prompt()
            .user(userMessage -> userMessage
                .text("""
                    Please read the attached receipt and return the values in the provided format
                    """)
                .media(
                    MimeTypeUtils.parseMimeType(e.getMIMEType()),
                    new InputStreamResource(buffer.getInputStream())
                )
            )
            .call()
            // Map the model's JSON response onto the Receipt record
            .entity(Receipt.class);
        showReceipt(receipt);
        upload.clearFileList();
    });
    add(upload);
}
The prompt simply instructs the AI to read the image and return the contents in the specified format. When we call entity(Receipt.class), Spring AI appends a JSON schema to the prompt that defines our expected format. Spring AI then converts the JSON response to a Java object for us automatically.
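The record type alone is enough to describe the expected output, because its component names and types are available at runtime. The sketch below is not how Spring AI builds its format instructions; it only uses plain Java reflection to illustrate what a library can read from a record. The class name and the fieldNames helper are hypothetical.

```java
import java.lang.reflect.RecordComponent;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

final class RecordFieldsSketch {

    // Same shape as the records in the post, repeated so the sketch compiles on its own.
    record LineItem(String name, int quantity, BigDecimal price) {}
    record Receipt(String merchant, BigDecimal total, List<LineItem> lineItems) {}

    // Returns the component names of a record in declaration order.
    static List<String> fieldNames(Class<? extends Record> type) {
        List<String> names = new ArrayList<>();
        for (RecordComponent component : type.getRecordComponents()) {
            names.add(component.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print each component with its generic type, e.g. the element type of the list.
        for (RecordComponent component : Receipt.class.getRecordComponents()) {
            System.out.println(component.getName() + ": " + component.getGenericType().getTypeName());
        }
    }
}
```

This is also why descriptive component names matter: they are the main hint the model gets about what each value means.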
Finally, we display the information that we extracted from the receipt:
private void showReceipt(Receipt receipt) {
    var items = new Grid<>(LineItem.class);
    items.setItems(receipt.lineItems());
    add(
        new H3("Receipt details"),
        new Paragraph("Merchant: " + receipt.merchant()),
        new Paragraph("Total: " + receipt.total()),
        items
    );
}
Case 2: Importing a hand-written signup sheet
The second use case we'll look at is importing a hand-written signup sheet into an editable Grid. The technical implementation is similar to the first use case. I included it to show how flexible AI-based data extraction is and, hopefully, to spark some more ideas for your apps.
Again, we define our desired data structure as Java records. You can use plain Java objects as well.
public record Participant(String name, String company, String email, String tshirtSize) { }
public record SignUpSheet(List<Participant> participants) { }
Then, handle the uploaded file and call Spring AI to extract the data:
public SignupView(ChatClient.Builder builder) {
    var client = builder.build();
    // Set up upload
    var buffer = new MemoryBuffer();
    var upload = new Upload(buffer);
    upload.setAcceptedFileTypes("image/*");
    upload.setMaxFileSize(10 * 1024 * 1024);
    upload.addSucceededListener(e -> {
        var signUpSheet = client.prompt()
            .user(userMessage -> userMessage
                .text("""
                    Please read the attached event signup sheet image and extract all participants.
                    """)
                .media(
                    MimeTypeUtils.parseMimeType(e.getMIMEType()),
                    new InputStreamResource(buffer.getInputStream())
                )
            )
            .call()
            .entity(SignUpSheet.class);
        showParticipants(signUpSheet);
        upload.clearFileList();
    });
    Text instructions = new Text("Upload an image of the event signup sheet. The AI will extract participant data and display it here.");
    add(instructions, upload, createGridLayout());
}
I decided to use the Grid Pro component to display the results, as it allows me to easily edit the rows if the AI makes any mistakes. If you don't want to use a commercial component, you could set up a Grid using inline editing.
// Note: grid and participants are fields of the view, declared elsewhere in the class
private Div createGridLayout() {
    Div gridContainer = new Div();
    gridContainer.setWidthFull();
    grid.setSizeFull();
    grid.setItems(participants);
    grid.setAllRowsVisible(true);
    grid.setEditOnClick(true);
    // Records are immutable, so each edit replaces the whole Participant
    grid.addEditColumn(Participant::name)
        .text((participant, newValue) -> updateParticipant(participant, newValue, participant.company(), participant.email(), participant.tshirtSize()))
        .setHeader("Name")
        .setSortable(true)
        .setAutoWidth(true);
    grid.addEditColumn(Participant::company)
        .text((participant, newValue) -> updateParticipant(participant, participant.name(), newValue, participant.email(), participant.tshirtSize()))
        .setHeader("Company")
        .setSortable(true)
        .setAutoWidth(true);
    grid.addEditColumn(Participant::email)
        .text((participant, newValue) -> updateParticipant(participant, participant.name(), participant.company(), newValue, participant.tshirtSize()))
        .setHeader("Email")
        .setSortable(true)
        .setAutoWidth(true);
    grid.addEditColumn(Participant::tshirtSize)
        .text((participant, newValue) -> updateParticipant(participant, participant.name(), participant.company(), participant.email(), newValue))
        .setHeader("T-Shirt Size")
        .setSortable(true)
        .setAutoWidth(true);
    gridContainer.add(grid);
    return gridContainer;
}
Finally, we add methods for displaying and updating the records:
private void showParticipants(SignUpSheet signUpSheet) {
    if (signUpSheet != null && signUpSheet.participants() != null) {
        participants = new ArrayList<>(signUpSheet.participants());
        grid.setItems(participants);
    }
}

private void updateParticipant(Participant oldParticipant, String name, String company, String email, String tshirtSize) {
    // Create a new Participant with updated fields
    Participant updated = new Participant(name, company, email, tshirtSize);
    // Replace the old participant in the list
    int index = participants.indexOf(oldParticipant);
    if (index >= 0) {
        participants.set(index, updated);
        grid.setItems(participants);
    }
}
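Handwriting is easy to misread, so it can help to flag suspicious rows before anyone relies on them. The following is a minimal sketch in plain Java, not part of the original example. The class name, the problems helper, the deliberately loose email pattern, and the list of accepted t-shirt sizes are all assumptions to adapt to your own signup sheet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class ParticipantCheck {

    // Same shape as the record in the post, repeated so the sketch compiles on its own.
    record Participant(String name, String company, String email, String tshirtSize) {}

    // Deliberately loose check: something@something.tld with no whitespace.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // Assumed set of sizes; adjust to whatever the signup sheet offers.
    private static final Set<String> SIZES = Set.of("XS", "S", "M", "L", "XL", "XXL");

    // Returns a human-readable list of issues; an empty list means nothing looked wrong.
    static List<String> problems(Participant p) {
        List<String> problems = new ArrayList<>();
        if (p.name() == null || p.name().isBlank()) {
            problems.add("name is missing");
        }
        if (p.email() == null || !EMAIL.matcher(p.email().trim()).matches()) {
            problems.add("email looks invalid");
        }
        if (p.tshirtSize() == null
                || !SIZES.contains(p.tshirtSize().trim().toUpperCase(Locale.ROOT))) {
            problems.add("unknown t-shirt size");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up sample row with a misread email address.
        var row = new Participant("Jane Doe", "Acme", "jane.example.com", "M");
        System.out.println(problems(row));
    }
}
```

In the view, rows with a non-empty problem list could be highlighted in the grid so the user knows which cells to double-check.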
Conclusion
As you can see, Spring AI makes it easy to extract data from images into a Java object that we can use in our application. As always, AI can (and will) make mistakes, so be sure to verify the data before using it.
You can find the complete source code for this example in my GitHub repo.