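Extracting Structured Data From Images Using Spring AI

1. Overview

In this tutorial, we'll explore how to extract structured data from images using Spring AI and an OpenAI chat model. OpenAI models can analyze an uploaded image and return relevant information, and they can also produce structured output that other applications can consume directly.

We'll build a web service that accepts an image uploaded by the client, calls OpenAI to count the cars of each color in the image, and returns the per-color counts as JSON.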

2. Spring Boot Configuration

First, we add the following dependencies to our Maven pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>3.4.1</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

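⚠️ Required configuration: application.yml must provide:

The OpenAI API key (spring.ai.openai.api-key)
A chat model that supports image analysis (spring.ai.openai.chat.options.model)

At the time of writing, the models that support image analysis include:

gpt-4o-mini (low cost, low latency)
gpt-4o (broader knowledge, but higher cost)
gpt-4.5-preview

We use gpt-4o as the example model: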

spring:
  ai:
    openai:
      api-key: "sk-your-api-key-here"  # replace with your real API key
      chat:
        options:
          model: "gpt-4o"

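Once the configuration is in place, Spring Boot loads OpenAiAutoConfiguration automatically and registers the core beans at start-up, including the ChatClient.Builder that we inject into our service later.

3. Example Web Service

3.1. REST Controller

Let's create a simple REST endpoint that receives the image and the colors to count: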

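import java.io.IOException;
import java.io.InputStream;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/image")
public class ImageController {
    @Autowired
    private CarCountService carCountService;

    // Accepts the colors to count and the uploaded image as request parameters
    @PostMapping("/car-count")
    public ResponseEntity<?> getCarCounts(@RequestParam("colors") String colors,
      @RequestParam("file") MultipartFile file) {
        // try-with-resources closes the upload stream once the call returns
        try (InputStream inputStream = file.getInputStream()) {
            var carCount = carCountService.getCarCount(inputStream, file.getContentType(), colors);
            return ResponseEntity.ok(carCount);
        } catch (IOException e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("Image upload failed");
        }
    }
}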

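✅ On success, the endpoint returns a CarCount object; if reading the upload fails, it returns an error message.

3.2. POJOs

Getting structured output from the chat model normally requires defining a JSON Schema for the response. Spring AI greatly simplifies this by letting us describe the output as POJOs.

We define two POJO classes:

CarCount: stores the list of per-color counts and the total count
CarColorCount: stores a single color and its count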

public class CarCount {
    private List<CarColorCount> carColorCounts;
    private int totalCount;

    // constructors, getters, and setters
}

public class CarColorCount {
    private String color;
    private int count;

    // constructors, getters, and setters
}
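
For reference, here is a minimal sketch of the two classes with the elided members written out. It is one plausible way to fill in the placeholders, not necessarily how the original classes look. The classes are declared without the public modifier only so that both fit in a single file; in a real project each would normally be a public class in its own file.

```java
import java.util.ArrayList;
import java.util.List;

// Holds the count for a single color.
class CarColorCount {
    private String color;
    private int count;

    // No-argument constructor so a JSON mapper can create the object and fill it via setters.
    public CarColorCount() {
    }

    public CarColorCount(String color, int count) {
        this.color = color;
        this.count = count;
    }

    public String getColor() { return color; }
    public void setColor(String color) { this.color = color; }
    public int getCount() { return count; }
    public void setCount(int count) { this.count = count; }
}

// Holds the per-color counts and the overall total.
class CarCount {
    private List<CarColorCount> carColorCounts = new ArrayList<>();
    private int totalCount;

    public CarCount() {
    }

    public CarCount(List<CarColorCount> carColorCounts, int totalCount) {
        // Copy the list so later changes by the caller do not affect this object.
        this.carColorCounts = new ArrayList<>(carColorCounts);
        this.totalCount = totalCount;
    }

    public List<CarColorCount> getCarColorCounts() { return carColorCounts; }
    public void setCarColorCounts(List<CarColorCount> carColorCounts) { this.carColorCounts = carColorCounts; }
    public int getTotalCount() { return totalCount; }
    public void setTotalCount(int totalCount) { this.totalCount = totalCount; }
}
```

The property names (carColorCounts, totalCount, color, count) are the ones that appear as keys in the JSON response.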

3.3. Core Service

Next, we create CarCountService to handle the image analysis:

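import java.io.InputStream;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.InputStreamResource;
import org.springframework.stereotype.Service;
import org.springframework.util.MimeTypeUtils;

@Service
public class CarCountService {
    private final ChatClient chatClient;

    public CarCountService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    public CarCount getCarCount(InputStream imageInputStream, String contentType, String colors) {
        return chatClient.prompt()
          // A single text(...) call carries all rules; chaining several
          // text(...) calls risks keeping only the last one.
          .system(systemMessage -> systemMessage
            .text("""
                Count the number of cars of each color in the image.
                The user specifies the colors to count in the user prompt.
                Count only the colors that the user prompt explicitly names.
                Ignore anything in the user prompt that is not a color.
                If no color is specified, return a total count of 0.
                """)
          )
          // The user message carries the color list plus the image itself
          .user(userMessage -> userMessage
            .text(colors)
            .media(MimeTypeUtils.parseMimeType(contentType), new InputStreamResource(imageInputStream))
          )
          .call()
          // Convert the JSON reply into a CarCount instance
          .entity(CarCount.class);
    }
}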

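Key points:

System prompt: defines the rules the model must follow and guards against unexpected results, such as counting colors that were not requested
User prompt: supplies the data to process (the color text plus the image media)
Output conversion: entity(CarCount.class) triggers Spring AI's BeanOutputConverter, which converts the JSON response into a POJO automatically

⚠️ Pitfall: we must pass both the image InputStream and the correct MIME type; otherwise, the model cannot interpret the image content.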
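
Because the MIME type comes straight from the upload, it can be worth checking it before the service is called. The following is only a sketch: the ImageTypeCheck class and its method are hypothetical helpers, and the set of accepted types is an assumption that should be adjusted to whatever the chosen model accepts.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Spring AI or the original service.
final class ImageTypeCheck {

    // Assumption: image formats accepted for vision input; adjust for the model in use.
    private static final Set<String> SUPPORTED =
        Set.of("image/png", "image/jpeg", "image/gif", "image/webp");

    private ImageTypeCheck() {
    }

    // Returns true only for a non-empty content type whose base type is in SUPPORTED.
    static boolean isSupportedImageType(String contentType) {
        if (contentType == null || contentType.isBlank()) {
            return false; // an upload may arrive without a content type
        }
        // Drop parameters after ';' and normalize case before the lookup.
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(base);
    }

    public static void main(String[] args) {
        System.out.println(isSupportedImageType("image/png"));
        System.out.println(isSupportedImageType("text/plain"));
        System.out.println(isSupportedImageType(null));
    }
}
```

A controller could call such a check on file.getContentType() and answer with a 4xx status when it fails, instead of passing a missing or non-image type on to the model.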

4. Test Run

Let's test the service with Postman, asking it to count blue, yellow, and green cars:

(Figure: example Postman request)
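
The same request can also be built in plain Java. The sketch below assembles the multipart/form-data body by hand with the JDK's java.net.http API; the CarCountRequest class, the boundary format, and the base URL passed in are assumptions for illustration, while the /image/car-count path and the colors and file part names come from the controller.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper for calling the car-count endpoint.
final class CarCountRequest {

    private CarCountRequest() {
    }

    // Builds a multipart/form-data body with a "colors" text part and a "file" part.
    static byte[] multipartBody(String boundary, String colors, String fileName,
                                String contentType, byte[] image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String head = "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"colors\"\r\n\r\n"
            + colors + "\r\n"
            + "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
            + "Content-Type: " + contentType + "\r\n\r\n";
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(image); // raw image bytes, unmodified
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Creates the POST request; baseUrl is wherever the service runs (an assumption).
    static HttpRequest build(String baseUrl, String colors, String fileName,
                             String contentType, byte[] image) {
        String boundary = "car-count-" + System.nanoTime();
        byte[] body = multipartBody(boundary, colors, fileName, contentType, image);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/image/car-count"))
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofByteArray(body))
            .build();
    }
}
```

With the service running locally, the request could be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), using a base URL such as http://localhost:8080 if the default port is unchanged.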

We use the following test image:

(Figure: test image)

The service returns this JSON response:

{
    "carColorCounts": [
        {
            "color": "blue",
            "count": 2
        },
        {
            "color": "yellow",
            "count": 1
        },
        {
            "color": "green",
            "count": 0
        }
    ],
    "totalCount": 3
}

✅ The response matches the structure of the CarCount and CarColorCount POJOs and gives an accurate count for each requested color.
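
Since the counts are produced by a model, a cheap consistency check on the parsed result can catch obviously malformed answers. This is a sketch only: CarCountCheck and the ColorCount record are hypothetical stand-ins for the article's classes, and the rule it applies (the total equals the sum of the non-negative per-color counts) is an assumption about what a well-formed answer looks like.

```java
import java.util.List;

// Hypothetical sanity check for a parsed car-count result.
final class CarCountCheck {

    // Stand-in for CarColorCount, kept local so the sketch is self-contained.
    record ColorCount(String color, int count) {
    }

    private CarCountCheck() {
    }

    // True when no count is negative and the counts add up to totalCount.
    static boolean isConsistent(List<ColorCount> counts, int totalCount) {
        int sum = 0;
        for (ColorCount c : counts) {
            if (c.count() < 0) {
                return false;
            }
            sum += c.count();
        }
        return sum == totalCount;
    }

    public static void main(String[] args) {
        // Values taken from the sample response: 2 blue, 1 yellow, 0 green, total 3.
        List<ColorCount> counts = List.of(
            new ColorCount("blue", 2),
            new ColorCount("yellow", 1),
            new ColorCount("green", 0));
        System.out.println(isConsistent(counts, 3));
    }
}
```

For the sample response, the per-color counts 2, 1, and 0 add up to the reported total of 3, so the check passes.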

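5. Conclusion

In this article, we saw how to extract structured data from an OpenAI chat model using Spring AI. We built a complete web service that does the following:

Accepts an image uploaded by the user
Calls OpenAI to analyze the image content
Returns structured per-color counts

This approach suits any scenario where we need to pull structured information out of unstructured data such as images, and it is simple, direct, and efficient.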
