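Using Hugging Face Models With Spring AI and Ollama

1. Overview

Artificial intelligence is changing the way we build web applications. Hugging Face is a popular platform that hosts a large collection of open-source, pre-trained large language models (LLMs).

We can use Ollama, an open-source tool, to run these LLMs on our local machine. It supports running GGUF-format models from Hugging Face.

In this tutorial, we'll explore how to use Hugging Face models with Spring AI and Ollama. We'll build a simple chatbot using a chat completion model and implement semantic search using an embedding model.

2. Setting up the Dependencies

Let's start by adding the necessary dependency to our project's pom.xml file: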

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

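This Ollama starter dependency helps us establish a connection with the Ollama service. We'll use it to pull and run our chat completion and embedding models.

Since the version we're using, 1.0.0-M6, is a milestone release, we'll also need to add the Spring Milestones repository to our pom.xml: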

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

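Milestone versions are published to this repository rather than to the standard Maven Central repository.

3. Setting up Ollama With Testcontainers

To simplify local development and testing, we'll use Testcontainers to set up the Ollama service.

3.1 Test Dependencies

First, let's add the necessary test dependencies to our pom.xml: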

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <scope>test</scope>
</dependency>

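We import the Spring AI Testcontainers dependency for Spring Boot and the Ollama module of Testcontainers. Neither entry declares a version, so this snippet assumes the versions are managed elsewhere in the build, for example through dependency management.

3.2 Defining Testcontainers Beans

Next, let's create a @TestConfiguration class that defines our Testcontainers beans: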

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {
    @Bean
    public OllamaContainer ollamaContainer() {
        return new OllamaContainer("ollama/ollama:0.5.4");
    }

    @Bean
    public DynamicPropertyRegistrar dynamicPropertyRegistrar(OllamaContainer ollamaContainer) {
        return registry -> {
            registry.add("spring.ai.ollama.base-url", ollamaContainer::getEndpoint);
        };
    }
}

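When creating the OllamaContainer bean, we pin the Ollama image to 0.5.4, the latest stable version when this tutorial was written.

**Then, we define a DynamicPropertyRegistrar bean to configure the base-url of the Ollama service**, which allows our application to connect to the started Ollama container.

3.3 Using Testcontainers During Development

Although Testcontainers is primarily used for integration testing, we can use it during local development as well.

To do so, we'll create a separate main class in our src/test/java directory: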

public class TestApplication {
    public static void main(String[] args) {
        SpringApplication.from(Application::main)
          .with(TestcontainersConfiguration.class)
          .run(args);
    }
}

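We create a TestApplication class and, in its main() method, start our main Application class with the TestcontainersConfiguration attached.

This setup lets us run our Spring Boot application and have it connect to the Ollama service started via Testcontainers.

4. Using a Chat Completion Model

Now that our local Ollama container is set up, let's use a chat completion model to build a simple chatbot.

4.1 Configuring the Chat Model and Chatbot Beans

Let's start by configuring a chat completion model in our application.yaml file: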

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: when_missing
      chat:
        options:
          model: hf.co/microsoft/Phi-3-mini-4k-instruct-gguf

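To configure a Hugging Face model, we use the format hf.co/{username}/{repository}. Here, we specify the GGUF version of the Phi-3-mini-4k-instruct model provided by Microsoft.

Note: using this particular model isn't a requirement for our implementation. **We recommend setting up the codebase locally and trying out other chat completion models.**

Additionally, we set pull-model-strategy to when_missing, which ensures that Spring AI pulls the specified model automatically if it isn't available locally.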
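To make the hf.co/{username}/{repository} naming format concrete, here is a minimal sketch of a helper that assembles such a model name. The HuggingFaceModelRef record is hypothetical and not part of Spring AI or Ollama. The optional quantization suffix after a colon follows what Hugging Face's Ollama documentation describes, as far as we know; the tag value in the example is only a placeholder, and whether a given repository offers it needs to be checked on its Hugging Face page.

```java
import java.util.Optional;

// Hypothetical helper (not a Spring AI or Ollama type): it only builds the
// string that goes into the `model` property of application.yaml.
record HuggingFaceModelRef(String username, String repository, Optional<String> quantization) {

    static HuggingFaceModelRef of(String username, String repository) {
        return new HuggingFaceModelRef(username, repository, Optional.empty());
    }

    // Assumption: an optional ":{quantization}" suffix selects a specific GGUF file.
    HuggingFaceModelRef withQuantization(String tag) {
        return new HuggingFaceModelRef(username, repository, Optional.of(tag));
    }

    String toOllamaName() {
        String base = "hf.co/" + username + "/" + repository;
        return quantization.map(tag -> base + ":" + tag).orElse(base);
    }
}

class ModelRefDemo {
    public static void main(String[] args) {
        HuggingFaceModelRef ref = HuggingFaceModelRef.of("microsoft", "Phi-3-mini-4k-instruct-gguf");
        // Prints the same value used in the configuration above
        System.out.println(ref.toOllamaName());
    }
}
```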

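Once a valid model is configured, Spring AI automatically creates a bean of type ChatModel, which allows us to interact with the chat completion model.

Let's use it to define the additional beans our chatbot needs: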

@Configuration
class ChatbotConfiguration {
    @Bean
    public ChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }

    @Bean
    public ChatClient chatClient(ChatModel chatModel, ChatMemory chatMemory) {
        return ChatClient
          .builder(chatModel)
          .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
          .build();
    }
}

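First, we define a ChatMemory bean using the InMemoryChatMemory implementation. It maintains the conversation context by storing the chat history in memory.

Next, using the ChatMemory and ChatModel beans, we create a bean of type ChatClient, which is our main entry point for interacting with the chat completion model.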
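To illustrate the idea behind an in-memory chat memory, the following is a toy sketch in plain Java. It is not Spring AI's implementation: ToyChatMemory, Message, and the method names are hypothetical, and the sketch only shows the general approach of keeping a message list per conversation ID and replaying the most recent messages as context.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for an in-memory chat memory; names are hypothetical.
// Not thread-safe: a real implementation would need to handle concurrent access.
class ToyChatMemory {
    record Message(String role, String text) {}

    // One message list per conversation ID
    private final Map<String, List<Message>> conversations = new HashMap<>();

    void add(String conversationId, Message message) {
        conversations
          .computeIfAbsent(conversationId, id -> new ArrayList<>())
          .add(message);
    }

    // Returns up to lastN of the most recent messages, oldest first
    List<Message> get(String conversationId, int lastN) {
        List<Message> all = conversations.getOrDefault(conversationId, List.of());
        int from = Math.max(0, all.size() - lastN);
        return List.copyOf(all.subList(from, all.size()));
    }
}
```

Because each conversation has its own list, two users with different conversation IDs never see each other's history, which is the property the chatbot relies on.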

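4.2 Implementing the Chatbot

With our configuration in place, let's create a ChatbotService class. We'll inject the ChatClient bean we defined earlier to interact with our model.

But first, let's define two simple records to represent the chat request and response: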

record ChatRequest(@Nullable UUID chatId, String question) {}

record ChatResponse(UUID chatId, String answer) {}

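The ChatRequest contains the user's question and an optional chatId that identifies an ongoing conversation.

Similarly, the ChatResponse contains the chatId and the chatbot's answer.

Now, let's implement the core functionality: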

public ChatResponse chat(ChatRequest chatRequest) {
    UUID chatId = Optional
      .ofNullable(chatRequest.chatId())
      .orElseGet(UUID::randomUUID);
    String answer = chatClient
      .prompt()
      .user(chatRequest.question())
      .advisors(advisorSpec ->
          advisorSpec
            .param("chat_memory_conversation_id", chatId))
      .call()
      .content();
    return new ChatResponse(chatId, answer);
}

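If the incoming request doesn't contain a chatId, we generate a new one. This allows the user to either start a new conversation or continue an existing one.

We pass the user's question to the chatClient bean and set the chat_memory_conversation_id parameter to the resolved chatId to maintain the conversation history.

Finally, we return the chatbot's answer along with the chatId.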
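One detail worth knowing when resolving the chatId with Optional: orElse() evaluates its argument eagerly, even when a value is present, while orElseGet() calls its supplier only when the value is absent. The sketch below demonstrates the difference with a counter. The ChatIdResolution class and its methods are hypothetical names used for illustration only.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: counts how often a new ID is actually generated.
class ChatIdResolution {
    static final AtomicInteger generated = new AtomicInteger();

    static UUID newId() {
        generated.incrementAndGet();
        return UUID.randomUUID();
    }

    // orElse: newId() runs before orElse is called, whether or not chatId is null
    static UUID resolveEagerly(UUID chatId) {
        return Optional.ofNullable(chatId).orElse(newId());
    }

    // orElseGet: newId() runs only when chatId is null
    static UUID resolveLazily(UUID chatId) {
        return Optional.ofNullable(chatId).orElseGet(ChatIdResolution::newId);
    }
}
```

For a cheap call like UUID.randomUUID() the cost is negligible, so either form behaves correctly; the lazy form simply avoids generating an ID that is then discarded.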

4.3 Interacting With Our Chatbot

Now that we've implemented our service layer, let's expose a REST API on top of it:

@PostMapping("/chat")
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest chatRequest) {
    ChatResponse chatResponse = chatbotService.chat(chatRequest);
    return ResponseEntity.ok(chatResponse);
}

We'll use this API endpoint to interact with our chatbot.

Let's use the HTTPie CLI to start a new conversation:

http POST :8080/chat question="Who wanted to kill Harry Potter?"

Here, we send a simple question to the chatbot. Let's see what we receive as a response:

{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Lord Voldemort, also known as Tom Riddle, wanted to kill Harry Potter because of a prophecy that foretold a boy born at the end of July would have the power to defeat him."
}

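The response contains a unique chatId and the chatbot's answer to our question.

Let's continue the conversation by sending a follow-up question using the chatId from the above response: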

http POST :8080/chat chatId="7b8a36c7-2126-4b80-ac8b-f9eedebff28a" question="Who should he have gone after instead?"

Let's check whether the chatbot can maintain the context of our conversation and provide a relevant response:

{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Based on the prophecy's criteria, Voldemort could have targeted Neville Longbottom instead, as he was also born at the end of July to parents who had defied Voldemort three times."
}

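As we can see, the chatbot does maintain the conversation context: it refers back to the prophecy discussed in the previous message.

The chatId remains the same, indicating that the follow-up answer is a continuation of the same conversation.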
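The same calls can also be made from Java instead of HTTPie. The sketch below uses the JDK's java.net.http client and assumes the application listens on localhost:8080, as in the HTTPie commands above. ChatApiClient and its methods are hypothetical names, and the JSON is assembled by hand only to keep the example dependency-free.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the /chat endpoint; assumes the app runs on localhost:8080.
class ChatApiClient {
    private static final URI CHAT_URI = URI.create("http://localhost:8080/chat");

    // Builds the request body; chatId may be null when starting a new conversation.
    // Only backslashes and double quotes are escaped, which is enough for this sketch.
    static String toJson(String chatId, String question) {
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        String idField = chatId == null ? "" : "\"chatId\":\"" + chatId + "\",";
        return "{" + idField + "\"question\":\"" + escaped + "\"}";
    }

    static HttpRequest buildRequest(String chatId, String question) {
        return HttpRequest.newBuilder(CHAT_URI)
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(toJson(chatId, question)))
          .build();
    }

    // Sends the request and returns the raw JSON response; needs the application to be running.
    static String send(HttpRequest request) throws Exception {
        return HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString())
          .body();
    }
}
```

To continue a conversation, we'd pass the chatId from the first response as the first argument of buildRequest() on the follow-up call.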

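5. Using an Embedding Model

Moving on from the chat completion model, we'll now use an embedding model to implement semantic search over a small dataset of quotes.

We'll fetch the quotes from an external API, store them in an in-memory vector store, and perform a semantic search.

5.1 Fetching Quote Records From an External API

For our demonstration, we'll use the QuoteSlate API to fetch quotes.

Let's create a QuoteFetcher utility class: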

class QuoteFetcher {
    private static final String BASE_URL = "https://quoteslate.vercel.app";
    private static final String API_PATH = "/api/quotes/random";
    private static final int DEFAULT_COUNT = 50;

    public static List<Quote> fetch() {
        return RestClient
          .create(BASE_URL)
          .get()
          .uri(uriBuilder ->
              uriBuilder
                .path(API_PATH)
                .queryParam("count", DEFAULT_COUNT)
                .build())
          .retrieve()
          .body(new ParameterizedTypeReference<>() {});
    }
}

record Quote(String quote, String author) {}

We use RestClient to call the QuoteSlate API with a default count of 50, and use a ParameterizedTypeReference to deserialize the API response into a list of Quote records.
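The anonymous subclass syntax, new ParameterizedTypeReference<>() {}, may look unusual. Java erases generic type arguments at runtime, but the generic superclass of an anonymous subclass stays available through reflection. The sketch below shows this "super type token" technique with a hypothetical TypeToken class; it illustrates the general principle and is not Spring's actual implementation.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical, simplified type token: captures T from the subclass declaration.
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // For an anonymous subclass such as new TypeToken<List<String>>() {},
        // the generic superclass is TypeToken<List<String>>.
        ParameterizedType superclass = (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superclass.getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}

class TypeTokenDemo {
    public static void main(String[] args) {
        Type type = new TypeToken<List<String>>() {}.getType();
        System.out.println(type.getTypeName()); // prints java.util.List<java.lang.String>
    }
}
```

With the full element type available at runtime, a deserializer can build a list of the right record type rather than a list of generic maps.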

5.2 Configuring and Populating an In-Memory Vector Store

Now, let's configure an embedding model in our application.yaml:

spring:
  ai:
    ollama:
      embedding:
        options:
          model: hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF

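We use the GGUF version of the nomic-embed-text-v1.5 model provided by nomic-ai. Again, feel free to try this implementation with a different embedding model.

Once a valid model is specified, Spring AI automatically creates a bean of type EmbeddingModel for us.

Let's use it to create a vector store bean: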

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel) {
    return SimpleVectorStore
      .builder(embeddingModel)
      .build();
}

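For our demonstration, we create a bean of the SimpleVectorStore class. It's an in-memory implementation that emulates a vector store using the java.util.Map class.

To populate our vector store with quotes during application startup, we'll create a VectorStoreInitializer class that implements the ApplicationRunner interface: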

@Component
class VectorStoreInitializer implements ApplicationRunner {
    private final VectorStore vectorStore;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) {
        List<Document> documents = QuoteFetcher
          .fetch()
          .stream()
          .map(quote -> {
              Map<String, Object> metadata = Map.of("author", quote.author());
              return new Document(quote.quote(), metadata);
          })
          .toList();
        vectorStore.add(documents);
    }
}

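In our VectorStoreInitializer, we inject an instance of VectorStore through the constructor.

Inside the run() method, we use our QuoteFetcher utility class to retrieve a list of Quote records. Then, we map each quote into a Document and set the author field as its metadata.

Finally, we store all the documents in our vector store. When we invoke the add() method, Spring AI automatically converts the plaintext content into a vector representation before storing it, so we don't need to convert it explicitly using the EmbeddingModel bean.

5.3 Testing Semantic Search

With our vector store populated, let's validate our semantic search functionality: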

private static final int MAX_RESULTS = 3;

@ParameterizedTest
@ValueSource(strings = {"Motivation", "Happiness"})
void whenSearchingQuotesByTheme_thenRelevantQuotesReturned(String theme) {
    SearchRequest searchRequest = SearchRequest
      .builder()
      .query(theme)
      .topK(MAX_RESULTS)
      .build();
    List<Document> documents = vectorStore.similaritySearch(searchRequest);

    assertThat(documents)
      .hasSizeBetween(1, MAX_RESULTS)
      .allSatisfy(document -> {
          String author = String.valueOf(document.getMetadata().get("author"));
          assertThat(author)
            .isNotBlank();
      });
}

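Here, we pass common quote themes to our test method using @ValueSource. We then create a SearchRequest object with the theme as the query and MAX_RESULTS as the number of desired results.

Next, we call the similaritySearch() method of our vectorStore bean with the searchRequest. Similar to the add() method of VectorStore, Spring AI converts our query into its vector representation before querying the vector store.

The returned documents will contain quotes that are semantically related to the given theme, even if they don't contain the exact keyword.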
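To give a feel for what a similarity search over an in-memory store involves, here is a toy sketch that ranks stored vectors by cosine similarity and returns the top K. It is not Spring AI's SimpleVectorStore: ToyVectorStore is a hypothetical class, and the embeddings are supplied by the caller as hand-written arrays rather than computed by a model.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy store: maps each text to a caller-supplied embedding vector.
class ToyVectorStore {
    private final Map<String, double[]> vectors = new LinkedHashMap<>();

    void add(String text, double[] embedding) {
        vectors.put(text, embedding);
    }

    // Cosine similarity: dot product divided by the product of the vector lengths
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the topK stored texts, most similar to the query first
    List<String> similaritySearch(double[] query, int topK) {
        return vectors.entrySet().stream()
          .sorted(Comparator.comparingDouble(
              (Map.Entry<String, double[]> entry) -> cosine(query, entry.getValue())).reversed())
          .limit(topK)
          .map(Map.Entry::getKey)
          .toList();
    }
}
```

Since ranking depends only on how close the vectors are, a quote can match a theme without sharing any words with it, provided the embedding model places the two near each other.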

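6. Conclusion

In this article, we've explored using Hugging Face models with Spring AI.

Using Testcontainers, we set up the Ollama service, creating a local test environment.

First, we built a simple chatbot using a chat completion model. Then, we implemented semantic search using an embedding model.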
