Speech-to-Text and Text-to-Speech


This tutorial shows how to implement text-to-speech and speech-to-text with the Spring Boot framework and expose both features as HTTP endpoints.

Step 1: Create the Spring Boot project

Use Spring Initializr (https://start.spring.io/) to create a new Spring Boot project. When adding dependencies, select "Spring Web" and "Thymeleaf".

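Step 2: Configure the dependencies

In the project's pom.xml, add the Baidu AIP SDK, which covers both speech synthesis and speech recognition, next to the two Spring starters selected in Step 1: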

    <!-- Spring Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- Thymeleaf for HTML templates -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>

    <!-- Baidu AIP SDK -->
    <dependency>
        <groupId>com.baidu.aip</groupId>
        <artifactId>java-sdk</artifactId>
        <version>4.20.0</version>
    </dependency>

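Step 3: Configure Baidu AIP

In the src/main/resources directory, create an application.properties file and add the Baidu speech synthesis and recognition settings. The controller in Step 4 reads the three keys below through @Value; the values shown are placeholders to be replaced with the credentials of your own Baidu AI application:

baidu.app.id=YOUR_APP_ID
baidu.api.key=YOUR_API_KEY
baidu.secret.key=YOUR_SECRET_KEY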

Step 4: Create the Controller class

In the src/main/java/com/example/demo directory, create a class named BaiduAipController.java:

package com.example.demo;

import com.baidu.aip.speech.AipSpeech;
import com.baidu.aip.speech.TtsResponse;
import org.json.JSONObject;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.HashMap;

@RestController
@RequestMapping("/baidu-aip")
public class BaiduAipController {

    @Value("${baidu.app.id}")
    private String appId;

    @Value("${baidu.api.key}")
    private String apiKey;

    @Value("${baidu.secret.key}")
    private String secretKey;

    // Builds a client from the credentials in application.properties.
    private AipSpeech initAipSpeechClient() {
        return new AipSpeech(appId, apiKey, secretKey);
    }

    @PostMapping(value = "/text-to-speech", consumes = MediaType.APPLICATION_JSON_VALUE, produces = MediaType.APPLICATION_OCTET_STREAM_VALUE)
    public byte[] textToSpeech(@RequestBody TextToSpeechRequest request) {
        AipSpeech client = initAipSpeechClient();

        HashMap<String, Object> options = new HashMap<>();
        options.put("spd", request.getSpeed());       // speaking rate
        options.put("pit", request.getPitch());       // pitch
        options.put("vol", request.getVolume());      // volume
        options.put("per", request.getPersonality()); // voice

        TtsResponse response = client.synthesis(request.getText(), "zh", 1, options);

        // TtsResponse carries the audio in getData() on success;
        // on failure getData() is null and getResult() holds the error JSON.
        byte[] audio = response.getData();
        if (audio != null) {
            return audio;
        }
        throw new RuntimeException("Failed to convert text to speech. Error: " + response.getResult());
    }

    @PostMapping(value = "/speech-to-text", consumes = MediaType.APPLICATION_OCTET_STREAM_VALUE)
    public String speechToText(@RequestBody byte[] audioData) {
        AipSpeech client = initAipSpeechClient();

        HashMap<String, Object> options = new HashMap<>();
        options.put("dev_pid", 1536); // Mandarin (with support for simple English)

        // The uploaded audio is assumed to be a 16 kHz WAV file.
        JSONObject result = client.asr(audioData, "wav", 16000, options);

        // Successful responses also contain "err_msg", so check "err_no" instead.
        if (result.optInt("err_no", -1) != 0) {
            throw new RuntimeException("Failed to convert speech to text. Error: " + result);
        }
        return result.toString();
    }
}
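The speech-to-text endpoint always tells Baidu that the upload is "wav" at 16000 Hz, so a file recorded at another sample rate would be sent with the wrong parameters. A minimal sketch of a pre-check on the WAV header follows. The class WavHeaderCheck and its methods are hypothetical helpers, not part of the tutorial or the Baidu SDK, and the check assumes the canonical 44-byte PCM layout (the "fmt " chunk directly after "WAVE") as well as mono 16-bit audio, which is an assumption about what the recognizer expects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: inspects a canonical PCM WAV header before calling asr(). */
final class WavHeaderCheck {

    private WavHeaderCheck() {}

    /** Returns true if the bytes start with a 16 kHz, mono, 16-bit PCM WAV header. */
    static boolean isSixteenKhzMonoPcm(byte[] audio) {
        // Canonical header: RIFF chunk (12 bytes), "fmt " chunk (24 bytes), "data" chunk header (8 bytes).
        if (audio == null || audio.length < 44) {
            return false;
        }
        boolean tagsMatch = tag(audio, 0).equals("RIFF")
                && tag(audio, 8).equals("WAVE")
                && tag(audio, 12).equals("fmt ");
        if (!tagsMatch) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(audio).order(ByteOrder.LITTLE_ENDIAN);
        int audioFormat = buf.getShort(20) & 0xFFFF;   // 1 means uncompressed PCM
        int channels = buf.getShort(22) & 0xFFFF;
        int sampleRate = buf.getInt(24);
        int bitsPerSample = buf.getShort(34) & 0xFFFF;
        return audioFormat == 1 && channels == 1 && sampleRate == 16000 && bitsPerSample == 16;
    }

    /** Reads a four-character chunk tag at the given offset. */
    private static String tag(byte[] audio, int offset) {
        return new String(audio, offset, 4, StandardCharsets.US_ASCII);
    }

    /** Builds a 44-byte header with no samples, handy for trying the check without a real file. */
    static byte[] header(int sampleRate, int channels, int bitsPerSample) {
        int blockAlign = channels * bitsPerSample / 8;
        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII)).putInt(36)
           .put("WAVE".getBytes(StandardCharsets.US_ASCII))
           .put("fmt ".getBytes(StandardCharsets.US_ASCII)).putInt(16)
           .putShort((short) 1)                 // PCM
           .putShort((short) channels)
           .putInt(sampleRate)
           .putInt(sampleRate * blockAlign)     // byte rate
           .putShort((short) blockAlign)
           .putShort((short) bitsPerSample)
           .put("data".getBytes(StandardCharsets.US_ASCII)).putInt(0);
        return buf.array();
    }
}
```

In speechToText, one option would be to call WavHeaderCheck.isSixteenKhzMonoPcm(audioData) first and reject the request when it returns false, instead of forwarding mismatched audio to Baidu.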

Step 5: Create the request model class

In the src/main/java/com/example/demo directory, create a class named TextToSpeechRequest.java:

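package com.example.demo;

public class TextToSpeechRequest {
    private String text;        // text to synthesize
    private String speed;       // sent to Baidu as "spd"
    private String pitch;       // sent to Baidu as "pit"
    private String volume;      // sent to Baidu as "vol"
    private String personality; // sent to Baidu as "per" (voice)

    // Getters and setters, used by Jackson to bind the JSON request body.
    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
    public String getSpeed() { return speed; }
    public void setSpeed(String speed) { this.speed = speed; }
    public String getPitch() { return pitch; }
    public void setPitch(String pitch) { this.pitch = pitch; }
    public String getVolume() { return volume; }
    public void setVolume(String volume) { this.volume = volume; }
    public String getPersonality() { return personality; }
    public void setPersonality(String personality) { this.personality = personality; }
}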

Step 6: Create the front-end page

In the src/main/resources/templates directory, create an HTML file named index.html:

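<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Baidu AIP Demo</title>
</head>
<body>

<h2>Text to Speech</h2>
<form id="tts-form" action="/baidu-aip/text-to-speech" method="post">
    <label for="text">Text:</label>
    <input type="text" id="text" name="text" required>
    <br>
    <label for="speed">Speed (1-15):</label>
    <input type="number" id="speed" name="speed" min="1" max="15" value="5">
    <br>
    <label for="pitch">Pitch (1-15):</label>
    <input type="number" id="pitch" name="pitch" min="1" max="15" value="5">
    <br>
    <label for="volume">Volume (1-15):</label>
    <input type="number" id="volume" name="volume" min="1" max="15" value="5">
    <br>
    <label for="personality">Personality (0 for female, 1 for male):</label>
    <input type="number" id="personality" name="personality" min="0" max="1" value="0">
    <br>
    <button type="submit">Convert to Speech</button>
</form>
<audio id="tts-audio" controls></audio>

<h2>Speech to Text</h2>
<form id="asr-form" action="/baidu-aip/speech-to-text" method="post">
    <label for="audio">Upload Audio File:</label>
    <input type="file" id="audio" name="audio" accept=".wav" required>
    <br>
    <button type="submit">Convert to Text</button>
</form>
<pre id="asr-result"></pre>

<script>
    // A plain HTML form cannot send JSON or a raw byte stream (those enctype values
    // are ignored by browsers), so both forms are submitted with fetch instead.
    document.getElementById("tts-form").addEventListener("submit", async (event) => {
        event.preventDefault();
        const response = await fetch(event.target.action, {
            method: "POST",
            headers: {"Content-Type": "application/json"},
            body: JSON.stringify(Object.fromEntries(new FormData(event.target)))
        });
        if (!response.ok) { alert("Text-to-speech failed: HTTP " + response.status); return; }
        // Assumes MP3 output, which is Baidu's default synthesis format.
        const audio = new Blob([await response.arrayBuffer()], {type: "audio/mpeg"});
        document.getElementById("tts-audio").src = URL.createObjectURL(audio);
    });

    document.getElementById("asr-form").addEventListener("submit", async (event) => {
        event.preventDefault();
        const response = await fetch(event.target.action, {
            method: "POST",
            headers: {"Content-Type": "application/octet-stream"},
            body: document.getElementById("audio").files[0]
        });
        document.getElementById("asr-result").textContent = await response.text();
    });
</script>
</body>
</html>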
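The page limits speed, pitch, and volume to 1-15 and personality to 0-1, but those limits only exist in the browser; a direct POST to the endpoint can carry any value. The following is a minimal sketch of the same range check on the server side. TtsOptionRanges and requireInRange are hypothetical names introduced for illustration, and the bounds simply mirror the form rather than any documented Baidu limit.

```java
/** Hypothetical helper: validates the string-valued options of TextToSpeechRequest. */
final class TtsOptionRanges {

    private TtsOptionRanges() {}

    /**
     * Parses the value as an integer and checks min <= value <= max.
     * Throws IllegalArgumentException for null, non-numeric, or out-of-range input.
     */
    static int requireInRange(String name, String value, int min, int max) {
        if (value == null) {
            throw new IllegalArgumentException(name + " is required");
        }
        final int parsed;
        try {
            parsed = Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(name + " must be an integer, got: " + value, e);
        }
        if (parsed < min || parsed > max) {
            throw new IllegalArgumentException(
                    name + " must be between " + min + " and " + max + ", got: " + parsed);
        }
        return parsed;
    }
}
```

In textToSpeech, a call such as TtsOptionRanges.requireInRange("speed", request.getSpeed(), 1, 15) before filling the options map would reject bad input early instead of passing it on to Baidu.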


Step 7: Run the application


mvn spring-boot:run

Step 8: Access the application
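Once the application is running, the page from Step 6 should be reachable in a browser, and the two endpoints can also be called directly. The sketch below does the latter with the JDK's own HTTP client (Java 11 or later). It assumes the application listens on Spring Boot's default port 8080, since the tutorial sets no other port. BaiduAipDemoClient and its method names are hypothetical, and the hand-built JSON only escapes quotes and backslashes, so it is a sketch rather than a general JSON encoder.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical client for the two endpoints exposed by BaiduAipController. */
final class BaiduAipDemoClient {

    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl; // assumption: "http://localhost:8080" for a local run

    BaiduAipDemoClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Builds the JSON POST for /text-to-speech; field names mirror TextToSpeechRequest. */
    HttpRequest textToSpeechRequest(String text) {
        String json = "{\"text\":\"" + escape(text) + "\","
                + "\"speed\":\"5\",\"pitch\":\"5\",\"volume\":\"5\",\"personality\":\"0\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/baidu-aip/text-to-speech"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Builds the raw-bytes POST for /speech-to-text. */
    HttpRequest speechToTextRequest(byte[] wav) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/baidu-aip/speech-to-text"))
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(wav))
                .build();
    }

    /** Sends the text and writes the returned audio bytes to the target file. */
    void synthesizeToFile(String text, Path target) throws IOException, InterruptedException {
        HttpResponse<byte[]> response =
                http.send(textToSpeechRequest(text), HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("text-to-speech failed: HTTP " + response.statusCode());
        }
        Files.write(target, response.body());
    }

    /** Uploads a WAV file and returns the recognition result as sent by the server. */
    String recognize(Path wavFile) throws IOException, InterruptedException {
        HttpResponse<String> response = http.send(
                speechToTextRequest(Files.readAllBytes(wavFile)), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("speech-to-text failed: HTTP " + response.statusCode());
        }
        return response.body();
    }

    /** Minimal escaping for a JSON string value: backslashes and double quotes only. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

With the server from Step 7 running, new BaiduAipDemoClient("http://localhost:8080").synthesizeToFile("你好", Path.of("hello.mp3")) would save the synthesized audio, and recognize(Path.of("sample.wav")) would return the recognition JSON; both file names are placeholders.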
