Building a simple web page with Baidu's speech recognition and speech synthesis APIs and the Flask framework

꧁是小阿狸꧂

Last modified 2023-10-26 10:01:17


Tags: speech recognition, artificial intelligence

First published 2023-10-25 22:47:09

Article link: https://blog.csdn.net/weixin_63816885/article/details/134040849


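I built this small project for a course assignment. I originally wanted to train my own model, but my fundamentals are not strong enough yet, and for various other reasons I ended up calling an existing API instead. Even so, wiring up an API still takes some background knowledge.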

I. Preparation

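First, register an account on the Baidu AI Open Platform and complete real-name verification. You can then create an application in Baidu AI Cloud, which gives you an App ID, an API key and a secret key; these are used later to authenticate your requests. Second, you need some familiarity with the Flask framework. Finally, import the following modules (the aip module is provided by the baidu-aip package):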

from flask import Flask, render_template, request, redirect, url_for, send_file
import sounddevice as sd
import soundfile as sf
from aip import AipSpeech
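Since the App ID, API key and secret key identify your account, it can be safer to keep them out of the source file. The following is a minimal sketch, assuming you export them as environment variables; the variable names BAIDU_APP_ID, BAIDU_API_KEY and BAIDU_SECRET_KEY and the helper load_credentials are hypothetical, not something Baidu or Flask prescribes.

```python
import os

# Hypothetical variable names; pick whatever fits your own setup.
ENV_NAMES = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")


def load_credentials(env=None):
    """Return (app_id, api_key, secret_key) read from environment variables."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names all of them
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in ENV_NAMES)
```

With this in place, the client could be created as AipSpeech(*load_credentials()) instead of hard-coding the three strings.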

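Also, if you use PyCharm Community Edition, you can still run Flask projects. The Professional Edition merely creates a few directories for you automatically; in the Community Edition you can create them by hand (I have not tried VS Code or other editors).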

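In your project folder, manually create a static directory and a templates directory, then create a Python file named app.py (any other name works too); all three sit at the same level. Put your HTML files in the templates directory and your CSS files in the static directory.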
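If you would rather not create the folders by hand, the layout described above can be sketched with a few lines of standard-library Python. The function name scaffold is hypothetical and only illustrates the structure; it does not write any Flask code for you.

```python
from pathlib import Path


def scaffold(project_dir):
    """Create the templates/, static/ and app.py layout described above."""
    root = Path(project_dir)
    # Flask looks for HTML templates in templates/ and static assets in static/
    (root / "templates").mkdir(parents=True, exist_ok=True)
    (root / "static").mkdir(parents=True, exist_ok=True)
    # Create an empty app.py without overwriting an existing one
    (root / "app.py").touch(exist_ok=True)
    return sorted(p.name for p in root.iterdir())
```

Calling scaffold on an empty folder of your choice leaves it with app.py, static and templates side by side.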

II. Getting started

1. Prepare the HTML files

I put the CSS directly in the HTML file. If you keep it in a separate file, you can reference it with a link tag inside head, like this:

<link rel="stylesheet" type="text/css" href="path/to/your.css">

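I prepared three HTML files: the main page, a page shown after successful speech recognition, and a page shown after successful speech synthesis (index.html, transcribe_result.html and synthesis_success.html in the code later on). I did not build dedicated failure pages, but you can add them if you like.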

1.1 Main page

I kept the page simple, just enough to expose the main features; you can refine it further.

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Speech Recognition and Speech Synthesis</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            text-align: center;
            background-color: white;
            margin: 0;
            padding: 0;
        }
        h1 {
            color: black;
        }
        form {
            margin: 20px;
            padding: 20px;
            background-color: white;
            border-radius: 8px;
            box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
        }
        label {
            font-weight: bold;
        }
        input[type="number"], textarea {
            width: 100%;
            padding: 10px;
            margin: 5px 0;
            border: 1px solid rgb(128, 128, 128);
            border-radius: 3px;
        }
        button {
            background-color: blue;
            color: white;
            border: none;
            padding: 10px 20px;
            border-radius: 10px;
            cursor: pointer;
        }
    </style>
</head>
<body>
    <h1>Speech Recognition and Speech Synthesis</h1>

    <form method="POST" action="/record">
        <label for="duration">Recording length (seconds):</label>
        <input type="number" id="duration" name="duration" min="1" required>
        <button type="submit">Record</button>
    </form>

    <form method="POST" action="/transcribe">
        <button type="submit">Recognize speech</button>
    </form>

    <form method="POST" action="/synthesize">
        <label for="text">Input text:</label>
        <textarea id="text" name="text" rows="4" cols="50" required></textarea>
        <button type="submit">Synthesize speech</button>
    </form>


</body>
</html>

This is what the page looks like (I opened it in PyCharm).
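The min="1" and required attributes on the duration field are only enforced by the browser, so the server still receives a plain string that may be empty or out of range. Here is a small sketch of a server-side check; the helper name parse_duration and the 60-second ceiling are assumptions for illustration (check the audio length limit of the recognition product you actually use).

```python
MAX_SECONDS = 60  # assumed upper bound, not taken from the original article


def parse_duration(raw, max_seconds=MAX_SECONDS):
    """Convert a form value to a whole number of seconds, or raise ValueError."""
    try:
        seconds = int(str(raw).strip())
    except ValueError:
        raise ValueError(f"duration must be a whole number, got {raw!r}") from None
    if not 1 <= seconds <= max_seconds:
        raise ValueError(f"duration must be between 1 and {max_seconds} seconds")
    return seconds
```

In a route handler this could replace a bare int(...) conversion, with the ValueError turned into an error message for the user.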

1.2 Speech recognition success page

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Speech Recognition Result</title>
</head>
<body>
    <h1>Speech recognition succeeded</h1>
    <p>Result: {{ result }}</p>
</body>
</html>

The template syntax {{ result }} displays the recognized text on the page.
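The value shown through {{ result }} comes out of the dictionary returned by the recognition call. As a rough sketch of how that dictionary could be unpacked defensively: the err_no and result keys are the ones used later in app.py, while the err_msg key and the helper name extract_text are assumptions made for this example.

```python
def extract_text(response):
    """Return (ok, message) from a recognition response dictionary."""
    if not isinstance(response, dict):
        return False, "unexpected response type"
    if response.get("err_no") != 0:
        # err_msg is assumed to carry a human-readable reason when present
        return False, str(response.get("err_msg", "unknown error"))
    candidates = response.get("result") or []
    if not candidates:
        return False, "empty result"
    # The first candidate is the one the page displays
    return True, candidates[0]
```

The boolean makes it easy to choose between rendering the success page and showing the error text.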

1.3 Speech synthesis success page

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Speech Synthesis Succeeded</title>
</head>
<body>
    <h1>Speech synthesis succeeded</h1>

    <p>You can play the synthesized audio below:</p>

    <audio controls>
        <source src="{{ audio_url }}" type="audio/wav">
        Your browser does not support the audio element.
    </audio>
</body>
</html>

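Template syntax is used here as well, this time to make the synthesized audio playable on the page; playback speed and volume can be adjusted.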
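The source element above declares type="audio/wav", which only helps if the bytes behind the URL really are WAV; some synthesis services return MP3 unless asked otherwise. A quick way to check is to look at the first bytes of the payload. This is a simplified sketch (the names sniff_audio_format and MIME_TYPES are hypothetical), not a full format detector.

```python
def sniff_audio_format(data):
    """Guess 'wav', 'mp3' or 'unknown' from the first bytes of an audio payload."""
    # WAV files are RIFF containers whose form type is "WAVE"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 files often begin with an ID3v2 tag
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise look for an MPEG audio frame sync: eleven set bits
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


# MIME types that could be used in the page or in the HTTP response
MIME_TYPES = {"wav": "audio/wav", "mp3": "audio/mpeg"}
```

Running this on the synthesized bytes before saving them tells you which file extension and MIME type match the data.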

2. Prepare app.py

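This is where you need the App ID, secret key and API key you obtained from Baidu AI Cloud, along with your Flask knowledge. I will not break the file down piece by piece; here it is in full.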

from flask import Flask, render_template, request, redirect, url_for, send_file
import sounddevice as sd
import soundfile as sf
from aip import AipSpeech

app = Flask(__name__)

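# After creating an application on the Baidu AI Open Platform, fill in your own credentials below
APP_ID = ''
API_KEY = ''
SECRET_KEY = ''

# Initialize the AipSpeech client
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

# Path of the synthesized audio file
audio_file = "output.wav"

@app.route('/')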
def index():
    return render_template('index.html')

@app.route('/record', methods=['POST'])
def record_sound():
    filename = "input.wav"
    duration = int(request.form['duration'])
    sample_rate = 16000
    channels = 1

    recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=channels)
    sd.wait()

    sf.write(filename, recording, sample_rate)

    return redirect(url_for('index'))

@app.route('/transcribe', methods=['POST'])
def transcribe_speech():
    filename = "input.wav"
    # Read the recording saved by /record as raw bytes
    with open(filename, 'rb') as f:
        audio_data = f.read()

    # dev_pid 1536 selects the Mandarin recognition model
    result = client.asr(audio_data, 'wav', 16000, {
        'dev_pid': 1536,
    })

    # err_no == 0 means the recognition succeeded
    if result.get('err_no') == 0:
        text = result['result'][0]
        return render_template('transcribe_result.html', result=text)
    else:
        return "Speech recognition failed"

@app.route('/synthesize', methods=['POST'])
def text_to_speech():
    text = request.form['text']

    options = {
        'spd': 4,   # speed
        'pit': 5,   # pitch
        'vol': 15,  # volume
        'per': 3,   # voice
        'aue': 6,   # request WAV output; the API returns MP3 by default
    }

    result = client.synthesis(text, 'zh', 1, options)

    # On success the SDK returns audio bytes; on failure it returns a dict
    if not isinstance(result, dict):
        # Write to the same path that /play serves
        with open(audio_file, 'wb') as f:
            f.write(result)
        return render_template('synthesis_success.html', audio_url=url_for('play_audio'))
    else:
        return "Speech synthesis failed"

@app.route('/play')
def play_audio():
    return send_file(audio_file, mimetype="audio/wav")

if __name__ == '__main__':
    app.run(debug=True)
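The /record route saves a 16000 Hz, single-channel file, and /transcribe then tells the API that the audio is 'wav' at 16000 Hz. If recognition fails unexpectedly, it can help to confirm that input.wav really has those properties. Below is a sketch using only the standard-library wave module; the function names are hypothetical, and the 16-bit check is an assumption about what the recognition service expects.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate, bits_per_sample, seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        seconds = wav.getnframes() / rate
    return channels, rate, bits, seconds


def matches_recording_settings(path, rate=16000, channels=1):
    """True when the file is 16-bit PCM with the given rate and channel count."""
    found_channels, found_rate, bits, _ = describe_wav(path)
    return (found_channels, found_rate, bits) == (channels, rate, 16)
```

Note that the wave module only reads uncompressed PCM files, so it raises wave.Error for other encodings.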

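A brief explanation of the code above: the argument to route is the URL path of the page (a single / is the root route, the very first page you see). It is part of what people usually call the web address; the full address is visible in the browser's address bar once the page is open. The values in options are speech synthesis parameters: speed (spd), pitch (pit), volume (vol) and voice (per, such as a male or female voice; I chose the emotional male voice).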
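If you later let users choose these parameters from the page, it is worth keeping them inside the range the service accepts. The sketch below assumes a 0 to 15 range for spd, pit and vol and a default of 5; both are assumptions to confirm against the current Baidu documentation, and build_tts_options is a hypothetical helper name.

```python
# Assumed valid range for spd, pit and vol; confirm against the current API docs.
LOW, HIGH = 0, 15


def build_tts_options(spd=5, pit=5, vol=5, per=0):
    """Return an options dict with spd/pit/vol clamped into the assumed range."""
    def clamp(value):
        return max(LOW, min(HIGH, int(value)))

    return {"spd": clamp(spd), "pit": clamp(pit), "vol": clamp(vol), "per": int(per)}
```

For example, build_tts_options(spd=4, pit=5, vol=15, per=3) reproduces the values used in app.py, while an out-of-range volume is pulled back to the nearest limit.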

III. Closing remarks

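This is just a small project I put together and wanted to share. Because of time constraints I did not connect a database to make it a complete front-end and back-end application; that would require knowledge of databases and of connecting them to Python, which I will explore over time ヾ(◍°∇°◍)ﾉﾞ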
