DeepSeek Voice Assistant

MA_Zhipeng

Last modified: 2025-03-17 12:29:24

Tags: xcode, language model, python

First published: 2025-03-17 12:11:21

Article link: https://blog.csdn.net/weixin_58278208/article/details/146311393

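Written in Python. Questions are answered by calling the DeepSeek API; speech dictation (speech-to-text) and speech synthesis (text-to-speech) use iFLYTEK.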

Overall structure:

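#__________________________________________
    #|1| Press the Q key to start recording ------ done
    #|2| Wait for recording to finish ------ done
    #|3| Convert the recording's file format ------ done
    #|4| Speech recognition of the question ------ done
    #|5| Extract the question text ------ done
    #|6| DeepSeek answers ------ done
    #|7| Extract the answer text ------ done
    #|8| Speech synthesis ------ done
    #|9| Play the reply audio ------ done
    #|10| ---> 1
    #__________________________________________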
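The loop above can be sketched as a single function that chains steps 2 to 9. This is only an illustration of the control flow: `run_once` and its parameter names are hypothetical, and the lambdas are stand-ins for the real recording, conversion, recognition, DeepSeek, synthesis, and playback functions shown later in the post.

```python
from typing import Callable


def run_once(
    record: Callable[[], str],
    to_pcm: Callable[[str], str],
    recognize: Callable[[str], str],
    ask: Callable[[str], str],
    synthesize: Callable[[str], str],
    play: Callable[[str], None],
) -> str:
    """Run steps 2-9 once and return the answer text."""
    mp3_path = record()                  # |2| record the question
    pcm_path = to_pcm(mp3_path)          # |3| convert the recording to PCM
    question = recognize(pcm_path)       # |4|-|5| speech -> question text
    answer_text = ask(question)          # |6|-|7| question -> answer text
    reply_pcm = synthesize(answer_text)  # |8| answer text -> speech
    play(reply_pcm)                      # |9| play the reply
    return answer_text


# Stand-in callables so the sketch runs without a microphone or network.
played = []
result = run_once(
    record=lambda: "question.mp3",
    to_pcm=lambda path: path.replace(".mp3", ".pcm"),
    recognize=lambda path: f"question from {path}",
    ask=lambda q: f"answer to: {q}",
    synthesize=lambda text: "reply.pcm",
    play=played.append,
)
print(result)
```

In the real program this call would sit inside a `while True:` loop, preceded by the key wait from step 1, which gives the "|10| ---> 1" jump back to the start.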

Demo video:

DeepSeek Voice Assistant

|1|keyboard

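import keyboard  # third-party package: pip install keyboard
keyboard.wait(" ")  # block until the space bar is pressed; pass "q" to match the Q key in the outline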

|2|Record an MP3 file from the microphone

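# Imports inferred from the calls below
import os

import sounddevice as sd
from pydub import AudioSegment
from scipy.io.wavfile import write

def record_audio(duration, filename):
    # Recording parameters
    sample_rate = 44100  # sample rate in Hz
    channels = 1         # number of channels (mono)

    # Record from the default input device as 16-bit samples, so the WAV is plain PCM
    print("Recording started...")
    recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=channels, dtype="int16")
    sd.wait()  # block until the recording is complete
    print("Recording finished")

    # Save to a temporary WAV file
    temp_wav_file = "temp.wav"
    write(temp_wav_file, sample_rate, recording)

    # Convert the WAV file to MP3 (pydub needs FFmpeg for the MP3 export)
    audio = AudioSegment.from_wav(temp_wav_file)
    os.remove(temp_wav_file)
    audio.export(filename, format="mp3")

    print(f"Audio saved as {filename}")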

|3|Convert the MP3 file to a PCM file

import subprocess
from pydub import AudioSegment

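def convert_audio(input_file, output_file):
    # FFmpeg command
    command = [
        'ffmpeg',
        '-y',  # overwrite the output file without asking
        '-i', input_file,  # input file
        '-acodec', 'pcm_s16le',  # audio codec: signed 16-bit little-endian PCM
        '-f', 's16le',  # output format: raw PCM, no header
        '-ac', '1',  # mono
        '-ar', '16000',  # sample rate: 16 kHz
        output_file  # output file
    ]

    # Run the command; raises CalledProcessError if FFmpeg fails
    subprocess.run(command, check=True)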
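Because the output is headerless 16-bit mono PCM at 16 kHz, its size maps directly to its duration: 16000 samples x 2 bytes x 1 channel = 32000 bytes per second. A quick sanity check on the converted file can be sketched as follows; `pcm_duration_seconds` is a hypothetical helper, not part of the original program.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int = 16000,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Duration of raw PCM audio, given its size in bytes."""
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes / bytes_per_second


print(pcm_duration_seconds(160000))  # 5.0
```

With a real file, the byte count can be taken from `os.path.getsize(output_file)`. A duration that is far from the recording length usually means the sample rate or channel count does not match what was expected.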

|4|Speech recognition ---> iFLYTEK dictation

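import ssl
import SoundRecognition  # assumed: the author's local module wrapping iFLYTEK dictation (not shown in the post)

# appid, apisecret and apikey are the iFLYTEK credentials, defined elsewhere
wsParam = SoundRecognition.Ws_Param(APPID=appid, APISecret=apisecret, APIKey=apikey, AudioFile=r'./test.pcm')
SoundRecognition.websocket.enableTrace(False)
wsUrl = wsParam.create_url()

ws = SoundRecognition.websocket.WebSocketApp(wsUrl, on_message=SoundRecognition.on_message, on_error=SoundRecognition.on_error, on_close=SoundRecognition.on_close)
ws.on_open = SoundRecognition.on_open
# Run the connection; the callbacks produce the output
# Note: ssl.CERT_NONE turns off certificate verification
ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})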
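The post does not show what `create_url` does. In iFLYTEK's published Python demo it builds a WebSocket URL signed with HMAC-SHA256, roughly as in the sketch below. Treat the host, path, and field names as assumptions to verify against the official documentation; `build_signed_url` is a hypothetical name and the credentials are dummies.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def build_signed_url(api_key, api_secret, date,
                     host="ws-api.xfyun.cn", path="/v2/iat"):
    """Build a signed WebSocket URL (host and path are assumed values)."""
    # String to sign: host, date, and the HTTP request line
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(api_secret.encode("utf-8"),
                      signature_origin.encode("utf-8"),
                      digestmod=hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")

    # The authorization parameter is itself base64-encoded
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(
        authorization_origin.encode("utf-8")).decode("utf-8")

    query = urlencode({"authorization": authorization,
                       "date": date, "host": host})
    return f"wss://{host}{path}?{query}"


# Fixed date and dummy credentials so the result is reproducible
url = build_signed_url("demo_key", "demo_secret",
                       "Mon, 17 Mar 2025 04:11:21 GMT")
params = parse_qs(urlparse(url).query)
print(base64.b64decode(params["authorization"][0]).decode("utf-8"))
```

In real use the date would be the current time in RFC 1123 format, for example from `email.utils.formatdate(usegmt=True)`.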

|5|Extract the Chinese characters from the returned result

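import json

def WordExtract(file_path):
    with open(file_path, 'r+', encoding='utf-8') as file:
        lines = file.readlines()  # read the whole file
        file.truncate(0)  # clear the file for the next question

    # Extract the recognized words
    hanzi = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, which json.loads would reject
        # Parse the JSON data on each line
        json_data = json.loads(line.strip())
        for item in json_data:
            if "cw" in item:
                for cw_item in item["cw"]:
                    if cw_item["w"].strip():  # skip whitespace-only entries
                        hanzi.append(cw_item["w"])

    # # Print the extracted text
    # print("Extracted text:", ''.join(hanzi))
    return ''.join(hanzi)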
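To make the expected input concrete, here is an in-memory sketch of the same extraction. The sample lines are hypothetical: they imitate the shape the function relies on (one JSON list per line, each item holding a `cw` list whose entries carry the word in `w`), and the other fields are illustrative only.

```python
import json


def extract_words(lines):
    """Join the "w" values found in each line's JSON list."""
    words = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for item in json.loads(line):
            for cw_item in item.get("cw", []):
                if cw_item["w"].strip():  # skip whitespace-only entries
                    words.append(cw_item["w"])
    return "".join(words)


sample_lines = [
    '[{"bg": 0, "cw": [{"sc": 0, "w": "今天"}]}, {"bg": 0, "cw": [{"sc": 0, "w": "天气"}]}]',
    '[{"bg": 0, "cw": [{"sc": 0, "w": "怎么样"}]}, {"bg": 0, "cw": [{"sc": 0, "w": " "}]}]',
]
print(extract_words(sample_lines))  # 今天天气怎么样
```

Working on a list of strings rather than a file keeps the parsing logic easy to check on its own; the file reading and truncation stay in the caller.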

|6|DeepSeek answer

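# client, deepseek_model and messages are defined elsewhere in the program
response = client.chat.completions.create(
    model=deepseek_model,
    messages=messages
)
# print("DeepSeek reply:", response.choices[0].message.content)
# Append the reply to answer.txt, one answer per line
with open("answer.txt", "a", encoding="utf-8") as file:
    file.write(response.choices[0].message.content)
    file.write("\n")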

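To support consecutive questions, the previous reply must be appended to messages with .append before the next question is sent:

messages.append(response.choices[0].message)  # multi-turn Q&A: add the reply to the history

The messages format is as follows (system is the system prompt, user is the user's message):

# English gloss of the system prompt: "You are an intelligent assistant with real-time semantic
# understanding and multi-turn context memory. You generate colloquial replies with natural
# interjections through dynamic sentiment analysis, adapt automatically to formal or informal
# settings, integrate real-time knowledge checking and safety filtering, give a human-like
# conversation experience, and answer without giving reasons or with only very short ones."
messages = [{"role": "system", "content": "你是一个具备实时语义理解与多轮上下文记忆能力,能通过动态情感分析生成带自然语气词的口语化回复，自动适配正式/非正式场景，集成实时知识校验与安全过滤机制，实现类人对话体验，回答时不需要说理由或者说很短的理由的智能助手。"},
    {"role": "user", "content": question}]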
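The history handling can be sketched with plain dictionaries. The helper names `new_conversation` and `add_turn` are hypothetical, and the sketch stores the reply as an `assistant` message and then adds the next `user` question, a step the excerpt above does not show.

```python
def new_conversation(system_prompt, question):
    """Start a history with the system prompt and the first question."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def add_turn(messages, reply_text, next_question):
    """Record the model's reply, then queue the next user question."""
    messages.append({"role": "assistant", "content": reply_text})
    messages.append({"role": "user", "content": next_question})
    return messages


messages = new_conversation("You are a concise voice assistant.",
                            "What is PCM audio?")
add_turn(messages, "Raw, uncompressed audio samples.",
         "Which sample rate does step 3 use?")
print([m["role"] for m in messages])
# ['system', 'user', 'assistant', 'user']
```

Because the whole list is sent on every request, the history grows with each turn. A long-running assistant may want to cap it, for example by keeping the system message plus the most recent few turns.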

|7|Extract the answer text

def answer(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        lines = file.readlines()
    # Drop blank lines and strip surrounding whitespace and line breaks
    cleaned_lines = [line.strip() for line in lines if line.strip()]
    # Join the cleaned lines back into a single string
    cleaned_content = '\n'.join(cleaned_lines)
    # Return the cleaned text
    return cleaned_content

|8|iFLYTEK text-to-speech: convert the text to a PCM file

# WordSound is the author's text-to-speech module (not shown in the post); answer is the text to synthesize
wsParam = WordSound.Ws_Param(APPID=appid, APISecret=apisecret,
                             APIKey=apikey, Text=answer)
WordSound.websocket.enableTrace(False)
wsUrl = wsParam.create_url()
ws = WordSound.websocket.WebSocketApp(wsUrl, on_message=WordSound.on_message, on_error=WordSound.on_error, on_close=WordSound.on_close)
ws.on_open = WordSound.on_open
ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

|9|Play the reply audio

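from pydub import AudioSegment
from pydub.playback import play

def play_pcm(pcm_file, sample_rate=16000, bit_depth=16, channels=1):
    """
    Play a PCM file
    :param pcm_file: path to the PCM file
    :param sample_rate: sample rate, 16000 Hz by default
    :param bit_depth: bit depth, 16 bits by default
    :param channels: number of channels, 1 (mono) by default
    """
    # Load the audio from the headerless PCM file
    audio = AudioSegment.from_file(
        file=pcm_file,
        format="raw",
        frame_rate=sample_rate,
        sample_width=bit_depth // 8,
        channels=channels
    )

    # Play the audio
    play(audio)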
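To try the playback step without going through speech synthesis, a short test tone in the same raw format (signed 16-bit little-endian, mono, 16 kHz) can be generated with the standard library. This is a sketch; `make_sine_pcm` and the file name in the note below are hypothetical.

```python
import math
import struct


def make_sine_pcm(frequency=440.0, duration=1.0,
                  sample_rate=16000, amplitude=0.3):
    """Return a sine tone as raw s16le mono PCM bytes."""
    n_samples = int(duration * sample_rate)
    peak = int(amplitude * 32767)  # stay well inside the int16 range
    samples = (
        int(peak * math.sin(2 * math.pi * frequency * n / sample_rate))
        for n in range(n_samples)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer
    return b"".join(struct.pack("<h", s) for s in samples)


pcm_bytes = make_sine_pcm(duration=0.5)
print(len(pcm_bytes))  # 16000 bytes: 8000 samples x 2 bytes
```

Writing `pcm_bytes` to a file such as `tone.pcm` in binary mode and passing that path to `play_pcm` should produce a half-second 440 Hz tone. If the pitch or length sounds wrong, the sample rate or bit depth passed to `play_pcm` probably does not match the file.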

|10|------>|1|

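Notes:

Install any missing packages with pip install *****. For the WebSocket dependency, install websocket first and then websocket-client (WebSocketApp is provided by websocket-client).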