Implementing and Deploying a Speech Scoring Model

jl18

Tags: python

Article link: https://blog.csdn.net/m0_61868996/article/details/139906195

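We chose Microsoft's pronunciation assessment model to score users' pronunciation in detail. It reports scores for different aspects of pronunciation along with a score for each word, so users can find their pronunciation problems and correct them precisely.

Official documentation: Use pronunciation assessment - Azure AI services | Microsoft Learn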

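Pronunciation assessment supports uninterrupted streaming mode, so audio can be processed as a stream, but it can also be processed in one shot. To keep the contract with the frontend simple, the frontend sends back a wav file and we analyze it in a single call. Reading the SDK source shows that the one-shot call is built on an async operation whose result it waits for: in effect the streamed input is buffered, and the result is handed back only after recognition has finished.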
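The "buffer while streaming, then process everything at once" pattern described above can be sketched as follows. This is a minimal sketch, not the SDK's internal implementation: the helper name `recognize_all` and the `timeout_s` parameter are made up for illustration. It assumes a recognizer object that exposes the Speech SDK's `recognized`, `session_stopped`, and `canceled` event signals and the `start_continuous_recognition` / `stop_continuous_recognition` methods. It may be useful for longer recordings, because `recognize_once` returns after the first utterance.

```python
import threading


def recognize_all(recognizer, timeout_s=60.0):
    """Run continuous recognition until the session ends and return every result.

    `recognizer` is expected to behave like speechsdk.SpeechRecognizer.
    `timeout_s` is a hypothetical safety limit, not an SDK setting.
    """
    results = []              # one entry per recognized utterance
    done = threading.Event()  # set once the session stops or is canceled

    # The SDK invokes these callbacks from its own worker thread.
    recognizer.recognized.connect(lambda evt: results.append(evt.result))
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    try:
        finished = done.wait(timeout_s)  # block until all audio is consumed
    finally:
        recognizer.stop_continuous_recognition()

    if not finished:
        raise TimeoutError("recognition did not finish within the timeout")
    return results
```

With a real recognizer, each returned result could then be wrapped in `speechsdk.PronunciationAssessmentResult(...)` to read the per-utterance scores, in the same way the one-shot result is handled later in this article.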

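On the SpeechRecognizer, you can specify the language you want to learn or practice. The default locale is en-US.

You must create a PronunciationAssessmentConfig object. Optionally, call EnableProsodyAssessment and EnableContentAssessmentWithTopic (enable_prosody_assessment and enable_content_assessment_with_topic in Python) to enable prosody assessment and content assessment: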

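import azure.cognitiveservices.speech as speechsdk

# An empty reference text means an unscripted assessment.
pronunciation_config = speechsdk.PronunciationAssessmentConfig(
    reference_text="",
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=False)
# Optional: also return a prosody score and a topic-aware content assessment.
pronunciation_config.enable_prosody_assessment()
pronunciation_config.enable_content_assessment_with_topic("greeting")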

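Parameters:

Parameter	Description
`ReferenceText`	The text that the pronunciation is evaluated against. The `ReferenceText` parameter is optional. Set the reference text if you want to run a scripted assessment for the reading language learning scenario. Don't set the reference text if you want to run an unscripted assessment. For pricing differences between scripted and unscripted assessment, see the pricing page.
`GradingSystem`	The point system for score calibration. `FivePoint` gives a 0-5 floating point score. `HundredMark` gives a 0-100 floating point score. Default: `FivePoint`.
`Granularity`	Determines the lowest level of evaluation granularity. Scores for levels greater than or equal to the minimal value are returned. Accepted values are `Phoneme` (scores on the full text, word, syllable, and phoneme levels), `Word` (scores on the full text and word levels), or `FullText` (score on the full text level only). The provided full reference text can be a word, a sentence, or a paragraph, depending on your input reference text. Default: `Phoneme`.
`EnableMiscue`	Enables miscue calculation when the pronounced words are compared to the reference text. Enabling miscue is optional. If this value is `True`, the `ErrorType` result value can be set to `Omission` or `Insertion` based on the comparison. Values are `False` and `True`. Default: `False`. To enable miscue calculation, set `EnableMiscue` to `True`. See the code snippet above the table.
`ScenarioId`	A GUID that identifies a customized point system.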
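To make the `Omission` and `Insertion` labels from `EnableMiscue` concrete, here is a small illustration of comparing spoken words against a reference text. It is only a sketch of the idea using the standard library's `difflib`; it is not how the Azure service computes miscues, and the function name `label_miscues` is made up for this example. Substituted words are reported here as an omission followed by an insertion, which is a simplification.

```python
from difflib import SequenceMatcher


def label_miscues(reference_words, spoken_words):
    """Align spoken words with the reference and label each word.

    Returns (word, label) pairs where label is "None", "Omission"
    (in the reference but not spoken) or "Insertion" (spoken but
    not in the reference).
    """
    matcher = SequenceMatcher(a=reference_words, b=spoken_words, autojunk=False)
    labeled = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            labeled += [(w, "None") for w in reference_words[i1:i2]]
        else:
            # "delete" has no spoken words, "insert" has no reference words,
            # and "replace" contributes to both lists.
            labeled += [(w, "Omission") for w in reference_words[i1:i2]]
            labeled += [(w, "Insertion") for w in spoken_words[j1:j2]]
    return labeled


reference = "what's wrong with you man".split()
spoken = "what's wrong you big man".split()
for word, label in label_miscues(reference, spoken):
    print(f"{word}: {label}")
```

In this run "with" is labeled as an omission and "big" as an insertion, while the remaining words are labeled "None".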

Configuration methods

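Method	Description
`EnableProsodyAssessment`	Enables prosody assessment for the pronunciation evaluation. This feature assesses aspects such as stress, intonation, speaking speed, and rhythm, and gives insight into the naturalness and expressiveness of the speech. Enabling prosody assessment is optional. If this method is called, the `ProsodyScore` result value is returned.
`EnableContentAssessmentWithTopic`	Enables content assessment. Content assessment is part of the unscripted assessment for the speaking language learning scenario. Providing a description improves the assessment's understanding of the specific topic being spoken about. For example, in C# call `pronunciationAssessmentConfig.EnableContentAssessmentWithTopic("greeting");`. You can replace "greeting" with the text you want to use to describe the topic. The description has no length limit and currently supports only the `en-US` locale.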

Getting the pronunciation assessment result:

# speech_config and audio_config are assumed to have been created beforehand.
speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    audio_config=audio_config)

# Attach the assessment config created earlier, then run a one-shot recognition.
pronunciation_config.apply_to(speech_recognizer)
speech_recognition_result = speech_recognizer.recognize_once()

# The pronunciation assessment result as a Speech SDK object
pronunciation_assessment_result = speechsdk.PronunciationAssessmentResult(speech_recognition_result)

# The pronunciation assessment result as a JSON string
pronunciation_assessment_result_json = speech_recognition_result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_JsonResult)

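Of these two forms, one goes through the SDK object and the other returns JSON. In our experiments the JSON turned out to be very verbose, while we only need specific pieces: the fluency score, accuracy score, pronunciation score, and prosody score, plus the accuracy and error type of each word.

The complete code is below. It wraps the assessment in a Flask endpoint so that input and output are modular and the frontend can call it directly: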
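For comparison, the same fields can also be pulled out of the JSON string. The sketch below assumes the response nests the overall scores under `NBest[0].PronunciationAssessment` and the per-word scores under `NBest[0].Words[*].PronunciationAssessment`; check this against a real response before relying on it. The helper name `summarize_assessment` is made up, and `SAMPLE_JSON` is a hand-written, heavily trimmed stand-in that reuses numbers from this article's example rather than a captured service response.

```python
import json

# Hand-written sample shaped like the assumed response layout (not real output).
SAMPLE_JSON = json.dumps({
    "NBest": [{
        "PronunciationAssessment": {
            "AccuracyScore": 81.0, "FluencyScore": 84.0,
            "ProsodyScore": 68.4, "PronScore": 74.0,
        },
        "Words": [
            {"Word": "what's",
             "PronunciationAssessment": {"AccuracyScore": 35.0,
                                         "ErrorType": "Mispronunciation"}},
            {"Word": "wrong",
             "PronunciationAssessment": {"AccuracyScore": 92.0,
                                         "ErrorType": "None"}},
        ],
    }]
})


def summarize_assessment(result_json):
    """Keep only the overall scores and per-word accuracy/error type."""
    best = json.loads(result_json)["NBest"][0]  # top recognition hypothesis
    overall = best["PronunciationAssessment"]
    words = []
    for w in best.get("Words", []):
        pa = w.get("PronunciationAssessment", {})
        words.append({
            "Word": w["Word"],
            "AccuracyScore": pa.get("AccuracyScore"),
            "ErrorType": pa.get("ErrorType"),
        })
    return {
        "AccuracyScore": overall.get("AccuracyScore"),
        "FluencyScore": overall.get("FluencyScore"),
        "ProsodyScore": overall.get("ProsodyScore"),
        "PronScore": overall.get("PronScore"),
        "Words": words,
    }


print(summarize_assessment(SAMPLE_JSON))
```

Reading the attributes of the SDK object avoids this parsing step, which is why the endpoint in this article uses the SDK object instead.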

from flask import Flask, request
import azure.cognitiveservices.speech as speechsdk
import os

app = Flask(__name__)


@app.route('/speechtotext', methods=['POST'])
def upload_file():
    # Reject requests that carry no file part or an empty filename.
    if 'file' not in request.files:
        return 'No file part', 400
    file = request.files['file']
    if file.filename == '':
        return 'No selected file', 400
    # Saved under a fixed name, so concurrent requests would overwrite each other.
    filename = "uploaded.wav"
    file.save(filename)
    result = pronunciation_assessment_with_content_assessment(filename)
    # os.remove(filename)  # remove the file after processing
    return result

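def pronunciation_assessment_with_content_assessment(filename):
    """Run pronunciation assessment on an audio file and return the scores."""

    # The key is read from the environment instead of being hard-coded.
    # The variable name SPEECH_KEY is an assumption; use whatever your deployment defines.
    speech_config = speechsdk.SpeechConfig(subscription=os.environ["SPEECH_KEY"], region="eastus")
    audio_config = speechsdk.audio.AudioConfig(filename=filename)

    # Create the pronunciation assessment config: set the grading system,
    # the granularity, and whether to enable miscue, as your use case requires.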
    pronunciation_config = speechsdk.PronunciationAssessmentConfig(
        grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
        granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme)
    pronunciation_config.enable_prosody_assessment()

    language = 'en-US'
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, language=language, audio_config=audio_config)
    pronunciation_config.apply_to(speech_recognizer)
    speech_recognition_result = speech_recognizer.recognize_once()
    pronunciation_assessment_result = speechsdk.PronunciationAssessmentResult(speech_recognition_result)

    result = {
        "AccuracyScore": pronunciation_assessment_result.accuracy_score,
        "FluencyScore": pronunciation_assessment_result.fluency_score,
        "CompletenessScore": pronunciation_assessment_result.completeness_score,
        "ProsodyScore": pronunciation_assessment_result.prosody_score,
        "PronunciationAssessmentResult": pronunciation_assessment_result.pronunciation_score,
        "Words": [
            {
                "Word": word.word,
                "AccuracyScore": word.accuracy_score,
                "Errortype": word.error_type
            } for word in pronunciation_assessment_result.words
        ]
        # Possible values are None (meaning no error on this word), Omission, Insertion and Mispronunciation.
    }

    return result

if __name__ == '__main__':
    app.run(debug=False, port=5001)
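One limitation of the endpoint above is that every request is saved as `uploaded.wav`, so two simultaneous uploads can overwrite each other, and the file is never deleted. A possible way around this, sketched under the assumption that the upload object has a `save(path)` method like the one Flask provides for `request.files[...]`, is a small context manager that writes to a unique temporary file and removes it afterwards. The name `saved_upload` is made up for this example.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def saved_upload(upload, suffix=".wav"):
    """Save an uploaded file to a unique temp path and delete it afterwards.

    `upload` is any object with a save(path) method.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; save() opens the file itself
    try:
        upload.save(path)
        yield path
    finally:
        # Clean up even if the assessment raises.
        if os.path.exists(path):
            os.remove(path)
```

Inside the view, this could be used as `with saved_upload(file) as path: result = pronunciation_assessment_with_content_assessment(path)`, followed by returning `result` once the block exits.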

I recorded myself saying one sentence and sent it with Postman. The backend returned the following:

{
  "AccuracyScore": 81.0,
  "FluencyScore": 84.0,
  "PronunciationAssessmentResult": 74.0,
  "ProsodyScore": 68.4,
  "Words": [
    {
      "AccuracyScore": 35.0,
      "Errortype": "Mispronunciation",
      "Word": "what's"
    },
    {
      "AccuracyScore": 92.0,
      "Errortype": "None",
      "Word": "wrong"
    },
    {
      "AccuracyScore": 84.0,
      "Errortype": "None",
      "Word": "with"
    },
    {
      "AccuracyScore": 100.0,
      "Errortype": "None",
      "Word": "you"
    },
    {
      "AccuracyScore": 96.0,
      "Errortype": "None",
      "Word": "man"
    }
  ]
}
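On the receiving side, a response like this can be reduced to the words a learner should practice. The following is a minimal sketch: the function name `words_to_practice` and the `min_accuracy` threshold of 60 are assumptions chosen for illustration, not values recommended by the service. It uses the key names exactly as the endpoint returns them, including `Errortype`.

```python
def words_to_practice(response, min_accuracy=60.0):
    """Return the words that have an error type or a low accuracy score."""
    weak = []
    for w in response.get("Words", []):
        has_error = w.get("Errortype") not in (None, "None")
        if has_error or w.get("AccuracyScore", 0.0) < min_accuracy:
            weak.append(w["Word"])
    return weak


# The word list from the response shown above.
response = {
    "Words": [
        {"AccuracyScore": 35.0, "Errortype": "Mispronunciation", "Word": "what's"},
        {"AccuracyScore": 92.0, "Errortype": "None", "Word": "wrong"},
        {"AccuracyScore": 84.0, "Errortype": "None", "Word": "with"},
        {"AccuracyScore": 100.0, "Errortype": "None", "Word": "you"},
        {"AccuracyScore": 96.0, "Errortype": "None", "Word": "man"},
    ]
}
print(words_to_practice(response))  # ["what's"]
```

Raising the threshold, for example to 90, would also flag "with", so the cut-off is worth tuning against what learners find useful.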