Applying for and calling the Microsoft speech synthesis (TTS) service

1. Apply for an account:

https://azure.microsoft.com/zh-cn/free/

The following video tutorial walks through the application process:
https://www.bilibili.com/video/BV15a4y1W7re?vd_source=bf07f28d37849885d215dc3aea189eba
Once the account is approved, you can create a resource in the portal:
https://portal.azure.com/#home

Click "Resource groups" to find the deployed service.
On the resource page you can get the subscription_key, along with the location, service_region (East Asia in the original screenshot). Both values are needed later.
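Because the REST endpoints are region-specific, it can help to derive them from service_region in one place. The following is a minimal sketch, not part of the original sample: the helper name `speech_endpoints` is hypothetical, and it assumes the region identifier is the portal display name lowercased with spaces removed (for example "East Asia" becomes "eastasia"). The URL patterns are the ones used by the script in section 2.

```python
def speech_endpoints(service_region):
    """Return the token, TTS and voice-list URLs for one Azure region.

    Assumption: the region identifier is the display name lowercased
    with spaces removed, e.g. "East Asia" -> "eastasia".
    """
    region = service_region.strip().lower().replace(" ", "")
    return {
        "token": "https://%s.api.cognitive.microsoft.com/sts/v1.0/issueToken" % region,
        "tts": "https://%s.tts.speech.microsoft.com/cognitiveservices/v1" % region,
        "voices": "https://%s.tts.speech.microsoft.com/cognitiveservices/voices/list" % region,
    }


if __name__ == "__main__":
    # Print the three URLs for a resource created in East Asia
    for name, url in sorted(speech_endpoints("East Asia").items()):
        print(name, url)
```

If the key and the region in the URL do not belong to the same resource, the token request is likely to be rejected, so this is the first thing to check when a call fails.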

2. Call the service

Once the Microsoft Azure account and resource are set up, you can call the service. Code:

'''
After you've set your subscription key, run this application from your working
directory with this command: python TTSSample.py
'''
import os, requests, time
from xml.etree import ElementTree

# This code is required for Python 2.7
try: input = raw_input
except NameError: pass

'''
If you prefer, you can hardcode your subscription key as a string and remove
the provided conditional statement. However, we do recommend using environment
variables to secure your subscription keys. The environment variable is
set to SPEECH_SERVICE_KEY in our sample.
For example:
subscription_key = "Your-Key-Goes-Here"
'''

if 'SPEECH_SERVICE_KEY' in os.environ:
    subscription_key = os.environ['SPEECH_SERVICE_KEY']
else:
    print('Environment variable for your subscription key is not set.')
    exit()

class TextToSpeech(object):
    def __init__(self, subscription_key):
        self.subscription_key = subscription_key
        self.tts = input("What would you like to convert to speech: ")
        self.timestr = time.strftime("%Y%m%d-%H%M")
        self.access_token = None

    '''
    The TTS endpoint requires an access token. This method exchanges your
    subscription key for an access token that is valid for ten minutes.
    '''
    def get_token(self):
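        # NOTE: replace "westus" with your own service_region (e.g. "eastasia");
        # a key is only accepted in the region of the resource it belongs to.
        fetch_token_url = "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken"
        headers = {
            'Ocp-Apim-Subscription-Key': self.subscription_key
        }
        response = requests.post(fetch_token_url, headers=headers)
        # Fail early on a bad key or wrong region instead of using an error body as the token
        response.raise_for_status()
        self.access_token = str(response.text)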

    def save_audio(self):
        # NOTE: replace "westus" with your own service_region (e.g. "eastasia")
        base_url = 'https://westus.tts.speech.microsoft.com/'
        path = 'cognitiveservices/v1'
        constructed_url = base_url + path
        headers = {
            'Authorization': 'Bearer ' + self.access_token,
            'Content-Type': 'application/ssml+xml',
            'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm',
            'User-Agent': 'YOUR_RESOURCE_NAME'
        }
        xml_body = ElementTree.Element('speak', version='1.0')
        xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')
        voice = ElementTree.SubElement(xml_body, 'voice')
        voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
        voice.set('name', 'en-US-Guy24kRUS') # Short name for 'Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)'
        voice.text = self.tts
        body = ElementTree.tostring(xml_body)

        response = requests.post(constructed_url, headers=headers, data=body)
        '''
        If a success response is returned, then the binary audio is written
        to file in your working directory. It is prefaced by sample and
        includes the date.
        '''
        if response.status_code == 200:
            with open('sample-' + self.timestr + '.wav', 'wb') as audio:
                audio.write(response.content)
                print("\nStatus code: " + str(response.status_code) + "\nYour TTS is ready for playback.\n")
        else:
            print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")
            print("Reason: " + str(response.reason) + "\n")

    def get_voices_list(self):
        # NOTE: replace "westus" with your own service_region (e.g. "eastasia")
        base_url = 'https://westus.tts.speech.microsoft.com/'
        path = 'cognitiveservices/voices/list'
        constructed_url = base_url + path
        headers = {
            'Authorization': 'Bearer ' + self.access_token,
        }
        response = requests.get(constructed_url, headers=headers)
        if response.status_code == 200:
            print("\nAvailable voices: \n" + response.text)
        else:
            print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")

if __name__ == "__main__":
    app = TextToSpeech(subscription_key)
    app.get_token()
    app.save_audio()
    # Get a list of voices https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech#get-a-list-of-voices
    # app.get_voices_list()
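The comment on get_token says the access token is valid for ten minutes, so a longer-running program would need to refresh it rather than fetch it once. A minimal sketch of one way to do that is shown here. The `TokenCache` class, the 60-second safety margin, and the injectable `fetch_token` and `clock` arguments are all assumptions made for illustration and are not part of the Microsoft sample.

```python
import time

TOKEN_LIFETIME_SECONDS = 10 * 60  # lifetime stated in the sample's comment
REFRESH_MARGIN_SECONDS = 60       # hypothetical margin: refresh a bit early


class TokenCache(object):
    """Cache an access token and fetch a new one shortly before it expires."""

    def __init__(self, fetch_token, clock=time.time):
        # fetch_token: zero-argument callable that returns a fresh token string
        # clock: callable returning the current time in seconds (injectable for tests)
        self._fetch_token = fetch_token
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + TOKEN_LIFETIME_SECONDS - REFRESH_MARGIN_SECONDS
        return self._token
```

With the script above, `fetch_token` could be a small function that performs the same `requests.post` call as get_token and returns `response.text`; each request would then use `'Bearer ' + cache.get()` in its Authorization header.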

References:
https://docs.microsoft.com/zh-cn/azure/cognitive-services/speech-service/
https://github.com/Azure-Samples/Cognitive-Speech-TTS/blob/28681c8292c95aebb36d3696b8822b4cd17c3c45/Samples-Http/OLD/Python/TTSSample.py

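On Linux, high-quality Chinese and English speech synthesis (TTS) is available through Microsoft's TTS engine.

First, register an account on the official Microsoft Azure Speech service site to obtain an access key. After registering, create a Speech resource in the Azure portal; this gives you a subscription key and a region.

Optionally, you can also install a local open-source synthesis engine such as Festival or eSpeak. They are installed from the command line, and the exact command depends on your Linux distribution. On Debian-family distributions, for example:

For Festival:

```
sudo apt-get install festival
```

For eSpeak:

```
sudo apt-get install espeak
```

After installation you can do simple text-to-speech from the command line. With Festival, for example:

```
echo "Hello, this is a test" | festival --tts
```

The next step is to install Microsoft's Python Speech SDK, which lets you use Microsoft's high-quality TTS engine on Linux. Install the required Python library with pip:

```
pip install azure-cognitiveservices-speech
```

You can then synthesize Chinese or English speech with the following Python code:

```python
import azure.cognitiveservices.speech as speechsdk

# Initialize the speech configuration
speech_key = "your Azure speech key"
service_region = "your Azure region"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
# Chinese text needs a Chinese voice; this voice name is only an example
speech_config.speech_synthesis_voice_name = "zh-CN-XiaoxiaoNeural"

# Create the speech synthesizer
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = speech_synthesizer.speak_text("你好,这是一个测试")
```

That covers installing high-quality Chinese and English TTS on Linux: Microsoft's Azure Speech service provides the high-quality voices, and the open-source tools can serve as a local alternative.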
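The same point about voices applies to the REST script earlier in this post, which hard-codes an en-US voice in its SSML body. A minimal sketch of a helper that builds the SSML for any language and voice might look like this. The function name `build_ssml` is hypothetical, and the default voice name `zh-CN-XiaoxiaoNeural` is an assumption; the voice list endpoint used by get_voices_list shows which names your resource actually offers.

```python
from xml.etree import ElementTree

XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name="zh-CN-XiaoxiaoNeural", lang="zh-CN"):
    """Build an SSML request body (bytes) for the given text and voice.

    The default voice name is an assumed example, not taken from the sample.
    """
    speak = ElementTree.Element("speak", version="1.0")
    speak.set("{%s}lang" % XML_NS, lang)
    voice = ElementTree.SubElement(speak, "voice")
    voice.set("{%s}lang" % XML_NS, lang)
    voice.set("name", voice_name)
    voice.text = text
    # Encode as UTF-8 so non-ASCII text is sent as-is
    return ElementTree.tostring(speak, encoding="utf-8")


if __name__ == "__main__":
    print(build_ssml(u"你好,这是一个测试").decode("utf-8"))
```

The returned bytes could be passed as `data=` to the `requests.post` call in save_audio in place of the body built there.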