Qt Speech coming to Qt 6.4

Thursday May 05, 2022 by Volker Hilsheimer

Over the last couple of months we have ported the text-to-speech functionality in the Qt Speech module over to Qt 6, and it will be part of the Qt 6.4 release later in 2022.

As with the Qt 5 version, Qt Speech provides application developers with a QObject subclass, QTextToSpeech, that provides an interface to the platform's speech synthesizer engine; and a value type QVoice that encapsulates voice characteristics. With those classes, applications can make themselves more accessible to users, and go beyond the screen-reader functionality of assistive technologies. Using non-visual channels to inform users about changes or events can be very useful in hands-free situations, such as turn-by-turn navigation systems. Content-focused applications like ebook readers could benefit from text-to-speech synthesis without depending on assistive technology.

The APIs in the Qt Speech module are practically unchanged, and most Qt 5 applications that use QTextToSpeech today will not have to make any code changes when moving to Qt 6.

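
A minimal sketch of what that looks like in practice (assuming a Qt 6.4 build with the Qt TextToSpeech module linked in; the quit-on-Ready logic is only there to make the example a complete program):

```cpp
// Minimal sketch: speak one string with the platform's default engine.
// Assumes a Qt 6.4 build with the Qt TextToSpeech module available.
#include <QCoreApplication>
#include <QTextToSpeech>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QTextToSpeech speech; // picks the default engine for the platform
    bool started = false;
    QObject::connect(&speech, &QTextToSpeech::stateChanged,
                     [&](QTextToSpeech::State state) {
        if (state == QTextToSpeech::Speaking)
            started = true;
        else if (started && state == QTextToSpeech::Ready)
            app.quit(); // the utterance has finished playing
    });
    speech.say("Qt Speech is coming to Qt 6.4");
    return app.exec();
}
```

Watching stateChanged is optional; it is used here only so the program exits once playback is done.
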
Supported engines

We spent most of the time reviewing, cleaning up, and improving the existing engine implementations. As with Qt 5, we will support the flite synthesizer and the speech-dispatcher on Linux. For flite, the minimum supported version will be 2.3, and we plan to support both static and dynamically linked configurations. For libspeechd we will require at least version 0.9, and the code working around shortcomings in older versions has been removed. If both engine plugins are present, then the speech-dispatcher plugin will have priority.

On Windows, the default engine is a new implementation based on the Windows Runtime APIs in the Windows.Media.SpeechSynthesis namespace. The engine uses the low-level QAudioSink API from Qt Multimedia to play the generated PCM data. We also continue to support the SAPI 5.3 based engine; the technology is somewhat outdated and doesn't have the same amount and quality of voices as the more modern WinRT engine, but it is the only engine available when building Qt with MinGW.

On Apple platforms we continue to support two engines. The engine currently still known as "ios" is now also available on macOS, and we will probably rename it before long. It is based on the AVSpeechSynthesizer API, whereas the engine currently still known as "macos" is based on the NSSpeechSynthesizer AppKit API. As of macOS Mojave, the AVFoundation framework API is documented to be available on macOS, but the APIs turned out to be too buggy on macOS 10.14, which we still support. The NSSpeechSynthesizer API also has bugs and limitations, but at least basic functionality works. On macOS 10.15 and up, and on any iOS platform, the AVFoundation based engine is the right choice, and perhaps we find a way to make it work on macOS 10.14 as well.

Last but not least, on Android we support the engine based on the android.speech.tts package as before.

WebAssembly support will not be included in Qt 6.4.

A note to engine implementors: We have removed the QTextToSpeechProcessor abstraction from the flite engine implementation. It was exclusively used by the flite engine, and introduced unneeded complexity only to provide threading infrastructure that could be replaced with standard signal/slot invocations. Also, we will not make any source or binary compatibility guarantees for the QTextToSpeechEngine class, as we might want to add virtual methods in future releases (and plugins have to be built against the same Qt version as the application that loads them anyway).

Porting from Qt 5 to Qt 6

As of now, the port is functionally equivalent to the Qt 5 version, with some minor additions and changes:

QVoice is now a modern, movable C++ type. It has gotten a new locale() property which informs callers about the language that the voice is designed to speak, and it supports QDataStream operators. As in Qt 5, a QVoice can only be created by the text-to-speech engine. Applications will have to make sure that a voice loaded from a data stream is suitable for the engine that it will be set on (perhaps by saving and loading the engine name as well).

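
As a sketch of that last point, an application could store the engine name alongside the serialized QVoice and refuse to apply a voice that was saved for a different engine. saveVoice() and loadVoice() are hypothetical helpers, not Qt API:

```cpp
// Sketch: persisting a selected voice together with the engine name,
// since a QVoice is only meaningful for the engine that created it.
#include <QDataStream>
#include <QFile>
#include <QTextToSpeech>
#include <QVoice>

void saveVoice(const QTextToSpeech &speech, const QString &path)
{
    QFile file(path);
    if (!file.open(QIODevice::WriteOnly))
        return;
    QDataStream out(&file);
    out << speech.engine() << speech.voice(); // engine name first, then the voice
}

bool loadVoice(QTextToSpeech &speech, const QString &path)
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
        return false;
    QDataStream in(&file);
    QString engineName;
    QVoice voice;
    in >> engineName >> voice;
    if (engineName != speech.engine())
        return false; // the voice was saved for a different engine
    speech.setVoice(voice);
    return true;
}
```
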
In the QTextToSpeech class, the API is practically unchanged. Call the say() function to synthesize speech for a piece of text. The speech will then be played on the default audio output device. pause(), resume(), and stop() work like before and don't need any further introduction. Properties like pitch, rate, and volume can be configured as before, and on most platforms will only impact the next call to say().

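
For example, a sketch of configuring those properties before speaking; the value ranges in the comments follow the QTextToSpeech property documentation:

```cpp
// Sketch: configuring speech parameters before speaking.
#include <QCoreApplication>
#include <QTextToSpeech>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QTextToSpeech speech;
    speech.setRate(0.5);   // -1.0 (slowest) to 1.0 (fastest), 0.0 is normal speed
    speech.setPitch(0.2);  // -1.0 to 1.0, 0.0 is the default pitch
    speech.setVolume(0.8); //  0.0 (muted) to 1.0 (full volume)
    speech.say("These settings apply to the next utterance.");
    return app.exec();
}
```
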
Qt 6.4 feature freeze is still a few weeks away, and we think that we can add at least one new feature from the list of suggestions in JIRA to Qt Speech: more engine-specific configure options.

On some platforms, we play the generated audio ourselves via Qt Multimedia, so we can make the audio output device configurable (QTBUG-63677). On Android, multiple speech synthesizers might be present, and we want to make that configurable as well (QTBUG-66033). At the time of writing, the API for this is proposed and up for review in https://codereview.qt-project.org/c/qt/qtspeech/+/406229.

QML API

Speech synthesis is now fully usable from QML via the TextToSpeech and Voice types. Import QtTextToSpeech to use those types in a Qt Quick application. Voice attributes are accessible directly from QML. A new QML-only example is included that does the exact same thing as the old C++-with-Widgets example.

Beyond Qt 6.4

We are focusing on completing the work for Qt 6.4 in time for the feature freeze. But we have a few ideas about more features for future releases, and we'd like to hear from you about these ideas:

Enqueueing of utterances

Today, QTextToSpeech::say() stops any ongoing utterance, and then proceeds with synthesizing the new text. We would like to see if we can make calls to say() line up new utterances so that a continuous audio stream is produced without the application having to orchestrate the calls. Several engines support this natively, and for those that don't, it is conceptually easy to implement at a higher level. However, we would then also like to add an API that allows applications to line up blocks of text with modified speech parameters (voice, pitch, rate etc.), and applications might want to follow the speaking progress. So this requires a bit of design.

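
Until such an API exists, an application can approximate queuing on top of the current interface by speaking the next text whenever the engine returns to the Ready state. A rough sketch, where SpeechQueue is a hypothetical helper (not part of Qt Speech) that needs to be processed by moc:

```cpp
// Rough sketch of application-side utterance queuing: call say() for
// the next queued text whenever the engine transitions back to Ready.
#include <QObject>
#include <QQueue>
#include <QString>
#include <QTextToSpeech>

class SpeechQueue : public QObject
{
    Q_OBJECT
public:
    explicit SpeechQueue(QObject *parent = nullptr)
        : QObject(parent)
    {
        connect(&m_speech, &QTextToSpeech::stateChanged,
                this, &SpeechQueue::onStateChanged);
    }

    void enqueue(const QString &text)
    {
        m_pending.enqueue(text);
        if (m_speech.state() == QTextToSpeech::Ready)
            speakNext();
    }

private slots:
    void onStateChanged(QTextToSpeech::State state)
    {
        if (state == QTextToSpeech::Ready) // previous utterance finished
            speakNext();
    }

private:
    void speakNext()
    {
        if (!m_pending.isEmpty())
            m_speech.say(m_pending.dequeue());
    }

    QTextToSpeech m_speech;
    QQueue<QString> m_pending;
};
```

Note that this only chains utterances back-to-back; it cannot produce the gapless audio stream or per-block speech parameters that a native API could.
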
Detailed progress report

Some engines provide meta-data and callback infrastructure that reports which words are being spoken at a given time. Exposing update information on this level through signals would allow applications to better synchronize UI with speech output. From an API design perspective this goes hand-in-hand with queuing up utterances and updating progress on that level.

SSML support

SSML is a standardized XML format that allows content providers to annotate text with information for speech synthesizer engines. For example, slowing down certain phrases, changing the pitch, or controlling whether "10" is pronounced as "ten" or as "one zero". Alias tags can be used to make sure that a speech synthesizer pronounces "Qt" like it should be! Several engines support SSML input, although the supported subset varies from engine to engine. Some of the tags could be simulated by QTextToSpeech (which then again requires that we can line up utterances with different configurations). At the least we can strip away unsupported XML tags so that applications don't have to worry about a character-by-character rendition of XML to their users.

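
For illustration, a small SSML 1.1 document covering the cases mentioned above; actual engine support for these tags varies:

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis">
  <prosody rate="slow" pitch="+10%">This phrase is spoken slowly.</prosody>
  The gate number is <say-as interpret-as="characters">10</say-as>,
  spoken as "one zero", but the price is
  <say-as interpret-as="cardinal">10</say-as> euros, spoken as "ten".
  <sub alias="cute">Qt</sub> is pronounced like it should be.
</speak>
```
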
Access to PCM data

Both the flite and the WinRT engines synthesise text into a stream of PCM data that we have to pass to an audio device using Qt Multimedia. Android's UtteranceProgressListener interface makes chunks of audio data available through a callback. The AVFoundation API has a similar approach. So for several of the supported engines we might be able to give applications access to the PCM data for further processing, which might open up additional use cases.

We might also add support for more engines, in particular one for WebAssembly; flite has been ported to WebAssembly, but that might not be the optimal solution. And last but not least: support for speech recognition would be a great addition to the module!

Let us know how you are using Qt Speech today, which of the features you would like us to prioritise, and what other ideas you have about this module!
