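Expected result: real-time speech-to-text recognition
Third-party service: Tencent Cloud Automatic Speech Recognition (ASR)
Integration requirements: during recognition, the client continuously uploads binary messages to the backend, each carrying raw bytes of the audio stream. Sending one 40 ms packet every 40 ms (a 1:1 real-time rate) is recommended; for PCM that is 640 bytes at an 8k sample rate and 1280 bytes at 16k. If audio is sent faster than the 1:1 real-time rate, or the gap between two audio packets exceeds 6 seconds, the engine may fail, in which case the backend returns an error and closes the connection.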
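The two packet sizes quoted above follow from simple arithmetic: 640 bytes for 40 ms at 8k implies 2 bytes per sample, i.e. 16-bit mono PCM. A minimal sketch of that calculation, assuming 16-bit single-channel audio (the class and method names are illustrative and not part of any SDK):

```java
final class PcmPacketSize {
    /** Bytes of 16-bit mono PCM needed to cover the given duration. */
    static int bytesPerPacket(int sampleRateHz, int durationMs) {
        int bytesPerSample = 2; // assumption: 16-bit samples, one channel
        return sampleRateHz * bytesPerSample * durationMs / 1000;
    }

    public static void main(String[] args) {
        System.out.println("8k,  40 ms: " + bytesPerPacket(8000, 40) + " bytes");
        System.out.println("16k, 40 ms: " + bytesPerPacket(16000, 40) + " bytes");
    }
}
```

The same helper gives the packet size for other intervals, for example 20 ms packets, as long as the send interval is changed to match so the rate stays at 1:1.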
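After the audio stream has been fully uploaded, the client must send a text message with the content {"type": "end"} to tell the backend to stop recognition.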
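Assembling the request URL:
Parameters: engine_model_type [engine model type] 16k_zh_dialect: multiple dialects,
expired [signature expiry time, UNIX timestamp in seconds] System.currentTimeMillis() / 1000L + 86400L,
needvad [voice activity detection (VAD), used to split the audio into segments] 1: enabled,
nonce [random positive integer] RandomUtil.randomInt(1000, 99999),
timestamp [current UNIX timestamp in seconds],
secretid [secret ID of the API key],
voice_format [audio encoding] 1: pcm,
voice_id [globally unique identifier of the audio stream] AsrUtils.getVoiceId(asrConfig.getAppId()),
signature [request signature]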
Example: wss://asr.cloud.tencent.com/asr/v2/1256841545?engine_model_type=16k_zh_dialect&expired=1685527216&needvad=1&nonce=37769&secretid=AKIDkHZbOtm7qKYu1ktrY0D9k6E6hfPdFIkx&timestamp=1685440816&voice_format=8&voice_id=1256841545_1685440816950_byc53&signature=Qv1YDDYCP7skMsASStxFuAVMa0w=
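The parameter list above can be sketched with the JDK alone. This is only an illustration under stated assumptions: ThreadLocalRandom stands in for RandomUtil.randomInt, the voice_id is passed in rather than generated by AsrUtils.getVoiceId, and the class name AsrQueryParams is made up for this example.

```java
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;

final class AsrQueryParams {
    /** Builds the query parameters (without signature), sorted by key. */
    static TreeMap<String, Object> build(String secretId, String voiceId, long nowSeconds) {
        TreeMap<String, Object> params = new TreeMap<>();
        params.put("engine_model_type", "16k_zh_dialect");
        params.put("timestamp", nowSeconds);
        params.put("expired", nowSeconds + 86400L); // signature valid for 24 hours
        params.put("needvad", 1);
        // Stand-in for RandomUtil.randomInt(1000, 99999); upper bound is exclusive
        params.put("nonce", ThreadLocalRandom.current().nextInt(1000, 99999));
        params.put("secretid", secretId);
        params.put("voice_format", 1); // 1 = pcm
        params.put("voice_id", voiceId);
        return params;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        System.out.println(build("your-secret-id", "your-voice-id", now));
    }
}
```

A TreeMap is used so that the keys already come out in lexicographic order, which is the order the signing step needs.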
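Generating the signature:
1. Sort all parameters except signature in lexicographic order and concatenate them into the request URL, without the wss:// scheme, to form the signature plaintext. Taking Appid=125922*** and SecretId=*****Qq1zhZMN8dv0****** as an example, the signature plaintext is:
asr.cloud.tencent.com/asr/v2/125922***?engine_model_type=16k_zh&expired=1673494772&needvad=1&nonce=1673408372&secretid=*****Qq1zhZMN8dv0******&timestamp=1673408372&voice_format=1&voice_id=c64385ee-3e5c-4fc5-bbfd-7c71addb35b0
Implementation: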
// Collect the required query parameters; TreeMap keeps the keys in lexicographic order for signing
private TreeMap<String, Object> getRequestParamMap(AsrConfig asrConfig, AsrRequest request, AsrRequestContent content) {
    TreeMap<String, Object> treeMap = new TreeMap<>();
    treeMap.put(TencentContents.SECRET_ID, asrConfig.getSecretId());
    treeMap.put(TencentContents.ENGINE_MODEL_TYPE, request.getEngineModelType());
    treeMap.put(TencentContents.VOICE_ID, content.getVoiceId());
    treeMap.put(TencentContents.VOICE_FORMAT, request.getVoiceFormat());
    treeMap.put(TencentContents.TIMESTAMP, request.getTimestamp());
    treeMap.put(TencentContents.EXPIRED, request.getExpired());
    treeMap.put(TencentContents.NONCE, request.getNonce());
    treeMap.put(TencentContents.NEED_VAD, request.getNeedVad());
    return treeMap;
}
// Required parameters plus any caller-supplied extension parameters
private TreeMap<String, Object> getWsParams(AsrConfig asrConfig, AsrRequest request, AsrRequestContent content) {
    TreeMap<String, Object> treeMap = this.getRequestParamMap(asrConfig, request, content);
    if (request.getExtendsParam() != null) {
        treeMap.putAll(request.getExtendsParam());
    }
    return treeMap;
}
// Build the query string ("?k1=v1&k2=v2...") from the sorted parameter map
public static String createUrl(Map<String, Object> paramMap) {
    StringBuilder sb = new StringBuilder("?");
    for (Map.Entry<String, Object> entry : paramMap.entrySet()) {
        Object value = entry.getValue();
        // Skip null and empty values; compare with equals(), since != only compares references
        if (value != null && !"".equals(value)) {
            sb.append(entry.getKey()).append('=').append(value).append('&');
        }
    }
    // Drop the trailing '&' only if at least one parameter was appended
    if (sb.charAt(sb.length() - 1) == '&') {
        sb.setLength(sb.length() - 1);
    }
    return sb.toString();
}
// Signature plaintext: host and path without the wss:// scheme, then the AppId, then the sorted query string
String signUrl = asrConfig.getWsSignUrl() + asrConfig.getAppId() + paramUrl;
// AsrConfig constructor: null arguments fall back to the default Tencent Cloud endpoints
public AsrConfig(String appId, String secretKey, String secretId, Long waitTime, String realAsrUrl, String signUrl, String logUrl, String wsUrl, String token) {
    super(secretId, secretKey, Long.valueOf(appId), token);
    this.realAsrUrl = Optional.ofNullable(realAsrUrl).orElse("https://asr.cloud.tencent.com/asr/v1/");
    this.signUrl = Optional.ofNullable(signUrl).orElse("asr.cloud.tencent.com/asr/v1/");
    this.logUrl = Optional.ofNullable(logUrl).orElse("https://asr.tencentcloudapi.com/");
    this.wsUrl = Optional.ofNullable(wsUrl).orElse("wss://asr.cloud.tencent.com/asr/v2/");
    this.wsSignUrl = "asr.cloud.tencent.com/asr/v2/";
    this.flashUrl = "https://asr.cloud.tencent.com/asr/flash/v1/";
    this.flashSignUrl = "asr.cloud.tencent.com/asr/flash/v1/";
    this.waitTime = Optional.ofNullable(waitTime).orElse(6000L);
}
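Because the implementation above depends on the SDK's own classes (AsrConfig, AsrRequest, TencentContents), step 1 can also be sketched in a self-contained form. The class SignPlaintext and its method are hypothetical names for this illustration; the parameter values are the masked ones from the example in step 1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class SignPlaintext {
    /** Joins host/path, AppId and the key-sorted, non-empty parameters into the signature plaintext. */
    static String build(String hostAndPath, String appId, Map<String, Object> params) {
        StringBuilder sb = new StringBuilder(hostAndPath).append(appId);
        char separator = '?';
        // Copy into a TreeMap so the keys are iterated in lexicographic order
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(params).entrySet()) {
            Object value = e.getValue();
            if (value == null || "".equals(value)) {
                continue; // empty parameters are left out of the plaintext
            }
            sb.append(separator).append(e.getKey()).append('=').append(value);
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("voice_id", "c64385ee-3e5c-4fc5-bbfd-7c71addb35b0");
        params.put("voice_format", 1);
        params.put("timestamp", 1673408372L);
        params.put("secretid", "*****Qq1zhZMN8dv0******");
        params.put("nonce", 1673408372L);
        params.put("needvad", 1);
        params.put("expired", 1673494772L);
        params.put("engine_model_type", "16k_zh");
        System.out.println(build("asr.cloud.tencent.com/asr/v2/", "125922***", params));
    }
}
```

The parameters are deliberately inserted in reverse order in main to show that the output order comes from the sort, not from insertion order.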
2. Sign the signature plaintext with HmacSha1, using SecretKey as the key, and then Base64-encode the result. For the plaintext from the previous step and SecretKey=*****SkqpeHgqmSz*****, that is:
Base64Encode(HmacSha1("asr.cloud.tencent.com/asr/v2/125922***?engine_model_type=16k_zh&expired=1673494772&needvad=1&nonce=1673408372&secretid=*****Qq1zhZMN8dv0******&timestamp=1673408372&voice_format=1&voice_id=c64385ee-3e5c-4fc5-bbfd-7c71addb35b0", "*****SkqpeHgqmSz*****"))
The resulting signature value is: G8jDQBRg1JfeBi/YnTjyjekxfDA=
Code:
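// HMAC-SHA1 over the plaintext, keyed with the SecretKey, then Base64-encoded.
// Base64 here is org.apache.commons.codec.binary.Base64 (java.util.Base64 has no encodeBase64String).
public static String base64_hmac_sha1(String originalText, String secretKey) {
    try {
        Mac hmac = Mac.getInstance("HmacSHA1");
        hmac.init(new SecretKeySpec(secretKey.getBytes("UTF-8"), "HmacSHA1"));
        byte[] hash = hmac.doFinal(originalText.getBytes("UTF-8"));
        return Base64.encodeBase64String(hash);
    } catch (Exception var4) {
        // Note: an empty signature is returned on failure, so callers should check for ""
        var4.printStackTrace();
        return "";
    }
}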
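For readers who do not have Apache Commons Codec on the classpath, the same computation can be sketched with the JDK alone. This is a stand-in under that assumption, not the SDK's own helper; the class name HmacSha1Signer is made up, and unlike the method above it throws on failure instead of returning an empty string.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class HmacSha1Signer {
    /** Returns Base64(HMAC-SHA1(plaintext)) keyed with secretKey, both encoded as UTF-8. */
    static String sign(String plaintext, String secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            // Fail loudly: an empty signature would only surface later as an auth error
            throw new IllegalStateException("HMAC-SHA1 signing failed", e);
        }
    }

    public static void main(String[] args) {
        // Widely published HMAC-SHA1 test input, handy as a sanity check
        System.out.println(sign("The quick brown fox jumps over the lazy dog", "key"));
    }
}
```

An HMAC-SHA1 digest is 20 bytes, so the Base64 result is always 28 characters and may contain '+', '/' and '=', which is why the next step URL-encodes it.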
3. URL-encode the signature value (this is mandatory; otherwise authentication will fail intermittently) and append it to obtain the final request URL:
wss://asr.cloud.tencent.com/asr/v2/1259228442?engine_model_type=16k_zh&expired=1592380492&filter_dirty=1&filter_modal=1&filter_punc=1&needvad=1&nonce=1592294092123&secretid=AKIDoQq1zhZMN8dv0psmvud6OUKuGPO7pu0r&timestamp=1592294092&voice_format=1&voice_id=RnKu9FODFHK5FPpsrN&signature=HepdTRX6u155qIPKNKC%2B3U0j1N0%3D
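The encoding step is small but easy to forget, and it only matters when the Base64 signature happens to contain '+', '/' or '=', which would explain why the failure is intermittent. A minimal sketch using java.net.URLEncoder (the class SignedUrl and its methods are illustrative names, not SDK API):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SignedUrl {
    /** Percent-encodes the Base64 signature so '+', '/' and '=' survive in a query string. */
    static String encodeSignature(String signature) {
        try {
            return URLEncoder.encode(signature, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    /** Appends the encoded signature to a URL that already carries the other parameters. */
    static String append(String unsignedUrl, String signature) {
        return unsignedUrl + "&signature=" + encodeSignature(signature);
    }

    public static void main(String[] args) {
        System.out.println(encodeSignature("HepdTRX6u155qIPKNKC+3U0j1N0="));
    }
}
```

Only the signature value is encoded here; the rest of the URL must stay byte-for-byte identical to the string that was signed, apart from the added scheme.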
WebSocket client code:
/**
 * Run a single recognition session.
 *
 * @param client SpeechClient
 */
public static void runOnce(final SpeechClient client) {
    // This example reads a file to simulate a live audio stream; real callers can pass byte data directly to write()
    try (FileInputStream fileInputStream = new FileInputStream(new File("E:\\CloudMusic\\电台节目\\365读书 - 钱钟书:谈教训.mp3"))) {
        // FileInputStream fileInputStream = new FileInputStream(new File("E:\\Download\\珍惜-孙露.mp3"));
        // Over HTTP, 200 ms of audio per request is recommended; over WebSocket, 40 ms per message
        List<byte[]> speechData = ByteUtils.subToSmallBytes(fileInputStream,
                SpeechRecognitionSysConfig.requestWay == AsrConstant.RequestWay.Http ? 6400 : 640);
        // Request parameters for speech recognition; use initialize() for defaults or the builder for custom values
        SpeechRecognitionRequest request = SpeechRecognitionRequest.initialize();
        request.setEngineModelType("16k_zh_dialect"); // the model type is required; omitting it raises an exception
        request.setVoiceFormat(8); // audio format: 8, used here for the mp3 sample file
        SpeechRecognizer speechWsRecognizer = client.newSpeechRecognizer(request, new MySpeechRecognitionListener());
        // Start recognition by calling start()
        speechWsRecognizer.start();
        for (byte[] chunk : speechData) {
            // Simulate the interval between audio packets
            Thread.sleep(SpeechRecognitionSysConfig.requestWay == AsrConstant.RequestWay.Http ? 200 : 20);
            // Send the data
            speechWsRecognizer.write(chunk);
        }
        // Finish recognition by calling stop(); the stream is closed by try-with-resources
        speechWsRecognizer.stop();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
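The demo above relies on ByteUtils.subToSmallBytes to cut the audio into fixed-size packets. Its source is not shown here, so the following is only a guess at the idea, written against a byte array instead of an InputStream; AudioChunker and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AudioChunker {
    /** Splits audio into consecutive chunks of chunkSize bytes; the last chunk may be shorter. */
    static List<byte[]> split(byte[] audio, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int end = Math.min(audio.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(audio, offset, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<byte[]> chunks = split(new byte[1500], 640);
        for (byte[] chunk : chunks) {
            System.out.println(chunk.length + " bytes");
        }
    }
}
```

Each chunk would then be passed to write() with a sleep matching its duration, as in the loop of runOnce.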
public class SpeechWsRecognizer implements SpeechRecognizer {
    protected AsrConfig asrConfig;
    protected SpeechRecognitionRequest asrRequest;
    protected AsrRequestContent asrRequestContent;
    protected SpeechRecognitionListener listener;
    protected WebSocket webSocket;
    protected int reConnectMaxNum = 10;
    protected int connectNum = 0;
    protected volatile boolean isConnect = false;
    // Set once stop() has been called or the server has reported an error
    protected volatile AtomicBoolean endFlag = new AtomicBoolean(false);
    // Set once start() has opened the connection
    protected volatile AtomicBoolean startFlag = new AtomicBoolean(false);
    protected SpeechRecognitionSignService speechRecognitionSignService = new SpeechRecognitionSignService();
    private ReentrantLock lock = new ReentrantLock();
    // Released by onOpen/onFailure so that start() can wait for the handshake
    private final CountDownLatch startLatch = new CountDownLatch(1);
    // Released when the final result arrives or the socket closes, so that stop() can wait
    private final CountDownLatch closeLatch = new CountDownLatch(1);
    private boolean begin = false;
    // Counts the packets sent; used only in log messages
    private AtomicLong adder = new AtomicLong(0L);
    private TractionManager tractionManager;
    private WsClientService wsClientService;
    public SpeechWsRecognizer(WsClientService wsClientService, String streamId, AsrConfig config, SpeechRecognitionRequest request, SpeechRecognitionListener listener) {
        this.wsClientService = wsClientService;
        this.asrConfig = config;
        this.asrRequest = request;
        if (StringUtils.isEmpty(request.getVoiceId())) {
            request.setVoiceId(AsrUtils.getVoiceId(config.getAppId()));
        }
        this.asrRequestContent = AsrRequestContent.builder().seq(0).end(0).streamId(streamId).voiceId(request.getVoiceId()).build();
        this.listener = listener;
        this.tractionManager = new TractionManager(config.getAppId());
    }
    private Boolean createWebsocket() throws SdkRunException {
        if (this.isConnect && this.webSocket != null) {
            return true;
        }
        // Acquire the lock before entering try, so that finally never unlocks a lock that is not held
        this.lock.lock();
        try {
            // Re-check under the lock: another thread may have connected in the meantime
            if (this.isConnect && this.webSocket != null) {
                return true;
            }
            ReportService.ifLogMessage(this.getId(), "create websocket", false);
            this.asrRequest.setTimestamp(System.currentTimeMillis() / 1000L);
            this.asrRequest.setExpired(System.currentTimeMillis() / 1000L + 86400L);
            String paramUrl = SignHelper.createUrl(this.speechRecognitionSignService.getWsParams(this.asrConfig, this.asrRequest, this.asrRequestContent));
            // The plaintext is signed without the scheme; the real URL uses the wss:// prefix
            String signUrl = this.asrConfig.getWsSignUrl() + this.asrConfig.getAppId() + paramUrl;
            String sign = SignBuilder.base64_hmac_sha1(signUrl, this.asrConfig.getSecretKey());
            String url = this.asrConfig.getWsUrl() + this.asrConfig.getAppId() + paramUrl;
            WebSocketListener webSocketListener = this.createWebSocketListener();
            this.webSocket = this.wsClientService.asrWebSocket(this.asrConfig.getToken(), url, sign, webSocketListener);
            this.isConnect = true;
            // Wait until onOpen or onFailure releases the start latch
            boolean countDown = this.startLatch.await((long) SpeechRecognitionSysConfig.wsStartMethodWait, TimeUnit.SECONDS);
            if (!countDown) {
                throw new SdkRunException(Code.CODE_10001);
            }
            return true;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        } finally {
            this.lock.unlock();
        }
    }
    public void start() throws SdkRunException {
        Boolean success = this.createWebsocket();
        if (success) {
            this.startFlag.set(true);
            this.tractionManager.beginTraction(this.asrRequestContent.getStreamId());
        }
    }
    // Send one binary audio packet; start() must have been called and stop() must not
    public void write(byte[] data) throws SdkRunException {
        if (!this.startFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " package please call start method!!", false);
            throw new SdkRunException(Code.CODE_10002);
        } else if (this.endFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " can`t write,because you call stop method or send message fail", false);
            throw new SdkRunException(Code.CODE_10003);
        } else if (!this.isConnect) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " client is closing", false);
            throw new SdkRunException(Code.CODE_10004);
        } else {
            ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " package", false);
            boolean success = this.webSocket.send(ByteString.of(data));
            ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " package " + success, false);
            this.adder.incrementAndGet();
            if (!success) {
                // Retry a bounded number of times if the socket refused the packet
                for (int i = 0; i < SpeechRecognitionSysConfig.retryRequestNum; ++i) {
                    success = this.webSocket.send(ByteString.of(data));
                    if (success) {
                        break;
                    }
                }
            }
        }
    }
    // Send a text message (used for the end-of-stream notification)
    private void write(String data) {
        if (!this.endFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " end package", false);
            this.adder.incrementAndGet();
            this.webSocket.send(data);
        }
    }
    public Boolean stop() {
        if (this.endFlag.get()) {
            return true;
        } else {
            // Tell the backend that the audio stream is complete: {"type":"end"}
            this.write(JsonUtil.toJson(MapUtil.builder().put("type", "end").build()));
            this.endFlag.set(true);
            try {
                // Wait for the final result or the socket close, up to the configured timeout
                this.closeLatch.await((long) SpeechRecognitionSysConfig.wsStopMethodWait, TimeUnit.SECONDS);
            } catch (InterruptedException var2) {
                var2.printStackTrace();
                ReportService.ifLogMessage(this.getId(), "stop_exception:" + var2.getMessage(), false);
            }
            return true;
        }
    }
    private WebSocketListener createWebSocketListener() {
        return new WebSocketListener() {
            @Override
            public void onClosed(WebSocket webSocket, int code, String reason) {
                super.onClosed(webSocket, code, reason);
                ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "ws onClosed" + reason, false);
                SpeechWsRecognizer.this.isConnect = false;
                SpeechWsRecognizer.this.countDownStop("onClosed");
            }
            @Override
            public void onClosing(WebSocket webSocket, int code, String reason) {
                super.onClosing(webSocket, code, reason);
                ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "ws onClosing", false);
                SpeechWsRecognizer.this.isConnect = false;
                SpeechWsRecognizer.this.countDownStop("onClosing");
            }
            @Override
            public void onFailure(WebSocket webSocket, Throwable t, Response response) {
                SpeechWsRecognizer.this.isConnect = false;
                // Release both latches so that neither start() nor stop() keeps waiting
                SpeechWsRecognizer.this.countDownStart("onFailure");
                SpeechWsRecognizer.this.countDownStop("onFailure");
                String trace = Tutils.getStackTrace(t);
                // A "Socket closed" failure after stop() is expected and is not reported
                if (!StringUtils.contains(trace, "Socket closed") && !SpeechWsRecognizer.this.endFlag.get()) {
                    ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onFailure:" + trace, true);
                    SpeechRecognitionResponse rs = new SpeechRecognitionResponse();
                    rs.setCode(Code.EXCEPTION.getCode());
                    rs.setMessage(trace);
                    rs.setStreamId(SpeechWsRecognizer.this.asrRequestContent.getStreamId());
                    rs.setVoiceId(SpeechWsRecognizer.this.asrRequestContent.getVoiceId());
                    ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onFailure", false);
                    ReportService.report(false, String.valueOf(rs.getCode()), SpeechWsRecognizer.this.asrConfig, SpeechWsRecognizer.this.getId(), SpeechWsRecognizer.this.asrRequest, rs, SpeechWsRecognizer.this.asrConfig.getWsUrl(), t.getMessage());
                    SpeechWsRecognizer.this.listener.onFail(rs);
                }
            }
            @Override
            public void onMessage(WebSocket webSocket, String text) {
                super.onMessage(webSocket, text);
                ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onMessage:" + text, false);
                SpeechRecognitionResponse response = (SpeechRecognitionResponse) JsonUtil.fromJson(text, SpeechRecognitionResponse.class);
                if (SpeechWsRecognizer.this.listener != null && response != null) {
                    SpeechWsRecognizer.this.listener.onMessage(response);
                    if (response.getCode() == 0) {
                        // Code 0 means success: dispatch to the sentence-level callbacks
                        SpeechWsRecognizer.this.resultCallBack(response);
                        ReportService.report(true, String.valueOf(response.getCode()), SpeechWsRecognizer.this.asrConfig, SpeechWsRecognizer.this.getId(), SpeechWsRecognizer.this.asrRequest, response, SpeechWsRecognizer.this.asrConfig.getWsUrl(), response.getMessage());
                    } else {
                        // Any other code is an error: mark the session as ended and notify the listener
                        ReportService.report(false, String.valueOf(response.getCode()), SpeechWsRecognizer.this.asrConfig, SpeechWsRecognizer.this.getId(), SpeechWsRecognizer.this.asrRequest, response, SpeechWsRecognizer.this.asrConfig.getWsUrl(), response.getMessage());
                        response.setStreamId(SpeechWsRecognizer.this.asrRequestContent.getStreamId());
                        response.setVoiceId(SpeechWsRecognizer.this.asrRequestContent.getVoiceId());
                        SpeechWsRecognizer.this.endFlag.set(true);
                        SpeechWsRecognizer.this.listener.onFail(response);
                    }
                }
            }
            @Override
            public void onMessage(WebSocket webSocket, ByteString bytes) {
                super.onMessage(webSocket, bytes);
            }
            @Override
            public void onOpen(WebSocket webSocket, Response response) {
                super.onOpen(webSocket, response);
                ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onOpen:" + JsonUtil.toJson(response), false);
                // HTTP 101 (Switching Protocols) means the WebSocket handshake succeeded
                SpeechWsRecognizer.this.isConnect = response.code() == 101;
                if (!SpeechWsRecognizer.this.isConnect) {
                    ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onOpen: fail", false);
                    webSocket.close(1001, "onOpen");
                }
                SpeechWsRecognizer.this.countDownStart("onOpen");
                if (SpeechWsRecognizer.this.listener != null) {
                    SpeechRecognitionResponse recognitionResponse = new SpeechRecognitionResponse();
                    recognitionResponse.setCode(0);
                    recognitionResponse.setStreamId(SpeechWsRecognizer.this.asrRequestContent.getStreamId());
                    recognitionResponse.setFinalSpeech(0);
                    recognitionResponse.setVoiceId(SpeechWsRecognizer.this.asrRequestContent.getVoiceId());
                    recognitionResponse.setMessage("success");
                    SpeechWsRecognizer.this.listener.onRecognitionStart(recognitionResponse);
                }
            }
        };
    }
    // Map a successful response onto the listener callbacks, based on slice type and the final flag
    private void resultCallBack(SpeechRecognitionResponse response) {
        response.setStreamId(this.asrRequestContent.getStreamId());
        if (response.getFinalSpeech() == null) {
            response.setFinalSpeech(0);
        }
        SpeechRecognitionResponse beginResp;
        if (response.getResult() != null && this.listener != null) {
            if (response.getResult().getSliceType() == 0) {
                // Slice type 0: a sentence begins
                this.begin = true;
                this.listener.onSentenceBegin(response);
            } else if (response.getResult().getSliceType() == 2) {
                // Slice type 2: a sentence ends; synthesize the begin event if it was never received
                if (!this.begin) {
                    beginResp = (SpeechRecognitionResponse) JsonUtil.fromJson(JsonUtil.toJson(response), SpeechRecognitionResponse.class);
                    beginResp.getResult().setSliceType(0);
                    this.listener.onSentenceBegin(beginResp);
                }
                this.begin = false;
                this.listener.onSentenceEnd(response);
            } else {
                // Any other slice type: an intermediate result for the current sentence
                this.listener.onRecognitionResultChange(response);
            }
        }
        if (response.getFinalSpeech() != null && response.getFinalSpeech() == 1) {
            // final == 1: the whole stream has been recognized
            if (this.listener != null) {
                beginResp = new SpeechRecognitionResponse();
                beginResp.setCode(0);
                beginResp.setVoiceId(this.asrRequestContent.getVoiceId());
                beginResp.setFinalSpeech(1);
                beginResp.setStreamId(this.asrRequestContent.getStreamId());
                beginResp.setMessage("success");
                beginResp.setMessageId(response.getMessageId());
                this.listener.onRecognitionComplete(beginResp);
            }
            this.countDownStop("final");
            this.webSocket.cancel();
        }
    }
    private String getId() {
        return this.asrRequestContent.getStreamId() + "_" + this.asrRequestContent.getVoiceId();
    }
    // Bounded resend helper; not called anywhere in the code shown here
    private void reconnect(byte[] data) {
        if (!this.endFlag.get()) {
            if (this.connectNum <= this.reConnectMaxNum) {
                try {
                    Thread.sleep(10L);
                    this.write(data);
                    ++this.connectNum;
                } catch (InterruptedException var3) {
                    var3.printStackTrace();
                }
            }
        }
    }
    // Release the latch that stop() waits on
    private void countDownStop(String source) {
        try {
            if (this.closeLatch.getCount() > 0L) {
                this.closeLatch.countDown();
                ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_closeLatch_countDown", false);
            }
        } catch (Exception var3) {
            ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_closeLatch_exception" + var3.getMessage(), true);
        }
    }
    // Release the latch that start() waits on
    private void countDownStart(String source) {
        try {
            if (this.startLatch.getCount() > 0L) {
                this.startLatch.countDown();
                ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_startLatch_countDown", false);
            }
        } catch (Exception var3) {
            ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + " _startLatch_countDown" + var3.getMessage(), true);
        }
    }
}
This is a mock test of mine; the detailed code will be added later.