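Real-Time Speech Recognition (WebSocket) Integration - Tencent Cloud

Expected result: real-time speech-to-text recognition

Third-party service used: Tencent Cloud Speech Recognition (ASR)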

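Integration requirements: During recognition, the client continuously uploads binary messages to the backend; each message carries raw audio stream data. It is recommended to send one packet containing 40 ms of audio every 40 ms (a 1:1 real-time rate). For pcm this corresponds to 640 bytes at an 8 kHz sample rate and 1280 bytes at 16 kHz. If audio is sent faster than the 1:1 real-time rate, or if the interval between audio packets exceeds 6 seconds, the engine may fail, in which case the backend returns an error and actively closes the connection.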
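As a quick sanity check of the packet sizes quoted above, the following sketch derives them from the sample rate. It assumes 16-bit mono PCM, which is consistent with the 640-byte and 1280-byte figures; the class and method names are purely illustrative.

```java
/** Illustrative helper (not part of any SDK): size in bytes of one PCM packet. */
final class PcmPacketSize {
    // Assumption: 16-bit (2-byte) samples on a single channel.
    private static final int BYTES_PER_SAMPLE = 2;

    static int bytesPerPacket(int sampleRateHz, int packetMillis) {
        return sampleRateHz * BYTES_PER_SAMPLE * packetMillis / 1000;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerPacket(8000, 40));   // 8 kHz, 40 ms
        System.out.println(bytesPerPacket(16000, 40));  // 16 kHz, 40 ms
    }
}
```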

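After the audio stream has been fully uploaded, the client must send a text message with the content {"type":"end"} to tell the backend to end recognition (this is the message that the stop() method shown later sends).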
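The upload protocol described above (paced binary packets followed by one text message) could be sketched roughly as follows. The AudioSink interface is a hypothetical stand-in for whatever WebSocket client is in use; it is not a Tencent Cloud or OkHttp type.

```java
import java.util.List;

/** Illustrative sketch: send audio packets at a fixed interval, then the end marker. */
final class PacedSender {
    /** Hypothetical transport abstraction; a real client would wrap its WebSocket here. */
    interface AudioSink {
        void sendBinary(byte[] packet);
        void sendText(String message);
    }

    static final String END_MESSAGE = "{\"type\":\"end\"}";

    static void sendAll(List<byte[]> packets, long intervalMillis, AudioSink sink) {
        for (byte[] packet : packets) {
            sink.sendBinary(packet);
            try {
                // Pace the upload; 40 ms per 40 ms packet gives the 1:1 real-time rate.
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop without sending the end marker.
                Thread.currentThread().interrupt();
                return;
            }
        }
        // All audio has been sent: notify the backend that recognition should end.
        sink.sendText(END_MESSAGE);
    }
}
```

A caller would pass a list of 640-byte or 1280-byte packets with an interval of 40 ms.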

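Assembling the request URL:

Parameters:

engine_model_type [engine model type]: 16k_zh_dialect (multi-dialect)

expired [signature expiry time, UNIX timestamp in seconds]: System.currentTimeMillis() / 1000L + 86400L

needvad [voice activity detection (VAD) switch]: 1 (enabled)

nonce [random positive integer]: RandomUtil.randomInt(1000, 99999)

timestamp [current UNIX timestamp in seconds]

secretid [SecretId of the API key]

voice_format [audio encoding]: 1 (pcm); the example URL and the test code below use 8 (mp3)

voice_id [globally unique identifier of the audio stream]: AsrUtils.getVoiceId(asrConfig.getAppId())

signature [request signature]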

Example: wss://asr.cloud.tencent.com/asr/v2/1256841545?engine_model_type=16k_zh_dialect&expired=1685527216&needvad=1&nonce=37769&secretid=AKIDkHZbOtm7qKYu1ktrY0D9k6E6hfPdFIkx&timestamp=1685440816&voice_format=8&voice_id=1256841545_1685440816950_byc53&signature=Qv1YDDYCP7skMsASStxFuAVMa0w=
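To make the parameter list more concrete, here is a minimal sketch that assembles the parameters (everything except signature) in a sorted map using only the JDK. ThreadLocalRandom stands in for RandomUtil.randomInt, and the class name, method name and arguments are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative sketch: request parameters, kept in lexicographic key order. */
final class AsrQueryParams {
    static Map<String, Object> build(String secretId, String voiceId, long nowSeconds) {
        // TreeMap sorts by key, which is the order the signature step requires.
        Map<String, Object> params = new TreeMap<>();
        params.put("engine_model_type", "16k_zh_dialect");
        params.put("expired", nowSeconds + 86400L);  // valid for 24 hours
        params.put("needvad", 1);
        // Lower bound inclusive, upper bound exclusive.
        params.put("nonce", ThreadLocalRandom.current().nextInt(1000, 99999));
        params.put("secretid", secretId);
        params.put("timestamp", nowSeconds);
        params.put("voice_format", 1);
        params.put("voice_id", voiceId);
        return params;
    }
}
```

In real use, nowSeconds would be System.currentTimeMillis() / 1000L.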

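Generating the signature:

1. Sort all parameters except signature in lexicographic order and append them to the request URL (without the wss:// scheme) to form the signature source string. Taking Appid=125922***, SecretId=*****Qq1zhZMN8dv0****** as an example, the resulting signature source string is: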

asr.cloud.tencent.com/asr/v2/125922***?engine_model_type=16k_zh&expired=1673494772&needvad=1&nonce=1673408372&secretid=*****Qq1zhZMN8dv0******&timestamp=1673408372&voice_format=1&voice_id=c64385ee-3e5c-4fc5-bbfd-7c71addb35b0

Implementation:

    // Collects the required parameters; TreeMap keeps the keys in lexicographic order for signing.
    private TreeMap<String, Object> getRequestParamMap(AsrConfig asrConfig, AsrRequest request, AsrRequestContent content) {
        TreeMap<String, Object> treeMap = new TreeMap<>();
        treeMap.put(TencentContents.SECRET_ID, asrConfig.getSecretId());
        treeMap.put(TencentContents.ENGINE_MODEL_TYPE, request.getEngineModelType());
        treeMap.put(TencentContents.VOICE_ID, content.getVoiceId());
        treeMap.put(TencentContents.VOICE_FORMAT, request.getVoiceFormat());
        treeMap.put(TencentContents.TIMESTAMP, request.getTimestamp());
        treeMap.put(TencentContents.EXPIRED, request.getExpired());
        treeMap.put(TencentContents.NONCE, request.getNonce());
        treeMap.put(TencentContents.NEED_VAD, request.getNeedVad());
        return treeMap;
    }

    private TreeMap<String, Object> getWsParams(AsrConfig asrConfig, AsrRequest request, AsrRequestContent content) {
        TreeMap<String, Object> treeMap = this.getRequestParamMap(asrConfig, request, content);
        // Merge any caller-supplied extension parameters so that they are signed too.
        if (request.getExtendsParam() != null) {
            treeMap.putAll(request.getExtendsParam());
        }
        return treeMap;
    }

    // Builds the query string ("?k1=v1&k2=v2...") from the sorted parameters.
    public static String createUrl(Map<String, Object> paramMap) {
        StringBuilder sb = new StringBuilder("?");
        for (Map.Entry<String, Object> entry : paramMap.entrySet()) {
            Object value = entry.getValue();
            // Skip null and empty values; use equals() because != compares references, not content.
            if (value != null && !"".equals(value)) {
                sb.append(entry.getKey()).append('=').append(value).append('&');
            }
        }

        // Drop the trailing '&' if at least one parameter was appended.
        if (sb.charAt(sb.length() - 1) == '&') {
            sb.setLength(sb.length() - 1);
        }

        return sb.toString();
    }

    // Signature source string: host and path (no scheme) + AppId + query string
    String signUrl = asrConfig.getWsSignUrl() + asrConfig.getAppId() + paramUrl;

   public AsrConfig(String appId, String secretKey, String secretId, Long waitTime, String realAsrUrl, String signUrl, String logUrl, String wsUrl, String token) {
        super(secretId, secretKey, Long.valueOf(appId), token);
        this.realAsrUrl = (String)Optional.ofNullable(realAsrUrl).orElse("https://asr.cloud.tencent.com/asr/v1/");
        this.signUrl = (String)Optional.ofNullable(signUrl).orElse("asr.cloud.tencent.com/asr/v1/");
        this.logUrl = (String)Optional.ofNullable(logUrl).orElse("https://asr.tencentcloudapi.com/");
        this.wsUrl = (String)Optional.ofNullable(wsUrl).orElse("wss://asr.cloud.tencent.com/asr/v2/");
        this.wsSignUrl = "asr.cloud.tencent.com/asr/v2/";
        this.flashUrl = "https://asr.cloud.tencent.com/asr/flash/v1/";
        this.flashSignUrl = "asr.cloud.tencent.com/asr/flash/v1/";
        this.waitTime = (Long)Optional.ofNullable(waitTime).orElse(6000L);
    }

2. Sign the signature source string with the SecretKey using HMAC-SHA1, then Base64-encode the result. For example, with the source string from the previous step and SecretKey=*****SkqpeHgqmSz*****:

Base64Encode(HmacSha1("asr.cloud.tencent.com/asr/v2/125922***?engine_model_type=16k_zh&expired=1673494772&needvad=1&nonce=1673408372&secretid=*****Qq1zhZMN8dv0******&timestamp=1673408372&voice_format=1&voice_id=c64385ee-3e5c-4fc5-bbfd-7c71addb35b0", "*****SkqpeHgqmSz*****"))

The resulting signature value is: G8jDQBRg1JfeBi/YnTjyjekxfDA=

Code:

    public static String base64_hmac_sha1(String originalText, String secretKey) {
        try {
            Mac hmac = Mac.getInstance("HmacSHA1");
            hmac.init(new SecretKeySpec(secretKey.getBytes("UTF-8"), "HmacSHA1"));
            byte[] hash = hmac.doFinal(originalText.getBytes("UTF-8"));
            return Base64.encodeBase64String(hash);
        } catch (Exception var4) {
            var4.printStackTrace();
            return "";
        }
    }
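For readers who do not have Apache Commons Codec on the classpath, the same step can be sketched with the JDK alone (java.util.Base64 is available from Java 8 onward). This variant is an illustration rather than the code used in this project; unlike the method above, it throws on failure instead of returning an empty string, so that a broken signature is not silently sent to the server.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative JDK-only variant of the HMAC-SHA1 + Base64 signing step. */
final class HmacSha1Signer {
    static String sign(String text, String secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] hash = mac.doFinal(text.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);
        } catch (GeneralSecurityException e) {
            // Fail loudly rather than produce an empty, invalid signature.
            throw new IllegalStateException("HmacSHA1 unavailable or key invalid", e);
        }
    }
}
```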

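3. URL-encode the signature value and append it to obtain the final request URL. URL encoding is mandatory: without it, authentication fails intermittently, namely whenever the Base64 output contains characters such as + or /. The final request URL is: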

wss://asr.cloud.tencent.com/asr/v2/1259228442?engine_model_type=16k_zh&expired=1592380492&filter_dirty=1&filter_modal=1&filter_punc=1&needvad=1&nonce=1592294092123&secretid=AKIDoQq1zhZMN8dv0psmvud6OUKuGPO7pu0r&timestamp=1592294092&voice_format=1&voice_id=RnKu9FODFHK5FPpsrN&signature=HepdTRX6u155qIPKNKC%2B3U0j1N0%3D
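The encoding step can be sketched with java.net.URLEncoder, as below. The class and method names are illustrative; the two-argument encode(String, String) overload is used because it is available on every Java version.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch: URL-encode the signature and append it to the request URL. */
final class SignedUrlBuilder {
    static String appendSignature(String urlWithoutSignature, String signature) {
        try {
            // '+', '/' and '=' from Base64 become %2B, %2F and %3D.
            String encoded = URLEncoder.encode(signature, StandardCharsets.UTF_8.name());
            return urlWithoutSignature + "&signature=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM, so this should be unreachable.
            throw new IllegalStateException(e);
        }
    }
}
```

Applied to the raw value HepdTRX6u155qIPKNKC+3U0j1N0=, this yields the encoded signature that appears at the end of the URL above.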

WebSocket client code:

    /**
     * Runs a single recognition session.
     *
     * @param client SpeechClient
     */
    public static void runOnce(final SpeechClient client) {
        // This example simulates a live audio stream with a file; in production, pass the byte data directly to write().
        // try-with-resources closes the file even if recognition throws.
        try (FileInputStream fileInputStream = new FileInputStream(new File("E:\\CloudMusic\\电台节目\\365读书 - 钱钟书:谈教训.mp3"))) {
            // Recommended packet duration: 200 ms per request over HTTP, 40 ms per message over WebSocket.
            List<byte[]> speechData = ByteUtils.subToSmallBytes(fileInputStream,
                    SpeechRecognitionSysConfig.requestWay == AsrConstant.RequestWay.Http ? 6400 : 640);
            // Request parameters for speech recognition; use initialize() for defaults or the builder for custom values.
            SpeechRecognitionRequest request = SpeechRecognitionRequest.initialize();
            request.setEngineModelType("16k_zh_dialect"); // The model type is mandatory; omitting it raises an error.
            request.setVoiceFormat(8);  // Audio format (8 matches the mp3 input file)
            SpeechRecognizer speechWsRecognizer = client.newSpeechRecognizer(request, new MySpeechRecognitionListener());
            // Start recognition.
            speechWsRecognizer.start();
            for (int i = 0; i < speechData.size(); i++) {
                // Simulated packet interval; raise it toward 40 ms for strict 1:1 real-time pacing over WebSocket.
                Thread.sleep(SpeechRecognitionSysConfig.requestWay == AsrConstant.RequestWay.Http ? 200 : 20);
                // Send one packet.
                speechWsRecognizer.write(speechData.get(i));
            }
            // End recognition.
            speechWsRecognizer.stop();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
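The ByteUtils.subToSmallBytes helper used above is not shown in this post. A rough equivalent might look like the sketch below; it works on an in-memory byte array rather than an InputStream for simplicity, and its name and signature are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch: split audio data into fixed-size packets. */
final class ChunkSplitter {
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int end = Math.min(offset + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }
}
```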

public class SpeechWsRecognizer implements SpeechRecognizer {
    protected AsrConfig asrConfig;
    protected SpeechRecognitionRequest asrRequest;
    protected AsrRequestContent asrRequestContent;
    protected SpeechRecognitionListener listener;
    protected WebSocket webSocket;
    protected int reConnectMaxNum = 10;
    protected int connectNum = 0;
    protected volatile boolean isConnect = false;
    protected volatile AtomicBoolean endFlag = new AtomicBoolean(false);
    protected volatile AtomicBoolean startFlag = new AtomicBoolean(false);
    protected SpeechRecognitionSignService speechRecognitionSignService = new SpeechRecognitionSignService();
    private ReentrantLock lock = new ReentrantLock();
    private final CountDownLatch startLatch = new CountDownLatch(1);
    private final CountDownLatch closeLatch = new CountDownLatch(1);
    private boolean begin = false;
    private AtomicLong adder = new AtomicLong(0L);
    private TractionManager tractionManager;
    private WsClientService wsClientService;

    public SpeechWsRecognizer(WsClientService wsClientService, String streamId, AsrConfig config, SpeechRecognitionRequest request, SpeechRecognitionListener listener) {
        this.wsClientService = wsClientService;
        this.asrConfig = config;
        this.asrRequest = request;
        if (StringUtils.isEmpty(request.getVoiceId())) {
            request.setVoiceId(AsrUtils.getVoiceId(config.getAppId()));
        }

        this.asrRequestContent = AsrRequestContent.builder().seq(0).end(0).streamId(streamId).voiceId(request.getVoiceId()).build();
        this.listener = listener;
        this.tractionManager = new TractionManager(config.getAppId());
    }

    private Boolean createWebsocket() throws SdkRunException {
        // Fast path: nothing to do if the connection is already up.
        if (this.isConnect && this.webSocket != null) {
            return true;
        }
        // Acquire the lock before entering try, so that finally never unlocks a lock that was not acquired.
        this.lock.lock();
        try {
            // Re-check under the lock in case another thread connected in the meantime.
            if (this.isConnect && this.webSocket != null) {
                return true;
            }

            ReportService.ifLogMessage(this.getId(), "create websocket", false);
            this.asrRequest.setTimestamp(System.currentTimeMillis() / 1000L);
            this.asrRequest.setExpired(System.currentTimeMillis() / 1000L + 86400L);
            String paramUrl = SignHelper.createUrl(this.speechRecognitionSignService.getWsParams(this.asrConfig, this.asrRequest, this.asrRequestContent));
            String signUrl = this.asrConfig.getWsSignUrl() + this.asrConfig.getAppId() + paramUrl;
            String sign = SignBuilder.base64_hmac_sha1(signUrl, this.asrConfig.getSecretKey());
            String url = this.asrConfig.getWsUrl() + this.asrConfig.getAppId() + paramUrl;
            WebSocketListener webSocketListener = this.createWebSocketListener();
            this.webSocket = this.wsClientService.asrWebSocket(this.asrConfig.getToken(), url, sign, webSocketListener);
            this.isConnect = true;
            // Wait until onOpen or onFailure releases the start latch.
            boolean countDown = this.startLatch.await((long)SpeechRecognitionSysConfig.wsStartMethodWait, TimeUnit.SECONDS);
            if (!countDown) {
                throw new SdkRunException(Code.CODE_10001);
            }

            return true;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        } finally {
            this.lock.unlock();
        }
    }

    public void start() throws SdkRunException {
        Boolean success = this.createWebsocket();
        if (success) {
            this.startFlag.set(true);
            this.tractionManager.beginTraction(this.asrRequestContent.getStreamId());
        }

    }

    public void write(byte[] data) throws SdkRunException {
        if (!this.startFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " package please call start method!!", false);
            throw new SdkRunException(Code.CODE_10002);
        } else if (this.endFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " can't write, because stop was called or sending a message failed", false);
            throw new SdkRunException(Code.CODE_10003);
        } else if (!this.isConnect) {
            ReportService.ifLogMessage(this.getId(), "method " + this.adder.get() + " client is closing", false);
            throw new SdkRunException(Code.CODE_10004);
        } else {
            ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " package", false);
            boolean success = this.webSocket.send(ByteString.of(data));
            ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " package " + success, false);
            this.adder.incrementAndGet();
            if (!success) {
                for(int i = 0; i < SpeechRecognitionSysConfig.retryRequestNum; ++i) {
                    success = this.webSocket.send(ByteString.of(data));
                    if (success) {
                        break;
                    }
                }
            }

        }
    }

    private void write(String data) {
        if (!this.endFlag.get()) {
            ReportService.ifLogMessage(this.getId(), "send " + this.adder.get() + " end package", false);
            this.adder.incrementAndGet();
            this.webSocket.send(data);
        }

    }

    public Boolean stop() {
        if (this.endFlag.get()) {
            return true;
        } else {
            this.write(JsonUtil.toJson(MapUtil.builder().put("type", "end").build()));
            this.endFlag.set(true);

            try {
                this.closeLatch.await((long)SpeechRecognitionSysConfig.wsStopMethodWait, TimeUnit.SECONDS);
            } catch (InterruptedException var2) {
                var2.printStackTrace();
                ReportService.ifLogMessage(this.getId(), "stop_exception:" + var2.getMessage(), false);
            }

            return true;
        }
    }

    private WebSocketListener createWebSocketListener() {
        return new WebSocketListener() {
            public void onClosed(WebSocket webSocket, int code, String reason) {
                super.onClosed(webSocket, code, reason);
                ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "ws onClosed" + reason, false);
                SpeechWsRecognizer.this.isConnect = false;
                SpeechWsRecognizer.this.countDownStop("onClosed");
            }

            public void onClosing(WebSocket webSocket, int code, String reason) {
                super.onClosing(webSocket, code, reason);
                ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "ws onClosing", false);
                SpeechWsRecognizer.this.isConnect = false;
                SpeechWsRecognizer.this.countDownStop("onClosing");
            }

            public void onFailure(WebSocket webSocket, Throwable t, Response response) {
                try {
                    SpeechWsRecognizer.this.isConnect = false;
                    SpeechWsRecognizer.this.countDownStart("onFailure");
                    SpeechWsRecognizer.this.countDownStop("onFailure");
                    String trace = Tutils.getStackTrace(t);
                    if (!StringUtils.contains(trace, "Socket closed") && !SpeechWsRecognizer.this.endFlag.get()) {
                        ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onFailure:" + trace, true);
                        SpeechRecognitionResponse rs = new SpeechRecognitionResponse();
                        rs.setCode(Code.EXCEPTION.getCode());
                        rs.setMessage(trace);
                        rs.setStreamId(SpeechWsRecognizer.this.asrRequestContent.getStreamId());
                        rs.setVoiceId(SpeechWsRecognizer.this.asrRequestContent.getVoiceId());
                        ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onFailure", false);
                        ReportService.report(false, String.valueOf(rs.getCode()), SpeechWsRecognizer.this.asrConfig, SpeechWsRecognizer.this.getId(), SpeechWsRecognizer.this.asrRequest, rs, SpeechWsRecognizer.this.asrConfig.getWsUrl(), t.getMessage());
                        SpeechWsRecognizer.this.listener.onFail(rs);
                    }
                } catch (Throwable var6) {
                    throw var6;
                }
            }

            public void onMessage(WebSocket webSocket, String text) {
                try {
                    super.onMessage(webSocket, text);
                    ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onMessage:" + text, false);
                    SpeechRecognitionResponse response = (SpeechRecognitionResponse)JsonUtil.fromJson(text, SpeechRecognitionResponse.class);
                    if (SpeechWsRecognizer.this.listener != null && response != null) {
                        SpeechWsRecognizer.this.listener.onMessage(response);
                        if (response.getCode() == 0) {
                            SpeechWsRecognizer.this.resultCallBack(response);
                            ReportService.report(true, String.valueOf(response.getCode()), SpeechWsRecognizer.this.asrConfig, SpeechWsRecognizer.this.getId(), SpeechWsRecognizer.this.asrRequest, response, SpeechWsRecognizer.this.asrConfig.getWsUrl(), response.getMessage());
                        } else {
                            ReportService.report(false, String.valueOf(response.getCode()), SpeechWsRecognizer.this.asrConfig, SpeechWsRecognizer.this.getId(), SpeechWsRecognizer.this.asrRequest, response, SpeechWsRecognizer.this.asrConfig.getWsUrl(), response.getMessage());
                            response.setStreamId(SpeechWsRecognizer.this.asrRequestContent.getStreamId());
                            response.setVoiceId(SpeechWsRecognizer.this.asrRequestContent.getVoiceId());
                            SpeechWsRecognizer.this.endFlag.set(true);
                            SpeechWsRecognizer.this.listener.onFail(response);
                        }
                    }

                } catch (Throwable var4) {
                    throw var4;
                }
            }

            public void onMessage(WebSocket webSocket, ByteString bytes) {
                super.onMessage(webSocket, bytes);
            }

            public void onOpen(WebSocket webSocket, Response response) {
                super.onOpen(webSocket, response);
                ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onOpen:" + JsonUtil.toJson(response), false);
                SpeechWsRecognizer.this.isConnect = response.code() == 101;
                if (!SpeechWsRecognizer.this.isConnect) {
                    ReportService.ifLogMessage(SpeechWsRecognizer.this.getId(), "onOpen: fail", false);
                    webSocket.close(1001, "onOpen");
                }

                SpeechWsRecognizer.this.countDownStart("onOpen");
                if (SpeechWsRecognizer.this.listener != null) {
                    SpeechRecognitionResponse recognitionResponse = new SpeechRecognitionResponse();
                    recognitionResponse.setCode(0);
                    recognitionResponse.setStreamId(SpeechWsRecognizer.this.asrRequestContent.getStreamId());
                    recognitionResponse.setFinalSpeech(0);
                    recognitionResponse.setVoiceId(SpeechWsRecognizer.this.asrRequestContent.getVoiceId());
                    recognitionResponse.setMessage("success");
                    SpeechWsRecognizer.this.listener.onRecognitionStart(recognitionResponse);
                }

            }
        };
    }

    private void resultCallBack(SpeechRecognitionResponse response) {
        response.setStreamId(this.asrRequestContent.getStreamId());
        if (response.getFinalSpeech() == null) {
            response.setFinalSpeech(0);
        }

        SpeechRecognitionResponse beginResp;
        if (response.getResult() != null && this.listener != null) {
            if (response.getResult().getSliceType() == 0) {
                this.begin = true;
                this.listener.onSentenceBegin(response);
            } else if (response.getResult().getSliceType() == 2) {
                if (!this.begin) {
                    beginResp = (SpeechRecognitionResponse)JsonUtil.fromJson(JsonUtil.toJson(response), SpeechRecognitionResponse.class);
                    beginResp.getResult().setSliceType(0);
                    this.listener.onSentenceBegin(beginResp);
                }

                this.begin = false;
                this.listener.onSentenceEnd(response);
            } else {
                this.listener.onRecognitionResultChange(response);
            }
        }

        if (response.getFinalSpeech() != null && response.getFinalSpeech() == 1) {
            if (this.listener != null) {
                beginResp = new SpeechRecognitionResponse();
                beginResp.setCode(0);
                beginResp.setVoiceId(this.asrRequestContent.getVoiceId());
                beginResp.setFinalSpeech(1);
                beginResp.setStreamId(this.asrRequestContent.getStreamId());
                beginResp.setMessage("success");
                beginResp.setMessageId(response.getMessageId());
                this.listener.onRecognitionComplete(beginResp);
            }

            this.countDownStop("final");
            this.webSocket.cancel();
        }

    }

    private String getId() {
        return this.asrRequestContent.getStreamId() + "_" + this.asrRequestContent.getVoiceId();
    }

    private void reconnect(byte[] data) {
        if (!this.endFlag.get()) {
            if (this.connectNum <= this.reConnectMaxNum) {
                try {
                    Thread.sleep(10L);
                    this.write(data);
                    ++this.connectNum;
                } catch (InterruptedException var3) {
                    var3.printStackTrace();
                }
            }

        }
    }

    private void countDownStop(String source) {
        try {
            if (this.closeLatch.getCount() > 0L) {
                this.closeLatch.countDown();
                ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_closeLatch_countDown", false);
            }
        } catch (Exception var3) {
            ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_closeLatch_exception" + var3.getMessage(), true);
        }

    }

    private void countDownStart(String source) {
        try {
            if (this.startLatch.getCount() > 0L) {
                this.startLatch.countDown();
                ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_startLatch_countDown", false);
            }
        } catch (Exception var3) {
            ReportService.ifLogMessage(this.asrRequestContent.getVoiceId(), source + "_startLatch_exception" + var3.getMessage(), true);
        }

    }
}

This is a mock test of mine; the detailed code will be added later.
