[Simple Audio Recognition] Notes on the speech-recognition source code in the TensorFlow Android demo

m2008h1

Category: AI. Tags: speech recognition, android, audio, java, tensorflow

Article link: https://blog.csdn.net/m2008h1/article/details/85244605

1. Processing the audio data stream

Full source: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/SpeechActivity.java

The recording thread contains the following snippet:

    // Loop, gathering audio data and copying it to a round-robin buffer.
    while (shouldContinue) {
      int numberRead = record.read(audioBuffer, 0, audioBuffer.length);
      int maxLength = recordingBuffer.length;
      int newRecordingOffset = recordingOffset + numberRead;
      int secondCopyLength = Math.max(0, newRecordingOffset - maxLength);
      int firstCopyLength = numberRead - secondCopyLength;
      // We store off all the data for the recognition thread to access. The ML
      // thread will copy out of this buffer into its own, while holding the
      // lock, so this should be thread safe.
      recordingBufferLock.lock();
      try {
        System.arraycopy(audioBuffer, 0, recordingBuffer, recordingOffset, firstCopyLength);
        System.arraycopy(audioBuffer, firstCopyLength, recordingBuffer, 0, secondCopyLength);
        recordingOffset = newRecordingOffset % maxLength;
      } finally {
        recordingBufferLock.unlock();
      }
    }

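The copy works as follows: each chunk read from the microphone is written into recordingBuffer starting at recordingOffset. If the chunk runs past the end of the buffer, the first firstCopyLength samples fill the tail and the remaining secondCopyLength samples wrap around to index 0. recordingOffset then advances modulo the buffer length.

The recognition thread contains the following snippet: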
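The wrap-around write can be tried in isolation. The following is a minimal sketch, not the demo's code: the class name RingBufferWriteDemo and the 8-sample buffer are made up for illustration, and it assumes numberRead is non-negative and no larger than the buffer (on Android, AudioRecord.read can also return a negative error code, which the snippet above does not check).

```java
import java.util.Arrays;

// Hypothetical stand-alone demo; field names mirror the snippet above.
class RingBufferWriteDemo {
    final short[] recordingBuffer;
    int recordingOffset = 0;

    RingBufferWriteDemo(int size) {
        recordingBuffer = new short[size];
    }

    // Writes one chunk at recordingOffset, wrapping to index 0 when it
    // runs past the end. Assumes 0 <= numberRead <= recordingBuffer.length.
    void write(short[] audioBuffer, int numberRead) {
        int maxLength = recordingBuffer.length;
        int newRecordingOffset = recordingOffset + numberRead;
        int secondCopyLength = Math.max(0, newRecordingOffset - maxLength);
        int firstCopyLength = numberRead - secondCopyLength;
        System.arraycopy(audioBuffer, 0, recordingBuffer, recordingOffset, firstCopyLength);
        System.arraycopy(audioBuffer, firstCopyLength, recordingBuffer, 0, secondCopyLength);
        recordingOffset = newRecordingOffset % maxLength;
    }

    public static void main(String[] args) {
        RingBufferWriteDemo demo = new RingBufferWriteDemo(8);
        demo.write(new short[] {1, 2, 3}, 3);
        demo.write(new short[] {4, 5, 6}, 3);
        demo.write(new short[] {7, 8, 9}, 3); // 7 and 8 fill the tail, 9 wraps to index 0
        System.out.println(Arrays.toString(demo.recordingBuffer)
            + " offset=" + demo.recordingOffset);
    }
}
```

After the third write the buffer holds [9, 2, 3, 4, 5, 6, 7, 8] with the offset at 1, so the oldest sample (1) has been overwritten by the newest (9).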

    // Loop, grabbing recorded data and running the recognition model on it.
    while (shouldContinueRecognition) {
      // The recording thread places data in this round-robin buffer, so lock to
      // make sure there's no writing happening and then copy it to our own
      // local version.
      recordingBufferLock.lock();
      try {
        int maxLength = recordingBuffer.length;
        int firstCopyLength = maxLength - recordingOffset;
        int secondCopyLength = recordingOffset;
        System.arraycopy(recordingBuffer, recordingOffset, inputBuffer, 0, firstCopyLength);
        System.arraycopy(recordingBuffer, 0, inputBuffer, firstCopyLength, secondCopyLength);
      } finally {
        recordingBufferLock.unlock();
      }
      ...
    }

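The copy works as follows: while holding the lock, the recognition thread copies the whole of recordingBuffer into inputBuffer. The first copy takes the samples from recordingOffset to the end of the buffer (the oldest data), and the second takes the samples from index 0 up to recordingOffset (the newest data), so inputBuffer ends up in chronological order.

The copy flow shows that recording is a continuous stream, and that each snapshot taken by the recognition thread may overlap with the previous one, which preserves the continuity of the audio. While recording we do not know when the meaningful audio will occur, and we need to capture it in full; this sliding-window approach makes it very likely that the key data is captured.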
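To make the overlap between consecutive snapshots concrete, here is a small sketch of the read side. The class name RingBufferReadDemo, the snapshot helper and the sample values are hypothetical; only the two arraycopy calls follow the snippet above.

```java
import java.util.Arrays;

// Hypothetical stand-alone demo of the recognition thread's copy.
class RingBufferReadDemo {
    // Unrolls the ring buffer so that the oldest sample comes first.
    static short[] snapshot(short[] recordingBuffer, int recordingOffset) {
        int maxLength = recordingBuffer.length;
        short[] inputBuffer = new short[maxLength];
        int firstCopyLength = maxLength - recordingOffset;
        int secondCopyLength = recordingOffset;
        System.arraycopy(recordingBuffer, recordingOffset, inputBuffer, 0, firstCopyLength);
        System.arraycopy(recordingBuffer, 0, inputBuffer, firstCopyLength, secondCopyLength);
        return inputBuffer;
    }

    public static void main(String[] args) {
        // State of the ring buffer at some moment: newest sample is 9, at index 0.
        short[] first = snapshot(new short[] {9, 2, 3, 4, 5, 6, 7, 8}, 1);
        // Two more samples (10, 11) have been recorded since.
        short[] second = snapshot(new short[] {9, 10, 11, 4, 5, 6, 7, 8}, 3);
        System.out.println(Arrays.toString(first));
        System.out.println(Arrays.toString(second));
    }
}
```

The two snapshots are [2, 3, 4, 5, 6, 7, 8, 9] and [4, 5, 6, 7, 8, 9, 10, 11]: six of the eight samples are shared, which is the overlap the paragraph above describes.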

2. Evaluating the recognition result

Full source:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/RecognizeCommands.java

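The buffering scheme described in section 1 ensures that no audio is missed. The trade-off is that, because consecutive snapshots overlap, the model may report the same result many times in a row, so a separate post-processing step is needed to keep the final result accurate. The details are as follows.

The code that calls this step is shown below (it is in the source file linked in section 1). A voice command is considered valid when its name does not start with "_" (as "_silence_" and "_unknown_" do) and it is flagged as a new command.

    final RecognizeCommands.RecognitionResult result =
        recognizeCommands.processLatestResults(outputScores, currentTime);
    if (!result.foundCommand.startsWith("_") && result.isNewCommand) {
      // A valid command has been recognized at this point.
      ...
    }

A command is judged to be new as follows (see the "processLatestResults" method in the source):

a. Check the input for errors, such as a wrong number of classes or results that arrive out of time order.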

    if (currentResults.length != labelsCount) {
      throw new RuntimeException(
          "The results for recognition should contain "
              + labelsCount
              + " elements, but there are "
              + currentResults.length);
    }

    if ((!previousResults.isEmpty()) && (currentTimeMS < previousResults.getFirst().first)) {
      throw new RuntimeException(
          "You must feed results in increasing time order, but received a timestamp of "
              + currentTimeMS
              + " that was earlier than the previous one of "
              + previousResults.getFirst().first);
    }

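b. Check how many results are in the history queue. If the queue holds more than one entry and the most recent entry is less than "minimumTimeBetweenSamplesMs" (30 ms in the source) older than the current result, the current result is not a new command.

Note: previousResults is a double-ended queue declared as "Deque<Pair<Long, float[]>>" that stores the received results.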

    final int howManyResults = previousResults.size();
    // Ignore any results that are coming in too frequently.
    if (howManyResults > 1) {
      final long timeSinceMostRecent = currentTimeMS - previousResults.getLast().first;
      if (timeSinceMostRecent < minimumTimeBetweenSamplesMs) {
        return new RecognitionResult(previousTopLabel, previousTopLabelScore, false);
      }
    }
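The rate-limit check in step b can be isolated as a pure function. This is a sketch under assumptions: the class RateLimitDemo and the method tooFrequent are hypothetical names, and the timestamps are invented.

```java
// Hypothetical stand-alone version of the check in step b.
class RateLimitDemo {
    // Returns true when the current result arrives too soon after the most
    // recently stored one. With zero or one stored results the check is skipped.
    static boolean tooFrequent(int howManyResults, long lastStoredTimeMS,
                               long currentTimeMS, long minimumTimeBetweenSamplesMs) {
        if (howManyResults > 1) {
            long timeSinceMostRecent = currentTimeMS - lastStoredTimeMS;
            return timeSinceMostRecent < minimumTimeBetweenSamplesMs;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tooFrequent(3, 1000L, 1020L, 30L)); // 20 ms gap
        System.out.println(tooFrequent(3, 1000L, 1030L, 30L)); // 30 ms gap
        System.out.println(tooFrequent(1, 1000L, 1010L, 30L)); // only one stored result
    }
}
```

Note that the comparison is strict, so a gap of exactly 30 ms is accepted, and that the guard is howManyResults > 1, so the first two results are never rate-limited.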

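c. Append the current result to the history queue, then remove every entry that is more than "averageWindowDurationMs" (500 ms in the source) older than the current result.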

    // Add the latest results to the head of the queue.
    previousResults.addLast(new Pair<Long, float[]>(currentTimeMS, currentResults));

    // Prune any earlier results that are too old for the averaging window.
    final long timeLimit = currentTimeMS - averageWindowDurationMs;
    while (previousResults.getFirst().first < timeLimit) {
      previousResults.removeFirst();
    }
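The sliding window in step c can be sketched with a plain java.util.ArrayDeque. To keep the example short it stores only timestamps rather than Pair<Long, float[]> entries; the class WindowPruneDemo and the method addAndPrune are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-alone version of the add-then-prune step.
class WindowPruneDemo {
    // Appends the new timestamp at the tail, then drops entries from the head
    // that are older than the averaging window. The loop cannot empty the
    // queue as long as the window is non-negative, because the entry that was
    // just added is never older than the limit.
    static void addAndPrune(Deque<Long> timestamps, long currentTimeMS,
                            long averageWindowDurationMs) {
        timestamps.addLast(currentTimeMS);
        long timeLimit = currentTimeMS - averageWindowDurationMs;
        while (timestamps.getFirst() < timeLimit) {
            timestamps.removeFirst();
        }
    }

    public static void main(String[] args) {
        Deque<Long> timestamps = new ArrayDeque<>();
        for (long t : new long[] {0L, 200L, 400L, 600L, 1000L}) {
            addAndPrune(timestamps, t, 500L);
            System.out.println("t=" + t + " window=" + timestamps);
        }
    }
}
```

With a 500 ms window, the entry at 0 ms is dropped when 600 ms arrives, and the entries at 200 ms and 400 ms are dropped when 1000 ms arrives.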

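d. If the number of results counted in step b is less than "minimumCount" (3 in the source), or the results in the queue span less than averageWindowDurationMs / MINIMUM_TIME_FRACTION, there is not enough information to decide whether this is a new command.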

    // If there are too few results, assume the result will be unreliable and bail.
    final long earliestTime = previousResults.getFirst().first;
    final long samplesDuration = currentTimeMS - earliestTime;
    if ((howManyResults < minimumCount)
        || (samplesDuration < (averageWindowDurationMs / MINIMUM_TIME_FRACTION))) {
      Log.v("RecognizeResult", "Too few results");
      return new RecognitionResult(previousTopLabel, 0.0f, false);
    }

e. Average the scores of all results in the history queue and sort the labels by average score, highest first.

    // Calculate the average score across all the results in the window.
    float[] averageScores = new float[labelsCount];
    for (Pair<Long, float[]> previousResult : previousResults) {
      final float[] scoresTensor = previousResult.second;
      int i = 0;
      while (i < scoresTensor.length) {
        averageScores[i] += scoresTensor[i] / howManyResults;
        ++i;
      }
    }

    // Sort the averaged results in descending score order.
    ScoreForSorting[] sortedAverageScores = new ScoreForSorting[labelsCount];
    for (int i = 0; i < labelsCount; ++i) {
      sortedAverageScores[i] = new ScoreForSorting(averageScores[i], i);
    }
    Arrays.sort(sortedAverageScores);
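The averaging in step e can be sketched on its own. Two simplifications are assumptions of this sketch rather than the demo's behavior: it divides by the number of entries actually in the window (the snippet above divides by howManyResults, which was read before the current result was added and before pruning), and it finds the top label with a linear scan instead of sorting ScoreForSorting objects. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone version of the score averaging.
class AverageScoresDemo {
    // Averages each label's score over every result in the window.
    static float[] average(List<float[]> window, int labelsCount) {
        float[] averageScores = new float[labelsCount];
        for (float[] scores : window) {
            for (int i = 0; i < labelsCount; ++i) {
                averageScores[i] += scores[i] / window.size();
            }
        }
        return averageScores;
    }

    // Index of the highest score; assumes scores has at least one element.
    static int argMax(float[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; ++i) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<float[]> window = new ArrayList<>();
        window.add(new float[] {0.1f, 0.9f});
        window.add(new float[] {0.2f, 0.8f});
        window.add(new float[] {0.3f, 0.7f});
        float[] avg = average(window, 2);
        System.out.println("top index = " + argMax(avg));
    }
}
```

Here label 1 averages about 0.8 against about 0.2 for label 0, so it is the top candidate even though its score is falling from one result to the next.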

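f. Compute the time elapsed since the last successfully recognized command.

Note: "previousTopLabel" is the label of the last successfully recognized command; its initial value is "_silence_". "previousTopLabelTime" is the time of that command; its initial value is "Long.MIN_VALUE".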

    // See if the latest top score is enough to trigger a detection.
    final int currentTopIndex = sortedAverageScores[0].index;
    final String currentTopLabel = labels.get(currentTopIndex);
    final float currentTopScore = sortedAverageScores[0].score;
    // If we've recently had another label trigger, assume one that occurs too
    // soon afterwards is a bad result.
    long timeSinceLastTop;
    if (previousTopLabel.equals(SILENCE_LABEL) || (previousTopLabelTime == Long.MIN_VALUE)) {
      timeSinceLastTop = Long.MAX_VALUE;
    } else {
      timeSinceLastTop = currentTimeMS - previousTopLabelTime;
    }

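g. If the highest average score is greater than "detectionThreshold" (0.70f in the source) and the time since the last recognized command is greater than "suppressionMs" (1500 ms in the source), the current result is a new command and the recognition result is the label with the highest average score. Otherwise the current result is not a new command.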

    boolean isNewCommand;
    if ((currentTopScore > detectionThreshold) && (timeSinceLastTop > suppressionMs)) {
      previousTopLabel = currentTopLabel;
      previousTopLabelTime = currentTimeMS;
      previousTopLabelScore = currentTopScore;
      isNewCommand = true;
    } else {
      isNewCommand = false;
    }
    return new RecognitionResult(currentTopLabel, currentTopScore, isNewCommand);
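Steps f and g together form a small state machine, which the following sketch isolates. The class TriggerDemo, its constructor and the sample labels are hypothetical, and the string value of SILENCE_LABEL is an assumption; the branch logic follows the two snippets above.

```java
// Hypothetical stand-alone version of the threshold and suppression logic.
class TriggerDemo {
    static final String SILENCE_LABEL = "_silence_"; // assumed value

    final float detectionThreshold;
    final long suppressionMs;
    String previousTopLabel = SILENCE_LABEL;
    long previousTopLabelTime = Long.MIN_VALUE;

    TriggerDemo(float detectionThreshold, long suppressionMs) {
        this.detectionThreshold = detectionThreshold;
        this.suppressionMs = suppressionMs;
    }

    // Returns true when the averaged top score should count as a new command.
    boolean isNewCommand(String currentTopLabel, float currentTopScore, long currentTimeMS) {
        long timeSinceLastTop;
        if (previousTopLabel.equals(SILENCE_LABEL) || previousTopLabelTime == Long.MIN_VALUE) {
            timeSinceLastTop = Long.MAX_VALUE; // nothing to suppress against
        } else {
            timeSinceLastTop = currentTimeMS - previousTopLabelTime;
        }
        if (currentTopScore > detectionThreshold && timeSinceLastTop > suppressionMs) {
            previousTopLabel = currentTopLabel;
            previousTopLabelTime = currentTimeMS;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TriggerDemo trigger = new TriggerDemo(0.70f, 1500L);
        System.out.println(trigger.isNewCommand("yes", 0.9f, 1000L)); // first detection
        System.out.println(trigger.isNewCommand("yes", 0.9f, 1500L)); // within suppression
        System.out.println(trigger.isNewCommand("yes", 0.9f, 2600L)); // 1600 ms later
        System.out.println(trigger.isNewCommand("no", 0.6f, 5000L));  // below threshold
    }
}
```

One consequence of this logic worth noticing: if a silence window itself scores above the threshold, previousTopLabel becomes the silence label, and the next command is then accepted without waiting for the suppression interval.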

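In summary, a command is not decided by the model's inference score at a single instant. It is decided by averaging several inference results over a period of time. Repeated detections also need to be suppressed so that the same command is not recognized twice.

With the concrete values above, the conditions for a new command can be stated as follows: the history queue already held at least 3 results when the current one arrived (howManyResults is read before the current result is added), the results in the queue span at least averageWindowDurationMs / MINIMUM_TIME_FRACTION, the highest per-label average score over the 500 ms window (including the current result) exceeds 0.7, and more than 1500 ms have passed since the last valid command.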
