Using Whisper for speech recognition in React Native

In this article, we'll create a speech-to-text application using Whisper. Whisper requires a Python backend, so we'll build the server for the application with Flask.

React Native serves as the framework for building the mobile client. I hope you enjoy the process of creating this application as much as I did. Let's dive right into it.

What is speech recognition?

Speech recognition enables a program to process human speech into a written format. Grammar, syntax, structure, and audio are essential for understanding and processing human speech.

Speech recognition algorithms are among the most complex areas of computer science. Developments in artificial intelligence, machine learning, and unsupervised pre-training techniques, along with frameworks such as Wav2Vec 2.0, which are effective at self-supervised learning and learning from raw audio, have advanced their capabilities.

A speech recognizer consists of the following components:

  • Speech input

  • A decoder, which relies on acoustic models, a pronunciation dictionary, and language models to produce output

  • Word output

These components and advances in technology make it possible to consume large datasets of unlabeled speech. Pre-trained audio encoders are capable of learning high-quality representations of speech; their only downside is their unsupervised nature.

What is a decoder?

A high-performing decoder maps speech representations to usable output. The decoder solves the problem of the audio encoders being unsupervised. However, decoders limit the effectiveness of frameworks like Wav2Vec for speech recognition: they can be quite complex to use and require a skilled practitioner, particularly because technologies like Wav2Vec 2.0 are difficult to work with.

The key is to combine as many high-quality speech recognition datasets as possible. Models trained in this way are more effective than models trained on a single source.

What is Whisper?

Whisper, or WSPSR, stands for Web-Scale Supervised Pretraining for Speech Recognition. Whisper models are trained to be able to predict the text of transcripts.

Whisper relies on sequence-to-sequence models to map between utterances and their transcribed forms, which makes the speech recognition pipeline more effective. Whisper comes bundled with an audio language detector, a fine-tuned model trained on VoxLingua107.

The Whisper dataset consists of audio paired with transcripts collected from the internet. The quality of the dataset is improved through the use of automated filtering methods.
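
To make the pipeline concrete, here is a minimal sketch of calling the Whisper Python package directly. It assumes the openai-whisper package and ffmpeg are installed, and the audio.wav file name is only a placeholder:

import whisper

# load one of the pretrained checkpoints: "tiny", "base", "small", "medium", or "large"
model = whisper.load_model("base")

# transcribe a local audio file; Whisper detects the language unless one is specified
result = model.transcribe("audio.wav")
print(result["text"])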

Setting up Whisper

To use Whisper, we need to rely on Python for our backend. Whisper also requires the command line tool ffmpeg, which enables our application to record, convert, and stream audio and video.

Here are the necessary commands for installing ffmpeg on different machines:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
​
​
# on Arch Linux
sudo pacman -S ffmpeg
​
​
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
​
​
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
​
​
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

Creating the backend application with Flask

In this section, we'll create the backend service for the app. Flask is a web framework written in Python. I chose to use Flask for this application because it is easy to set up.

The Flask development team recommends using the latest version of Python, although Flask still maintains support for Python ≥ 3.7.

Once the prerequisites are installed, we can create the project folder to hold both the client and the backend applications.

mkdir translateWithWhisper && cd translateWithWhisper && mkdir backend && cd backend

Flask uses virtual environments to manage project dependencies; Python has a venv module out of the box for creating them.

Use the following command in a terminal window to create the venv folder. This folder will hold our dependencies.

python3 -m venv venv

Specifying the project dependencies

Specify the necessary dependencies in a requirements.txt file. This file lives in the root of the backend directory.

touch requirements.txt
code requirements.txt

Copy and paste the following into the requirements.txt file:

numpy
tqdm
transformers>=4.19.0
ffmpeg-python==0.2.0
pyaudio
SpeechRecognition
pydub
git+https://github.com/openai/whisper.git
--extra-index-url https://download.pytorch.org/whl/cu113
torch
flask
flask_cors

Creating a Bash shell script to install the dependencies

In the root project directory, create a Bash shell script file. The Bash script handles the installation of the dependencies for the Flask application.

In the root project directory, open a terminal window. Create the shell script with the following commands:

touch install_dependencies.sh
code install_dependencies.sh

Copy and paste the following code block into the install_dependencies.sh file:

# install backend dependencies
cd backend && python3 -m venv venv
# on Windows the activation script lives in venv/Scripts; on macOS/Linux use venv/bin/activate
source venv/Scripts/activate
​
pip install wheel
pip install -r requirements.txt

Now, open a terminal window in the root directory and run the following command:

sh ./install_dependencies.sh

Creating the transcribe endpoint

Now, we'll create a transcribe endpoint in our application that will receive audio input from the client. The application will transcribe the input and return the transcribed text to the client.

This endpoint accepts a POST request and processes the input. When the response is a 200 HTTP response, the client receives the transcribed text.

Create an app.py file to hold the logic for processing the input. Open a new terminal window and create the app.py file in the backend directory:

touch backend/app.py
code backend/app.py

Copy and paste the code block below into the app.py file:

import os
import tempfile
import flask
from flask import request
from flask_cors import CORS
import whisper

app = flask.Flask(__name__)
CORS(app)

# endpoint for handling the transcribing of audio inputs
@app.route('/transcribe', methods=['POST'])
def transcribe():
    if request.method == 'POST':
        language = request.form['language']
        model = request.form['model_size']

        # there are no english models for large
        if model != 'large' and language == 'english':
            model = model + '.en'
        audio_model = whisper.load_model(model)

        temp_dir = tempfile.mkdtemp()
        save_path = os.path.join(temp_dir, 'temp.wav')

        wav_file = request.files['audio_data']
        wav_file.save(save_path)

        if language == 'english':
            result = audio_model.transcribe(save_path, language='english')
        else:
            result = audio_model.transcribe(save_path)

        return result['text']
    else:
        return "This endpoint only processes POST wav blob"

Running the Flask application

In the terminal window with the activated venv, run the following commands to start the application:

$ cd backend
$ flask run --port 8000

The application should start without any errors. If it does, Flask's startup output will appear in the terminal window, indicating that the server is listening on port 8000.

That closes out the creation of our transcribe endpoint in our Flask application.
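
Before wiring up the mobile client, you can sanity-check the endpoint from Python. The sketch below is only an illustration: it assumes the requests library is installed (it is not part of requirements.txt) and that a hypothetical test.wav file sits next to the script:

import requests

# post a local .wav file to the running Flask server
with open("test.wav", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8000/transcribe",
        data={"language": "english", "model_size": "base"},
        files={"audio_data": f},
    )

print(response.status_code)  # expect 200
print(response.text)         # the transcribed text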

Hosting the server

To make network requests to the created HTTP endpoint in iOS, we’ll need to route to an HTTPS server. ngrok solves the issue of creating a re-route.

Download ngrok, then install the package and open it. A terminal window launches; enter the following command to host the server with ngrok:

ngrok http 8000

ngrok will generate a hosted URL, which will be used in the client application for requests.

Creating the speech recognition mobile application with React Native

For this section of the tutorial, you'll need to install a couple of things:

  • Expo CLI: a command line tool for interfacing with Expo tools

  • The Expo Go app for Android and iOS: used to open apps served through the Expo CLI

In a new terminal window, initialize the React Native project:

npx create-expo-app client
cd client

Now, start the development server:

npx expo start

To open the app on an iOS device, open the camera and scan the QR code in the terminal. On an Android device, press Scan QR code on the Home tab of the Expo Go app.

Our Expo Go app

Handling the recording

expo-av handles the recording of audio in our application. Our Flask server expects the file in .wav format; the expo-av package lets us specify the format before saving.

Install the necessary packages in the terminal:

yarn add axios expo-av react-native-picker-select

Creating the model selector

The app must be able to select a model size. There are five options to choose from:

  • tiny

  • base

  • small

  • medium

  • large

The selected size determines which model the input will be run against on the server.
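
As a reminder of what happens on the server side, here is a small sketch of the model-name resolution the Flask endpoint performs; the resolve_model_name helper is hypothetical and simply mirrors the inline logic in app.py:

# English inputs use the ".en" model variants, except for "large",
# which has no English-only checkpoint
def resolve_model_name(model_size: str, language: str) -> str:
    if model_size != "large" and language == "english":
        return model_size + ".en"
    return model_size

print(resolve_model_name("base", "english"))   # base.en
print(resolve_model_name("large", "english"))  # large
print(resolve_model_name("small", "french"))   # small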

Back in the terminal, create a src folder with a /components subfolder using the following commands:

mkdir src
mkdir src/components
touch src/components/Mode.tsx
code src/components/Mode.tsx

Paste the code block into the Mode.tsx file:

import React from "react";
import { View, Text, StyleSheet } from "react-native";
import RNPickerSelect from "react-native-picker-select";

const Mode = ({
  onModelChange,
  transcribeTimeout,
  onTranscribeTimeoutChanged,
}: any) => {
  function onModelChangeLocal(value: any) {
    onModelChange(value);
  }

  function onTranscribeTimeoutChangedLocal(event: any) {
    onTranscribeTimeoutChanged(event.target.value);
  }

  return (
    <View>
      <Text style={styles.title}>Model Size</Text>
      <View style={{ flexDirection: "row" }}>
        <RNPickerSelect
          onValueChange={(value) => onModelChangeLocal(value)}
          useNativeAndroidPickerStyle={false}
          placeholder={{ label: "Select model", value: null }}
          items={[
            // numeric values so App.tsx can index into its modelOptions array
            { label: "tiny", value: 0 },
            { label: "base", value: 1 },
            { label: "small", value: 2 },
            { label: "medium", value: 3 },
            { label: "large", value: 4 },
          ]}
          style={customPickerStyles}
        />
      </View>
      <View>
        <Text style={styles.title}>Timeout :{transcribeTimeout}</Text>
      </View>
    </View>
  );
};

export default Mode;
const styles = StyleSheet.create({
  title: {
    fontWeight: "200",
    fontSize: 25,
  },
});
const customPickerStyles = StyleSheet.create({
  inputIOS: {
    fontSize: 14,
    paddingVertical: 10,
    paddingHorizontal: 12,
    borderWidth: 1,
    borderColor: "green",
    borderRadius: 8,
    color: "black",
    paddingRight: 30, // to ensure the text is never behind the icon
  },
  inputAndroid: {
    fontSize: 14,
    paddingHorizontal: 10,
    paddingVertical: 8,
    borderWidth: 1,
    borderColor: "blue",
    borderRadius: 8,
    color: "black",
    paddingRight: 30, // to ensure the text is never behind the icon
  },
});

Creating the TranscribeOutput component

The server returns output containing the transcribed text. This component receives the output data and displays it.

mkdir src
mkdir src/components
touch src/components/TranscribeOutput.tsx
code src/components/TranscribeOutput.tsx

Paste the code block into the TranscribeOutput.tsx file:

import React from "react";
import { Text, View, StyleSheet } from "react-native";
const TranscribedOutput = ({
  transcribedText,
  interimTranscribedText,
}: any) => {
  if (transcribedText.length === 0 && interimTranscribedText.length === 0) {
    return <Text>...</Text>;
  }

  return (
    <View style={styles.box}>
      <Text style={styles.text}>{transcribedText}</Text>
      <Text>{interimTranscribedText}</Text>
    </View>
  );
};
const styles = StyleSheet.create({
  box: {
    borderColor: "black",
    borderRadius: 10,
    marginBottom: 0,
  },
  text: {
    fontWeight: "400",
    fontSize: 30,
  },
});

export default TranscribedOutput;

Creating the client functionality

The application relies on Axios to send and receive data from the Flask server; we installed it in an earlier section. The default language for testing the application is English.

In the App.tsx file, import the necessary dependencies:

import * as React from "react";
import {
  Text,
  StyleSheet,
  View,
  Button,
  ActivityIndicator,
} from "react-native";
import { Audio } from "expo-av";
import FormData from "form-data";
import axios from "axios";
import Mode from "./src/components/Mode";
import TranscribedOutput from "./src/components/TranscribeOutput";

Creating the state variables

The app needs to keep track of the recordings, the transcribed data, and whether recording or transcription is in progress. The language, model, and timeout are set in state by default.

export default () => {
  const [recording, setRecording] = React.useState(false as any);
  const [recordings, setRecordings] = React.useState([]);
  const [message, setMessage] = React.useState("");
  const [transcribedData, setTranscribedData] = React.useState([] as any);
  const [interimTranscribedData] = React.useState("");
  const [isRecording, setIsRecording] = React.useState(false);
  const [isTranscribing, setIsTranscribing] = React.useState(false);
  const [selectedLanguage, setSelectedLanguage] = React.useState("english");
  const [selectedModel, setSelectedModel] = React.useState(1);
  const [transcribeTimeout, setTranscribeTimout] = React.useState(5);
  const [stopTranscriptionSession, setStopTranscriptionSession] =
    React.useState(false);
  const [isLoading, setLoading] = React.useState(false);
  return (
    <View style={styles.root}></View>
)
}

const styles = StyleSheet.create({
  root: {
    display: "flex",
    flex: 1,
    alignItems: "center",
    textAlign: "center",
    flexDirection: "column",
  },
});

Creating the refs, language, and model options variables

The useRef Hook enables us to track the currently initialized properties. We want to set a useRef for the transcription session, the language, and the model.

Paste the code block under the setLoading useState Hook:

  const [isLoading, setLoading] = React.useState(false);
  const intervalRef: any = React.useRef(null);

  const stopTranscriptionSessionRef = React.useRef(stopTranscriptionSession);
  stopTranscriptionSessionRef.current = stopTranscriptionSession;

  const selectedLangRef = React.useRef(selectedLanguage);
  selectedLangRef.current = selectedLanguage;

  const selectedModelRef = React.useRef(selectedModel);
  selectedModelRef.current = selectedModel;

  const supportedLanguages = [
    "english",
    "chinese",
    "german",
    "spanish",
    "russian",
    "korean",
    "french",
    "japanese",
    "portuguese",
    "turkish",
    "polish",
    "catalan",
    "dutch",
    "arabic",
    "swedish",
    "italian",
    "indonesian",
    "hindi",
    "finnish",
    "vietnamese",
    "hebrew",
    "ukrainian",
    "greek",
    "malay",
    "czech",
    "romanian",
    "danish",
    "hungarian",
    "tamil",
    "norwegian",
    "thai",
    "urdu",
    "croatian",
    "bulgarian",
    "lithuanian",
    "latin",
    "maori",
    "malayalam",
    "welsh",
    "slovak",
    "telugu",
    "persian",
    "latvian",
    "bengali",
    "serbian",
    "azerbaijani",
    "slovenian",
    "kannada",
    "estonian",
    "macedonian",
    "breton",
    "basque",
    "icelandic",
    "armenian",
    "nepali",
    "mongolian",
    "bosnian",
    "kazakh",
    "albanian",
    "swahili",
    "galician",
    "marathi",
    "punjabi",
    "sinhala",
    "khmer",
    "shona",
    "yoruba",
    "somali",
    "afrikaans",
    "occitan",
    "georgian",
    "belarusian",
    "tajik",
    "sindhi",
    "gujarati",
    "amharic",
    "yiddish",
    "lao",
    "uzbek",
    "faroese",
    "haitian creole",
    "pashto",
    "turkmen",
    "nynorsk",
    "maltese",
    "sanskrit",
    "luxembourgish",
    "myanmar",
    "tibetan",
    "tagalog",
    "malagasy",
    "assamese",
    "tatar",
    "hawaiian",
    "lingala",
    "hausa",
    "bashkir",
    "javanese",
    "sundanese",
  ];

  const modelOptions = ["tiny", "base", "small", "medium", "large"];
  React.useEffect(() => {
    return () => clearInterval(intervalRef.current);
  }, []);

  function handleTranscribeTimeoutChange(newTimeout: any) {
    setTranscribeTimout(newTimeout);
  }

Creating the recording functions

In this section, we'll write five functions to handle the audio transcription.

The startRecording function

The first function is the startRecording function. It enables the application to request permission to use the microphone. The desired audio format is preset, and we keep a ref for tracking the timeout:

  async function startRecording() {
    try {
      console.log("Requesting permissions..");
      const permission = await Audio.requestPermissionsAsync();
      if (permission.status === "granted") {
        await Audio.setAudioModeAsync({
          allowsRecordingIOS: true,
          playsInSilentModeIOS: true,
        });
        alert("Starting recording..");
        const RECORDING_OPTIONS_PRESET_HIGH_QUALITY: any = {
          android: {
            extension: ".mp4",
            outputFormat: Audio.RECORDING_OPTION_ANDROID_OUTPUT_FORMAT_MPEG_4,
            audioEncoder: Audio.RECORDING_OPTION_ANDROID_AUDIO_ENCODER_AMR_NB,
            sampleRate: 44100,
            numberOfChannels: 2,
            bitRate: 128000,
          },
          ios: {
            extension: ".wav",
            audioQuality: Audio.RECORDING_OPTION_IOS_AUDIO_QUALITY_MIN,
            sampleRate: 44100,
            numberOfChannels: 2,
            bitRate: 128000,
            linearPCMBitDepth: 16,
            linearPCMIsBigEndian: false,
            linearPCMIsFloat: false,
          },
        };
        const { recording }: any = await Audio.Recording.createAsync(
          RECORDING_OPTIONS_PRESET_HIGH_QUALITY
        );
        setRecording(recording);
        console.log("Recording started");
        setStopTranscriptionSession(false);
        setIsRecording(true);
        intervalRef.current = setInterval(
          transcribeInterim,
          transcribeTimeout * 1000
        );
        console.log("erer", recording);
      } else {
        setMessage("Please grant permission to app to access microphone");
      }
    } catch (err) {
      console.error(" Failed to start recording", err);
    }
  }

The stopRecording function

The stopRecording function enables the user to stop the recording. The recording state variable stores and holds the updated recordings.

  async function stopRecording() {
    console.log("Stopping recording..");
    setRecording(undefined);
    await recording.stopAndUnloadAsync();
    const uri = recording.getURI();
    let updatedRecordings = [...recordings] as any;
    const { sound, status } = await recording.createNewLoadedSoundAsync();
    updatedRecordings.push({
      sound: sound,
      duration: getDurationFormatted(status.durationMillis),
      file: recording.getURI(),
    });
    setRecordings(updatedRecordings);
    console.log("Recording stopped and stored at", uri);
    // Fetch audio binary blob data

    clearInterval(intervalRef.current);
    setStopTranscriptionSession(true);
    setIsRecording(false);
    setIsTranscribing(false);
  }

The getDurationFormatted and getRecordingLines functions

To get the duration of the recording and the length of the recorded text, create the getDurationFormatted and getRecordingLines functions:

  function getDurationFormatted(millis: any) {
    const minutes = millis / 1000 / 60;
    const minutesDisplay = Math.floor(minutes);
    const seconds = Math.round((minutes - minutesDisplay) * 60);
    const secondDisplay = seconds < 10 ? `0${seconds}` : seconds;
    return `${minutesDisplay}:${secondDisplay}`;
  }

  function getRecordingLines() {
    return recordings.map((recordingLine: any, index) => {
      return (
        <View key={index} style={styles.row}>
          <Text style={styles.fill}>
            {" "}
            Recording {index + 1} - {recordingLine.duration}
          </Text>
          <Button
            style={styles.button}
            onPress={() => recordingLine.sound.replayAsync()}
            title="Play"
          ></Button>
        </View>
      );
    });
  }

Creating the transcribeRecording function

This function allows us to communicate with the Flask server. We access the audio we created using the getURI() function from the expo-av library; language, model_size, and audio_data are the key pieces of data we send to the server.

A 200 response indicates success. We store the response in the setTranscribedData useState Hook; this response contains our transcribed text.

function transcribeInterim() {
    clearInterval(intervalRef.current);
    setIsRecording(false);
  }

  async function transcribeRecording() {
    const uri = recording.getURI();
    const filetype = uri.split(".").pop();
    const filename = uri.split("/").pop();
    setLoading(true);
    const formData: any = new FormData();
    formData.append("language", selectedLangRef.current);
    formData.append("model_size", modelOptions[selectedModelRef.current]);
    formData.append(
      "audio_data",
      {
        uri,
        type: `audio/${filetype}`,
        name: filename,
      },
      "temp_recording"
    );
    axios({
      url: "https://2c75-197-210-53-169.eu.ngrok.io/transcribe",
      method: "POST",
      data: formData,
      headers: {
        Accept: "application/json",
        "Content-Type": "multipart/form-data",
      },
    })
      .then(function (response) {
        console.log("response :", response);
        setTranscribedData((oldData: any) => [...oldData, response.data]);
        setLoading(false);
        setIsTranscribing(false);
        intervalRef.current = setInterval(
          transcribeInterim,
          transcribeTimeout * 1000
        );
      })
      .catch(function (error) {
        console.log("error : error");
      });

    if (!stopTranscriptionSessionRef.current) {
      setIsRecording(true);
    }
  }

Assembling the application

Let's assemble all of the pieces we've created so far:

import * as React from "react";
import {
  Text,
  StyleSheet,
  View,
  Button,
  ActivityIndicator,
} from "react-native";
import { Audio } from "expo-av";
import FormData from "form-data";
import axios from "axios";
import Mode from "./src/components/Mode";
import TranscribedOutput from "./src/components/TranscribeOutput";
​
export default () => {
  const [recording, setRecording] = React.useState(false as any);
  const [recordings, setRecordings] = React.useState([]);
  const [message, setMessage] = React.useState("");
  const [transcribedData, setTranscribedData] = React.useState([] as any);
  const [interimTranscribedData] = React.useState("");
  const [isRecording, setIsRecording] = React.useState(false);
  const [isTranscribing, setIsTranscribing] = React.useState(false);
  const [selectedLanguage, setSelectedLanguage] = React.useState("english");
  const [selectedModel, setSelectedModel] = React.useState(1);
  const [transcribeTimeout, setTranscribeTimout] = React.useState(5);
  const [stopTranscriptionSession, setStopTranscriptionSession] =
    React.useState(false);
  const [isLoading, setLoading] = React.useState(false);
  const intervalRef: any = React.useRef(null);
​
  const stopTranscriptionSessionRef = React.useRef(stopTranscriptionSession);
  stopTranscriptionSessionRef.current = stopTranscriptionSession;
​
  const selectedLangRef = React.useRef(selectedLanguage);
  selectedLangRef.current = selectedLanguage;
​
  const selectedModelRef = React.useRef(selectedModel);
  selectedModelRef.current = selectedModel;
​
  const supportedLanguages = [
    "english",
    "chinese",
    "german",
    "spanish",
    "russian",
    "korean",
    "french",
    "japanese",
    "portuguese",
    "turkish",
    "polish",
    "catalan",
    "dutch",
    "arabic",
    "swedish",
    "italian",
    "indonesian",
    "hindi",
    "finnish",
    "vietnamese",
    "hebrew",
    "ukrainian",
    "greek",
    "malay",
    "czech",
    "romanian",
    "danish",
    "hungarian",
    "tamil",
    "norwegian",
    "thai",
    "urdu",
    "croatian",
    "bulgarian",
    "lithuanian",
    "latin",
    "maori",
    "malayalam",
    "welsh",
    "slovak",
    "telugu",
    "persian",
    "latvian",
    "bengali",
    "serbian",
    "azerbaijani",
    "slovenian",
    "kannada",
    "estonian",
    "macedonian",
    "breton",
    "basque",
    "icelandic",
    "armenian",
    "nepali",
    "mongolian",
    "bosnian",
    "kazakh",
    "albanian",
    "swahili",
    "galician",
    "marathi",
    "punjabi",
    "sinhala",
    "khmer",
    "shona",
    "yoruba",
    "somali",
    "afrikaans",
    "occitan",
    "georgian",
    "belarusian",
    "tajik",
    "sindhi",
    "gujarati",
    "amharic",
    "yiddish",
    "lao",
    "uzbek",
    "faroese",
    "haitian creole",
    "pashto",
    "turkmen",
    "nynorsk",
    "maltese",
    "sanskrit",
    "luxembourgish",
    "myanmar",
    "tibetan",
    "tagalog",
    "malagasy",
    "assamese",
    "tatar",
    "hawaiian",
    "lingala",
    "hausa",
    "bashkir",
    "javanese",
    "sundanese",
  ];
​
  const modelOptions = ["tiny", "base", "small", "medium", "large"];
​
  React.useEffect(() => {
    return () => clearInterval(intervalRef.current);
  }, []);
​
  function handleTranscribeTimeoutChange(newTimeout: any) {
    setTranscribeTimout(newTimeout);
  }
​
  async function startRecording() {
    try {
      console.log("Requesting permissions..");
      const permission = await Audio.requestPermissionsAsync();
      if (permission.status === "granted") {
        await Audio.setAudioModeAsync({
          allowsRecordingIOS: true,
          playsInSilentModeIOS: true,
        });
        alert("Starting recording..");
        const RECORDING_OPTIONS_PRESET_HIGH_QUALITY: any = {
          android: {
            extension: ".mp4",
            outputFormat: Audio.RECORDING_OPTION_ANDROID_OUTPUT_FORMAT_MPEG_4,
            audioEncoder: Audio.RECORDING_OPTION_ANDROID_AUDIO_ENCODER_AMR_NB,
            sampleRate: 44100,
            numberOfChannels: 2,
            bitRate: 128000,
          },
          ios: {
            extension: ".wav",
            audioQuality: Audio.RECORDING_OPTION_IOS_AUDIO_QUALITY_MIN,
            sampleRate: 44100,
            numberOfChannels: 2,
            bitRate: 128000,
            linearPCMBitDepth: 16,
            linearPCMIsBigEndian: false,
            linearPCMIsFloat: false,
          },
        };
        const { recording }: any = await Audio.Recording.createAsync(
          RECORDING_OPTIONS_PRESET_HIGH_QUALITY
        );
        setRecording(recording);
        console.log("Recording started");
        setStopTranscriptionSession(false);
        setIsRecording(true);
        intervalRef.current = setInterval(
          transcribeInterim,
          transcribeTimeout * 1000
        );
        console.log("erer", recording);
      } else {
        setMessage("Please grant permission to app to access microphone");
      }
    } catch (err) {
      console.error(" Failed to start recording", err);
    }
  }
  async function stopRecording() {
    console.log("Stopping recording..");
    setRecording(undefined);
    await recording.stopAndUnloadAsync();
    const uri = recording.getURI();
    let updatedRecordings = [...recordings] as any;
    const { sound, status } = await recording.createNewLoadedSoundAsync();
    updatedRecordings.push({
      sound: sound,
      duration: getDurationFormatted(status.durationMillis),
      file: recording.getURI(),
    });
    setRecordings(updatedRecordings);
    console.log("Recording stopped and stored at", uri);
    // Fetch audio binary blob data
​
    clearInterval(intervalRef.current);
    setStopTranscriptionSession(true);
    setIsRecording(false);
    setIsTranscribing(false);
  }
​
  function getDurationFormatted(millis: any) {
    const minutes = millis / 1000 / 60;
    const minutesDisplay = Math.floor(minutes);
    const seconds = Math.round((minutes - minutesDisplay) * 60);
    const secondDisplay = seconds < 10 ? `0${seconds}` : seconds;
    return `${minutesDisplay}:${secondDisplay}`;
  }
​
  function getRecordingLines() {
    return recordings.map((recordingLine: any, index) => {
      return (
        <View key={index} style={styles.row}>
          <Text style={styles.fill}>
            {" "}
            Recording {index + 1} - {recordingLine.duration}
          </Text>
          <Button
            style={styles.button}
            onPress={() => recordingLine.sound.replayAsync()}
            title="Play"
          ></Button>
        </View>
      );
    });
  }
​
  function transcribeInterim() {
    clearInterval(intervalRef.current);
    setIsRecording(false);
  }
​
  async function transcribeRecording() {
    const uri = recording.getURI();
    const filetype = uri.split(".").pop();
    const filename = uri.split("/").pop();
    setLoading(true);
    const formData: any = new FormData();
    formData.append("language", selectedLangRef.current);
    formData.append("model_size", modelOptions[selectedModelRef.current]);
    formData.append(
      "audio_data",
      {
        uri,
        type: `audio/${filetype}`,
        name: filename,
      },
      "temp_recording"
    );
    axios({
      url: "https://2c75-197-210-53-169.eu.ngrok.io/transcribe",
      method: "POST",
      data: formData,
      headers: {
        Accept: "application/json",
        "Content-Type": "multipart/form-data",
      },
    })
      .then(function (response) {
        console.log("response :", response);
        setTranscribedData((oldData: any) => [...oldData, response.data]);
        setLoading(false);
        setIsTranscribing(false);
        intervalRef.current = setInterval(
          transcribeInterim,
          transcribeTimeout * 1000
        );
      })
      .catch(function (error) {
        console.log("error : error");
      });
​
    if (!stopTranscriptionSessionRef.current) {
      setIsRecording(true);
    }
  }
  return (
    <View style={styles.root}>
      <View style={{ flex: 1 }}>
        <Text style={styles.title}>Speech to Text. </Text>
        <Text style={styles.title}>{message}</Text>
      </View>
      <View style={styles.settingsSection}>
        <Mode
          disabled={isTranscribing || isRecording}
          possibleLanguages={supportedLanguages}
          selectedLanguage={selectedLanguage}
          onLanguageChange={setSelectedLanguage}
          modelOptions={modelOptions}
          selectedModel={selectedModel}
          onModelChange={setSelectedModel}
          transcribeTimeout={transcribeTimeout}
          onTranscribeTimeoutChanged={handleTranscribeTimeoutChange}
        />
      </View>
      <View style={styles.buttonsSection}>
        {!isRecording && !isTranscribing && (
          <Button onPress={startRecording} title="Start recording" />
        )}
        {(isRecording || isTranscribing) && (
          <Button
            onPress={stopRecording}
            disabled={stopTranscriptionSessionRef.current}
            title="stop recording"
          />
        )}
        <Button title="Transcribe" onPress={() => transcribeRecording()} />
        {getRecordingLines()}
      </View>
​
      {isLoading !== false ? (
        <ActivityIndicator
          size="large"
          color="#00ff00"
          hidesWhenStopped={true}
          animating={true}
        />
      ) : (
        <Text></Text>
      )}
​
      <View style={styles.transcription}>
        <TranscribedOutput
          transcribedText={transcribedData}
          interimTranscribedText={interimTranscribedData}
        />
      </View>
    </View>
  );
};
​
const styles = StyleSheet.create({
  root: {
    display: "flex",
    flex: 1,
    alignItems: "center",
    textAlign: "center",
    flexDirection: "column",
  },
  title: {
    marginTop: 40,
    fontWeight: "400",
    fontSize: 30,
  },
  settingsSection: {
    flex: 1,
  },
  buttonsSection: {
    flex: 1,
    flexDirection: "row",
  },
  transcription: {
    flex: 1,
    flexDirection: "row",
  },
  recordIllustration: {
    width: 100,
  },
  row: {
    flexDirection: "row",
    alignItems: "center",
    justifyContent: "center",
  },
  fill: {
    flex: 1,
    margin: 16,
  },
  button: {
    margin: 16,
  },
});

Running the application

Run the React Native application with the following command:

yarn start

The project repository is publicly available.

Conclusion

In this article, we learned how to create speech-to-text functionality in a React Native application. I foresee Whisper changing how narration and dictation work in everyday life. The techniques covered in this article enable the creation of dictation apps.

I am excited to see the new and innovative ways developers extend Whisper, for example, using Whisper to carry out actions on our mobile and web devices, or using Whisper to improve the accessibility of our websites and applications.
