A Must-Read for Parents: Extract the English Text from RAZ Picture Books in 1 Hour!

Article link: https://blog.csdn.net/wxaiway/article/details/145219532

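Are you still struggling with the unfamiliar words in your child's English picture books? Still spending hours typing out the text by hand just so you can translate it? Today I'm sharing a really practical trick that makes extracting the English text painless. In my case, about 1 hour was enough to extract the English text of every RAZ Level J book, which is great news for parents!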

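Once you have the English text, you can use a large language model to translate and summarize it. See "Create an English Close-Reading Agent in 10 Minutes" for Chinese-English translation, extracting unfamiliar vocabulary, and more.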

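Pain points of the traditional approach

At first, we typed out the English text of each picture book by hand, line by line, and then translated it. This is slow and error-prone.

Later, we tried extracting the text from photos. That was faster than typing, but still not efficient enough. Only now have I found a friendlier solution: audio-to-text!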

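The magic of audio-to-text

Modern speech recognition models are highly accurate, so we can extract a picture book's English text simply by transcribing its audio. I found Alibaba Cloud's Paraformer real-time speech recognition API, which comes with a free quota. So I got to work, and with this API I finished extracting the English text of all RAZ Level J books in 1 hour.

The full solution

Below I explain in detail how to build a tool that batch-processes MP3 files and runs speech recognition on them. The tool reads every MP3 file in a given directory, transcribes it, and saves the results to another directory.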

Installing dependencies

1. Install the dependency libraries

Install the required Python libraries:

pip install dashscope pydub ffmpeg-python -i https://pypi.tuna.tsinghua.edu.cn/simple

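Note: pydub depends on ffmpeg, so make sure the ffmpeg executable itself is installed on your system and available on your PATH. The ffmpeg-python package is only a Python wrapper and does not provide the ffmpeg binary.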
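
Before running anything, it can help to confirm that the tools are actually visible from Python. The following is a minimal sketch, not part of the original script: the helper name find_missing_tools is hypothetical, and checking for ffprobe as well as ffmpeg is an assumption based on pydub typically using both.

```python
import shutil

def find_missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of the given executables that are not found on PATH."""
    # shutil.which returns None when the executable cannot be located
    return [name for name in tools if shutil.which(name) is None]

if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are available.")
```

If anything is reported as missing, install ffmpeg with your system's package manager and re-run the check.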

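2. Configure the API-KEY

First register an account on Alibaba Cloud Bailian and create an API-KEY:

https://bailian.console.aliyun.com/

Set DASHSCOPE_API_KEY in your environment variables. For example, in a terminal: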

export DASHSCOPE_API_KEY=your_api_key_here
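
A missing key otherwise only surfaces as an error on the first API call, so a small fail-fast check can save some confusion. This is a sketch under my own assumptions: require_api_key is a hypothetical helper, not part of the dashscope SDK or of the script below.

```python
import os

def require_api_key(env=None, name="DASHSCOPE_API_KEY"):
    """Return the API key from `env` (default: os.environ) or raise if it is missing."""
    env = os.environ if env is None else env
    # Treat an unset variable and a blank value the same way
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script.")
    return key
```

Calling require_api_key() at the start of a script stops it immediately with a clear message when the variable was not exported in the current shell.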

Code implementation

Here is the complete Python script, AudioBatchTranscriber.py:

import os
import dashscope
from dashscope.audio.asr import Recognition, RecognitionCallback, RecognitionResult
import json
from pydub import AudioSegment
from concurrent.futures import ThreadPoolExecutor
import argparse

# Callback class that logs recognition events
class Callback(RecognitionCallback):
    def on_complete(self) -> None:
        print("Recognition completed successfully.")

    def on_error(self, result: RecognitionResult) -> None:
        print(f"Error during recognition: {result.message}")

    def on_event(self, result: RecognitionResult) -> None:
        print(f"Received transcription event: {result.message}")

# Read the API key from the environment variable
dashscope.api_key = os.getenv('DASHSCOPE_API_KEY')

def convert_mp3_to_wav(mp3_file, wav_file):
    audio = AudioSegment.from_file(mp3_file, format='mp3')
    # Make sure the output WAV meets the requirements: mono, 16 kHz sample rate, 16-bit samples
    audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
    audio.export(wav_file, format='wav')

def transcribe_audio(file_path, output_dir):
    try:
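        # Make sure the file exists
        if not os.path.isfile(file_path):
            print(f"Error: file does not exist: {file_path}")
            return None

        # Convert the MP3 file to a suitable WAV format
        if file_path.lower().endswith('.mp3'):
            # splitext swaps only the extension, even if '.mp3' appears elsewhere in the path
            wav_file = os.path.splitext(file_path)[0] + '.wav'
            convert_mp3_to_wav(file_path, wav_file)
            file_path = wav_file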

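        # Create the callback object
        callback = Callback()

        # Initialize the speech recognition object
        recognition = Recognition(
            model='paraformer-realtime-v2',
            format='wav',  # the converted file is a WAV container, not headerless PCM
            sample_rate=16000,
            language_hints=['en'],
            callback=callback
        )

        # Call the speech-to-text service
        result = recognition.call(file_path)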

        if result.status_code == 200:
            transcription_result = result.output  # use the .output attribute directly

            # Base name of the audio file (without extension)
            base_name = os.path.splitext(os.path.basename(file_path))[0]

            # Build the output file paths
            json_output_file_path = os.path.join(output_dir, f"{base_name}.json")
            txt_output_file_path = os.path.join(output_dir, f"{base_name}.txt")

            # Write the JSON result to a file
            with open(json_output_file_path, 'w', encoding='utf-8') as f:
                json.dump(transcription_result, f, indent=4, ensure_ascii=False)

            # Concatenate all text fragments
            full_text = ""
            for sentence in transcription_result.get('sentence', []):
                full_text += sentence.get('text', '') + " "

            # Collapse repeated whitespace
            cleaned_full_text = ' '.join(full_text.split()).strip()

            # Write the full text to a file
            with open(txt_output_file_path, 'w', encoding='utf-8') as f:
                f.write(cleaned_full_text)

            print(f"Transcription results saved to: {json_output_file_path}")
            print(f"Transcribed text saved to: {txt_output_file_path}")
            return cleaned_full_text
        else:
            print(f"Error: {result.status_code} - {result.message}")
            return None
    except Exception as e:
        print(f"Exception processing {file_path}: {e}")
        return None

def process_all_mp3_files(input_dir, output_dir):
    # Make sure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Collect all MP3 files (case-insensitive extension match)
    mp3_files = [os.path.join(input_dir, f) for f in os.listdir(input_dir) if f.lower().endswith('.mp3')]

    # Process the files concurrently with a ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(transcribe_audio, file_path, output_dir): file_path for file_path in mp3_files}
        
        for future in futures:
            file_path = futures[future]
            try:
                result = future.result()
                if result:
                    print(f"Processed {file_path}: {result[:50]}...")  # print the first 50 characters as a preview
            except Exception as e:
                print(f"Error processing {file_path}: {e}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Process MP3 files and transcribe them using DashScope.")
    parser.add_argument("--input", type=str, required=True, help="Directory containing MP3 files")
    parser.add_argument("--output", type=str, required=True, help="Directory to save transcription results")

    args = parser.parse_args()

    input_directory = os.path.abspath(args.input)
    output_directory = os.path.abspath(args.output)

    process_all_mp3_files(input_directory, output_directory)
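
The text-assembly step inside transcribe_audio can be tried in isolation, without calling the API. The sketch below assumes, as the script does, that the result is a dict with a 'sentence' list whose items carry a 'text' field. The function name join_sentence_text and the sample data are made up for illustration.

```python
def join_sentence_text(output):
    """Join the 'text' of every sentence and collapse repeated whitespace."""
    # Assumption: `output` looks like {"sentence": [{"text": "..."}, ...]}
    parts = (sentence.get("text", "") for sentence in output.get("sentence", []))
    return " ".join(" ".join(parts).split())

if __name__ == "__main__":
    # Hypothetical sample, not real API output
    sample = {"sentence": [{"text": "The cat  sat. "}, {"text": "It slept."}, {}]}
    print(join_sentence_text(sample))
```

Splitting on whitespace and re-joining with single spaces is what removes the doubled spaces and the trailing space that plain concatenation would leave behind.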

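How to run it

1. Save the script

Save the code above as AudioBatchTranscriber.py.

2. Prepare the input directory

Make sure the J-mp3 directory contains the MP3 files you want to process. For example: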

J-mp3/
    81-WhenBadThingsHappen.mp3
    another-file.mp3
    ...

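3. Run the script

Open a terminal or command prompt, navigate to the directory where AudioBatchTranscriber.py is saved, and run the following command:

python AudioBatchTranscriber.py --input J-mp3 --output transcribed

This reads all MP3 files in the J-mp3 directory, runs speech recognition on them, and saves the results to the transcribed directory. Note that the intermediate WAV files are written next to the source MP3 files in J-mp3.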

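Output

After the script finishes, the transcribed directory contains two files for each MP3 file, named after it:

{base_name}.json: the complete JSON result.
{base_name}.txt: the cleaned-up full text.

For example: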

transcribed/
    81-WhenBadThingsHappen.json
    81-WhenBadThingsHappen.txt
    another-file.json
    another-file.txt
    ...
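
Because failures are only printed to the console, it can be handy to check afterwards which MP3 files did not end up with a .txt file. The following is a minimal sketch based on the naming scheme described above. The helper find_untranscribed is hypothetical and not part of the original script.

```python
import os

def find_untranscribed(input_dir, output_dir):
    """Return the sorted names of MP3 files in input_dir that have no matching .txt in output_dir."""
    missing = []
    for name in os.listdir(input_dir):
        base, ext = os.path.splitext(name)
        if ext.lower() != ".mp3":
            continue  # skip intermediate WAV files and anything else
        if not os.path.isfile(os.path.join(output_dir, base + ".txt")):
            missing.append(name)
    return sorted(missing)
```

For instance, find_untranscribed("J-mp3", "transcribed") returns an empty list once every file has been transcribed; any names it does return can be retried.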

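Summary

With the steps above, you can build a batch audio transcription tool that processes files concurrently with multiple threads, transcribing audio files and saving the results efficiently.

I hope this tutorial helps and makes your child's English learning journey a little easier and more enjoyable!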