The recently open-sourced ChatTTS has been hugely popular, and the results are genuinely good; an ordinary personal computer can run it. I have been itching to build something with it and see whether it really lives up to the hype. 《我的阿勒泰》 has also been trending lately, so why not have ChatTTS read Li Juan's prose aloud and hear how it sounds?
Let's get started.
Reading 《我所能带给你们的事物》
《我所能带给你们的事物》 is the first essay in the collection 《我的阿勒泰》; it recounts bringing things back from Urumqi to the author's hometown. Find a 我的阿勒泰.epub online, read it with a script, and split the text into chunks.
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

# Load the EPUB file
book = epub.read_epub('a.epub')
book_text = ""
# Extract the text of the target chapter from the EPUB
for item in book.get_items():
    if item.get_type() == ebooklib.ITEM_DOCUMENT and item.id == "x_chapter002":
        # Parse the HTML content using BeautifulSoup
        soup = BeautifulSoup(item.get_content(), 'html.parser')
        # Extract and print the text
        book_text = soup.get_text()
        print(book_text)
For the sake of subtitles and speech quality, the text is not only split into paragraphs; each chunk is also capped in length:
import re

def split_text(text, max_length=50):
    # Split into paragraphs
    paragraphs = text.split('\n')
    result = []
    for paragraph in paragraphs:
        # Split on punctuation, keeping the delimiters
        sentences = re.split(r'([。!?;,,.!?;])', paragraph)
        chunk = ''
        for sentence in sentences:
            if sentence:
                # Check whether adding this piece would exceed the maximum length
                if len(chunk) + len(sentence) <= max_length:
                    chunk += sentence
                else:
                    result.append(chunk.strip())
                    chunk = sentence
        if chunk:
            result.append(chunk.strip())
    return result
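As a quick sanity check, here is how the splitter behaves on a short sample. This is a standalone sketch that restates the function so it runs on its own; the sample text and the tiny `max_length=6` are only for illustration:

```python
import re

def split_text(text, max_length=50):
    # Same chunking logic as above: split each paragraph on punctuation
    # (keeping the delimiters), then greedily pack pieces into chunks
    # of at most max_length characters
    result = []
    for paragraph in text.split('\n'):
        sentences = re.split(r'([。!?;,,.!?;])', paragraph)
        chunk = ''
        for sentence in sentences:
            if sentence:
                if len(chunk) + len(sentence) <= max_length:
                    chunk += sentence
                else:
                    result.append(chunk.strip())
                    chunk = sentence
        if chunk:
            result.append(chunk.strip())
    return result

# A tiny chunk limit makes the packing behavior easy to see
print(split_text("你好,世界。这是测试。", max_length=6))
# → ['你好,世界。', '这是测试。']
```

Because the regex uses a capture group, the punctuation marks survive the split and stay attached to the chunk they end, which matters for natural pauses in the synthesized speech.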
Generating speech chunk by chunk
For each chunk, first run the text refinement (prosody/tone inference), then synthesize speech with a fixed female voice:
import torch
import numpy as np
import ChatTTS
from scipy.io.wavfile import write

chat = ChatTTS.Chat()
chat.load_models(compile=False)  # Set to True for better performance

chunks = split_text(book_text)
for i, chunk in enumerate(chunks):
    print(chunk)
    # Define the text input for inference (batching is supported)
    text = chunk
    # Fixed seed so every chunk samples the same (female) speaker embedding
    torch.manual_seed(2)
    rand_spk = chat.sample_random_speaker()
    params_infer_code = {
        'spk_emb': rand_spk,
        'temperature': 0.3,
        'top_P': 0.7,
        'top_K': 20,
    }
    params_refine_text = {'prompt': '[oral_2][laugh_0][break_6]'}
    torch.manual_seed(42)
    # Step 1: refine the text only (inserts oral/pause control tokens)
    text = chat.infer(text,
                      skip_refine_text=False,
                      refine_text_only=True,
                      params_refine_text=params_refine_text,
                      params_infer_code=params_infer_code)
    # Step 2: synthesize audio from the refined text
    wavs = chat.infer(text,
                      skip_refine_text=True,
                      params_refine_text=params_refine_text,
                      params_infer_code=params_infer_code)
    # wavs[0] is a numpy array sampled at 24000 Hz
    audio_data = np.array(wavs[0]).flatten()
    sample_rate = 24000
    # Save the audio data to a WAV file
    output_file = "wavs\\" + str(i) + '-output_audio.wav'
    write(output_file, sample_rate, audio_data)
    print(f"Audio saved to {output_file}")
The WAV files land in the wavs directory. Next, concatenate every WAV in wavs into a single MP3:
import os
from pydub import AudioSegment

def merge_wav_to_mp3(input_folder, output_file):
    # Collect the paths of all WAV files
    wav_files = [os.path.join(input_folder, f) for f in os.listdir(input_folder) if f.endswith('.wav')]
    # Make sure there is at least one WAV file
    if not wav_files:
        print("No WAV files found.")
        return
    # Sort by modification time, i.e. generation order
    wav_files.sort(key=os.path.getmtime)
    # Start from an empty audio segment
    combined = AudioSegment.empty()
    # Load and append every WAV file in order
    for wav_file in wav_files:
        print(wav_file)
        audio = AudioSegment.from_wav(wav_file)
        combined += audio
    # Export the combined audio as an MP3 file
    combined.export(output_file, format='mp3')
    print(f"Combined MP3 saved to {output_file}")

merge_wav_to_mp3("wavs", "combined_audio.mp3")
Finally, find a picture and combine it with the MP3 to produce an MP4:
from moviepy.editor import *

# Load the audio file
audio = AudioFileClip("combined_audio.mp3")
# Load the image and give it the same duration as the audio
image = ImageClip("a.webp").set_duration(audio.duration)
# Attach the audio track to the image clip
video = image.set_audio(audio)
# Export the video file
video.write_videofile("a.mp4", codec="libx264", audio_codec="aac", fps=24)
WeChat Official Account: 每日AI新工具