The recently open-sourced ChatTTS has been hugely popular, and the results are genuinely good; an ordinary personal computer can run it. I have been itching to build something with it and see whether it really lives up to the hype. 《我的阿勒泰》 has also been trending lately, so why not have ChatTTS read Li Juan's prose aloud and hear how it sounds?
Let's get started.
Reading 《我所能带给你们的事物》
《我所能带给你们的事物》 is the first essay in the collection 《我的阿勒泰》; it recounts bringing things back from Urumqi to the author's hometown. Find a 我的阿勒泰.epub online, read it with a script, and split the text into chunks.
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

# Load the EPUB file
book = epub.read_epub('a.epub')
book_text = ""
# Extract the text of the target chapter from the EPUB
for item in book.get_items():
    if item.get_type() == ebooklib.ITEM_DOCUMENT and item.id == "x_chapter002":
        # Parse the HTML content using BeautifulSoup
        soup = BeautifulSoup(item.get_content(), 'html.parser')
        # Extract and print the text
        book_text = soup.get_text()
        print(book_text)
For the sake of subtitles and speech quality, the text is not only split into paragraphs; each chunk is also capped in length:
import re

def split_text(text, max_length=50):
    # Split into paragraphs
    paragraphs = text.split('\n')
    result = []
    for paragraph in paragraphs:
        # Split on punctuation, keeping the delimiters
        sentences = re.split(r'([。!?;,,.!?;])', paragraph)
        chunk = ''
        for sentence in sentences:
            if sentence:
                # Check whether adding this piece would exceed the maximum length
                if len(chunk) + len(sentence) <= max_length:
                    chunk += sentence
                else:
                    result.append(chunk.strip())
                    chunk = sentence
        if chunk:
            result.append(chunk.strip())
    return result
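As a quick sanity check, here is how the splitter behaves on a short sample. This is a standalone sketch that restates the function so it runs on its own; the sample text and the tiny `max_length=6` are only for illustration:

```python
import re

def split_text(text, max_length=50):
    # Same chunking logic as above: split each paragraph on punctuation
    # (keeping the delimiters), then greedily pack pieces into chunks
    # of at most max_length characters
    result = []
    for paragraph in text.split('\n'):
        sentences = re.split(r'([。!?;,,.!?;])', paragraph)
        chunk = ''
        for sentence in sentences:
            if sentence:
                if len(chunk) + len(sentence) <= max_length:
                    chunk += sentence
                else:
                    result.append(chunk.strip())
                    chunk = sentence
        if chunk:
            result.append(chunk.strip())
    return result

# A tiny chunk limit makes the packing behavior easy to see
print(split_text("你好,世界。这是测试。", max_length=6))
# → ['你好,世界。', '这是测试。']
```

Because the regex uses a capture group, the punctuation marks survive the split and stay attached to the chunk they end, which matters for natural pauses in the synthesized speech.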
Generating speech chunk by chunk
For each chunk, first run the text refinement (prosody/tone inference), then synthesize speech with a fixed female voice:
import torch
import numpy as np
import ChatTTS
from scipy.io.wavfile import write

chat = ChatTTS.Chat()
chat.load_models(compile=False)  # Set to True for better performance

chunks = split_text(book_text)
for i, chunk in enumerate(chunks):
    print(chunk)
    # Define the text input for inference (batching is supported)
    text = chunk
    # Fixed seed so every chunk samples the same (female) speaker embedding
    torch.manual_seed(2)
    rand_spk = chat.sample_random_speaker()
    params_infer_code = {
        'spk_emb': rand_spk,
        'temperature': 0.3,
        'top_P': 0.7,
        'top_K': 20,
    }
    params_refine_text = {'prompt': '[oral_2][laugh_0][break_6]'}
    torch.manual_seed(42)
    # Step 1: refine the text only (inserts oral/pause control tokens)
    text = chat.infer(text,
                      skip_refine_text=False,
                      refine_text_only=True,
                      params_refine_text=params_refine_text,
                      params_infer_code=params_infer_code)
    # Step 2: synthesize audio from the refined text
    wavs = chat.infer(text,
                      skip_refine_text=True,
                      params_refine_text=params_refine_text,
                      params_infer_code=params_infer_code)
    # wavs[0] is a numpy array sampled at 24000 Hz
    audio_data = np.array(wavs[0]).flatten()
    sample_rate = 24000
    # Save the audio data to a WAV file
    output_file = "wavs\\" + str(i) + '-output_audio.wav'
    write(output_file, sample_rate, audio_data)
    print(f"Audio saved to {output_file}")
The WAV files land in the wavs directory. Next, concatenate every WAV in wavs into a single MP3:
import os
from pydub import AudioSegment

def merge_wav_to_mp3(input_folder, output_file):
    # Collect the paths of all WAV files
    wav_files = [os.path.join(input_folder, f) for f in os.listdir(input_folder) if f.endswith('.wav')]
    # Make sure there is at least one WAV file
    if not wav_files:
        print("No WAV files found.")
        return
    # Sort by modification time, i.e. generation order
    wav_files.sort(key=os.path.getmtime)
    # Start from an empty audio segment
    combined = AudioSegment.empty()
    # Load and append every WAV file in order
    for wav_file in wav_files:
        print(wav_file)
        audio = AudioSegment.from_wav(wav_file)
        combined += audio
    # Export the combined audio as an MP3 file
    combined.export(output_file, format='mp3')
    print(f"Combined MP3 saved to {output_file}")

merge_wav_to_mp3("wavs", "combined_audio.mp3")
Finally, find a picture and combine it with the MP3 to produce an MP4:
from moviepy.editor import *

# Load the audio file
audio = AudioFileClip("combined_audio.mp3")
# Load the image and give it the same duration as the audio
image = ImageClip("a.webp").set_duration(audio.duration)
# Attach the audio track to the image clip
video = image.set_audio(audio)
# Export the video file
video.write_videofile("a.mp4", codec="libx264", audio_codec="aac", fps=24)
WeChat Official Account: 每日AI新工具