pydub usage notes

Official site

Official documentation

Reading files

Reading most formats requires ffmpeg or libav; only wav and raw are handled natively.

from pydub import AudioSegment
# generic method (a classmethod, called on the class itself)
AudioSegment.from_file(file, format=None)
'''
format | example: "aif" — Format of the input file. Supports "wav" and "raw" natively, requires ffmpeg for all other formats. "raw" files require 3 additional
keyword arguments, sample_width, frame_rate, and channels, denoted below with: raw only. This extra info is required because raw audio files do not have headers to carry this info in the file itself the way wav files do.

sample_width | example: 2 raw only — Use 1 for 8-bit audio, 2 for 16-bit (CD quality), and 4 for 32-bit. It's the number of bytes per sample.

channels | example: 1 raw only — 1 for mono, 2 for stereo.

frame_rate | example: 44100 raw only — Also known as sample rate; common values are 44100 (44.1 kHz - CD audio) and 48000 (48 kHz - DVD audio).
'''

# wav and raw don't use ffmpeg
wav_audio = AudioSegment.from_file("/path/to/sound.wav", format="wav")
raw_audio = AudioSegment.from_file("/path/to/sound.raw", format="raw",
                                   frame_rate=44100, channels=2, sample_width=2)

# all other formats use ffmpeg
mp3_audio = AudioSegment.from_file("/path/to/sound.mp3", format="mp3")

# use a file you've already opened (advanced …ish)
with open("/path/to/sound.wav", "rb") as wav_file:
    audio_segment = AudioSegment.from_file(wav_file, format="wav")

# also supports the os.PathLike protocol for python >= 3.6
from pathlib import Path
wav_path = Path("path/to/sound.wav")
wav_audio = AudioSegment.from_file(wav_path)

# construct directly from raw audio data
sound = AudioSegment(
    # raw audio data (bytes)
    data=b'…',

    # 2 byte (16 bit) samples
    sample_width=2,

    # 44.1 kHz frame rate
    frame_rate=44100,

    # stereo
    channels=2
)
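The three raw-only parameters above fully determine how big a buffer of raw audio must be, so you can sanity-check a `data` buffer with plain arithmetic. A stdlib-only sketch (the variable names are illustrative, matching the constructor arguments above):

```python
# One second of 16-bit stereo audio at 44.1 kHz, matching the
# AudioSegment(...) constructor arguments above.
sample_width = 2      # bytes per sample (16-bit)
channels = 2          # stereo
frame_rate = 44100    # frames (one sample per channel) per second

frame_width = sample_width * channels        # bytes per frame
bytes_per_second = frame_width * frame_rate  # size of 1 s of audio

data = b"\x00" * bytes_per_second            # 1 s of digital silence
duration_ms = len(data) * 1000 // bytes_per_second

print(frame_width, bytes_per_second, duration_ms)  # 4 176400 1000
```

If `len(data)` is not a multiple of `frame_width`, the buffer cannot be split into whole frames, which is exactly why pydub insists on all three keyword arguments for raw input.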

# format-specific helpers
song = AudioSegment.from_wav("never_gonna_give_you_up.wav")
song = AudioSegment.from_mp3("never_gonna_give_you_up.mp3")
ogg_version = AudioSegment.from_ogg("never_gonna_give_you_up.ogg")
flv_version = AudioSegment.from_flv("never_gonna_give_you_up.flv")

Saving files

Exporting an AudioSegment object to a file returns a file handle for the output file; you normally don't need to do anything with it.

from pydub import AudioSegment
sound = AudioSegment.from_file("/path/to/sound.wav", format="wav")

sound.export(out_f=None, format='mp3', codec=None, bitrate=None, parameters=None, tags=None, id3v2_version='4', cover=None)
'''
out_f: path (or file-like object) for the output audio file
    
format | example: "aif" | default: "mp3" Format of the output file. Supports "wav" and "raw" natively, requires ffmpeg for all other formats.
        
codec | example: "libvorbis" For formats that may contain content encoded with different codecs, you can specify the codec you'd like the encoder to use. For example, the "ogg" format is often used with the "libvorbis" codec. (requires ffmpeg)
    
bitrate | example: "128k" For compressed formats, you can pass the bitrate you'd like the encoder to use (requires ffmpeg). Each codec accepts different bitrate arguments so take a look at the ffmpeg documentation for details (bitrate usually shown as -b, -ba or -a:b).

tags | example: {"album": "1989", "artist": "Taylor Swift"} Allows you to supply media info tags for the encoder (requires ffmpeg). Not all formats can receive tags (mp3 can).

parameters | example: ["-ac", "2"] Pass additional command line parameters to the ffmpeg call. These are added to the end of the call (in the output file section).

id3v2_version | example: "3" | default: "4" Set the ID3v2 version ffmpeg uses when adding tags to the output file. If you want Windows Explorer to display the tags, use "3" here.

cover | example: "/path/to/imgfile.png" Allows you to supply a cover image (path to the image file). Currently, only MP3 files allow this keyword argument. Cover image must be a jpeg, png, bmp, or tiff file.
'''
# simple export
file_handle = sound.export("/path/to/output.mp3", format="mp3")
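pydub can write "wav" without ffmpeg because the format is just a RIFF header followed by raw frames. The stdlib wave module can illustrate that same write path (a self-contained sketch of the idea, not pydub's actual implementation):

```python
import io
import wave

# Write 1 s of 16-bit stereo silence at 44.1 kHz into an in-memory wav.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)      # stereo
    w.setsampwidth(2)      # 2 bytes = 16-bit samples
    w.setframerate(44100)  # 44.1 kHz
    w.writeframes(b"\x00" * 4 * 44100)  # 4 bytes per frame, 44100 frames

# Read the header back: the parameters round-trip through the file,
# which is why wav input needs no extra keyword arguments.
buf.seek(0)
with wave.open(buf, "rb") as w:
    print(w.getnchannels(), w.getsampwidth(), w.getnframes())  # 2 2 44100
```

Compressed formats (mp3, ogg, ...) have no such simple header-plus-frames layout, so for those pydub shells out to ffmpeg, passing along the codec, bitrate, tags, and extra parameters described above.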