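Audio feature extraction
Resource downloads
Extracting features with openSMILE
Modifying the configuration
Following the section "C. Acoustic Feature Extraction: openSMILE" of the paper, change the configuration as follows:
Configuration file path: config/shared/FrameModeFunctionals.conf.inc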
frameMode = fixed
frameSize = 0.1 # sliding window size of 100 ms
frameStep = 0.033 # step of 33 ms, i.e. about 30 frames per second
frameCenterSpecial = left
Reference: openSMILE user manual, Chapter 2, (3) Using the default feature sets
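To make the two timing values concrete, here is a small sketch of how a fixed 100 ms window advanced in 33 ms steps tiles a recording. The function name `fixed_frame_starts` and the rule that only complete windows are counted are assumptions made for illustration; openSMILE's handling of the final partial window may differ.

```python
def fixed_frame_starts(duration_ms, frame_size_ms=100, frame_step_ms=33):
    """Return the start time (in ms) of every complete window that fits in the clip.

    Illustrative only: this counts full windows and ignores any trailing
    partial window, which is an assumption rather than documented behaviour.
    """
    if duration_ms < frame_size_ms:
        return []
    last_start = duration_ms - frame_size_ms
    return list(range(0, last_start + 1, frame_step_ms))


if __name__ == "__main__":
    starts = fixed_frame_starts(1000)  # a one-second clip
    print(len(starts), starts[:3], starts[-1])
```

Working in integer milliseconds avoids the rounding drift that repeated addition of 0.033 would introduce.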
Converting video to audio
::mp4_to_wav.bat
@echo off
:: Usage: mp4_to_wav.bat <directory>, e.g. D:\doc\data\Video_chunks\test\
pushd "%~1"
echo %cd%
setlocal EnableDelayedExpansion
set a=1
set file_ext=wav
set gen_path=.\%file_ext%\
if not exist %gen_path% ( md %gen_path%)
for /f "delims=" %%i in ('dir /b /s *.mp4') do (
    rem Comments inside a parenthesized block use rem, because :: can break the block.
    rem Earlier variants that number the outputs, kept for reference:
    rem ffmpeg -i "%%i" -ss 00:00:00 -t 00:00:03 -q:a 0 -map a ".\wav\!a!.wav"
    rem ffmpeg -i "%%i" -f wav -ar 16000 ".\wav\!a!.wav"
    rem set /a a+=1
    set new_name=%%~ni.%file_ext%
    rem -y overwrites an existing output file without prompting
    ffmpeg -y -i "%%i" -f wav -ar 16000 "%gen_path%!new_name!"
)
echo Batch conversion finished
pause
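For readers who prefer to drive the conversion from Python, the same loop can be sketched as below. It uses only the ffmpeg options already shown in the batch file, plus `-y` to overwrite without prompting. The function names `build_ffmpeg_command` and `convert_all` are hypothetical, and the sketch assumes `ffmpeg` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, sample_rate=16000):
    """Build the ffmpeg argument list that writes one video's audio as WAV."""
    video_path = Path(video_path)
    wav_path = Path(out_dir) / (video_path.stem + ".wav")
    return [
        "ffmpeg", "-y",            # -y: overwrite existing output without asking
        "-i", str(video_path),
        "-f", "wav",
        "-ar", str(sample_rate),   # resample to 16 kHz, as in the batch file
        str(wav_path),
    ]


def convert_all(root, run=subprocess.run):
    """Convert every .mp4 under root into root/wav; returns the commands issued.

    `run` is injectable so the loop can be exercised without calling ffmpeg.
    """
    root = Path(root)
    out_dir = root / "wav"
    out_dir.mkdir(exist_ok=True)
    commands = [build_ffmpeg_command(p, out_dir) for p in sorted(root.rglob("*.mp4"))]
    for command in commands:
        run(command, check=True)   # raises CalledProcessError if ffmpeg fails
    return commands
```

Passing the arguments as a list rather than a single string sidesteps quoting problems with file names that contain spaces.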
Batch extraction of audio features
::wav_to_csvfile.bat
@echo off
:: Usage: wav_to_csvfile.bat <directory>, e.g. D:\doc\data\Video_chunks\test\
pushd "%~1"
echo %cd%
set file_ext=csv
set gen_path=.\%file_ext%\
if not exist %gen_path% ( md %gen_path%)
setlocal EnableDelayedExpansion
for /f "delims=" %%i in ('dir /b /s *.wav') do (
    set new_name=%%~ni.%file_ext%
    rem Pipe "y" into the command to confirm any overwrite prompt
    echo y|SMILExtract_Release -C D:\develop_files\opensmile-2.3.0\config\IS13_ComParE.conf -I "%%i" -O "%gen_path%!new_name!"
)
echo Feature extraction finished
pause
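The extraction loop can also be written in Python, which makes it easier to record which files failed instead of scrolling through console output. This is a minimal sketch: it uses only the `-C`, `-I` and `-O` options that appear in the batch file, and the names `build_smile_command` and `extract_all` are hypothetical. It assumes `SMILExtract_Release` is on the PATH.

```python
import subprocess
from pathlib import Path

SMILEXTRACT = "SMILExtract_Release"  # executable name taken from the batch file


def build_smile_command(wav_path, out_dir, config_path):
    """Build the SMILExtract argument list for one WAV file."""
    wav_path = Path(wav_path)
    out_path = Path(out_dir) / (wav_path.stem + ".csv")
    return [
        SMILEXTRACT,
        "-C", str(config_path),   # feature-set configuration
        "-I", str(wav_path),      # input audio
        "-O", str(out_path),      # output feature file
    ]


def extract_all(root, config_path, run=subprocess.run):
    """Run SMILExtract on every .wav under root; return names of files that failed."""
    root = Path(root)
    out_dir = root / "csv"
    out_dir.mkdir(exist_ok=True)
    failed = []
    for wav in sorted(root.rglob("*.wav")):
        result = run(build_smile_command(wav, out_dir, config_path))
        if result.returncode != 0:
            failed.append(wav.name)
    return failed
```

Collecting failures rather than stopping at the first one lets a long batch finish and leaves a short list to retry.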
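Data preprocessing
The CSV files generated by openSMILE cannot be used directly. The first 6380 lines of each file must be deleted, along with the first and last columns.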
import csv

import numpy as np


# Strip the invalid information from a feature file; the result is a
# 133 x 6373 matrix that is written back to the same CSV file.
def update_csvfile(filename):
    invalid_num = 6380  # number of header lines to discard
    rows = read_csvfile(filename)
    if len(rows) <= invalid_num:
        raise Exception('error: csv_file=%s only has %d rows <= %d' % (filename, len(rows), invalid_num))
    rows = rows[invalid_num:]
    # Drop the first and the last column
    rows = np.delete(rows, [0, len(rows[0]) - 1], axis=1)
    writ_csvfile(filename, rows)
    return rows


def writ_csvfile(filename, rows):
    # newline='' stops the csv module from writing blank lines on Windows
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(rows)


def read_csvfile(filename):
    with open(filename, 'r') as f:
        all_lines = csv.reader(f)
        rows = [row for row in all_lines]
    return rows
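The fixed count of 6380 header lines ties the cleanup to one feature set. As an alternative sketch, assuming the output is ARFF-style text whose header ends with a line reading `@data` (an assumption about the file format, not something stated above), the start of the data can be found by searching for that marker. `split_arff_like` is a hypothetical helper name, and the sketch returns plain Python lists rather than a NumPy array.

```python
def split_arff_like(lines, marker="@data"):
    """Return the feature values from ARFF-style text as a list of float rows.

    Assumptions (not confirmed by the text above): the header ends at a line
    equal to `marker` (case-insensitive), data lines are comma-separated, and
    the first and last fields of each data line are not features.
    """
    for index, line in enumerate(lines):
        if line.strip().lower() == marker:
            # Everything after the marker is data; skip blank lines
            data_lines = [l for l in lines[index + 1:] if l.strip()]
            break
    else:
        raise ValueError("marker %r not found" % marker)

    matrix = []
    for line in data_lines:
        fields = line.strip().split(",")
        matrix.append([float(value) for value in fields[1:-1]])
    return matrix


if __name__ == "__main__":
    sample = [
        "@relation demo", "",
        "@attribute name string",
        "@attribute f1 numeric",
        "@attribute f2 numeric",
        "@attribute class numeric", "",
        "@data", "",
        "'clip',1.0,2.5,?",
        "'clip',3.0,4.0,?",
    ]
    print(split_arff_like(sample))
```

If the files do follow this layout, the same function works unchanged for feature sets with a different number of attributes.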