Introduction to the IEMOCAP Corpus
The emotions in this corpus are anger, happiness, excitement, sadness, frustration, fear, surprise, other, and neutral state.
They are abbreviated as ang, hap, exc, sad, fru, fea, sur, oth, and neu, respectively.
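How to View the Label Data
Go to the folder IEMOCAP\Session1\dialog\EmoEvaluation, where you can see the following files:
Here we open Ses01F_impro01.txt as an example:
This txt file shows that the dialog file Ses01F_impro01.wav in the dialog/wav folder was cut into the utterance files listed above, each annotated with its emotion label. For example, Ses01F_impro01_F000 refers to the file Ses01F_impro01_F000.wav inside sentences\wav\Ses01F_impro01.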
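The mapping from an annotation line to its utterance file can be sketched as follows. This is a minimal illustration, assuming each utterance line is tab-separated with the utterance ID in the second field and the label in the third, which is what the extraction code later in this post relies on. The helper name `parse_eval_line` and the numbers in the sample line are made up for illustration.

```python
import os


def parse_eval_line(line, iemocap_dir, session):
    """Split one utterance line into (wav_path, label)."""
    fields = line.strip().split("\t")
    utt_id, label = fields[1], fields[2]
    # The dialog name is the utterance ID without its last "_XNNN" part,
    # e.g. Ses01F_impro01_F000 -> Ses01F_impro01
    dialog = utt_id.rsplit("_", 1)[0]
    wav_path = os.path.join(
        iemocap_dir, session, "sentences", "wav", dialog, utt_id + ".wav"
    )
    return wav_path, label


# Illustrative line only; the time span and scores are placeholder values
sample = "[6.2901 - 8.2357]\tSes01F_impro01_F000\tneu\t[2.5000, 2.5000, 2.5000]"
wav_path, label = parse_eval_line(sample, "IEMOCAP", "Session1")
print(wav_path, label)
```

Using `os.path.join` here keeps the path separator correct on both Windows and Linux.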
How to Obtain IEMOCAP?
Baidu Netdisk: , extraction code: n0c0
Official application: I forgot the exact URL, so please search for it yourselves, haha.
Extracting IEMOCAP Emotion Labels with Python
import os

import pandas as pd


def get_txt_files(file_dir):
    """Recursively collect the paths of all .txt files under file_dir."""
    txt_files = []
    for root, dirs, files in os.walk(file_dir):
        for file in files:
            if os.path.splitext(file)[1] == '.txt':
                txt_files.append(os.path.join(root, file))
    return txt_files
def get_IEMOCAP_filename(IEMOCAP_dir):
    """Return a list of (wav_path, encoded_label) tuples for all five sessions."""
    all_data = []
    Sessions = ['Session1', 'Session2', 'Session3', 'Session4', 'Session5']
    target_dir = "/dialog/EmoEvaluation/"
    wav_dir = "/sentences/wav/"
    for Session in Sessions:
        file_dir = IEMOCAP_dir + Session + target_dir
        # Collect all .txt files under this Session
        txt_files = get_txt_files(file_dir)
        # Read each .txt file and extract its emotion labels
        for txt_file in txt_files:
            # Dialog name, e.g. Ses01F_impro01; it is also the wav subfolder name
            last_folder = os.path.splitext(os.path.basename(txt_file))[0]
            # Load into pandas for convenience, one row per line of text
            # (delimiter="\n" depends on the pandas version; see the update note at the end)
            data = pd.read_csv(txt_file, delimiter="\n", skiprows=1, names=['a'])
            data['a'] = data['a'].astype(str)
            # Keep only the lines containing '[': these hold the sliced
            # utterance file name and its emotion label
            filter_data = [x for x in data['a'] if '[' in x]
            print(txt_file, len(filter_data))
            for file in filter_data:
                values = file.split("\t")
                filename = values[1] + ".wav"
                filename = IEMOCAP_dir + Session + wav_dir + last_folder + "/" + filename
                target = values[2]
                # Encode the label as an integer
                target = get_IEMOCAP_4target(target)
                # Drop samples that do not qualify, mainly oth (3) and dis (2)
                if target is None:
                    continue
                result = (filename, target)
                all_data.append(result)
    return all_data
def get_IEMOCAP_9target(target):
    # 'ang': 0, anger
    # 'hap': 1, happiness
    # 'exc': 2, excitement
    # 'sad': 3, sadness
    # 'fru': 4, frustration
    # 'fea': 5, fear
    # 'sur': 6, surprise
    # 'neu': 7, neutral state
    # 'xxx': 8, other
    if target == "ang":
        target = 0
    elif target == "hap":
        target = 1
    elif target == "exc":
        target = 2
    elif target == "sad":
        target = 3
    elif target == "fru":
        target = 4
    elif target == "fea":
        target = 5
    elif target == "sur":
        target = 6
    elif target == "neu":
        target = 7
    elif target == "xxx":
        target = 8
    else:
        # Unknown label: report it and skip the sample
        print(target)
        target = None
        print("Label extraction error; this sample is skipped by default")
    return target
def get_IEMOCAP_4target(target):
    # 'ang': 0, anger
    # 'hap': 1, happiness
    # 'sad': 2, sadness
    # 'neu': 3, neutral state
    if target == "ang":
        target = 0
    elif target == "hap":
        target = 1
    elif target == "sad":
        target = 2
    elif target == "neu":
        target = 3
    else:
        # Any label outside the four classes: report it and skip the sample
        print(target)
        target = None
        print("Label extraction error; this sample is skipped by default")
    return target
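As an alternative to the if/elif chains, the same encoding can be written as a dictionary lookup. This is only a sketch: the names `LABELS_4`, `LABELS_9`, and `encode_label` are hypothetical and not part of the original code, and unlike the functions in this post the lookup does not print anything for unknown labels.

```python
# Label tables mirroring the two functions in this post
LABELS_4 = {"ang": 0, "hap": 1, "sad": 2, "neu": 3}
LABELS_9 = {"ang": 0, "hap": 1, "exc": 2, "sad": 3, "fru": 4,
            "fea": 5, "sur": 6, "neu": 7, "xxx": 8}


def encode_label(target, mapping=LABELS_4):
    """Return the integer code for target, or None if it is not in mapping."""
    return mapping.get(target)


codes_4 = [encode_label(t) for t in ["ang", "neu", "fru"]]
codes_9 = [encode_label(t, LABELS_9) for t in ["exc", "xxx", "dis"]]
print(codes_4, codes_9)
```

A dictionary makes it easy to add or merge classes later by editing one table rather than a branch per label.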
Getting the Audio File Paths and Their Corresponding Labels
wav_path = "./data/IEMOCAP/"
spec_path = "./features/ICMOCAP_Spec(4kinds)/"
filenames = get_IEMOCAP_filename(wav_path)
# Split the (path, label) pairs into two parallel tuples
filenames, targets = zip(*filenames)
for i, filename in enumerate(filenames):
    label = targets[i]  # label is the encoded emotion, filename is the wav path
    # Continue your work...
From here you can carry on with your own processing.
This is the structure of the resulting filenames:
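Before moving on to feature extraction, it can help to check how many samples each class has. The sketch below uses a small hypothetical list of (path, label) pairs in place of the real output of `get_IEMOCAP_filename`; the file names and the `names` table are placeholders.

```python
from collections import Counter

# Hypothetical stand-in for the list returned by get_IEMOCAP_filename
pairs = [("a.wav", 0), ("b.wav", 3), ("c.wav", 3), ("d.wav", 1)]

# Same unpacking as in the post: two parallel tuples
filenames, targets = zip(*pairs)

# Count the samples per encoded label
counts = Counter(targets)
names = {0: "ang", 1: "hap", 2: "sad", 3: "neu"}
for code, n in sorted(counts.items()):
    print(names[code], n)
```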
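Summary
The corpus contains eleven emotion labels in total, but papers generally compare recognition over only four emotions, which is what the get_IEMOCAP_4target function implements. To extend the set of emotion categories, you can use get_IEMOCAP_9target to obtain the nine emotion labels in the data. During actual label extraction, however, two additional emotion labels also turn up: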
Unexpected emotion label | Count |
---|---|
oth | 3 |
dis | 2 |
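One way to find such stray labels is to tally every raw label that falls outside the expected set. The sketch below assumes the same tab-separated line layout used by the extraction code in this post; the function name `count_unknown_labels` is hypothetical and the sample lines contain placeholder values, not real corpus data.

```python
from collections import Counter

# Labels handled by get_IEMOCAP_9target
KNOWN = {"ang", "hap", "exc", "sad", "fru", "fea", "sur", "neu", "xxx"}


def count_unknown_labels(lines):
    """Count raw labels that are not in KNOWN."""
    counts = Counter()
    for line in lines:
        if "[" not in line:
            continue  # not an utterance line
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # e.g. a header line without tab-separated fields
        if fields[2] not in KNOWN:
            counts[fields[2]] += 1
    return counts


# Placeholder lines for illustration only
lines = [
    "[1.0 - 2.0]\tSes01F_impro01_F000\tneu\t[2.5, 2.5, 2.5]",
    "[2.0 - 3.0]\tSes01F_impro01_F001\toth\t[2.5, 2.5, 2.5]",
    "[3.0 - 4.0]\tSes01F_impro01_F002\tdis\t[2.5, 2.5, 2.5]",
]
unknown = count_unknown_labels(lines)
print(dict(unknown))
```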
References
Below are the recognition results from a paper that uses IEMOCAP for emotion recognition:
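Update, April 28, 2022 (pandas version causes a runtime error)
In later testing, a pandas version that is too new also caused an error when parsing the file names. I used pandas 0.20.3. If you already have a different version of pandas installed, you can run the following command to switch versions: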
pip install pandas==0.20.3
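If downgrading pandas is not convenient, the annotation files can also be read with plain Python file I/O, which avoids the version dependency. This is a sketch of an alternative, not the method used in this post: the helper `read_utterance_lines` is hypothetical, and the demo writes a small temporary file with placeholder content rather than reading the real corpus.

```python
import os
import tempfile


def read_utterance_lines(txt_file):
    """Return the lines of txt_file that contain '[', skipping the first line."""
    with open(txt_file, encoding="utf-8", errors="replace") as f:
        next(f, None)  # skip the header row, like skiprows=1 in read_csv
        return [line.rstrip("\r\n") for line in f if "[" in line]


# Demo on a temporary file with placeholder content
sample = (
    "% header line\n"
    "\n"
    "[6.2901 - 8.2357]\tSes01F_impro01_F000\tneu\t[2.5000, 2.5000, 2.5000]\n"
    "C-E2:\tNeutral;\t()\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
rows = read_utterance_lines(tmp.name)
os.remove(tmp.name)
print(rows)
```

The returned list could stand in for `filter_data` in `get_IEMOCAP_filename`, since each element is split on tabs in the same way.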