Extracting Labels from the IEMOCAP Emotion Corpus

王延凯's blog

Last modified: 2022-05-12 12:33:30


Categories: Speech Signal Processing, Daily. Tags: python, speech emotion classification, IEMOCAP, emotion recognition

First published: 2021-11-27 22:10:31

Article link: https://blog.csdn.net/weixin_38468077/article/details/121583989


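Introduction to the IEMOCAP Corpus

The corpus covers the following emotions: anger, happiness, excitement, sadness, frustration, fear, surprise, other, and neutral state.
In the label files they are abbreviated as ang, hap, exc, sad, fru, fea, sur, oth, and neu, respectively.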

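How to View the Label Data

Open the folder IEMOCAP\Session1\dialog\EmoEvaluation, which contains the annotation files:
[Image: listing of the files in the EmoEvaluation folder]
Take Ses01F_impro01.txt as an example:

This txt file records how the dialog recording Ses01F_impro01.wav in the dialog/wav folder was cut into individual utterance files, together with the emotion label assigned to each one. For example, the entry Ses01F_impro01_F000 refers to the file Ses01F_impro01_F000.wav under sentences\wav\Ses01F_impro01.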
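To make the link between an annotation entry and its wav file concrete, here is a minimal sketch. It assumes each labelled line is tab-separated with the utterance name in the second field and the emotion label in the third, which is the layout the extraction code in this post relies on. The helper name `parse_summary_line` is hypothetical, and the sample line (including its numbers) is made up for illustration.

```python
import os

def parse_summary_line(line, corpus_root, session):
    """Split one tab-separated annotation line into (wav_path, label).

    Assumed layout: "[start - end]\t<utterance name>\t<label>\t[V, A, D]".
    """
    fields = line.strip().split("\t")
    utterance, label = fields[1], fields[2]
    # The dialog name is the utterance name without its trailing "_F000"-style part
    dialog = utterance.rsplit("_", 1)[0]
    wav_path = os.path.join(corpus_root, session, "sentences", "wav",
                            dialog, utterance + ".wav")
    return wav_path, label

# Illustrative line only; the values are not copied from the corpus
sample = "[6.2901 - 8.2357]\tSes01F_impro01_F000\tneu\t[2.5000, 2.5000, 2.5000]"
wav_path, label = parse_summary_line(sample, "IEMOCAP", "Session1")
print(wav_path, label)
```

Deriving the dialog folder from the utterance name, rather than from the txt file name, is just one option; the code later in this post uses the txt file name instead.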

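How to Obtain IEMOCAP?

Baidu Netdisk: [link missing], extraction code: n0c0
Official site: apply for access on the official website. I have forgotten the exact URL, so please search for it yourselves.

Extracting IEMOCAP Emotion Labels with Python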

import os


def get_txt_files(file_dir):
    """Return the paths of the .txt files located directly in file_dir."""
    L = []
    for root, dirs, files in os.walk(file_dir):
        for file in files:
            if os.path.splitext(file)[1] == '.txt':
                L.append(os.path.join(root, file))
        # Only the top-level folder is wanted; do not descend into subfolders
        break
    return L


def get_IEMOCAP_filename(IEMOCAP_dir):
    """Return a list of (wav_path, encoded_label) pairs for all five sessions."""
    all_data = []
    Sessions = ['Session1', 'Session2', 'Session3', 'Session4', 'Session5']

    for Session in Sessions:
        file_dir = os.path.join(IEMOCAP_dir, Session, "dialog", "EmoEvaluation")
        # Collect all annotation txt files of this session
        txt_files = get_txt_files(file_dir)
        # Read each txt file and extract the emotion labels
        for txt_file in txt_files:
            # Dialog name, e.g. Ses01F_impro01; the wav subfolder has the same name
            last_folder = os.path.splitext(os.path.basename(txt_file))[0]

            # Skip the header line, then keep only the lines containing '[':
            # these hold the utterance name and its emotion label
            with open(txt_file) as f:
                lines = f.read().splitlines()[1:]
            filter_data = [x for x in lines if '[' in x]

            print(txt_file, len(filter_data))

            for file in filter_data:
                values = file.split("\t")
                filename = os.path.join(IEMOCAP_dir, Session, "sentences", "wav",
                                        last_folder, values[1] + ".wav")
                # Encode the label as an integer
                target = get_IEMOCAP_4target(values[2])

                # Skip samples whose label the mapping does not cover
                # (with the 9-class mapping these are mainly oth (3) and dis (2))
                if target is None:
                    continue

                all_data.append((filename, target))
    return all_data
    
def get_IEMOCAP_9target(target):
    # Encode the label string as an integer (9 classes)
    mapping = {
        'ang': 0,  # anger
        'hap': 1,  # happiness
        'exc': 2,  # excitement
        'sad': 3,  # sadness
        'fru': 4,  # frustration
        'fea': 5,  # fear
        'sur': 6,  # surprise
        'neu': 7,  # neutral state
        'xxx': 8,  # other
    }
    if target not in mapping:
        print(target)
        print("Unrecognized label; this sample is skipped by default")
        return None

    return mapping[target]

def get_IEMOCAP_4target(target):
    # Encode the label string as an integer (4 classes)
    mapping = {
        'ang': 0,  # anger
        'hap': 1,  # happiness
        'sad': 2,  # sadness
        'neu': 3,  # neutral state
    }
    if target not in mapping:
        print(target)
        print("Unrecognized label; this sample is skipped by default")
        return None

    return mapping[target]

Getting the Audio File Paths and Their Corresponding Labels

wav_path = "./data/IEMOCAP/"
spec_path = "./features/ICMOCAP_Spec(4kinds)/"
filenames = get_IEMOCAP_filename(wav_path)
# Split the (path, label) pairs into two parallel tuples
filenames, targets = zip(*filenames)
for filename, label in zip(filenames, targets):
    # label is the encoded emotion, filename is the path of the wav file
    pass

From here you can continue with your own processing.
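One possible first step is to check how many samples each class has, since the classes are not necessarily balanced. The sketch below uses `collections.Counter` on `(filename, target)` pairs. The `pairs` list is toy data standing in for the output of `get_IEMOCAP_filename`, and `label_names` is a hypothetical helper that inverts the 4-class encoding used above.

```python
from collections import Counter

# Toy stand-in for the list returned by get_IEMOCAP_filename (made-up entries)
pairs = [
    ("Ses01F_impro01_F000.wav", 3),
    ("Ses01F_impro01_F001.wav", 3),
    ("Ses01F_impro01_F005.wav", 0),
    ("Ses01F_impro01_M011.wav", 2),
]

# Inverse of the 4-class encoding in get_IEMOCAP_4target
label_names = {0: "ang", 1: "hap", 2: "sad", 3: "neu"}

# Count how often each encoded label occurs
counts = Counter(target for _, target in pairs)
for target in sorted(label_names):
    print(label_names[target], counts.get(target, 0))
```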