Dialogue State Tracking: A Walkthrough of the TRADE Model's Data and Code

ywm-pku

Last modified 2022-03-20 23:14:37


First published 2022-03-18 14:45:52

Article link: https://blog.csdn.net/qqywm/article/details/123088568


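Dataset:

The Multi-Domain Wizard-of-Oz dataset (MultiWOZ) is a fully labeled collection of human-human written conversations spanning multiple domains and topics. At roughly 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora.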

taxi {}
police {}
hospital {}
hotel {'info': {'type': 'hotel', 'parking': 'yes', 'pricerange': 'cheap', 'internet': 'yes'}, 'fail_info': {}, 'book': {'pre_invalid': True, 'stay': '2', 'day': 'tuesday', 'invalid': False, 'people': '6'}, 'fail_book': {'stay': '3'}}
topic {'taxi': False, 'police': False, 'restaurant': False, 'hospital': False, 'hotel': False, 'general': False, 'attraction': False, 'train': False, 'booking': False}
attraction {}
train {}
message ["You are looking for a <span class='emphasis'>place to stay</span>. The hotel should be in the <span class='emphasis'>cheap</span> price range and should be in the type of <span class='emphasis'>hotel</span>", "The hotel should <span class='emphasis'>include free parking</span> and should <span class='emphasis'>include free wifi</span>", "Once you find the <span class='emphasis'>hotel</span> you want to book it for <span class='emphasis'>6 people</span> and <span class='emphasis'>3 nights</span> starting from <span class='emphasis'>tuesday</span>", "If the booking fails how about <span class='emphasis'>2 nights</span>", "Make sure you get the <span class='emphasis'>reference number</span>"]
restaurant {

Code for downloading and loading the data: after the download, the files are copied into a single folder

import os
import shutil
import urllib.request
from io import BytesIO
from zipfile import ZipFile

def loadData():  # download (if needed) and unpack the dataset
    data_url = "data/multi-woz/data.json"
    dataset_url = "https://www.repository.cam.ac.uk/bitstream/handle/1810/280608/MULTIWOZ2.zip?sequence=3&isAllowed=y"
    # Create data/multi-woz even when data/ already exists
    os.makedirs("data/multi-woz", exist_ok=True)

    if not os.path.exists(data_url):
        print("Downloading and unzipping the MultiWOZ dataset")
        resp = urllib.request.urlopen(dataset_url)
        # The built-in zipfile module unpacks the archive directly from memory
        with ZipFile(BytesIO(resp.read())) as zip_ref:
            zip_ref.extractall("data/multi-woz")
        # shutil.copy(source, destination) copies a file into the destination folder; both arguments are strings
        for name in ("data.json", "valListFile.json", "testListFile.json", "dialogue_acts.json"):
            shutil.copy(os.path.join("data/multi-woz/MULTIWOZ2 2", name), "data/multi-woz/")

Return value of get_dial(dialogue):

[{'usr': 'am looking for a place to to stay that has cheap price range it should be in a type of hotel', 'sys': 'okay , do you have a specific area you want to stay in ?', 'sys_a': ['area'], 'domain': 'hotel', 'bvs': [['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}, {'usr': 'no , i just need to make sure it s cheap . oh , and i need parking', 'sys': 'i found 1 cheap hotel for you that include -s parking . do you like me to book it ?', 'sys_a': [['none', 'none'], ['price', 'cheap'], ['choice', '1'], ['parking', 'none']], 'domain': 'hotel', 'bvs': [['hotel-parking', 'yes'], ['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}, {'usr': 'yes , please . 6 people 3 nights starting on tuesday .', 'sys': 'i am sorry but i was not able to book that for you for tuesday . is there another day you would like to stay or perhaps a shorter stay ?', 'sys_a': ['stay', 'day'], 'domain': 'hotel', 'bvs': [['hotel-book day', 'tuesday'], ['hotel-book people', '6'], ['hotel-book stay', '3'], ['hotel-parking', 'yes'], ['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}, {'usr': 'how about only 2 nights .', 'sys': 'booking was successful . reference number is : 7gawk763 . anything else i can do for you ?', 'sys_a': [], 'domain': 'hotel', 'bvs': [['hotel-book day', 'tuesday'], ['hotel-book people', '6'], ['hotel-book stay', '2'], ['hotel-parking', 'yes'], ['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}, {'usr': 'no , that will be all . goodbye .', 'sys': 'thank you for using our services .', 'sys_a': [], 'domain': 'hotel', 'bvs': [['hotel-book day', 'tuesday'], ['hotel-book people', '6'], ['hotel-book stay', '2'], ['hotel-parking', 'yes'], ['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}]

A dialogue ID that starts with SNG is a single-domain dialogue; MUL marks a multi-domain dialogue.

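Data structure: each dialogue contains a goal, multiple user and system utterances, a belief state, and dialogue acts and slots.

Belief state: it has three sections, semi, book and booked. semi holds the slot values of a particular domain; book holds the booking slots of that domain; booked is a sub-list inside the book dictionary with information about the booked entity (filled in once the booking has been made).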

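The MultiWOZ dialogue collection

Dialogue act setup

The dataset is made up of 7 domains: Attraction, Hospital, Police, Hotel, Restaurant, Taxi and Train. The last four are extended domains that include the booking sub-task.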


"message": [
    "You are looking for a <span class='emphasis'>place to stay</span>. The hotel should be in the <span class='emphasis'>cheap</span> price range and should be in the type of <span class='emphasis'>hotel</span>", 
    "The hotel should <span class='emphasis'>include free parking</span> and should <span class='emphasis'>include free wifi</span>", 
    "Once you find the <span class='emphasis'>hotel</span> you want to book it for <span class='emphasis'>6 people</span> and <span class='emphasis'>3 nights</span> starting from <span class='emphasis'>tuesday</span>", 
    "If the booking fails how about <span class='emphasis'>2 nights</span>", 
    "Make sure you get the <span class='emphasis'>reference number</span>"
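],

"message" with the emphasis markup removed: "You are looking for a place to stay. The hotel should be in the cheap price range and should be in the type of hotel",

"The hotel should include free parking and should include free wifi",

"Once you find the hotel you want to book it for 6 people and 3 nights starting from tuesday",

"If the booking fails how about 2 nights",

"Make sure you get the reference number"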

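message describes the scenario the dialogue is meant to follow.

"log": records the content of the dialogue

{"goal":    # goal

"taxi": { }    # domain

.....................

"hotel": {       # domain
    "info": {            # intent {slot: value}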
        "type": "hotel", 
        "parking": "yes", 
        "pricerange": "cheap", 
        "internet": "yes"
    }, 
    "fail_info": {}, 
    "book": {
        "pre_invalid": true, 
        "stay": "2", 
        "day": "tuesday", 
        "invalid": false, 
        "people": "6"
    }, 
    "fail_book": {
        "stay": "3"
    }
},

..............

usr:am looking for a place to to stay that has cheap price range it should be in a type of hotel
sys:okay , do you have a specific area you want to stay in ?
usr:no , i just need to make sure it s cheap . oh , and i need parking
sys:i found 1 cheap hotel for you that include -s parking . do you like me to book it ?
usr:yes , please . 6 people 3 nights starting on tuesday .
sys:i am sorry but i was not able to book that for you for tuesday . is there another day you would like to stay or perhaps a shorter stay ?
usr:how about only 2 nights .
sys:booking was successful . reference number is : 7gawk763 . anything else i can do for you ?
usr:no , that will be all . goodbye .
sys:thank you for using our services .

"hotel": {
    "book": {
        "booked": [], 
        "stay": "", 
        "day": "", 
        "people": ""
    }, 
    "semi": {
        "name": "not mentioned", 
        "area": "not mentioned", 
        "parking": "not mentioned", 
        "pricerange": "cheap", 
        "stars": "not mentioned", 
        "internet": "not mentioned", 
        "type": "hotel"
    }
},
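
As a rough sketch of how the semi and book sections relate to the flat slot-value pairs that appear as bvs in the processed data, the snippet below flattens one belief-state dictionary. The helper name flatten_belief and the set of values treated as "not filled" are assumptions made for illustration; this is not the preprocessing script's actual code.

```python
# Hypothetical helper: flatten a MultiWOZ belief state into [slot, value] pairs.
EMPTY_VALUES = {"", "not mentioned"}  # assumption: these values mean "slot not filled"

def flatten_belief(metadata):
    pairs = []
    for domain, sections in metadata.items():
        # Booking slots become "<domain>-book <slot>"; "booked" holds entities, not slots.
        for slot, value in sections.get("book", {}).items():
            if slot != "booked" and value not in EMPTY_VALUES:
                pairs.append([f"{domain}-book {slot}", value])
        # Ordinary domain slots become "<domain>-<slot>".
        for slot, value in sections.get("semi", {}).items():
            if value not in EMPTY_VALUES:
                pairs.append([f"{domain}-{slot}", value])
    return sorted(pairs)

metadata = {
    "hotel": {
        "book": {"booked": [], "stay": "", "day": "", "people": ""},
        "semi": {
            "name": "not mentioned", "area": "not mentioned", "parking": "not mentioned",
            "pricerange": "cheap", "stars": "not mentioned", "internet": "not mentioned",
            "type": "hotel",
        },
    }
}
print(flatten_belief(metadata))  # [['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]
```

For the hotel state above, the result matches the bvs list of the first turn in the processed dialogue.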


The complete data after preprocessing:

{'usr': 'am looking for a place to to stay that has cheap price range it should be in a type of hotel', 'sys': 'okay , do you have a specific area you want to stay in ?', 'sys_a': ['area'], 'domain': 'hotel', 'bvs': [['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}
{'usr': 'no , i just need to make sure it s cheap . oh , and i need parking', 'sys': 'i found 1 cheap hotel for you that include -s parking . do you like me to book it ?', 'sys_a': [['none', 'none'], ['price', 'cheap'], ['choice', '1'], ['parking', 'none']], 'domain': 'hotel', 'bvs': [['hotel-parking', 'yes'], ['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}
{'usr': 'yes , please . 6 people 3 nights starting on tuesday .', 'sys': 'i am sorry but i was not able to book that for you for tuesday . is there another day you would like to stay or perhaps a shorter stay ?', 'sys_a': ['stay', 'day'], 'domain': 'hotel', 'bvs': [['hotel-book day', 'tuesday'], ['hotel-book people', '6'], ['hotel-book stay', '3'], ['hotel-parking', 'yes'], ['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}
{'usr': 'how about only 2 nights .', 'sys': 'booking was successful . reference number is : 7gawk763 . anything else i can do for you ?', 'sys_a': [], 'domain': 'hotel', 'bvs': [['hotel-book day', 'tuesday'], ['hotel-book people', '6'], ['hotel-book stay', '2'], ['hotel-parking', 'yes'], ['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}
{'usr': 'no , that will be all . goodbye .', 'sys': 'thank you for using our services .', 'sys_a': [], 'domain': 'hotel', 'bvs': [['hotel-book day', 'tuesday'], ['hotel-book people', '6'], ['hotel-book stay', '2'], ['hotel-parking', 'yes'], ['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']]}

+++++++++++++++++++++

{'system_transcript': '', 'turn_idx': 0, 'belief_state': [{'slots': [['hotel-pricerange', 'cheap']], 'act': 'inform'}, {'slots': [['hotel-type', 'hotel']], 'act': 'inform'}], 'turn_label': [['hotel-pricerange', 'cheap'], ['hotel-type', 'hotel']], 'transcript': 'am looking for a place to to stay that has cheap price range it should be in a type of hotel', 'system_acts': [], 'domain': 'hotel'}
{'system_transcript': 'okay , do you have a specific area you want to stay in ?', 'turn_idx': 1, 'belief_state': [{'slots': [['hotel-parking', 'yes']], 'act': 'inform'}, {'slots': [['hotel-pricerange', 'cheap']], 'act': 'inform'}, {'slots': [['hotel-type', 'hotel']], 'act': 'inform'}], 'turn_label': [['hotel-parking', 'yes']], 'transcript': 'no , i just need to make sure it s cheap . oh , and i need parking', 'system_acts': ['area'], 'domain': 'hotel'}
{'system_transcript': 'i found 1 cheap hotel for you that include -s parking . do you like me to book it ?', 'turn_idx': 2, 'belief_state': [{'slots': [['hotel-book day', 'tuesday']], 'act': 'inform'}, {'slots': [['hotel-book people', '6']], 'act': 'inform'}, {'slots': [['hotel-book stay', '3']], 'act': 'inform'}, {'slots': [['hotel-parking', 'yes']], 'act': 'inform'}, {'slots': [['hotel-pricerange', 'cheap']], 'act': 'inform'}, {'slots': [['hotel-type', 'hotel']], 'act': 'inform'}], 'turn_label': [['hotel-book day', 'tuesday'], ['hotel-book people', '6'], ['hotel-book stay', '3']], 'transcript': 'yes , please . 6 people 3 nights starting on tuesday .', 'system_acts': [['none', 'none'], ['price', 'cheap'], ['choice', '1'], ['parking', 'none']], 'domain': 'hotel'}
{'system_transcript': 'i am sorry but i was not able to book that for you for tuesday . is there another day you would like to stay or perhaps a shorter stay ?', 'turn_idx': 3, 'belief_state': [{'slots': [['hotel-book day', 'tuesday']], 'act': 'inform'}, {'slots': [['hotel-book people', '6']], 'act': 'inform'}, {'slots': [['hotel-book stay', '2']], 'act': 'inform'}, {'slots': [['hotel-parking', 'yes']], 'act': 'inform'}, {'slots': [['hotel-pricerange', 'cheap']], 'act': 'inform'}, {'slots': [['hotel-type', 'hotel']], 'act': 'inform'}], 'turn_label': [['hotel-book stay', '2']], 'transcript': 'how about only 2 nights .', 'system_acts': ['stay', 'day'], 'domain': 'hotel'}
{'system_transcript': 'booking was successful . reference number is : 7gawk763 . anything else i can do for you ?', 'turn_idx': 4, 'belief_state': [{'slots': [['hotel-book day', 'tuesday']], 'act': 'inform'}, {'slots': [['hotel-book people', '6']], 'act': 'inform'}, {'slots': [['hotel-book stay', '2']], 'act': 'inform'}, {'slots': [['hotel-parking', 'yes']], 'act': 'inform'}, {'slots': [['hotel-pricerange', 'cheap']], 'act': 'inform'}, {'slots': [['hotel-type', 'hotel']], 'act': 'inform'}], 'turn_label': [], 'transcript': 'no , that will be all . goodbye .', 'system_acts': [], 'domain': 'hotel'}
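
In the turns above, turn_label looks like the set of slots that are new or changed relative to the previous turn's belief state. A minimal sketch under that assumption follows; the function name is hypothetical, and the real preprocessing may compute the field differently.

```python
def turn_label(prev_belief, curr_belief):
    """Return the [slot, value] pairs that are new or changed in the current turn."""
    prev = {slot: value for slot, value in prev_belief}
    return [[slot, value] for slot, value in curr_belief if prev.get(slot) != value]

# Belief states of turn 2 and turn 3 from the hotel dialogue above.
turn2 = [["hotel-book day", "tuesday"], ["hotel-book people", "6"], ["hotel-book stay", "3"],
         ["hotel-parking", "yes"], ["hotel-pricerange", "cheap"], ["hotel-type", "hotel"]]
turn3 = [["hotel-book day", "tuesday"], ["hotel-book people", "6"], ["hotel-book stay", "2"],
         ["hotel-parking", "yes"], ["hotel-pricerange", "cheap"], ["hotel-type", "hotel"]]

print(turn_label(turn2, turn3))  # [['hotel-book stay', '2']]
print(turn_label(turn3, turn3))  # []
```

Both results agree with the turn_label fields of turn 3 and turn 4 in the data.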

+++++++++++++++++++++

{
        "dialogue_idx": "SNG01856.json",
        "domains": [
            "hotel"
        ],
        "dialogue": [
            {
                "system_transcript": "",
                "turn_idx": 0,
                "belief_state": [     # state
                    {
                        "slots": [    # slot-value
                            [
                                "hotel-pricerange",
                                "cheap"
                            ]
                        ],
                        "act": "inform"   # act
                    },
                    {
                        "slots": [
                            [
                                "hotel-type",
                                "hotel"
                            ]
                        ],
                        "act": "inform"
                    }
                ],
                "turn_label": [
                    [
                        "hotel-pricerange",
                        "cheap"
                    ],
                    [
                        "hotel-type",
                        "hotel"
                    ]
                ],
                "transcript": "am looking for a place to to stay that has cheap price range it should be in a type of hotel",
                "system_acts": [],
                "domain": "hotel"
            },
            {
                "system_transcript": "okay , do you have a specific area you want to stay in ?",
                "turn_idx": 1,
                "belief_state": [
                    {
                        "slots": [
                            [
                                "hotel-parking",
                                "yes"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-pricerange",
                                "cheap"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-type",
                                "hotel"
                            ]
                        ],
                        "act": "inform"
                    }
                ],
                "turn_label": [
                    [
                        "hotel-parking",
                        "yes"
                    ]
                ],
                "transcript": "no , i just need to make sure it s cheap . oh , and i need parking",
                "system_acts": [
                    "area"
                ],
                "domain": "hotel"
            },
            {
                "system_transcript": "i found 1 cheap hotel for you that include -s parking . do you like me to book it ?",
                "turn_idx": 2,
                "belief_state": [
                    {
                        "slots": [
                            [
                                "hotel-book day",
                                "tuesday"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-book people",
                                "6"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-book stay",
                                "3"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-parking",
                                "yes"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-pricerange",
                                "cheap"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-type",
                                "hotel"
                            ]
                        ],
                        "act": "inform"
                    }
                ],
                "turn_label": [
                    [
                        "hotel-book day",
                        "tuesday"
                    ],
                    [
                        "hotel-book people",
                        "6"
                    ],
                    [
                        "hotel-book stay",
                        "3"
                    ]
                ],
                "transcript": "yes , please . 6 people 3 nights starting on tuesday .",
                "system_acts": [
                    [
                        "none",
                        "none"
                    ],
                    [
                        "price",
                        "cheap"
                    ],
                    [
                        "choice",
                        "1"
                    ],
                    [
                        "parking",
                        "none"
                    ]
                ],
                "domain": "hotel"
            },
            {
                "system_transcript": "i am sorry but i was not able to book that for you for tuesday . is there another day you would like to stay or perhaps a shorter stay ?",
                "turn_idx": 3,
                "belief_state": [
                    {
                        "slots": [
                            [
                                "hotel-book day",
                                "tuesday"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-book people",
                                "6"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-book stay",
                                "2"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-parking",
                                "yes"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-pricerange",
                                "cheap"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-type",
                                "hotel"
                            ]
                        ],
                        "act": "inform"
                    }
                ],
                "turn_label": [
                    [
                        "hotel-book stay",
                        "2"
                    ]
                ],
                "transcript": "how about only 2 nights .",
                "system_acts": [
                    "stay",
                    "day"
                ],
                "domain": "hotel"
            },
            {
                "system_transcript": "booking was successful . reference number is : 7gawk763 . anything else i can do for you ?",
                "turn_idx": 4,
                "belief_state": [
                    {
                        "slots": [
                            [
                                "hotel-book day",
                                "tuesday"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-book people",
                                "6"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-book stay",
                                "2"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-parking",
                                "yes"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-pricerange",
                                "cheap"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-type",
                                "hotel"
                            ]
                        ],
                        "act": "inform"
                    }
                ],
                "turn_label": [],
                "transcript": "no , that will be all . goodbye .",
                "system_acts": [],
                "domain": "hotel"
            }
        ]
    },

 {
                "system_transcript": "booking was successful . reference number is : 7gawk763 . anything else i can do for you ?",
                "turn_idx": 4,
                "belief_state": [            # belief state
                    {
                        "slots": [     # slot-value
                            [
                                "hotel-book day",
                                "tuesday"
                            ]
                        ],
                        "act": "inform"   # intent
                    },
                    {
                        "slots": [
                            [
                                "hotel-book people",
                                "6"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-book stay",
                                "2"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-parking",
                                "yes"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-pricerange",
                                "cheap"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "hotel-type",
                                "hotel"
                            ]
                        ],
                        "act": "inform"
                    }
                ],
                "turn_label": [], 
                "transcript": "no , that will be all . goodbye .",
                "system_acts": [],  # system acts
                "domain": "hotel"
            }

Ontology of the dataset: ontology.json

SLOTS

['hotel-pricerange', 'hotel-type', 'hotel-parking', 'hotel-book stay', 'hotel-book day', 'hotel-book people', 'hotel-area', 'hotel-stars', 'hotel-internet', 'train-destination', 'train-day', 'train-departure', 'train-arriveby', 'train-book people', 'train-leaveat', 'attraction-area', 'restaurant-food', 'restaurant-pricerange', 'restaurant-area', 'attraction-name', 'restaurant-name', 'attraction-type', 'hotel-name', 'taxi-leaveat', 'taxi-destination', 'taxi-departure', 'restaurant-book time', 'restaurant-book day', 'restaurant-book people', 'taxi-arriveby']

{
        "dialogue_idx": "SNG1384.json",
        "domains": [
            "attraction"
        ],
        "dialogue": [
            {
                "system_transcript": "",
                "turn_idx": 0,
                "belief_state": [
                    {
                        "slots": [
                            [
                                "attraction-type",
                                "museum"
                            ]
                        ],
                        "act": "inform"
                    },
                    {
                        "slots": [
                            [
                                "attraction-area",
                                "west"
                            ]
                        ],
                        "act": "inform"
                    }
                ],
                "turn_label": [
                    [
                        "attraction-type",
                        "museum"
                    ],
                    [
                        "attraction-area",
                        "west"
                    ]
                ],
                "transcript": "give me information about museums in the west side of town .",
                "system_acts": [],
                "domain": "attraction"
            },

data_detail = {
                    "ID":dial_dict["dialogue_idx"], 
                    "domains":dial_dict["domains"], 
                    "turn_domain":turn_domain,
                    "turn_id":turn_id, 
                    "dialog_history":source_text, 
                    "turn_belief":turn_belief_list,
                    "gating_label":gating_label, 
                    "turn_uttr":turn_uttr_strip, 
                    'generate_y':generate_y
                    }

# Output format
{'ID': 'SNG1384.json', 'domains': ['attraction'], 'turn_domain': 'attraction', 'turn_id': 0, 'dialog_history': '; give me information about museums in the west side of town . ;', 'turn_belief': ['attraction-type-museum', 'attraction-area-west'], 'gating_label': [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2], 'turn_uttr': '; give me information about museums in the west side of town .', 'generate_y': ['none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'west', 'none', 'none', 'none', 'none', 'none', 'museum', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none']}
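
The following sketch shows one way the fields above could be derived from a dialogue turn: the history is the running concatenation of system and user utterances joined by " ; ", and each of the 30 slots gets a target value plus a gate class. The function name build_examples is hypothetical. The gate ids 0 (a value to generate) and 2 (none) can be read off the output above, while id 1 for dontcare is an assumption.

```python
SLOTS = ['hotel-pricerange', 'hotel-type', 'hotel-parking', 'hotel-book stay', 'hotel-book day',
         'hotel-book people', 'hotel-area', 'hotel-stars', 'hotel-internet', 'train-destination',
         'train-day', 'train-departure', 'train-arriveby', 'train-book people', 'train-leaveat',
         'attraction-area', 'restaurant-food', 'restaurant-pricerange', 'restaurant-area',
         'attraction-name', 'restaurant-name', 'attraction-type', 'hotel-name', 'taxi-leaveat',
         'taxi-destination', 'taxi-departure', 'restaurant-book time', 'restaurant-book day',
         'restaurant-book people', 'taxi-arriveby']

# 0 and 2 are visible in the output above; 1 for "dontcare" is an assumption.
GATING = {"ptr": 0, "dontcare": 1, "none": 2}

def build_examples(dialogue, slots):
    """Turn a list of dialogue turns into per-turn training examples (illustrative only)."""
    history = ""
    examples = []
    for turn in dialogue:
        turn_uttr = turn["system_transcript"] + " ; " + turn["transcript"]
        history += turn_uttr + " ; "
        # Collect the slot-value pairs of the accumulated belief state.
        belief = {slot: value for state in turn["belief_state"] for slot, value in state["slots"]}
        # One target value per slot; slots not in the belief state get "none".
        generate_y = [belief.get(slot, "none") for slot in slots]
        # One gate class per slot: "dontcare", "none", or "ptr" (a value must be generated).
        gating_label = [GATING[v] if v in ("dontcare", "none") else GATING["ptr"] for v in generate_y]
        examples.append({
            "turn_id": turn["turn_idx"],
            "dialog_history": history.strip(),
            "turn_belief": [f"{slot}-{value}" for slot, value in belief.items()],
            "gating_label": gating_label,
            "turn_uttr": turn_uttr.strip(),
            "generate_y": generate_y,
        })
    return examples

first_turn = {
    "system_transcript": "",
    "turn_idx": 0,
    "belief_state": [
        {"slots": [["attraction-type", "museum"]], "act": "inform"},
        {"slots": [["attraction-area", "west"]], "act": "inform"},
    ],
    "transcript": "give me information about museums in the west side of town .",
}
example = build_examples([first_turn], SLOTS)[0]
print(example["dialog_history"])
print(example["gating_label"])
```

With the first turn of SNG1384.json as input, positions 15 (attraction-area) and 21 (attraction-type) receive gate 0 and the values west and museum, as in the output shown above.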

data:

{'ID': ['PMUL3559.json'], 'turn_id': [4], 'turn_belief': [['restaurant-food-italian', 'restaurant-pricerange-expensive', 'restaurant-area-centre', 'attraction-type-college', 'attraction-area-centre']], 'gating_label': tensor([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 0, 2, 2,
         2, 2, 2, 2, 2, 2]], device='cuda:0'), 'context': tensor([[   0,   52,   27,   28,   29,   30,  475,   41,  110,  128,   42,  114,
            0,   81,  135,  267,  477,   41,  110,  323,   58,  304,   52,  481,
          505, 2006,   57,  475,   50,    0,  194,   44,   33,  178,   65,   30,
          195, 1040,   58,   87,  164,  208,   38,  573,   32,  479,   50,    0,
          110, 1153,   80,  169,  174,    0,  304,   52,   47,  110,  117,   44,
           68,   50,    0,   67,   44,  110,  117,   80, 2007,   58,   80,   81,
          228,  404,   98,   52,   99,  106,   46,  109,   50,    0,   67,   44,
           52,   83,   65,  311,  312, 1795,   21,   41,  110,  128,   58,    0,
           75,   44,   52,  179,   78,  333,  228,   21,   64,  554,  146,  555,
           64,   58,    0,  135,   81,  228,   21,   64,   33,  730,  522,   22,
           41,  110,   12,   50,    0]], device='cuda:0'), 'context_plain': ['; i am looking for a college in the centre of cambridge ; there are several colleges in the center . may i suggest saint catharine s college ? ; ok , that sounds like a good idea . how much does it cost to visit ? ; the admission is free ! ; may i have the postcode , please ? ; yes , the postcode is cb21rl . is there any thing else i can help you with ? ; yes , i would like an expensive vietnamese restaurant in the centre . ; sorry , i could not find any restaurant -s matching your requirement -s . ; are there any restaurant -s that serve italian food in the area ? ;'], 'turn_uttr_plain': ['sorry , i could not find any restaurant -s matching your requirement -s . ; are there any restaurant -s that serve italian food in the area ?'], 'turn_domain': tensor([1], device='cuda:0'), 'generate_y': tensor([[[212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [128,   2],
         [522,   2],
         [312,   2],
         [128,   2],
         [212,   2],
         [212,   2],
         [475,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2],
         [212,   2]]], device='cuda:0'), 'context_len': [125], 'y_lengths': tensor([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
         2, 2, 2, 2, 2, 2]], device='cuda:0')}

The TRADE model:

TRADE(
  (cross_entorpy): CrossEntropyLoss()
  (encoder): EncoderRNN(
    (dropout_layer): Dropout(p=0.2, inplace=False)
    (embedding): Embedding(18311, 400, padding_idx=1)
    (gru): GRU(400, 400, dropout=0.2, bidirectional=True)
  )
  (decoder): Generator(
    (embedding): Embedding(18311, 400, padding_idx=1)
    (dropout_layer): Dropout(p=0.2, inplace=False)
    (gru): GRU(400, 400, dropout=0.2)
    (W_ratio): Linear(in_features=1200, out_features=1, bias=True)
    (softmax): Softmax(dim=1)
    (sigmoid): Sigmoid()
    (W_gate): Linear(in_features=400, out_features=3, bias=True)
    (Slot_emb): Embedding(22, 400)
  )
)

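The TRADE model has three components: an utterance encoder, a slot gate, and a state generator. Instead of predicting a probability for every predefined ontology term, the model generates slot values directly. All model parameters are shared, and the state generator starts from a different start-of-sentence token for each (domain, slot) pair. TRADE tackles the multi-turn mapping problem in multi-domain dialogue with a context-enhanced slot gate and a copy mechanism, so that knowledge is shared across domains and slot values can be tracked wherever they are mentioned in the dialogue history. Similar to Xu and Hu (2018), TRADE uses a three-way classifier as the slot gate to decide whether the current (domain, slot) pair has been mentioned. When it has, TRADE uses a soft-gated copy to combine the vocabulary distribution and the dialogue-history distribution, each with its own weight, into a single output distribution.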
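
The soft-gated copy can be sketched in plain Python as follows. This is only an illustration of the mixing step: the names are hypothetical, the gate p_gen is passed in as a fixed scalar rather than computed by the model, and the real implementation works on batched tensors.

```python
def soft_gated_copy(p_vocab, attention, history_ids, p_gen):
    """Mix a vocabulary distribution with a copy distribution over the dialogue history.

    p_vocab:     probability of each vocabulary id
    attention:   attention weight of each history position (sums to 1)
    history_ids: vocabulary id of the token at each history position
    p_gen:       weight in [0, 1] given to the vocabulary distribution
    """
    # Scatter the attention weights onto vocabulary ids; repeated tokens accumulate.
    p_history = [0.0] * len(p_vocab)
    for token_id, weight in zip(history_ids, attention):
        p_history[token_id] += weight
    return [p_gen * pv + (1.0 - p_gen) * ph for pv, ph in zip(p_vocab, p_history)]

p_vocab = [0.1, 0.2, 0.3, 0.4]   # toy vocabulary of 4 words
attention = [0.5, 0.25, 0.25]    # toy history of 3 tokens
history_ids = [2, 0, 2]          # the history mentions word 2 twice
final = soft_gated_copy(p_vocab, attention, history_ids, p_gen=0.6)
predicted = final.index(max(final))
print(final, predicted)
```

Because both inputs are probability distributions and the two weights sum to 1, the mixture is again a distribution. In this toy case word 2 wins because it is mentioned twice in the history.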

1) Utterance encoder

2) State generator

3) Slot gate

4) Optimization

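Supplement: CopyNet

Study notes on the CopyNet NLP algorithm (Zhihu)

How a target word is predicted depends on where it can come from: a word in the intersection of the vocabulary and the input uses both modes, a word found in only one of the two sources uses a single mode, and a word found in neither is predicted through the UNK token.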

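The choice between the two modes is determined as follows, where $Z$ is the normalization term shared by both modes.

Generate mode:

1) For a word in the vocabulary, $p_g(y_t) = e^{\phi_g(y_t)} / Z$

2) For a word in the input but not in the vocabulary, $p_g(y_t) = 0$, i.e. it is not predicted from the vocabulary

3) For a word in neither the input nor the vocabulary, $p_g(y_t) = e^{\phi_g(\mathrm{UNK})} / Z$

Copy mode:

1) For a word in the input, $p_c(y_t) = \sum_{j:\, x_j = y_t} e^{\phi_c(x_j)} / Z$, i.e. the scores of all input positions holding that word are summed

2) For a word not in the input, $p_c(y_t) = 0$, i.e. it is not predicted from the input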
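
The two modes can be illustrated with a small pure-Python sketch. The scores are made-up numbers and the function name is hypothetical. In CopyNet the scores come from learned scoring functions, which are not modeled here.

```python
import math

def copynet_prob(word, gen_scores, copy_scores, source_tokens, unk="UNK"):
    """Probability of `word` as the sum of its generate-mode and copy-mode probabilities.

    gen_scores:    dict mapping each vocabulary word (including unk) to its generate score
    copy_scores:   one copy score per source position
    source_tokens: the source sentence, aligned with copy_scores
    """
    # Shared normalizer over all vocabulary entries and all source positions.
    z = sum(math.exp(s) for s in gen_scores.values()) + sum(math.exp(s) for s in copy_scores)

    if word in gen_scores:            # in the vocabulary
        p_gen = math.exp(gen_scores[word]) / z
    elif word in source_tokens:       # only in the input: never generated
        p_gen = 0.0
    else:                             # in neither: falls back to UNK
        p_gen = math.exp(gen_scores[unk]) / z

    # Sum the copy scores of every source position that holds this word.
    p_copy = sum(math.exp(s) for tok, s in zip(source_tokens, copy_scores) if tok == word) / z
    return p_gen + p_copy

gen_scores = {"hello": 1.0, "world": 0.5, "UNK": 0.0}
source_tokens = ["hello", "cb21rl", "hello"]
copy_scores = [0.2, 1.5, 0.3]

candidates = set(gen_scores) | set(source_tokens)
total = sum(copynet_prob(w, gen_scores, copy_scores, source_tokens) for w in candidates)
print(total)
print(copynet_prob("cb21rl", gen_scores, copy_scores, source_tokens))
```

Summed over the vocabulary plus the source words, the probabilities add up to 1 (up to floating-point error). The out-of-vocabulary token cb21rl still receives probability mass, purely through copy mode.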

TRADE code walkthrough:

model.train_batch(data, int(args['clip']), SLOTS_LIST[1], reset=(i==0))

# Train on one batch of data
def train_batch(self, data, clip, slot_temp, reset=0):
    if reset: self.reset()
    # Zero gradients of both optimizers
    self.optimizer.zero_grad()
    
    # Encode and Decode
    use_teacher_forcing = random.random() < args["teacher_forcing_ratio"]

    # Forward pass: encode the dialogue history and decode a value for every slot
    all_point_outputs, gates, words_point_out, words_class_out = self.encode_and_decode(data, use_teacher_forcing, slot_temp)

    # Loss 1: value generation loss (decoder part)
    loss_ptr = masked_cross_entropy_for_value(
        all_point_outputs.transpose(0, 1).contiguous(),
        data["generate_y"].contiguous(), #[:,:len(self.point_slots)].contiguous(), target
        data["y_lengths"]) #[:,:len(self.point_slots)]) # lengths of the target values y

    # Loss 2: slot gate classification loss
    loss_gate = self.cross_entorpy(gates.transpose(0, 1).contiguous().view(-1, gates.size(-1)), data["gating_label"].contiguous().view(-1))

    if args["use_gate"]:
        loss = loss_ptr + loss_gate # train on both losses jointly
    else:
        loss = loss_ptr

    self.loss_grad = loss
    self.loss_ptr_to_bp = loss_ptr
  
    # Update parameters with optimizers
    self.loss += loss.data
    self.loss_ptr += loss_ptr.item()
    self.loss_gate += loss_gate.item()
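
To make the first loss more concrete, here is a pure-Python sketch of a length-masked negative log-likelihood. It assumes that masked_cross_entropy_for_value averages the token-level loss over the unpadded target positions given by y_lengths. The function name masked_nll and the toy numbers are illustrative, and the real helper operates on batched tensors.

```python
import math

def masked_nll(probs, targets, lengths):
    """Average negative log-likelihood over valid (unpadded) target positions.

    probs[s][t]   is a probability distribution over the vocabulary for slot s at step t
    targets[s][t] is the index of the correct word for slot s at step t
    lengths[s]    is the number of valid steps for slot s
    """
    total, count = 0.0, 0
    for slot_probs, slot_targets, length in zip(probs, targets, lengths):
        for step in range(length):  # steps beyond `length` are padding and are skipped
            total -= math.log(slot_probs[step][slot_targets[step]])
            count += 1
    return total / count

# Two slots, at most two decoding steps, a vocabulary of two words.
probs = [
    [[0.5, 0.5], [0.25, 0.75]],   # slot 0: both steps are valid
    [[0.1, 0.9], [0.5, 0.5]],     # slot 1: only the first step is valid
]
targets = [[0, 1], [1, 0]]
lengths = [2, 1]
loss_ptr = masked_nll(probs, targets, lengths)
print(loss_ptr)
```

The padded step of slot 1 contributes nothing to the loss. When the gate is used, train_batch adds the gate's cross-entropy loss to this value.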

State Generator:

Code:

final_p_vocab = (1 - vocab_pointer_switches).expand_as(p_context_ptr) * p_context_ptr + \
                vocab_pointer_switches.expand_as(p_context_ptr) * p_vocab

pred_word = torch.argmax(final_p_vocab, dim=1)

combined_emb = domain_emb + slot_emb  # the domain and slot embeddings are summed and fed in together as the (domain, slot) input

dec_state, hidden = self.gru(decoder_input.expand_as(hidden), hidden)
