Hand-Rolling Simple Code to Reach the Top Three in Task 1 of the SMP2018 Chinese Human-Computer Dialogue Technology Evaluation

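If you are interested in natural language processing and deep learning, this post is worth a read: it walks from raw text processing through model training to building an application. The process is what matters, and you will likely pick up something along the way.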

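The SMP2018 Chinese Human-Computer Dialogue Technology Evaluation is hosted by the Social Media Processing Committee of the Chinese Information Processing Society of China and organized by Harbin Institute of Technology and iFLYTEK Co., Ltd. iFLYTEK provided the data, and Huawei provided the prize money.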

Here is the freshly released leaderboard:

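I have been researching human-computer dialogue recently, so I hand-coded a solution to this task right away and easily landed in the top three (the most basic model reaches an F1 score of about 82). I have put the resources on GitHub under SMP2018 (sadly, I did not enter the competition at the time). You are welcome to build something better on top of my baseline model!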

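Using the trained model, I also built a simple human-computer dialogue application, which is on GitHub under SMP2018 as well. If you have no interest in playing with the model, come and play with the application instead!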

For example, the application can sort what you say into broad categories:

 今天东莞天气如何 (How is the weather in Dongguan today?)
----------
predict label:	 datetime
----------

 怎么治疗感冒? (How do you treat a cold?)
----------
predict label:	 health
----------

 你好? (Hello?)
----------
predict label:	 chat
----------

Evaluation Task Overview

  • This year's human-computer dialogue technology evaluation consists of two tasks; participants may enter either one or both.

Task 1: User Intent Domain Classification

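When a human-computer dialogue system is in use, users may have many different intents, which in turn trigger different domains within the system. These include task-oriented vertical domains (such as looking up flights, hotels, or buses), knowledge-based question answering, and chit-chat. A key job of the dialogue system is therefore to classify each user input into the correct domain, because only then can it return the right response.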

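[User intent domain classification examples]

1) 你好啊,很高兴见到你! ("Hello, nice to meet you!"): chit-chat

2) 我想订一张去北京的机票。 ("I want to book a flight to Beijing."): task-oriented vertical (flight booking)

3) 我想找一家五道口附近便宜干净的快捷酒店 ("I am looking for a cheap, clean budget hotel near Wudaokou"): task-oriented vertical (hotel booking)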
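To make the task concrete, here is a minimal sketch of single-turn domain classification. It is not the model from this post: it swaps in a plain multinomial naive Bayes classifier over character n-grams, written with the standard library only. The three training utterances are the examples above, and the label names `chat`, `flight`, and `hotel` are hypothetical stand-ins rather than the official label set.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return character unigrams plus n-grams, so no word segmentation is needed."""
    chars = [c for c in text if not c.isspace()]
    grams = list(chars)
    grams += ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]
    return grams


class NaiveBayesDomainClassifier:
    """Multinomial naive Bayes with add-one smoothing (an illustrative baseline)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text)
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set: the three example utterances, with hypothetical label names.
train_texts = [
    "你好啊,很高兴见到你!",
    "我想订一张去北京的机票。",
    "我想找一家五道口附近便宜干净的快捷酒店",
]
train_labels = ["chat", "flight", "hotel"]

clf = NaiveBayesDomainClassifier().fit(train_texts, train_labels)
print(clf.predict("帮我订一张去上海的机票"))
```

Character n-grams are used here only because they avoid a word-segmentation dependency; a real system would train on the full competition data and would likely use a stronger model.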

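[Evaluation notes]

Task 1 covers two broad classes, chit-chat and vertical domains, with the vertical class further divided into 30 vertical domains. Task 1 only considers domain classification of user intent in single-turn dialogue; classifying the overall intent of a multi-turn dialogue is outside the scope of this evaluation.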
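The baseline in this post is reported with an F1 score of about 82, but the post does not say how F1 is averaged across the domains. As an illustration only, the sketch below assumes macro-averaging (the unweighted mean of per-label F1); the function name `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical labels: one "chat" utterance is misread as "flight".
gold = ["chat", "chat", "flight", "hotel"]
pred = ["chat", "flight", "flight", "hotel"]
print(round(macro_f1(gold, pred), 4))
```

With many small domains, macro-averaging weights each domain equally regardless of how many utterances it has, which is why it is a common choice for this kind of task.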

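Links related to the competition
CodaLab evaluation home page
Data download
CodaLab evaluation tutorial
Evaluation leaderboard
SMP2018-ECDT evaluation home page
SMP2018-ECDT evaluation results announcement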

My Solution

This simple model really did make it into the top three.

I think the raw-text preprocessing at the start and the application building at the end may be even more interesting!

If you are interested in deep learning, come and chat on my blog!
