The datasets provided below are collections of emails.
The goal is to identify which parts of each email refer to a person's name.
This task is an instance of the broader problem of Information Extraction.
Project idea:
Model the task as a sequential labeling problem: each email is a sequence of tokens, and each token is labeled either “person-name” or “not-a-person-name”.
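To make the sequential-labeling framing concrete, here is a minimal sketch that turns one email into a sequence of (token, label) pairs. It assumes, purely for illustration, that gold person names are given as character-offset spans and that a simple regex tokenizer is good enough. The example email, the span format, and the helper names (`tokenize`, `span_of`, `label_tokens`) are all hypothetical and do not reflect the actual file format of the CMU datasets.

```python
import re
from typing import List, Tuple

# Label names taken from the task description above.
PERSON = "person-name"
OTHER = "not-a-person-name"


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split text into word and single-punctuation tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def span_of(text: str, phrase: str) -> Tuple[int, int]:
    """Hypothetical helper: (start, end) offsets of the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase))


def label_tokens(text: str, name_spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Label a token PERSON if it lies entirely inside an annotated name span, else OTHER."""
    labeled = []
    for tok, start, end in tokenize(text):
        inside = any(s <= start and end <= e for s, e in name_spans)
        labeled.append((tok, PERSON if inside else OTHER))
    return labeled


# Hypothetical email and hypothetical gold name annotations.
email = "Hi Einat, please forward this to William Cohen.\n-- Tom"
spans = [span_of(email, name) for name in ("Einat", "William Cohen", "Tom")]

labeled = label_tokens(email, spans)
for tok, lab in labeled:
    print(f"{tok}\t{lab}")
```

Each email then becomes one labeled sequence; a sequence model (for example, a CRF or an HMM) could be trained on many such sequences and asked to predict the label of every token in a new email. Note that signatures like "-- Tom" are exactly the kind of informal context that makes names in email harder to spot than in newswire text.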
Email dataset URL:
http://www.cs.cmu.edu/~einat/datasets.html
Paper: Extracting Personal Names from Email: Applying Named Entity Recognition to Informal Text
Paper download link:
http://page2.dfpan.com/fs/blcaj2921529716f8d4/