Common dictionaries for person-name recognition in natural language processing

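1. Dictionary of common Chinese surnames

This dictionary comes from PanGu Segment (盘古分词), an open-source Chinese word segmenter, which uses it to recognize person names.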

http://pangusegment.codeplex.com/SourceControl/latest#PanGuSegment/PanGu/Dict/ChsName.cs

// Surnames with obvious ambiguity

"王","张","黄","周","徐","胡","高","林","马","于",

"程","傅","曾","叶","余","夏","钟","田","任","方",

"石","熊","白","毛","江","史","候","龙","万","段"

"雷","钱","汤","易","常","武","赖","文","查"

// Surnames without obvious ambiguity

"赵","肖","孙","李","吴","郑","冯","陈"

"褚","卫","蒋","沈","韩","杨","朱","秦"

"尤","许","何","吕","施","桓","孔","曹"

"严","华","金","魏","陶","姜","戚","谢"

"邹","喻","柏","窦","苏","潘","葛","奚",

"范","彭","鲁","韦","昌","俞","袁","酆"

"鲍","唐","费","廉","岑","薛","贺","倪",  

"滕","殷","罗","毕","郝","邬","卞","康",  

"卜","顾","孟","穆","萧","尹","姚","邵",  

"湛","汪","祁","禹","狄","贝","臧","伏",

"戴","宋","茅","庞","纪","舒","屈","祝"

"董","梁","杜","阮","闵","贾","娄","颜",

"郭","邱","骆","蔡","樊","凌","霍","虞"

"柯","昝","卢","柯","缪","宗","丁","贲",

"邓","郁","杭","洪","崔","龚","嵇","邢",

"滑","裴","陆","荣","荀","惠","甄","芮",  

"羿","储","靳","汲","邴","糜","隗","侯"

"宓","蓬","郗","仲","栾","钭","历","戎"

"刘","詹","幸","韶","郜","黎","蓟","溥",

"蒲","邰","鄂","咸","卓","蔺","屠","乔",

"郁","胥","苍","莘","翟","谭","贡","劳"

"冉","郦","雍","璩","桑","桂","濮","扈",

"冀","浦","庄","晏","瞿","阎","慕","茹",  

"习","宦","艾","容","慎","戈","廖","庾",  

"衡","耿","弘","匡","阙","殳","沃","蔚",  

"夔","隆","巩","聂","晁","敖","融","訾",  

"辛","阚","毋","乜","鞠","丰","蒯","荆",  

"竺","盍","单","欧"

// Compound (two-character) surnames

"司马","上官","欧阳","夏侯","诸葛","闻人",  

"东方","赫连","皇甫","尉迟","公羊","澹台",  

"公冶","宗政","濮阳","淳于","单于","太叔",  

"申屠","公孙","仲孙","轩辕","令狐","徐离",  

"宇文","长孙","慕容","司徒","司空","万俟"
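
One way to use the three surname groups is to try compound surnames before single-character ones, so that 欧阳 is not read as the single surname 欧. The sketch below is only an illustration: the sets hold a handful of entries copied from the lists above, and the function name `match_surname` is an assumption made for this example, not PanGu Segment's actual code.

```python
# Hypothetical helper: longest-match surname lookup.
# The sets hold only a few entries from the lists above, for brevity.
AMBIGUOUS_SURNAMES = {"王", "张", "黄", "周", "高", "林", "马", "于"}
UNAMBIGUOUS_SURNAMES = {"赵", "孙", "李", "吴", "郑", "冯", "陈", "欧"}
COMPOUND_SURNAMES = {"司马", "上官", "欧阳", "诸葛"}


def match_surname(text, start=0):
    """Return (surname, kind) found at text[start:], or None.

    Two-character compound surnames are tried first so that
    "欧阳" wins over the single-character surname "欧".
    """
    two = text[start:start + 2]
    if two in COMPOUND_SURNAMES:
        return two, "compound"
    one = text[start:start + 1]
    if one in UNAMBIGUOUS_SURNAMES:
        return one, "unambiguous"
    if one in AMBIGUOUS_SURNAMES:
        return one, "ambiguous"
    return None


print(match_surname("欧阳修"))
print(match_surname("王伟"))
print(match_surname("电脑"))
```

Returning the group along with the surname lets a caller treat ambiguous surnames more cautiously, for example by requiring stronger evidence from the following characters.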


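2. Dictionary of first characters in two-character given names

// This dictionary comes from the ChsDoubleName1.txt dictionary of the open-source PanGu Segment, which uses it to recognize person names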

建,小,,文,志,,玉,丽,永,海,春,金,明,新,德,秀,红,亚,, 三

,雪,俊, 桂, 爱, 美, 世, 正, 庆, 学, 家, 立, 淑, 振, 云, 华, 光, 惠, 兴, 天, 长, 艳, 慧, 利, 宏, 佳, 瑞, 凤, 荣, 秋,

, 嘉, 卫, 燕, 思, 维, 少, 福, 忠, 宝, 子, 成, 月, 洪, 东, 一, 泽, 林, 大, 素, 旭, 宇, 智, 锦, 冬, 玲, 雅, 伯, 翠, 传

, 剑, 安, 树, 良, 中, 梦, 广, 昌, 元, 万, 清, 静, 友, 宗, 兆, 丹, 克, 彩, 绍, 喜, 远, 朝, 敏, 培, 胜, 祖, 先, 菊, 士

, 有, 连, 军, 健, 巧, 耀, 莉, 英, 方, 和, 仁, 孝, 梅, 汉, 兰, 松, 水, 江, 益, 开, 景, 运, 贵, 祥, 青, 芳, 碧, 婷, 龙

, 自, 顺, 双, 书, 生, 义, 跃, 银, 佩, 雨, 保, 贤, 仲, 鸿, 浩, 加, 定, 炳, 飞, 锡, 柏, 发, 超, 道, 怀, 进, 其, 富, 平

, 阳, 吉, 茂, 彦, 诗, 洁, 润, 承, 治, 焕, 如, 君, 增, 善, 希, 根, 应, 勇, 宜, 守, 会, 凯, 育, 湘, 凌, 本, 敬, 博, 延
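
The lists in this post contain stray spaces, leading commas, and empty entries (for example `小,,文`) whose original characters cannot be recovered here. A minimal sketch of loading such text into a set while skipping the empty entries is shown below; the function name `parse_char_dict` is hypothetical, and the sample string is just the start of the list above.

```python
def parse_char_dict(raw):
    """Split comma-separated dictionary text into a set of entries.

    Whitespace around entries is stripped, full-width commas are
    treated like ASCII commas, and empty entries (two commas in a
    row, or a leading/trailing comma) are skipped.
    """
    entries = set()
    for line in raw.splitlines():
        for item in line.replace(",", ",").split(","):
            item = item.strip()
            if item:
                entries.add(item)
    return entries


sample = "建,小,,文,志,,玉,丽\n,雪,俊, 桂, 爱"
first_chars = parse_char_dict(sample)
print(sorted(first_chars))
```

A set is used because the dictionaries are only consulted for membership tests, and it silently absorbs any duplicate entries.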


3. Dictionary of last characters in two-character given names

// This dictionary comes from the ChsDoubleName2.txt dictionary of the open-source PanGu Segment, which uses it to recognize person names

, , 平, 明, 英, 军, 林, 萍, 芳, 玲, 红, 生, 霞, 梅, 文, 荣, 珍, 兰, 娟, 峰, 琴, 云, 辉, 东, 龙, 敏, 伟, 强, 丽, 春, 杰

, 民, 君, 波, 国, 芬, 清, 祥, 斌, 婷, 飞, 良, 忠, 新, 凤, 锋, 成, 勇, 刚, 玉, 元, 宇, 海, 兵, 安, 庆, 涛, 鹏, 亮, 青, 阳,

, 松, 江, 莲, 娜, 兴, 光, 德, 武, 香, 俊, 秀, 慧, 雄, 才, 宏, 群, 琼, 胜, 超, 彬, 莉, 中, 山, 富, 花, 宁, 利, 贵, 福, 发,

, 蓉, 喜, 娥, 昌, 仁, 志, 全, 宝, 权, 美, 琳, 建, 金, 贤, 星, 丹, 根, 和, 珠, 康, 菊, 琪, 坤, 泉, 秋, 静, 佳, 顺, 源, 珊

, 欣, 如, 莹, 章, 浩, 勤, 芹, 容, 友, 芝, 豪, 洁, 鑫, 惠, 洪, 旺, 虎, 远, 妮, 森, 妹, 南, 雯, 奇, 健, 卿, 虹, 娇, 媛, 怡,

, 川, 进, 博, 智, 来, 琦, 学, 聪, 洋, 乐, 年, 翔, 然, 栋, 凯, 颖, 鸣, 丰, 瑞, 奎, 立, 堂, 威, 雪, 鸿, 晶, 桂, 凡, 娣, 先,

, 毅, 雅, 月, 旭, 田, 晖, 方, 恒, 亚, 泽, 风, 银, 高, 贞, 九


4. Dictionary of common characters in one-character given names

// This dictionary comes from the ChsSingleName.txt dictionary of the open-source PanGu Segment, which uses it to recognize person names

, 伟, 勇, 军, 斌, 静, 丽, 涛, 芳, 杰, 萍, 强, 俊, 明, 燕, 磊, 玲, 华, 平, 鹏, 健, 波, 红, 丹, 辉, 超, 艳, 莉, 刚, 娟, 峰

, 亮, 洁, 颖, 琳, 英, 慧, 飞, 霞, 浩, 凯, 宇, 毅, 林, 佳, 云, 莹, 娜, 晶, 洋, 文, 鑫, 欣, 琴, 宁, 琼, 兵, 青, 琦, 翔, 彬

, 阳, 璐, 旭, 蕾, 剑, 虹, 蓉, 建, 倩, 梅, 宏, 威, 博, 君, 力, 龙, 晨, 薇, 雪, 琪, 欢, 荣, 江, 炜, 成, 庆, 冰, 东, 帆, 雷,

, 锐, 进, 海, 凡, 巍, 维, 迪, 媛, 玮, 杨, 群, 瑛, 悦, 春, 瑶, 婧, 兰, 茜, 松, 爽, 立, 瑜, 睿, 晖, 聪, 帅, 瑾, 骏, 雯, 晓

, 勤, 新, 瑞, 岩, 星, 忠, 志, 怡, 坤, 康, 航, 利, 畅, 坚, 雄, 智, 萌, 哲, 岚, 洪, 捷, 珊, 恒, 靖, 清, 扬, 昕, 乐, 武, 玉

, 菲, 锦, 凤, 珍, 晔, 妍, 璇, 胜, 菁, 科, 芬, 露, 越, 彤, 曦, 义, 良, 鸣, 芸, 方, 月, 铭, 光, 震, 冬, 源, 政, 虎, 莎, 彪

, 钢, 凌, 奇, 卫, 彦, 烨, 可, 黎, 川, 淼, 惠, 祥, 然, 三
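
Taken together, the dictionaries above can drive a simple candidate generator: a surname followed either by a first-character/last-character pair or by a single given-name character. The sketch below illustrates that idea under stated assumptions; it is not PanGu Segment's actual algorithm, the sets are small excerpts of the lists above, and `find_name_candidates` is a hypothetical name.

```python
# Small excerpts of the dictionaries above; a real run would load them fully.
SURNAMES = {"王", "李", "张", "陈", "欧阳", "司马"}
DOUBLE_FIRST = {"建", "小", "文", "志", "玉", "丽", "海", "春"}
DOUBLE_LAST = {"平", "明", "英", "军", "林", "萍", "芳", "红"}
SINGLE = {"伟", "勇", "军", "斌", "静", "丽", "涛", "芳"}


def find_name_candidates(text):
    """Return (start, candidate) pairs for spans that look like person names.

    A candidate is a surname followed by a two-character given name
    (first char in DOUBLE_FIRST, last char in DOUBLE_LAST) or, failing
    that, by a one-character given name in SINGLE.
    """
    candidates = []
    i = 0
    while i < len(text):
        surname = None
        for length in (2, 1):  # prefer compound surnames
            piece = text[i:i + length]
            if len(piece) == length and piece in SURNAMES:
                surname = piece
                break
        if surname is None:
            i += 1
            continue
        rest = text[i + len(surname):]
        if len(rest) >= 2 and rest[0] in DOUBLE_FIRST and rest[1] in DOUBLE_LAST:
            name = surname + rest[:2]
        elif len(rest) >= 1 and rest[0] in SINGLE:
            name = surname + rest[0]
        else:
            i += 1
            continue
        candidates.append((i, name))
        i += len(name)  # skip past the matched name
    return candidates


print(find_name_candidates("昨天王建军和李伟去了北京"))
```

Pure dictionary matching like this over-generates, especially for the ambiguous surnames listed in section 1, so in practice its output would be treated as candidates to be checked against context or statistics rather than as final results.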




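Research on automatic computer processing of Chinese information goes back several decades, yet many technical problems remain unsolved, and automatic recognition of Chinese person names is one of them. Like automatic word segmentation of Chinese text, it belongs to the foundational research of Chinese information processing, so its results directly affect deeper work on Chinese information. Because of the characteristics of the Chinese language, automatic processing usually begins by segmenting the text into words (inserting explicit delimiters) and then performs lexical, syntactic, and semantic analysis on the segmented text. During segmentation, however, person names, place names, other proper nouns, and unknown words are mostly split into single-character words. Unless these proper nouns and unknown words can be recognized reliably, they become a serious obstacle to any deeper analysis of the text. The problem of automatic Chinese name recognition arose against this background.

Current techniques for this problem rely mainly on the following kinds of information: the frequency of characters used in names, contextual information [1,2], corpus statistics [2], and part-of-speech information [3]. The method in this paper first analyzes the composition of Chinese person names, the regularities of the characters used in them, and the features of the surrounding text. On that basis it builds two rule sets and applies them to the test text to obtain preliminary recognition results. It then filters those results by probability, using statistics from a large-scale corpus and a suitable threshold, and outputs the final results. In a test on an open corpus of more than 500,000 characters, the system automatically recognized 1781 Chinese person names, with a recognition accuracy above 90% under different filtering thresholds and a recall above 91%.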
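
The probability-filtering step described above can be pictured with a small sketch. The text does not give the paper's scoring formula, so the version below assumes a simple product of per-character probabilities; every number in the tables is invented for illustration, and `name_score` and `filter_candidates` are hypothetical names.

```python
# Illustrative probabilities only; real values would be estimated from a
# large annotated corpus. None of these numbers come from the paper.
SURNAME_PROB = {"王": 0.6, "于": 0.05, "高": 0.1}
GIVEN_PROB = {"伟": 0.5, "建": 0.3, "军": 0.4, "是": 0.001, "兴": 0.2}
DEFAULT_PROB = 0.0001  # fallback for characters without statistics


def name_score(candidate, surname_len=1):
    """Multiply the surname probability by each given-name character's."""
    surname = candidate[:surname_len]
    score = SURNAME_PROB.get(surname, DEFAULT_PROB)
    for ch in candidate[surname_len:]:
        score *= GIVEN_PROB.get(ch, DEFAULT_PROB)
    return score


def filter_candidates(candidates, threshold):
    """Keep only candidates whose score reaches the threshold."""
    return [c for c in candidates if name_score(c) >= threshold]


candidates = ["王伟", "王建军", "于是", "高兴"]
print(filter_candidates(candidates, threshold=0.03))
```

With these made-up values, ordinary words such as 于是 and 高兴 fall below the threshold while 王伟 and 王建军 pass. Raising the threshold trades recall for accuracy, which is presumably why the text reports results under several thresholds. A plain product also penalizes longer names, so a real system might normalize by name length.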