The previous post used Lucene to build two indexes: one mapping ambiguous surface forms to entities, and one over the contexts of ambiguous entities.
What is still missing is an index of the entities themselves; without it, there is nothing to look entities up against!
LingPipe is a natural fit for entity recognition and supports several approaches; see the official site for details (the link is given at the end of this post).
Building the entity dictionary with LingPipe needs little explanation, so here is the code:
// entityDictionaryChunkerFF is a static ExactDictionaryChunker field (LingPipe)
// Requires: com.aliasi.dict.MapDictionary, com.aliasi.dict.DictionaryEntry,
//           com.aliasi.chunk.ExactDictionaryChunker,
//           com.aliasi.tokenizer.IndoEuropeanTokenizerFactory,
//           java.io.BufferedReader, java.io.FileReader
// Index all entities
public static void BuildEntityDictionary() throws Exception
{
    double CHUNK_SCORE = 1.0;
    // String entityPath = "E:/LuceneDocument/long_abstracts_preprocessing_entity(file_contents_examples).txt";
    String entityPath = "E:/LuceneDocument/long_abstracts_preprocessing_entity.txt";
    MapDictionary<String> dictionary = new MapDictionary<String>();
    BufferedReader br = new BufferedReader(new FileReader(entityPath));
    String entity;
    int i = 0;
    while ((entity = br.readLine()) != null)
    {
        i++;
        if (i > 500000) // 4.63 million entities in total; only the first 500,000 are loaded here
        {
            break;
        }
        System.out.println(i + "=>" + entity);
        dictionary.addEntry(new DictionaryEntry<String>(entity, "DBpedia_entity", CHUNK_SCORE));
    }
    br.close();
    entityDictionaryChunkerFF = new ExactDictionaryChunker(dictionary,
            IndoEuropeanTokenizerFactory.INSTANCE,
            false,  // returnAllMatches = false
            false); // caseSensitive = false
    // For the difference between these flags, see
    // http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html
    // The FF (false, false) chunker recognizes "German Empire", but the TF one does not
    System.out.println("dictionary size:\n" + dictionary.size());
}
With the entity index in place, we can move on to entity linking; see the next post.
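As a quick sanity check, the chunker built above can be applied to raw text to pull out entity mentions. The sketch below follows the same `ExactDictionaryChunker` setup, but substitutes a tiny in-memory dictionary for the full DBpedia entity file; the class name `ChunkerDemo` and the sample entries are illustrative, not from the original post:

```java
import com.aliasi.chunk.Chunk;
import com.aliasi.chunk.Chunking;
import com.aliasi.chunk.ExactDictionaryChunker;
import com.aliasi.dict.DictionaryEntry;
import com.aliasi.dict.MapDictionary;
import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;

public class ChunkerDemo {
    public static void main(String[] args) {
        // Tiny in-memory dictionary standing in for the 500,000-entry one above
        MapDictionary<String> dictionary = new MapDictionary<String>();
        dictionary.addEntry(new DictionaryEntry<String>("German Empire", "DBpedia_entity", 1.0));
        dictionary.addEntry(new DictionaryEntry<String>("Berlin", "DBpedia_entity", 1.0));

        // Same flags as entityDictionaryChunkerFF: longest match only, case-insensitive
        ExactDictionaryChunker chunker = new ExactDictionaryChunker(
                dictionary, IndoEuropeanTokenizerFactory.INSTANCE, false, false);

        String text = "Berlin was the capital of the German Empire.";
        Chunking chunking = chunker.chunk(text);
        for (Chunk chunk : chunking.chunkSet()) {
            // Each chunk carries character offsets into the input text plus its type
            String mention = text.substring(chunk.start(), chunk.end());
            System.out.println(mention + " => " + chunk.type());
        }
    }
}
```

Each matched mention is printed with its dictionary type (`DBpedia_entity`), which is exactly the lookup that entity linking starts from.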
References:
[1] Mendes P N, Jakob M, García-Silva A, et al. DBpedia Spotlight: Shedding light on the web of documents[C]// Proceedings of the 7th International Conference on Semantic Systems. ACM, 2011: 1-8.
[2] Han X, Sun L. A generative entity-mention model for linking entities with knowledge base[C]// Proceedings of ACL, 2011: 945-954.
[3] http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html
[4] http://wiki.dbpedia.org/Downloads2014
[5] http://www.oschina.net/p/jieba (jieba Chinese word segmentation)