Problems encountered when building an index with Terrier, and their solutions

Issue 1: java.lang.OutOfMemoryError: Java heap space


The full error output is as follows:

java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
at gnu.trove.TObjectIntHashMap.rehash(TObjectIntHashMap.java:170)
at gnu.trove.THash.postInsertHook(THash.java:359)
at gnu.trove.TObjectIntHashMap.put(TObjectIntHashMap.java:155)
at org.terrier.utility.TermCodes.getCode(TermCodes.java:100)
at org.terrier.structures.indexing.DocumentPostingList.getTermId(DocumentPostingList.java:133)
at org.terrier.structures.indexing.DocumentPostingList$2.execute(DocumentPostingList.java:168)
at org.terrier.structures.indexing.DocumentPostingList$2.execute(DocumentPostingList.java:166)
at gnu.trove.TObjectIntHashMap.forEachEntry(TObjectIntHashMap.java:426)
at org.terrier.structures.indexing.DocumentPostingList.getPostings2(DocumentPostingList.java:165)
at org.terrier.indexing.BasicIndexer.indexDocument(BasicIndexer.java:368)
at org.terrier.indexing.BasicIndexer.createDirectIndex(BasicIndexer.java:261)
at org.terrier.indexing.Indexer.index(Indexer.java:344)
at org.terrier.applications.TRECIndexing.index(TRECIndexing.java:123)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:390)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)
21877.18user 916.34system 6:01:37elapsed 105%CPU (0avgtext+0avgdata 0maxresident)k
45946520inputs+21416016outputs (1major+1978833minor)pagefaults 0swaps


Solution:

I increased the maximum Java heap space to 2 GB by setting TERRIER_HEAP_MEM to 2048M in bin/terrier-env.sh.

With that change, indexing appears to run smoothly.
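
As a minimal sketch of the setting described above: the lines below are what the change might look like inside bin/terrier-env.sh. It assumes the script passes TERRIER_HEAP_MEM to the JVM as its maximum heap size; the post only says that the variable is set there. The echo line is added purely for illustration.

```shell
# Assumption: bin/terrier-env.sh hands this value to the JVM as the maximum heap size.
# 2048M is the value used in this post; pick one that fits the memory on your machine.
TERRIER_HEAP_MEM=2048M
export TERRIER_HEAP_MEM

# Illustration only: show the value that later commands will see.
echo "TERRIER_HEAP_MEM=$TERRIER_HEAP_MEM"
```

The heap limit should stay below the physical memory of the machine, otherwise the JVM may start swapping instead of failing fast.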


Issue 2: (can be summarized as "Key is not unique")

ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-keyfactory: property index.lexicon-keyfactory.class not found
ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]


The full error output is as follows:

INFO - Collection #0 took 183780 seconds to build the runs for 20000000 documents

ERROR - Problem finishing index
java.io.IOException: Key is not unique: 38131,3514
at org.terrier.structures.collections.FSOrderedMapFile$MultiFSOMapWriter.mergeTwo(FSOrderedMapFile.java:908)
at org.terrier.structures.collections.FSOrderedMapFile$MultiFSOMapWriter.close(FSOrderedMapFile.java:861)
at org.terrier.structures.indexing.CompressingMetaIndexBuilder.close(CompressingMetaIndexBuilder.java:259)
at org.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:274)
at org.terrier.indexing.BasicSinglePassIndexer.createDirectIndex(BasicSinglePassIndexer.java:147)
at org.terrier.indexing.Indexer.index(Indexer.java:344)
at org.terrier.applications.TRECIndexing.createSinglePass(TRECIndexing.java:221)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:384)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)
INFO - Optimising structure lexicon
ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-keyfactory: property index.lexicon-keyfactory.class not found
ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]
ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-valuefactory: property index.lexicon-valuefactory.class not found
ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]
A problem occurred: java.lang.NullPointerException
java.lang.NullPointerException
at org.terrier.structures.collections.FSOrderedMapFile.numberOfEntries(FSOrderedMapFile.java:490)
at org.terrier.structures.FSOMapFileLexicon.optimise(FSOMapFileLexicon.java:389)
at org.terrier.structures.indexing.LexiconBuilder.optimise(LexiconBuilder.java:790)
at org.terrier.indexing.BasicIndexer.finishedInvertedIndexBuild(BasicIndexer.java:438)
at org.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:292)
at org.terrier.indexing.BasicSinglePassIndexer.createDirectIndex(BasicSinglePassIndexer.java:147)
at org.terrier.indexing.Indexer.index(Indexer.java:344)
at org.terrier.applications.TRECIndexing.createSinglePass(TRECIndexing.java:221)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:384)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)


Solution:

These errors are caused by building the meta index. The suggested solutions are as follows:


I gave this issue some thought.

* My initial idea was that your indexer.meta.forward.keylens was too small, but this is not the case.

* The error is occurring when building the reverse lookup table (docno -> docid). Will you need this functionality? If not, then you can disable it using indexer.meta.reverse.keys= during indexing.

* Otherwise, can you alter the exception being raised in FSOrderedMapFile to print the value of the key that is causing the collision?
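
A collision in a docno -> docid table could come from two documents sharing the same docno. That is an assumption, not something confirmed in this post. Under that assumption, and assuming a TREC-style collection where each document has a single-line <DOCNO>...</DOCNO> tag, the colliding keys can be listed without patching FSOrderedMapFile. The helper name find_duplicate_docnos is hypothetical, and this is a different technique from changing the exception message.

```shell
# Hypothetical helper: print every DOCNO that occurs more than once
# in the given collection files (or on standard input).
# Assumes each <DOCNO>...</DOCNO> tag sits on a single line.
find_duplicate_docnos() {
    grep -h -o '<DOCNO>[^<]*</DOCNO>' "$@" \
        | sed -e 's/<DOCNO>[[:space:]]*//' -e 's/[[:space:]]*<\/DOCNO>//' \
        | sort \
        | uniq -d
}
```

Calling it as find_duplicate_docnos followed by the collection files prints one duplicated docno per line. Empty output means no exact duplicates were found under the stated assumptions.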


I used the second option, and the problem was resolved.
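
The second option can be sketched as follows. The property name indexer.meta.reverse.keys and its empty value come from the advice above. The file location is an assumption: PROPS is a hypothetical variable that should point at your own Terrier properties file, which is commonly etc/terrier.properties under the Terrier install directory.

```shell
# Hypothetical path: set PROPS to your own Terrier properties file.
PROPS="${PROPS:-./terrier.properties}"

# Make sure the file exists, then drop any earlier setting of the key.
touch "$PROPS"
grep -v '^indexer\.meta\.reverse\.keys=' "$PROPS" > "$PROPS.tmp" || true
mv "$PROPS.tmp" "$PROPS"

# An empty value means no reverse (docno -> docid) lookup is built.
echo 'indexer.meta.reverse.keys=' >> "$PROPS"
```

The property has to be in place during indexing, so the index needs to be rebuilt after the change. With the reverse lookup disabled, looking up a docid from a docno is no longer available.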


