Original: Nutch 2.3.1 crawl marker flow
crawlstatus:
STATUS_UNFETCHED = 0x01; // Page was not fetched yet
STATUS_FETCHED = 0x02; // Page was successfully fetched
STATUS_GONE = 0x03; // Page no longer exists
ST
2016-11-08 16:54:53 850
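When reading status bytes back out of the store, a small lookup helps with debugging. Below is a minimal sketch that covers only the three constants visible in the excerpt above; the class name `CrawlStatusNames` and the `name` helper are hypothetical and are not part of Nutch.

```java
// Hypothetical helper, not Nutch code: maps the status bytes quoted in
// the excerpt to readable names. Only the three visible constants are covered.
class CrawlStatusNames {
    static final byte STATUS_UNFETCHED = 0x01; // Page was not fetched yet
    static final byte STATUS_FETCHED = 0x02;   // Page was successfully fetched
    static final byte STATUS_GONE = 0x03;      // Page no longer exists

    static String name(byte status) {
        switch (status) {
            case STATUS_UNFETCHED:
                return "unfetched";
            case STATUS_FETCHED:
                return "fetched";
            case STATUS_GONE:
                return "gone";
            default:
                // Any code not listed in the excerpt is reported as unknown.
                return "unknown(0x" + String.format("%02x", status) + ")";
        }
    }
}
```

Calling `CrawlStatusNames.name((byte) 0x02)` returns `"fetched"`; any status not listed in the excerpt falls through to the `unknown(...)` branch.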
Original: Nutch 2.3.1: "meta_keywords longer than the max length 32766" when building a Solr 6 index
There are two fixes. (1) In the managed schema, set indexed="false" on the meta_* fields. (2) Modify MetaTagsParser.java in the Nutch source as follows: private void addIndexedMetatags(Map<CharSequence, ByteBuffer> metadata, String metatag, String value) { //ad
2016-11-03 21:41:54 1002
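The excerpt cuts off before the actual change to `addIndexedMetatags`, so the author's fix is not visible here. One plausible approach is to cap the value's UTF-8 size before it is stored. This is a minimal sketch under that assumption; `MetaValueTruncator` and `truncateUtf8` are hypothetical names, and the limit of 32766 is taken from the error message in the title.

```java
// Hypothetical helper, not the author's patch: trims a string so that its
// UTF-8 encoding fits in maxBytes without splitting a code point.
class MetaValueTruncator {
    // Limit quoted in the Solr error message.
    static final int MAX_TERM_BYTES = 32766;

    static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            // Upper bound on the UTF-8 length of this code point.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }
}
```

A call such as `MetaValueTruncator.truncateUtf8(value, MetaValueTruncator.MAX_TERM_BYTES)` could be applied to the meta tag value before it is wrapped in a `ByteBuffer`. Truncating loses the tail of very long keyword lists, which is usually acceptable for indexing but is a trade-off compared with disabling indexing for the field.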
Original: Nutch 2.3.1: NullPointerException crash in SolrDeleteDuplicates.java during deduplication
Modify the source as follows: @Override public boolean nextKeyValue() throws IOException, InterruptedException { while (true) { if (currentDoc >= numDocs) { return false;
2016-11-02 15:56:24 583
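The excerpt shows only the start of the patched `nextKeyValue()`: a `while (true)` loop with a bounds check. A loop of that shape typically skips records that would otherwise trigger the null dereference. The sketch below illustrates that pattern on plain Java collections, without Solr or Hadoop; `NullSkippingReader`, `requiredField`, and `getCurrent` are hypothetical, and which field is actually null in the original crash is not visible in the excerpt.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record reader: advances past documents
// that are null or lack a required field instead of dereferencing them.
class NullSkippingReader {
    private final List<Map<String, Object>> docs;
    private final String requiredField;
    private int currentDoc = 0;
    private Map<String, Object> current;

    NullSkippingReader(List<Map<String, Object>> docs, String requiredField) {
        this.docs = docs;
        this.requiredField = requiredField;
    }

    boolean nextKeyValue() {
        while (true) {
            if (currentDoc >= docs.size()) {
                return false; // no more documents
            }
            Map<String, Object> doc = docs.get(currentDoc++);
            if (doc == null || doc.get(requiredField) == null) {
                continue; // skip the incomplete document and keep looping
            }
            current = doc;
            return true;
        }
    }

    Map<String, Object> getCurrent() {
        return current;
    }
}
```

Skipping silently keeps the job alive, but counting or logging the skipped documents would make it easier to see how much data is affected.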
Original: Nutch 2.3.1: a malformed URL crashes the update job
The bad URL was probably extracted from malformed HTML. Wrap the body of map() in DbUpdateMapper.java in a try/catch: @Override public void map(String key, WebPage page, Context context) throws IOException, InterruptedException { if (Mark.
2016-11-01 15:21:34 778
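The idea of catching the failure per record, so one bad URL does not abort the whole job, can be sketched outside Hadoop as follows. This is an illustration only, not the DbUpdateMapper code: `SafeUrlMapper`, `mapAll`, and `mapOne` are hypothetical names, and it assumes the failure surfaces as a `MalformedURLException`.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: process each URL independently and skip
// the ones that cannot be parsed instead of letting the exception escape.
class SafeUrlMapper {
    int skipped = 0;

    List<String> mapAll(List<String> urls) {
        List<String> hosts = new ArrayList<>();
        for (String url : urls) {
            try {
                hosts.add(mapOne(url));
            } catch (MalformedURLException e) {
                // Count and move on; a real job would likely log the URL too.
                skipped++;
            }
        }
        return hosts;
    }

    // Stands in for the per-record work that may fail on a bad URL.
    String mapOne(String url) throws MalformedURLException {
        return new URL(url).getHost();
    }
}
```

Catching only the specific exception keeps genuine I/O failures visible; a broader `catch (Exception e)` would also hide bugs unrelated to bad input.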
Original: Nutch 2.3.1: nutch-site.xml configuration
<configuration> <property> <name>storage.data.store.class</name> <value>org.apache.gora.mongodb.store.MongoStore</value> </property> <property> <name>http.agent.name</name> <value>User-
2016-11-01 10:54:40 984