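Problem: Nutch logs "Content of size 94218 was truncated to 65536".
Solution: add the file.content.limit and http.content.limit properties to nutch-site.xml and set them to -1 (no limit) in place of the 65536 default,
then add the following to MySQL's my.ini: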
[mysqldump]
quick
max_allowed_packet=20M
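For reference, the nutch-site.xml change described above might look like the fragment below. It is a minimal sketch that assumes the two properties sit inside the file's existing <configuration> element; -1 disables truncation for the file and http protocols.

```xml
<!-- nutch-site.xml: lift the content size limit so pages are no longer truncated -->
<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>
<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>
```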
Problem:
Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local177967844_0002
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
Investigation showed that a null value was passed to utf8, and the log reports a NullPointerException.
Solution: set the following property:
<property><name>generate.batch.id</name><value>*</value></property>
Problem: the batchId column is missing.
Solution: add a column batchId varchar(767) default null to the webpage table.
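The column can be added with a statement along these lines. This is a sketch assuming MySQL and that the table is named webpage exactly as above; adjust the name if the schema differs.

```sql
-- Add the missing batchId column to the webpage table
ALTER TABLE webpage ADD COLUMN batchId VARCHAR(767) DEFAULT NULL;
```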