Fixing the "Expecting a line not the end of stream" exception when deploying Nutch 1.2 in MyEclipse on Windows 7

When I deployed the Nutch 1.2 source code through MyEclipse on Windows 7, the following exception was reported:

2011-10-28 00:09:37,784 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001
java.io.IOException: Expecting a line not the end of stream
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:109)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2011-10-28 00:09:38,174 INFO  mapred.JobClient (JobClient.java:monitorAndPrintJob(1288)) -  map 0% reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
2011-10-28 00:09:38,174 INFO  mapred.JobClient (JobClient.java:monitorAndPrintJob(1343)) - Job complete: job_local_0001
2011-10-28 00:09:38,174 INFO  mapred.JobClient (Counters.java:log(514)) - Counters: 0
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

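I searched online for a long time. Some posts said to change the Cygwin language environment, others said it was a permissions problem, but none of those suggestions worked when I tried them, so I had to trace the problem myself. The first step was to find the method that throws the exception:

The parseExecResult method in the DF class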

  protected void parseExecResult(BufferedReader lines) throws IOException {
    lines.readLine();                         // skip headings
  
    String line = lines.readLine();
    if (line == null) {
      throw new IOException( "Expecting a line not the end of stream" );
    }
    StringTokenizer tokens =
      new StringTokenizer(line, " \t\n\r\f%");
    
    this.filesystem = tokens.nextToken();
    if (!tokens.hasMoreTokens()) {            // for long filesystem name
      line = lines.readLine();
      if (line == null) {
        throw new IOException( "Expecting a line not the end of stream" ); // this is line 105
      }
      tokens = new StringTokenizer(line, " \t\n\r\f%");
    }
    this.capacity = Long.parseLong(tokens.nextToken()) * 1024;
    this.used = Long.parseLong(tokens.nextToken()) * 1024;
    this.available = Long.parseLong(tokens.nextToken()) * 1024;
    this.percentUsed = Integer.parseInt(tokens.nextToken());
    this.mount = tokens.nextToken();
  }

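Printing the value read at line 103 shows that something is read, but it is garbled and the lines are misaligned: the data from the second line ends up in the first line, which leads to the error at line 105.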
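To make the failure mode concrete, here is a minimal sketch that mirrors the control flow of the parseExecResult excerpt above and feeds it two inputs. The class name DfParseSketch, the helper parseMount, and both sample df outputs are made up for illustration; they are not taken from Hadoop or from my actual run. It only shows one way the exception can arise: when the line after the heading yields a single token and no further line follows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;

public class DfParseSketch {
    // Follows the same steps as the parseExecResult excerpt; returns the mount point.
    public static String parseMount(String dfOutput) throws IOException {
        BufferedReader lines = new BufferedReader(new StringReader(dfOutput));
        lines.readLine(); // skip headings

        String line = lines.readLine();
        if (line == null) {
            throw new IOException("Expecting a line not the end of stream");
        }
        StringTokenizer tokens = new StringTokenizer(line, " \t\n\r\f%");
        tokens.nextToken(); // filesystem
        if (!tokens.hasMoreTokens()) { // treated as a long filesystem name
            line = lines.readLine();
            if (line == null) {
                throw new IOException("Expecting a line not the end of stream");
            }
            tokens = new StringTokenizer(line, " \t\n\r\f%");
        }
        tokens.nextToken(); // capacity
        tokens.nextToken(); // used
        tokens.nextToken(); // available
        tokens.nextToken(); // percent used
        return tokens.nextToken(); // mount
    }

    public static void main(String[] args) {
        String heading = "Filesystem 1K-blocks Used Available Use% Mounted on\n";
        // Hypothetical well-formed output: six fields on the data line.
        String wellFormed = heading + "C: 100 40 60 40% /cygdrive/c\n";
        // Hypothetical damaged output: the data line collapses into one token.
        String damaged = heading + "????\n";
        for (String sample : new String[] {wellFormed, damaged}) {
            try {
                System.out.println("mount = " + parseMount(sample));
            } catch (IOException e) {
                System.out.println("failed: " + e.getMessage());
            }
        }
    }
}
```

With the damaged sample, the single token is taken for a long filesystem name, the parser asks for one more line, and the stream is already at its end.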

I also applied the MyEclipse encoding settings described in the post at http://hi.baidu.com/amdkings/blog/item/b589a5f56c1ddae17609d78f.html, but that did not help either. Tracing further back, the error is thrown at

at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)

that is, the runCommand method of the Shell class, which calls

 parseExecResult(inReader); // parse the output

In that method, the inReader variable is declared and initialized as follows:

 BufferedReader inReader = new BufferedReader(new InputStreamReader(process.getInputStream()));

The cause is clearly that inReader is created without specifying a charset. Setting the charset looks like this:

BufferedReader inReader = new BufferedReader(new InputStreamReader(process.getInputStream(), "utf-8"));

After this change the program runs again and proceeds correctly past this point.
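The effect of passing an explicit charset to InputStreamReader can be seen in isolation with a small sketch. The class name CharsetReadSketch and the helper firstLine are hypothetical, and a ByteArrayInputStream stands in for process.getInputStream(); the sample bytes are an invented localized heading, not captured df output. It uses StandardCharsets, which assumes Java 7 or later; on older JVMs the charset name string "utf-8" as in the fix above does the same job.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetReadSketch {
    // Reads the first line of a byte stream, decoding it with the given charset.
    public static String firstLine(InputStream in, Charset charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // "\u6587\u4ef6\u7cfb\u7edf" is a four-character Chinese heading, written
        // with escapes so the source file's own encoding does not matter.
        String heading = "\u6587\u4ef6\u7cfb\u7edf";
        byte[] raw = heading.getBytes(StandardCharsets.UTF_8);

        String matching = firstLine(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);
        String mismatched = firstLine(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1);

        System.out.println("decoded with the matching charset equals original: "
                + heading.equals(matching));
        System.out.println("decoded with a different charset equals original: "
                + heading.equals(mismatched));
    }
}
```

The single-argument InputStreamReader constructor uses the JVM's default charset, so the result depends on how the JVM was started; naming the charset removes that dependency.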


From the analysis above:

Temporary workaround:

Set the encoding on the inReader variable in Shell.java, that is,

BufferedReader inReader = new BufferedReader(new InputStreamReader(process.getInputStream(), "utf-8"));

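A better approach to aim for:

Setting an English environment in Cygwin with export LANG="en.UTF-8" does make df print in English, but it has no effect inside MyEclipse, where the output is still garbled Chinese. It may be worth finding some way, a download or otherwise, to switch Cygwin to an English environment that MyEclipse also picks up, so that the Nutch 1.2 source runs without any modification.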
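For reference, Java can hand locale variables to a child process directly through ProcessBuilder.environment(). The sketch below is only an illustration of that mechanism: the class name EnglishLocaleDf and the helper withCLocale are hypothetical, the choice of LANG=C and LC_ALL=C is my assumption, and I have not verified that it changes the df output in this MyEclipse and Cygwin setup. It also still means touching the code that launches the command, so it does not by itself achieve the goal of leaving the Nutch 1.2 source unmodified.

```java
import java.util.Map;

public class EnglishLocaleDf {
    // Builds (but does not start) a command whose child process would see
    // LANG=C and LC_ALL=C in its environment.
    public static ProcessBuilder withCLocale(String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        Map<String, String> env = builder.environment();
        env.put("LANG", "C");   // assumed value, not taken from the Nutch setup
        env.put("LC_ALL", "C"); // assumed value, not taken from the Nutch setup
        return builder;
    }

    public static void main(String[] args) {
        ProcessBuilder builder = withCLocale("df", "-k", ".");
        System.out.println(builder.command()
                + " LANG=" + builder.environment().get("LANG")
                + " LC_ALL=" + builder.environment().get("LC_ALL"));
    }
}
```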

