Debugging the HDFS Source Code Remotely with IntelliJ: Lessons Learned

I have been reading the Hadoop source code recently, and sometimes the only way to trace what is really going on is to attach a debugger.
Setting up the debugging environment turned up quite a few problems, which I eventually solved one by one. I'm sharing them here for fellow readers.

NoClassDefFoundError

The first problem was baffling: a class could not be found at runtime, even though nothing was flagged red in the editor. I spent a long time suspecting the environment setup.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:195)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:123)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Path
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 3 more

It turned out that pom.xml contained an extra "provided" scope line. The dependency had been copied straight from the Maven repository page, and since the code compiled without errors I never noticed it. Commenting it out fixed the problem: with provided scope, Maven expects the runtime environment to supply the jar, so it is left off the classpath when running from the IDE.

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.0</version>
  <!--<scope>provided</scope>-->
</dependency>

Could not locate executable null\bin\winutils.exe in the Hadoop binaries

Then came the second problem:

18/05/17 13:54:44 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Solution: set the hadoop.home.dir system property in code, download winutils.exe, and put it in the bin directory under that Hadoop home:

System.setProperty("hadoop.home.dir", "D:\\aws\\hadoop-2.6.0");
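One caveat: org.apache.hadoop.util.Shell resolves hadoop.home.dir in a static initializer, so the property must be set before the first Hadoop class is touched. A minimal sketch of the ordering (the path is this article's example location; substitute your own):

```java
public class WinutilsSetup {
    public static void main(String[] args) {
        // Must run before any Hadoop class is loaded: org.apache.hadoop.util.Shell
        // reads hadoop.home.dir in a static initializer, so setting it later has
        // no effect. winutils.exe must sit in <hadoop.home.dir>\bin.
        System.setProperty("hadoop.home.dir", "D:\\aws\\hadoop-2.6.0");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```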

No FileSystem for scheme: hdfs

The implementation for the hdfs file system scheme cannot be found:

java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)

The code compiles without errors, but a dependency is missing at runtime.
The fix is to add the hadoop-hdfs dependency to pom.xml:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.6.0</version>
</dependency>
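If the error persists even with hadoop-hdfs on the classpath — this typically happens when building a shaded/fat jar, where the META-INF/services file that registers the hdfs scheme gets overwritten — an alternative is to map the scheme explicitly in core-site.xml (not needed in the plain IDE setup described here):

```xml
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
```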

AccessControlException: Permission denied

org.apache.hadoop.security.AccessControlException: Permission denied:
user=XXX, access=WRITE, inode="/user":root:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)

The Windows 7 client's OS user does not match the HDFS user.
The fix is to set the HADOOP_USER_NAME system property in code:

System.setProperty("HADOOP_USER_NAME", "root");

java.net.UnknownHostException

Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: 1.txt

The problem is the file URI. After "hdfs:" you must either write three slashes directly (empty authority, so the default namenode from core-site.xml is used), or write "//ip:port" followed by the path. Never write "hdfs://1.txt" — with only two slashes, "1.txt" is parsed as a host name.
The correct forms are:

 fs = FileSystem.get(URI.create("hdfs:///1.txt"), conf);
 // or
 fs = FileSystem.get(URI.create("hdfs://10.205.84.14:9000/1.txt"), conf);
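To see why "hdfs://1.txt" fails, it helps to look at how java.net.URI parses the three forms (plain JDK, no Hadoop needed):

```java
import java.net.URI;

public class HdfsUriDemo {
    public static void main(String[] args) {
        // Two slashes: "1.txt" lands in the authority position and is treated
        // as a host name -- hence the UnknownHostException.
        URI bad = URI.create("hdfs://1.txt");
        System.out.println(bad.getAuthority() + " | " + bad.getPath());   // authority "1.txt", empty path

        // Three slashes: empty authority, "/1.txt" is the path; the default
        // namenode from core-site.xml (fs.defaultFS) is used.
        URI ok = URI.create("hdfs:///1.txt");
        System.out.println(ok.getAuthority() + " | " + ok.getPath());     // null authority, path "/1.txt"

        // Explicit namenode address and port in the authority.
        URI full = URI.create("hdfs://10.205.84.14:9000/1.txt");
        System.out.println(full.getAuthority() + " | " + full.getPath());
    }
}
```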

Summary

  • Add a resources folder to the client project containing two files, "core-site.xml" and "log4j.properties", copied from the Hadoop cluster so the configuration stays consistent
  • Set up the pom file properly — mainly "hadoop-common" and "hadoop-hdfs"
  • Set the required properties in the client code (they can also be supplied by other means, e.g. real environment variables)
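For reference, a minimal client-side core-site.xml might look like the sketch below. The namenode address is the one used in this article — adjust it to your cluster, or better, copy the file from the cluster itself as noted above:

```xml
<configuration>
  <property>
    <!-- Default file system, used when the URI has no authority, e.g. hdfs:///1.txt -->
    <name>fs.defaultFS</name>
    <value>hdfs://10.205.84.14:9000</value>
  </property>
</configuration>
```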

Sample client code

package test.test;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;

/**
 * Created by shen.xiangxiang on 2018/3/30.
 */
public class TestHdfs {

    public static void main(String[] args) {

        // Set both properties before any Hadoop class is used
        System.setProperty("hadoop.home.dir", "D:\\aws\\hadoop-2.6.0");
        System.setProperty("HADOOP_USER_NAME", "root");

        Configuration conf = new Configuration();
        FileSystem fs = null;
        try {
            fs = FileSystem.get(URI.create("hdfs:///1.txt"), conf);
            Path path = new Path("3.txt");
            FSDataOutputStream out = fs.create(path);   // create the file
            out.write("hello".getBytes("UTF-8"));
            out.writeUTF("da jia hao,cai shi zhen de hao!");
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

Sample pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>test.test</groupId>
  <artifactId>test_debug_hadoop</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>testtest</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>



  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.0</version>
      <!--<scope>provided</scope>-->
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.6.0</version>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>