Preparation
Hadoop cluster
Using the Hadoop environment set up in the previous post, start the following Hadoop services:
./sbin/start-dfs.sh
./sbin/start-yarn.sh
./sbin/mr-jobhistory-daemon.sh start historyserver
Use jps to check that the services started. Note: start-all.sh is deprecated and no longer recommended.
[root@hadoop01 hadoop-2.6.0]# jps
1941 JobHistoryServer
1665 ResourceManager
1355 NameNode
1977 Jps
1497 SecondaryNameNode
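The manual check above can also be scripted. The following is a minimal sketch, not part of the original setup: the class name JpsCheck and the list of expected daemons are assumptions for illustration. It takes captured jps output as text and reports which expected daemons are absent. The expected list here mirrors only what the output above shows on hadoop01; on worker nodes you would presumably look for DataNode and NodeManager instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: checks captured `jps` output for expected daemon names.
final class JpsCheck {

    /** Returns the expected daemon names that do not appear in the given jps output. */
    static List<String> missingDaemons(String jpsOutput, List<String> expected) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            // Each jps line has the form "<pid> <main class name>".
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                running.add(parts[1]);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String name : expected) {
            if (!running.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
                "1941 JobHistoryServer",
                "1665 ResourceManager",
                "1355 NameNode",
                "1977 Jps",
                "1497 SecondaryNameNode");
        // Assumed list of daemons for the master node in this setup.
        List<String> expected = Arrays.asList(
                "NameNode", "SecondaryNameNode", "ResourceManager", "JobHistoryServer");
        System.out.println("Missing daemons: " + missingDaemons(output, expected));
    }
}
```

An empty list means every daemon in the expected list was found.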
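IDEA + Maven
Install IDEA and configure Maven.
Windows system account
(The same as the account that runs Hadoop on Linux, e.g. root.)
On Windows, create a new account whose user name is root (it must match the account that runs Hadoop, e.g. root or hadoop).
After creating the account, log out of its session; there is no need to run under that account.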
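As a possible alternative to a dedicated Windows account, the client-side user name can be supplied explicitly. This is a sketch under an assumption: with simple (non-Kerberos) authentication, the Hadoop 2.x client is understood to look for HADOOP_USER_NAME first as an environment variable and then as a JVM system property before falling back to the OS login name. The class name HadoopUser is hypothetical, and the property has to be set before the first FileSystem call.

```java
// Hypothetical helper: supplies the HDFS user name without changing the OS account.
final class HadoopUser {

    static final String USER_KEY = "HADOOP_USER_NAME";

    /**
     * Sets the HADOOP_USER_NAME system property unless a value is already
     * present in the environment or in the system properties.
     * Returns the user name that will be in effect.
     */
    static String useHadoopUser(String user) {
        String fromEnv = System.getenv(USER_KEY);
        if (fromEnv != null) {
            return fromEnv; // the environment variable is assumed to take precedence
        }
        if (System.getProperty(USER_KEY) == null) {
            System.setProperty(USER_KEY, user);
        }
        return System.getProperty(USER_KEY);
    }

    public static void main(String[] args) {
        // Call this before any HDFS access, e.g. as the first line of main.
        System.out.println("HDFS user: " + useHadoopUser("root"));
    }
}
```

The same effect can be had by passing -DHADOOP_USER_NAME=root as a VM option in the IDEA run configuration.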
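Setup
Hadoop
If access is denied while debugging HDFS operations, and this is a test environment, try the following:
1. With this change, calls to HDFS no longer require the client user name to match the user that runs Hadoop, but dfs.permissions must be set to false in hdfs-site.xml (in Hadoop 2.x this key is a deprecated alias of dfs.permissions.enabled):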
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop01:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/hadoop-2.6.0/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/hadoop-2.6.0/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
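To confirm that the edited file really carries the intended value, the property can be read back with the JDK's XML parser. This is a minimal sketch that uses only the standard library; the class name SiteXmlReader is hypothetical, and the XML is passed in as a string so the example does not depend on where hdfs-site.xml lives on your machine.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: looks up one <property> in a Hadoop *-site.xml document.
final class SiteXmlReader {

    /** Returns the value of the named property, or null if it is not present. */
    static String property(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (name.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    // Text of the first descendant element with the given tag, trimmed.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>dfs.permissions</name><value>false</value>"
                + "</property></configuration>";
        System.out.println("dfs.permissions = " + property(xml, "dfs.permissions"));
    }
}
```

Disabling permission checks is only reasonable on a test cluster, as noted above.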
IDEA project
Create a new Maven project named hadoop.
1. POM dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
</dependencies>
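2. Create a test class named Test
The input and output files were already placed in HDFS by the earlier wordcount run; here they are accessed through the HDFS API for debugging.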
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
/**
 * Created with j360 -> me.h360.hdfs.
 * User: min_xu
 * Date: 2015/4/14
 * Time: 9:05
 * Description: exercises basic file operations on HDFS
 */
public class Test {
    public static void main(String[] args) throws Exception {
        // Address of the HDFS NameNode
        String uri = "hdfs://192.168.145.128:9000/";
        Configuration config = new Configuration();
        // try-with-resources closes the FileSystem even if a call fails
        try (FileSystem fs = FileSystem.get(URI.create(uri), config)) {
            // List all files and directories under /tmp/input on HDFS
            FileStatus[] statuses = fs.listStatus(new Path("/tmp/input"));
            for (FileStatus status : statuses) {
                System.out.println(status);
            }
            // Create a file under /tmp/input on HDFS and write one line of text
            Path file = new Path("/tmp/input/test.log");
            try (FSDataOutputStream os = fs.create(file)) {
                os.write("Hello World!".getBytes(StandardCharsets.UTF_8));
                os.flush();
            }
            // Print the contents of that file
            try (InputStream is = fs.open(file)) {
                // close=false: the input is closed by try-with-resources,
                // and System.out must stay open
                IOUtils.copyBytes(is, System.out, 1024, false);
            }
        }
    }
}
Debugging
Run the main method:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
FileStatus{path=hdfs://192.168.145.128:9000/tmp/input/f1; isDirectory=false; length=20; replication=2; blocksize=134217728; modification_time=1428671368587; access_time=1428998938744; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}
FileStatus{path=hdfs://192.168.145.128:9000/tmp/input/f2; isDirectory=false; length=25; replication=2; blocksize=134217728; modification_time=1428671368663; access_time=1428998938711; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}
FileStatus{path=hdfs://192.168.145.128:9000/tmp/input/test.log; isDirectory=false; length=12; replication=3; blocksize=134217728; modification_time=1428991073630; access_time=1428998938072; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}
Hello World!
Process finished with exit code 0
The files created earlier in HDFS are listed, followed by the contents of test.log.
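Some fields in the FileStatus lines are easier to read after conversion: modification_time and access_time are milliseconds since the Unix epoch, and blocksize is in bytes. The sketch below, with the hypothetical class name FileStatusFields, converts the values printed for /tmp/input/f1 using only the JDK.

```java
import java.time.Instant;

// Hypothetical helper: converts raw FileStatus field values to readable form.
final class FileStatusFields {

    /** Formats an epoch-millisecond timestamp as an ISO-8601 UTC instant. */
    static String toUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    /** Converts a size in bytes to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Values copied from the FileStatus line for /tmp/input/f1 above.
        System.out.println("modified:  " + toUtc(1428671368587L)); // 2015-04-10T13:09:28.587Z
        System.out.println("blocksize: " + toMiB(134217728L) + " MiB"); // 128 MiB
    }
}
```

Note that test.log reports replication=3 while f1 and f2 report 2. A likely explanation is that the replication factor is chosen by the client when a file is created, and the Configuration built in the test class does not load the cluster's hdfs-site.xml, so the default of 3 applies.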