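This post briefly shows how to read a file from HDFS and print its contents.
Making Java recognize URLs with the hdfs scheme takes a little extra work: call URL.setURLStreamHandlerFactory() with an instance of FsUrlStreamHandlerFactory. This method can be called only once per JVM, so it is usually executed in a static block (as in the URLCat class below). However, if some other part of your program (for example, a third-party component whose source code you cannot modify) has already called this method, you cannot read data through URL in this way.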
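import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    static {
        // Register the hdfs:// handler; allowed only once per JVM.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            // args[0] is the URL to read, e.g. hdfs://localhost/user/root/sample.txt
            in = new URL(args[0]).openStream();
            // Copy to standard output with a 4096-byte buffer; do not close the streams here.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            // Close the input stream whether or not the copy succeeded.
            IOUtils.closeStream(in);
        }
    }
}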
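The once-per-JVM restriction described at the start of this post can be observed without Hadoop. The following is a minimal sketch using only the JDK; the class OncePerJvmDemo and its NoOpFactory are hypothetical names made up for this illustration and are not part of Hadoop. It relies on the JDK behavior that a second call to URL.setURLStreamHandlerFactory() throws java.lang.Error.

```java
import java.net.URL;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

public class OncePerJvmDemo {
    // Hypothetical factory that handles no protocol itself; returning null
    // lets URL fall back to the JDK's built-in handlers.
    static class NoOpFactory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            return null;
        }
    }

    // Tries to install a factory; returns false if one is already installed.
    static boolean tryInstall() {
        try {
            URL.setURLStreamHandlerFactory(new NoOpFactory());
            return true;
        } catch (Error e) {
            // The JDK signals "factory already defined" with java.lang.Error.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("first call installed:  " + tryInstall());
        System.out.println("second call installed: " + tryInstall());
    }
}
```

In a fresh JVM where nothing else has set a factory, the first call should succeed and the second should be rejected. This is the situation URLCat would run into if a third-party component had already registered its own factory.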
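The URLCat example above uses two static methods of Hadoop's IOUtils class:
1) IOUtils.copyBytes(): in is the source of the copy, System.out is the destination (that is, the data is copied to standard output), 4096 is the size of the buffer used for copying, and false means that neither the source nor the destination is closed when the copy finishes (System.out does not need to be closed, and in is closed in the finally block).
2) IOUtils.closeStream(in): closes the in stream.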
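To make the four arguments of copyBytes concrete, here is a rough plain-JDK sketch of a buffered copy loop with the same parameter order. It is not Hadoop's actual implementation; the class CopyBytesSketch and its methods are hypothetical stand-ins written for this illustration, and the in-memory streams replace the HDFS stream and System.out so the example runs without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CopyBytesSketch {
    // Hypothetical stand-in with the same parameter order as
    // IOUtils.copyBytes(in, out, buffSize, close).
    static void copyBytes(InputStream in, OutputStream out, int buffSize, boolean close)
            throws IOException {
        try {
            byte[] buf = new byte[buffSize];
            int n;
            // read() returns -1 at end of stream.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.flush();
        } finally {
            // With close == false the caller stays responsible for both streams.
            if (close) {
                in.close();
                out.close();
            }
        }
    }

    // Copies a string through the loop with a deliberately tiny buffer.
    static String roundTrip(String text) {
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyBytes(in, out, 4, false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello hdfs"));
    }
}
```

Passing false for the last argument matches the URLCat usage: the copy leaves both streams open, so the caller can keep writing to the destination and close the source itself.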
The steps to compile and run the program are as follows:
1. Create a directory named class and compile into it
[root@centos7 class]# pwd
/root/hadoop-2.9.2/class
[root@centos7 java]# javac -classpath /root/hadoop-2.9.2/share/hadoop/common/hadoop-common-2.9.2.jar -d ${HADOOP_HOME}/class URLCat.java
2. Package the class into a jar
[root@centos7 class]# jar -cvf URLCat.jar .
added manifest
adding: URLCat.class(in = 1017) (out= 596)(deflated 41%)
3. Run
[root@centos7 class]# export HADOOP_CLASSPATH=/root/hadoop-2.9.2/class
[root@centos7 class]# hadoop URLCat hdfs://localhost/user/root/sample.txt
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
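Note: when the hadoop script runs, it adds the entries in the HADOOP_CLASSPATH environment variable to its classpath search path. Check this variable before running the command, and set it with export HADOOP_CLASSPATH=XXX so that it points to the directory containing URLCat.class.
Alternatively, run the class from the jar: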
[root@centos7 class]# hadoop jar URLCat.jar URLCat hdfs://localhost/user/root/sample.txt
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999