I. Uploading a file from the command line
1. Check the size of the file to upload
[hadoop@cloud01 ~]$ ll -h jdk-7u65-linux-i586.tar.gz
-rw-rw-r--. 1 hadoop hadoop 137M Jul 18 2014 jdk-7u65-linux-i586.tar.gz
2. Upload the file
[hadoop@cloud01 ~]$ hadoop fs -put jdk-7u65-linux-i586.tar.gz hdfs://cloud01:9000/
3. Inspect the file on the DataNode after the upload
[hadoop@cloud01 finalized]$ pwd
/home/hadoop/app/hadoop-2.4.1/tmp/dfs/data/current/BP-135889517-192.168.2.31-1424411868365/current/finalized
[hadoop@cloud01 finalized]$ ll -h
total 139M
-rw-rw-r--. 1 hadoop hadoop 128M Feb 22 04:15 blk_1073741837
-rw-rw-r--. 1 hadoop hadoop 1.1M Feb 22 04:15 blk_1073741837_1013.meta
-rw-rw-r--. 1 hadoop hadoop 9.0M Feb 22 04:15 blk_1073741838
-rw-rw-r--. 1 hadoop hadoop 72K Feb 22 04:15 blk_1073741838_1014.meta
What the DataNode storage shows: HDFS splits files into 128 MB blocks by default. Since jdk-7u65-linux-i586.tar.gz is 137 MB, it is stored as two blocks: blk_1073741837 + blk_1073741838 = 128 MB + 9 MB = 137 MB. (The 139M total reported by ll also counts the two .meta checksum files.)
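The split can be checked with a few lines of arithmetic. The sketch below is plain Java, independent of Hadoop; the file length 143588167 bytes comes from the LocatedBlocks output later in this article, and 134217728 bytes is the default 128 MB block size.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSplit {
    // Returns the size of each HDFS block for a file of the given length.
    static List<Long> blockSizes(long fileLength, long blockSize) {
        List<Long> sizes = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            sizes.add(Math.min(blockSize, fileLength - offset));
        }
        return sizes;
    }

    public static void main(String[] args) {
        long fileLength = 143588167L;  // jdk-7u65-linux-i586.tar.gz, ~137 MB
        long blockSize  = 134217728L;  // default dfs.blocksize: 128 MB
        // Two blocks: 134217728 bytes (128 MB) and 9370439 bytes (~9 MB),
        // matching blk_1073741837 and blk_1073741838 above.
        System.out.println(blockSizes(fileLength, blockSize));
    }
}
```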
4. View the uploaded file through the web UI
4.1 Browse the file listing
4.2 Click jdk-7u65-linux-i586.tar.gz to view its details
II. Downloading a file through the Java API, and analyzing the download process
1. How a FileSystem instance is obtained
/**
 * Returns the FileSystem for this URI's scheme and authority. The
 * scheme of the URI determines a configuration property name,
 * <tt>fs.<i>scheme</i>.class</tt> whose value names the FileSystem
 * class. The entire URI is passed to the FileSystem instance's
 * initialize method.
 */
try {
    FileSystem fileSystem = FileSystem.get(new URI(HDFS_PATH), new Configuration());
} catch (Exception e) {
    e.printStackTrace();
}
1.1 Create a client program
1.2 The client calls FileSystem.get
1.3 get creates a DistributedFileSystem instance
1.4 DistributedFileSystem.initialize calls new DFSClient(); the DFSClient constructor creates a proxy object via RPC.getProtocolProxy and then wraps (enhances) that proxy
1.5 The proxy object is obtained via this.namenode = proxyInfo.getProxy()
1.6 Through the steps above, the client holds a proxy for the server-side NameNode and can remotely invoke methods on the server object through it
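The proxy mechanism can be illustrated with a plain JDK dynamic proxy. This is only an analogy, not Hadoop's actual RPC code: the hypothetical NameNodeProtocol interface stands in for ClientProtocol, and the InvocationHandler stands in for the RPC invoker that would serialize the call and send it over the network to the NameNode.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RpcProxySketch {
    // Hypothetical stand-in for Hadoop's ClientProtocol interface.
    interface NameNodeProtocol {
        String getBlockLocations(String path);
    }

    // Builds a client-side proxy, analogous to what RPC.getProtocolProxy returns.
    static NameNodeProtocol createProxy() {
        // The handler plays the role of the RPC invoker: real code would
        // marshal the method name and arguments and send them to the server.
        InvocationHandler rpcInvoker = (proxy, method, args) ->
                "blocks-for:" + args[0];  // pretend remote answer

        return (NameNodeProtocol) Proxy.newProxyInstance(
                NameNodeProtocol.class.getClassLoader(),
                new Class<?>[]{NameNodeProtocol.class},
                rpcInvoker);
    }

    public static void main(String[] args) {
        // The client calls the proxy as if it were a local object.
        NameNodeProtocol namenode = createProxy();
        System.out.println(namenode.getBlockLocations("/jdk-7u65-linux-i586.tar.gz"));
    }
}
```

The client never sees the network layer; it just calls methods on an object that implements the protocol interface, which is exactly what this.namenode gives the DFSClient.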
2. With the FileSystem object in hand, call open to obtain an input stream
2.1 Analysis of the classes involved
2.2 Execution path of open when obtaining the HDFS input stream
See the diagram for details.
LocatedBlocks{
fileLength=143588167
underConstruction=false
blocks=[LocatedBlock{BP-135889517-192.168.2.31-1424411868365:blk_1073741837_1013; getBlockSize()=134217728; corrupt=false; offset=0; locs=[192.168.2.31:50010]},
LocatedBlock{BP-135889517-192.168.2.31-1424411868365:blk_1073741838_1014; getBlockSize()=9370439; corrupt=false; offset=134217728; locs=[192.168.2.31:50010]}]
lastLocatedBlock=LocatedBlock{BP-135889517-192.168.2.31-1424411868365:blk_1073741838_1014; getBlockSize()=9370439; corrupt=false; offset=134217728; locs=[192.168.2.31:50010]}
isLastBlockComplete=true
}
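DFSInputStream uses this LocatedBlocks list to map a read position to a block (and then to a DataNode). A simplified sketch of that lookup, in plain Java with a minimal Block record standing in for LocatedBlock:

```java
import java.util.List;

public class BlockLookup {
    // Minimal stand-in for LocatedBlock: a block's offset and size in the file.
    record Block(String name, long offset, long size) {}

    // Find the block containing the given file position, much as
    // DFSInputStream does when a read crosses into a new block.
    static Block findBlock(List<Block> blocks, long pos) {
        for (Block b : blocks) {
            if (pos >= b.offset() && pos < b.offset() + b.size()) {
                return b;
            }
        }
        throw new IllegalArgumentException("position past end of file: " + pos);
    }

    public static void main(String[] args) {
        // The two blocks from the LocatedBlocks output above.
        List<Block> blocks = List.of(
                new Block("blk_1073741837", 0L, 134217728L),
                new Block("blk_1073741838", 134217728L, 9370439L));
        System.out.println(findBlock(blocks, 0L).name());
        System.out.println(findBlock(blocks, 134217728L).name());
    }
}
```

A read starting at byte 0 is served from blk_1073741837; once the position reaches offset 134217728, the stream switches to blk_1073741838 on the DataNode listed in locs.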