I. The HDFS Architecture
1. HDFS stands for Hadoop Distributed File System, a distributed file system.
2. Architecture: HDFS is built from three components, the NameNode, the SecondaryNameNode, and the DataNodes. HDFS runs in master/slave mode: the NameNode and SecondaryNameNode run on the master node, and the DataNodes run on the slave nodes.
3. An HDFS cluster consists of one NameNode and some number of DataNodes. The NameNode is a central server that manages the file system namespace and client access to files. There is normally one DataNode per node in the cluster, and each DataNode manages the storage on the node it runs on.
Seen at its simplest, the architecture has two parts: the NameNode and the DataNodes. The NameNode stores the metadata, i.e. the information recording on which DataNode, and where on that DataNode, each piece of file data lives.
A client obtains the concrete storage locations of the file it needs from the NameNode, and then reads or writes the file at those locations. Replication happens as part of the write.
The racks (Rack) in the figure group nodes together to make the data easier to manage.
So the architecture splits into two roles, NameNode and DataNode. Both take part in storing data, but the NameNode only decides where data should be placed, while the data itself actually lands on the DataNodes. The client talks to the NameNode through the DistributedFileSystem component and gets back the locations of free space on the DataNodes. It then writes through the FSDataOutputStream component: it splits its block into multiple packets and writes a packet to the first DataNode. That DataNode writes the packet on to the next DataNode, and once the packet has been passed along, the next packet is requested from the FSDataOutputStream. This repeats until the write completes. The DataNodes are chained "in series": each one writes its packet to the next DataNode and asks its predecessor for the following packet. When everything is written, the client is notified through the FSDataOutputStream component, and the client in turn reports completion back to the NameNode through DistributedFileSystem.
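The same write path can be driven from the Java client API. A minimal sketch, assuming a cluster reachable through fs.defaultFS in core-site.xml; the path /test/a.txt is only an example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);      // a DistributedFileSystem when fs.defaultFS is hdfs://
        // create() asks the NameNode for target DataNodes and returns the
        // FSDataOutputStream that feeds the packet pipeline described above.
        try (FSDataOutputStream out = fs.create(new Path("/test/a.txt"))) {
            out.writeBytes("hello hdfs\n");        // chopped into packets internally
        } // close() flushes the last packets and completes the file at the NameNode
    }
}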
This streaming (pipelined) write improves concurrency and utilization.
If one client instead wrote to all the DataNodes in parallel, the network would become the bottleneck: the more DataNodes there are, the more bandwidth the client would need.
Metadata, "data about data", is data that describes other data. It mainly records the properties of data and supports functions such as indicating storage locations, historical data, resource discovery, and file records.
Reading works through the same two components in reverse: guided by the NameNode's metadata, the client locates the wanted file's blocks on the DataNodes, stitches them together, and presents the result.
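The read path mirrors this in code. A minimal sketch under the same assumptions; IOUtils.copyBytes simply streams the file to stdout:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // open() fetches the block locations from the NameNode; the stream
        // then reads each block directly from a DataNode holding a replica.
        try (FSDataInputStream in = fs.open(new Path("/test/a.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false); // false: leave System.out open
        }
    }
}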
II. Data Blocks and the Replica Mechanism
How data blocks work:
1. A block is a large unit, 128 MB by default. As in the file system on a hard disk, files in HDFS are broken up into block-sized chunks and stored as independent units.
2. Unlike a hard-disk file system, a file smaller than a block does not occupy a full block's worth of physical disk space; a block in HDFS stores content from only one file.
3. HDFS blocks are this large mainly to minimize the cost of seeks relative to transfer time.
4. A file is stored by being split by size into a number of blocks that are placed on different nodes; by default every block has three replicas. (A short code sketch follows this list.)
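As referenced in item 4, FileSystem.create() can take a per-file replication factor and block size, and FileStatus reports them back. A sketch; the 64 MB value and the path are made up for illustration, and the cluster-wide default comes from dfs.blocksize in hdfs-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/test/small.txt");
        // overwrite=true, 4 KB buffer, 3 replicas, 64 MB blocks for this file only
        try (FSDataOutputStream out = fs.create(p, true, 4096, (short) 3, 64L * 1024 * 1024)) {
            out.writeBytes("tiny file, still only one block\n");
        }
        FileStatus st = fs.getFileStatus(p);
        System.out.println("block size:  " + st.getBlockSize());   // 67108864
        System.out.println("replication: " + st.getReplication()); // 3
    }
}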
Replica mechanism: there are normally three replicas: one on the same node as the client, one on a node in a different rack, and one on another node in the same rack as the second replica. Any further replicas are placed on random nodes.
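Where the replicas of each block actually landed can be listed per block, as sketched below (path illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicaSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path("/test/small.txt"));
        // One BlockLocation per block; getHosts() names the DataNodes holding
        // a replica of that block (normally three, placed by the rules above).
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.println(loc.getOffset() + " -> " + String.join(", ", loc.getHosts()));
        }
    }
}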
III. FS Shell
Reference:
http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html
su - hdfs, then run hadoop fs -<command>
When using hadoop commands, the default working directory is /user/$USER.
1. -ls path
Lists the contents of path: file names, permissions, owner, size, and modification time.
2. -lsr path
Like -ls, but recursively lists the contents of subdirectories.
3. -du path
Shows the disk usage of every file under path, in bytes; file names are reported with the full HDFS protocol prefix.
4. -dus path
Like -du, but prints an aggregate summary of the disk usage of the files or directories.
5. -mv src dest
Moves a file or directory from the source path to the destination path within HDFS.
6. -cp src dest
Copies the file or directory src to dest within HDFS.
7. -rm path
Deletes a file or directory.
8. -rmr path
Deletes a file, or recursively deletes a directory.
Note: for the mv and cp operations here, both the source and destination paths are paths inside HDFS.
9. -put localSrc dest
Uploads the local file or directory localSrc to the path dest in HDFS.
10. -copyFromLocal localSrc dest
Same as -put.
11. -moveFromLocal localSrc dest
Uploads the file or directory localSrc to the directory dest in HDFS, then deletes the local copy.
12. -get [-crc] src localDest
Copies the file or directory src in HDFS to the local file system path localDest.
13. -getmerge src localDest [addnl]
Merges the files matching the HDFS path src into a single file localDest on the local file system.
14. -cat filename
Prints the contents of the file to standard output.
15. -copyToLocal [-crc] src localDest
Same as -get.
16. -moveToLocal [-crc] src localDest
Like -get, but deletes the original file in HDFS once the copy finishes.
17. -mkdir path
Creates a directory named path in HDFS, creating any missing parent directories along the way, like mkdir -p on Linux.
18. -setrep [-R] [-w] rep path
Sets the replication factor of the target files.
19. -touchz path
Creates a file whose timestamp is the current time; fails if the file already exists, unless the existing file has length 0.
20. -test -[ezd] path
Tests whether the path exists (-e), has zero length (-z), or is a directory (-d); returns 0 when the test passes.
21. -stat [format] path
Prints file statistics in the given format: file size (%b), file name (%n), block size (%o), replication factor (%r), modification time (%y, %Y).
22. -chmod is covered under 23; -tail [-f] file
Prints the last 1 KB of the file to standard output.
23. -chmod [-R] mode[,mode...] path...
Changes permissions; use -R to recurse. mode is a 3-digit octal number, or of the form [augo]+/-{rwxX}.
24. -chgrp [-R] group path...
Sets the owning group of files or directories; use -R to recurse into directories.
25. -help cmd
Shows usage information for the command cmd; drop the leading '-' from the command name. (A Java API sketch covering the most common commands follows.)
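Most of these commands have one-line equivalents in the Java FileSystem API, which is convenient in tests and small tools. A sketch; the paths are invented, and each call mirrors the shell command named in its comment:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsShellEquivalents {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        fs.mkdirs(new Path("/test/dir"));                                 // -mkdir (parents created, like -p)
        fs.copyFromLocalFile(new Path("a.txt"), new Path("/test/a.txt")); // -put / -copyFromLocal
        for (FileStatus st : fs.listStatus(new Path("/test"))) {          // -ls
            System.out.println(st.getPath() + "  " + st.getLen());
        }
        fs.rename(new Path("/test/a.txt"), new Path("/test/b.txt"));      // -mv
        fs.setReplication(new Path("/test/b.txt"), (short) 2);            // -setrep
        fs.delete(new Path("/test/dir"), true);                           // -rm -r
    }
}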
cat
Usage: hadoop fs -cat URI [URI ...]
Copies source paths to stdout.
Example:
· hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
· hadoop fs -cat file:///file3 /user/hadoop/file4
Exit Code:
Returns 0 on success and -1 on error.
Viewing a file (it is not visible through ordinary Linux commands; you must use hadoop fs): hadoop fs -cat /test/a.txt
chmod
Usage: hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]
Change the permissions of files. With -R, make the change recursively through the directory structure. The user must be the owner of the file, or else a super-user. Additional information is in the Permissions Guide.
Options
· The -R option will make the change recursively through the directory structure.
Granting read and write permission so that, for example, the hive user can access the directory: hadoop fs -chmod 777 /test
chown
Usage: hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
Change the owner of files. The user must be a super-user. Additional information is in the Permissions Guide.
Options
· The -R option will make the change recursively through the directory structure.
Changing the owner of a file: hadoop fs -chown hive:hive /test
count
Usage: hadoop fs -count [-q] [-h] [-v] <paths>
Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
The output columns with -count -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
The -h option shows sizes in human readable format.
The -v option displays a header line.
Example:
· hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
· hadoop fs -count -q hdfs://nn1.example.com/file1
· hadoop fs -count -q -h hdfs://nn1.example.com/file1
· hdfs dfs -count -q -h -v hdfs://nn1.example.com/file1
Exit Code:
Returns 0 on success and -1 on error.
cp
Usage: hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest>
Copy files from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.
'raw.*' namespace extended attributes are preserved if (1) the source and destination filesystems support them (HDFS only), and (2) all source and destination pathnames are in the /.reserved/raw hierarchy. Determination of whether raw.* namespace xattrs are preserved is independent of the -p (preserve) flag.
Options:
· The -f option will overwrite the destination if it already exists.
· The -p option will preserve file attributes [topx] (timestamps, ownership, permission, ACL, XAttr). If -p is specified with no arg, then preserves timestamps, ownership, permission. If -pa is specified, then preserves permission also because ACL is a super-set of permission. Determination of whether raw namespace extended attributes are preserved is independent of the -p flag.
Example:
· hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2
· hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.
find
Usage: hadoop fs -find <path> ... <expression> ...
Finds all files that match the specified expression and applies selected actions to them. If no path is specified then defaults to the current working directory. If no expression is specified then defaults to -print.
The following primary expressions are recognised:
· -name pattern
-iname pattern
Evaluates as true if the basename of the file matches the pattern using standard file system globbing. If -iname is used then the match is case insensitive.
· -print
-print0
Always evaluates to true. Causes the current pathname to be written to standard output. If the -print0 expression is used then an ASCII NULL character is appended.
The following operators are recognised:
· expression -a expression
expression -and expression
expression expression
Logical AND operator for joining two expressions. Returns true if both child expressions return true. Implied by the juxtaposition of two expressions and so does not need to be explicitly specified. The second expression will not be applied if the first fails.
Example:
hadoop fs -find / -name test -print
Exit Code:
Returns 0 on success and -1 on error.
Finding files (find).
get
Usage: hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>
Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the -crc option.
Example:
· hadoop fs -get /user/hadoop/file localfile
· hadoop fs -get hdfs://nn.example.com/user/hadoop/file localfile
Exit Code:
Returns 0 on success and -1 on error.
Fetching a file into the local test folder (get).
ls
Usage: hadoop fs -ls [-d] [-h] [-R] <args>
Options:
· -d: Directories are listed as plain files.
· -h: Format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).
· -R: Recursively list subdirectories encountered.
For a file ls returns stat on the file with the following format:
permissions number_of_replicas userid groupid filesize modification_date modification_time filename
For a directory it returns a list of its direct children as in Unix. A directory is listed as:
permissions userid groupid modification_date modification_time dirname
Files within a directory are ordered by filename by default.
Example:
· hadoop fs -ls /user/hadoop/file1
Exit Code:
Returns 0 on success and -1 on error.
mkdir
Usage: hadoop fs -mkdir [-p] <paths>
Takes path URIs as arguments and creates directories.
Options:
· The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.
Example:
· hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
· hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.
mv
Usage: hadoop fs -mv URI [URI ...] <dest>
Moves files from source to destination. This command allows multiple sources as well, in which case the destination needs to be a directory. Moving files across file systems is not permitted.
Example:
· hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2
· hadoop fs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1
Exit Code:
Returns 0 on success and -1 on error.
Renaming a file (mv).
put
Usage: hadoop fs -put <localsrc> ... <dst>
Copy single src, or multiple srcs from the local file system to the destination file system. Also reads input from stdin and writes to the destination file system.
· hadoop fs -put localfile /user/hadoop/hadoopfile
· hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir
· hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
· hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)
Exit Code:
Returns 0 on success and -1 on error.
Uploading a file (grant write permission first): hadoop fs -put a.txt /test
rm
Usage: hadoop fs -rm [-f] [-r | -R] [-skipTrash] URI [URI ...]
Delete files specified as args.
If trash is enabled, the filesystem instead moves the deleted file to a trash directory (given by FileSystem#getTrashRoot).
Currently, the trash feature is disabled by default. Users can enable trash by setting a value greater than zero for the parameter fs.trash.interval (in core-site.xml); a sample snippet follows this section.
See expunge about deletion of files in trash.
Options:
· The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist.
· The -R option deletes the directory and any content underit recursively.
· The -r option is equivalent to -R.
· The -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately. This can be useful when it is necessary to delete files from an over-quota directory.
Example:
· hadoop fs -rm hdfs://nn.example.com/file /user/hadoop/emptydir
Exit Code:
Returns 0 on success and -1 on error.
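As noted above, trash is enabled by setting fs.trash.interval (in minutes) to a value greater than zero in core-site.xml. A minimal snippet; 1440 (one day) is only an example value:

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>

Files already in trash can be removed immediately with hadoop fs -expunge.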
test
Usage: hadoop fs -test -[defsz] URI
Options:
· -d: if the path is a directory, return 0.
· -e: if the path exists, return 0.
· -f: if the path is a file, return 0.
· -s: if the path is not empty, return 0.
· -z: if the file is zero length, return 0.
Example:
· hadoop fs -test -e filename
text
Usage: hadoop fs -text <src>
Takes a source file and outputs the file in text format. The allowed formats are zip and TextRecordInputStream.