1: Basic hadoop 2.6.4 shell commands
Type hadoop to see the basic usage of the command.
Before doing this, the hadoop environment variables must already be configured; for the setup, see this article:
http://blog.csdn.net/mastethuang/article/details/51867115
huang@ubuntu:~$ hadoop
The specific usage of the hadoop command is then printed:
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
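As a quick sanity check that the environment variables are picked up, hadoop version (one of the commands listed above) prints the installed version. The sketch below adds a dry-run stub, purely for illustration, so it still shows the command when no hadoop binary is on the PATH:

```shell
# Dry-run stub, illustration only: if hadoop is not on the PATH,
# print the command that would have been run instead of running it.
if ! command -v hadoop >/dev/null 2>&1; then
    hadoop() { echo "hadoop $*"; }
fi

# Print the installed Hadoop version; any output confirms PATH is set up
hadoop version
```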
To see the HDFS commands:
huang@ubuntu:~$ hadoop fs
The specific usage of the fs command:
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
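As one illustration of the generic options, -D overrides a single configuration property for just one command; dfs.replication (the HDFS replication factor) is a common choice. This is only a sketch: notes.txt is a hypothetical local file, a running HDFS is assumed, and the stub merely echoes the command when hadoop is absent:

```shell
# Dry-run stub, illustration only, used when hadoop is not installed
if ! command -v hadoop >/dev/null 2>&1; then
    hadoop() { echo "hadoop $*"; }
fi

# Upload a (hypothetical) local notes.txt, overriding dfs.replication
# to 1 for this single command only
hadoop fs -D dfs.replication=1 -put notes.txt /tmp/notes.txt
```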
2: Running the wordcount program
Before running wordcount, we need to create an input directory for the data in the HDFS file system.
Following the command usage shown above:
Create the wc directory:
huang@ubuntu:~$ hadoop fs -mkdir /wc/
Create the input directory under wc:
huang@ubuntu:~$ hadoop fs -mkdir /wc/input/
Check whether the directories were created successfully:
huang@ubuntu:~$ hadoop fs -ls -R /
On success, you will see:
drwxr-xr-x - huang supergroup 0 2016-07-09 20:36 /wc
drwxr-xr-x - huang supergroup 0 2016-07-09 20:36 /wc/input
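The two -mkdir calls above can also be collapsed into one: as with the Unix mkdir, the -p flag creates missing parent directories. A minimal sketch, with the same dry-run stub for machines without hadoop:

```shell
if ! command -v hadoop >/dev/null 2>&1; then
    hadoop() { echo "hadoop $*"; }   # dry-run stub, illustration only
fi

# Create /wc and /wc/input in a single call; -p creates missing parents
hadoop fs -mkdir -p /wc/input
```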
Add the files to be counted into the input folder (here, all the xml files under /usr/local/hadoop-2.6.4/etc/hadoop/):
huang@ubuntu:~$ hadoop fs -put /usr/local/hadoop-2.6.4/etc/hadoop/*.xml /wc/input/
Check whether the xml files were successfully added to input:
huang@ubuntu:~$ hadoop fs -ls -R /
If the following is displayed, the upload succeeded:
drwxr-xr-x - huang supergroup 0 2016-07-09 20:36 /wc
drwxr-xr-x - huang supergroup 0 2016-07-09 20:39 /wc/input
-rw-r--r-- 1 huang supergroup 4436 2016-07-09 20:39 /wc/input/capacity-scheduler.xml
-rw-r--r-- 1 huang supergroup 1122 2016-07-09 20:39 /wc/input/core-site.xml
-rw-r--r-- 1 huang supergroup 9683 2016-07-09 20:39 /wc/input/hadoop-policy.xml
-rw-r--r-- 1 huang supergroup 1199 2016-07-09 20:39 /wc/input/hdfs-site.xml
-rw-r--r-- 1 huang supergroup 620 2016-07-09 20:39 /wc/input/httpfs-site.xml
-rw-r--r-- 1 huang supergroup 3523 2016-07-09 20:39 /wc/input/kms-acls.xml
-rw-r--r-- 1 huang supergroup 5511 2016-07-09 20:39 /wc/input/kms-site.xml
-rw-r--r-- 1 huang supergroup 690 2016-07-09 20:39 /wc/input/yarn-site.xml
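Besides -ls -R, the -count command gives a one-line summary of a directory (directory count, file count, total bytes, path), which is a quick way to confirm that all the xml files landed in input. A sketch with the same dry-run stub:

```shell
if ! command -v hadoop >/dev/null 2>&1; then
    hadoop() { echo "hadoop $*"; }   # dry-run stub, illustration only
fi

# One-line summary of /wc/input: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# (-h prints sizes in human-readable form)
hadoop fs -count -h /wc/input
```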
To run the wordcount program, type hadoop jar to see its usage:
huang@ubuntu:~$ hadoop jar
RunJar jarFile [mainClass] args...
hadoop 2.6.4 ships with a wordcount example. Go into the directory where you installed hadoop; the jar is located under share/hadoop/mapreduce/:
huang@ubuntu:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce$ ls
The directory contents are listed below; hadoop-mapreduce-examples-2.6.4.jar is exactly the one we need:
hadoop-mapreduce-client-app-2.6.4.jar
hadoop-mapreduce-client-hs-2.6.4.jar
hadoop-mapreduce-client-jobclient-2.6.4-tests.jar
lib
hadoop-mapreduce-client-common-2.6.4.jar
hadoop-mapreduce-client-hs-plugins-2.6.4.jar
hadoop-mapreduce-client-shuffle-2.6.4.jar
lib-examples
hadoop-mapreduce-client-core-2.6.4.jar
hadoop-mapreduce-client-jobclient-2.6.4.jar
hadoop-mapreduce-examples-2.6.4.jar
sources
Now we can run this jar to count the words (note that the output directory must not exist beforehand):
huang@ubuntu:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wc/input/ /wc/output/
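Because the job aborts if the output directory already exists, a re-run usually deletes it first. The sketch below uses -rm -r -f (the -f flag keeps -rm quiet when the path does not exist yet); the dry-run stub is again only for illustration:

```shell
if ! command -v hadoop >/dev/null 2>&1; then
    hadoop() { echo "hadoop $*"; }   # dry-run stub, illustration only
fi

# Remove any previous output, then re-run the bundled wordcount example
hadoop fs -rm -r -f /wc/output
hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wc/input/ /wc/output/
```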
View the results of the run:
huang@ubuntu:~$ hadoop fs -ls -R /wc/output/
It contains two files:
-rw-r--r-- 1 huang supergroup 0 2016-07-09 20:50 /wc/output/_SUCCESS
-rw-r--r-- 1 huang supergroup 10431 2016-07-09 20:50 /wc/output/part-r-00000
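In scripts, the empty _SUCCESS marker is the conventional way to check that the job completed: hadoop fs -test -e returns exit status 0 when the given path exists (see the -test -[defsz] entry in the usage above). A sketch with the dry-run stub (note the stub always returns 0, so off-cluster it always prints the message):

```shell
if ! command -v hadoop >/dev/null 2>&1; then
    hadoop() { echo "hadoop $*"; }   # dry-run stub, illustration only
fi

# -test -e exits 0 if the path exists, i.e. the job wrote its marker
if hadoop fs -test -e /wc/output/_SUCCESS; then
    echo "wordcount job succeeded"
fi
```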
Open part-r-00000:
huang@ubuntu:~$ hadoop fs -text /wc/output/part-r-00000
The word-count results are printed at the end (a small excerpt is shown below):
via 1
when 4
where 1
which 5
while 1
who 2
will 7
window 1
window, 1
with 27
within 1
without 1
work 1
writing, 8
you 9
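To work with the result locally, -getmerge concatenates all part files of the output directory into a single local file, after which ordinary Unix tools apply, for example sorting by count. The hadoop step uses the same dry-run stub as above; the sort step is plain shell:

```shell
if ! command -v hadoop >/dev/null 2>&1; then
    hadoop() { echo "hadoop $*"; }   # dry-run stub, illustration only
fi

# Merge every part-r-* file in /wc/output into one local file
hadoop fs -getmerge /wc/output wc-result.txt

# Show the 5 most frequent words (column 2 of the output is the count);
# prints nothing if the file was not actually fetched
sort -k2 -nr wc-result.txt 2>/dev/null | head -n 5
```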