Hadoop 2.6.4: Basic Shell Commands and Running WordCount

1. Basic Hadoop 2.6.4 shell commands

Type hadoop to see the command's basic usage:


Before this, Hadoop's environment variables must already be configured; for reference, see http://blog.csdn.net/mastethuang/article/details/51867115
    
    
huang@ubuntu:~$ hadoop

The detailed usage of the hadoop command is then printed:
   
   
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
  credential           interact with credential providers
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.

View the HDFS file system commands:
   
   
huang@ubuntu:~$ hadoop fs

Detailed usage of the fs command:
   
   
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
 
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
 
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]



2. Running the WordCount program

Before running WordCount, we need to create an input directory for the data in HDFS.
Referring to the command usage above:

Create the wc directory:
   
   
huang@ubuntu:~$ hadoop fs -mkdir /wc/

Create the input directory under wc (note: hadoop fs -mkdir -p /wc/input would create both levels in one step):
   
   
huang@ubuntu:~$ hadoop fs -mkdir /wc/input/

Check whether the directories were created successfully:
   
   
huang@ubuntu:~$ hadoop fs -ls -R /

On success you will see:
   
   
drwxr-xr-x - huang supergroup 0 2016-07-09 20:36 /wc
drwxr-xr-x - huang supergroup 0 2016-07-09 20:36 /wc/input

Add the files to be word-counted to the input folder (here, all the .xml files under /usr/local/hadoop-2.6.4/etc/hadoop/):
   
   
huang@ubuntu:~$ hadoop fs -put /usr/local/hadoop-2.6.4/etc/hadoop/*.xml /wc/input/

Check whether the xml files were successfully added to input:
   
   
huang@ubuntu:~$ hadoop fs -ls -R /

If the following is shown, the upload succeeded:
   
   
drwxr-xr-x - huang supergroup 0 2016-07-09 20:36 /wc
drwxr-xr-x - huang supergroup 0 2016-07-09 20:39 /wc/input
-rw-r--r-- 1 huang supergroup 4436 2016-07-09 20:39 /wc/input/capacity-scheduler.xml
-rw-r--r-- 1 huang supergroup 1122 2016-07-09 20:39 /wc/input/core-site.xml
-rw-r--r-- 1 huang supergroup 9683 2016-07-09 20:39 /wc/input/hadoop-policy.xml
-rw-r--r-- 1 huang supergroup 1199 2016-07-09 20:39 /wc/input/hdfs-site.xml
-rw-r--r-- 1 huang supergroup 620 2016-07-09 20:39 /wc/input/httpfs-site.xml
-rw-r--r-- 1 huang supergroup 3523 2016-07-09 20:39 /wc/input/kms-acls.xml
-rw-r--r-- 1 huang supergroup 5511 2016-07-09 20:39 /wc/input/kms-site.xml
-rw-r--r-- 1 huang supergroup 690 2016-07-09 20:39 /wc/input/yarn-site.xml

Now run the WordCount program. First type hadoop jar to see its usage:
   
   
huang@ubuntu:~$ hadoop jar
RunJar jarFile [mainClass] args...

Hadoop 2.6.4 ships with a WordCount example. Go to your Hadoop installation directory; the example jar is located under share/hadoop/mapreduce/:
   
   
huang@ubuntu:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce$ ls

The folder contains the following; hadoop-mapreduce-examples-2.6.4.jar is exactly what we need:
   
   
hadoop-mapreduce-client-app-2.6.4.jar
hadoop-mapreduce-client-common-2.6.4.jar
hadoop-mapreduce-client-core-2.6.4.jar
hadoop-mapreduce-client-hs-2.6.4.jar
hadoop-mapreduce-client-hs-plugins-2.6.4.jar
hadoop-mapreduce-client-jobclient-2.6.4-tests.jar
hadoop-mapreduce-client-jobclient-2.6.4.jar
hadoop-mapreduce-client-shuffle-2.6.4.jar
hadoop-mapreduce-examples-2.6.4.jar
lib
lib-examples
sources

Now we can run this jar to perform the word count (note: the output directory must not already exist, or the job will refuse to run):
   
   
huang@ubuntu:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wc/input/ /wc/output/
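What the example job computes is easy to state: a map phase emits (word, 1) for every whitespace-separated token, and a reduce phase sums the counts for each word. A minimal local Python sketch of that same logic (an illustration with made-up sample lines, not Hadoop's actual distributed implementation, which runs the phases over HDFS input splits):

```python
from collections import Counter

def map_phase(lines):
    # map: emit a (word, 1) pair for every whitespace-separated token
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # reduce: sum the counts per word
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# hypothetical sample input standing in for the xml files in /wc/input
lines = ["hello hadoop", "hello world"]
print(reduce_phase(map_phase(lines)))  # -> {'hello': 2, 'hadoop': 1, 'world': 1}
```

In the real job, the map tasks run one per input split and the framework shuffles all pairs with the same word to the same reducer before the summing step.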

Check the results of the run:
   
   
huang@ubuntu:~$ hadoop fs -ls -R /wc/output/

It contains two files:
   
   
-rw-r--r-- 1 huang supergroup 0 2016-07-09 20:50 /wc/output/_SUCCESS
-rw-r--r-- 1 huang supergroup 10431 2016-07-09 20:50 /wc/output/part-r-00000

Open part-r-00000:
   
   
huang@ubuntu:~$ hadoop fs -text /wc/output/part-r-00000

The word counts appear at the end (only a small portion of the results shown below):
   
   
via 1
when 4
where 1
which 5
while 1
who 2
will 7
window 1
window, 1
with 27
within 1
without 1
work 1
writing, 8
you 9
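Each line of part-r-00000 is a tab-separated word-count pair. If you copy the file to local disk (e.g. with hadoop fs -get), a short Python sketch can parse and rank it; the sample string below reuses a few counts from the output above:

```python
def parse_wordcount(text):
    # each line of part-r-00000 has the form "word<TAB>count"
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, count = line.split("\t")
        counts[word] = int(count)
    return counts

sample = "with\t27\nyou\t9\nwill\t7\n"
counts = parse_wordcount(sample)
# rank words by count, highest first
top = sorted(counts.items(), key=lambda kv: -kv[1])
print(top)  # -> [('with', 27), ('you', 9), ('will', 7)]
```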


