WordCount Example

Interacting with Hadoop from Linux

Hadoop runs as a cluster installed on Linux, so the two need to interact. Linux commands operate on the local Linux file system, while Hadoop has its own file system, HDFS, so we cannot use plain Linux commands to manipulate files on Hadoop. A separate command-line interface is needed for that.
Hadoop's file system commands largely mirror their Linux counterparts; you simply prefix them with hadoop (in full, hadoop fs -<command>).
The HDFS root directory / is shorthand for hdfs://<hostname>:<port>/, i.e. the default file system configured as fs.defaultFS in core-site.xml.
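For example, these two commands list the same directory (a minimal sketch, assuming fs.defaultFS is set to hdfs://Hadoop01:9000; substitute your own host name and port):

# path relative to the configured default file system
hadoop fs -ls /
# the same path written as a fully qualified URI
hadoop fs -ls hdfs://Hadoop01:9000/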

  • Running hadoop with no arguments lists the available commands (a per-command help example follows the listing)
[root@Hadoop01 /]# hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  s3guard              manage data on S3
  trace                view and modify Hadoop tracing settings
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
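Per-command help is also available through -help, e.g. for mkdir (output abridged; exact wording varies by Hadoop version):

[root@Hadoop01 /]# hadoop fs -help mkdir
-mkdir [-p] <path> ... :
  Create a directory in specified location.
  -p  Do not fail if the directory already exists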
  • Running hadoop fs with no arguments lists the file system commands (a few everyday examples follow the listing)
[root@Hadoop01 /]# hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-usage [cmd ...]]
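A handful of these commands cover most day-to-day work (a quick sketch; /tmp/demo and local.txt are hypothetical names used only for illustration):

hadoop fs -mkdir -p /tmp/demo          # create a directory, including parents
hadoop fs -put local.txt /tmp/demo/    # upload a local file to HDFS
hadoop fs -ls /tmp/demo                # list the directory
hadoop fs -cat /tmp/demo/local.txt     # print the file's contents
hadoop fs -get /tmp/demo/local.txt .   # download back to the local file system
hadoop fs -rm -r /tmp/demo             # remove the directory recursively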

The Classic WordCount Example

WordCount is the classic example that ships with Hadoop. Here we use it to exercise Hadoop's storage (HDFS), computation (MapReduce), and scheduling (YARN) in a single run.

  • Create a local file wordcount.txt (a non-interactive alternative is sketched after the transcript)
[byy@Hadoop01 data]$ vim wordcount.txt
hello
word
bai
xue
bai
xue
1
1 2
1 22 3
1 22 3
~
"wordcount.txt" 10L, 47C
  • Create a directory on HDFS to hold the file (a one-command check of the tree follows the transcript)
[byy@Hadoop01 data]$ hadoop fs -mkdir -p /data/wordcount/input
[byy@Hadoop01 data]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - byy supergroup          0 2021-02-06 17:49 /data
[byy@Hadoop01 data]$ hadoop fs -ls /data
Found 1 items
drwxr-xr-x   - byy supergroup          0 2021-02-06 17:49 /data/wordcount
[byy@Hadoop01 data]$ hadoop fs -ls /data/wordcount/
Found 1 items
drwxr-xr-x   - byy supergroup          0 2021-02-06 17:49 /data/wordcount/input
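The -p flag creates the whole parent chain in one call; -ls -R walks the resulting tree in a single command (a quick check; the listing mirrors the entries shown above):

[byy@Hadoop01 data]$ hadoop fs -ls -R /data
drwxr-xr-x   - byy supergroup          0 2021-02-06 17:49 /data/wordcount
drwxr-xr-x   - byy supergroup          0 2021-02-06 17:49 /data/wordcount/input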
  • Upload wordcount.txt to HDFS (two further verification commands are sketched after the transcript)
[byy@Hadoop01 data]$ hadoop fs -put wordcount.txt /data/wordcount/input/
[byy@Hadoop01 data]$ hadoop fs -ls /data/wordcount/input
Found 1 items
-rw-r--r--   1 byy supergroup         47 2021-02-06 18:56 /data/wordcount/input/wordcount.txt
[byy@Hadoop01 data]$ hadoop fs -cat /data/wordcount/input/wordcount.txt
hello
word
bai
xue
bai
xue
1
1 2
1 22 3
1 22 3
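The upload can also be verified without printing the whole file (a small sketch; -du reports the 47-byte size from the listing above, and -stat '%r %n' shows the replication factor and file name; output format may vary slightly by version):

[byy@Hadoop01 data]$ hadoop fs -du -h /data/wordcount/input
47  /data/wordcount/input/wordcount.txt
[byy@Hadoop01 data]$ hadoop fs -stat '%r %n' /data/wordcount/input/wordcount.txt
1 wordcount.txt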
  • Locate the jar containing the MapReduce example programs (a way to list the jar's bundled programs follows the find output)
[root@Hadoop01 /]# cd /opt/app/hadoop
[root@Hadoop01 hadoop]# find . -name '*example*.jar'
./share/hadoop/mapreduce1/hadoop-examples-2.6.0-mr1-cdh5.16.2.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-test-sources.jar
./share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-sources.jar
# this is the jar we need
./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
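Running the jar without naming a program prints the list of bundled examples, wordcount among them (output abridged; the exact list varies by version):

[byy@Hadoop01 ~]$ hadoop jar /opt/app/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
An example program must be given as the first argument.
Valid program names are:
  grep: A map/reduce program that counts the matches of a regex in the input.
  wordcount: A map/reduce program that counts the words in the input files.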
  • Run the jar as a MapReduce job to count the words in wordcount.txt.
    While the job runs, you can watch its progress in the YARN web UI (by default at port 8088 on the ResourceManager host); a local cross-check of the result is sketched after the output.
[byy@Hadoop01 ~]$ hadoop jar /opt/app/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount /data/wordcount/input /data/wordcount/output
# inspect the job's output (files are generated automatically under the output path)
[byy@Hadoop01 ~]$ hadoop fs -ls /data/wordcount/output
Found 2 items
-rw-r--r--   1 root supergroup          0 2021-02-06 19:30 /data/wordcount/output/_SUCCESS # empty marker file written when the job succeeds
-rw-r--r--   1 root supergroup         44 2021-02-06 19:30 /data/wordcount/output/part-r-00000 # result file
[byy@Hadoop01 ~]$ hadoop fs -cat /data/wordcount/output/part-r-00000
1	4
2	1
22	2
3	2
bai	2
hello	1
word	1
xue	2
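The result is easy to sanity-check locally: wordcount tokenizes each line on whitespace and counts every token, which this shell pipeline reproduces on the local copy of the file (a minimal sanity check with standard tools, not how the MapReduce job computes it):

[byy@Hadoop01 data]$ tr -s ' \t' '\n' < wordcount.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
1	4
2	1
22	2
3	2
bai	2
hello	1
word	1
xue	2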