Running the wordcount example on Hadoop

1. Check the Hadoop version

[hadoop@ltt1 sbin]$ hadoop version

Hadoop 2.6.0-cdh5.12.0

Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd

Compiled by jenkins on 2017-06-29T11:33Z

Compiled with protoc 2.5.0

From source with checksum 7c45ae7a4592ce5af86bc4598c5b4

This command was run using /home/hadoop/hadoop260/share/hadoop/common/hadoop-common-2.6.0-cdh5.12.0.jar
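
If a script needs just the version string (for example, to build the examples jar path used in the next step), the first line of hadoop version can be parsed; a minimal sketch, assuming the output format shown above:

# Extract "2.6.0-cdh5.12.0" from the first line ("Hadoop 2.6.0-cdh5.12.0")
HADOOP_VERSION=$(hadoop version | awk 'NR==1 {print $2}')
echo "$HADOOP_VERSION"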

2. Hadoop ships with an examples jar that can be used for quick functional smoke tests.

List the MapReduce programs supported by hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar:

[hadoop@ltt1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar

An example program must be given as the first argument.

Valid program names are:

  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

  dbcount: An example job that count the pageview counts from a database.

  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

  grep: A map/reduce program that counts the matches of a regex in the input.

  join: A job that effects a join over sorted, equally partitioned datasets

  multifilewc: A job that counts words from several files.

  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

  randomwriter: A map/reduce program that writes 10GB of random data per node.

  secondarysort: An example defining a secondary sort to the reduce.

  sort: A map/reduce program that sorts the data written by the random writer.

  sudoku: A sudoku solver.

  teragen: Generate data for the terasort

  terasort: Run the terasort

  teravalidate: Checking results of terasort

  wordcount: A map/reduce program that counts the words in the input files.

  wordmean: A map/reduce program that counts the average length of the words in the input files.

  wordmedian: A map/reduce program that counts the median length of the words in the input files.

  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
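
Running any of these programs without arguments prints its usage. For instance, invoking wordcount with no paths should print a usage line along the lines of "Usage: wordcount <in> [<in>...] <out>" (the exact wording can differ between versions):

# Ask the wordcount program for its usage by passing no input/output paths
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount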

3. Create a directory on HDFS

hadoop fs -mkdir /input
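
Note that -mkdir fails if a parent directory is missing; adding -p creates missing parents in one call, like the Unix mkdir -p. A minimal sketch:

# Create /input and any missing parent directories on HDFS
hadoop fs -mkdir -p /input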

4. List the HDFS root directory

[hadoop@ltt1 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2017-09-17 08:11 /input
drwx------ - hadoop supergroup 0 2017-09-17 08:07 /tmp
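
Each row shows the permissions, replication factor (a dash for directories), owner, group, size in bytes, modification time, and path. To walk the whole tree instead of a single level:

# Recursively list everything under the HDFS root
hadoop fs -ls -R /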

5. Upload local files to HDFS

hadoop fs -put $HADOOP_HOME/*.txt /input
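
The glob expands to the three text files that ship in $HADOOP_HOME (LICENSE.txt, NOTICE.txt, and README.txt, as step 6 confirms). Keep in mind that -put fails if a file with the same name already exists at the destination, so delete the old copies before re-uploading. The equivalent explicit form:

# Upload the three bundled text files; this fails if the targets already exist on HDFS
hadoop fs -put $HADOOP_HOME/LICENSE.txt $HADOOP_HOME/NOTICE.txt $HADOOP_HOME/README.txt /input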

6. List the files in the /input directory on HDFS

[hadoop@ltt1 ~]$ hadoop fs -ls /input

Found 3 items
-rw-r--r--   2 hadoop supergroup      85063 2017-09-17 08:15 /input/LICENSE.txt
-rw-r--r--   2 hadoop supergroup      14978 2017-09-17 08:15 /input/NOTICE.txt
-rw-r--r--   2 hadoop supergroup       1366 2017-09-17 08:15 /input/README.txt
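
The three sizes add up to 85063 + 14978 + 1366 = 101407 bytes, which matches the Bytes Read value under File Input Format Counters reported by the job in step 7. The totals can also be read directly with -du:

# Per-file and summarized (-s) disk usage in human-readable (-h) units
hadoop fs -du -h /input
hadoop fs -du -s -h /input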

7. Run a quick wordcount test

[hadoop@ltt1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output
17/09/17 08:19:12 INFO input.FileInputFormat: Total input paths to process : 3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: number of splits:3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505605169997_0002
17/09/17 08:19:14 INFO impl.YarnClientImpl: Submitted application application_1505605169997_0002
17/09/17 08:19:14 INFO mapreduce.Job: The url to track the job: http://ltt1.bg.cn:9180/proxy/application_1505605169997_0002/
17/09/17 08:19:14 INFO mapreduce.Job: Running job: job_1505605169997_0002
17/09/17 08:19:27 INFO mapreduce.Job: Job job_1505605169997_0002 running in uber mode : false
17/09/17 08:19:27 INFO mapreduce.Job:  map 0% reduce 0%
17/09/17 08:19:39 INFO mapreduce.Job:  map 33% reduce 0%
17/09/17 08:19:48 INFO mapreduce.Job:  map 100% reduce 0%
17/09/17 08:19:50 INFO mapreduce.Job:  map 100% reduce 100%
17/09/17 08:19:50 INFO mapreduce.Job: Job job_1505605169997_0002 completed successfully
17/09/17 08:19:50 INFO mapreduce.Job: Counters: 50

    File System Counters

        FILE: Number of bytes read=42705

        FILE: Number of bytes written=588235

        FILE: Number of read operations=0

        FILE: Number of large read operations=0

        FILE: Number of write operations=0

        HDFS: Number of bytes read=101699

        HDFS: Number of bytes written=30167

        HDFS: Number of read operations=12

        HDFS: Number of large read operations=0

        HDFS: Number of write operations=2

    Job Counters

        Launched map tasks=3

        Launched reduce tasks=1

        Data-local map tasks=2

        Rack-local map tasks=1

        Total time spent by all maps in occupied slots (ms)=47617

        Total time spent by all reduces in occupied slots (ms)=8244

        Total time spent by all map tasks (ms)=47617

        Total time spent by all reduce tasks (ms)=8244

        Total vcore-milliseconds taken by all map tasks=47617

        Total vcore-milliseconds taken by all reduce tasks=8244

        Total megabyte-milliseconds taken by all map tasks=48759808

        Total megabyte-milliseconds taken by all reduce tasks=8441856

    Map-Reduce Framework

        Map input records=2035

        Map output records=14239

        Map output bytes=155828

        Map output materialized bytes=42717

        Input split bytes=292

        Combine input records=14239

        Combine output records=2653

        Reduce input groups=2402

        Reduce shuffle bytes=42717

        Reduce input records=2653

        Reduce output records=2402

        Spilled Records=5306

        Shuffled Maps =3

        Failed Shuffles=0

        Merged Map outputs=3

        GC time elapsed (ms)=881

        CPU time spent (ms)=22320

        Physical memory (bytes) snapshot=690192384

        Virtual memory (bytes) snapshot=10862809088

        Total committed heap usage (bytes)=380243968

    Shuffle Errors

        BAD_ID=0

        CONNECTION=0

        IO_ERROR=0

        WRONG_LENGTH=0

        WRONG_MAP=0

        WRONG_REDUCE=0

    File Input Format Counters

        Bytes Read=101407

    File Output Format Counters

        Bytes Written=30167
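
One thing to keep in mind before re-running the job: MapReduce refuses to write into an existing output directory, so a second run against /output fails until that directory is removed. A minimal re-run sketch:

# Delete the previous output directory (it must not exist when the job starts)
hadoop fs -rm -r /output
# Run wordcount again with the same input and output paths
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output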

8. View the wordcount results (the full output is long, so only part of it is shown below)

[hadoop@ltt1 ~]$ hadoop fs -cat /output/*

worldwide,    4

would    1

writing    2

writing,    4

written    19

xmlenc    1

year    1

you    12

your    5

zlib    1

 252.227-7014(a)(1))    1

§    1

“AS    1

“Contributor    1

“Contributor”    1

“Covered    1

“Executable”    1

“Initial    1

“Larger    1

“Licensable”    1

“License”    1

“Modifications”    1

“Original    1

“Participant”)    1

“Patent    1

“Source    1

“Your”)    1

“You”    2

“commercial    3

“control”    1
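
The reducer's output lands in /output as a part-r-NNNNN file (one per reducer; this job ran a single reducer) alongside an empty _SUCCESS marker. A few hedged follow-ups, assuming that standard layout and the default tab-separated word/count format (the local file name is just an example):

# List the files the job produced
hadoop fs -ls /output
# Top ten words by count: sort the second field numerically (-n), descending (-r)
hadoop fs -cat /output/part-r-* | sort -k2 -nr | head -10
# Copy the full result file to the local filesystem
hadoop fs -get /output/part-r-00000 ./wordcount-output.txt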

That wraps it up: with this small wordcount example, we have walked through creating a directory on HDFS, uploading files, listing directories, and running the wordcount job.
