Verifying Hadoop in Pseudo-Distributed Mode

Start Hadoop and run the jps command; you will see a total of six processes running. First, a quick introduction to the role of each Hadoop process:


1)ResourceManager                  the boss of YARN (Yet Another Resource Negotiator)

2)SecondaryNameNode              the NameNode's assistant

3)NameNode                                  the boss of HDFS, the "warehouse manager"

4)DataNode                                     the worker of HDFS, the "actual warehouse"

5)Jps                                                 the jps tool itself, which always shows up in its own listing

6)NodeManager                             the worker of YARN (Yet Another Resource Negotiator)

Running all of these processes on one machine is not ideal, because they compete with each other for resources; in the end they are distributed across different machines.



To verify that HDFS actually works, try uploading a file. The commands for operating Hadoop are in the bin directory, while the sbin directory holds Hadoop's start and stop scripts.
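
For reference, the pseudo-distributed daemons are typically brought up with the two sbin scripts below (the paths assume this tutorial's install directory; this snippet is an illustration, not part of the original walkthrough):

/itcast/hadoop-2.4.1/sbin/start-dfs.sh     # starts NameNode, DataNode, SecondaryNameNode
/itcast/hadoop-2.4.1/sbin/start-yarn.sh    # starts ResourceManager, NodeManager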



Change into Hadoop's bin directory with the following commands:

[root@itcast01 ~]# cd /itcast/hadoop-2.4.1/bin/
[root@itcast01 bin]# ls
container-executor  hdfs      mapred.cmd               yarn
hadoop              hdfs.cmd  rcc                      yarn.cmd
hadoop.cmd          mapred    test-container-executor
You can see quite a few scripts inside. Veteran programmers are used to the hadoop command; nowadays we can also use commands such as hdfs and yarn.
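
For instance, the two commands below are interchangeable in Hadoop 2.x (assuming fs.defaultFS is already set to hdfs://itcast01:9000 in core-site.xml, as in this tutorial's setup):

hadoop fs -ls /    # the classic filesystem client
hdfs dfs -ls /     # the newer, equivalent client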

If you are not sure how something works, use the built-in help. For example, to see how the hadoop command is used:

[root@itcast01 bin]# hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
From here you can run hadoop version to print Hadoop's version information, or use hadoop fs as the filesystem client.
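
The same trick works one level down: the -help option of hadoop fs prints usage for a single filesystem subcommand (this example is mine, not from the original walkthrough):

hadoop version         # print version and build information
hadoop fs -help put    # usage for one specific fs subcommand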



List the files in this machine's HDFS; before uploading anything, there is not a single file or directory:

[root@itcast01 bin]# hadoop fs -ls hdfs://itcast01:9000/
[root@itcast01 bin]# 



Upload the local /root/install.log file to HDFS, renaming it to log.txt, then list hdfs://itcast01:9000/ again. The log.txt file shows up, which means a file has been uploaded from the local filesystem to HDFS:

[root@itcast01 bin]# hadoop fs -put /root/install.log hdfs://itcast01:9000/log.txt
[root@itcast01 bin]# hadoop fs -ls hdfs://itcast01:9000/
Found 1 items
-rw-r--r--   1 root supergroup      49448 2015-07-08 13:43 hdfs://itcast01:9000/log.txt
[root@itcast01 bin]# 
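
Since fs.defaultFS already points at hdfs://itcast01:9000 in this setup, the full URI can usually be left out; a minimal equivalent under that assumption:

hadoop fs -put /root/install.log /log.txt    # same upload, relying on fs.defaultFS
hadoop fs -ls /                              # same listing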

You can also browse files through the HDFS web management UI: open 192.168.8.118:50070 in a browser to reach the HDFS management page, then go to Utilities → Browse the file system, where you will see the file named log.txt. The main columns are:

Permission: the access permissions

Owner: the user the file belongs to

Group: the group it belongs to

Size: the file size

Replication: the number of replicas. The replica count set in hdfs-site.xml here is 1; since this is a pseudo-distributed setup with only one machine, only a single copy is kept.
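
For reference, the replica count comes from the dfs.replication property; a typical pseudo-distributed hdfs-site.xml looks like the snippet below (my reconstruction, not copied from this machine):

<configuration>
    <!-- only one machine, so keep a single replica of each block -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>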



Now try downloading this log.txt: click log.txt → Download. The browser redirects to http://itcast01:50075/webhdfs/v1/log.txt?op=OPEN&namenoderpcaddress=itcast01:9000&offset=0 and the page fails to load. The redirected URL accesses the hostname itcast01 rather than the IP address, so we need to configure the hosts file on Windows, located at C:\Windows\System32\drivers\etc\hosts: append "192.168.8.118           itcast01" as the last line. Note that the file type must not change after you edit the hosts file. Once this is done, clicking Download downloads the file.
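
The entry to append looks like this (the Administrator note is my addition; Windows normally requires the editor to run with elevated rights to save this file):

# appended to C:\Windows\System32\drivers\etc\hosts (edit as Administrator)
192.168.8.118    itcast01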


Download the file from HDFS to the local machine:

[root@itcast01 bin]# hadoop fs -get hdfs://itcast01:9000/log.txt /home/123.txt         download the file from HDFS to the local /home directory
[root@itcast01 bin]# cd /home
[root@itcast01 home]# ls
123.txt  lost+found  wec                                                                123.txt is here, so the download completed; view its contents with more 123.txt
[root@itcast01 home]# 
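
As an optional sanity check (my addition, not part of the original steps), confirm that the downloaded copy is byte-identical to the original local file:

md5sum /root/install.log /home/123.txt    # the two checksums should match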

At this point, uploading, viewing, and downloading HDFS files all work, which means HDFS is usable. Next, verify whether YARN works.


Verify YARN through its management UI at 192.168.8.118:8088.

On the management page, Active Nodes shows 1 active node. That node is a worker node, i.e. a NodeManager. YARN's worker is called the NodeManager; YARN's boss is called the ResourceManager.


Verifying MapReduce's word-counting functionality

Linux's built-in wc command can already do this kind of counting; the steps are shown below.

First, create a word.txt file:

[root@itcast01 ~]# vim word.txt
hello tom
hello jerry
hello tom
"word.txt" 3L, 32C written  

Count how many words word.txt contains:

[root@itcast01 ~]# wc word.txt 
 3  6 32 word.txt                                              3 lines, 6 words, 32 characters
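
The three numbers are lines, words, and bytes; wc can also report each one on its own:

wc -l word.txt    # line count only
wc -w word.txt    # word count only
wc -c word.txt    # byte count only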



Hadoop can also do word-frequency counting, that is, counting how many times each word appears.
Next we count these words with MapReduce. MapReduce was designed from the start for statistics over massive data, and massive data should live on HDFS, so word.txt should first be uploaded to HDFS. The commands are as follows:

[root@itcast01 ~]# hadoop fs -put /root/word.txt hdfs://itcast01:9000/word.avi              files have no inherent format on Linux, so .avi and .txt are both really just text files
[root@itcast01 ~]# 
Counting this tiny file with MapReduce turns out to be slow, mainly because the job has to start up and read its input; MapReduce's target is massive data, and its advantage only shows once the data really is massive.
Spark computes in memory and can also run on YARN, which shows just how capable YARN is. (A conceptual sketch of what wordcount does follows below.)
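
Conceptually, the MapReduce word count maps each word to a (word, 1) pair, shuffles the pairs so identical words arrive at the same reducer, and sums them there. As a rough single-machine analogy in plain shell (my sketch, not how Hadoop runs internally):

tr ' ' '\n' < word.txt | sort | uniq -c    # split into words (map), group them (shuffle), count (reduce)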

Count the word frequencies with MapReduce; the commands and their output are as follows:

[root@itcast01 ~]# wc word.txt                    Linux's built-in word-count command
 3  6 32 word.txt
[root@itcast01 ~]# hadoop fs -put /root/word.txt hdfs://itcast01:9000/word.avi        upload the file to HDFS
[root@itcast01 ~]# cd /itcast/hadoop-2.4.1/share/hadoop/                              change into the Hadoop share directory
[root@itcast01 hadoop]# ls
common  hdfs  httpfs  mapreduce  tools  yarn
[root@itcast01 hadoop]# cd mapreduce/                                                 change into the mapreduce directory
[root@itcast01 mapreduce]# ls                                                         there are many JAR files here
hadoop-mapreduce-client-app-2.4.1.jar
hadoop-mapreduce-client-common-2.4.1.jar
hadoop-mapreduce-client-core-2.4.1.jar
hadoop-mapreduce-client-hs-2.4.1.jar
hadoop-mapreduce-client-hs-plugins-2.4.1.jar
hadoop-mapreduce-client-jobclient-2.4.1.jar
hadoop-mapreduce-client-jobclient-2.4.1-tests.jar
hadoop-mapreduce-client-shuffle-2.4.1.jar
hadoop-mapreduce-examples-2.4.1.jar
lib
lib-examples
sources
[root@itcast01 mapreduce]# hadoop                                                     run hadoop with no arguments and it prints its usage hints
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
[root@itcast01 mapreduce]# hadoop jar                                                 see what arguments it needs
RunJar jarFile [mainClass] args...
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar 
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount
Usage: wordcount <in> <out>
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount hdfs://itcast01:9000/word.avi hdfs://itcast01:9000/out        (right now, open a second cloned session and run jps; more on this below)
15/07/08 21:12:25 INFO client.RMProxy: Connecting to ResourceManager at itcast01/192.168.8.118:8032
15/07/08 21:12:48 INFO input.FileInputFormat: Total input paths to process : 1
15/07/08 21:12:56 INFO mapreduce.JobSubmitter: number of splits:1
15/07/08 21:13:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1436259707754_0001
15/07/08 21:13:23 INFO impl.YarnClientImpl: Submitted application application_1436259707754_0001
15/07/08 21:13:23 INFO mapreduce.Job: The url to track the job: http://itcast01:8088/proxy/application_1436259707754_0001/
15/07/08 21:13:23 INFO mapreduce.Job: Running job: job_1436259707754_0001
15/07/08 21:14:48 INFO mapreduce.Job: Job job_1436259707754_0001 running in uber mode : false
15/07/08 21:14:48 INFO mapreduce.Job:  map 0% reduce 0%
15/07/08 21:16:06 INFO mapreduce.Job:  map 100% reduce 0%
15/07/08 21:16:56 INFO mapreduce.Job:  map 100% reduce 100%
15/07/08 21:16:59 INFO mapreduce.Job: Job job_1436259707754_0001 completed successfully
15/07/08 21:17:00 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=40
                FILE: Number of bytes written=185833
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=126
                HDFS: Number of bytes written=22
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=76880
                Total time spent by all reduces in occupied slots (ms)=41905
                Total time spent by all map tasks (ms)=76880
                Total time spent by all reduce tasks (ms)=41905
                Total vcore-seconds taken by all map tasks=76880
                Total vcore-seconds taken by all reduce tasks=41905
                Total megabyte-seconds taken by all map tasks=78725120
                Total megabyte-seconds taken by all reduce tasks=42910720
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=56
                Map output materialized bytes=40
                Input split bytes=94
                Combine input records=6
                Combine output records=3
                Reduce input groups=3
                Reduce shuffle bytes=40
                Reduce input records=3
                Reduce output records=3
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=539
                CPU time spent (ms)=5880
                Physical memory (bytes) snapshot=320163840
                Virtual memory (bytes) snapshot=1685929984
                Total committed heap usage (bytes)=136122368
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=32
        File Output Format Counters 
                Bytes Written=22
[root@itcast01 mapreduce]# 

With that, the count is complete.

Next, look at the result files:


[root@itcast01 mapreduce]# hadoop fs -ls hdfs://itcast01:9000/                            list the HDFS root to see its contents
Found 4 items
-rw-r--r--   1 root supergroup      49448 2015-07-08 13:43 hdfs://itcast01:9000/log.txt
drwxr-xr-x   - root supergroup          0 2015-07-08 21:16 hdfs://itcast01:9000/out
drwx------   - root supergroup          0 2015-07-08 21:12 hdfs://itcast01:9000/tmp
-rw-r--r--   1 root supergroup         32 2015-07-08 19:57 hdfs://itcast01:9000/word.avi
[root@itcast01 mapreduce]# hadoop fs -ls hdfs://itcast01:9000/out                         the out directory we set as the output earlier
Found 2 items
-rw-r--r--   1 root supergroup          0 2015-07-08 21:16 hdfs://itcast01:9000/out/_SUCCESS
-rw-r--r--   1 root supergroup         22 2015-07-08 21:16 hdfs://itcast01:9000/out/part-r-00000
[root@itcast01 mapreduce]# hadoop fs -cat hdfs://itcast01:9000/out/part-r-00000           part-r-00000 under out is the result file
hello   3
jerry   1
tom     2
[root@itcast01 mapreduce]# 

The counts are correct: hello 3 times, jerry once, tom twice.
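
One practical note (my addition, not shown in the original run): MapReduce refuses to start a job whose output directory already exists, so clear it before re-running the same command:

hadoop fs -rm -r hdfs://itcast01:9000/out    # remove the old output first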


The counting felt slow because this is pseudo-distributed: from start to finish it is still a single machine, a single worker, doing all the work, so the pseudo-distributed setup cannot draw on any extra resources and remains slow.



While the wordcount job above was running, we opened another session to look at its jps output.

jps shows the status of the running Java processes.

5423 NameNode
5832 ResourceManager
5515 DataNode
14498 RunJar        the Java program launched by the hadoop jar command
5927 NodeManager
14604 Jps
5683 SecondaryNameNode

Resource allocation is handled by the ResourceManager process.

Task monitoring is handled by the MRAppMaster process.

I never managed to spot the YarnChild process, though; it is the process in which the map and reduce tasks actually run, and it is short-lived, existing only while those tasks are executing.
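
A small trick for catching these transient processes (my suggestion, not from the original steps): keep jps refreshing in the second session while the job runs:

watch -n 1 jps    # MRAppMaster and YarnChild appear while tasks run and vanish when the job ends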




