
Verifying the Hadoop Pseudo-Distributed Setup

After starting Hadoop, run the jps command and you will see six processes running in total. First, a quick overview of what each Hadoop process does:


1) ResourceManager                  the master of YARN (Yet Another Resource Negotiator)

2) SecondaryNameNode              the NameNode's assistant

3) NameNode                                  the master of HDFS, the "warehouse manager"

4) DataNode                                     the worker of HDFS, the "actual warehouse"

5) Jps                                                the jps tool itself

6) NodeManager                             the worker of YARN (Yet Another Resource Negotiator)

Running all of these processes on one machine is not ideal; they compete with each other for resources. In a real deployment they end up distributed across different machines.



To verify that HDFS actually works, try uploading a file. The Hadoop client commands live in the bin directory; the start/stop scripts live in sbin.



With the following commands, change into Hadoop's bin directory:

[root@itcast01 ~]# cd /itcast/hadoop-2.4.1/bin/
[root@itcast01 bin]# ls
container-executor  hdfs      mapred.cmd               yarn
hadoop              hdfs.cmd  rcc                      yarn.cmd
hadoop.cmd          mapred    test-container-executor
There are many scripts here. Veteran users tend to stick with the hadoop script; nowadays we can also use the hdfs and yarn commands directly.

If you are unsure how a command works, check its help. For example, to see how hadoop is used:

[root@itcast01 bin]# hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
Here you can run hadoop version to check the Hadoop version, or use hadoop fs for filesystem operations.



List the files in this machine's HDFS. Before uploading anything, there are no files or directories at all:

[root@itcast01 bin]# hadoop fs -ls hdfs://itcast01:9000/
[root@itcast01 bin]# 



Upload the local file /root/install.log to HDFS, renaming it to log.txt, then list hdfs://itcast01:9000/ again. The log.txt file shows up, which means a file has been uploaded from the local filesystem into HDFS:

[root@itcast01 bin]# hadoop fs -put /root/install.log hdfs://itcast01:9000/log.txt
[root@itcast01 bin]# hadoop fs -ls hdfs://itcast01:9000/
Found 1 items
-rw-r--r--   1 root supergroup      49448 2015-07-08 13:43 hdfs://itcast01:9000/log.txt
[root@itcast01 bin]# 

You can also browse HDFS through its web UI. Open 192.168.8.118:50070 in a browser to see the HDFS management page, then go to Utilities -> Browse the file system, where you will find the file named log.txt. The main columns are:

Permission: access permissions

Owner: the owning user

Group: the owning group

Size: file size

Replication: number of replicas. The replica count is set in hdfs-site.xml; here it is 1, because this is a pseudo-distributed setup with only one machine, so only one copy is stored.
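For reference, the single-replica setting mentioned above is the dfs.replication property in hdfs-site.xml; a minimal sketch of that fragment for this pseudo-distributed setup looks like:

```xml
<!-- hdfs-site.xml: keep one replica per block, since there is only one DataNode -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
```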



Now try downloading log.txt: click log.txt -> Download. The browser jumps to http://itcast01:50075/webhdfs/v1/log.txt?op=OPEN&namenoderpcaddress=itcast01:9000&offset=0 and the page fails to load. The redirected URL uses the hostname itcast01 rather than the IP address, so we need to edit the Windows hosts file at C:\Windows\System32\drivers\etc\hosts and append "192.168.8.118           itcast01" as the last line. Note: do not change the file's type when saving. After this change, clicking Download downloads the file.


Download a file from HDFS to the local machine:

[root@itcast01 bin]# hadoop fs -get hdfs://itcast01:9000/log.txt /home/123.txt         download the HDFS file to a local path
[root@itcast01 bin]# cd /home
[root@itcast01 home]# ls
123.txt  lost+found  wec                                                                123.txt is present, so the download finished; view its contents with more 123.txt
[root@itcast01 home]# 

At this point, uploading, listing, and downloading HDFS files all work, so HDFS is usable. Next, verify that YARN works.


Verify YARN through its web UI at 192.168.8.118:8088.

The UI shows Active Nodes = 1. An active node is a worker node, i.e., a NodeManager. YARN's worker is the NodeManager; its master is the ResourceManager.


Verifying MapReduce's counting functionality

Linux's built-in wc command can do this kind of count. The steps are below.

First, create a word.txt file:

[root@itcast01 ~]# vim word.txt
hello tom
hello jerry
hello tom
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
"word.txt" 3L, 32C written  

Count how many words word.txt contains:

[root@itcast01 ~]# wc word.txt 
 3  6 32 word.txt                                              3 lines, 6 words, 32 characters
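For comparison, the line/word/character counts that wc reports can be sketched in a few lines of Python (a toy re-implementation for illustration, not how wc itself works):

```python
def wc(text):
    """Return (lines, words, chars) the way `wc` counts them."""
    lines = text.count("\n")    # wc counts newline characters
    words = len(text.split())   # whitespace-separated tokens
    chars = len(text)           # total characters, newlines included
    return lines, words, chars

# The word.txt from above: three lines, each ending in a newline.
text = "hello tom\nhello jerry\nhello tom\n"
print(wc(text))  # → (3, 6, 32)
```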



Hadoop can do word-frequency counting as well, i.e., count how many times each word appears.
Next, count these words with MapReduce. MapReduce was designed for massive datasets, and massive data should live on HDFS, so first upload word.txt to HDFS:

[root@itcast01 ~]# hadoop fs -put /root/word.txt hdfs://itcast01:9000/word.avi              Linux files have no enforced format, so .avi and .txt are both just text files here
[root@itcast01 ~]# 
Counting word frequencies with MapReduce feels slow here, mainly because of job startup and file-reading overhead; MapReduce targets massive data, and its advantages only show once the data really is massive.
Spark computes in memory and can also run on YARN; YARN turns out to be quite powerful.

Count the word frequencies with MapReduce; the commands and results are below:

[root@itcast01 ~]# wc word.txt                    Linux's built-in word-count command
 3  6 32 word.txt
[root@itcast01 ~]# hadoop fs -put /root/word.txt hdfs://itcast01:9000/word.avi        upload the file to HDFS
[root@itcast01 ~]# cd /itcast/hadoop-2.4.1/share/hadoop/                              change into the Hadoop directory
[root@itcast01 hadoop]# ls
common  hdfs  httpfs  mapreduce  tools  yarn
[root@itcast01 hadoop]# cd mapreduce/                                                 change into the mapreduce directory
[root@itcast01 mapreduce]# ls                                                         there are many JAR files here
hadoop-mapreduce-client-app-2.4.1.jar
hadoop-mapreduce-client-common-2.4.1.jar
hadoop-mapreduce-client-core-2.4.1.jar
hadoop-mapreduce-client-hs-2.4.1.jar
hadoop-mapreduce-client-hs-plugins-2.4.1.jar
hadoop-mapreduce-client-jobclient-2.4.1.jar
hadoop-mapreduce-client-jobclient-2.4.1-tests.jar
hadoop-mapreduce-client-shuffle-2.4.1.jar
hadoop-mapreduce-examples-2.4.1.jar
lib
lib-examples
sources
[root@itcast01 mapreduce]# hadoop                                                     see what commands hadoop offers; typing hadoop and pressing Enter prints the usage
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
[root@itcast01 mapreduce]# hadoop jar                                                 see what arguments it needs
RunJar jarFile [mainClass] args...
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar 
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount
Usage: wordcount <in> <out>
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount hdfs://itcast01:9000/word.avi hdfs://itcast01:9000/out  (at this point, immediately clone another connection and run jps; more on that below)
15/07/08 21:12:25 INFO client.RMProxy: Connecting to ResourceManager at itcast01/192.168.8.118:8032
15/07/08 21:12:48 INFO input.FileInputFormat: Total input paths to process : 1
15/07/08 21:12:56 INFO mapreduce.JobSubmitter: number of splits:1
15/07/08 21:13:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1436259707754_0001
15/07/08 21:13:23 INFO impl.YarnClientImpl: Submitted application application_1436259707754_0001
15/07/08 21:13:23 INFO mapreduce.Job: The url to track the job: http://itcast01:8088/proxy/application_1436259707754_0001/
15/07/08 21:13:23 INFO mapreduce.Job: Running job: job_1436259707754_0001
15/07/08 21:14:48 INFO mapreduce.Job: Job job_1436259707754_0001 running in uber mode : false
15/07/08 21:14:48 INFO mapreduce.Job:  map 0% reduce 0%
15/07/08 21:16:06 INFO mapreduce.Job:  map 100% reduce 0%
15/07/08 21:16:56 INFO mapreduce.Job:  map 100% reduce 100%
15/07/08 21:16:59 INFO mapreduce.Job: Job job_1436259707754_0001 completed successfully
15/07/08 21:17:00 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=40
                FILE: Number of bytes written=185833
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=126
                HDFS: Number of bytes written=22
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=76880
                Total time spent by all reduces in occupied slots (ms)=41905
                Total time spent by all map tasks (ms)=76880
                Total time spent by all reduce tasks (ms)=41905
                Total vcore-seconds taken by all map tasks=76880
                Total vcore-seconds taken by all reduce tasks=41905
                Total megabyte-seconds taken by all map tasks=78725120
                Total megabyte-seconds taken by all reduce tasks=42910720
        Map-Reduce Framework
                Map input records=3
                Map output records=6
                Map output bytes=56
                Map output materialized bytes=40
                Input split bytes=94
                Combine input records=6
                Combine output records=3
                Reduce input groups=3
                Reduce shuffle bytes=40
                Reduce input records=3
                Reduce output records=3
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=539
                CPU time spent (ms)=5880
                Physical memory (bytes) snapshot=320163840
                Virtual memory (bytes) snapshot=1685929984
                Total committed heap usage (bytes)=136122368
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=32
        File Output Format Counters 
                Bytes Written=22
[root@itcast01 mapreduce]# 

That completes the count.

Now view the result files:


[root@itcast01 mapreduce]# hadoop fs -ls hdfs://itcast01:9000/                            list the HDFS root to see its contents
Found 4 items
-rw-r--r--   1 root supergroup      49448 2015-07-08 13:43 hdfs://itcast01:9000/log.txt
drwxr-xr-x   - root supergroup          0 2015-07-08 21:16 hdfs://itcast01:9000/out
drwx------   - root supergroup          0 2015-07-08 21:12 hdfs://itcast01:9000/tmp
-rw-r--r--   1 root supergroup         32 2015-07-08 19:57 hdfs://itcast01:9000/word.avi
[root@itcast01 mapreduce]# hadoop fs -ls hdfs://itcast01:9000/out                         the out directory specified earlier as the output path
Found 2 items
-rw-r--r--   1 root supergroup          0 2015-07-08 21:16 hdfs://itcast01:9000/out/_SUCCESS
-rw-r--r--   1 root supergroup         22 2015-07-08 21:16 hdfs://itcast01:9000/out/part-r-00000
[root@itcast01 mapreduce]# hadoop fs -cat hdfs://itcast01:9000/out/part-r-00000           part-r-00000 under out is the result file
hello   3
jerry   1
tom     2
[root@itcast01 mapreduce]# 

The counts are correct: hello 3, jerry 1, tom 2.
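The logic behind the wordcount example can be sketched in plain Python: the map phase emits a (word, 1) pair per word, and the reduce phase sums the pairs per word (a toy illustration of the idea, not Hadoop's actual implementation):

```python
from collections import defaultdict

def wordcount(lines):
    """Toy map/reduce: emit (word, 1) per word, then sum per key."""
    # map phase: one (word, 1) pair for every word in every line
    pairs = [(word, 1) for line in lines for word in line.split()]
    # shuffle + reduce phase: group pairs by key and sum the counts
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

# The three lines of word.txt from above.
lines = ["hello tom", "hello jerry", "hello tom"]
print(wordcount(lines))  # → {'hello': 3, 'tom': 2, 'jerry': 1}
```

In real MapReduce the pairs are partitioned across machines and each reducer sums one group of keys, which is why the approach scales to data far larger than one machine's memory.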


The job felt slow because this is a pseudo-distributed setup: from start to finish only one machine is doing the work, so it cannot draw on any extra resources.



While the wordcount job above was running, we opened another connection to look at its Java processes with jps.

jps shows the status of running Java processes.

5423 NameNode
5832 ResourceManager
5515 DataNode
14498 RunJar        the Java program started by the hadoop jar command
5927 NodeManager
14604 Jps
5683 SecondaryNameNode

Resource allocation is handled by the ResourceManager process.

Task monitoring is handled by the MRAppMaster process.

I did not manage to spot the YarnChild processes, though; they are where the map and reduce tasks actually run.




