Hadoop 2.6.0: Standalone and Distributed Environment Setup, HDFS Overview, and a MapReduce Example

Installing and Using Hadoop

1.1 Introduction to Hadoop
1.2 MapReduce example
1.3 Standalone installation
1.4 HDFS distributed storage system
1.5 Pseudo-distributed installation
1.6 Homework

1.1 Introduction to Hadoop

    An earlier article already covered the introduction to Hadoop and its ecosystem. If anything is unclear, see that article:

http://dwz.cn/4rdSdU

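MapReduce example

To run a computation, MapReduce has to be running first. So far we have only run start-dfs.sh (the HDFS file system), which is not enough to run a job, so we also start the MapReduce daemons (YARN).

Command to start MapReduce: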
[root@localhost hadoop]# start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.6.0-cdh5.8.2/logs/yarn-root-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /opt/hadoop/hadoop-2.6.0-cdh5.8.2/logs/yarn-root-nodemanager-localhost.localdomain.out

Once startup finishes, run the "jps" command to check that the daemons really started:
28113 NodeManager
28011 ResourceManager
28442 Jps
27137 NameNode
27401 SecondaryNameNode
27246 DataNode

Two additional processes are now clearly visible: NodeManager and ResourceManager.
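
The check above can also be scripted. The following is a minimal sketch, not part of Hadoop: the function name missing_daemons and the EXPECTED set are hypothetical and simply mirror the processes listed in this walkthrough. It parses "jps" output and reports which expected daemons are absent.

```python
# Hypothetical helper: parse "jps" output and report missing Hadoop daemons.
# The expected set mirrors the processes listed in this walkthrough.
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}

def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <name>"; skip blank or malformed lines.
        if len(parts) == 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(expected - running)

# The jps listing shown above, pasted in as sample input.
sample = """28113 NodeManager
28011 ResourceManager
28442 Jps
27137 NameNode
27401 SecondaryNameNode
27246 DataNode"""

print(missing_daemons(sample))  # prints []
```

On a live machine the text could instead come from subprocess.run(["jps"], capture_output=True, text=True).stdout. An empty list means every expected daemon is up.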

Creating the test file

Create a file with some test data:
vi /opt/test/test.txt
麒麟
小张
张张
果哥
泽安
跨越
天天顺利
泽安
祖渊
张张

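Uploading the test file to HDFS
First, create two directories on HDFS: one for input and one for output. The output directory is spelled "ouput" in the commands in this walkthrough, so the same spelling has to be used consistently.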

[root@localhost ~]# hdfs dfs -mkdir /input /ouput
16/10/26 04:30:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Upload the test data to the "input" directory:
[root@localhost ~]# hdfs dfs -put /opt/test/test.txt /input
16/10/26 04:33:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Check that the upload succeeded:
[root@localhost ~]# hdfs dfs -cat /input/test.txt
16/10/26 04:34:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
麒麟
小张
张张
果哥
泽安
跨越
天天顺利
泽安
祖渊
张张

Running the WordCount example bundled with Hadoop

[root@localhost ~]# hadoop jar /opt/hadoop/hadoop-2.6.0-cdh5.8.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.8.2.jar wordcount /input /ouput/test
16/10/26 04:49:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/26 04:49:38 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/10/26 04:49:42 INFO input.FileInputFormat: Total input paths to process : 1
16/10/26 04:49:43 INFO mapreduce.JobSubmitter: number of splits:1
16/10/26 04:49:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477471653063_0001
16/10/26 04:49:46 INFO impl.YarnClientImpl: Submitted application application_1477471653063_0001
16/10/26 04:49:47 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1477471653063_0001/
16/10/26 04:49:47 INFO mapreduce.Job: Running job: job_1477471653063_0001
16/10/26 04:50:21 INFO mapreduce.Job: Job job_1477471653063_0001 running in uber mode : false
16/10/26 04:50:21 INFO mapreduce.Job:  map 0% reduce 0%
16/10/26 04:50:44 INFO mapreduce.Job:  map 100% reduce 0%
16/10/26 04:51:04 INFO mapreduce.Job:  map 100% reduce 100%
16/10/26 04:51:06 INFO mapreduce.Job: Job job_1477471653063_0001 completed successfully
16/10/26 04:51:06 INFO mapreduce.Job: Counters: 49
    File System Counters
            FILE: Number of bytes read=116
            FILE: Number of bytes written=232107
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=177
            HDFS: Number of bytes written=78
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
    Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=18128
            Total time spent by all reduces in occupied slots (ms)=17756
            Total time spent by all map tasks (ms)=18128
            Total time spent by all reduce tasks (ms)=17756
            Total vcore-seconds taken by all map tasks=18128
            Total vcore-seconds taken by all reduce tasks=17756
            Total megabyte-seconds taken by all map tasks=18563072
            Total megabyte-seconds taken by all reduce tasks=18182144
    Map-Reduce Framework
            Map input records=10
            Map output records=10
            Map output bytes=116
            Map output materialized bytes=116
            Input split bytes=101
            Combine input records=10
            Combine output records=8
            Reduce input groups=8
            Reduce shuffle bytes=116
            Reduce input records=8
            Reduce output records=8
            Spilled Records=16
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=454
            CPU time spent (ms)=3450
            Physical memory (bytes) snapshot=306806784
            Virtual memory (bytes) snapshot=3017633792
            Total committed heap usage (bytes)=163450880
    Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
    File Input Format Counters 
            Bytes Read=76
    File Output Format Counters 
            Bytes Written=78
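
Several counters in the report above can be cross-checked by hand. The sketch below rests on assumptions that the log does not state: each task ran in a 1024 MB container, and the "megabyte-seconds" counters are accumulated from task time in milliseconds. Under those assumptions the arithmetic reproduces the logged values exactly.

```python
# Values copied from the job report above.
map_ms, reduce_ms = 18128, 17756
logged_mb_map, logged_mb_reduce = 18563072, 18182144

# Assumption (not shown in the log): 1024 MB of memory per task container.
container_mb = 1024

# Memory counter = task time x container memory.
assert map_ms * container_mb == logged_mb_map
assert reduce_ms * container_mb == logged_mb_reduce

# Record counters: 10 input lines, 8 distinct words after the combiner.
combine_output_records = 8
reduce_input_records = 8

# Interpretation (an assumption): "Spilled Records" adds up the records
# written to disk on the map side and again on the reduce side.
spilled = combine_output_records + reduce_input_records
print(spilled)  # prints 16
```

The result is consistent with "Spilled Records=16" in the report, and the combiner shrinking 10 records to 8 matches the two duplicated names in the test file.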

When the job finishes, look at the computed result:
[root@localhost ~]# hdfs dfs -ls /ouput/test
16/10/26 04:53:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 root supergroup          0 2016-10-26 04:51 /ouput/test/_SUCCESS
-rw-r--r--   1 root supergroup         78 2016-10-26 04:51 /ouput/test/part-r-00000

[root@localhost ~]# hdfs dfs -cat /ouput/test/part-r-00000
16/10/26 04:53:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
天天顺利        1
小张    1
张张    2
果哥    1
泽安    2
祖渊    1
跨越    1
麒麟    1
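
To make the result easier to reason about, here is a plain-Python sketch of the same map, group, and reduce idea, run locally on the ten test lines. It is not the Hadoop implementation: the function names map_phase and reduce_phase are hypothetical, and splitting on whitespace and ordering keys by their UTF-8 bytes are assumptions chosen to mirror the output above.

```python
from collections import defaultdict

# The ten lines of /opt/test/test.txt used in this walkthrough.
LINES = ["麒麟", "小张", "张张", "果哥", "泽安",
         "跨越", "天天顺利", "泽安", "祖渊", "张张"]

def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Group pairs by word and sum the counts, ordered by the key's UTF-8 bytes."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items(), key=lambda kv: kv[0].encode("utf-8"))

result = reduce_phase(map_phase(LINES))
for word, count in result:
    print(f"{word}\t{count}")

# Cross-check against the byte counters in the job report,
# assuming every input line ends with a single newline.
input_bytes = sum(len((line + "\n").encode("utf-8")) for line in LINES)
output_bytes = sum(len(f"{w}\t{c}\n".encode("utf-8")) for w, c in result)
print(input_bytes, output_bytes)  # prints 76 78
```

The two totals agree with "Bytes Read=76" and "Bytes Written=78" in the job counters, and the eight output pairs match the contents of part-r-00000.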

HDFS distributed storage system (Hadoop Distributed File System)

HDFS advantages

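  1. High fault tolerance
    1. Data is automatically stored as multiple replicas
    2. Lost replicas are restored automatically
  2. Well suited to batch processing
    1. Moves the computation to the data rather than the data to the computation
    2. Data locations are exposed to the computation framework
  3. Well suited to big data
    1. GB, TB, or even PB of data
    2. Millions of files or more
    3. 10k+
  4. Can be built on inexpensive machines
    1. However poor the machine is, Hadoop can be set up on it as long as it has disk space and memory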
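
The fault-tolerance point can be illustrated with a toy model. The sketch below is not HDFS code: the function names, node names, and round-robin placement are hypothetical, and the replication factor of 3 is an assumption for the example. It shows the idea that when a node is lost, blocks that fell below the target replica count are copied to other live nodes.

```python
# Toy model of block replication; names and numbers are illustrative only.
REPLICATION = 3  # assumed replication factor for this sketch

def place_replicas(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {nodes[(i + k) % len(nodes)] for k in range(replication)}
    return placement

def handle_node_failure(placement, failed, live_nodes, replication=REPLICATION):
    """Drop replicas on the failed node and re-replicate onto other live nodes."""
    for holders in placement.values():
        holders.discard(failed)
        for node in live_nodes:
            if len(holders) >= replication:
                break
            holders.add(node)  # no-op if this node already holds a replica
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_1", "blk_2"], nodes)

# Simulate losing dn2, then restore the replica count on the survivors.
live = [n for n in nodes if n != "dn2"]
handle_node_failure(placement, "dn2", live)
print({b: sorted(h) for b, h in placement.items()})
```

After the simulated failure, both blocks are back to three replicas and none of them lives on the failed node.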

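HDFS disadvantages (workloads it handles poorly)

  1. Low-latency data access
    1. For example, millisecond-level access
    2. Low latency is traded away for high throughput
  2. Storing and reading small files
    1. They consume a large amount of NameNode memory
    2. Seek time exceeds read time
  3. Concurrent writes and random file modification
    1. A file can have only one writer at a time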
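
The small-files point is easy to quantify roughly. The sketch below assumes about 150 bytes of NameNode memory per namespace object (one per file and one per block), which is a commonly cited rule of thumb rather than an exact figure. The function name and both file layouts are hypothetical.

```python
# Rough estimate of NameNode memory used by namespace objects.
# Assumption: about 150 bytes per object (file or block), a rule of thumb.
BYTES_PER_OBJECT = 150

def namenode_bytes(num_files, blocks_per_file):
    """One object per file plus one object per block."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# 10 million small files, one block each (hypothetical layout).
small = namenode_bytes(10_000_000, 1)

# A comparable volume of data packed into 10,000 large files
# of 8 blocks each (hypothetical layout).
large = namenode_bytes(10_000, 8)

print(small / 1e9)  # prints 3.0 (about 3 GB)
print(large / 1e6)  # prints 13.5 (about 13.5 MB)
```

Under these assumptions the many-small-files layout needs a few hundred times more NameNode memory, which is why packing small files into larger ones is usually recommended.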