Hadoop Installation and Usage
1.1 Introduction to Hadoop
1.2 MapReduce Example
1.3 Standalone Installation
1.4 HDFS Distributed Storage System
1.5 Pseudo-Distributed Installation
1.6 Homework
1.1 Introduction to Hadoop
Hadoop and its ecosystem were already covered in an earlier article; if anything here is unclear, refer back to that post.
1.2 MapReduce Example
To run a computation we first need MapReduce running. So far we have only run start-dfs.sh (the HDFS daemons), so no jobs can be executed yet; let's start the MapReduce (YARN) daemons as well.
Start YARN:
[root@localhost hadoop]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.6.0-cdh5.8.2/logs/yarn-root-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /opt/hadoop/hadoop-2.6.0-cdh5.8.2/logs/yarn-root-nodemanager-localhost.localdomain.out
Once startup finishes, run "jps" to confirm the daemons really started:
28113 NodeManager
28011 ResourceManager
28442 Jps
27137 NameNode
27401 SecondaryNameNode
27246 DataNode
Two new processes, ResourceManager and NodeManager, are now clearly visible.
Create a test file
Create some test data:
vi /opt/test/test.txt
麒麟
小张
张张
果哥
泽安
跨越
天天顺利
泽安
祖渊
张张
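Instead of typing the records into vi, the same test file can be created non-interactively with a heredoc (a sketch; the TEST_DIR variable is an addition here so the target directory can be overridden, defaulting to /opt/test as in the text):

```shell
# Write the ten test records in one step, equivalent to the vi session above.
TEST_DIR=${TEST_DIR:-/opt/test}   # defaults to the path used in this tutorial
mkdir -p "$TEST_DIR"
cat <<'EOF' > "$TEST_DIR/test.txt"
麒麟
小张
张张
果哥
泽安
跨越
天天顺利
泽安
祖渊
张张
EOF
wc -l < "$TEST_DIR/test.txt"   # should report 10 records
```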
Upload the test file to HDFS
First we create two directories on HDFS: an input directory and an ouput (output) directory (the spelling here follows the commands below).
[root@localhost ~]# hdfs dfs -mkdir /input /ouput
16/10/26 04:30:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Upload the test data to the "input" directory:
[root@localhost ~]# hdfs dfs -put /opt/test/test.txt /input
16/10/26 04:33:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Check whether the upload succeeded:
[root@localhost ~]# hdfs dfs -cat /input/test.txt
16/10/26 04:34:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
麒麟
小张
张张
果哥
泽安
跨越
天天顺利
泽安
祖渊
张张
Run Hadoop's built-in WordCount example
[root@localhost ~]# hadoop jar /opt/hadoop/hadoop-2.6.0-cdh5.8.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.8.2.jar wordcount /input /ouput/test
16/10/26 04:49:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/26 04:49:38 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/10/26 04:49:42 INFO input.FileInputFormat: Total input paths to process : 1
16/10/26 04:49:43 INFO mapreduce.JobSubmitter: number of splits:1
16/10/26 04:49:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477471653063_0001
16/10/26 04:49:46 INFO impl.YarnClientImpl: Submitted application application_1477471653063_0001
16/10/26 04:49:47 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1477471653063_0001/
16/10/26 04:49:47 INFO mapreduce.Job: Running job: job_1477471653063_0001
16/10/26 04:50:21 INFO mapreduce.Job: Job job_1477471653063_0001 running in uber mode : false
16/10/26 04:50:21 INFO mapreduce.Job: map 0% reduce 0%
16/10/26 04:50:44 INFO mapreduce.Job: map 100% reduce 0%
16/10/26 04:51:04 INFO mapreduce.Job: map 100% reduce 100%
16/10/26 04:51:06 INFO mapreduce.Job: Job job_1477471653063_0001 completed successfully
16/10/26 04:51:06 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=116
FILE: Number of bytes written=232107
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=177
HDFS: Number of bytes written=78
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=18128
Total time spent by all reduces in occupied slots (ms)=17756
Total time spent by all map tasks (ms)=18128
Total time spent by all reduce tasks (ms)=17756
Total vcore-seconds taken by all map tasks=18128
Total vcore-seconds taken by all reduce tasks=17756
Total megabyte-seconds taken by all map tasks=18563072
Total megabyte-seconds taken by all reduce tasks=18182144
Map-Reduce Framework
Map input records=10
Map output records=10
Map output bytes=116
Map output materialized bytes=116
Input split bytes=101
Combine input records=10
Combine output records=8
Reduce input groups=8
Reduce shuffle bytes=116
Reduce input records=8
Reduce output records=8
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=454
CPU time spent (ms)=3450
Physical memory (bytes) snapshot=306806784
Virtual memory (bytes) snapshot=3017633792
Total committed heap usage (bytes)=163450880
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=76
File Output Format Counters
Bytes Written=78
Once the job finishes, let's look at the computed result:
[root@localhost ~]# hdfs dfs -ls /ouput/test
16/10/26 04:53:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 root supergroup 0 2016-10-26 04:51 /ouput/test/_SUCCESS
-rw-r--r-- 1 root supergroup 78 2016-10-26 04:51 /ouput/test/part-r-00000
[root@localhost ~]# hdfs dfs -cat /ouput/test/part-r-00000
16/10/26 04:53:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
天天顺利 1
小张 1
张张 2
果哥 1
泽安 2
祖渊 1
跨越 1
麒麟 1
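As a local sanity check, independent of Hadoop, the same counts can be reproduced with standard Unix tools: sort groups identical lines together, and uniq -c prefixes each distinct line with its count, which is exactly the group-and-count that the WordCount job performs on HDFS (the records are inlined here so the snippet is self-contained):

```shell
# Local equivalent of the WordCount job: group identical records, count each group.
sort <<'EOF' | uniq -c
麒麟
小张
张张
果哥
泽安
跨越
天天顺利
泽安
祖渊
张张
EOF
```

The counts match the part-r-00000 output above (张张 and 泽安 appear twice, everything else once); only the sort order may differ depending on locale.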
1.4 HDFS Distributed Storage System (Hadoop Distributed File System)
HDFS advantages
- High fault tolerance
  - Data is automatically saved as multiple replicas
  - Lost replicas are automatically restored
- Well suited to batch processing
  - Moves computation to the data rather than data to the computation
  - Exposes data locations to the computing framework
- Well suited to big-data processing
  - GB/TB or even PB-scale data
  - Millions of files and more
  - 10k+ node scale
- Can be built on inexpensive machines
  - No matter how low-end the machines are, Hadoop can be deployed as long as they have disk space and memory
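The replica count mentioned above is controlled by the dfs.replication property in hdfs-site.xml. A minimal sketch of the setting (the HDFS default is 3 replicas; pseudo-distributed single-node setups typically lower it to 1, since there is only one DataNode to hold copies):

```xml
<!-- hdfs-site.xml: number of replicas HDFS keeps for each block -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```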
*HDFS drawbacks*
- Low-latency data access
  - e.g. millisecond-level access is not supported
  - Trades low latency away in favor of high throughput
- Storing large numbers of small files
  - Consumes large amounts of NameNode memory
  - Seek time comes to exceed read time
- Concurrent writes / random file modification
  - A file can have only one writer at a time