1. Hadoop installation
1) Cluster plan:
| Component | Server flink105 | Server flink106 | Server flink107 |
|---|---|---|---|
| HDFS | NameNode, DataNode | DataNode | DataNode, SecondaryNameNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
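Note: install offline where possible.
2. Project experience: multiple HDFS storage directories
1) Check the HDFS storage directories and make sure the data is stored on the disk with the most free space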
[root@Flink105 mapreduce]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 17G 4.6G 13G 27% /
devtmpfs 898M 0 898M 0% /dev
tmpfs 910M 0 910M 0% /dev/shm
tmpfs 910M 9.7M 901M 2% /run
tmpfs 910M 0 910M 0% /sys/fs/cgroup
/dev/sda1 1014M 146M 869M 15% /boot
tmpfs 182M 0 182M 0% /run/user/0
[root@Flink105 mapreduce]#
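The check above can also be scripted. The following is a minimal sketch, not part of the original setup: `largest_mount` is a hypothetical helper that reads POSIX-format `df -P -k` output and prints the mount point with the most available space. It skips tmpfs/devtmpfs pseudo-filesystems and assumes mount points contain no spaces.

```shell
# largest_mount: read `df -P -k` style output on stdin and print the
# mount point with the most available space.
# POSIX df format: column 1 = filesystem, column 4 = available KB,
# column 6 = mount point. The header line (NR == 1) is skipped.
# Hypothetical helper name; assumes mount points contain no spaces.
largest_mount() {
  awk 'NR > 1 && $1 !~ /^(tmpfs|devtmpfs)$/ && $4 + 0 > max {
         max = $4 + 0; mount = $6
       }
       END { print mount }'
}

# Usage on a real host:
#   df -P -k | largest_mount
```

On the host shown above this would report `/`, since the root volume is the only real disk with significant free space.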
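2) Configure multiple directories in hdfs-site.xml. It is best to do this up front, because changing the directories later requires restarting the cluster.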
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///${hadoop.tmp.dir}/dfs/data1,file:///hd2/dfs/data2,file:///hd3/dfs/data3,file:///hd4/dfs/data4</value>
</property>
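Writing the comma-separated value by hand is error-prone when there are many disks. As a rough sketch, the hypothetical helper `build_data_dirs` below joins a list of absolute directory paths into a `dfs.datanode.data.dir` value. The paths are the ones from the example above and are assumed to already exist on mounted disks.

```shell
# build_data_dirs DIR...: join absolute directory paths into a
# comma-separated list of file:// URIs for dfs.datanode.data.dir.
# Hypothetical helper; it does not create or check the directories.
build_data_dirs() {
  out=""
  for d in "$@"; do
    # Prepend a comma only when out is already non-empty.
    out="${out:+$out,}file://$d"
  done
  printf '%s\n' "$out"
}

build_data_dirs /hd2/dfs/data2 /hd3/dfs/data3 /hd4/dfs/data4

# After editing hdfs-site.xml, the effective value can be checked with:
#   hdfs getconf -confKey dfs.datanode.data.dir
```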
3) Once Hadoop is running, open the NameNode web UI on port 50070:
http://flink105:50070/explorer.html#/
3. Project experience: benchmarking
1) Test HDFS write performance
Test: write ten 128 MB files to the HDFS cluster.
[root@Flink105 mapreduce]# hadoop jar /opt/hadoop/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128MB
Then open the web UI again to see the generated files:
http://flink105:50070/explorer.html#/benchmarks/TestDFSIO
2) Test HDFS read performance
Test: read ten 128 MB files from the HDFS cluster.
[root@Flink105 mapreduce]# hadoop jar /opt/hadoop/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 128MB
3) Delete the data generated by the tests
[root@Flink105 mapreduce]# hadoop jar /opt/hadoop/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2-tests.jar TestDFSIO -clean
20/04/03 15:55:06 INFO fs.TestDFSIO: TestDFSIO.1.8
20/04/03 15:55:06 INFO fs.TestDFSIO: nrFiles = 1
20/04/03 15:55:06 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
20/04/03 15:55:06 INFO fs.TestDFSIO: bufferSize = 1000000
20/04/03 15:55:06 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/04/03 15:55:08 INFO fs.TestDFSIO: Cleaning up test files
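The three TestDFSIO invocations above differ only in their mode flag, so the long jar path can be kept in one place. This is a sketch under that assumption: `dfsio_cmd` is a hypothetical helper that only prints the command line, and the jar path and default sizes are copied from the commands above.

```shell
# Jar path copied from the commands above; adjust for your installation.
JAR=/opt/hadoop/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2-tests.jar

# dfsio_cmd MODE [NR_FILES] [FILE_SIZE]
# Print (not run) the TestDFSIO command line for MODE = write|read|clean.
# Hypothetical helper; defaults match the tests above (10 files, 128MB).
dfsio_cmd() {
  mode=$1; files=${2:-10}; size=${3:-128MB}
  case "$mode" in
    write|read) echo "hadoop jar $JAR TestDFSIO -$mode -nrFiles $files -fileSize $size" ;;
    clean)      echo "hadoop jar $JAR TestDFSIO -clean" ;;
    *)          echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Print the full write, read, clean sequence.
for m in write read clean; do
  dfsio_cmd "$m"
done
```

Printing first makes it easy to review the commands. Piping the output to `sh` on a cluster node would then run them in order.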
4) Benchmark MapReduce with the Sort program
(1) Use RandomWriter to generate random data. Each node runs 10 map tasks, and each map writes about 1 GB of random binary data.
[root@Flink105 mapreduce]# hadoop jar /opt/hadoop/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar randomwriter random-data
(2) Run the Sort program
[root@Flink105 mapreduce]# hadoop jar /opt/hadoop/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar sort random-data sorted-data
(3) Verify that the data is actually sorted
[root@Flink105 mapreduce]# hadoop jar /opt/hadoop/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar testmapredsort -sortInput random-data -sortOutput sorted-data
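Before running the RandomWriter step, it is worth estimating how much data it will produce, because 10 maps per node at about 1 GB each adds up quickly on small test machines. The arithmetic is sketched below. `estimate_randomwriter_gb` is a hypothetical helper, and the replication factor is an assumption you should take from your own `dfs.replication` setting.

```shell
# estimate_randomwriter_gb NODES [MAPS_PER_NODE] [GB_PER_MAP] [REPLICATION]
# Rough size in GB of the RandomWriter output:
#   nodes * maps per node * GB per map * replication factor.
# Defaults follow the text above (10 maps per node, ~1 GB per map);
# replication defaults to 1, which gives the logical (unreplicated) size.
# Hypothetical helper for illustration only.
estimate_randomwriter_gb() {
  nodes=$1; maps=${2:-10}; gb=${3:-1}; repl=${4:-1}
  echo $(( nodes * maps * gb * repl ))
}

estimate_randomwriter_gb 3          # logical size on 3 nodes: 30
estimate_randomwriter_gb 3 10 1 3   # raw disk usage with replication 3: 90
```

The Sort step writes an output of the same size again. If the estimate exceeds the free space reported by `df -h`, reduce the generated volume first. RandomWriter in Hadoop 2.x is believed to read `mapreduce.randomwriter.mapsperhost` and `mapreduce.randomwriter.bytespermap` for this, but verify those property names against your version.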