Hadoop 1.x Pseudo-Distributed Deployment and Testing
Step 1:
Install the JDK and configure /etc/profile, then run 【$ source /etc/profile 】 so the changes take effect.
Step 2:
Extract and install Hadoop 1.x and configure /etc/profile, then run 【$ source /etc/profile 】 so the changes take effect.
(Steps 1 and 2, as well as the directory-layout conventions, were covered in the previous post on Hadoop 1.x standalone-mode deployment, so they are not repeated here.)
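The /etc/profile additions for steps 1 and 2 look roughly like the sketch below; the install paths here are assumptions for illustration, so substitute the directories where your JDK and Hadoop actually live:

```shell
# Sketch of /etc/profile additions; the install paths below are
# assumptions -- replace them with your actual JDK and Hadoop locations.
export JAVA_HOME=/opt/modules/jdk1.6.0_45
export HADOOP_HOME=/opt/modules/hadoop-1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
```

After editing, 【$ source /etc/profile 】 makes the variables visible in the current shell.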
Step 3:
Configure core-site.xml, hdfs-site.xml, mapred-site.xml, and the masters and slaves files (the latter two are plain text files, not XML).
1. Configure core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master.dragon.org:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/tmp</value>
  </property>
</configuration>

This sets the Hadoop Common properties, the base framework settings of Hadoop 1.x: the default file system URI and the temporary data directory.
2. Configure hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

This sets the HDFS file system properties: a replication factor of 1 (a single node holds every block) and permission checking disabled.
3. Configure mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master.dragon.org:9001</value>
  </property>
</configuration>

This sets the MapReduce framework properties: the JobTracker address and port.
4. Configure the masters file
Replace its original contents with hadoop-master.dragon.org.
hadoop-master.dragon.org is this machine's hostname; substitute your own.
Despite its name, this file does not configure the Hadoop master node; it specifies the host of the HDFS auxiliary node, the SecondaryNameNode.
5. Configure the slaves file
Replace its original contents with hadoop-master.dragon.org.
hadoop-master.dragon.org is this machine's hostname; substitute your own.
This file lists the slave nodes of both the HDFS and MapReduce frameworks in Hadoop 1.x (the DataNode and TaskTracker hosts).
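The masters and slaves files can be written in one go; a minimal sketch, where the hostname and the demo directory are placeholders (in a real setup the two files live in $HADOOP_HOME/conf):

```shell
# Write the masters and slaves files; the hostname and directory are
# assumptions for illustration (real files live in $HADOOP_HOME/conf).
HOSTNAME=hadoop-master.dragon.org
CONF_DIR=./conf-demo
mkdir -p "$CONF_DIR"
echo "$HOSTNAME" > "$CONF_DIR/masters"   # SecondaryNameNode host
echo "$HOSTNAME" > "$CONF_DIR/slaves"    # DataNode/TaskTracker hosts
cat "$CONF_DIR/masters" "$CONF_DIR/slaves"
```

In this single-machine pseudo-distributed setup both files name the same host, which is why one hostname serves for all roles.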
Step 4:
Change into the 【 Hadoop-1.x/bin 】 directory and run start-all.sh.
Run 【$ jps 】 to check the processes:
[uckyk@hadoop-master hadoop-1.2.1]$ jps
19043 NameNode
19156 DataNode
19271 SecondaryNameNode
19479 TaskTracker
23176 Jps
19353 JobTracker
All five daemon processes (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) have started successfully.
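Rather than eyeballing the list, the five daemons can be checked by grepping the jps output; a sketch that feeds a canned sample copied from the listing above so it runs anywhere (on a live node you would pass "$(jps)" instead):

```shell
# Verify that all five Hadoop 1.x daemons appear in a jps listing.
check_daemons() {
  missing=0
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    printf '%s\n' "$1" | grep -q "$d" || { echo "missing: $d"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all 5 daemons running"
}
# Sample input copied from the jps output above; on a live node use "$(jps)".
check_daemons "19043 NameNode
19156 DataNode
19271 SecondaryNameNode
19479 TaskTracker
19353 JobTracker"
# prints: all 5 daemons running
```

If any daemon is missing, the function names it, which narrows down which log file under the Hadoop logs directory to inspect.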
Step 5:
Wordcount test, using the example job that ships with Hadoop 1.x to count word occurrences.
1. Create the directories on the HDFS distributed file system:
【$ hadoop fs -mkdir /wc】
【$ hadoop fs -mkdir /wc/input】
2. Upload the files whose words you want to count to HDFS (here, the Hadoop XML configuration files):
【$ hadoop fs -put /opt/data/input/*.xml /wc/input 】
3. The test case is hadoop-examples-1.2.1.jar in the 【 Hadoop-1.x 】 directory.
4. Run 【$ hadoop jar hadoop-examples-1.2.1.jar wordcount /wc/input /wc/output/ 】, which produces output like the following:
15/10/29 19:36:28 INFO input.FileInputFormat: Total input paths to process : 7
15/10/29 19:36:29 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/10/29 19:36:29 WARN snappy.LoadSnappy: Snappy native library not loaded
15/10/29 19:36:30 INFO mapred.JobClient: Running job: job_201510291903_0001
15/10/29 19:36:31 INFO mapred.JobClient: map 0% reduce 0%
15/10/29 19:36:57 INFO mapred.JobClient: map 28% reduce 0%
15/10/29 19:37:19 INFO mapred.JobClient: map 42% reduce 0%
15/10/29 19:37:20 INFO mapred.JobClient: map 57% reduce 0%
15/10/29 19:37:29 INFO mapred.JobClient: map 57% reduce 19%
15/10/29 19:37:35 INFO mapred.JobClient: map 85% reduce 19%
15/10/29 19:37:40 INFO mapred.JobClient: map 100% reduce 28%
15/10/29 19:37:47 INFO mapred.JobClient: map 100% reduce 100%
15/10/29 19:37:49 INFO mapred.JobClient: Job complete: job_201510291903_0001
15/10/29 19:37:49 INFO mapred.JobClient: Counters: 29
15/10/29 19:37:49 INFO mapred.JobClient: Map-Reduce Framework
15/10/29 19:37:49 INFO mapred.JobClient: Spilled Records=1164
15/10/29 19:37:49 INFO mapred.JobClient: Map output materialized bytes=10189
15/10/29 19:37:49 INFO mapred.JobClient: Reduce input records=582
15/10/29 19:37:49 INFO mapred.JobClient: Virtual memory (bytes) snapshot=15438655488
15/10/29 19:37:49 INFO mapred.JobClient: Map input records=369
15/10/29 19:37:49 INFO mapred.JobClient: SPLIT_RAW_BYTES=896
15/10/29 19:37:49 INFO mapred.JobClient: Map output bytes=20817
15/10/29 19:37:49 INFO mapred.JobClient: Reduce shuffle bytes=10189
15/10/29 19:37:49 INFO mapred.JobClient: Physical memory (bytes) snapshot=1238503424
15/10/29 19:37:49 INFO mapred.JobClient: Reduce input groups=419
15/10/29 19:37:49 INFO mapred.JobClient: Combine output records=582
15/10/29 19:37:49 INFO mapred.JobClient: Reduce output records=419
15/10/29 19:37:49 INFO mapred.JobClient: Map output records=1741
15/10/29 19:37:49 INFO mapred.JobClient: Combine input records=1741
15/10/29 19:37:49 INFO mapred.JobClient: CPU time spent (ms)=12930
15/10/29 19:37:49 INFO mapred.JobClient: Total committed heap usage (bytes)=962293760
15/10/29 19:37:49 INFO mapred.JobClient: File Input Format Counters
15/10/29 19:37:49 INFO mapred.JobClient: Bytes Read=14995
15/10/29 19:37:49 INFO mapred.JobClient: FileSystemCounters
15/10/29 19:37:49 INFO mapred.JobClient: HDFS_BYTES_READ=15891
15/10/29 19:37:49 INFO mapred.JobClient: FILE_BYTES_WRITTEN=453595
15/10/29 19:37:49 INFO mapred.JobClient: FILE_BYTES_READ=10153
15/10/29 19:37:49 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=6289
15/10/29 19:37:49 INFO mapred.JobClient: Job Counters
15/10/29 19:37:49 INFO mapred.JobClient: Launched map tasks=7
15/10/29 19:37:49 INFO mapred.JobClient: Launched reduce tasks=1
15/10/29 19:37:49 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=49873
15/10/29 19:37:49 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/10/29 19:37:49 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=123766
15/10/29 19:37:49 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/10/29 19:37:49 INFO mapred.JobClient: Data-local map tasks=7
15/10/29 19:37:49 INFO mapred.JobClient: File Output Format Counters
15/10/29 19:37:49 INFO mapred.JobClient: Bytes Written=6289
5. Output like the above means the job ran successfully; now view the results.
There are two ways:
(1) Through the NameNode web UI at http://hadoop-master.dragon.org:50070, browsing the file system to /wc/output
(2) From the command line, by running:
【$ hadoop fs -text /wc/output/part*】, which prints:
A 11
ACL 13
AdminOperationsProtocol, 1
By 1
Capacity 2
CapacityScheduler. 1
ClientDatanodeProtocol, 1
ClientProtocol, 1
Comma 2
DatanodeProtocol, 1
Default 1
DistributedFileSystem. 1
Each 1
Fair 2
For 13
Hadoop. 1
If 9
(partial output)
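What the wordcount example computes can be mimicked locally with standard shell tools; a sketch on a tiny inline sample (this is not Hadoop itself, just the same word-to-count logic, emitted as "word<TAB>count" sorted by word like the part-r-* files above):

```shell
# Local imitation of the wordcount job: split on whitespace, count
# occurrences, and emit "word<TAB>count" sorted alphabetically.
printf 'A ACL A For ACL A\n' \
  | tr -s ' \t' '\n\n' \
  | sort | uniq -c \
  | awk '{printf "%s\t%d\n", $2, $1}'
# prints:
# A       3
# ACL     2
# For     1
```

The MapReduce job does the same splitting in its map phase and the same grouping and summing in its reduce phase, just distributed across tasks.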
That is all for the pseudo-distributed deployment and testing of Hadoop 1.x. Parts of this post draw on the 云帆大数据 video course. If you run into configuration errors or other problems, leave a comment and we can discuss them together.