HADOOP Pseudo-Distributed Mode (Single-Node Cluster Setup)
Hadoop official download page:
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
It is best to download the stable release. The latest version I installed earlier had some problems; I am not sure whether they were strictly version-related, but after switching back to the stable release there seemed to be fewer issues (I don't remember the details).
For the single-node cluster setup, the official tutorial is still the most authoritative:
http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/SingleCluster.html
A single node means pseudo-distributed mode. From the official docs: Hadoop can run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.
http://hadoop.apache.org/docs/r1.0.4/cn/quickstart.html#运行Hadoop集群的准备工作
After the setup is done:
Format the NameNode
hdfs namenode -format
(The NameNode only needs to be formatted once; do not format it on every start. Formatting it repeatedly leaves inconsistent version IDs and the DataNode will fail to start; a fix is given at the end of this article.)
Start the NameNode and DataNode:
start-dfs.sh
Check the JVM processes with jps:
16098 NameNode
16245 DataNode
16437 SecondaryNameNode
16590 Jps
(Previously the DataNode process did not show up in jps; to fix it, I modified etc/hadoop/hdfs-site.xml under the Hadoop installation directory:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/root/clound/hadoop/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/root/clound/hadoop/data/data</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Configure HDFS permissions -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
This adds the NameNode and DataNode directories, which I created by hand under the installation directory. Note that fs.default.name needs the hdfs:// scheme, and that fs.default.name and mapred.job.tracker conventionally live in core-site.xml and mapred-site.xml rather than hdfs-site.xml (fs.default.name is also deprecated in favor of fs.defaultFS).
Reference: https://blog.csdn.net/weixin_35353187/article/details/81779973)
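Since the fs.default.name and mapred.job.tracker settings above more conventionally belong in core-site.xml and mapred-site.xml, here is a minimal core-site.xml sketch of that layout (the hadoop.tmp.dir path is my own example, not from the original setup):

```xml
<!-- etc/hadoop/core-site.xml: minimal sketch; paths are examples -->
<configuration>
  <property>
    <!-- fs.defaultFS is the current name for the deprecated fs.default.name -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <!-- Base for HDFS's temporary/working files; example path only -->
    <name>hadoop.tmp.dir</name>
    <value>/root/clound/hadoop/tmp</value>
  </property>
</configuration>
```

Keeping the filesystem URI in core-site.xml also makes the later troubleshooting steps (clearing hadoop.tmp.dir, swapping localhost for a real IP) apply in one obvious place.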
Start the ResourceManager and NodeManager:
start-yarn.sh
jps
25078 ResourceManager
25271 NodeManager
13352 SecondaryNameNode
26536 Jps
17530 DataNode
13038 NameNode
Then you can test with the wordcount example that ships with Hadoop.
First create your own directory path on HDFS:
hdfs dfs -mkdir -p /flower/hadoop/input
Write a test file whose words will be counted later:
vim testHadoop.txt
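The test file can also be created non-interactively; a sketch with stand-in content (the actual file I used is printed at the end of the article):

```shell
# Create a small test file to count words in (stand-in content; any text works)
cat > testHadoop.txt <<'EOF'
haha
i am the best!
hadoop
EOF
wc -l testHadoop.txt
```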
Upload it to HDFS:
hadoop fs -put testHadoop.txt /flower/hadoop/input
hadoop fs -ls /flower/hadoop/input
Run the wordcount example bundled with Hadoop:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /flower/hadoop/input/testHadoop.txt /flower/hadoop/output
Here, wordcount is the class name of this example MapReduce job.
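What the wordcount job computes can be sketched as a plain shell pipeline over a local file (same counting logic, no MapReduce; the sample file here is made up):

```shell
# Same counting logic as the wordcount example, run locally:
# split each line on whitespace, sort the words, count duplicates
printf 'i am the best\ni am here\n' > /tmp/wc-demo.txt
tr -s ' \t' '\n' < /tmp/wc-demo.txt | sort | uniq -c
```

This mirrors the map (split into words), shuffle (sort), and reduce (count per word) phases of the real job.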
Output after a successful run:
[root@flower-server hadoop]# hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /flower/hadoop/input/testHadoop.txt /flower/hadoop/output
19/01/22 21:54:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/22 21:54:55 INFO input.FileInputFormat: Total input files to process : 1
19/01/22 21:54:56 INFO mapreduce.JobSubmitter: number of splits:1
19/01/22 21:54:57 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/22 21:54:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1548165280849_0001
19/01/22 21:54:58 INFO impl.YarnClientImpl: Submitted application application_1548165280849_0001
19/01/22 21:54:58 INFO mapreduce.Job: The url to track the job: http://izbp1czpl17je74lb8g7gbz:8088/proxy/application_1548165280849_0001/
19/01/22 21:54:58 INFO mapreduce.Job: Running job: job_1548165280849_0001
19/01/22 21:55:12 INFO mapreduce.Job: Job job_1548165280849_0001 running in uber mode : false
19/01/22 21:55:12 INFO mapreduce.Job: map 0% reduce 0%
19/01/22 21:55:19 INFO mapreduce.Job: map 100% reduce 0%
19/01/22 21:55:27 INFO mapreduce.Job: map 100% reduce 100%
19/01/22 21:55:28 INFO mapreduce.Job: Job job_1548165280849_0001 completed successfully
19/01/22 21:55:28 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=184
FILE: Number of bytes written=397769
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=236
HDFS: Number of bytes written=130
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4741
Total time spent by all reduces in occupied slots (ms)=5016
Total time spent by all map tasks (ms)=4741
Total time spent by all reduce tasks (ms)=5016
Total vcore-milliseconds taken by all map tasks=4741
Total vcore-milliseconds taken by all reduce tasks=5016
Total megabyte-milliseconds taken by all map tasks=4854784
Total megabyte-milliseconds taken by all reduce tasks=5136384
Map-Reduce Framework
Map input records=7
Map output records=14
Map output bytes=167
Map output materialized bytes=184
Input split bytes=125
Combine input records=14
Combine output records=12
Reduce input groups=12
Reduce shuffle bytes=184
Reduce input records=12
Reduce output records=12
Spilled Records=24
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=250
CPU time spent (ms)=1300
Physical memory (bytes) snapshot=366415872
Virtual memory (bytes) snapshot=4209389568
Total committed heap usage (bytes)=165810176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=111
File Output Format Counters
Bytes Written=130
However, the job-tracking URL shown in the output is still not accessible for me; I suspect it is because the URL uses the server's internal hostname, which is not resolvable from outside, but I have not confirmed that yet.
[root@flower-server hadoop]# hadoop fs -ls /flower/hadoop/output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2019-01-22 21:55 /flower/hadoop/output/_SUCCESS
-rw-r--r-- 1 root supergroup 130 2019-01-22 21:55 /flower/hadoop/output/part-r-00000
[root@flower-server hadoop]# hadoop fs -cat /flower/hadoop/output/part-r-00000
am 2
best! 1
comming… 1
do 1
hadoop 1
haha 1
i 2
konw? 1
the 1
you 1
我就知道我是最聪明的! 1
我是最棒的! 1
[root@flower-server hadoop]# hadoop fs -cat /flower/hadoop/input/testHadoop.txt
haha
我就知道我是最聪明的!
我是最棒的!
i am the best!
do you konw?
hadoop
i am comming…
DataNode fails to start after formatting the NameNode multiple times
Empty the hadoop.tmp.dir directory configured in etc/hadoop/core-site.xml under the Hadoop installation directory, also empty the NameNode and DataNode directories configured in hdfs-site.xml, and then reformat the NameNode.
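The underlying cause is a clusterID mismatch: each format writes a fresh clusterID into the NameNode's current/VERSION file, while the DataNode's storage directory keeps the old one, so the DataNode refuses to register. A simulated sketch of what to look for (the paths and IDs below are made up for illustration):

```shell
# Simulate the state after a re-format: NameNode has a new clusterID,
# DataNode still carries the stale one
mkdir -p /tmp/demo/name/current /tmp/demo/data/current
echo 'clusterID=CID-aaaa' > /tmp/demo/name/current/VERSION   # fresh, from re-format
echo 'clusterID=CID-bbbb' > /tmp/demo/data/current/VERSION   # stale DataNode ID
grep clusterID /tmp/demo/name/current/VERSION /tmp/demo/data/current/VERSION
```

On a real installation, compare .../data/name/current/VERSION against .../data/data/current/VERSION (the directories configured in hdfs-site.xml); if the two clusterIDs differ, stop HDFS, clear both directories plus hadoop.tmp.dir, and reformat as described above.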
Hadoop failed on connection exception: java.net.ConnectException: Connection
Replace localhost in each configuration file with the machine's real IP address,
which you can look up with ifconfig.
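As a toy illustration of that substitution (the IP is a placeholder; run the real sed against your core-site.xml and friends only after backing them up):

```shell
# Demonstrate the localhost -> real-IP rewrite on a throwaway file
REAL_IP=192.168.1.100   # placeholder; use the address ifconfig reports
printf '<value>localhost:9000</value>\n' > /tmp/conf-demo.xml
sed -i "s/localhost/$REAL_IP/g" /tmp/conf-demo.xml
cat /tmp/conf-demo.xml
```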