Part 1: Setting up the VM, the OS, and the network
1. Install VMware (I used version 11)
Download: https://www.fixdown.com/vmware.htm
2. Download a CentOS image (I used CentOS 7)
Download: https://www.centos.org/download/
3. Attach the image to the VM. A few problems came up (my machine is a ThinkPad):
Problem 1:
Solution: https://zhidao.baidu.com/question/440691460.html
Problem 2: the IP kept changing after reboots even though a static IP had been configured.
Solution: https://blog.csdn.net/springyh/article/details/79001714
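For reference, on CentOS 7 the persistent network config lives under /etc/sysconfig/network-scripts/. A minimal static-IP sketch, assuming the interface is ens33 and the 192.168.56.x host-only network from the error message further down (the gateway value is an assumption); an IP that reverts after reboot usually means BOOTPROTO is still dhcp or ONBOOT is not yes:

```
# /etc/sysconfig/network-scripts/ifcfg-ens33 (interface name and addresses are assumptions)
TYPE=Ethernet
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.56.130
NETMASK=255.255.255.0
GATEWAY=192.168.56.1
```

After editing, restart networking with `systemctl restart network` for the change to take effect.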
Problem 3: the hostname was set to CO002 but kept showing up as CO003, even after setting it and rebooting.
Solution: set it again and reboot a few more times. I still don't know the root cause; if anyone does, please share.
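On CentOS 7 the persistent hostname is simply the contents of /etc/hostname; `hostnamectl set-hostname CO002` (run as root, then log in again) writes it there. If the name still reverts, inspect the file directly; a cloned VM can carry a stale value:

```
# /etc/hostname — should contain exactly the desired name and nothing else
CO002
```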
Part 2: Installing the JDK, setting up passwordless SSH, and installing/configuring Hadoop
https://www.linuxidc.com/Linux/2017-03/142051.htm
https://blog.csdn.net/u010270403/article/details/51446674
Installing sz/rz (lrzsz, for file transfer through the terminal):
# yum -y install lrzsz
Part 3: Building the cluster
If the NameNode fails to start:
https://blog.csdn.net/u011118321/article/details/58672243
1. Starting the cluster: https://www.2cto.com/net/201705/635434.html
http://www.cnblogs.com/jun1019/p/6240770.html
Startup procedure:
Note: follow the steps below strictly, in order.
Step 1. Start the ZooKeeper ensemble (run this on each node with ZooKeeper installed, e.g. CO000, CO001, and CO002):
cd /weekend/zookeeper-3.4.5/bin/
./zkServer.sh start
# Check the status: there should be one leader and two followers
./zkServer.sh status
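The one-leader/two-followers result above depends on the server list in zoo.cfg being identical on all three nodes. A minimal sketch (ports, dataDir, and the server-number-to-host mapping are assumptions; adjust to your install):

```
# conf/zoo.cfg — same file on every ZooKeeper node
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/weekend/zookeeper-3.4.5/data
clientPort=2181
server.1=CO000:2888:3888
server.2=CO001:2888:3888
server.3=CO002:2888:3888
```

Each node also needs a `myid` file inside dataDir containing just its own server number (1, 2, or 3) from the list above.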
Step 2. Start a JournalNode on every node (this matters — see the error under step 3):
cd /weekend/hadoop-2.4.1
sbin/hadoop-daemon.sh start journalnode
Step 3. Format the NameNode.
// Format on the first NameNode only; to bring up the second one, copy the resulting metadata directory over to it, or run `hdfs namenode -bootstrapStandby` on it. Formatting both independently produces mismatched cluster IDs:
hdfs namenode -format
If the command fails with an HA connection-refused error like the one below, go back and run step 2 on every node first:
Call From CO000/192.168.56.130 to CO000:8485 failed on connection exception: java.net.ConnectException: Connection refused; ERROR namenode.NameNode: Failed to start namenode
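The format step writes the shared edit log to the JournalNodes declared in hdfs-site.xml, which is why they must all be running first. A sketch of the relevant property, assuming the nameservice is cluster1 (it appears in the Flume HDFS paths later in these notes) and that JournalNodes run on all three hosts, matching the CO000:8485 address in the error above:

```xml
<!-- hdfs-site.xml (sketch; host list and nameservice are assumptions) -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://CO000:8485;CO001:8485;CO002:8485/cluster1</value>
</property>
```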
Step 4. Format ZKFC (run once, on any single node that has both Hadoop and ZooKeeper, e.g. CO000):
hdfs zkfc -formatZK
Step 5. Start HDFS (on CO000):
sbin/start-dfs.sh
Step 6. Start YARN:
sbin/start-yarn.sh
(Note: the ResourceManager and the NameNode are usually put on separate machines because both are resource-hungry. I did not separate them, so I run start-yarn.sh on CO000; if they are separated, YARN has to be started on its own machine.)
start-yarn.sh only starts the local ResourceManager; the standby RM on the other node must be started separately:
cd sbin
./yarn-daemon.sh start resourcemanager
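Running two ResourceManagers only works with RM HA enabled in yarn-site.xml. A sketch, assuming the ids rm1/rm2 map to CO000 and CO001 (CO001 hosts the standby RM, per the shutdown procedure later in these notes; the cluster-id value is an assumption):

```xml
<!-- yarn-site.xml (sketch for ResourceManager HA) -->
<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.cluster-id</name><value>yarn-cluster</value></property>
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>CO000</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>CO001</value></property>
<property><name>yarn.resourcemanager.zk-address</name><value>CO000:2181,CO001:2181,CO002:2181</value></property>
```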
Step 7. Once the first startup succeeds and everything is healthy, later startups just follow this order:
1. Start ZooKeeper
2. Start the JournalNodes
3. Start DFS
4. Start YARN
2. Stopping the cluster
If a NameNode process dies, restart just that NameNode with:
/root/apps/hadoop/sbin/hadoop-daemon.sh start namenode
================================================================
The procedure for stopping the Hadoop HA cluster:
Step 1: stop HDFS on CO000:
[root@CO000 ~]# /root/apps/hadoop/sbin/stop-dfs.sh
Step 2: stop YARN on CO000:
[root@CO000 ~]# /root/apps/hadoop/sbin/stop-yarn.sh
Step 3: stop the standby ResourceManager on CO001 separately:
[root@CO001 ~]# /root/apps/hadoop/sbin/yarn-daemon.sh stop resourcemanager
Step 4: stop ZooKeeper on CO000:
[root@CO000 ~]# /root/apps/zookeeper/bin/zkServer.sh stop
Step 5: stop ZooKeeper on CO001:
[root@CO001 ~]# /root/apps/zookeeper/bin/zkServer.sh stop
Step 6: stop ZooKeeper on CO002:
[root@CO002 ~]# /root/apps/zookeeper/bin/zkServer.sh stop
Hadoop cluster install issue: `dfsadmin -report` shows 0 live datanodes, or writes fail with "could only be replicated to 0 nodes, instead of 1".
Solution: https://blog.csdn.net/wang_zhenwei/article/details/48027063
Installing Maven: https://blog.csdn.net/cs4380/article/details/79158268
Installing IntelliJ IDEA: https://blog.csdn.net/yangying496875002/article/details/73603303#
Installing Flume: https://blog.csdn.net/bingoxubin/article/details/78611032
https://blog.csdn.net/jiangsanfeng1111/article/details/53337887
Using Flume to ship a local log file to HDFS: https://blog.csdn.net/lmh94604/article/details/51513517
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /soft/workspace/log_receive/log_receive/target/log
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://cluster1/flume/%Y-%m-%d/%H%M
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 10240
a1.sinks.k1.hdfs.idleTimeout = 3
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 5
a1.sinks.k1.hdfs.roundUnit = minute
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Writing with a tail-style (TAILDIR) source: https://blog.csdn.net/weixin_39478115/article/details/78884902
a1.sources = r3
a1.channels = c3
a1.sinks = k3
### define sources
a1.sources.r3.type = TAILDIR
#a1.sources.r3.spoolDir = /home/hadoop/app/flumelog/gps
a1.sources.r3.positionFile = /opt/software/flume-1.6.0-cdh5.11.0/source/gps/taildir_position.json
a1.sources.r3.filegroups = f1
a1.sources.r3.filegroups.f1 = /soft/workspace/log_receive/log_receive/target/log/debug*
a1.sources.r3.headers.f1.headerKey1 = value1
a1.sources.r3.fileHeader = true
### define channels
a1.channels.c3.type = file
a1.channels.c3.checkpointDir = /home/hadoop/app/flumelog/log/logdfstmp/point
a1.channels.c3.dataDirs = /home/hadoop/app/flumelog/log/logdfstmp
### define sink
a1.sinks.k3.type = hdfs
a1.sinks.k3.hdfs.path = hdfs://cluster1/flume/%Y-%m-%d/%H%M
a1.sinks.k3.hdfs.fileType = DataStream
a1.sinks.k3.hdfs.writeFormat = Text
a1.sinks.k3.hdfs.useLocalTimeStamp = true
a1.sinks.k3.hdfs.rollInterval = 14400
a1.sinks.k3.hdfs.rollSize = 0
a1.sinks.k3.hdfs.rollCount = 0
a1.sinks.k3.hdfs.idleTimeout = 0
a1.sinks.k3.hdfs.filePrefix = %Y%m%d_
a1.sinks.k3.hdfs.fileSuffix = .log
### bind the sources and sink to the channel
a1.sources.r3.channels = c3
a1.sinks.k3.channel = c3
Command to run the Flume agent:
flume-ng agent --conf conf --conf-file localToHdfs.conf --name a1 -Dflume.root.logger=INFO,console
Flume write failure:
Spool Directory source source1: { spoolDir: /usr/local/logs }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
Solution:
https://blog.csdn.net/wangxj_2017/article/details/76738076
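A common cause of that exception is a file in spoolDir being modified or renamed after Flume has started reading it. One mitigation is to have producers write files elsewhere and move them into spoolDir atomically, and to make the source skip in-progress files; a sketch (the .tmp suffix convention is an assumption):

```
# spooldir source hardening (property names as in the Flume user guide)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/logs
a1.sources.r1.ignorePattern = ^.*\.tmp$
```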
Installing Kafka: https://blog.csdn.net/zxy987872674/article/details/72466504
Common Kafka commands: https://blog.csdn.net/zxy987872674/article/details/72493128
Testing: https://blog.csdn.net/gongxinju/article/details/53415051
Installing MySQL: https://www.cnblogs.com/bigbrotherer/p/7241845.html
Restarting MySQL: https://zhidao.baidu.com/question/1112387292109629179.html
Installing Hive: https://www.cnblogs.com/xieyulin/p/7050545.html
https://www.cnblogs.com/kinginme/p/7233315.html
Building a jar with Maven (bundling dependency jars):
https://blog.csdn.net/leone911/article/details/50347375
Creating a Spark (HelloWorld) project: https://blog.csdn.net/qq_21586623/article/details/71403428