Big Data Environment Setup
Machine preparation
Machine configuration
Static IP configuration
vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
BOOTPROTO=static
IPADDR=192.168.254.130
NETMASK=255.255.255.0
GATEWAY=192.168.254.1
DNS1=114.114.114.114
scp /etc/sysconfig/network-scripts/ifcfg-eno16777736 192.168.254.130:/etc/sysconfig/network-scripts/ifcfg-eno16777736
scp /etc/sysconfig/network-scripts/ifcfg-eno16777736 192.168.254.131:/etc/sysconfig/network-scripts/ifcfg-eno16777736
Configure the IP addresses of 130 and 131.
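After copying, the file on each of the two hosts still has to be edited so that IPADDR matches that host, and the network service restarted. A minimal sketch, assuming CentOS 7:
vi /etc/sysconfig/network-scripts/ifcfg-eno16777736   # set IPADDR=192.168.254.130 on hadoop2, .131 on hadoop3
systemctl restart network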
Hostname configuration
vi /etc/hosts
192.168.254.129 hadoop1
192.168.254.130 hadoop2
192.168.254.131 hadoop3
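The hostname of each machine should match these entries, and the hosts file should be the same everywhere. A sketch, assuming CentOS 7's hostnamectl; run the matching command on each host:
hostnamectl set-hostname hadoop1   # hadoop2 / hadoop3 on the other machines
scp /etc/hosts hadoop2:/etc/hosts
scp /etc/hosts hadoop3:/etc/hosts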
Passwordless SSH login between the Linux hosts
ssh-keygen
ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3
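Run the above on each of the three hosts so every machine can reach every other one. A quick check that no password is prompted:
ssh hadoop2 hostname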
NTP time synchronization
yum -y install ntp
chkconfig ntpd on
Edit the configuration file on hadoop1, hadoop2 and hadoop3 (all three hosts need this):
vim /etc/ntp.conf    # add: server cn.ntp.org.cn
ntpdate cn.ntp.org.cn
service ntpd start
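To confirm the daemon is actually syncing, inspect the peer list (standard ntp tooling, nothing specific to this setup):
ntpq -p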
JDK installation
Download jdk-8u111-linux-x64.rpm
Install it: rpm -ivh jdk-8u111-linux-x64.rpm
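A quick check that the JDK is in place; the Oracle RPM normally installs under /usr/java (the exact path below is an assumption, adjust if it differs):
java -version
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_111' >> /etc/profile
source /etc/profile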
ZooKeeper installation
Make sure the firewall is disabled.
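On CentOS 7 that means stopping and disabling firewalld, for example:
systemctl stop firewalld
systemctl disable firewalld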
tar -zxvf zookeeper-3.4.10.tar.gz
cd /opt/zookeeper-3.4.10
mkdir data
mkdir logs
vi bin/zkEnv.sh
ZOO_LOG_DIR="." 修改为 ZOO_LOG_DIR="/opt/zookeeper-3.4.10/logs"
vi conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/opt/zookeeper-3.4.10/data
clientPort=2181
server.1=host1:2888:3888
server.2=host2:2888:3888
server.3=host3:2888:3888
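If ZooKeeper was unpacked and configured on one host only, copy the whole directory to the other two nodes before creating the myid files (a sketch, assuming the same /opt layout on every node):
scp -r /opt/zookeeper-3.4.10 host2:/opt/
scp -r /opt/zookeeper-3.4.10 host3:/opt/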
ssh host1 "echo 1 > /opt/zookeeper-3.4.10/data/myid"
ssh host2 "echo 2 > /opt/zookeeper-3.4.10/data/myid"
ssh host3 "echo 3 > /opt/zookeeper-3.4.10/data/myid"
/opt/zookeeper-3.4.10/bin/zkServer.sh start
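Start it on all three hosts, then check which node is the leader and which are followers:
/opt/zookeeper-3.4.10/bin/zkServer.sh status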
ZooKeeper usage
// Curator client example; requires curator-framework (and curator-client) on the classpath.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

// ZK_ADDRESS and ZK_PATH are constants defined elsewhere,
// e.g. "hadoop1:2181,hadoop2:2181,hadoop3:2181" and "/test";
// print(...) is a small helper that logs its arguments.
CuratorFramework client = CuratorFrameworkFactory.newClient(
        ZK_ADDRESS, new RetryNTimes(10, 5000));
client.start();
System.out.println("zk client start successfully!");
// 2.Client API test
// 2.1 Create node
String data1 = "hello";
print("create", ZK_PATH, data1);
client.create().
creatingParentsIfNeeded().
forPath(ZK_PATH, data1.getBytes());
// 2.2 Get node and data
print("ls", "/");
print(client.getChildren().forPath("/"));
print("get", ZK_PATH);
print(client.getData().forPath(ZK_PATH));
// 2.3 Modify data
String data2 = "world";
print("set", ZK_PATH, data2);
client.setData().forPath(ZK_PATH, data2.getBytes());
print("get", ZK_PATH);
print(client.getData().forPath(ZK_PATH));
// 2.4 Remove node
print("delete", ZK_PATH);
client.delete().forPath(ZK_PATH);
print("ls", "/");
print(client.getChildren().forPath("/"));
print("测试不存在路径是否会报错");
print(client.checkExists().forPath("/"));
Kafka installation
Kafka usage
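A minimal smoke test with the console tools that ship with Kafka (a sketch, assuming a 0.10.x-era Kafka unpacked on the cluster, brokers listening on port 9092, and an illustrative topic name "test"):
bin/kafka-topics.sh --create --zookeeper hadoop1:2181 --replication-factor 2 --partitions 3 --topic test
bin/kafka-console-producer.sh --broker-list hadoop1:9092 --topic test
bin/kafka-console-consumer.sh --bootstrap-server hadoop1:9092 --topic test --from-beginning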
MySQL installation
See /blog/archives/134.
mysql-binlog-sync usage
Hadoop configuration and installation
Configuration
Notes on the hadoop-2.6.2 configuration files (CentOS)
See /blog/archives/246.
Full configuration download
First startup
Note: follow the steps below strictly, in order.
1. Start the ZooKeeper cluster (start zk on host1, host2 and host3 separately)
cd $ZOOKEEPER_HOME/bin/
./zkServer.sh start
2. Start the JournalNodes (run on host1, host2 and host3)
cd ${HADOOP_HOME}
sbin/hadoop-daemon.sh start journalnode
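jps should now show a JournalNode process on each of the three hosts:
jps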
3. Format HDFS
Run on host1:
bin/hdfs namenode -format
Formatting creates files under the directory configured by hadoop.tmp.dir in core-site.xml; here it is set to ${HADOOP_HOME}/tmp. Then copy ${HADOOP_HOME}/tmp to ${HADOOP_HOME} on host2:
scp -r tmp/ host2:/opt/hadoop-2.8.2/
Alternatively (recommended): instead of copying the directory, run hdfs namenode -bootstrapStandby on host2.
4. Format ZKFC (run on hadoop1 only)
bin/hdfs zkfc -formatZK
5. Start HDFS (run on host1)
sbin/start-dfs.sh
6. Start YARN
(Note: run start-yarn.sh on hadoop3. The NameNode and ResourceManager are kept on separate machines for performance reasons: both consume a lot of resources, so they are split up, and since they sit on different machines they have to be started separately.)
sbin/start-yarn.sh
Hadoop verification
1. hadoop-2.6.2 is now configured; it can be checked from a browser:
http://hadoop1:50070
NameNode 'hadoop1:9000' (active)
http://hadoop2:50070
NameNode 'hadoop2:9000' (standby)
2. Verify HDFS HA
First upload a file to HDFS:
hadoop fs -put /etc/profile /profile
hadoop fs -ls /
Then kill the active NameNode (find its pid with jps):
kill -9 <namenode-pid>
Open http://192.168.1.202:50070 in a browser:
NameNode 'hadoop2:9000' (active)
At this point the NameNode on hadoop2 has become active. Run the command again:
hadoop fs -ls /
-rw-r--r--   3 root supergroup       1926 2014-02-06 15:36 /profile
The file uploaded earlier is still there!
Manually restart the NameNode that was killed:
sbin/hadoop-daemon.sh start namenode
Open http://192.168.1.201:50070 in a browser:
NameNode 'hadoop1:9000' (standby)
3. Verify YARN: run the WordCount program from the demos that ship with Hadoop:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /profile /out
OK, all done!
Some commands for checking the cluster's working state:
bin/hdfs dfsadmin -report    # show status information for each HDFS node
bin/hdfs haadmin -getServiceState nn1    # get the HA state of a NameNode
sbin/hadoop-daemon.sh start namenode    # start a single NameNode process
./hadoop-daemon.sh start zkfc    # start a single zkfc process
Deployment layout across the 3 hosts
hadoop1 zookeeper journalnode datanode namenode zkfc resourcemanager
hadoop2 zookeeper journalnode datanode namenode zkfc resourcemanager
hadoop3 zookeeper journalnode datanode
Day-to-day start and stop
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/stop-dfs.sh
sbin/stop-yarn.sh
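ZooKeeper is managed separately; to stop it on a node:
/opt/zookeeper-3.4.10/bin/zkServer.sh stop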
Boot startup script
mkdir /home/scripts
vi /home/scripts/startHadoop.sh
#!/bin/bash
# start ZooKeeper
/opt/zookeeper-3.4.10/bin/zkServer.sh start
# start HDFS
/opt/hadoop-2.8.2/sbin/start-dfs.sh
# start YARN
/opt/hadoop-2.8.2/sbin/start-yarn.sh
chmod +x /home/scripts/startHadoop.sh
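The script still has to be hooked into the boot sequence. One way on CentOS 7 is rc.local (a sketch; rc.local must itself be executable, and at boot the script runs as root with a minimal environment, so JAVA_HOME and similar variables may need to be set inside it):
echo "/home/scripts/startHadoop.sh" >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local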