Big Data: Zookeeper + Hadoop + Flink + Kafka + Canal in Practice

Preparation

Server preparation

IP               Hostname  OS                        Components
192.168.213.131  node1     CentOS Linux release 7.5  Zookeeper + Hadoop (master) + Flink + Kafka (master) + Canal
192.168.213.132  node2     CentOS Linux release 7.5  Zookeeper + Hadoop (slave) + Flink + Kafka (slave)
192.168.213.133  node3     CentOS Linux release 7.5  Zookeeper + Hadoop (slave) + Flink + Kafka (slave)

Set hostnames

Set the hostname of each machine to node1, node2, and node3 respectively,
then add the following entries to /etc/hosts:

[root@node1 ~]# echo 'node1' > /etc/hostname
[root@node1 ~]# vim /etc/hosts
192.168.213.131    node1
192.168.213.132    node2
192.168.213.133    node3

Passwordless SSH setup

Generate an SSH key on each host:

[root@node1 ~]#  ssh-keygen -t rsa

Copy the public key to every host:

[root@node1 ~]#  ssh-copy-id node1
[root@node1 ~]#  ssh-copy-id node2
[root@node1 ~]#  ssh-copy-id node3
[root@node1 ~]#  scp ~/.ssh/authorized_keys node2:~/.ssh/
[root@node1 ~]#  scp ~/.ssh/authorized_keys node3:~/.ssh/
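
A quick optional check that passwordless login works is to run a remote command and confirm no password prompt appears:

[root@node1 ~]#  ssh node2 hostname
[root@node1 ~]#  ssh node3 hostname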

Disable the firewall, swap, and SELinux

Run the following on every host:

[root@node1 ~]# systemctl stop firewalld && systemctl disable firewalld
[root@node1 ~]# swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
[root@node1 ~]# setenforce 0 && sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
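
As an optional sanity check, the commands below should report the firewall as inactive, SELinux as permissive (disabled after a reboot), and no swap in use:

[root@node1 ~]# systemctl is-active firewalld
[root@node1 ~]# getenforce
[root@node1 ~]# free -m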

Create a user

Create a hadoop user on every host:

[root@node1 ~]# adduser hadoop
[root@node1 ~]# usermod -g root hadoop
[root@node1 ~]# passwd hadoop

Passwordless SSH for the hadoop user

Generate an SSH key on each host:

[root@node1 ~]# su - hadoop 
[hadoop@node1 ~]#  ssh-keygen -t rsa

Copy the public key to every host:

[hadoop@node1 ~]#  ssh-copy-id node1
[hadoop@node1 ~]#  ssh-copy-id node2
[hadoop@node1 ~]#  ssh-copy-id node3
[hadoop@node1 ~]#  scp ~/.ssh/authorized_keys node2:~/.ssh/
[hadoop@node1 ~]#  scp ~/.ssh/authorized_keys node3:~/.ssh/

Environment setup

Install Java (root user)

Switch to the root user and install Java on every host with yum, choosing the devel package:

[hadoop@node1 ~]# su root
[root@node1 ~]# yum search java-1.8.0
[root@node1 ~]# yum install -y  java-1.8.0-openjdk-devel.x86_64
[root@node1 ~]# java -version
[root@node1 ~]# vim /etc/profile

Configure the environment variables, then sync /etc/profile to the other hosts with scp:

[root@node1 ~]# vim /etc/profile
# JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin
[root@node1 ~]#  source /etc/profile
[root@node1 ~]#  scp /etc/profile node2:/etc/profile
[root@node1 ~]#  ssh node2 source /etc/profile
[root@node1 ~]#  scp /etc/profile node3:/etc/profile
[root@node1 ~]#  ssh node3 source /etc/profile

Install ZooKeeper 3.6.4 (hadoop user)

Download the package from the Aliyun mirror site: https://developer.aliyun.com/mirror/

  1. Download, extract, and rename the directory to zookeeper
[hadoop@node1 ~]# wget https://mirrors.aliyun.com/apache/zookeeper/zookeeper-3.6.4/apache-zookeeper-3.6.4-bin.tar.gz
[hadoop@node1 ~]# tar -zxvf apache-zookeeper-3.6.4-bin.tar.gz
[hadoop@node1 ~]# mv apache-zookeeper-3.6.4-bin zookeeper
  2. Copy zoo_sample.cfg under /home/hadoop/zookeeper/conf/ to zoo.cfg
[hadoop@node1 ~]# cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg 
  3. Edit zoo.cfg
[hadoop@node1 ~]# vim zookeeper/conf/zoo.cfg 
# data directory
dataDir=/home/hadoop/zookeeper/tmp
# cluster nodes
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
  4. Create the data directory and the myid file (each node must use a different myid; adjust accordingly)
[hadoop@node1 ~]# mkdir zookeeper/tmp
[hadoop@node1 ~]# vim zookeeper/tmp/myid
# the myid value must be different on every node in the cluster
1
  5. Sync the installation to the other hosts
[hadoop@node1 ~]# scp -r zookeeper node2:~/
[hadoop@node1 ~]# scp -r zookeeper node3:~/
  6. Start the service (run on every node)
[hadoop@node1 ~]#  ./zookeeper/bin/zkServer.sh start
[hadoop@node1 ~]#  ssh node2
[hadoop@node2 ~]#  echo '2' > zookeeper/tmp/myid
[hadoop@node2 ~]#  ./zookeeper/bin/zkServer.sh start
[hadoop@node2 ~]#  exit
[hadoop@node1 ~]#  ssh node3
[hadoop@node3 ~]#  echo '3' > zookeeper/tmp/myid
[hadoop@node3 ~]#  ./zookeeper/bin/zkServer.sh start
  7. Check that the process is running
[hadoop@node1 ~]#  jps
4020 Jps
4001 QuorumPeerMain
  8. Check the status
[hadoop@node1 ~]#  ./zookeeper/bin/zkServer.sh status
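
An optional further check is to connect with the bundled CLI client from any node; being able to list the root znode suggests the quorum is serving requests:

[hadoop@node1 ~]#  ./zookeeper/bin/zkCli.sh -server node2:2181
# then, at the zkCli prompt:
ls /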

Install Hadoop 2.10.1 (hadoop user)

Download the package from the Aliyun mirror site: https://developer.aliyun.com/mirror/

  1. Download, extract, and rename the directory to hadoop
[hadoop@node1 ~]# wget https://mirrors.aliyun.com/apache/hadoop/core/hadoop-2.10.1/hadoop-2.10.1.tar.gz
[hadoop@node1 ~]# tar -zxvf hadoop-2.10.1.tar.gz
[hadoop@node1 ~]# mv hadoop-2.10.1 hadoop
  2. Create the data directories
[hadoop@node1 ~]# mkdir -p hadoop/hdfs/tmp
[hadoop@node1 ~]# mkdir -p hadoop/hdfs/name
[hadoop@node1 ~]# mkdir -p hadoop/hdfs/data
  3. Edit the configuration files
[hadoop@node1 ~]# vim hadoop/etc/hadoop/core-site.xml
<property>
	<name>hadoop.tmp.dir</name>
	<value>file:/home/hadoop/hadoop/hdfs/tmp</value>
</property>
<property>
	<name>io.file.buffer.size</name>
	<value>131072</value>
</property>
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://node1:9000</value>
</property>
[hadoop@node1 ~]# vim hadoop/etc/hadoop/hdfs-site.xml
<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>
<property>
	<name>dfs.namenode.name.dir</name>
	<value>file:/home/hadoop/hadoop/hdfs/name</value>
	<final>true</final>
</property>
<property>
	<name>dfs.datanode.data.dir</name>
	<value>file:/home/hadoop/hadoop/hdfs/data</value>
	<final>true</final>
</property>
<property>
	<name>dfs.namenode.secondary.http-address</name>
	<value>node1:9001</value>
</property>
<property>
	<name>dfs.webhdfs.enabled</name>
	<value>true</value>
</property>
<property>
	<name>dfs.permissions</name>
	<value>false</value>
</property>
  4. Add the Java environment variable
[hadoop@node1 ~]# vim hadoop/etc/hadoop/hadoop-env.sh
# set the Java environment variable
export JAVA_HOME=/usr/lib/jvm/java
[hadoop@node1 ~]# vim hadoop/etc/hadoop/yarn-env.sh
# set the Java environment variable
export JAVA_HOME=/usr/lib/jvm/java
[hadoop@node1 ~]# vim hadoop/etc/hadoop/slaves
# slave (worker) hostnames; in newer Hadoop versions this file is named workers
node2
node3
  5. Sync the installation to the other hosts
[hadoop@node1 ~]# scp -r hadoop node2:~/
[hadoop@node1 ~]# scp -r hadoop node3:~/
  6. Add environment variables (switch to the root user)
[hadoop@node1 ~]# su root
[root@node1 ~]# vim /etc/profile
# HADOOP_HOME
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
[root@node1 ~]# source /etc/profile
[root@node1 ~]# scp /etc/profile node2:/etc/profile
[root@node1 ~]# ssh node2 source /etc/profile
[root@node1 ~]# scp /etc/profile node3:/etc/profile
[root@node1 ~]# ssh node3 source /etc/profile
  7. Start the services (a small HDFS smoke test follows this list)
[hadoop@node1 ~]# hadoop namenode -format
[hadoop@node1 ~]# mr-jobhistory-daemon.sh start historyserver
[hadoop@node1 ~]# start-all.sh
# check the DataNode report
[hadoop@node1 ~]# hadoop dfsadmin -report
  8. Stop the services (optional)
[hadoop@node1 ~]# stop-all.sh
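
If the cluster came up correctly, a small optional HDFS read/write smoke test with the standard hadoop fs commands should succeed:

[hadoop@node1 ~]# echo "hello hdfs" > /tmp/hello.txt
[hadoop@node1 ~]# hadoop fs -mkdir -p /smoke
[hadoop@node1 ~]# hadoop fs -put /tmp/hello.txt /smoke/
[hadoop@node1 ~]# hadoop fs -cat /smoke/hello.txt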

Install Flink (hadoop user)

  1. Download, extract, and rename the directory to flink
[hadoop@node1 ~]# wget https://mirrors.aliyun.com/apache/flink/flink-1.12.5/flink-1.12.5-bin-scala_2.12.tgz
[hadoop@node1 ~]# tar -xvf flink-1.12.5-bin-scala_2.12.tgz
[hadoop@node1 ~]# mv flink-1.12.5 flink
  2. Configure flink-conf.yaml
[hadoop@node1 ~]# vim flink/conf/flink-conf.yaml
jobmanager.rpc.address: node1
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: node1:2181,node2:2181,node3:2181
taskmanager.numberOfTaskSlots: 2 # set to the number of CPU cores
  3. Configure masters and slaves
[hadoop@node1 ~]# vim flink/conf/masters
node1:8081
[hadoop@node1 ~]# vim flink/conf/slaves
node2
node3
  4. Copy to the other hosts
[hadoop@node1 ~]# scp -r flink node2:~/
[hadoop@node1 ~]# scp -r flink node3:~/
  5. Start the service on each node
[hadoop@node1 ~]# ./flink/bin/start-cluster.sh
[hadoop@node2 ~]# ./flink/bin/start-cluster.sh
[hadoop@node3 ~]# ./flink/bin/start-cluster.sh
  6. Access the web UI and run a test job
    Open http://192.168.213.131:8081/ in a browser.
    Use the Submit New Job page to upload the example flink/examples/streaming/WordCount.jar; alternatively, submit it from the command line as shown below.
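
The same example can also be submitted from node1 with the standard flink run client, using the jar shipped with the distribution:

[hadoop@node1 ~]# ./flink/bin/flink run ./flink/examples/streaming/WordCount.jar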

Install Kafka (hadoop user)

  1. Download, extract, and rename the directory to kafka
[hadoop@node1 ~]# wget https://mirrors.aliyun.com/apache/kafka/3.0.0/kafka_2.12-3.0.0.tgz
[hadoop@node1 ~]# tar -xvf kafka_2.12-3.0.0.tgz
[hadoop@node1 ~]# mv kafka_2.12-3.0.0 kafka
  2. Configure server.properties
[hadoop@node1 ~]# vim kafka/config/server.properties
# starts at 1; must be unique per broker and must not repeat
broker.id=1
log.dirs=/home/hadoop/kafka/logs
zookeeper.connect=node1:2181,node2:2181,node3:2181
# default partition count (here, the number of brokers)
num.partitions=3
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://node1:9092
  3. Copy to the other hosts, then change broker.id and advertised.listeners on node2 and node3 accordingly
[hadoop@node1 ~]# scp -r kafka node2:~/
[hadoop@node1 ~]# scp -r kafka node3:~/
  4. Start the service on each node
[hadoop@node1 ~]# ./kafka/bin/kafka-server-start.sh -daemon ./kafka/config/server.properties
[hadoop@node2 ~]# ./kafka/bin/kafka-server-start.sh -daemon ./kafka/config/server.properties
[hadoop@node3 ~]# ./kafka/bin/kafka-server-start.sh -daemon ./kafka/config/server.properties
  5. Stop the service (optional)
[hadoop@node1 ~]# ./kafka/bin/kafka-server-stop.sh
  6. Create and list topics (an optional produce/consume check follows this list)
[hadoop@node1 ~]# ./kafka/bin/kafka-topics.sh --bootstrap-server node1:9092 --create --topic zljd
[hadoop@node1 ~]# ./kafka/bin/kafka-topics.sh --bootstrap-server node1:9092 --list
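
As an optional end-to-end check, messages typed into the console producer in one terminal should appear in the console consumer in another:

[hadoop@node1 ~]# ./kafka/bin/kafka-console-producer.sh --bootstrap-server node1:9092 --topic zljd
[hadoop@node1 ~]# ./kafka/bin/kafka-console-consumer.sh --bootstrap-server node1:9092 --topic zljd --from-beginning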

Install Canal (hadoop user)

  1. Enable binlog in the MySQL configuration and restart MySQL (e.g. systemctl restart mysqld)
[hadoop@node1 ~]# vim /etc/my.cnf
[mysqld]
log-bin=mysql-bin # enable binlog
binlog-format=ROW # use ROW format
server_id=1 # required for MySQL replication; must not clash with Canal's slaveId
  2. Download and extract into a canal directory
[hadoop@node1 ~]# wget https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
[hadoop@node1 ~]# mkdir canal
[hadoop@node1 ~]# tar -zxvf canal.deployer-1.1.5.tar.gz -C canal
  3. Edit the global configuration canal.properties
[hadoop@node1 ~]# vim canal/conf/canal.properties
# server mode; tcp by default, kafka and other MQs are supported
canal.serverMode = kafka
  4. Edit the instance configuration instance.properties
[hadoop@node1 ~]# vim canal/conf/example/instance.properties
# database address and credentials
canal.instance.dbUsername=root
canal.instance.dbPassword=zwp123456+
canal.instance.master.address=node1:3306
# Kafka topic name
canal.mq.topic=zljd
  5. Start the service (startup can be verified in the logs, as noted after this list)
[hadoop@node1 ~]# ./canal/bin/startup.sh
  6. Stop the service
[hadoop@node1 ~]# ./canal/bin/stop.sh
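
Whether the deployer and the example instance started cleanly can be checked in the logs; the paths below assume the default layout of the canal.deployer package:

[hadoop@node1 ~]# tail -n 50 canal/logs/canal/canal.log
[hadoop@node1 ~]# tail -n 50 canal/logs/example/example.log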

Canal + Flink application demo

Official documentation: https://nightlies.apache.org/flink/flink-docs-release-1.12

  1. Scaffold the application with the Maven quickstart archetype, setting -DarchetypeVersion to 1.12.5
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.12.5 -DgroupId=com.zwp -DartifactId=flink-java -Dversion=1.0 -Dpackage=com.zwp
  2. Add dependencies to pom.xml for the Kafka connector, the MySQL driver, and the Canal protocol (the MysqlSink below additionally uses fastjson2, which needs its own dependency)
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
	<version>${flink.version}</version>
</dependency>
<dependency>
	<groupId>mysql</groupId>
	<artifactId>mysql-connector-java</artifactId>
	<version>8.0.31</version>
</dependency>
<dependency>
	<groupId>com.alibaba.otter</groupId>
	<artifactId>canal.protocol</artifactId>
	<version>1.1.6</version>
</dependency>
  3. Edit the main method of the generated StreamingJob.java entry point
// imports assumed for this snippet (Flink 1.12 package paths)
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public static void main(String[] args) throws Exception {
    // 1. Get the execution environment
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // 2. Configure the Kafka consumer
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "node1:9092");
    properties.setProperty("group.id", "flink");
    FlinkKafkaConsumer<String> kafkaConsumer = new FlinkKafkaConsumer<>("zljd", new SimpleStringSchema(), properties);
    DataStream<String> stream = env.addSource(kafkaConsumer);
    // 3. Add the MySQL sink that handles the Kafka messages
    stream.addSink(new MysqlSink());
    // execute program
    env.execute("Flink Streaming Java API Skeleton");
}
  4. Add the MySQL sink class MysqlSink.java (an abbreviated example of the Canal message it consumes follows the class)
package com.lzm.sink;

import com.alibaba.fastjson2.JSON;
import com.alibaba.otter.canal.protocol.FlatMessage;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.Map;

public class MysqlSink extends RichSinkFunction<String> {

    private Connection connection;
    private PreparedStatement preparedStatement;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        // load the JDBC driver
        Class.forName("com.mysql.cj.jdbc.Driver");
        // open the database connection
        connection = DriverManager.getConnection("jdbc:mysql://node1:3306/zljd", "root", "zwp2018");
    }

    @Override
    public void close() throws Exception {
        super.close();
        if (connection != null) {
            connection.close();
        }
    }

    @Override
    public void invoke(String message, Context context) throws Exception {
        try {
            System.out.println("FlatMessage:"+message);
            FlatMessage value = null;
            if (JSON.isValid(message)){
                value = JSON.parseObject(message, FlatMessage.class);
            }
            if(value == null || !"zljd".equals(value.getDatabase()) || !"zljk_bd_check_unit_ability".equals(value.getTable())){
                return;
            }
            List<Map<String, String>> data = value.getData();
            List<Map<String, String>> old = value.getOld();
            // build and execute the SQL
            switch (value.getType()){
                case "INSERT":
                    preparedStatement = connection.prepareStatement("INSERT INTO zljk_bd_check_unit (id, ability) VALUES (?, ?)");
                    preparedStatement.setString(1, data.get(0).get("id"));
                    preparedStatement.setString(2, data.get(0).get("ability"));
                    preparedStatement.executeUpdate();
                    break;
                case "UPDATE":
                    preparedStatement = connection.prepareStatement("UPDATE zljk_bd_check_unit SET id=?,ability=? WHERE id=?");
                    String id = data.get(0).get("id");
                    if(old != null && old.size() > 0 && old.get(0).get("id") != null){
                        id = old.get(0).get("id");
                    }
                    preparedStatement.setString(1, data.get(0).get("id"));
                    preparedStatement.setString(2, data.get(0).get("ability"));
                    preparedStatement.setString(3, id);
                    preparedStatement.executeUpdate();
                    break;
                case "DELETE":
                    preparedStatement = connection.prepareStatement("DELETE FROM zljk_bd_check_unit WHERE id=?");
                    preparedStatement.setString(1, data.get(0).get("id"));
                    preparedStatement.executeUpdate();
                    break;
                default:
            }
            if (preparedStatement != null) {
                preparedStatement.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
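For reference, below is an abbreviated, hypothetical example of the flat-message JSON that Canal publishes to the zljd topic and that this sink parses into FlatMessage; the exact field set depends on the Canal version and configuration:

{
  "database": "zljd",
  "table": "zljk_bd_check_unit_ability",
  "type": "UPDATE",
  "data": [{"id": "1", "ability": "new value"}],
  "old": [{"ability": "old value"}]
}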
  5. Package the Flink application
    In pom.xml, comment out the provided scope on the dependencies below, then build the jar with Maven (clean, compile, package); the build command is shown after the dependency list.
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-java</artifactId>
	<version>${flink.version}</version>
	<!--<scope>provided</scope>-->
</dependency>
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
	<version>${flink.version}</version>
	<!--<scope>provided</scope>-->
</dependency>
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-clients_${scala.binary.version}</artifactId>
	<version>${flink.version}</version>
	<!--<scope>provided</scope>-->
</dependency>
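With the scopes commented out, the jar can be built from the project directory with:

mvn clean package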
  6. Upload the jar to Flink and run it
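
Besides the web UI, the job can also be submitted with the Flink CLI; the jar path below assumes the default Maven naming for the artifactId/version used above and the generated StreamingJob entry class:

[hadoop@node1 ~]# ./flink/bin/flink run -c com.zwp.StreamingJob flink-java/target/flink-java-1.0.jar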