We're getting ready to do real-time data computation.
The source is roughly 20 MySQL tables. Flume parses the MySQL binlog and sinks the events to Kafka; Spark Streaming consumes them, processes the business data in real time, and writes the resulting target data into our MySQL.
一、MySQL setup
0. Check for already-installed MySQL packages and remove them
yum list installed mysql*
yum remove mysql-community-client.x86_64 mysql-community-common.x86_64 mysql-community-devel.x86_64 mysql-community-libs.x86_64 mysql-community-libs-compat.x86_64 mysql-community-server.x86_64 mysql80-community-release.noarch
Remove the existing MySQL directory:
[root@centos1 ~]# whereis mysql
mysql: /usr/local/mysql
[root@centos1 ~]# rm -rf /usr/local/mysql
1. Add the MySQL yum repository
rpm -Uvh https://repo.mysql.com/mysql80-community-release-el6.rpm
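Optional sanity check that the repository was registered (this just reads back the enabled repo list):
yum repolist enabled | grep mysql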
2. Install MySQL
yum install -y mysql-community-server mysql-community-client
Download speeds vary, so be patient...
3. Enable mysqld at the appropriate runlevels
chkconfig --level 2345 mysqld on
A bit of background:
2, 3, 4, 5 are different runlevels:
0 - halt (power the system off)
1 - single user mode (single-user maintenance mode, used to repair a broken system)
2 - Multi-user, without NFS (like runlevel 3 below, but without the NFS service)
3 - Full multi-user mode (full text-mode multi-user system with networking)
4 - unused (reserved)
5 - X11 (like runlevel 3, but with X Window loaded)
6 - reboot
If levels 2, 3, 4, 5 are not "on", the service is not started at those runlevels (i.e. it will not start at boot).
For example:
chkconfig --level 2345 nginx on sets the nginx service to start at runlevels 2, 3, 4 and 5 (i.e. enables nginx at boot).
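You can read the setting back to confirm it took effect, e.g. for mysqld:
chkconfig --list mysqld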
4. Start the service
service mysqld start
5. Look up the generated initial root password
cat /var/log/mysqld.log | grep password
6. For a test box I'd rather not have a root password at all. Note that with the 5.7/8.0 packages the old UPDATE user SET Password = PASSWORD('') trick no longer works; log in with the temporary password from step 5 and run:
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY '';
mysql> flush privileges;
mysql> quit
(If the empty password is rejected by the password-strength check, remove it first on 8.0 with UNINSTALL COMPONENT 'file://component_validate_password'; .)
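Since Flume is going to read the binlog (see the pipeline overview at the top), binary logging must be enabled in ROW format. A minimal sketch of the /etc/my.cnf additions (on MySQL 8.0 binlog and ROW format are already the defaults, so this mainly matters on 5.7; the server-id value here is arbitrary):
[mysqld]
server-id=1
log-bin=mysql-bin
binlog_format=ROW
Restart afterwards: service mysqld restart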
二、Flume installation
wget http://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
tar -zxvf apache-flume-1.8.0-bin.tar.gz
mv apache-flume-1.8.0-bin /usr/local/flume
cd /usr/local/flume/conf
cp flume-env.sh.template flume-env.sh
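Flume has no built-in MySQL binlog source, so the binlog leg needs a custom/third-party source or an external binlog decoder whose output Flume tails. Purely as a placeholder sketch (the decoded-binlog file path is hypothetical, and the Kafka broker from section 四 is assumed on localhost:9092), a conf/binlog-kafka.conf agent could look like:
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# exec source standing in for a real binlog source: tail the decoded binlog output
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /tmp/binlog-decoded.log
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# built-in Kafka sink (Flume 1.8)
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = test
a1.sinks.k1.channel = c1
Start it with: bin/flume-ng agent -n a1 -c conf -f conf/binlog-kafka.conf -Dflume.root.logger=INFO,console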
三、ZooKeeper installation
1.1 Download
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
1.2 Install
Installation is just extracting the tarball into the target directory; version 3.4.5 is used as the example.
Here it is extracted to /usr/myapp; adjust to whatever directory you prefer (and if you do, change the paths in the later commands and config files accordingly).
tar -zxf zookeeper-3.4.5.tar.gz -C /usr/myapp
1.3 Configure
Under the ZooKeeper home directory, create data and logs directories for the data and the logs:
cd /usr/myapp/zookeeper-3.4.5
mkdir data
mkdir logs
Create a new zoo.cfg in the conf directory with the following content and save it:
tickTime=2000
dataDir=/usr/myapp/zookeeper-3.4.5/data
dataLogDir=/usr/myapp/zookeeper-3.4.5/logs
clientPort=2181
1.4 Start and stop
Go into the bin directory; to start, stop, restart, or check the current node's status (including which role it holds in a cluster), run respectively:
./zkServer.sh start
./zkServer.sh stop
./zkServer.sh restart
./zkServer.sh status
For reference, a fuller zoo.cfg with the standard comments (paths matching the install directory used above):
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/usr/myapp/zookeeper-3.4.5/data
dataLogDir=/usr/myapp/zookeeper-3.4.5/logs
# the port at which the clients will connect
clientPort=2181
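A quick way to confirm the standalone instance is actually serving requests (assuming the default client port 2181) is to connect with the bundled CLI and list the root znode:
./zkCli.sh -server 127.0.0.1:2181
ls /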
四、Kafka installation
Installation reference: https://blog.csdn.net/qq_33792843/article/details/75727921
Kafka configuration notes: http://www.cnblogs.com/yinchengzhe/p/5111635.html
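In case the linked post goes away, the install itself is short. A sketch assuming an 0.8-era binary tarball, to match the kafka-create-topic.sh script used in the test below (the exact file name depends on the Scala build, e.g. kafka_2.8.0-0.8.0.tar.gz):
tar -zxvf kafka_2.8.0-0.8.0.tar.gz -C /usr/local
cd /usr/local/kafka_2.8.0-0.8.0
vi config/server.properties
bin/kafka-server-start.sh config/server.properties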
Errors hit along the way and their fixes:
kafka.common.KafkaException: Socket server failed to bind to centos1:9092: Unresolved address.
Fix: https://www.cnblogs.com/yy3b2007com/p/8684974.html
kafka.admin.AdministrationException: replication factor: 1 larger than available brokers: 0
Fix: https://blog.csdn.net/g1219371445/article/details/78828915 (in short: start Kafka first, then create the topic)
java.net.UnknownHostException: <hostname>: <hostname>: Name or service not known
Fix: https://blog.csdn.net/huanbia/article/details/69055523
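The binding and hostname errors above usually come down to how the broker announces itself. The relevant config/server.properties lines look roughly like this (the host name is an example and must resolve, e.g. via /etc/hosts as in step 0 below):
broker.id=0
port=9092
host.name=centos1
zookeeper.connect=localhost:2181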
The test should also follow the documentation:
0. Edit /etc/hosts and add the machine's hostname to the 127.0.0.1 line.
1. bin/kafka-server-start.sh config/server.properties
2. ./kafka-create-topic.sh -partition 1 -replica 1 -zookeeper localhost:2181 -topic test
3. In another terminal: ./kafka-console-consumer.sh -zookeeper localhost:2181 -topic test
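To actually see messages flow end to end, open one more terminal and type a few lines into the console producer (same 0.8-era scripts, broker assumed on localhost:9092); they should show up in the consumer from step 3:
./kafka-console-producer.sh --broker-list localhost:9092 --topic test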