Environment preparation (7 steps, on all three machines):
Prepare 3 virtual machines for the nginx and kafka cluster.
It is best to use fresh VMs, since leftover environment settings can interfere with each other.
Change the hostnames (number them 1, 2, 3 for easier management).
Edit the hostname file /etc/hostname; set the three machines to nginx-kafka01, nginx-kafka02 and nginx-kafka03 respectively. This change is permanent.
[root@nginx-kafka01 ~]# vim /etc/hostname
Remember to reboot after the change:
[root@nginx-kafka01 ~]# reboot
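Alternatively, on systemd distributions hostnamectl applies the new name immediately, without a reboot (it shows up after you log in again):
[root@localhost ~]# hostnamectl set-hostname nginx-kafka01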
Configure a static IP address; the three machines must each use a different IP. Only kafka01 is shown here:
[root@nginx-kafka01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
NAME=ens33
UUID=d4146a30-e3f2-46a2-b544-f5306b3986e1
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.254.131
NETMASK=255.255.255.0
GATEWAY=192.168.254.2
DNS1=114.114.114.114
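After editing the file, restart the network service (CentOS 7) so the static IP takes effect, then confirm it with ip add:
[root@nginx-kafka01 ~]# systemctl restart network
[root@nginx-kafka01 ~]# ip add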
Configure DNS:
[root@nginx-kafka01 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 114.114.114.114
Write the name-resolution entries on every machine (all three):
[root@nginx-kafka01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.254.131 nginx-kafka01
192.168.254.132 nginx-kafka02
192.168.254.133 nginx-kafka03
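A quick way to confirm the entries work is to ping the other machines by name:
[root@nginx-kafka01 ~]# ping -c 2 nginx-kafka02
[root@nginx-kafka01 ~]# ping -c 2 nginx-kafka03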
Install basic tools (all three machines):
[root@nginx-kafka01 ~]# yum install wget lsof vim -y
Install the time synchronization service:
[root@nginx-kafka01 ~]# yum -y install chrony
Set it to start on boot:
[root@nginx-kafka01 ~]# systemctl enable chronyd
Start it:
[root@nginx-kafka01 ~]# systemctl start chronyd
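You can check that time synchronization is actually working with chronyc:
[root@nginx-kafka01 ~]# chronyc sources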
Disable the firewall (all three machines):
[root@nginx-kafka01 ~]# systemctl stop firewalld
[root@nginx-kafka01 ~]# systemctl disable firewalld
Disable SELinux (a security subsystem with a great many rules). Open the file /etc/selinux/config and set SELINUX=disabled, as shown below:
[root@nginx-kafka01 ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
A reboot is required after changing the SELinux configuration:
[root@nginx-kafka01 ~]# reboot
getenforce shows whether the SELinux change took effect:
[root@nginx-kafka01 ~]# getenforce
Disabled
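As a side note, setenforce 0 drops a running system to permissive mode right away; it cannot reach Disabled, so the config file edit plus reboot above are still required:
[root@nginx-kafka01 ~]# setenforce 0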
========================================
========================================
With the environment ready, move on to the nginx setup. You can do this on all three machines at once; if your computer can't handle that, setting up nginx on just one machine is enough.
Install the packages:
[root@nginx-kafka01 ~]# yum install epel-release -y
[root@nginx-kafka01 ~]# yum install nginx -y
Start it:
[root@nginx-kafka01 ~]# systemctl start nginx
Set it to start on boot:
[root@nginx-kafka01 ~]# systemctl enable nginx
Edit the configuration file /etc/nginx/nginx.conf:
Change
listen 80 default_server;
to:
listen 80;
This is needed because sc.conf below declares default_server on port 80, and nginx allows only one default server per listen port. In newer package versions the main config is already fine; if it has no default_server, no change is needed.
Create the sc.conf file under /etc/nginx/conf.d and the sc directory under /var/log/nginx:
[root@nginx-kafka01 ~]# touch /etc/nginx/conf.d/sc.conf
[root@nginx-kafka01 ~]# mkdir /var/log/nginx/sc
Open the /etc/nginx/conf.d/sc.conf file and enter the following:
[root@nginx-kafka01 ~]# vim /etc/nginx/conf.d/sc.conf
server {
    listen 80 default_server;
    server_name www.sc.com;
    root /usr/share/nginx/html;
    access_log /var/log/nginx/sc/access.log main;
    location / {
    }
}
Then run the nginx syntax check, which should report success:
[root@nginx-kafka01 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Reload nginx:
[root@nginx-kafka01 ~]# nginx -s reload
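To confirm the new server block is serving and logging, request the page once and check the sc access log:
[root@nginx-kafka01 ~]# curl -s -o /dev/null http://192.168.254.131
[root@nginx-kafka01 ~]# tail -1 /var/log/nginx/sc/access.log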
========================================
========================================
Next, set up the kafka cluster:
1. Install java:
[root@nginx-kafka01 ~]# yum install java wget -y
Go to /opt and download kafka:
[root@nginx-kafka01 opt]# wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.12-2.8.1.tgz
Unpack it:
[root@nginx-kafka01 opt]# tar xf kafka_2.12-2.8.1.tgz
We will base the cluster on the sample configuration that ships with zookeeper.
Go to /opt and download zookeeper (note: apache-zookeeper-3.7.1.tar.gz is the source release; if bin/zkServer.sh later fails to find its classpath, use the runnable apache-zookeeper-3.7.1-bin.tar.gz instead):
[root@nginx-kafka01 opt]# wget https://dlcdn.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1.tar.gz
Unpack it:
[root@nginx-kafka01 opt]# tar xf apache-zookeeper-3.7.1.tar.gz
2. Configure kafka
Edit the file /opt/kafka_2.12-2.8.1/config/server.properties and set the following three values:
broker.id=1   # must be unique; use 1, 2 and 3 on the three machines
listeners=PLAINTEXT://nginx-kafka01:9092   # each machine uses its own hostname
zookeeper.connect=192.168.254.131:2181,192.168.254.132:2181,192.168.254.133:2181   # use your own ips; this line is identical on all three machines
Configure zookeeper:
Go to /opt/apache-zookeeper-3.7.1/conf.
First copy zoo_sample.cfg to a new file zoo.cfg, so a mistake won't damage the sample:
[root@nginx-kafka01 conf]# cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg and append the following three lines at the bottom:
server.1=192.168.254.131:3888:4888
server.2=192.168.254.132:3888:4888
server.3=192.168.254.133:3888:4888
All three machines need this configuration; it is what lets the cluster members find each other.
3888 and 4888 are both ports: one is used for data transfer, the other for liveness checks and leader election.
Create the /tmp/zookeeper directory (the dataDir configured in zoo_sample.cfg) and add a myid file to it whose content is this machine's zookeeper id.
For example, on 192.168.254.131:
[root@nginx-kafka01 ~]# mkdir -p /tmp/zookeeper
[root@nginx-kafka01 ~]# echo 1 > /tmp/zookeeper/myid
The ids differ across machines: on the other two, write 2 and 3 instead of 1.
Start zookeeper:
[root@nginx-kafka01 apache-zookeeper-3.7.1]# bin/zkServer.sh start
When starting zk and kafka, always start zk first and kafka second.
When shutting down, stop kafka first, then zk.
[root@nginx-kafka03 apache-zookeeper-3.7.1]# bin/zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/apache-zookeeper-3.7.1/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
With all three machines up, there is one leader and two followers.
Start kafka from the /opt/kafka_2.12-2.8.1 directory; the other machines work the same way:
[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-server-start.sh -daemon config/server.properties
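lsof (installed earlier) can confirm the broker is listening on port 9092:
[root@nginx-kafka01 kafka_2.12-2.8.1]# lsof -i:9092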
Using zookeeper:
From /opt/apache-zookeeper-3.7.1, open the zookeeper client to manage the kafka cluster:
[root@nginx-kafka01 apache-zookeeper-3.7.1]# bin/zkCli.sh
Check whether all three machines have joined the cluster (note that only absolute paths are accepted here):
[zk: localhost:2181(CONNECTED) 2] ls /brokers/ids
[1, 2, 3]
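You can also inspect an individual broker's registration (its host and endpoints) with get:
[zk: localhost:2181(CONNECTED) 3] get /brokers/ids/1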
At this point the kafka cluster is built; next, test it:
Create a topic:
bin/kafka-topics.sh --create --zookeeper 192.168.254.131:2181 --replication-factor 1 --partitions 1 --topic sc
List the topics:
/opt/kafka_2.12-2.8.1/bin/kafka-topics.sh --list --zookeeper 192.168.254.131:2181
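--describe shows the topic's partition count, leader and replicas, which is handy for checking replication later:
/opt/kafka_2.12-2.8.1/bin/kafka-topics.sh --describe --zookeeper 192.168.254.131:2181 --topic sc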
Start a console producer:
[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-console-producer.sh --broker-list 192.168.254.131:9092 --topic sc
>hello
>sanchuang tongle
>nihao
>world !!!!!!1
>
Start a console consumer:
[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.254.131:9092 --topic sc --from-beginning
The consumer can only consume what the producer has produced.
========================================
========================================
Deploying filebeat (a lightweight log collection tool)
1. Installation
[root@nginx-kafka01 kafka_2.12-2.8.1]# rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
Edit /etc/yum.repos.d/fb.repo:
[root@nginx-kafka01 kafka_2.12-2.8.1]# vim /etc/yum.repos.d/fb.repo
[elastic-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
Install filebeat:
[root@nginx-kafka01 kafka_2.12-2.8.1]# yum install filebeat -y
2. Set it to start on boot:
[root@nginx-kafka01 kafka_2.12-2.8.1]# systemctl enable filebeat
3. Modify the configuration file /etc/filebeat/filebeat.yml as follows:
filebeat.inputs:
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/nginx/sc/access.log
#-------------------------------- kafka output --------------------------------
output.kafka:
  hosts: ["192.168.254.131:9092","192.168.254.132:9092","192.168.254.133:9092"]
  topic: nginxlog
  keep_alive: 10s
Make sure the access-log path in /etc/nginx/conf.d/sc.conf and the paths entry in /etc/filebeat/filebeat.yml are the same, otherwise filebeat will read no data.
Make sure the file /var/log/nginx/sc/access.log exists.
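filebeat can validate the configuration and the connection to the kafka output before you start the service:
[root@nginx-kafka01 ~]# filebeat test config
[root@nginx-kafka01 ~]# filebeat test output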
4. Create the nginxlog topic:
[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-topics.sh --create --zookeeper 192.168.254.131:2181 --replication-factor 3 --partitions 1 --topic nginxlog
5. Start the service:
[root@nginx-kafka01 kafka_2.12-2.8.1]# systemctl start filebeat
6. Visit the machine's ip in a browser to generate log data; filebeat produces it into kafka and the consumer can then consume it. Note that the producer and consumer commands must now use the new topic, nginxlog.
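For example, generate a few requests with curl and watch them arrive in the nginxlog topic:
[root@nginx-kafka01 kafka_2.12-2.8.1]# for i in $(seq 1 5); do curl -s -o /dev/null http://192.168.254.131; done
[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.254.131:9092 --topic nginxlog --from-beginning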
========================================
========================================
Using python to load the log file data into the database:
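The script below inserts into a table called mynginxlog in the flask database, so that table must exist first. A minimal sketch of a matching schema (the column names and types here are assumptions inferred from the script's insert statement, not from the original):
[root@nginx-kafka01 ~]# mysql -h 192.168.254.131 -u sctl -p123456 flask -e 'create table if not exists mynginxlog(id int primary key, dt datetime, prov varchar(256), isp varchar(256), bt bigint);'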
import json
import requests
import time
import pymysql

# Connect to the database
db = pymysql.connect(
    host="192.168.254.131",   # mysql host ip
    user="sctl",              # username
    passwd="123456",          # password
    database="flask"          # database name
)

taobao_url = "https://ip.taobao.com/outGetIpInfo?accessKey=alibaba-inc&ip="

# Look up an ip address (province and carrier/isp) via the taobao api
def resolv_ip(ip):
    response = requests.get(taobao_url + ip)
    if response.status_code == 200:
        tmp_dict = json.loads(response.text)
        prov = tmp_dict["data"]["region"]
        isp = tmp_dict["data"]["isp"]
        return prov, isp
    return None, None

# Convert the timestamp read from the log into our own format
def trans_time(dt):
    # parse the string into a time struct
    timeArray = time.strptime(dt, "%d/%b/%Y:%H:%M:%S")
    # timeStamp = int(time.mktime(timeArray))
    # format the time struct back into a string
    new_time = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
    return new_time

# Read messages from kafka and extract the fields we need: ip, time, bytes
from pykafka import KafkaClient

client = KafkaClient(hosts="192.168.254.131:9092,192.168.254.132:9092,192.168.254.133:9092")
topic = client.topics['nginxlog']
balanced_consumer = topic.get_balanced_consumer(
    consumer_group='testgroup',
    # commit offsets automatically
    auto_commit_enable=True,
    zookeeper_connect='nginx-kafka01:2181,nginx-kafka02:2181,nginx-kafka03:2181'
)
# consumer = topic.get_simple_consumer()

i = 1
for message in balanced_consumer:
    if message is not None:
        line = json.loads(message.value.decode("utf-8"))
        log = line["message"]
        tmp_lst = log.split()
        ip = tmp_lst[0]
        dt = tmp_lst[3].replace("[", "")
        bt = tmp_lst[9]
        dt = trans_time(dt)
        prov, isp = resolv_ip(ip)
        if prov and isp:
            print(dt, prov, isp, bt)
            cursor = db.cursor()
            try:
                # parameterized query so the datetime and strings are quoted correctly
                cursor.execute(
                    "insert into mynginxlog values(%s, %s, %s, %s, %s)",
                    (i, dt, prov, isp, bt)
                )
                db.commit()
                i += 1
            except Exception as e:
                print("insert failed", e)
                db.rollback()

# Close the database connection
db.close()
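To run the script, install the three libraries it imports, save it under any name (for example consume_nginxlog.py; the filename here is arbitrary), and start it. It will keep consuming and inserting until stopped:
[root@nginx-kafka01 ~]# pip3 install pymysql pykafka requests
[root@nginx-kafka01 ~]# python3 consume_nginxlog.py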