Project: Building a Log Collection and Analysis Platform

This article walks through configuring an Nginx and Kafka cluster on three virtual machines, including changing the hostnames, setting static IPs, installing base packages, and disabling the firewall and SELinux. It then demonstrates configuring and starting Nginx, followed by installing, configuring, starting, and testing Kafka. Finally, it covers collecting logs with Filebeat, shipping them to Kafka, and processing the log data with Python before loading it into a database.

Environment preparation (7 steps, to be done on all three machines):

  1. Prepare three virtual machines for the nginx and kafka cluster.

Fresh virtual machines are recommended, because leftover configuration from earlier experiments may interfere with the setup.

  2. Change the hostnames (number them 1, 2, 3 for easier management).

Edit the hostname file /etc/hostname; on the three machines set it to nginx-kafka01, nginx-kafka02, and nginx-kafka03 respectively. This change is permanent.

[root@nginx-kafka01 ~]# vim /etc/hostname

Reboot after the change so it takes effect:

[root@nginx-kafka01 ~]# reboot

  3. Configure a static IP address; each of the three machines needs a different IP. Only kafka01 is shown here.

[root@nginx-kafka01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
NAME=ens33
UUID=d4146a30-e3f2-46a2-b544-f5306b3986e1
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.254.131
NETMASK=255.255.255.0
GATEWAY=192.168.254.2
DNS1=114.114.114.114

Configure DNS as well:

[root@nginx-kafka01 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 114.114.114.114

  4. Add name-resolution entries on every machine (all three).

[root@nginx-kafka01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.254.131 nginx-kafka01
192.168.254.132 nginx-kafka02
192.168.254.133 nginx-kafka03

  5. Install basic tools on all three machines.

[root@nginx-kafka01 ~]# yum install wget lsof vim -y

  6. Install the time-synchronization service.

[root@nginx-kafka01 ~]# yum -y install chrony

Enable it at boot:

[root@nginx-kafka01 ~]# systemctl enable chronyd

Start it:

[root@nginx-kafka01 ~]# systemctl start chronyd

  7. Disable the firewall on all three machines.

[root@nginx-kafka01 ~]# systemctl stop firewalld
[root@nginx-kafka01 ~]# systemctl disable firewalld

Disable SELinux (a security subsystem with many restrictive rules): open /etc/selinux/config and set SELINUX=disabled, as shown here:

[root@nginx-kafka01 ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

The machine must be rebooted after changing the SELinux configuration:

[root@nginx-kafka01 ~]# reboot

getenforce shows whether the SELinux change took effect:

[root@nginx-kafka01 ~]# getenforce
Disabled

========================================

========================================

With the environment ready, move on to the nginx setup. You can do this on all three machines; if your computer cannot handle it, setting up nginx on a single machine is enough.

  1. Install the packages:

[root@nginx-kafka01 ~]# yum install epel-release -y
[root@nginx-kafka01 ~]# yum install nginx -y

Start it:

[root@nginx-kafka01 ~]# systemctl start nginx

Enable it at boot:

[root@nginx-kafka01 ~]# systemctl enable nginx

  2. Edit the configuration file /etc/nginx/nginx.conf and change:

listen 80 default_server;

to:

listen 80;

(On newer versions the default configuration is already fine; if there is no default_server in that line, nothing needs to be changed.)

Create an sc directory under /var/log/nginx for the access log, and an sc.conf file under /etc/nginx/conf.d (the file itself is created in the vim step below):

[root@nginx-kafka01 ~]# mkdir /var/log/nginx/sc

Open /etc/nginx/conf.d/sc.conf and enter the following content:

[root@nginx-kafka01 ~]# vim /etc/nginx/conf.d/sc.conf

server {
    listen 80 default_server;
    server_name www.sc.com;
    root /usr/share/nginx/html;
    access_log /var/log/nginx/sc/access.log main;

    location / {
    }
}

Then run an nginx syntax check; it reports success:

[root@nginx-kafka01 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Reload nginx:

[root@nginx-kafka01 ~]# nginx -s reload

========================================

========================================

Next, set up the Kafka cluster:

1. Install Java:

[root@nginx-kafka01 ~]# yum install java wget -y

Go to /opt and install Kafka:

[root@nginx-kafka01 opt]# wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.12-2.8.1.tgz

Unpack it:

[root@nginx-kafka01 opt]# tar xf kafka_2.12-2.8.1.tgz

A standalone ZooKeeper is installed for cluster coordination; its bundled sample configuration is used as the starting point.

Go to /opt and install ZooKeeper (note: apache-zookeeper-3.7.1-bin.tar.gz is the ready-to-run release; the plain source tarball has to be built before it can run):

[root@nginx-kafka01 opt]# wget https://dlcdn.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1.tar.gz

Unpack it:

[root@nginx-kafka01 opt]# tar xf apache-zookeeper-3.7.1.tar.gz

2. Configure Kafka

Edit /opt/kafka_2.12-2.8.1/config/server.properties and set the following three values:

broker.id=1    # must be unique per broker: use 1, 2, 3 on the three machines respectively
listeners=PLAINTEXT://nginx-kafka01:9092    # each machine uses its own hostname here
zookeeper.connect=192.168.254.131:2181,192.168.254.132:2181,192.168.254.133:2181    # use your own IPs; identical on all three machines

3. Configure ZooKeeper

Go to /opt/apache-zookeeper-3.7.1/conf.

First copy zoo_sample.cfg to a new file zoo.cfg, so the sample stays intact if something goes wrong:

[root@nginx-kafka01 conf]# cp zoo_sample.cfg zoo.cfg

Edit zoo.cfg and append the following three lines at the bottom:

server.1=192.168.254.131:3888:4888
server.2=192.168.254.132:3888:4888
server.3=192.168.254.133:3888:4888

All three machines need this configuration; it is what lets the cluster members find each other.

3888 and 4888 are both ports: one is used for data transfer between nodes, the other for liveness checks and leader election.

Create a /tmp/zookeeper directory and add a myid file to it whose content is the ZooKeeper id assigned to this machine.

For example, on 192.168.254.131:

[root@nginx-kafka01 ~]# mkdir -p /tmp/zookeeper
[root@nginx-kafka01 ~]# echo 1 > /tmp/zookeeper/myid

The id differs per machine; on the other two machines write 2 and 3 instead of 1.

Start ZooKeeper:

[root@nginx-kafka01 apache-zookeeper-3.7.1]# bin/zkServer.sh start

When bringing the services up, always start zk first, then kafka.

When shutting down, stop kafka first, then zk.

[root@nginx-kafka03 apache-zookeeper-3.7.1]# bin/zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/apache-zookeeper-3.7.1/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader

Once all three machines are up, there is one leader and two followers.
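If you want to check all three nodes at once instead of running zkServer.sh status on each one, the minimal Python sketch below asks every ZooKeeper for its role over the client port using the srvr four-letter command. It assumes the default 4lw whitelist (which permits srvr) and the node IPs used in this article.

import socket

# Ask each ZooKeeper node for its role via the "srvr" four-letter command.
for host in ("192.168.254.131", "192.168.254.132", "192.168.254.133"):
    try:
        with socket.create_connection((host, 2181), timeout=3) as s:
            s.sendall(b"srvr")
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
        reply = b"".join(chunks).decode(errors="replace")
        # The reply contains a line such as "Mode: leader" or "Mode: follower".
        mode = next((l for l in reply.splitlines() if l.startswith("Mode:")), "no Mode line")
        print(host, mode)
    except OSError as e:
        print(host, "unreachable:", e)

A healthy cluster prints one "Mode: leader" and two "Mode: follower" lines.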

Start Kafka from the /opt/kafka_2.12-2.8.1 directory; the other machines are started the same way:

[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-server-start.sh -daemon config/server.properties

Using ZooKeeper:

From /opt/apache-zookeeper-3.7.1, open the ZooKeeper shell to inspect the Kafka cluster:

[root@nginx-kafka01 apache-zookeeper-3.7.1]# bin/zkCli.sh

Check whether all three brokers have joined the cluster (only absolute paths are accepted here):

[zk: localhost:2181(CONNECTED) 2] ls /brokers/ids
[1, 2, 3]

At this point the Kafka cluster is up. Test it as follows.

Create a topic:

bin/kafka-topics.sh --create --zookeeper 192.168.254.131:2181 --replication-factor 1 --partitions 1 --topic sc

List topics:

/opt/kafka_2.12-2.8.1/bin/kafka-topics.sh --list --zookeeper 192.168.254.131:2181

Start a console producer:

[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-console-producer.sh --broker-list 192.168.254.131:9092 --topic sc
>hello
>sanchuang tongle
>nihao
>world !!!!!!1
>

Start a console consumer:

[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.254.131:9092 --topic sc --from-beginning

The consumer only has something to consume once the producer has produced messages. The same round trip can also be done from Python, as shown below.
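Here is a minimal sketch using pykafka, the same library used for ingestion later in this article. It assumes pykafka is installed (pip install pykafka), the sc topic exists, and the broker address used above.

from pykafka import KafkaClient

client = KafkaClient(hosts="192.168.254.131:9092")
topic = client.topics['sc']

# Produce one message ...
with topic.get_sync_producer() as producer:
    producer.produce(b"hello from python")

# ... then read messages back; the consumer stops after 2 seconds of inactivity.
consumer = topic.get_simple_consumer(consumer_timeout_ms=2000)
for message in consumer:
    if message is not None:
        print(message.offset, message.value.decode("utf-8"))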

========================================

========================================

Deploying Filebeat (a lightweight log collector)

1. Install

[root@nginx-kafka01 kafka_2.12-2.8.1]# rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

Edit /etc/yum.repos.d/fb.repo:

[root@nginx-kafka01 kafka_2.12-2.8.1]# vim /etc/yum.repos.d/fb.repo

[elastic-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install filebeat:

[root@nginx-kafka01 kafka_2.12-2.8.1]# yum install filebeat -y

2. Enable it at boot

[root@nginx-kafka01 kafka_2.12-2.8.1]# systemctl enable filebeat

3. Edit the configuration file /etc/filebeat/filebeat.yml so that it reads as follows (point hosts at your own Kafka brokers):

filebeat.inputs:
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/nginx/sc/access.log
#==========------------------------------kafka-----------------------------------
output.kafka:
  hosts: ["192.168.254.131:9092","192.168.254.132:9092","192.168.254.133:9092"]
  topic: nginxlog
  keep_alive: 10s

Make sure the access-log path in /etc/nginx/conf.d/sc.conf and the paths entry in /etc/filebeat/filebeat.yml are identical; otherwise no log data will be read.

Also make sure the file /var/log/nginx/sc/access.log exists. A quick sanity check is sketched below.
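A minimal Python sketch of that sanity check, using the log path from this article:

import os

# The access-log path that both sc.conf and filebeat.yml must agree on.
path = "/var/log/nginx/sc/access.log"

if os.path.exists(path):
    with open(path, "rb") as f:
        lines = f.read().splitlines()
    print(f"{path}: {len(lines)} lines")
    if lines:
        print("newest line:", lines[-1].decode(errors="replace"))
else:
    print(f"{path} does not exist yet")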

4. Create the nginxlog topic

[root@nginx-kafka01 kafka_2.12-2.8.1]# bin/kafka-topics.sh --create --zookeeper 192.168.254.131:2181 --replication-factor 3 --partitions 1 --topic nginxlog

5. Start the service:

[root@nginx-kafka01 kafka_2.12-2.8.1]# systemctl start filebeat

6. Visit the machine's IP so that nginx writes access-log entries; Filebeat then acts as the producer and the consumer can read them. Note that the producer/consumer commands shown earlier need the topic changed to the new nginxlog topic. A small traffic generator is sketched below.
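A minimal sketch of the traffic generator, assuming the requests library is installed and nginx is reachable at the kafka01 address used in this article:

import requests

# Hit nginx a few times so access-log lines are written and shipped to the nginxlog topic.
for i in range(10):
    r = requests.get("http://192.168.254.131/", timeout=3)
    print(i, r.status_code)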

========================================

========================================

Loading the log data into a database with Python:

import json
import time

import pymysql
import requests
from pykafka import KafkaClient

# Connect to the database
db = pymysql.connect(
    host="192.168.254.131",   # MySQL host IP
    user="sctl",              # username
    passwd="123456",          # password
    database="flask"          # database name
)

taobao_url = "https://ip.taobao.com/outGetIpInfo?accessKey=alibaba-inc&ip="

# Look up an IP address (province and ISP) via the taobao API
def resolv_ip(ip):
    response = requests.get(taobao_url + ip)
    if response.status_code == 200:
        tmp_dict = json.loads(response.text)
        prov = tmp_dict["data"]["region"]
        isp = tmp_dict["data"]["isp"]
        return prov, isp
    return None, None

# Convert the timestamp read from the log into our own format
def trans_time(dt):
    # parse the string into a time struct
    timeArray = time.strptime(dt, "%d/%b/%Y:%H:%M:%S")
    # timeStamp = int(time.mktime(timeArray))
    # format the time struct back into a string
    new_time = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
    return new_time

# Read messages from Kafka and extract the fields we need: ip, time, bytes sent
client = KafkaClient(hosts="192.168.254.131:9092,192.168.254.132:9092,192.168.254.133:9092")
topic = client.topics['nginxlog']
balanced_consumer = topic.get_balanced_consumer(
    consumer_group='testgroup',
    # commit offsets automatically
    auto_commit_enable=True,
    zookeeper_connect='nginx-kafka01:2181,nginx-kafka02:2181,nginx-kafka03:2181'
)
# consumer = topic.get_simple_consumer()

i = 1
for message in balanced_consumer:
    if message is not None:
        line = json.loads(message.value.decode("utf-8"))
        log = line["message"]
        tmp_lst = log.split()
        ip = tmp_lst[0]
        dt = tmp_lst[3].replace("[", "")
        bt = tmp_lst[9]
        dt = trans_time(dt)
        prov, isp = resolv_ip(ip)
        if prov and isp:
            print(dt, prov, isp, bt)
            cursor = db.cursor()
            try:
                cursor.execute(f"insert into mynginxlog values({i},'{dt}','{prov}','{isp}',{bt})")
                db.commit()
                i += 1
            except Exception as e:
                print("insert failed", e)
                db.rollback()

# Close the database connection
db.close()
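The insert above assumes a mynginxlog table already exists in the flask database. The exact schema is not shown here, so the following is only a minimal sketch of a compatible table; the column names and types are assumptions you may want to adjust.

import pymysql

# Create a table compatible with: insert into mynginxlog values(i, dt, prov, isp, bt)
db = pymysql.connect(host="192.168.254.131", user="sctl",
                     passwd="123456", database="flask")
cursor = db.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS mynginxlog (
        id   INT PRIMARY KEY,
        dt   DATETIME,
        prov VARCHAR(64),
        isp  VARCHAR(64),
        bt   BIGINT
    )
""")
db.commit()
db.close()

Run this once before starting the consumer script, then watch the parsed rows accumulate in the table as traffic hits nginx.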
