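Use Kafka for log collection.
Information to collect:
1. User ID (user_id)
2. Time (act_time)
3. Action (action; one of: click: click, favorite: job_collect, send resume: cv_send, upload resume: cv_upload)
4. Code of the target company (job_code)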
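The four fields above can be sketched as a small JavaScript helper that builds one record and rejects unknown actions. The names `ALLOWED_ACTIONS` and `buildEvent` are hypothetical and used only for illustration; the field names and action values come from the list above.

```javascript
// Action values taken from the list above.
const ALLOWED_ACTIONS = ['click', 'job_collect', 'cv_send', 'cv_upload'];

// Hypothetical helper: build one log record with the four required fields.
// actTime is passed in as a string so the caller decides the time format.
function buildEvent(userId, actTime, action, jobCode) {
  if (!ALLOWED_ACTIONS.includes(action)) {
    throw new Error('unknown action: ' + action);
  }
  return { user_id: userId, act_time: actTime, action: action, job_code: jobCode };
}

// Example: serialize a record as a JSON string, ready to be posted.
const sample = buildEvent('u_donald', '2020-01-05 03:04:09', 'cv_send', 'donald');
console.log(JSON.stringify(sample));
```

Validating the action on the client is optional; it simply keeps obviously malformed records out of the topic.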
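1. The HTML page stands in for a Lagou job-browsing page
2. Nginx collects the users' click stream and records it in access.log
3. The log data collected by Nginx is sent to the Kafka topic tp_individual
Architecture:
HTML+Nginx+ngx_kafka_module+Kafka
Hints:
Students need to download nginx themselves, configure ngx_kafka_module for nginx, and write a custom HTML page; it is enough that clicking a link collects the user's action data.
Kafka and ZooKeeper must be installed first
Start ZooKeeper, then Kafka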
zkServer.sh start
kafka-server-start.sh -daemon /opt/kafka_2.12-1.0.2/config/server.properties
Before installing, make sure gcc, pcre-devel, zlib-devel, and openssl-devel are installed on the system
yum -y install gcc pcre-devel zlib-devel openssl openssl-devel
Installation directory: cd /opt
Download librdkafka, ngx_kafka_module, and nginx
nginx
wget https://nginx.org/download/nginx-1.18.0.tar.gz
Baidu Cloud link for librdkafka and ngx_kafka_module
Link: https://pan.baidu.com/s/1e5D0-A_DzZyPneai90grqw
Extraction code: 1111
Compile librdkafka
cd /opt/librdkafka
./configure && make && make install
Build nginx with the ngx_kafka_module module
cd /opt
tar xvf nginx-1.18.0.tar.gz
cd /opt/nginx-1.18.0
./configure --add-module=/opt/ngx_kafka_module
make && make install
Edit the nginx configuration file:
cd /opt/nginx-1.18.0/conf/
vim nginx.conf
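http {
    # added
    kafka;
    kafka_broker_list hadoop1:9092; # hostname:port; must match the hosts-file entry on Windows
    # modified
    server {
        listen 80;
        server_name hadoop1; # hostname; must match the hosts-file entry on Windows
        location /kafka/log {
            kafka_topic your_topic; # topic that receives the ajax POST bodies; replace with the real topic, e.g. tp_individual
        }
    }
}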
Start nginx
cd /usr/local/nginx/sbin
# first start
./nginx -c /opt/nginx-1.18.0/conf/nginx.conf
# later, to reload the configuration
./nginx -c /opt/nginx-1.18.0/conf/nginx.conf -s reload
Error: the file librdkafka.so.1 cannot be found
error while loading shared libraries: librdkafka.so.1: cannot open shared object file: No such file or directory
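The cause is that the compiled library has not been registered with the dynamic linker. Solution:
Load the .so library
# load the libraries under /usr/local/lib at boot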
echo "/usr/local/lib" >> /etc/ld.so.conf
# load manually
ldconfig
Check whether the topic your_topic exists
kafka-topics.sh --list --zookeeper localhost:2181/myKafka
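Before testing, start nginx, make sure the host can be pinged, and open the relevant ports. The test writes data to nginx and then checks whether a Kafka consumer receives it.
Start a Kafka consumer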
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic your_topic --from-beginning
Run the test:
curl http://hadoop1:80/kafka/log -d "message send to kafka topic"
curl http://hadoop1:80/kafka/log -d "test"
Expected result:
[root@hadoop1 ~]# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic your_topic --from-beginning
message send to kafka topic
test
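The curl payloads above are plain text, whereas the HTML page later in this post sends a JSON string, so a downstream consumer may see both kinds of message on the same topic. A minimal sketch of telling them apart, assuming each consumed message arrives as one line of text; `parseEvent` is a hypothetical helper, not part of Kafka or ngx_kafka_module:

```javascript
// Hypothetical helper: turn one consumed message into an event object,
// or return null when the message is not a JSON object with an action field.
function parseEvent(line) {
  let obj;
  try {
    obj = JSON.parse(line);
  } catch (err) {
    return null; // plain-text test messages such as "test" end up here
  }
  if (obj === null || typeof obj !== 'object' || typeof obj.action !== 'string') {
    return null;
  }
  return obj;
}

console.log(parseEvent('test'));
console.log(parseEvent('{"user_id":"u_donald","action":"click"}'));
```

Returning null instead of throwing lets a consumer skip the manual curl messages without stopping.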
Kafka receives the page's button events through ajax requests that nginx forwards
Page location: /usr/local/nginx/html
HTML page
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1,shrink-to-fit=no">
<title>index</title>
<!-- jQuery CDN; any other CDN can be used -->
<script src="https://cdn.bootcdn.net/ajax/libs/jquery/3.5.1/jquery.js"></script>
</head>
<body>
<input id="click" type="button" value="Click" onclick="operate('click')" />
<input id="collect" type="button" value="Favorite" onclick="operate('job_collect')" />
<input id="send" type="button" value="Send resume" onclick="operate('cv_send')" />
<input id="upload" type="button" value="Upload resume" onclick="operate('cv_upload')" />
</body>
<script>
function operate(action) {
var json = {'user_id': 'u_donald', 'act_time': current().toString(), 'action': action, 'job_code': 'donald'};
$.ajax({
url:"http://hadoop1:80/kafka/log",
type:"POST" ,
crossDomain: true,
data: JSON.stringify(json),
// the setting below allows cookies on cross-origin requests
xhrFields: {
withCredentials: true
},
success:function (data, status, xhr) {
// console.log("Operation succeeded: " + action)
},
error:function (err) {
// console.log(err.responseText);
}
});
};
function current() {
var d = new Date(),
str = '';
str += d.getFullYear() + '-';
str += d.getMonth() + 1 + '-';
str += d.getDate() + ' ';
str += d.getHours() + ':';
str += d.getMinutes() + ':';
str += d.getSeconds();
return str;
}
</script>
</html>
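The `current()` function in the page above emits unpadded fields, for example `2020-1-5 3:4:9`, which do not sort correctly as plain strings. If a fixed-width `act_time` is preferred, one option is to zero-pad each field. A minimal sketch; `formatTimestamp` is a hypothetical replacement and not part of the original page:

```javascript
// Hypothetical replacement for current(): local time as "YYYY-MM-DD HH:mm:ss".
function formatTimestamp(d) {
  const pad = (n) => String(n).padStart(2, '0');
  const date = [d.getFullYear(), pad(d.getMonth() + 1), pad(d.getDate())].join('-');
  const time = [pad(d.getHours()), pad(d.getMinutes()), pad(d.getSeconds())].join(':');
  return date + ' ' + time;
}

console.log(formatTimestamp(new Date(2020, 0, 5, 3, 4, 9))); // 2020-01-05 03:04:09
```

In the page, `'act_time': current().toString()` would then become `'act_time': formatTimestamp(new Date())`.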
Page preview:
Test results: