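1. Information to collect
1) User ID (user_id)
2) Time of the action (act_time)
3) Action (action; one of: click: click, favorite a job: job_collect, send a resume: cv_send, upload a resume: cv_upload)
4) Code of the target company (job_code)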
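As a rough illustration of the record described above, the sketch below builds one event with these four fields and rejects action names outside the list. The helper name `buildEvent` and the constant `ALLOWED_ACTIONS` are assumptions made for this example; they are not part of the original page.

```javascript
// The four action names listed above.
const ALLOWED_ACTIONS = ["click", "job_collect", "cv_send", "cv_upload"];

// Hypothetical helper (not in the original page): builds one event record
// and rejects unknown actions before anything is sent.
function buildEvent(userId, action, jobCode, actTime) {
  if (!ALLOWED_ACTIONS.includes(action)) {
    throw new Error("unknown action: " + action);
  }
  return {
    user_id: userId,
    act_time: actTime,
    action: action,
    job_code: jobCode
  };
}

// Sample values only; real values would come from the page and the session.
const event = buildEvent("rpp", "click", "job_test", "2020-11-9 13:52:27");
console.log(JSON.stringify(event));
```

Validating the action name on the client is optional; it only keeps obviously malformed records out of the topic.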
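2. Workflow
1) The HTML page stands in for a job-browsing page on Lagou.
2) The web server responds to the user's actions as usual.
3) At the same time, each action is also sent to Nginx with an ajax request; this Nginx instance collects the user click stream.
4) Nginx uses ngx_kafka_module to send the collected data to a topic in the Kafka cluster.
5) Once the data is stored in a Kafka topic, big-data components can consume it for real-time computation or other processing, such as job recommendation or statistical reports.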
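3. Architecture
HTML + Nginx + ngx_kafka_module + Kafka
ngx_kafka_module repository: https://github.com/brg-liuwei/ngx_kafka_module
Note: ngx_kafka_module only accepts POST requests. In addition, the web server and the data-collecting Nginx are usually not on the same domain, so the ajax requests are cross-origin; this can be solved by configuring CORS headers in nginx.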
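Because only POST is accepted, it helps to know when a browser would send an OPTIONS preflight before the real request. The sketch below is a simplified check based on the CORS rules for "simple" requests; the helper name `needsPreflight` is an assumption for this example, and custom request headers (which can also trigger a preflight) are deliberately not considered.

```javascript
// Content types that keep a cross-origin POST "simple" (no preflight).
const SIMPLE_CONTENT_TYPES = [
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain"
];

// Hypothetical helper: returns true when the browser would send an OPTIONS
// preflight first. Simplified: custom request headers are not considered.
function needsPreflight(method, contentType) {
  const simpleMethod = ["GET", "HEAD", "POST"].includes(method.toUpperCase());
  if (!simpleMethod) {
    return true;
  }
  if (!contentType) {
    return false; // no Content-Type header set
  }
  // Ignore parameters such as "; charset=UTF-8".
  const essence = contentType.split(";")[0].trim().toLowerCase();
  return !SIMPLE_CONTENT_TYPES.includes(essence);
}

// jQuery's default Content-Type for $.ajax POST requests.
console.log(needsPreflight("POST", "application/x-www-form-urlencoded; charset=UTF-8"));
// Switching to a JSON Content-Type changes the answer.
console.log(needsPreflight("POST", "application/json"));
```

With jQuery's default content type the POST goes out directly. If the content type were changed to `application/json`, the browser would send OPTIONS first, and nginx would need to answer that request separately.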
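4. Hands-on steps
- Set up the Kafka cluster
- Write the HTML page (with 4 buttons)
- Configure Nginx (basic configuration and ngx_kafka_module)
- Configure Kafka and create the topic
- Listen on the topic and check the messages
5. Implementation
5.1 Setting up the Kafka cluster
Reference: https://blog.csdn.net/rzpy_qifengxiaoyue/article/details/109564719
5.2 Writing the HTML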
<!DOCTYPE html>
<html lang="en">
<head>
<title>kafka_test</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1,shrink-to-fit=no">
<script src="https://cdn.bootcdn.net/ajax/libs/jquery/3.5.1/jquery.js"></script>
<script>
function current() {
var d = new Date(),
str = '';
str += d.getFullYear() + '-';
str += d.getMonth() + 1 + '-';
str += d.getDate() + ' ';
str += d.getHours() + ':';
str += d.getMinutes() + ':';
str += d.getSeconds();
return str;
}
function operate(action) {
var json = {
'user_id': 'rpp',
'act_time': current().toString(),
'action': action,
'job_code': 'job_test'
};
$.ajax({
type: "POST",
url: "http://47.94.80.41:9090/kafka/log",
dataType: "json",
crossDomain: true,
data: JSON.stringify(json),
// Allow cookies to be sent with cross-origin requests
xhrFields: {
withCredentials: true
},
success: function (data) {
alert("success")
},
error: function (data) {
alert("error")
}
})
}
</script>
</head>
<body>
<div class="row" style="text-align: center">
<div>
<button type="button" id="click" onclick="operate('click')">Click</button>
</div>
<div>
<button type="button" id="job_collect" onclick="operate('job_collect')">Favorite job</button>
</div>
<div>
<button type="button" id="resume_upload" onclick="operate('cv_upload')">Upload resume</button>
</div>
<div>
<button type="button" id="resume_send" onclick="operate('cv_send')">Send resume</button>
</div>
</div>
</body>
</html>
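The `current()` function in the page does not zero-pad its fields, so it produces values such as `2020-11-9 14:2:48`, which do not sort correctly as strings. If a fixed-width timestamp is preferred, a possible variant is sketched below; the names `pad2` and `formatTimestamp` are assumptions for this example and do not appear in the original page.

```javascript
// Hypothetical helper: left-pads a number to two digits.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Hypothetical replacement for current(): same field order, zero-padded,
// using the browser's local time just like the original function.
function formatTimestamp(d) {
  return d.getFullYear() + "-" + pad2(d.getMonth() + 1) + "-" + pad2(d.getDate()) +
    " " + pad2(d.getHours()) + ":" + pad2(d.getMinutes()) + ":" + pad2(d.getSeconds());
}

// Local time 9 November 2020, 14:02:48 (months are zero-based in Date).
console.log(formatTimestamp(new Date(2020, 10, 9, 14, 2, 48)));
```

Swapping this in would change the `act_time` format seen by downstream consumers, so it is only worth doing before any consumer depends on the unpadded form.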
5.3 Configuring nginx
# 1. Install git
$ yum install -y git
# 2. Install the build dependencies
$ yum install -y gcc gcc-c++ zlib zlib-devel openssl openssl-devel pcre pcre-devel
# 3. Download the Kafka C client source (librdkafka)
$ cd /root/software
$ git clone https://github.com/edenhill/librdkafka
# 4. Build and install librdkafka
$ cd /root/software/librdkafka
$ ./configure
$ make && make install
# 5. Download the nginx source
$ cd /root/software
$ wget http://nginx.org/download/nginx-1.18.0.tar.gz
# 6. Extract it
$ tar -zxf nginx-1.18.0.tar.gz
# 7. Download the module source
$ cd /root/software
$ git clone https://github.com/brg-liuwei/ngx_kafka_module
# 8. Build nginx with ngx_kafka_module and install it
$ cd /root/software/nginx-1.18.0
$ ./configure --add-module=/root/software/ngx_kafka_module/
$ make && make install
Edit the nginx.conf configuration
# 1. Edit nginx.conf
$ vi /usr/local/nginx/conf/nginx.conf
# 2. Start nginx
$ cd /usr/local/nginx/sbin
$ ./nginx
The nginx.conf is as follows
#pid logs/nginx.pid;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
#log_format main '$remote_addr - $remote_user [$time_local] "$request" '
# '$status $body_bytes_sent "$http_referer" '
# '"$http_user_agent" "$http_x_forwarded_for"';
#access_log logs/access.log main;
sendfile on;
#tcp_nopush on;
#keepalive_timeout 0;
keepalive_timeout 65;
#gzip on;
kafka;
kafka_broker_list rpp:9092 rpp:9093 rpp:9094;
server {
listen 9090;
server_name localhost;
#charset koi8-r;
#access_log logs/host.access.log main;
#------------ Kafka configuration: start ------------
location = /kafka/log {
# CORS configuration
add_header 'Access-Control-Allow-Origin' $http_origin;
add_header 'Access-Control-Allow-Credentials' 'true';
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
kafka_topic topic_rpp;
}
#------------ Kafka configuration: end ------------
#error_page 404 /404.html;
}
}
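Once nginx is running with this configuration, the `/kafka/log` location can be exercised without a browser. The sketch below, meant for Node.js 18 or later (where `fetch` is built in), prepares a POST request against that location. The host `localhost`, the helper names `buildLogRequest` and `sendLog`, and the sample record are assumptions for this example; only the port 9090 and the path come from the configuration above.

```javascript
// Hypothetical helper: builds the fetch() arguments for one log record.
// It uses the same Content-Type that jQuery sends by default, mirroring the page.
function buildLogRequest(endpoint, event) {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8" },
      body: JSON.stringify(event)
    }
  };
}

// Sends the record and resolves to the HTTP status code returned by nginx.
// Not called here, so the sketch runs even when no server is listening.
async function sendLog(endpoint, event) {
  const req = buildLogRequest(endpoint, event);
  const res = await fetch(req.url, req.init);
  return res.status;
}

// Assumed endpoint: nginx on the local machine, port and path as configured above.
const request = buildLogRequest("http://localhost:9090/kafka/log", {
  user_id: "rpp",
  act_time: "2020-11-9 13:52:27",
  action: "click",
  job_code: "job_test"
});
console.log(request.init.method, request.url);
console.log(request.init.body);
```

Calling `sendLog(request.url, {...})` against a running instance should make the same record appear in the console consumer, assuming the brokers in `kafka_broker_list` are reachable.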
5.4 Creating the Kafka topic
# Create the topic
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --create --topic topic_rpp --partitions 1 --replication-factor 1
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic "topic_rpp".
# List the topics
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --list
topic_rpp
# Describe the topic
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --describe --topic topic_rpp
Topic:topic_rpp PartitionCount:1 ReplicationFactor:1 Configs:
Topic: topic_rpp Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Start a consumer and listen for messages
./kafka-console-consumer.sh --bootstrap-server rpp:9092 --topic topic_rpp --from-beginning
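5.5 Testing
Open the HTML page in a browser; it shows the four buttons defined in 5.2.
Click the buttons and check the messages received by the consumer: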
[root@rpp bin]# ./kafka-console-consumer.sh --bootstrap-server rpp:9092 --topic topic_rpp --from-beginning
{"user_id":"rpp","act_time":"2020-11-9 13:52:27","action":"click","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 13:52:41","action":"job_collect","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 14:2:48","action":"cv_send","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 14:3:4","action":"cv_upload","job_code":"job_test"}
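As a small illustration of the kind of downstream processing mentioned in the workflow (for example a statistical report), the sketch below counts the records above per action. It works on an in-memory array of lines rather than on a live Kafka consumer, and the helper name `countByAction` is an assumption for this example.

```javascript
// Sample lines copied from the consumer output above.
const lines = [
  '{"user_id":"rpp","act_time":"2020-11-9 13:52:27","action":"click","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 13:52:41","action":"job_collect","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 14:2:48","action":"cv_send","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 14:3:4","action":"cv_upload","job_code":"job_test"}'
];

// Hypothetical helper: counts records per action and skips lines that are
// not valid JSON objects with a string "action" field.
function countByAction(rawLines) {
  const counts = {};
  for (const line of rawLines) {
    let record;
    try {
      record = JSON.parse(line);
    } catch (err) {
      continue; // ignore malformed lines
    }
    if (!record || typeof record.action !== "string") {
      continue;
    }
    counts[record.action] = (counts[record.action] || 0) + 1;
  }
  return counts;
}

console.log(countByAction(lines));
```

In a real pipeline the same counting logic would sit behind a Kafka consumer or a stream-processing job instead of a fixed array.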