Big Data Assignment
Log collection with Kafka
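Information to collect:
1. User ID (user_id)
2. Time (act_time)
3. Action (action; one of: click: click, favorite a job: job_collect, send a résumé: cv_send, upload a résumé: cv_upload)
4. Code of the target company (job_code)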
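The four fields above can be carried as one record per user action. The following is a minimal sketch, assuming a JSON line as the record format (the notes do not fix a format) and a hypothetical helper name make_record; it does no escaping of quotes inside values.

```shell
#!/bin/sh
# Hypothetical helper: format one user action as a single JSON line.
# Assumption: JSON is used as the message format; values contain no quotes.
make_record() {
  user_id=$1
  act_time=$2
  action=$3
  job_code=$4
  # Accept only the four actions listed in the notes.
  case "$action" in
    click|job_collect|cv_send|cv_upload) ;;
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  printf '{"user_id":"%s","act_time":"%s","action":"%s","job_code":"%s"}\n' \
    "$user_id" "$act_time" "$action" "$job_code"
}

# Example call with made-up values.
make_record 1001 "2020-08-01 10:00:00" click J0001
```

Rejecting unknown actions before a record is produced keeps malformed events out of the topic rather than filtering them downstream.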
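1. The HTML page can be thought of as a Lagou job-browsing page
2. Nginx collects the stream of user click data and records it in access.log
3. The log data collected by Nginx is sent to the Kafka topic tp_individual
Architecture:
HTML + Nginx + ngx_kafka_module + Kafka
Steps:
1. Install JDK 1.8
2. Install Kafka and ZooKeeper
3. Configure ngx_kafka_module (Nginx will run into cross-origin issues)
4. Develop the page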
Install JDK 1.8
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u141-b15/336fa29ff2bb4ef291e347e091f7f4a7/jdk-8u141-linux-x64.tar.gz"
tar -xzf jdk-8u141-linux-x64.tar.gz
Configure environment variables
vim /etc/profile
export JAVA_HOME=/root/kafka/jdk1.8.0_141
export PATH=$PATH:$JAVA_HOME/bin
# Apply the changes
source /etc/profile
# Verify
java -version
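Because /etc/profile is edited several times in these notes, it can help to append each export line only if it is not already present. This is a minimal sketch, assuming a hypothetical helper named add_profile_line; it is not part of the original steps.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line already exists.
add_profile_line() {
  file=$1
  line=$2
  # -x: match the whole line, -F: fixed string, -q: quiet.
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}
```

For example, add_profile_line /etc/profile 'export JAVA_HOME=/root/kafka/jdk1.8.0_141' can be run repeatedly without duplicating the line.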
Install ZooKeeper
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
tar -zxf zookeeper-3.4.14.tar.gz -C /opt
cd /opt/zookeeper-3.4.14/conf
# Copy zoo_sample.cfg to zoo.cfg
cp zoo_sample.cfg zoo.cfg
# Edit zoo.cfg
vim zoo.cfg
Set dataDir, the directory where ZooKeeper stores its data:
dataDir=/var/zookeeper/data
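Edit /etc/profile:
vim /etc/profile
# ZOOKEEPER_PREFIX points to the directory where ZooKeeper was extracted
export ZOOKEEPER_PREFIX=/opt/zookeeper-3.4.14
# Add ZooKeeper's bin directory to PATH
export PATH=$PATH:$ZOOKEEPER_PREFIX/bin
# ZOO_LOG_DIR sets where ZooKeeper writes its logs
export ZOO_LOG_DIR=/var/zookeeper/log
Apply the changes:
source /etc/profile
Verify (ZooKeeper has not been started yet, so the status check reports that it is not running):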
[root@mysql-slave1 bin]# cd /opt/zookeeper-3.4.14/bin
[root@mysql-slave1 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.14/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
Install Kafka
1. Download the software
wget http://archive.apache.org/dist/kafka/1.0.2/kafka_2.12-1.0.2.tgz
tar -zxf kafka_2.12-1.0.2.tgz -C /opt
2. Configure environment variables
vim /etc/profile
export KAFKA_HOME=/opt/kafka_2.12-1.0.2
export PATH=$PATH:$KAFKA_HOME/bin
source /etc/profile
3. Configure Kafka's log directory and ZooKeeper connection address
vim /opt/kafka_2.12-1.0.2/config/server.properties
log.dirs=/var/kafka-logs
zookeeper.connect=localhost:2181/myKafka
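The two settings above can also be applied without opening an editor. The sketch below assumes a hypothetical helper named set_prop that rewrites a key=value line in place, or appends it if the key is missing; the demo runs on a temporary sample file rather than the real server.properties.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a properties FILE.
# Replaces an existing uncommented KEY line, otherwise appends one.
set_prop() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  awk -F= -v k="$key" -v v="$value" '
    $1 == k { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a sample file (assumed contents, for illustration only).
conf=$(mktemp)
printf '%s\n' 'broker.id=0' 'log.dirs=/tmp/kafka-logs' 'zookeeper.connect=localhost:2181' > "$conf"
set_prop "$conf" log.dirs /var/kafka-logs
set_prop "$conf" zookeeper.connect localhost:2181/myKafka
cat "$conf"
rm -f "$conf"
```

Pointing the helper at /opt/kafka_2.12-1.0.2/config/server.properties would apply the same two values to the real configuration.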
4. Start ZooKeeper
[root@mysql-slave1 bin]# cd /opt/zookeeper-3.4.14/bin/
[root@mysql-slave1 bin]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.14/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@mysql-slave1 bin]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: standalone
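Kafka should only be started once the status check reports a mode, as in the output above. When scripting the startup, a small retry loop can wait for that. This is a sketch with hypothetical helper names wait_for and zk_is_up; the one-second interval is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: run a command once per second until it succeeds,
# giving up after TRIES attempts. Usage: wait_for TRIES CMD [ARGS...]
wait_for() {
  tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}

# Succeeds when zkServer.sh status prints a "Mode:" line (e.g. "Mode: standalone").
zk_is_up() {
  zkServer.sh status 2>&1 | grep -q '^Mode:'
}
```

A startup script could then run wait_for 30 zk_is_up before launching the Kafka broker, and stop with an error if ZooKeeper never comes up.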
5. Start Kafka
cd /opt/kafka_2.12-1.0.2
bin/kafka-server-start.sh config/server.properties
# Kafka is now running in the foreground; press Ctrl+C to stop it. To run it in the background, use the -daemon option:
bin/kafka-server-start.sh -daemon config/server.properties
# Check the Kafka background process:
ps aux | grep kafka
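With the broker running, the topic tp_individual can be created and watched from the command line. The sketch below wraps the stock Kafka 1.0.x scripts in two hypothetical functions; the broker address localhost:9092 (the default listener port), the single partition, and the replication factor of 1 are assumptions for a single-node setup.

```shell
#!/bin/sh
# Assumed connection settings; the ZooKeeper path matches zookeeper.connect above.
ZK_CONNECT=${ZK_CONNECT:-localhost:2181/myKafka}
BROKER=${BROKER:-localhost:9092}
TOPIC=tp_individual

# Create the topic (Kafka 1.0.x: kafka-topics.sh talks to ZooKeeper).
create_topic() {
  kafka-topics.sh --zookeeper "$ZK_CONNECT" --create \
    --topic "$TOPIC" --partitions 1 --replication-factor 1
}

# Print every message in the topic, starting from the oldest.
watch_topic() {
  kafka-console-consumer.sh --bootstrap-server "$BROKER" \
    --topic "$TOPIC" --from-beginning
}
```

Run create_topic once, then leave watch_topic open in a terminal to confirm that records arrive when the page is used.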
6. Stop Kafka
cd /opt/kafka_2.12-1.0.2
bin/kafka-server-stop.sh
Handling startup errors
Caused by: java.net.UnknownHostException: mysql-