Log Analysis

Fields to analyze: the requested resource and the cookie (the cookie stands in for the user, since the IP is a public IP shared by many people).
Steps:
1. Collect the data with Flume; a shell script (or similar) can be set up as a scheduled task.
2. Clean the data (a minimal sketch follows this list).
3. Analyze the data with Hive (e.g., with external partitioned tables), run automatically.
4. Export the Hive results back into MySQL or HBase with Sqoop.
5. Provide a visualization tool, something like reports.
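The original does not show the cleaning step. As a minimal sketch, assuming the four '|'-delimited fields used by the table in step 3 below (file names are placeholders), malformed lines could be dropped like this:

# Hypothetical cleaning pass: keep only non-empty lines that split into
# exactly the four '|'-delimited fields expected by the Hive table in step 3
awk -F'|' 'NF == 4 && $1 != ""' logtest.txt > logtest.cleaned.txt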

Detailed steps:
1. Start Flume on each node to be collected from

bin/flume-ng agent -n log -c conf -f conf/log.conf -Dflume.root.logger=INFO,console

2. Copy the logs into the directory Flume collects from (it makes no difference whether this is done before or after starting Flume)

cp logtest.txt test/
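The command in step 1 references conf/log.conf, which the original does not show. A minimal sketch, assuming a spooling-directory source (matching the copy above) feeding an HDFS sink; the agent name log matches -n log, while spoolDir and channel sizing are assumptions:

# Hypothetical conf/log.conf for agent "log"
log.sources = s1
log.channels = c1
log.sinks = k1

# Spooling-directory source: files copied into spoolDir are ingested,
# whether they arrive before or after the agent starts
log.sources.s1.type = spooldir
log.sources.s1.spoolDir = /home/hadoop/test
log.sources.s1.channels = c1

# In-memory channel
log.channels.c1.type = memory
log.channels.c1.capacity = 10000

# HDFS sink: Flume expands %Y%m%d at write time; useLocalTimeStamp is
# needed because the spooldir source adds no timestamp header itself
log.sinks.k1.type = hdfs
log.sinks.k1.channel = c1
log.sinks.k1.hdfs.path = /flume/ip/%Y%m%d
log.sinks.k1.hdfs.fileType = DataStream
log.sinks.k1.hdfs.useLocalTimeStamp = true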

3. Create the table in Hive (each table must be given its own location, and the location is a directory; if it points directly at the HDFS path Flume writes to, step 4 is unnecessary). Note that Hive does not expand date patterns in table names itself: the ${date} below is substituted by the shell, as in the daily.sh script of step 12.

CREATE EXTERNAL TABLE if not exists flume_ip_${date}(
    remote_addr STRING,
    http_marketChannelName STRING,
    http_hwMac STRING,
    http_x_forwarded_for STRING
)ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' location '/flume/ip/${date}';

4. Load the data from HDFS into Hive (note that LOAD DATA INPATH moves the files rather than copying them)

LOAD DATA INPATH '/flume/20160512' into table log_analyze;
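Alternatively, with the external partitioned table mentioned in the overview, the day's Flume output directory can be registered as a partition and no LOAD is needed. A sketch under that assumption (the partition column dt is hypothetical):

-- Hypothetical partitioned variant of the table from step 3
CREATE EXTERNAL TABLE if not exists flume_ip(
    remote_addr STRING,
    http_marketChannelName STRING,
    http_hwMac STRING,
    http_x_forwarded_for STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';

-- Register the day's directory as a partition; no data is moved
ALTER TABLE flume_ip ADD PARTITION (dt='20160512') LOCATION '/flume/ip/20160512';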

5. Rank users by how much they watch

create table UserSort row format delimited fields terminated by '\t' as select http_hwMac,count(http_marketChannelName) as ChannelNum from log_analyze group by http_hwMac having ChannelNum >=2 order by ChannelNum desc limit 10;

A plain CREATE TABLE ... AS SELECT cannot specify column names; this can be worked around as follows (and there is no need to specify a field delimiter again):

create table UserSort(http_hwMac string,ChannelNum bigint);

insert into table UserSort select http_hwMac,count(http_marketChannelName) as ChannelNum from flume_ip_${date} group by http_hwMac having ChannelNum >=2 order by ChannelNum desc limit 10;

6. Rank the distinct channels each user watches

create table ChannelSort row format delimited fields terminated by '\t' as select http_marketChannelName,http_hwMac,count(http_marketChannelName) as ChannelNum from log_analyze group by http_marketChannelName,http_hwMac having ChannelNum >=2 order by ChannelNum desc limit 10;

create table flume_ip_${date}_ChannelSort(http_marketChannelName string,http_hwMac string,ChannelNum bigint);

insert into table flume_ip_${date}_ChannelSort select http_marketChannelName,http_hwMac,count(http_marketChannelName) as ChannelNum from flume_ip_${date} group by http_marketChannelName,http_hwMac having ChannelNum >=2 order by ChannelNum desc limit 10;

7. Detailed viewing records for a given user

select * from log_analyze where trim(http_hwMac)='zhangsan';

8. How many users watch each channel

create table ChannelUserSort row format delimited fields terminated by '\t' as select http_marketChannelName,count(http_hwMac) as UserNum from log_analyze group by http_marketChannelName having UserNum >=2 order by UserNum desc limit 10;

create table ChannelUserSort_ip_${date}(http_marketChannelName string,UserNum bigint);

insert into table ChannelUserSort_ip_${date} select http_marketChannelName,count(http_hwMac) as UserNum from log_analyze group by http_marketChannelName having UserNum >=2 order by UserNum desc limit 10;

Ranking of device types within each channel

create table nginx_20160523_ChannelModelTypeSort(http_marketChannelName string,http_hwModel string,ModelTypeNum bigint); 

insert into table nginx_20160523_ChannelModelTypeSort select http_marketChannelName,http_hwModel,count(*) as ModelNum from nginx_20160523 group by http_marketChannelName,http_hwModel having ModelNum>=50 order by ModelNum desc limit 20;

9. Append the same day's logs from the other server to the current records (repeat the steps above; note that the target is the table created the second time around)


insert into table ChannelSort select http_marketChannelName,http_hwMac,count(http_marketChannelName) as ChannelNum from log_analyze group by http_marketChannelName,http_hwMac having ChannelNum >=2 order by ChannelNum desc limit 10;

10. Sum the data

The summation itself is done in MySQL after the export, in step 11(3) below.
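A Hive-side alternative, assuming step 9 has appended the second server's rows into the same ChannelSort table, would sum the counts per (channel, MAC) pair; a minimal sketch:

-- Hypothetical: merge the two servers' rows by summing counts per key
select http_marketChannelName,http_hwMac,sum(ChannelNum) as ChannelNum from ChannelSort group by http_marketChannelName,http_hwMac;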

11. Export the Hive data to MySQL
(1) Create the corresponding table in MySQL (its columns mirror the Hive ChannelSort table)

CREATE TABLE if not exists ChannelSort(
    http_marketChannelName varchar(100),
    http_hwMac varchar(100),
    num1 varchar(10)
);

(2) Export the data with Sqoop

sqoop export --connect jdbc:mysql://ip:3306/test --username root --password 123456 --table ChannelSort --export-dir /user/hive/warehouse/channelsort -m 1 --input-fields-terminated-by '\t'

(3) Sum the data

select  http_marketChannelName,sum(num1) as num from ChannelSort group by  http_marketChannelName order by num desc;

12. Create the automation script
vi daily.sh
chmod +x daily.sh

date=`/bin/date +%Y%m%d`

#/letv/data/apache-hive-0.13.1-bin/bin/hive -e "CREATE EXTERNAL TABLE if not exists nginx_${date}(
#    host STRING,
#    uri STRING,
#    request_method STRING,
#    remote_addr STRING,
#    http_marketChannelName STRING,
#    http_hwMac STRING,
#    http_ethMac STRING,
#    http_hwModel STRING,
#    http_appVerName STRING,
#    time_local_status STRING,
#    body_bytes_sent string,
#    bytes_sent string,
#    gzip_ratio string,
#    http_referer STRING,
#    http_cookie STRING,
#    http_user_agent STRING,
#    http_x_forwarded_for STRING,
#    upstream_addr STRING,
#    upstream_response_time string,
#    upstream_status STRING,
#    request_time string
#)ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' location '/nginx/${date}/logs';"

/letv/data/apache-hive-0.13.1-bin/bin/hive -e "create table nginx_${date}_ChannelUserSort(http_marketChannelName string,UserNum bigint);"
/letv/data/apache-hive-0.13.1-bin/bin/hive -e "insert into table nginx_${date}_ChannelUserSort select http_marketChannelName,count(distinct http_hwMac) as UserNum from nginx_${date} group by http_marketChannelName having UserNum >=50 order by UserNum desc limit 20;"
/letv/data/apache-hive-0.13.1-bin/bin/hive -e "create table nginx_${date}_UserCount(UserNum bigint);"
/letv/data/apache-hive-0.13.1-bin/bin/hive -e "insert into table nginx_${date}_UserCount select count(distinct http_hwMac) from nginx_${date};"
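The overview mentions driving collection and analysis with a scheduled task; the original does not show one, but a crontab entry along these lines (time and paths are placeholders) would run the script nightly:

# Hypothetical crontab entry (install with `crontab -e`):
# run daily.sh at 02:00 every day and append its output to a log file
0 2 * * * /letv/data/daily.sh >> /letv/data/daily.log 2>&1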

The other machines in the cluster should have Flume deployed via scp. The JAVA_HOME=/usr/Java/jdk1.8.0_77 setting in apache-flume-1.5.0-bin/conf/flume-env.sh must match the one in /etc/profile, and /etc/hosts and the configuration files under apache-flume-1.5.0-bin/conf must be updated as well.
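A sketch of that deployment, with node2 as a placeholder hostname:

# Hypothetical: copy the Flume installation to another node, then check that
# its JAVA_HOME in flume-env.sh matches the one in /etc/profile
# (flume-env.sh, /etc/hosts, and the agent config still need manual edits there)
scp -r /letv/data/apache-flume-1.5.0-bin node2:/letv/data/
ssh node2 'grep JAVA_HOME /letv/data/apache-flume-1.5.0-bin/conf/flume-env.sh /etc/profile'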
