(Big data batch-processing demo) Flume + Hadoop + Hive + Sqoop + MySQL


  • Approach: analyze a log file. Flume uploads the log file to Hadoop (HDFS). Hive loads the log records into table one; table one is cleaned into table two; table two is aggregated into table three, yielding the day's visit count and the day's distinct-user count. Table three is then exported to MySQL via Sqoop, completing the processing.

Installation guides:
Big Data Cluster Setup — Flume
Big Data Cluster Setup — HA Hadoop
Big Data Cluster Setup — Hive
Big Data Cluster Setup — Sqoop


1. The log file source.log

192.168.123.12 mid_1 tel 100 20181111
192.168.123.123 mid_2 tel 100 20181111
192.168.123.12 mid_1 tel 100 20181111
192.168.123.12 mid_1 tel 100 20181111
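Each record is five space-separated fields: IP, device id (mid), type, price, and date. A minimal Python sketch of the record layout (the field names follow the Hive DDL used later; this is only an illustration, not part of the pipeline):

```python
# Split one log record into the five fields used throughout this demo.
def parse_line(line):
    ip, mid, type_, price, date = line.split()
    return {"ip": ip, "mid": mid, "type": type_,
            "price": float(price), "time": date}

record = parse_line("192.168.123.12 mid_1 tel 100 20181111")
print(record["ip"], record["mid"])  # 192.168.123.12 mid_1
```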


2. Configure Flume (all three nodes)

  1. Copy configuration files
# 1. Put Hadoop's core-site.xml and hdfs-site.xml into flume/conf
cp /opt/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml    /opt/flume/apache-flume-1.8.0-bin/conf/

cp /opt/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml    /opt/flume/apache-flume-1.8.0-bin/conf/
# 2. Copy the required Hadoop jars into Flume's lib directory
cp /opt/hadoop/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar   /opt/flume/apache-flume-1.8.0-bin/lib/

cp /opt/hadoop/hadoop-2.7.3/share/hadoop/common/lib/hadoop-auth-2.7.3.jar   /opt/flume/apache-flume-1.8.0-bin/lib/

cp /opt/hadoop/hadoop-2.7.3/share/hadoop/common/lib/commons-configuration-1.6.jar   /opt/flume/apache-flume-1.8.0-bin/lib/
  2. Add the configuration file flume_hdfs.conf (master node)
cd /opt/flume/apache-flume-1.8.0-bin/conf/
vi flume_hdfs.conf
# Add the following
# Define the agent's source, channel and sink names
a4.sources = r1
a4.channels = c1
a4.sinks = k1

# Configure the source
a4.sources.r1.type = spooldir
# Create this directory in advance and make sure it is empty
a4.sources.r1.spoolDir = /logs

# Configure the channel
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100

# Define an interceptor that adds a timestamp to each event
a4.sources.r1.interceptors = i1
a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder


# Configure the sink
a4.sinks.k1.type = hdfs
# 'ns' is the nameservice name of the HA cluster;
# on a single-node setup write e.g. hdfs://zhiyou01:9000/xxx instead
a4.sinks.k1.hdfs.path = hdfs://ns/flume/%Y%m%d
a4.sinks.k1.hdfs.filePrefix = events-
a4.sinks.k1.hdfs.fileType = DataStream
# Raise the call timeout so slow HDFS operations do not fail
a4.sinks.k1.hdfs.callTimeout = 3600000
# Do not roll files by event count
a4.sinks.k1.hdfs.rollCount = 0
# Roll to a new HDFS file once it reaches 128 MB
a4.sinks.k1.hdfs.rollSize = 134217728
# Roll to a new HDFS file every 60 seconds
a4.sinks.k1.hdfs.rollInterval = 60

# Wire up the source, channel and sink
a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1
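The rollSize threshold above can be sanity-checked with a quick bit of arithmetic (134217728 bytes is exactly 128 MiB):

```python
# Verify that the rollSize value in the sink config equals 128 MiB.
roll_size = 134217728  # a4.sinks.k1.hdfs.rollSize
assert roll_size == 128 * 1024 * 1024
print(roll_size // (1024 * 1024))  # 128
```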


3. Start ZooKeeper, Hadoop and Flume

# 1. ZooKeeper (all three nodes)
cd /opt/zookeeper/zookeeper-3.4.12/bin/
./zkServer.sh start
# 2. Hadoop
cd /opt/hadoop/hadoop-2.7.3/sbin/
./start-all.sh 
# 3. Flume
cd /opt/flume/apache-flume-1.8.0-bin
bin/flume-ng agent -n a4 -c conf -f conf/flume_hdfs.conf -Dflume.root.logger=INFO,console


4. Write a log file; Flume uploads it to HDFS automatically

# Add a log file to Flume's spool directory
cd /logs/
vi source.log
192.168.123.12 mid_1 tel 100 20181111
192.168.123.123 mid_2 tel 100 20181111
192.168.123.12 mid_1 tel 100 20181111
192.168.123.12 mid_1 tel 100 20181111 


5. Process the data with Hive

  1. Load the data into table one
cd /opt/hive/apache-hive-2.3.3-bin/bin
./hive
-- Create the database
create database zhiyou_log;
use zhiyou_log;
-- Create table one
create table source_log(ip string, mid string, type string, price double, time string) row format delimited fields terminated by " ";
-- Load the data
load data inpath "/flume/20191102/events-.1572661918682" into table source_log;


  2. Clean the data into table two (keep only ip and mid)
create table clear_log(ips string, mids string) row format delimited fields terminated by " ";
insert into clear_log select ip, mid from source_log;


  3. Compute the results into table three
-- ips holds count(distinct ips), so bigint fits it better than string
create table result_log(pv bigint, nv bigint, ips bigint) row format delimited fields terminated by " ";
-- Compute and insert the aggregates
insert into result_log select count(mids) pv, count(distinct mids) nv, count(distinct ips) ips from clear_log;
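Against the four sample log lines, the aggregation can be cross-checked in plain Python (pv counts all records, nv counts distinct device ids, and the third column counts distinct IPs):

```python
# Mirror the Hive aggregation on the four sample log records.
rows = [
    ("192.168.123.12", "mid_1"),
    ("192.168.123.123", "mid_2"),
    ("192.168.123.12", "mid_1"),
    ("192.168.123.12", "mid_1"),
]
pv = len(rows)                      # count(mids): total visits
nv = len({mid for _, mid in rows})  # count(distinct mids): distinct users
ips = len({ip for ip, _ in rows})   # count(distinct ips): distinct IPs
print(pv, nv, ips)  # 4 2 2
```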


6. Export the data to MySQL with Sqoop

  1. Create the MySQL database and target table
  2. Export the data with Sqoop
cd /opt/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/

./sqoop export --connect jdbc:mysql://192.168.80.128:3306/sqloop --username root --password HBQ521521cf* --export-dir '/user/hive/warehouse/zhiyou_log.db/result_log' --table tongji --fields-terminated-by ' ';
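The export assumes the target table `tongji` already exists in the `sqloop` database (the screenshot of its creation is not preserved here). A possible DDL, with column names and types inferred from Hive's `result_log` rather than taken from the original post:

```sql
-- Hypothetical target table for the Sqoop export; columns are
-- inferred from result_log, not copied from the original post.
CREATE TABLE tongji (
  pv  BIGINT,
  nv  BIGINT,
  ips BIGINT
);
```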

