1. Create the virtual machines
- 1. Create the virtual machine
- 2. Check the network address range of the virtual machines
- 3. Configure the virtual machine hosts
- Apply the following configuration on each virtual machine
- node-01
- After editing, save and exit, then reboot all the virtual machines and verify the IP configuration and the connectivity to the internal and external networks
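The per-host network screenshots are not reproduced here. A typical static-IP configuration for a CentOS 7 guest would look like the sketch below; all addresses are assumptions and must be taken from the range shown in the virtual network editor in step 2:

```
# /etc/sysconfig/network-scripts/ifcfg-ens33 (illustrative values)
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.88.101
NETMASK=255.255.255.0
GATEWAY=192.168.88.2
DNS1=8.8.8.8
```

node-02 and node-03 would get the same file with different `IPADDR` values.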
2. Connect to the virtual machines with the SSH tool SecureCRT
1. Install lrzsz
2. Create the export folder with the data, software, and servers directories
3. Upload the installation packages
4. Extract, move, and rename
5. Configure the environment variables and apply them with source
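The environment-variable step above usually amounts to a few lines appended to /etc/profile. The install paths below are assumptions matching the /export/servers layout used later in these notes:

```shell
# Appended to /etc/profile; paths are assumptions for this layout
export JAVA_HOME=/export/servers/jdk
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Apply without re-login: source /etc/profile
```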
6. scp the environment variable file to node-02 and node-03
7. Edit the /etc/hosts file
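The hosts entries are not shown in the original. Assuming the static addresses sketched earlier, /etc/hosts on every node would contain:

```
192.168.88.101 node-01
192.168.88.102 node-02
192.168.88.103 node-03
```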
3. Passwordless login
- 1. Set up passwordless login from node-01
- 2. Verify
- 3. Do the same for node-02 and node-03, and verify
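The key-exchange commands are not shown in the original; a standard OpenSSH sequence on node-01 (host names from these notes) would be:

```shell
# Generate a key pair without a passphrase
ssh-keygen -t rsa
# Copy the public key to every node, including node-01 itself
ssh-copy-id node-01
ssh-copy-id node-02
ssh-copy-id node-03
# Verify: this should log in without prompting for a password
ssh node-02
```

Running the same sequence on node-02 and node-03 completes step 3.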
4. Edit the Hadoop configuration files
- 1. Edit core-site.xml in Hadoop
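The file contents are not reproduced in the original. A minimal core-site.xml for this layout, where the NameNode host and the temp directory are assumptions, would be:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node-01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/data/hadoop-tmp</value>
  </property>
</configuration>
```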
- 2. Edit hdfs-site.xml
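A minimal hdfs-site.xml sketch for a three-node cluster; the replication factor and secondary NameNode placement are assumptions:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node-02:9868</value>
  </property>
</configuration>
```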
- 3. Edit mapred-site.xml
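For mapred-site.xml, the essential setting is to run MapReduce on YARN; this is the standard property, though the original does not show the file:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```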
- 4. Edit yarn-site.xml and add the following settings
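The added yarn-site.xml settings are not reproduced; a typical minimal version, with the ResourceManager on node-01 assumed, would be:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node-01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```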
- 5. Edit hadoop-env.sh
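The usual hadoop-env.sh edits are setting JAVA_HOME and, on Hadoop 3, declaring the daemon users. The path and the root user are assumptions:

```shell
# JAVA_HOME path is an assumption; point it at the actual JDK install
export JAVA_HOME=/export/servers/jdk
# Hadoop 3 requires the users running each daemon to be declared
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```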
- 6. Edit workers
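The workers file lists one DataNode host per line; for the three nodes named in these notes it would be:

```
node-01
node-02
node-03
```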
- 7. Distribute the configuration
scp -r /export/servers/hadoop/etc/hadoop/ node-03:/export/servers/hadoop/etc/
- 8. Start-up succeeded
5. Configure Flume and Hive
- 1. Flume
- Extract and rename Flume
- cp flume-env.sh.template flume-env.sh, then add the following to flume-env.sh
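The line added to flume-env.sh is not shown; it is normally just the JDK path, assumed here to match the rest of this layout:

```shell
# flume-env.sh: point Flume at the JDK (path is an assumption)
export JAVA_HOME=/export/servers/jdk
```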
- Edit /etc/profile and apply the change
- 2. Hive
- Install and extract Hive
- Edit the Hive configuration: create a new hive-site.xml for site-specific settings, which override the defaults
- cp hive-default.xml.template hive-default.xml to keep a copy of the default configuration
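The hive-site.xml contents are not reproduced. Since the metastore uses MySQL here, a minimal sketch would hold the JDBC connection properties; the host, database name, user, and password below are all assumptions:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://node-01:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
```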
- Put the MySQL driver JAR into the lib directory of the Hive installation
- Initialize the metastore database: ./schematool -dbType mysql -initSchema
- Edit the Hive environment variables
- Hive runs successfully
6. Install MySQL
- Install the repository RPM: rpm -ivh mysql57-community-release-el7-9.noarch.rpm
- Install MySQL
- Check the installation
- Retrieve the temporary password
2020-12-22T01:37:43.181378Z 1 [Note] A temporary password is generated for root@localhost: sdwga%a8puO,
- Connect to MySQL and change the password. An error occurred here; skipped for now
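For reference, the password change in MySQL 5.7 is typically done with the statement below after logging in with the temporary password; the new password is of course a placeholder. A common cause of the error mentioned above is the validate_password plugin rejecting passwords it considers too weak:

```sql
-- Run after: mysql -uroot -p (using the temporary password)
ALTER USER 'root'@'localhost' IDENTIFIED BY 'StrongPass_123';
```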
7. Install Sqoop
- 1. Download Sqoop
wget http://mirror.bit.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
- 2. Extract, move, and rename
- 3. Edit the configuration file and apply it
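The configuration file in question is conf/sqoop-env.sh (created from its .template). A minimal sketch, with paths assumed to match the /export/servers layout:

```shell
# conf/sqoop-env.sh: install paths are assumptions for this layout
export HADOOP_COMMON_HOME=/export/servers/hadoop
export HADOOP_MAPRED_HOME=/export/servers/hadoop
export HIVE_HOME=/export/servers/hive
```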
- 4. Load the MySQL driver into sqoop/lib
- 5. Copy mysql-connector-java-5.xx.x.jar into the lib directory of the Sqoop installation, then run it
8. Build the log collection system with Flume
- Download, extract, and rename Flume, then configure it
- Start the Flume agent: flume-ng agent -c conf -n a1 -f conf/flume-hdfs-sink.conf -Dflume.root.logger=INFO,console
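The agent configuration itself is not reproduced. The file name and the agent name a1 come from the start command above; the source, channel, and HDFS path below are assumptions sketching a typical log-to-HDFS pipeline:

```
# conf/flume-hdfs-sink.conf (illustrative)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Tail the web server log; the log path is an assumption
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/access.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Write events to HDFS, partitioned by day
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-01:9000/weblog/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```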
- Clean the data with the previously written MapReduce program
- Obtain the results
- Upload them to HDFS; create the directory /var/pre
- Create a database in Hive
- Create a table in the database
create table ods_weblog_origin(vaild string, remote_addr string, remote_user string, time_local string, request string, status string, body_bytes_sent string, http_referer string, http_user_agent string) partitioned by (datestr string) row format delimited fields terminated by '\001';
- Load the data
load data inpath '/var/pre/' overwrite into table ods_weblog_origin partition(datestr='2018');
- Create the intermediate tables
create table t_ods_tmp_referurl as select a.*, b.* from ods_weblog_origin a lateral view parse_url_tuple(regexp_replace(http_referer, "\"", ""), 'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id;
create table t_ods_tmp_detail as select b.*, substring(time_local,0,10) as daystr, substring(time_local,12) as tmstr, substring(time_local,6,12) as month, substring(time_local,9,2) as day, substring(time_local,11,3) as hour from t_ods_tmp_referurl b;
- Create the detail table
create table ods_weblog_detail(vaild string, remote_addr string, remote_user string, time_local string, daystr string, timestr string, month string, day string, hour string, request string, status string, body_bytes_sent string, http_referer string, ref_host string, ref_path string, ref_query string, ref_query_id string, http_user_agent string) partitioned by(datestr string);
- Adjust the default dynamic-partition parameters
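The parameter changes are not listed in the original; the standard Hive settings for the dynamic-partition insert that follows are:

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=100000;
```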
- Load the data
insert overwrite table ods_weblog_detail partition(datestr) select distinct otd.vaild, otd.remote_addr, otd.remote_user, otd.time_local, otd.daystr, otd.tmstr, otd.month, otd.day, otd.hour, otr.request, otr.status, otr.body_bytes_sent, otr.http_referer, otr.host, otr.path, otr.query, otr.query_id, otr.http_user_agent, otd.daystr from t_ods_tmp_detail as otd, t_ods_tmp_referurl as otr where otd.remote_addr=otr.remote_addr and otd.time_local=otr.time_local and otd.body_bytes_sent=otr.body_bytes_sent and otd.request=otr.request;
9. Analysis
- Create the table structure
- Count the daily volume
- Create the table dw_avgpv_user_everyday
- Fetch the relevant data from ods_weblog_detail and insert it into the dimension table dw_avgpv_user_everyday
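The original does not show these statements. A hypothetical sketch of the dimension table and its load, assuming avgpv means average pageviews per visitor for one day (table columns and the date literal are assumptions):

```sql
-- Hypothetical: not taken from the original notes
create table dw_avgpv_user_everyday(day string, avgpv string);

insert into table dw_avgpv_user_everyday
select '2018-01-01', sum(b.pvs) / count(b.remote_addr)
from (select remote_addr, count(1) as pvs
      from ods_weblog_detail
      where datestr = '2018'
      group by remote_addr) b;
```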
Create a table in MySQL and import the data into it