Hive Usage Notes (Part 1)

------------------------------------------------------------------------------------------------------

1. Table creation statements

# DDL for the init log type
USE vdnaccesslog;CREATE EXTERNAL TABLE IF NOT EXISTS srcInit(
remoteIp string,
countryCode string,
provinceCode string,
cityCode string,
ispCode string, 
accessY string,
accessYM string,
accessYMD string,
accessYMDH string,
accessYMDHM string,
accessYMDHMS string,
cookie string,
os string,
browserType string,
browserVersion string,
isHaveAD string,
pid string,
tai string,
t string,
isp2p string,
pf string,
referrer string
) 
PARTITIONED BY(date string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
LOCATION '/data/flume/init/';

# DDL for the ld log type
USE vdnaccesslog;CREATE EXTERNAL TABLE IF NOT EXISTS srcLd(
remoteIp string,
countryCode string,
provinceCode string,
cityCode string,
ispCode string, 
accessY string,
accessYM string,
accessYMD string,
accessYMDH string,
accessYMDHM string,
accessYMDHMS string,
cookie string,
os string,
browserType string,
browserVersion string,
v int,
te string,
dynamic string,
isHaveAD string,
adLen string,
referrer string,
pid string,
file string,
tai string,
fpv string,
guid string,
isp2p string,
pf string,
abroad string
) 
PARTITIONED BY(date string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
LOCATION '/data/flume/ld/';

# DDL for the other log type
USE vdnaccesslog;CREATE EXTERNAL TABLE IF NOT EXISTS srcOther(
remoteIp string,
countryCode string,
provinceCode string,
cityCode string,
ispCode string, 
accessY string,
accessYM string,
accessYMD string,
accessYMDH string,
accessYMDHM string,
accessYMDHMS string,
cookie string,
os string,
browserType string,
browserVersion string,
args string
) 
PARTITIONED BY(date string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
LOCATION '/data/flume/other/';

# DDL for the combined src table (partitioned by both log type t and date)
USE vdnaccesslog;CREATE EXTERNAL TABLE IF NOT EXISTS src(
remoteIp string,
countryCode string,
provinceCode string,
cityCode string,
ispCode string, 
accessY string,
accessYM string,
accessYMD string,
accessYMDH string,
accessYMDHM string,
accessYMDHMS string,
cookie string,
os string,
browserType string,
browserVersion string
) 
PARTITIONED BY(t string,date string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
LOCATION '/data/flume/';

ALTER TABLE src ADD PARTITION (t='bc', date='20150119') LOCATION 'bc/20150119';
ALTER TABLE src ADD PARTITION (t='bf', date='20150119') LOCATION 'bf/20150119';
ALTER TABLE src ADD PARTITION (t='bt', date='20150119') LOCATION 'bt/20150119';
ALTER TABLE src ADD PARTITION (t='bw0004', date='20150119') LOCATION '_bw0004/20150119';
ALTER TABLE src ADD PARTITION (t='fs', date='20150119') LOCATION 'fs/20150119';
ALTER TABLE src ADD PARTITION (t='init', date='20150119') LOCATION 'init/20150119';
ALTER TABLE src ADD PARTITION (t='ld', date='20150119') LOCATION 'ld/20150119';
ALTER TABLE src ADD PARTITION (t='lv', date='20150119') LOCATION 'lv/20150119';
ALTER TABLE src ADD PARTITION (t='mem', date='20150119') LOCATION 'mem/20150119';
ALTER TABLE src ADD PARTITION (t='other', date='20150119') LOCATION 'other/20150119';
ALTER TABLE src ADD PARTITION (t='pt', date='20150119') LOCATION 'pt/20150119';
ALTER TABLE src ADD PARTITION (t='sk', date='20150119') LOCATION 'sk/20150119';
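
A quick sanity check that the partitions were registered. Note that MSCK REPAIR TABLE would not discover them here, because the directories are laid out as bc/20150119 rather than the t=bc/date=20150119 convention Hive auto-detects; listing the partitions explicitly is the simplest verification:

USE vdnaccesslog;
SHOW PARTITIONS src;
-- inspect where a single partition points:
DESCRIBE EXTENDED src PARTITION (t='bc', date='20150119');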

------------------------------------------------------------------------------------------------------

2. Flume delivers a new directory of data to HDFS every day. The newly generated data must be added to the corresponding Hive tables as partitions.

addPartitionEveryDay.sh  (cron schedule: 1 0 * * * sh /usr/local/hive-0.12.0/bash/addPartitionEveryDay.sh)

#!/bin/bash
# Runs daily just after midnight: adds the previous day's partition to every table in the db

cd /usr/local/hive-0.12.0

export HADOOP_HOME=/usr/local/hadoop-2.3.0-cdh5.1.0
export HIVE_HOME=/usr/local/hive-0.12.0

yest=`date -d yesterday +%Y%m%d`

###############################################
hql1="USE vdnaccesslog;ALTER TABLE srcinit ADD PARTITION (date='${yest}') LOCATION '${yest}';"
hql2="USE vdnaccesslog;ALTER TABLE srcld ADD PARTITION (date='${yest}') LOCATION '${yest}';"
hqlother="USE vdnaccesslog;ALTER TABLE srcother ADD PARTITION (date='${yest}') LOCATION '${yest}';"

hqlsrc1="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='bc', date='${yest}') LOCATION 'bc/${yest}';"
hqlsrc2="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='bf', date='${yest}') LOCATION 'bf/${yest}';"
hqlsrc3="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='bt', date='${yest}') LOCATION 'bt/${yest}';"
hqlsrc4="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='bw0004', date='${yest}') LOCATION '_bw0004/${yest}';"
hqlsrc5="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='fs', date='${yest}') LOCATION 'fs/${yest}';"
hqlsrc6="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='init', date='${yest}') LOCATION 'init/${yest}';"
hqlsrc7="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='ld', date='${yest}') LOCATION 'ld/${yest}';"
hqlsrc8="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='lv', date='${yest}') LOCATION 'lv/${yest}';"
hqlsrc9="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='mem', date='${yest}') LOCATION 'mem/${yest}';"
hqlsrc10="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='other', date='${yest}') LOCATION 'other/${yest}';"
hqlsrc11="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='pt', date='${yest}') LOCATION 'pt/${yest}';"
hqlsrc12="USE vdnaccesslog;ALTER TABLE src ADD PARTITION (t='sk', date='${yest}') LOCATION 'sk/${yest}';"


##############################################
$HIVE_HOME/bin/hive -e "${hql1}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hql2}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlother}">>/usr/local/hive-0.12.0/bash/a.log

$HIVE_HOME/bin/hive -e "${hqlsrc1}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc2}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc3}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc4}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc5}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc6}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc7}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc8}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc9}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc10}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc11}">>/usr/local/hive-0.12.0/bash/a.log
$HIVE_HOME/bin/hive -e "${hqlsrc12}">>/usr/local/hive-0.12.0/bash/a.log

------------------------------------------------------------------------------------------------------

3. Queries

Test queries (date here refers to the partition key, PARTITION). Note that the srcbt and srcpt tables queried below are not among the DDL shown in section 1.

#query bt
/usr/local/hive-0.12.0/bin/hive -e "select accessY,accessYM,accessYMD, accessYMDH,accessYMDHM,sum(v) from vdnaccesslog.srcbt where date = '${dateYmd}' and v is not null and accessY is not null and accessYM is not null and  accessYMD is not null and accessYMDH is not null and accessYMDHM is not null  group by accessY,accessYM,accessYMD,accessYMDH,accessYMDHM;" > /mnt/resource/dist/bt/data_bt_${dateYmd}.txt 
#query pt
/usr/local/hive-0.12.0/bin/hive -e "select accessY,accessYM,accessYMD, accessYMDH,accessYMDHM,sum(v) from vdnaccesslog.srcpt where date = '${dateYmd}' and v is not null and accessY is not null and accessYM is not null and  accessYMD is not null and accessYMDH is not null and accessYMDHM is not null  group by accessY,accessYM,accessYMD,accessYMDH,accessYMDHM;" > /mnt/resource/dist/pt/data_pt_${dateYmd}.txt
#query uv
/usr/local/hive-0.12.0/bin/hive -e "select  '${dateYmd}', count(distinct remoteip) from vdnaccesslog.src where date = '${dateYmd}';" > /mnt/resource/dist/uv/data_uv_${dateYmd}.txt

DDL for the remote MySQL tables

bt table
create table kdl_total_bt(
	id int not null auto_increment primary key,
	accessy  varchar(4),
	accessym varchar(7),
	accessymd varchar(10),
	accessymdh varchar(13),
	accessymdhm varchar(16),
	accessymdhms varchar(19),
	totalbt int
)engine=innodb default charset=utf8;

pt table
create table kdl_total_pt(
	id int not null auto_increment primary key,
	accessy  varchar(4),
	accessym varchar(7),
	accessymd varchar(10),
	accessymdh varchar(13),
	accessymdhm varchar(16),
	accessymdhms varchar(19),
	totalpt int
)engine=innodb default charset=utf8;


uv table
create table uv_info(
	id int not null auto_increment primary key,
	accessymd varchar(10),
	totaluv int
)engine=innodb default charset=utf8;
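
The script in section 4 also loads a vv (visit count) result into vdnaccesslog.vv_info, whose DDL does not appear above. A matching definition, assuming the same conventions as uv_info, would be:

create table vv_info(
	id int not null auto_increment primary key,
	accessymd varchar(10),
	totalvv int
)engine=innodb default charset=utf8;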


4. Load script (the script first runs the queries and saves the results to local files, then loads those files into the corresponding MySQL tables).

(cron schedule: 0 1 * * * /usr/local/scripts/hiveQueryAndStorageMysql.sh)

#! /bin/bash
##  bt pt uv vv
#yyyymmdd for yesterday
dateYmd=`date -d yesterday +%Y%m%d`
dateRm=`date -d '-7 days' +%Y%m%d`
# purge week-old raw data from HDFS to keep disk usage in check
/usr/local/hadoop-2.3.0-cdh5.1.0/bin/hadoop fs -rm -f /data/flume/*/${dateRm}/*
###########################################
datebt1=`date '+%Y-%m-%d %H:%M:%S'`
#query bt
/usr/local/hive-0.12.0/bin/hive -e "select accessY,accessYM,accessYMD, accessYMDH,accessYMDHM,sum(v) from vdnaccesslog.srcbt where date = '${dateYmd}' and v is not null and accessY is not null and accessYM is not null and  accessYMD is not null and accessYMDH is not null and accessYMDHM is not null  group by accessY,accessYM,accessYMD,accessYMDH,accessYMDHM;" > /mnt/resource/dist/bt/data_bt_${dateYmd}.txt 
#bt to mysql 
/usr/local/mysql/bin/mysql -h101.240.8.201 -uhive -p123456 -e "load data local infile '/mnt/resource/dist/bt/data_bt_${dateYmd}.txt' into table vdnaccesslog.kdl_total_bt character set utf8 (accessy,accessym,accessymd,accessymdh,accessymdhm,totalbt);"
datebt2=`date '+%Y-%m-%d %H:%M:%S'`
echo ${datebt1}--${datebt2} bt query and storage >> /mnt/resource/dist/hiveQandSrunInfo.log
##########################################
datept1=`date '+%Y-%m-%d %H:%M:%S'`
#query pt
/usr/local/hive-0.12.0/bin/hive -e "select accessY,accessYM,accessYMD, accessYMDH,accessYMDHM,sum(v) from vdnaccesslog.srcpt where date = '${dateYmd}' and v is not null and accessY is not null and accessYM is not null and  accessYMD is not null and accessYMDH is not null and accessYMDHM is not null  group by accessY,accessYM,accessYMD,accessYMDH,accessYMDHM;" > /mnt/resource/dist/pt/data_pt_${dateYmd}.txt
#pt to mysql 
/usr/local/mysql/bin/mysql -h101.240.8.201 -uhive -p123456 -e "load data local infile '/mnt/resource/dist/pt/data_pt_${dateYmd}.txt' into table vdnaccesslog.kdl_total_pt character set utf8 (accessy,accessym,accessymd,accessymdh,accessymdhm,totalpt);"
datept2=`date '+%Y-%m-%d %H:%M:%S'`
echo ${datept1}--${datept2} pt query and storage >> /mnt/resource/dist/hiveQandSrunInfo.log
#####################################
dateuv1=`date '+%Y-%m-%d %H:%M:%S'`
#query uv
/usr/local/hive-0.12.0/bin/hive -e "select  '${dateYmd}', count(distinct remoteip) from vdnaccesslog.src where date = '${dateYmd}';" > /mnt/resource/dist/uv/data_uv_${dateYmd}.txt
#uv to mysql 
/usr/local/mysql/bin/mysql -h101.240.8.201 -uhive -p123456 -e "load data local infile '/mnt/resource/dist/uv/data_uv_${dateYmd}.txt' into table vdnaccesslog.uv_info character set utf8 (accessymd,totaluv);"
dateuv2=`date '+%Y-%m-%d %H:%M:%S'`
echo ${dateuv1}--${dateuv2} uv query and storage >> /mnt/resource/dist/hiveQandSrunInfo.log
####################################
datevv1=`date '+%Y-%m-%d %H:%M:%S'`
#query vv
/usr/local/hive-0.12.0/bin/hive -e "select  '${dateYmd}', count(*) from vdnaccesslog.src where date = '${dateYmd}';" > /mnt/resource/dist/vv/data_vv_${dateYmd}.txt
#vv to mysql
/usr/local/mysql/bin/mysql -h101.240.8.201 -uhive -p123456 -e "load data local infile '/mnt/resource/dist/vv/data_vv_${dateYmd}.txt' into table vdnaccesslog.vv_info character set utf8 (accessymd,totalvv);" 
datevv2=`date '+%Y-%m-%d %H:%M:%S'`
echo ${datevv1}--${datevv2} vv query and storage >> /mnt/resource/dist/hiveQandSrunInfo.log
echo "===================================================" >> /mnt/resource/dist/hiveQandSrunInfo.log

 

------------------------------------------------------------------------------------------------------

%%% Problems encountered %%%

1  deprecated

Started at about 09:30:

/usr/local/hive-0.12.0/bin/hive -e "select  '20150127', count(distinct remoteip) from vdnaccesslog.src where date = '20150127';" > /home/hadoop2/pjmTest/data_uv_panjm0127.txt

It did not finish until 11:00. An hour and a half for one query!

The run time is far too long: one script chains three query jobs, starts at 1 a.m., and by 10 a.m. the second query has still not finished.
Fix: split the single script into three, so the three queries run independently rather than queuing up inside one script (see the sketch below).
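
A sketch of the split, with the one script cut into three hypothetical single-purpose scripts (names illustrative) scheduled side by side under cron so they no longer run back to back:

# crontab entries: the three jobs start at the same time and run independently
0 1 * * * /usr/local/scripts/hiveQueryBt.sh
0 1 * * * /usr/local/scripts/hiveQueryPt.sh
0 1 * * * /usr/local/scripts/hiveQueryUvVv.sh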


deprecated

In the pt/bt SQL, sum(1) was changed to count(*) (see http://blog.csdn.net/kelly1984/article/details/8770106 ).

When counting rows, sum evaluates 1+1+1+1+1+1=6: the accumulated value is unknown in advance, and every step costs extra CPU to compute.
count is a plain running row counter: see a row, add 1; the next row continues from the current total.
So sum is clearly less efficient than count for this purpose. When all you need is a row count, use count(*) rather than sum(1).
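
For illustration, a minimal before/after on the bt table (a sketch only; the queries shown earlier use sum(v) to total a value column, which count(*) cannot replace):

-- before: sums a literal 1 per row just to count rows
select accessYMD, sum(1) from vdnaccesslog.srcbt where date = '20150127' group by accessYMD;
-- after: counts rows directly
select accessYMD, count(*) from vdnaccesslog.srcbt where date = '20150127' group by accessYMD;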


deprecated

For the statements running the bt/pt queries:

1. Aggregate only down to the minute.  2. Reordering the GROUP BY columns (year, month, day, ...) may improve efficiency.

Changes applied. (When modifying the SQL, note that the same statement appears in several places; fixing only one copy leaves the problem in the others. Be careful.)

------------------------------------------------------------------------------------------------------


(1) While the script was running it hung at the "Kill Command" step: the process appeared dead and the Hive query blocked.

Cause: the disk space used by Hadoop was nearly full and needed cleaning. Fix: run the commands shown in (2); alternatively, restart the Hadoop services and retry.


(2) Deleting files with hadoop fs (from /usr/local/hadoop-2.3.0-cdh5.1.0/bin):

hadoop fs -rm -f /data/flume/*/20150119/*
hadoop fs -rm -f /data/flume/*/20150120/*
hadoop fs -rm -f /data/flume/*/20150121/*
hadoop fs -rm -f /data/flume/*/20150122/*
hadoop fs -rm -f /data/flume/*/20150123/*
hadoop fs -rm -f /data/flume/*/20150124/*
hadoop fs -rm -f /data/flume/*/20150126/*

(3) Checking space usage (from /usr/local/hadoop-2.3.0-cdh5.1.0/bin):

hadoop fs -du -s -h /data/flume/bt/*   # show the total size under each path
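
Two related commands for gauging how full HDFS is before cleaning up (both standard in Hadoop 2.x):

hadoop fs -df -h /        # used/free space of the whole filesystem
hdfs dfsadmin -report     # per-datanode capacity and usage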


(4) The namenode's clock was originally set to UTC and was later changed to CST, but the cron job 0 1 * * * /usr/local/scripts/hiveQueryAndStorageMysql.sh

still fired on UTC time, i.e. at 9 a.m. CST. Evidently crond keeps its own notion of the time zone.

Fix: the crond service had not been restarted after the time-zone change.

As the hadoop user, sudo to root privileges and run: service crond restart
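
Consolidated, the fix amounts to (run from the hadoop user):

date                         # confirm the system clock now reports CST
sudo service crond restart   # crond only re-reads the time zone on restart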







