1.1 Adding the Hive service
Install all Hive roles on hadoop-manager2 (as listed).
Place the MySQL JDBC driver on every host, in the following directory:
/opt/cloudera/parcels/CDH/lib/hive/lib
Select the MySQL database on hadoop-manager1 for the metastore.
Accept the default paths.
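Distributing the driver to every host can be sketched as a small loop. The hostnames and jar filename below are assumptions; substitute the actual node list and connector version before running the printed commands:

```shell
#!/bin/bash
# Sketch: print the copy commands that push the MySQL JDBC driver to every
# node's Hive lib directory. Hostnames and jar name are placeholders.
push_driver_cmds() {
  local jar="mysql-connector-java-5.1.46.jar"        # assumed driver version
  local dest="/opt/cloudera/parcels/CDH/lib/hive/lib"
  for host in hadoop-manager1 hadoop-manager2; do    # assumed node list
    echo "scp ${jar} ${host}:${dest}/"
  done
}
push_driver_cmds   # review the printed commands, then run them (or swap echo for scp)
```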
1.2 Impala
1.3 Hive configuration
Add the following properties to yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
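After restarting the NodeManagers, you can check that the setting took effect. This is a sketch: the function greps the aux-services property in a yarn-site.xml, and the live config path on a Cloudera Manager-managed node (in the comment) is an assumption:

```shell
#!/bin/bash
# Sketch: confirm that spark_shuffle appears in the aux-services value of a
# yarn-site.xml file. Pass the path to the active NodeManager config.
check_aux_services() {
  grep -A1 "yarn.nodemanager.aux-services<" "$1" | grep -c "spark_shuffle"
}
# On a CM-managed node the live config path is an assumption, e.g.:
#   check_aux_services /var/run/cloudera-scm-agent/process/<N>-NODEMANAGER/yarn-site.xml
```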
1.4 Basic Hive operations
1) On a host in the Hive cluster, run the following command to enter the Hive client:
hive
2) Open hive.txt and paste the table-creation statements into the client to execute them.
3) Copy hive_add_partition.sh to /opt/data-platform/bin.
4) Add the following line to /opt/data-platform/sbin/jobs_day.sh:
# add partition
sh /opt/data-platform/bin/hive_add_partition.sh > /opt/data-platform/log/hive_add_partition.log 2>&1
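Note the redirection order: `> file 2>&1` first points stdout at the log, then duplicates stderr onto it, so both streams land in the file (the reverse order would leak stderr to the terminal). A quick demonstration:

```shell
#!/bin/bash
# Demo: "> file 2>&1" captures both stdout and stderr in the log file.
log="$(mktemp)"
{ echo "normal output"; echo "error output" >&2; } > "$log" 2>&1
cat "$log"   # both lines are in the log
```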
hive.txt
drop table if exists call_logs;
CREATE EXTERNAL TABLE call_logs (
accesscode string,
province string,
region string,
domain string,
frontid string,
callingnumber string,
oricallednumber string,
callednumber string,
starttime string,
answertime string,
keypresstime string,
endtime string,
keypressduration string,
keynumber string,
callduration string,
callingareanum string,
oricalledareanum string,
calledareanum string,
calltype string,
barringtype string,
trunkid string,
localcode string,
destcode string,
listtype string,
category string,
auditresult string,
auditstutas string,
recordfileid string,
recordpath string,
recordstarttime string,
recordendtime string,
ismonitoring string,
direction string,
answerendtiome string,
calllenth string,
notinterceptreason string,
ishide string,
templet_no string,
policyid string
) PARTITIONED BY (stat_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
hive_add_partition.sh
#!/bin/bash
source /etc/profile
if [ $# -eq 1 ]; then
target_day=$1
else
target_day=`date -d "-0 days" +"%Y%m%d"`
fi
hive -e "alter table call_logs add partition (stat_date='${target_day}') location '/user/callLog/stat_date=${target_day}'"
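The script defaults to today's date (date -d "-0 days") when called with no argument and accepts an explicit YYYYMMDD argument for backfilling. The helper below mirrors that argument handling so it can be exercised on its own:

```shell
#!/bin/bash
# Sketch: the target-day resolution used by hive_add_partition.sh --
# an explicit YYYYMMDD argument wins, otherwise today's date is used.
resolve_target_day() {
  if [ $# -eq 1 ]; then
    echo "$1"
  else
    date -d "-0 days" +"%Y%m%d"
  fi
}
resolve_target_day 20200101   # backfill a specific day
resolve_target_day            # default: today
```

So a missed day can be re-added manually, e.g. `sh /opt/data-platform/bin/hive_add_partition.sh 20200101`.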