This article summarizes the steps for using a Hive sink with Flume, along with fixes for several pitfalls hit while debugging.
1. Hive table requirements
1.1 Table creation requirements
When Hive is used as a Flume sink, the target table must meet the following requirements:
a: the table must be transactional
b: the table must be partitioned and bucketed
c: the table must be stored as ORC
In short: clustered (bucketed), transactional, and ORC storage format.
1.2 Column-name requirements in the Flume configuration
The Hive column names configured in Flume must be all lowercase; otherwise, the following error is reported:
Failed connecting to EndPoint
Caused by: org.apache.hive.hcatalog.streaming.InvalidColumn: Column 'URL' not found in table for input field 81
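For example, the column list passed to the sink's serializer must be lowercase. The snippet below is a minimal sketch of the relevant Hive-sink properties (the agent/sink names a1/k1 and the table and field names are placeholders, not taken from this setup):

```properties
# Hive sink sketch -- agent, table, and field names are illustrative
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore = thrift://master:9083
a1.sinks.k1.hive.database = default
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ","
# Must be all lowercase; writing "URL" instead of "url" triggers
# the InvalidColumn error shown above.
a1.sinks.k1.serializer.fieldnames = id,url,msg
```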
2. Setup procedure
1. Copy the Hive jars into Flume's lib directory, so the Hive sink does not fail on startup due to missing classes.
# pwd
/usr/local/src/hive-1.2.2/hcatalog/share/hcatalog
# ll
total 416
-rw-r--r-- 1 root root 257215 Apr 3 2017 hive-hcatalog-core-1.2.2.jar
-rw-r--r-- 1 root root 50192 Apr 3 2017 hive-hcatalog-pig-adapter-1.2.2.jar
-rw-r--r-- 1 root root 53562 Apr 3 2017 hive-hcatalog-server-extensions-1.2.2.jar
-rw-r--r-- 1 root root 56999 Apr 3 2017 hive-hcatalog-streaming-1.2.2.jar
# cp /usr/local/src/hive-1.2.2/hcatalog/share/hcatalog/*.jar /usr/local/src/flume-1.6.0/lib
# cp /usr/local/src/hive-1.2.2/lib/*.jar /usr/local/src/flume-1.6.0/lib
2. Edit the Hive configuration to enable transaction support.
# cat hive-site.xml
<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
3. Initialize the Hive metastore database.
# cd /usr/local/src/hive-1.2.2/bin
# ./schematool -dbType mysql -initSchema
4. Restart MySQL and Hadoop, then start the Hive metastore.
# Restart MySQL
#cd /usr/bin
#./mysqladmin -u root -p shutdown
Enter password: ******
# cd /usr/bin
# ./mysqld_safe &
# Restart Hadoop
[root@master sbin]# pwd
/usr/local/src/hadoop-2.6.5/sbin
[root@master sbin]# ./stop-all.sh
[root@master sbin]# ./start-all.sh
# Start the Hive metastore
[root@master sbin]# cd /usr/local/src/hive-1.2.2/
[root@master hive-1.2.2]# cd bin
[root@master bin]# ./hive --service metastore &
ls: cannot access /usr/local/src/spark-2.0.2-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory
Starting Hive Metastore Server
[root@master ~]# ps -ef|grep HiveMetaStore
root 20755 14146 23 22:39 pts/0 00:00:11 /usr/local/src/jdk1.8.0_172/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/src/hadoop-2.6.5/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/src/hadoop-2.6.5 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/src/hadoop-2.6.5/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/local/src/hive-1.2.2/lib/hive-service-1.2.2.jar org.apache.hadoop.hive.metastore.HiveMetaStore
root 20922 16217 0 22:40 pts/1 00:00:00 grep --color=auto HiveMetaStore
5. Create the table in Hive
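A table satisfying the three requirements from section 1.1 could be created roughly as follows (the database, table, column, partition, and bucket choices are illustrative, not taken from the original setup):

```sql
-- Transactional + partitioned + bucketed + ORC: the combination
-- required by the Flume Hive sink (Hive 1.2, DbTxnManager enabled).
CREATE TABLE weblogs (
  id  INT,
  url STRING,
  msg STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```

The column names are deliberately all lowercase to satisfy the Flume configuration rule from section 1.2.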