Method 1: Map a Hive table onto MongoDB
First, copy the three required MongoDB-related jar files into hive110/lib and set their permissions to 777. For details, see https://blog.csdn.net/alisa_Ge/article/details/116531789?spm=1001.2014.3001.5501
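The jar installation step can be sketched as a small shell helper. This is only an illustration: the text above does not name the three jars, so the sketch assumes they are the usual mongo-hadoop set (mongo-hadoop-core, mongo-hadoop-hive, mongo-java-driver), and the function name install_mongo_jars and both directory arguments are hypothetical.

```shell
# Hypothetical helper: copy the three Mongo-related jars into Hive's lib
# directory and set their permissions to 777, as described above.
# Assumption: the jars follow the usual mongo-hadoop naming; versions vary.
install_mongo_jars() {
    src_dir=$1     # directory holding the downloaded jars
    hive_lib=$2    # Hive lib directory, e.g. the full path to hive110/lib

    for jar in "$src_dir"/mongo-hadoop-core-*.jar \
               "$src_dir"/mongo-hadoop-hive-*.jar \
               "$src_dir"/mongo-java-driver-*.jar; do
        # An unmatched glob stays literal, so this also catches a missing jar.
        [ -f "$jar" ] || { echo "missing jar: $jar" >&2; return 1; }
        cp "$jar" "$hive_lib"/ || return 1
        chmod 777 "$hive_lib/$(basename "$jar")" || return 1
    done
}
```

It would be called with the download directory and the Hive lib directory, for example install_mongo_jars /opt/jars followed by the full path to hive110/lib. The Hive session likely needs to be restarted afterwards so that the new jars are picked up.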
Create the Hive external table:
create external table test.mongodb_users(
user_id string,
locale string,
birthyear string,
gender string,
joinedAt string,
location string,
timezone string
)
stored by 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id"}')
tblproperties('skip.header.line.count'='1','mongo.uri'='mongodb://192.168.21.200:27017/kafkamongo.users');
Method 2: On the node where MongoDB runs, first export the data to a CSV file, then upload that file to a local path on the Hive node or to HDFS.
mongoexport --host 192.168.21.200 --port 27017 --db kafkamongo --collection 'users' --type csv --fields user_id,locale,birthyear,gender,joinedAt,location,timezone --out /opt/users.csv
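The upload step that follows the export can be sketched as below. This is a rough outline under assumptions, not part of the original procedure: the function name load_users_csv, the HDFS directory, and the table name are hypothetical, and the use of OpenCSVSerde (which reads every column as string and handles quoted fields) is one possible choice. Because mongoexport writes the field names as the first line of a CSV file, the table sets skip.header.line.count to 1.

```shell
# Hypothetical helper: upload the exported CSV to HDFS and expose it as a
# Hive external table. The directory and table name are placeholders.
load_users_csv() {
    local_csv=$1     # exported file, e.g. /opt/users.csv copied to this node
    hdfs_dir=$2      # assumed target directory on HDFS
    hive_table=$3    # assumed table name, e.g. test.users_csv

    # Create the target directory and upload the file, overwriting any old copy.
    hdfs dfs -mkdir -p "$hdfs_dir" || return 1
    hdfs dfs -put -f "$local_csv" "$hdfs_dir"/ || return 1

    # Define an external table over the directory; the header line is skipped.
    hive -e "
create external table if not exists $hive_table(
user_id string,
locale string,
birthyear string,
gender string,
joinedAt string,
location string,
timezone string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile
location '$hdfs_dir'
tblproperties('skip.header.line.count'='1');
"
}
```

A call might look like load_users_csv /opt/users.csv /tmp/mongo_export/users test.users_csv, where the last two arguments are examples only. An external table is used so that dropping the table leaves the uploaded file in place.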