Hive
晓晓121
Pulling Elasticsearch data into Parquet files with Spark and mapping them into Hive
package com.lz

import org.apache.spark.sql.SparkSession
import org.elasticsearch.hadoop.cfg.ConfigurationOptions

import scala.collection.Map

object Es2Hive {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      // ... excerpt truncated in the original post

Original post · 2021-09-08 11:52:04
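The excerpt stops at the SparkSession builder, before the "mapping to Hive" step the title describes. After the job reads from Elasticsearch and writes Parquet, that step is typically an external table over the output path. A minimal HiveQL sketch, assuming a hypothetical output path /warehouse/es/company and a two-column schema (both are assumptions, not from the original):

-- Map the job's Parquet output onto Hive as an external table
CREATE EXTERNAL TABLE IF NOT EXISTS es_company (
  companyName STRING,
  source      STRING
)
STORED AS PARQUET
LOCATION '/warehouse/es/company';

-- Verify the mapping
SELECT * FROM es_company LIMIT 10;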
Hive: loading multiple structs into one array
CREATE EXTERNAL TABLE `mongodb_dingtalk.mongodb_test` (
  `companyName` string,
  `sources` array<struct<contact:string,contactJob:string,site:string,source:string,sourceHref:string>>
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SE  -- excerpt truncated in the original post

Original post · 2021-09-06 11:44:07
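The DDL breaks off at WITH SE, which in mongo-hadoop's Hive integration would normally continue as WITH SERDEPROPERTIES carrying a mongo.columns.mapping. Below is a hedged reconstruction plus an INSERT that shows the point of the title: building one array out of several structs with Hive's array and named_struct functions. The mapping JSON, the mongo.uri, and all sample values are illustrative assumptions, not from the original:

CREATE EXTERNAL TABLE `mongodb_dingtalk.mongodb_test` (
  `companyName` string,
  `sources` array<struct<contact:string,contactJob:string,site:string,source:string,sourceHref:string>>
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES (
  -- assumed mapping; the original post's value is cut off
  'mongo.columns.mapping' = '{"companyName":"companyName","sources":"sources"}'
)
TBLPROPERTIES (
  'mongo.uri' = 'mongodb://localhost:27017/dingtalk.test'  -- hypothetical URI
);

-- Load two structs into the single array column (Hive 0.13+ allows FROM-less SELECT)
INSERT INTO TABLE mongodb_dingtalk.mongodb_test
SELECT
  'ACME Ltd',
  array(
    named_struct('contact','Alice','contactJob','CEO','site','a.example.com','source','web','sourceHref','http://a.example.com'),
    named_struct('contact','Bob','contactJob','Sales','site','b.example.com','source','crawl','sourceHref','http://b.example.com')
  );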
A regex inside a UDF throws java.lang.StackOverflowError
The error:

Exception in thread "main" java.lang.StackOverflowError
    at java.util.regex.Pattern$Loop.match(Pattern.java:4779)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4286)
    at java.util  (stack trace truncated in the original post)

Original post · 2021-08-23 10:43:03
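java.util.regex evaluates loops and groups recursively, so a very long input string or a pattern with nested quantifiers such as (x+)* can exhaust the thread stack. The durable fix is simplifying the pattern (for example, replacing an alternation-in-a-loop with a character class); a stopgap is giving the task JVMs a larger stack. A sketch of the stopgap for Hive on MapReduce, using the standard MRv2 property names; the sizes and the UDF/table names are illustrative:

-- Enlarge the thread stack of map/reduce task JVMs (example sizes)
SET mapreduce.map.java.opts=-Xmx2048m -Xss16m;
SET mapreduce.reduce.java.opts=-Xmx2048m -Xss16m;

-- Re-run the query that invokes the UDF
SELECT my_regex_udf(raw_col) FROM some_table;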
Notes on sqoop-export from Hive: Error during export
The cause: the data contained Chinese text that the target table's character set could not encode, so the character set has to be specified when exporting from Hive into MySQL.

drop table companynameandsource;
create table companynameandsource (
  companyname text,
  source      text
) DEFAULT CHARSET=utf8;

Original post · 2021-08-12 11:58:09
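Dropping and recreating the table loses any existing rows; MySQL can also convert a table's character set in place, and utf8mb4 additionally covers 4-byte characters that MySQL's legacy utf8 cannot. If garbling persists, the JDBC URL sqoop connects with usually needs useUnicode=true&characterEncoding=utf-8 as well. A sketch of the in-place alternative:

-- Convert the existing table instead of dropping it
ALTER TABLE companynameandsource CONVERT TO CHARACTER SET utf8mb4;

-- Inspect the character sets the server is actually using
SHOW VARIABLES LIKE 'character_set%';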
Splitting data in Hive with built-in functions
A data-splitting example:

select
  id,
  userid,
  from_unixtime(cast(adddate / 1000 as bigint), 'yyyy-MM-dd')    as adddate,
  from_unixtime(cast(updatedate / 1000 as bigint), 'yyyy-MM-dd') as updatedate,
  get_json_object(tag1, '$.id') as setting_id,
  -- get_json_object(tag1, '$.enable') as  (excerpt truncated in the original post)

Original post · 2021-05-17 10:38:41
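Two functions carry this query: from_unixtime formats an epoch timestamp (the division by 1000 suggests the columns hold millisecond epochs), and get_json_object extracts fields from a JSON string column. A self-contained check on literal values, runnable in Hive 0.13+; the values are made up:

SELECT
  -- millisecond epoch -> 'yyyy-MM-dd' (the exact day depends on the session timezone)
  from_unixtime(cast(1621218000000 / 1000 AS bigint), 'yyyy-MM-dd') AS add_day,
  -- pull individual fields out of a JSON string
  get_json_object('{"id": 42, "enable": true}', '$.id')     AS setting_id,
  get_json_object('{"id": 42, "enable": true}', '$.enable') AS setting_enabled;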