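Connecting Doris to a Hive data source (Doris supports two ways of creating tables: bucketed tables and composite-partitioned tables; a composite-partitioned table is partitioned first and then bucketed within each partition)
Bucketed table test case:
Create the Hive table: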
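
For comparison with the bucketed table used in the test case, here is a minimal sketch of what a composite-partitioned Doris table could look like. The table name `tmp_txzl_as_part`, the `date` type for `day_id`, and the partition boundary are assumptions made for illustration and do not come from the original test. Doris requires partition columns to be key columns, so `day_id` is listed first in the key.

```sql
-- Hypothetical composite-partitioned table: partition by day first,
-- then hash-bucket the rows inside each partition.
CREATE TABLE `tmp_txzl_as_part` (
  `day_id` date,
  `signalid` varchar(32) default '',
  `cnts` int default '1'
) ENGINE=OLAP
DUPLICATE KEY(`day_id`, `signalid`)
PARTITION BY RANGE(`day_id`) (
  -- Holds every row with day_id earlier than 2021-07-08 (assumed boundary).
  PARTITION p20210707 VALUES LESS THAN ("2021-07-08")
)
DISTRIBUTED BY HASH(`signalid`) BUCKETS 10
PROPERTIES (
  "replication_num" = "3"
);
```

A bucketed table simply omits the `PARTITION BY` clause, which is the form the test case uses.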
drop table tmp_txzl_as2;
CREATE TABLE `tmp_txzl_as2`(
`signalid` string,
`day_id` string,
`cnts` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://nameservice1/user/hive/test/tmp_txzl_as2'
TBLPROPERTIES (
'transient_lastDdlTime'='1611109065');
Insert data:
insert into tmp_txzl_as2 values("189971997","20210707",1),
("189023234","20210707",1),
("180093111","20210707",1),
("133807928","20210707",1),
("133679691","20210707",1);
Create the Doris table:
drop table if exists tmp_txzl_as;
CREATE TABLE `tmp_txzl_as` (
`signalid` varchar(32) default '',
`day_id` varchar(32) default '',
cnts int default '1'
) ENGINE=OLAP
DUPLICATE KEY(`signalid`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`day_id`) BUCKETS 10
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false"
);
Load the data:
LOAD LABEL testlabel9
(
DATA INFILE("hdfs://dn1.hadoop:8020/user/hive/test/tmp_txzl_as2/test.txt")
INTO TABLE tmp_txzl_as
COLUMNS TERMINATED BY ','
(signalid,day_id,cnts)
SET
(signalid=signalid,day_id=day_id,cnts=cnts)
)
WITH BROKER 'broker1'
(
"username" = "hpp",
"password" = "m4x_1",
"dfs.nameservices" = "nameservice1",
"dfs.ha.namenodes.nameservice1" = "namenode41,namenode74",
"dfs.namenode.rpc-address.nameservice1.namenode41" = "dn1.hadoop:8020",
"dfs.namenode.rpc-address.nameservice1.namenode74" = "dn3.hadoop:8020",
"dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.nameno
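
Broker Load runs asynchronously, so the LOAD statement returns before any data has been imported. Once the full statement has been submitted, the job can be checked by its label. The following is a minimal sketch that reuses the label `testlabel9` and the table `tmp_txzl_as` from above; the choice of columns and ordering in the final query is only for illustration.

```sql
-- Look up the load job by the label it was submitted with.
-- The State column moves through PENDING / LOADING and ends in
-- FINISHED or CANCELLED.
SHOW LOAD WHERE LABEL = "testlabel9";

-- After the job reports FINISHED, the imported rows should be queryable.
SELECT signalid, day_id, cnts
FROM tmp_txzl_as
ORDER BY signalid;
```

If the job ends in CANCELLED, the ErrorMsg column of the `SHOW LOAD` output is the first place to look for the cause.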
