DML
Loading files into tables Inserting data into Hive Tables from queries Writing data into the filesystem from queries Inserting values into tables from SQL Update Delete
Merge
1. loading files into tables
LOAD DATA [LOCAL] INPATH ‘filepath’ [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 …)]
LOAD DATA [LOCAL] INPATH ‘filepath’ [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 …)] [INPUTFORMAT ‘inputformat’ SERDE ‘serde’] (3.0 or later)
2. Inserting data into Hive Tables from queries
标准语法
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …)] select_statement1 FROM from_statement;
Hive拓展语法 多insert语句
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION … [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION …] select_statement2] …;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 …)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION …] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION … [IF NOT EXISTS]] select_statement2] …;
Hive拓展语法 插入动态分区
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] …) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] …) select_statement FROM from_statement;
3. Writing data into the filesystem from queries
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT … FROM …
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] …
-
row_format
-
DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
4. Inserting values into tables from SQL
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] …)] VALUES values_row [, values_row …]
Where values_row is:
( value [, value …] )
where a value is either null or any valid SQL literal
5. Update
– Hive 0.14版本以后支持
UPDATE tablename SET column = value [, column = value …] [WHERE expression]
6. Delete
– Hive 0.14版本以后支持
DELETE FROM tablename [WHERE expression]
7. Merge
– Hive 2.2 版本以后支持
MERGE INTO AS T USING <source expression/table> AS S
ON
WHEN MATCHED [AND ] THEN UPDATE SET
WHEN MATCHED [AND ] THEN DELETE
WHEN NOT MATCHED [AND ] THEN INSERT VALUES
8. in exists
IN适合于外表大而内表小的情况;EXISTS适合于外表小而内表大的情况。
in /exists / left semi join 不会产生笛卡尔积 ! inner join可能会产生笛卡尔积!
参考资料:
Hive in exists 区别
9. json解析
一文学会Hive解析Json数组
常用函数
greatest和least函数,实现多列取最大、最小值
活动场景内如果发生多次关注行为,付费统计周期为(首次场景内关注时间,min(最后一次场景内关注后首次取关,首次场景内关注+60天))】
小众特殊场景使用函数
space: space(int n) 返回长度为n的空字符串。
– 可以用在数据放大多少倍上
select anchor_id,
date_add(from_unixtime(unix_timestamp(start_date, ‘yyyy/MM/dd’), ‘yyyy-MM-dd’), pos) period_date,
period/day
as server_period,
start_date,
trill_id,
end_date
from webcast.tmp_anchor_field_control_serv
lateral view
posexplode (
split (
space(
datediff(
from_unixtime(unix_timestamp(end_date, ‘yyyy/MM/dd’), ‘yyyy-MM-dd’),
from_unixtime(unix_timestamp(start_date, ‘yyyy/MM/dd’), ‘yyyy-MM-dd’)
)
),
‘’
)
) t as pos,
val
- repeat(string str, int n)。返回重复n次后的str字符串