INSERT Statement.
Impala supports inserting into tables and partitions created with the Impala CREATE TABLE
statement, as well as into pre-existing tables and partitions created through Hive.
Syntax
[with_clause]
INSERT [hint_clause] { INTO | OVERWRITE } [TABLE] table_name
[(column_list)]
[ PARTITION (partition_clause)]
{
[hint_clause] select_statement
| VALUES (value [, value ...]) [, (value [, value ...]) ...]
}
partition_clause ::= col_name [= constant] [, col_name [= constant] ...]
hint_clause ::=
hint_with_dashes |
hint_with_cstyle_comments |
hint_with_brackets
hint_with_dashes ::= -- +SHUFFLE | -- +NOSHUFFLE | -- +CLUSTERED
hint_with_cstyle_comments ::= /* +SHUFFLE */ | /* +NOSHUFFLE */ | /* +CLUSTERED */
hint_with_brackets ::= [SHUFFLE] | [NOSHUFFLE]
(With this hint format, the square brackets are part of the syntax.)
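For instance, a shuffle hint can be combined with a static or dynamic partitioned insert. All table and column names below are illustrative, not taken from the examples in this document:

```sql
-- /* +SHUFFLE */ redistributes rows by partition key before writing,
-- so each partition is written by fewer concurrent writers.
-- Static partition: the year value is fixed in the PARTITION clause.
INSERT INTO sales PARTITION (year=2023) /* +SHUFFLE */
  SELECT id, amount FROM staging_2023;

-- Dynamic partition: the year value comes from the last column of the select list.
INSERT INTO sales PARTITION (year) /* +SHUFFLE */
  SELECT id, amount, year FROM staging_all;
```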
INSERT INTO
: appends data to the table. Existing data is left as-is, and the inserted rows go into one or more new data files.
INSERT OVERWRITE
: replaces the data already in the table. The overwritten data files are deleted immediately; they do not go through the HDFS trash mechanism.
The INSERT
statement currently cannot write data files containing complex types (ARRAY, STRUCT, and MAP). At present, such tables must use the Parquet file format: generate the Parquet data files outside Impala, then use LOAD DATA
or CREATE EXTERNAL TABLE
to associate those data files with the table.
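A sketch of that workaround (the HDFS path and table definition are hypothetical):

```sql
-- The Parquet files with complex-typed columns are produced outside Impala
-- (for example by Hive or Spark) and uploaded to HDFS ahead of time;
-- the external table then points at that existing directory.
CREATE EXTERNAL TABLE complex_demo (
  id INT,
  tags ARRAY<STRING>,
  attrs MAP<STRING, STRING>
)
STORED AS PARQUET
LOCATION '/user/doc_demo/complex_parquet';
```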
Note: Impala does not automatically up-convert column types on insert; convert manually with CAST(column AS column_type).
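A minimal sketch of such a manual cast (table and column names are hypothetical):

```sql
-- The source column is INT but the target column is BIGINT, so the value
-- is cast explicitly instead of relying on an implicit conversion.
INSERT INTO target_table (big_col)
  SELECT CAST(small_col AS BIGINT) FROM source_table;
```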
For example, create a text-format table and a Parquet-format table to insert into:
CREATE DATABASE IF NOT EXISTS file_formats;
USE file_formats;
DROP TABLE IF EXISTS text_table;
CREATE TABLE text_table
( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
STORED AS TEXTFILE;
DROP TABLE IF EXISTS parquet_table;
CREATE TABLE parquet_table
( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
STORED AS PARQUET;
An INSERT INTO statement keeps appending data to the table; if the same statement runs twice, the added rows are doubled, as in the following example:
[localhost:21000] > insert into table text_table select * from default.tab1;
Inserted 5 rows in 0.41s
[localhost:21000] > insert into table text_table select * from default.tab1;
Inserted 5 rows in 0.46s
[localhost:21000] > select count(*) from text_table;
+----------+
| count(*) |
+----------+
| 10 |
+----------+
Returned 1 row(s) in 0.26s
With an INSERT OVERWRITE statement, the previous data is deleted first, as in the following example:
[localhost:21000] > insert into table parquet_table select * from default.tab1;
Inserted 5 rows in 0.35s
[localhost:21000] > insert overwrite table parquet_table select * from default.tab1 limit 3;
Inserted 3 rows in 0.43s
[localhost:21000] > select count(*) from parquet_table;
+----------+
| count(*) |
+----------+
| 3 |
+----------+
Returned 1 row(s) in 0.43s
Note: any sorting syntax used in an INSERT statement is ignored.
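For instance, the ORDER BY below is accepted but does not guarantee any row order in the resulting data files:

```sql
-- The sort clause has no effect on the insert; rows land in the data
-- files in whatever order the scan produces them.
INSERT INTO text_table SELECT * FROM default.tab1 ORDER BY id;
```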
Example using a VALUES clause:
[localhost:21000] > describe val_example;
Query: describe val_example
Query finished, fetching results ...
+-------+---------+---------+
| name | type | comment |
+-------+---------+---------+
| id | int | |
| col_1 | boolean | |
| col_2 | double | |
+-------+---------+---------+
[localhost:21000] > insert into val_example values (1,true,100.0);
Inserted 1 rows in 0.30s
[localhost:21000] > select * from val_example;
+----+-------+-------+
| id | col_1 | col_2 |
+----+-------+-------+
| 1 | true | 100 |
+----+-------+-------+
[localhost:21000] > insert overwrite val_example values (10,false,pow(2,5)), (50,true,10/3);
Inserted 2 rows in 0.16s
[localhost:21000] > select * from val_example;
+----+-------+-------------------+
| id | col_1 | col_2 |
+----+-------+-------------------+
| 10 | false | 32 |
| 50 | true | 3.333333333333333 |
+----+-------+-------------------+
LOAD DATA Statement.
Syntax
LOAD DATA INPATH 'hdfs_file_or_directory_path' [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]
Notes:
- The loaded data files are moved into the Impala table directory, not copied.
- When the specified path is a directory, Impala loads every file in it. Make sure the directory contains no nested subdirectories; if it does, the statement fails.
- Wildcards cannot be used to specify the files to load.
- When the specified path is a directory, hidden files in it are not loaded.
- LOAD DATA loads files only from HDFS, not from the local filesystem, so the file path does not need the
hdfs://
prefix.
- Only very limited checks are performed while loading; if the file format is wrong, errors may not surface until query time.
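The PARTITION clause from the syntax above can be sketched as follows (the table name, partition column, and path are illustrative):

```sql
-- Moves the file into the year=2023 partition directory of the table;
-- as with any LOAD DATA, the source file is moved rather than copied.
LOAD DATA INPATH '/user/doc_demo/sales_2023.txt'
INTO TABLE sales PARTITION (year=2023);
```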
Example
random_strings.py 1000 | hdfs dfs -put - /user/doc_demo/thousand_strings.txt
random_strings.py 100 | hdfs dfs -put - /user/doc_demo/hundred_strings.txt
random_strings.py 10 | hdfs dfs -put - /user/doc_demo/ten_strings.txt
[localhost:21000] > create table t1 (s string);
[localhost:21000] > load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
Query finished, fetching results ...
+----------------------------------------------------------+
| summary |
+----------------------------------------------------------+
| Loaded 1 file(s). Total files in destination location: 1 |
+----------------------------------------------------------+
Returned 1 row(s) in 0.61s
[localhost:21000] > select count(*) from t1;
Query finished, fetching results ...
+------+
| _c0 |
+------+
| 1000 |
+------+
Returned 1 row(s) in 0.67s
-- Loading the same path again fails, because the source data file no longer exists
[localhost:21000] > load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
ERROR: AnalysisException: INPATH location '/user/doc_demo/thousand_strings.txt' does not exist.
-- Listing the table's files shows the file kept its original name
$ hdfs dfs -ls /user/hive/warehouse/load_data_testing.db/t1
Found 1 items
-rw-r--r-- 1 doc_demo doc_demo 13926 2013-06-26 15:40 /user/hive/warehouse/load_data_testing.db/t1/thousand_strings.txt
-- Loading the 100-row data file now adds 100 rows to the table
[localhost:21000] > load data inpath '/user/doc_demo/hundred_strings.txt' into table t1;
Query finished, fetching results ...
+----------------------------------------------------------+
| summary |
+----------------------------------------------------------+
| Loaded 1 file(s). Total files in destination location: 2 |
+----------------------------------------------------------+
Returned 1 row(s) in 0.24s
[localhost:21000] > select count(*) from t1;
Query finished, fetching results ...
+------+
| _c0 |
+------+
| 1100 |
+------+
Returned 1 row(s) in 0.55s
-- With the OVERWRITE keyword, the data previously in the table is replaced
[localhost:21000] > load data inpath '/user/doc_demo/ten_strings.txt' overwrite into table t1;
Query finished, fetching results ...
+----------------------------------------------------------+
| summary |
+----------------------------------------------------------+
| Loaded 1 file(s). Total files in destination location: 1 |
+----------------------------------------------------------+
Returned 1 row(s) in 0.26s
[localhost:21000] > select count(*) from t1;
Query finished, fetching results ...
+-----+
| _c0 |
+-----+
| 10 |
+-----+
Returned 1 row(s) in 0.62s