Impala Essentials: A Summary

Contents

Impala architecture:

Impala external and internal shell operations:

Differences from Hive:

Impala queries:

Impala compression and storage:


Impala architecture:

Impala itself consists of three components: Impalad, Statestore, and Catalog; beyond these it depends on the Hive Metastore and HDFS. The impalad daemon runs on each data node and plans and executes queries; statestored tracks the membership and health of the impalad instances; catalogd relays metadata changes (kept in the Hive Metastore) to every impalad.
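
A rough sketch of how these pieces relate (simplified):

    impala-shell / JDBC / ODBC clients
                  |
                  v
    +-------------------------------+
    | impalad (one per data node;   |<--- statestored: tracks daemon
    | plans and executes queries)   |     membership and health
    +-------------------------------+<--- catalogd: broadcasts metadata
            |              |              changes to every impalad
            v              v
      HDFS (data)    Hive Metastore (table definitions)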

Impala external and internal shell operations:

impala-shell operations fall into two kinds: external ones run from the Linux shell, and internal ones run inside an impala-shell session:

External shell:

impala-shell -i hadoop106   connect to the host running the impalad daemon

impala-shell -q 'select * from stu;' -o output.txt   run a query from the external shell and write the result to a local file

impala-shell -f sel.sql -o output.txt   run queries from a SQL script file

impala-shell -r   refresh Impala's metadata after connecting

Log in to Hive, create a table in the current database, then list the tables:
hive> create table student(id int,name string)
    > row format delimited
    > fields terminated by '\t';
hive> show tables;
stu
student

In Impala, list the tables in the current database:
[hadoop106:21000] > show tables;
Query: show tables
+------+
| name |
+------+
| stu  |
+------+

Then quit and, from the external shell, log in again:
[root@hadoop105 ~]# impala-shell -i hadoop106
[hadoop106:21000] > show tables;
Query: show tables
+------+
| name |
+------+
| stu  |
+------+
The new table still does not show up:
[hadoop106:21000] > quit;

Next, log in again with the metadata-refresh flag:
[root@hadoop105 ~]# impala-shell -i hadoop106 -r

[hadoop106:21000] > show tables;
Query: show tables
+---------+
| name    |
+---------+
| stu     |
| student |
+---------+
Now the table is visible.

impala-shell -q 'select * from stu' -B --output_delimiter="\t" -o output.txt
Unformatted output: with -B there are no table borders, and fields are separated by the delimiter given via --output_delimiter (a tab here).

---------------------------------------------------------------------------------------
Internal shell:

shell hadoop fs -ls /   list HDFS directories from inside impala-shell

shell ls      list the local Linux working directory from inside impala-shell

Refresh the metadata of a specific table:
First create the table student in Hive (statement omitted), load data into it, then view the rows:
hive> load data local inpath '/opt/module/datas/stu.txt' into table student;

hive> select * from student;
1001	zhangfei
1002	guanyu
1003	liubei

Next, go to Impala and check whether the table shows the data:
[hadoop106:21000] > select * from student;
Query: select * from student
No rows come back. Refresh the table's metadata, then query again:
[hadoop106:21000] > refresh student;

[hadoop106:21000] > select * from student;

+------+----------+
| id   | name     |
+------+----------+
| 1001 | zhangfei |
| 1002 | guanyu   |
| 1003 | liubei   |
+------+----------+
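
Which metadata statement applies when (consistent with the two demos above; this is documented Impala behavior):

-- REFRESH reloads file and block metadata for one existing table; use it
-- after the table's data files change (e.g., a Hive LOAD DATA):
refresh student;

-- INVALIDATE METADATA discards cached metadata entirely; use it after
-- tables are created or dropped outside Impala. The -r startup flag does
-- roughly this for the whole catalog, which is why it is expensive:
invalidate metadata;          -- whole catalog
invalidate metadata student;  -- a single table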

Differences from Hive:

1. When creating a database, Impala does not support WITH DBPROPERTIES:
[hadoop106:21000] > create database db_test with dbproperties('name'='iseayou');
Query: create database db_test with dbproperties('name'='iseayou')
ERROR: AnalysisException: Syntax error in line 1:
create database db_test with dbproperties('name'='iseayou')
                        ^
Encountered: WITH
Expected: COMMENT, LOCATION
CAUSED BY: Exception: Syntax error
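
As the error message indicates, Impala's CREATE DATABASE accepts only COMMENT and LOCATION clauses; a minimal sketch (the HDFS path here is made up):

create database db_test
comment 'demo database'
location '/user/hive/warehouse/db_test.db';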

2. Impala does not support the ALTER DATABASE syntax.

3. A database cannot be dropped while it is in use; switch out of it first, then drop it:
[hadoop106:21000] > create database db_test;
Query: create database db_test

[hadoop106:21000] > use db_test;
Query: use db_test
[hadoop106:21000] > create table stu(id int,name string);
Query: create table stu(id int,name string)
[hadoop106:21000] > show tables;
Query: show tables
+------+
| name |
+------+
| stu  |
+------+

[hadoop106:21000] > drop database db_test;
Query: drop database db_test
ERROR: AnalysisException: Cannot drop current default database: db_test

[hadoop106:21000] > drop database db_test cascade;
Query: drop database db_test cascade
ERROR: AnalysisException: Cannot drop current default database: db_test

[hadoop106:21000] > use default;
Query: use default
[hadoop106:21000] > drop database db_test cascade;
Query: drop database db_test cascade
[hadoop106:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| db_impala        |                                              |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.01s

4. Impala supports partitioned tables, but LOAD DATA cannot create a missing partition automatically; you must first ALTER the table to add the partition, then load the data. Demonstration:
Upload the data file to HDFS, add the partition to the table, then load the data:
create table stu_par(id int,name string)
partitioned by(month string)
row format delimited
fields terminated by '\t';

shell hadoop fs -put /opt/module/datas/stu.txt /stu
alter table stu_par add partition(month='12');
load data inpath '/stu' into table stu_par partition(month='12');
[hadoop106:21000] > select * from stu_par where month = '12';
+------+----------+-------+
| id   | name     | month |
+------+----------+-------+
| 1001 | zhangfei | 12    |
| 1002 | guanyu   | 12    |
| 1003 | liubei   | 12    |
+------+----------+-------+
Impala also supports adding multiple partitions in a single statement, separated by spaces (see the sketch below), but it does not support dropping multiple partitions at once.
To list a table's partitions:
 show partitions stu_par;
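
A minimal sketch of the multi-partition ADD mentioned above (months '10' and '11' are made-up values):

alter table stu_par add partition(month='10') partition(month='11');

-- no multi-partition DROP; one statement per partition:
alter table stu_par drop partition(month='10');
alter table stu_par drop partition(month='11');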


5. When loading data, Impala does not support local loads, i.e. LOAD DATA LOCAL is unavailable;
INSERT ... SELECT and LOAD DATA INPATH (reading from HDFS) are both supported, as sketched below.
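
A minimal sketch of the two supported paths (reusing the stu and student tables from earlier):

-- not supported in Impala:
-- load data local inpath '/opt/module/datas/stu.txt' into table student;

-- supported: move a file already sitting in HDFS into the table
load data inpath '/stu' into table student;

-- supported: copy rows from another table
insert into table student select * from stu;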

6. To export data, the usual approach is impala-shell with -o; Impala does not support INSERT OVERWRITE DIRECTORY:
impala-shell -i hadoop106 -q 'select * from stu;' -B --output_delimiter=',' -o output.txt

[root@hadoop105 ~]# cat output.txt
1001,zhangfei
1004,xiaoqiao
1002,guanyu
1005,daqiao
1003,liubei

Impala queries:

1. Impala does not support CLUSTER BY, DISTRIBUTE BY, or SORT BY.
2. Impala does not support bucketed tables.
The underlying reason for both is that Impala does not use MapReduce.

3. Impala does not support the COLLECT_SET(col) and EXPLODE(col) functions.
The underlying reason is that Impala's support for arrays is incomplete.
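
For the common COLLECT_SET use case of gathering values per group, GROUP_CONCAT is a partial substitute in Impala (it builds a delimited string rather than a distinct set; the score table is the one used in point 4 below):

select subject, group_concat(name, ',') as names
from score
group by subject;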

4. Impala supports window (analytic) functions.

First, rank the rows within each subject; this inner query becomes subquery t1:

select *, row_number() over(partition by subject order by score desc) row_number
from score;

Then keep only the top three scores per subject:

select * from
(
  select *, row_number() over(partition by subject order by score desc) row_number
  from score
) t1
where t1.row_number <= 3;
 
+---------+------------+----------+---------------+
| t1.name | t1.subject | t1.score | t1.row_number |
+---------+------------+----------+---------------+
| 悟空    | 数学       | 95       | 1             |
| 张飞    | 数学       | 86       | 2             |
| 小乔    | 数学       | 85       | 3             |
| 张飞    | 英语       | 84       | 1             |
| 八戒    | 英语       | 84       | 2             |
| 小乔    | 英语       | 78       | 3             |
| 八戒    | 语文       | 94       | 1             |
| 悟空    | 语文       | 87       | 2             |
| 小乔    | 语文       | 65       | 3             |
+---------+------------+----------+---------------+

Impala compression and storage:

1. Parquet tables support CREATE, INSERT, and SELECT:
[hadoop106:21000] > create table stu_parquet(id int,name string)
row format delimited
fields terminated by '\t'
stored as parquet;

[hadoop106:21000] > show tables;
Query: show tables
+-------------+
| name        |
+-------------+
| stu         |
| stu_par     |
| stu_parquet |
| student     |
+-------------+
Fetched 4 row(s) in 0.01s
[hadoop106:21000] > insert into table stu_parquet select * from stu;

[hadoop106:21000] > select * from stu_parquet;

+------+----------+
| id   | name     |
+------+----------+
| 1005 | daqiao   |
| 1004 | xiaoqiao |
| 1001 | zhangfei |
| 1002 | guanyu   |
| 1003 | liubei   |
+------+----------+
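
On the compression side, Parquet writes honor the COMPRESSION_CODEC query option; a minimal sketch (the table name stu_parquet_snappy is made up; snappy is the default codec for Impala Parquet writes):

set COMPRESSION_CODEC=snappy;   -- other accepted values include gzip and none
create table stu_parquet_snappy
stored as parquet
as select * from stu;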

2. SequenceFile tables can be created, but INSERT is rejected by default; a query option must be set before inserts work:
[hadoop106:21000] > create table stu_sq(id int, name string)
                  > row format delimited 
                  > fields terminated by '\t'
                  > stored as sequenceFile;
Query: create table stu_sq(id int, name string)
row format delimited
fields terminated by '\t'
stored as sequenceFile

Fetched 0 row(s) in 0.14s
[hadoop106:21000] > show tables;
Query: show tables
+-------------+
| name        |
+-------------+
| stu         |
| stu_par     |
| stu_parquet |
| stu_sq      |
| student     |
+-------------+
[hadoop106:21000] > insert into table stu_sq values(1001,'zhangsan');

WARNINGS: Writing to table format SEQUENCE_FILE is not supported. Use query option ALLOW_UNSUPPORTED_FORMATS to override.


[hadoop106:21000] > set ALLOW_UNSUPPORTED_FORMATS=true;
ALLOW_UNSUPPORTED_FORMATS set to true
[hadoop106:21000] > insert into table stu_sq values(1001,'zhangsan');

[hadoop106:21000] > select * from stu_sq;
+------+----------+
| id   | name     |
+------+----------+
| 1001 | zhangsan |
+------+----------+
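
Rather than toggling ALLOW_UNSUPPORTED_FORMATS, the usual pattern is to write such formats from Hive and only query them from Impala; a minimal sketch:

hive> insert into table stu_sq select * from student;

-- then, back in impala-shell, pick up the newly written files:
refresh stu_sq;
select * from stu_sq;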

 
