Impala SQL Syntax

小哇666

Article link: https://blog.csdn.net/qq_41712271/article/details/108365905

Supported data types
int, tinyint, smallint, bigint, boolean, char, varchar, string, float, double, real, decimal, timestamp
CDH 5.5 and later add the complex types below, but support for them is still limited:
array, map, struct
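
As a rough sketch of what the complex types look like in practice, the statements below assume a Parquet table, since CDH 5.5 reads complex types only from Parquet files. The table name tb_complex and its columns are made up for illustration.

```sql
-- Hypothetical table; in CDH 5.5 complex-type columns require a Parquet table.
create table tb_complex(
id int,
tags array<string>
)
stored as parquet;

-- Impala does not populate complex columns with INSERT; the Parquet files are
-- usually written by Hive or another tool. Array elements are read by joining
-- the table with its own array column (item is the element pseudocolumn).
select t.id, a.item
from tb_complex t, t.tags a;
```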

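Hive SQL features that Impala does not support
- Extensibility mechanisms such as transform, custom file formats, and custom SerDes
- XML and JSON functions
- Some aggregate functions, such as covar_pop, covar_samp, corr, and percentile
- Of the common aggregates, Impala supports avg, count, max, min, and sum (newer releases add others such as ndv, stddev, and variance)
- Multiple distinct expressions in a single query
- Hive UDAFs, and in releases before Impala 1.2 UDFs of any kind (from 1.2 on, Impala runs native C++ UDFs/UDAFs and Java-based Hive UDFs)
- The following statements:

   analyze table (the Impala equivalent is compute stats)
   describe column
   describe database
   export table
   import table
   show table extended
   show indexes
   show columns

As this list suggests, Impala SQL is largely a subset of HiveQL that complements Hive rather than replacing it,
and much of its syntax matches Hive's.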
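
For the statistics and metadata statements in the list, Impala has counterparts of its own. A minimal sketch, assuming a table named tb1 like the one created further down in this article:

```sql
-- Gather table and column statistics (Impala's counterpart of Hive's analyze table).
compute stats tb1;

-- Inspect the statistics that were collected.
show table stats tb1;
show column stats tb1;

-- List the columns of the table (in place of show columns).
describe tb1;
```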

Creating and dropping a database

create database db1;
use db1;
use default;
drop database db1;
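
When these statements are run from a script, a variant that tolerates reruns can be handy. The following is only a sketch using the same db1 name as above:

```sql
-- Create the database only if it is missing, so the script can be rerun.
create database if not exists db1;

-- Confirm that it exists.
show databases;

-- Switch away before dropping; the database must not contain any tables.
use default;
drop database if exists db1;
```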

Creating tables (internal)

create table tb1(
id int,
name string
);

create table tb2(
id int,
name string
)
row format delimited
fields terminated by '\0' -- '\0' is supported in Impala 1.3.1 and later
stored as textfile;

create table tb3 like tb1;

alter table tb3 set serdeproperties('serialization.format'=',', 'field.delim'=','); -- set the field delimiter of a text table

Creating tables (external)

create external table tb1(
id int,
name string
)
location '/user/data/tb1_data'; -- location must be an HDFS directory, not a single file

create external table tb2 like parquet '/user/data/test1.dat'
partitioned by (year int, month tinyint, day tinyint)
stored as parquet
location '/user/data/tb2_data';

Inserting data

insert into tb1 values(1,'similarFish');
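insert into tb3 select * from tb2; -- append rows
insert overwrite tb3 select * from tb2; -- replace the existing rows
load data inpath '/usr/data/test.txt' into table tb1; -- the path is in HDFS; Impala does not support the local keyword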

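Views

create view v1 as select count(id) as total from tb1; -- create the view
select * from v1; -- query it
describe formatted v1; -- show the view definition

Note: you cannot insert into an Impala view, but a view can be the data source of an insert ... select.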
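
To illustrate the note, the sketch below uses the view v1 as the source of an insert. The target table tb_total is a made-up name introduced only for this example.

```sql
-- Hypothetical target table; count() returns bigint in Impala.
create table tb_total(
total bigint
);

-- Reading from the view is allowed; writing to v1 itself is not.
insert into tb_total select total from v1;
```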

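Handling data files
Loading data
insert: each insert ... values statement writes its own data file, so row-by-row inserts leave a large number of small files. Hive then launches many map tasks to read them, which is very slow. This method is not recommended; the small files can be merged with the third method below.
load data: well suited to bulk loading.
From an intermediate table: read from a large table with many small files and write into a new table, which produces a small number of data files. The same approach also works for converting between file formats.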
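
The intermediate-table approach can be sketched as follows. The name tb1_compact is hypothetical, and the number of files actually written depends on the data volume and the number of nodes in the cluster.

```sql
-- Rewrite the rows of tb1 into a new table; this merges the small files and,
-- in the same step, converts the data to Parquet.
create table tb1_compact stored as parquet as select * from tb1;

-- Alternatively, rewrite the rows into an existing table with the same columns.
insert overwrite tb3 select * from tb1;
```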

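Handling NULL values
Impala represents NULL as "\N" in text data files; when importing with Sqoop, make sure empty fields are handled accordingly.
Alternatively, change the NULL marker for a table:
alter table tb1 set tblproperties("serialization.null.format"="null");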
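
A quick way to check the effect, sketched here against the tb1 table from earlier, is to look at the table properties and count the rows that Impala now reads as NULL:

```sql
-- The table parameters in the output include serialization.null.format once it is set.
describe formatted tb1;

-- Rows whose name field matches the NULL marker are returned as NULL.
select count(*) from tb1 where name is null;
```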