Hive: Basic Statements

湫湫玺云台

Last modified 2024-01-26 01:01:06

Category: Hadoop. Tags: hive

First published 2022-11-16 15:33:57

Article link: https://blog.csdn.net/qq_57620101/article/details/127880125

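Running HiveQL (HQL) script files:

From the system console:

hive -f sql_path

e.g. hive -f /path/to/file/xxx.hql

From inside the Hive shell:

source sql_path;

e.g. source /path/to/file/test.sql;

One-off command:

hive -e 'SQL statement'

e.g. $ hive -e "select * from mytable limit 3"

Running with nohup (in the background, with output written to a log file):

nohup hive -f insert.sql > log.log 2>&1 &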

Data import and export:

Exporting data:

hadoop fs -cp source_path target_path

insert overwrite [local] directory 'path' select ...;

LOAD DATA statement:

load data [local] inpath 'file_path' [overwrite] into table table_name;

Create a data directory on HDFS:

hdfs dfs -mkdir -p /tmp/hive_test/data/student

Upload a CSV file to the HDFS data directory:

hdfs dfs -put /opt/data/student.csv /tmp/hive_test/data/student
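
As a minimal sketch of the import and export statements above, the following loads the CSV into a table and then writes a query result to a local directory. The table name student, its column list, and the export path /tmp/hive_test/export/student are assumptions for illustration; the file is assumed to be comma-delimited.

```sql
-- Hypothetical table; the columns are assumed to match student.csv.
create table if not exists student
(id int, name string, sex string, age int, department string)
row format delimited fields terminated by ',';

-- With LOCAL, the file is copied from the local filesystem.
-- Without LOCAL, the path is an HDFS path and the file is moved instead.
load data local inpath '/opt/data/student.csv'
overwrite into table student;

-- Export a query result to a directory on the local filesystem.
-- OVERWRITE replaces whatever is already in the target directory.
insert overwrite local directory '/tmp/hive_test/export/student'
row format delimited fields terminated by ','
select * from student;
```

Dropping the local keyword from the insert statement would write the result to an HDFS directory instead.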

DDL commands:

Create a database:

create database db_name;

create database if not exists userdb;

create schema userdb;

create table test3 like test2;

create table test2 as select name,addr from test1;

create database database_name location 'path'; sets the database's storage path at creation

Drop a database:

drop database db_name;

drop database if exists userdb;

drop schema userdb;

drop database db_name cascade; (drops a database that still contains tables)
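
Put together, a database's life cycle might look like the sketch below. The database name userdb_demo, its comment, and its path are hypothetical.

```sql
-- Create the database at an explicit HDFS path.
create database if not exists userdb_demo
comment 'scratch database for examples'
location '/tmp/hive_test/userdb_demo.db';

-- Confirm the comment and location that were recorded.
describe database extended userdb_demo;

use userdb_demo;
create table t1 (id int);

-- A plain DROP DATABASE fails while t1 exists (the default is RESTRICT);
-- CASCADE drops the contained tables first.
use default;
drop database if exists userdb_demo cascade;
```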

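Modify (the statement below renames a table, not a database):

alter table table_name rename to another_name;

Inspect databases:

show databases;

desc database [extended] db_name; show detailed information about a database

describe database db_name; show a database's description and path

select current_database(); show which database is currently in use

show create database db_name; show the full statement that created the database

show databases like 'h.*'; list databases whose names match the pattern

show tables '*t*'; wildcard match on table names

show tables in db_name; list all tables in the specified database

show partitions t1; list a table's partitions

describe [formatted] tab_name; show a table's structure and path

Use a database:

use default; switch to the given database

!ls; list the files in the current Linux directory

dfs -ls /; list the files under / on HDFS

set hive.cli.print.current.db=true; show the current database in the prompt

set hive.cli.print.header=true; print column headers in query output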

Create a table:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

[(col_name data_type [COMMENT col_comment], ...)] ---- table name and column definitions.

[COMMENT table_comment] --- descriptive comment for the table.

[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] --- partition columns.

[CLUSTERED BY (col_name, col_name, ...)

[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] --- bucketing.

[ROW FORMAT row_format] --- field delimiters and row formatting.

[STORED AS file_format] --- storage file format (TEXTFILE, ORC, PARQUET, AVRO, SEQUENCEFILE, RCFILE).

[LOCATION hdfs_path] --- directory where the data is stored.

[TBLPROPERTIES ("skip.header.line.count"="n")] -- skip the first n lines of each file.

If the table already exists, skip the first n lines with:
alter table table_name set tblproperties('skip.header.line.count'='n');
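
The template can be filled in as follows. This is only a sketch: the table name student_csv, the column list, and the assumption that the CSV file starts with exactly one header line are illustrative.

```sql
create external table if not exists student_csv
(
  id int comment 'student id',
  name string,
  sex string,
  age int,
  department string
)
comment 'students read from CSV'
row format delimited fields terminated by ','
stored as textfile
location '/tmp/hive_test/data/student'
tblproperties ("skip.header.line.count"="1");

-- The header line is skipped at read time; the file itself is unchanged.
select * from student_csv limit 3;
```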

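Differences between partitioned and bucketed tables:
(1) Both partitioning and bucketing manage data at a finer granularity. Partitions are added manually, and because Hive is schema-on-read, data added to a partition is not validated against the schema. A bucketed table's data is hashed on the bucketing columns into multiple files, so rows land in their files far more reliably.

(2) A partitioned table is split into partitions by one or more columns; physically, each partition can be thought of as a directory.

(3) Bucketing divides data at a finer granularity than partitioning. Rows are distributed by the hash of a column value: to split into 3 buckets on name, take the hash of name modulo 3 and assign each row by the result. Rows with result 0 go into one file, rows with result 1 into another, and rows with result 2 into a third.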
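
The hash-modulo rule in (3) can be previewed with Hive's built-in hash() and pmod() functions. This is an illustrative sketch: a student table with a name column is assumed, and the expression mirrors the rule as described here rather than Hive's exact internal bucket assignment, which may differ between versions.

```sql
-- pmod keeps the result non-negative even when hash(name) is negative.
select name,
       hash(name)          as name_hash,
       pmod(hash(name), 3) as bucket_no   -- 0, 1, or 2
from student;

-- Count how many rows would fall into each of the 3 buckets.
select pmod(hash(name), 3) as bucket_no, count(*) as row_count
from student
group by pmod(hash(name), 3);
```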

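(1) In physical form:
A partition is a directory; a bucket is a file.

(2) In the CREATE statement:
A partitioned table is declared with a partitioned by clause; the partition column is a pseudo-column, and its type must be given.
A bucketed table is declared with a clustered by clause; the bucketing column is a real column of the table, and the number of buckets must be given.

(3) In number:
The number of partitions can grow; the number of buckets is fixed once specified.

(4) In purpose:
Partitioning avoids full-table scans: a query that filters on the partition column reads only the matching directories, which speeds it up.
Bucketing preserves the bucketed layout of the data (rows are already hash-distributed on the bucketing column).
Bucketed data makes sampling and JOINs more efficient for MapReduce jobs.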
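
As a sketch of point (4), the two queries below use the student_ptn and student_bck tables defined in this article and assume both already hold data.

```sql
-- Partition pruning: because the filter is on the partition column city,
-- only the city=beijing directory needs to be read.
select id, name from student_ptn where city = 'beijing';

-- Bucket sampling: read the first of the 4 buckets defined on id,
-- which is roughly a quarter of the rows.
select * from student_bck tablesample (bucket 1 out of 4 on id);
```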

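Differences between managed (internal) and external tables:

1. On creation, a managed table is stored under the default HDFS warehouse path and starts out empty; an external table points at a path you specify, which may already contain data.

2. On drop, a managed table loses both its data and its metadata; an external table loses only its metadata, and the data stays in place.

Tables are managed (internal) by default.

Create an external table:

create external table table_name_ext (col_name data_type)

row format delimited fields terminated by ','

location 'path';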
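
The drop behaviour in point 2 can be checked from the Hive CLI. In this sketch the table name student_ext and its columns are assumptions; the location is the HDFS data directory created earlier in this article.

```sql
create external table if not exists student_ext
(id int, name string, sex string, age int, department string)
row format delimited fields terminated by ','
location '/tmp/hive_test/data/student';

-- The "Table Type" row of the output reads EXTERNAL_TABLE.
describe formatted student_ext;

-- Dropping the table removes only its metadata ...
drop table student_ext;

-- ... so the data files are still listed afterwards.
dfs -ls /tmp/hive_test/data/student;
```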

Partitioned table:

        create external table student_ptn
        (id int, name string, sex string, age int,department string)
        partitioned by (city string)
        row format delimited fields terminated by ","
        location "/hive/student_ptn";

Add a partition:

alter table student_ptn add partition(city="beijing");
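
A partition only becomes useful once it holds data. A minimal sketch, assuming /opt/data/student.csv contains only Beijing students and carries the five non-partition columns in order:

```sql
-- The partition value comes from the PARTITION clause, not from the file.
load data local inpath '/opt/data/student.csv'
into table student_ptn partition (city = 'beijing');

-- city behaves like a regular column in queries.
select city, count(*) as students from student_ptn group by city;

-- On HDFS the rows live in a subdirectory named after the partition.
dfs -ls /hive/student_ptn/city=beijing;
```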

Bucketed table:

        create external table student_bck
        (id int, name string, sex string, age int,department string)
        clustered by (id) sorted by (id asc, name desc) into 4 buckets
        row format delimited fields terminated by ","
        location "/hive/student_bck";
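
LOAD DATA only places files and does not distribute rows into buckets, so a bucketed table is normally filled with INSERT ... SELECT. A sketch, assuming a populated student table with the same five columns:

```sql
-- On Hive 1.x, bucketing has to be switched on first:
--   set hive.enforce.bucketing = true;
-- (assumed to be unnecessary on Hive 2.x and later)
insert overwrite table student_bck
select id, name, sex, age, department from student;

-- With 4 buckets, the table directory typically holds 4 files afterwards.
dfs -ls /hive/student_bck;
```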

Create a table with CTAS:

create table student_ctas as select * from student where id < 95012;

Copy a table's structure:

create table student_copy like student;

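Note:

If the external keyword is not placed before table, the copied table is always a managed (internal) table.
If the external keyword is placed before table, the copied table is always an external table.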

List tables:

show tables;

List the tables in a database other than the current one:

show tables in db_name;

List tables whose names start with a given prefix:

show tables like 'student_*';

Show partition information:

show partitions student_ptn;

Rename a table:

alter table student rename to new_student;

Add a column:

alter table new_student add columns (score int);

Change a column definition:

alter table new_student change name new_name string;

Note: dropping a single column is not supported.

Replace all columns:

alter table new_student replace columns (id int, name string, address string);

Add partitions:

alter table student_ptn add partition(city="chongqing2") partition(city="chongqing3") partition(city="chongqing4");
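
The reverse operation, shown here as a small sketch, removes a partition. Because student_ptn is an external table, this is expected to drop only the partition's metadata and leave any files in place.

```sql
-- Remove one of the partitions added above. IF EXISTS avoids an
-- error when the partition is already gone.
alter table student_ptn drop if exists partition (city = 'chongqing4');

-- Confirm which partitions remain.
show partitions student_ptn;
```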

Drop a table:

drop table db_table;

Truncate a table (by default this works on managed tables only, so a managed table is used here):

truncate table student_ctas;

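List all functions:

show functions;

Show a function's usage:

describe function substr;

Inner join:

select sales.*,things.* from sales join things on (sales.id=things.id);

See how many MapReduce jobs Hive will use for a query:

explain select sales.*,things.* from sales join things on (sales.id=things.id);

Outer joins:

select sales.*,things.* from sales left outer join things on (sales.id=things.id);

select sales.*,things.* from sales right outer join things on (sales.id=things.id);

select sales.*,things.* from sales full outer join things on (sales.id=things.id);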
