1. CREATE TABLE syntax:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
Example:
-- creating a managed (internal) table is omitted here
-- create an external table
create external table t_sz_ext(id int,name string)
row format delimited fields terminated by '\t'
stored as textfile
location '/class03';
View the table structure:
0: jdbc:hive2://localhost:10000> desc formatted t_sz_ext;
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
| col_name | data_type | comment |
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| id | int | |
| name | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | shizhan03 | NULL |
| Owner: | hadoop | NULL |
| CreateTime: | Tue Sep 04 13:32:49 CST 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://langzi01:9000/class03 | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | transient_lastDdlTime | 1536039169 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | \t |
| | serialization.format | \t |
+-------------------------------+-------------------------------------------------------------+-----------------------+--+
29 rows selected (0.167 seconds)
Load data:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO
TABLE tablename [PARTITION (partcol1=val1, partcol2=val2, ...)]
Notes:
1. The LOAD operation is just a plain copy/move: it moves the data file to the location corresponding to the Hive table.
2. filepath can be:
a relative path, e.g. project/data1
an absolute path, e.g. /user/hive/project/data1
Example:
load data local inpath '/home/hadoop/ext.txt' into table t_sz_ext;
0: jdbc:hive2://localhost:10000> load data local inpath '/home/hadoop/ext.txt' into table t_sz_ext;
INFO : Loading data to table shizhan03.t_sz_ext from file:/home/hadoop/ext.txt
INFO : Table shizhan03.t_sz_ext stats: [numFiles=0, totalSize=0]
No rows affected (0.807 seconds)
The contents of /home/hadoop/ext.txt:
1 zhangsan
2 lisi
3 wangwu
4 zhaoliu
5 zhouqi
View the data:
0: jdbc:hive2://localhost:10000> select * from t_sz_ext;
+--------------+----------------+--+
| t_sz_ext.id | t_sz_ext.name |
+--------------+----------------+--+
| 1 | zhangsan |
| 2 | lisi |
| 3 | wangwu |
| 4 | zhaoliu |
| 5 | zhouqi |
+--------------+----------------+--+
5 rows selected (0.724 seconds)
- Difference between external and internal (managed) tables:
- After an external table is dropped, the data uploaded to HDFS still exists
- After an internal table is dropped, the data uploaded to HDFS is deleted along with it
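The difference can be sketched as follows. This is a hypothetical session, not part of the walkthrough (running it would destroy the tables used later); the behavior of each DROP is standard Hive semantics:

```sql
-- Dropping the external table removes only its metadata;
-- the files under /class03 on HDFS are left untouched.
drop table t_sz_ext;
-- dfs -ls /class03;   -- ext.txt would still be listed

-- Dropping a managed (internal) table removes both the metadata
-- and the data files under its warehouse directory.
drop table t_sz01;
-- dfs -ls /user/hive/warehouse/shizhan03.db/t_sz01;  -- directory is gone
```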
View the data on HDFS, using the dfs command:
0: jdbc:hive2://localhost:10000> dfs -ls /class03;
+--------------------------------------------------------------------------------+--+
| DFS Output |
+--------------------------------------------------------------------------------+--+
| Found 1 items |
| -rwxr-xr-x 3 hadoop supergroup 46 2018-09-04 13:46 /class03/ext.txt |
+--------------------------------------------------------------------------------+--+
2 rows selected (0.025 seconds)
View the internal table's data:
0: jdbc:hive2://localhost:10000> dfs -ls /user/hive/warehouse/shizhan03.db/t_sz01;
+-----------------------------------------------------------------------------------------------------------------+--+
| DFS Output |
+-----------------------------------------------------------------------------------------------------------------+--+
| Found 1 items |
| -rw-r--r-- 3 hadoop supergroup 46 2018-09-04 10:20 /user/hive/warehouse/shizhan03.db/t_sz01/data.txt |
+-----------------------------------------------------------------------------------------------------------------+--+
Create a partitioned table:
create table t_sz_part(id int,name string)
partitioned by (country string)
row format delimited
fields terminated by ',';
Load data:
load data local inpath '/home/hadoop/data.txt' into table t_sz_part partition(country='China');
load data local inpath '/home/hadoop/data.txt.Japan' into table t_sz_part partition(country='Japan');
0: jdbc:hive2://localhost:10000> load data local inpath '/home/hadoop/data.txt' into table t_sz_part partition(country='China');
INFO : Loading data to table shizhan03.t_sz_part partition (country=China) from file:/home/hadoop/data.txt
INFO : Partition shizhan03.t_sz_part{country=China} stats: [numFiles=1, numRows=0, totalSize=46, rawDataSize=0]
No rows affected (1.548 seconds)
Note:
* On HDFS, a partition is simply a directory created under the table's directory. Here the directory is named country=China: country is the partition column specified at table creation, and China is the partition value specified when loading the data.
* Each time data is loaded into a new partition, HDFS gains one more directory corresponding to that partition.
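This directory layout can be checked from Beeline with the dfs command. A sketch; the exact listing format follows the earlier dfs outputs, but the expected entries shown in comments are assumed, not captured output:

```sql
dfs -ls /user/hive/warehouse/shizhan03.db/t_sz_part;
-- expect one subdirectory per loaded partition, e.g.:
--   .../t_sz_part/country=China
--   .../t_sz_part/country=Japan
```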
Query the partitioned table:
0: jdbc:hive2://localhost:10000> select * from t_sz_part;
+---------------+-----------------+--------------------+--+
| t_sz_part.id | t_sz_part.name | t_sz_part.country |
+---------------+-----------------+--------------------+--+
| 1 | zhangsan | China |
| 2 | lisi | China |
| 3 | wangwu | China |
| 4 | zhaoliu | China |
| 5 | zhouqi | China |
| 1 | anbei | Japan |
| 2 | xiaoquan | Japan |
| 3 | xiaoze | Japan |
| 4 | jingkong | Japan |
| 5 | songxia | Japan |
+---------------+-----------------+--------------------+--+
10 rows selected (0.25 seconds)
Query a specific partition:
0: jdbc:hive2://localhost:10000> select * from t_sz_part where country = 'China';
+---------------+-----------------+--------------------+--+
| t_sz_part.id | t_sz_part.name | t_sz_part.country |
+---------------+-----------------+--------------------+--+
| 1 | zhangsan | China |
| 2 | lisi | China |
| 3 | wangwu | China |
| 4 | zhaoliu | China |
| 5 | zhouqi | China |
+---------------+-----------------+--------------------+--+
5 rows selected (0.435 seconds)
Grouped query:
select count(1) from t_sz_part where country = 'China' group by (name='zhangsan');
0: jdbc:hive2://localhost:10000> select count(1) from t_sz_part where country = 'China' group by (name='zhangsan');
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_local1969129417_0001
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : 2018-09-05 20:13:21,371 Stage-1 map = 100%, reduce = 100%
INFO : Ended Job = job_local1969129417_0001
+------+--+
| _c0 |
+------+--+
| 4 |
| 1 |
+------+--+
2 rows selected (2.411 seconds)
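Note that the GROUP BY key above is the boolean expression (name='zhangsan'), so the two result rows are the counts of the non-matching group (4 rows) and the matching group (1 row). A more typical grouped query groups by the column itself; a sketch against the same table:

```sql
select name, count(1)
from t_sz_part
where country = 'China'
group by name;
-- one result row per distinct name
```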
Add/drop partitions:
ALTER TABLE table_name ADD [IF NOT EXISTS] partition_spec [LOCATION 'location1']
partition_spec [LOCATION 'location2'] ...
partition_spec:
: PARTITION (partition_col=partition_col_value, partition_col=partition_col_value, ...)
ALTER TABLE table_name DROP partition_spec, partition_spec, ...
Example:
0: jdbc:hive2://localhost:10000> alter table t_sz_part add partition (country='American');
Show partitions:
0: jdbc:hive2://localhost:10000> show partitions t_sz_part;
+-------------------+--+
| partition |
+-------------------+--+
| country=American |
| country=China |
| country=Japan |
+-------------------+--+
3 rows selected (0.175 seconds)
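The DROP form shown above can be used to remove the empty partition just added. A sketch (on a managed table, dropping a partition also deletes its data directory on HDFS):

```sql
alter table t_sz_part drop partition (country='American');
show partitions t_sz_part;
-- country=American should no longer be listed
```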