Hive Table Creation Basics

Author: 我家大宝最可爱

Last modified: 2023-02-19 20:23:37

Category: Big Data Development. Tags: hive, big data, database

First published: 2023-02-10 14:52:03

Article link: https://blog.csdn.net/he_wen_jie/article/details/128970039

1. Listing databases

show databases;

2. Creating a database

create database db_name;

Hive stores all of its data on HDFS under a root warehouse directory. That directory is set by the hive.metastore.warehouse.dir parameter in hive-site.xml, and its default value is /user/hive/warehouse.
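
To check which warehouse root is actually in effect, or to place a database somewhere else, something like the following should work from the Hive CLI or Beeline. This is a minimal sketch: the database name demo_db and the path /tmp/hive_demo/demo_db.db are hypothetical and are not part of the original example.

```sql
-- Print the warehouse root currently in effect for this session
set hive.metastore.warehouse.dir;

-- Hypothetical database stored outside the warehouse root;
-- both the name and the path are assumptions for illustration
create database if not exists demo_db
comment 'example database with an explicit location'
location '/tmp/hive_demo/demo_db.db';
```

Without a location clause, the new database's directory is created under the warehouse root.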

3. Viewing database information

describe database db_name;

A Hive database is stored on HDFS at the following path:
${hive.metastore.warehouse.dir}/databasename.db
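
The path can be cross-checked against what the metastore reports. The sketch below assumes the default warehouse root /user/hive/warehouse and the placeholder database name db_name used above; adjust both to your environment. The dfs command is available in the Hive CLI, and in Beeline only if the server allows it.

```sql
-- Show the database's comment, location and properties
describe database extended db_name;

-- List the database directory on HDFS (assumes the default warehouse root)
dfs -ls /user/hive/warehouse/db_name.db;
```

Note that the built-in default database has no .db directory of its own: its tables sit directly under the warehouse root.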

4. Internal (managed) tables

Internal tables are normally stored under /user/hive/warehouse. A typical case is a table that is populated from another table:

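-- Switch to the target database
use helloworld;
-- Drop the table first if it already exists
-- drop table if exists dim_helloworld;
-- Create the Hive table, partitioned by the date string ds
create table if not exists dim_helloworld(
    rowkey string
    ,hello string
    ,world bigint
)
partitioned by (ds string);
-- Create the partition if it does not exist yet
alter table dim_helloworld add if not exists partition (ds='2022-02-02');
-- Overwrite whatever data the partition already holds
insert overwrite table dim_helloworld partition (ds='2022-02-02')
-- Sync the ODS data into the dim table. Columns are listed explicitly
-- (assuming ods_helloworld has these three columns), because select * would
-- also return the ds partition column and break the column count.
select rowkey, hello, world
from ods_helloworld
where ds = '2022-02-02';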
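
After the load, it can be useful to confirm where the table and its partition ended up. A minimal sketch, assuming the dim_helloworld table and the ds='2022-02-02' partition from the statements above exist:

```sql
use helloworld;

-- List the partitions registered in the metastore
show partitions dim_helloworld;

-- Table-level metadata: look for "Table Type: MANAGED_TABLE" and "Location"
describe formatted dim_helloworld;

-- Partition-level metadata, including the partition's own HDFS directory
describe formatted dim_helloworld partition (ds='2022-02-02');
```

Because the table is managed, dropping it removes the data files as well as the metadata.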

5. External tables

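Often, though, the data lives somewhere else. For example, data processed by Spark may be written to an HDFS location that is not where Hive stores table data by default. In that case we can create an external table on top of it.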

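-- The external keyword keeps Hive from taking ownership of the files.
-- LOCATION must be a directory, not a single file such as part-00000.
create external table if not exists dim_helloworld(
    rowkey string
    ,hello string
    ,world bigint
)
LOCATION 'hdfs://localhost:9000/user/spark/helloworld';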
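
The practical difference from an internal table shows up when the table is dropped. The sketch below uses a hypothetical table ext_helloworld_demo at a hypothetical path /tmp/hive_demo/ext_helloworld_demo, and assumes tab-separated text files; none of these come from the original example.

```sql
-- Hypothetical external table over an existing directory of tab-separated files
create external table if not exists ext_helloworld_demo (
    rowkey string
    ,hello string
    ,world bigint
)
row format delimited fields terminated by '\t'
stored as textfile
location '/tmp/hive_demo/ext_helloworld_demo';

-- Look for "Table Type: EXTERNAL_TABLE" in the output
describe formatted ext_helloworld_demo;

-- Dropping an external table removes only the metastore entry
drop table if exists ext_helloworld_demo;

-- The directory and any files in it are still on HDFS afterwards
dfs -ls /tmp/hive_demo/ext_helloworld_demo;
```

This is why external tables suit data that other jobs, such as the Spark job mentioned above, continue to own and write.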