Hive Self-Study Notes

zhizubaby0519

Original article: https://blog.csdn.net/zhizubaby0519/article/details/88541038

**************************************************************************

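Hive Overview

Hive was originally built to analyze and process massive volumes of log data

Relational databases include:

MySQL, Oracle, SQL Server

Non-relational systems:

Hive: a data warehouse tool developed at Facebook

Hive (Facebook) runs queries as MapReduce (MR) jobs

Hive Query Language (HiveQL) is its SQL-like query language

Hibernate: generates its own SQL statements automatically, much as Hive generates MR jobs from HiveQL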

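Schema: the definition of a table and its columns (column types)

Schema on write (relational databases): the schema is checked when data is written

Data types: let memory and storage be allocated more efficiently

Indexes: built to speed up queries, usually on numeric columns

Schema on read (Hive): the schema is checked when data is read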
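Schema on read can be observed directly in Hive. The sketch below assumes a hypothetical table sor_demo and a hypothetical local file sor_demo.txt with two tab-separated lines, `1<TAB>alice` and `abc<TAB>bob`; neither appears in the original notes.

```sql
-- hypothetical table, used only for this illustration
create table sor_demo(id int, name string)
row format delimited
fields terminated by '\t';

-- the load succeeds even though the second line has a non-numeric id:
-- nothing is validated when the data is written
load data local inpath './sor_demo.txt' into table sor_demo;

-- the schema is applied now, at read time; 'abc' cannot be read as an int,
-- so that row comes back with id = NULL
select id, name from sor_demo;
```

A relational database would reject the second row at insert time instead.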

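Components: Hive client, Hive server

By default Hive does not support row-level update and delete; they are available only on ACID transactional tables (Hive 0.14 and later)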
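Where row-level update and delete are unavailable, a common workaround is to rewrite the whole table (or a single partition) with the rows you want to keep. A minimal sketch, assuming the userinfo(id, username) table that is created further on in these notes; the id values are illustrative.

```sql
-- "delete" the row with id = 1 by rewriting the table without it
insert overwrite table userinfo
select id, username
from userinfo
where id <> 1;

-- "update" a value by rewriting every row and changing only the matching one
insert overwrite table userinfo
select id,
       case when id = 2 then 'two' else username end as username
from userinfo;
```

Each statement rewrites all of the table's data, so this only makes sense for batch-style changes.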

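The metastore is created in whatever directory Hive is started from: with the default embedded setup, a metastore_db/ directory is created there automatically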

**************************************************************************

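Hive Architecture and Design Principles

mkdir hive

cd hive

hive //enter the CLI

exit; //leave the CLI

dfs -ls /apps/hive/warehouse/guodandan.db/student; //run an HDFS command on the cluster from inside the CLI

The hdfs command has to start a JVM and connect to the cluster first

Once you are inside Hive, the JVM is already running

Hive is operated from the Linux shell

Common operations:

show databases; //on a shared cluster this lists everyone's databases

create database guodandan; //create your own database

show databases like 'guodan*'; //check that it exists

use guodandan; //switch to that database

set hive.cli.print.current.db=true; //show the current database name in the prompt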

vi ~/.hiverc //settings in this file are applied every time the CLI starts

set hive.cli.print.current.db=true;

set mapred.job.name=hive-cli-01;

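show tables; //list all tables in the current database

create table userinfo(id int,username string); //create a table

show tables; //confirm the table was created

insert into userinfo values(1,'one'); //insert one row; this launches a job

select * from userinfo; //query the table

drop table userinfo; //drop the table

Execution engines: MR_1, MR_2, Tez, Spark

Resource managers: YARN, Mesos, Local

Storage: HDFS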
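The execution engine in this stack can be chosen per session. A small sketch; which values actually work depends on what is installed on the cluster, which these notes do not state.

```sql
-- print the engine the current session is using
set hive.execution.engine;

-- switch this session to Tez (the other accepted values are mr and spark),
-- assuming Tez is installed on the cluster
set hive.execution.engine=tez;
```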

**************************************************************************

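Hive Basics

Hive data model:

Database
    Table (partitioned)
        Partition
            Bucket
    Table (not partitioned)
        Bucket

Data types:
numeric: tinyint, smallint, int, integer, bigint, float, double, decimal
date and time: timestamp, date
string: string, varchar, char
boolean: boolean
byte array: binary
complex: struct, map, array

Categories of operations:

DDL: create tables, drop tables, alter table structure, create and drop views, show commands

DML: insert (load) data

DQL: query data

DDL:

- Metadata: data that describes data

- Table types: mainly internal and external tables

  - Internal table (managed table): both the metadata and the data are managed by Hive. Dropping the table deletes both.

  - External table (external): the metadata is managed by Hive, but the data lives on HDFS outside Hive's control.

    Dropping the table deletes only the metadata; the data is left unchanged.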

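select 1;

select 1.0;

select 'abc';

select true;

select array(1,2,3,4); //built-in function: builds the array [1,2,3,4] from its arguments (comparable to a list)

select map('a',1,'b','def'); //arguments alternate key, value, key, value, all separated by commas

select struct(1,2); //the field names are fixed as col1, col2; you pass only the values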
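Once built, each complex value is read back with its own access syntax. A short sketch; the aliases s and t and the field names x and y are made up for the illustration.

```sql
-- array elements are indexed from 0
select array(1,2,3,4)[0];

-- map values are looked up by key
select map('a',1,'b',2)['b'];

-- struct() names its fields col1, col2, ... and they are read with a dot
select s.col1, s.col2
from (select struct(1,2) as s) t;

-- named_struct() takes name, value pairs, so you choose the field names
select s.x, s.y
from (select named_struct('x',1,'y',2) as s) t;
```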

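Uppercase words are keywords; lowercase words are names you choose yourself

The default database has no directory of its own

describe user_info;

desc user_info;

desc formatted user_info; //show the columns along with detailed table information

Tables are internal by default; an external table needs the extra keyword external. Dropping an external table deletes the metadata and leaves the table's data untouched.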
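To check which kind of table you are dealing with, the formatted description is usually enough. A brief sketch against the user_info table used in these notes:

```sql
-- in the output, the "Table Type" row reads MANAGED_TABLE for an internal
-- table and EXTERNAL_TABLE for an external one; "Location" is the data directory
describe formatted user_info;

-- turn an existing internal table into an external one, so that a later
-- drop table leaves its data files in place
alter table user_info set tblproperties('EXTERNAL'='TRUE');
```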

create table user_like_info like user_info; //copy the table structure only

drop table user_info; //the metadata is deleted first, then the actual data; the metadata is kept in the metastore

create table user_info(

id int comment 'user id',

name string comment 'user name',

salary double comment 'salary'

)comment 'user base info';

insert into user_info

select * from(

select 3,'小强',80.8

union all

select 4,'小明',25.5

)t;

partitioned by (col_name data_type) //which columns to partition by

clustered by (col_name) into 9 buckets; //bucketing
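The two clauses are fragments of a create table statement. A complete sketch that uses both, with a hypothetical table user_visit that is not part of the original notes:

```sql
-- on Hive versions before 2.0, bucketed inserts also need this setting;
-- newer versions always enforce bucketing and no longer need it
set hive.enforce.bucketing=true;

-- each dt value becomes one subdirectory of the table,
-- and within it rows are hashed on uid into 9 bucket files
create table user_visit(
  uid int,
  url string
)
partitioned by (dt string)
clustered by (uid) into 9 buckets
row format delimited
fields terminated by '\t';

-- bucketing makes sampling cheap: read only the first of the 9 buckets
select * from user_visit tablesample(bucket 1 out of 9 on uid) s;
```

The partition column is declared only in partitioned by, not in the main column list.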

alter table user_info rename to user_info_2; //rename the table

alter table user_info add columns(province string,city string); //add columns

alter table user_info replace columns (id int,name string,salary double); //replace the column list

alter table user_info change column id user_id int; //rename a column or change its type

Views: store no data, only the definition of the query

create view student2_view as select id,username from student2;

select * from student2_view;

show tables;

drop view student2_view;

****************************

DML:

CREATE TABLE student(

id string comment 'student ID',

username string comment 'name',

classid int comment 'class id',

classname string comment 'class name'

)

comment 'main student information table'

partitioned by (come_date string comment 'partitioned by enrollment year')

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\t';

cat student.txt //the fields are tab-separated

001 张一 01 计算机1班

002 张二 02 计算机2班

003 张三 03 计算机3班

004 张四 04 计算机4班

For a local file, load data copies the file

For a file already on the cluster (HDFS), load data moves it instead, which saves time

LOAD DATA LOCAL INPATH './student.txt' OVERWRITE INTO TABLE

student PARTITION (come_date=20170903);

Select * from student;

Script for loading a data file from HDFS:

LOAD DATA INPATH

'/tmp/tianliangedu/input_student_info/student.txt' OVERWRITE

INTO TABLE student PARTITION (come_date=20170904);

Select * from student;
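The copy-versus-move behaviour can be checked from inside the CLI after the HDFS load above. This sketch assumes the student table lives in the guodandan database under the warehouse path shown earlier in these notes.

```sql
-- the source directory no longer contains student.txt: the file was moved
dfs -ls /tmp/tianliangedu/input_student_info/;

-- it now sits in the directory of the partition it was loaded into
dfs -ls /apps/hive/warehouse/guodandan.db/student/come_date=20170904;
```

After a load data local, by contrast, the local ./student.txt is still in place, because the file was copied.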

Insert the result of a query into a table:
insert overwrite table student

partition(come_date='20170905')

select

id,username,classid,classname

from student

where come_date='20170904';

Select * from student;

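insert into table student partition(come_date='20170905') //student is partitioned, so the target partition must be named; this value is only an example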

select * from (

select 12,'天天',01,'计算机151'

union all

select 13,'张一',02,'计算机152'

union all

select 14,'芳华',03,'物联网151'

)t; //either into (append) or overwrite (replace) can be used here
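The difference between the two keywords shows up when the same insert is run more than once. A sketch against the student table, reusing partition values that already appear in these notes:

```sql
-- into appends: running this twice leaves two copies of the rows
-- in the come_date='20170905' partition
insert into table student partition(come_date='20170905')
select id,username,classid,classname
from student
where come_date='20170904';

-- overwrite replaces: whatever the partition held before is discarded,
-- so running this any number of times leaves exactly one copy
insert overwrite table student partition(come_date='20170905')
select id,username,classid,classname
from student
where come_date='20170904';

-- compare the row count after each variant
select count(*) from student where come_date='20170905';
```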

Columns can be declared with complex types such as struct, array, and map

create table demo(

id int,

name array<string>,

score map<string,double>,

addr struct<province:string,city:string>

)

row format delimited

fields terminated by '\t'

collection items terminated by ','

map keys terminated by ':';

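\001 (Ctrl-A): the default field delimiter

\002 (Ctrl-B): the default delimiter between the elements of a collection

\003 (Ctrl-C): the default delimiter between a map key and its value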
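Leaving out the row format clause has the same effect as spelling out these three defaults. A sketch with a hypothetical table demo_default, which is not part of the original notes:

```sql
-- equivalent to declaring the same columns with no row format clause at all
create table demo_default(
  id int,
  name array<string>,
  score map<string,double>
)
row format delimited
fields terminated by '\001'
collection items terminated by '\002'
map keys terminated by '\003';
```

Because these control characters are awkward to type in an editor, hand-written sample files usually declare visible delimiters such as '\t', ',' and ':' instead.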

vi demo

1 James,'小明' 语文:98,数学:100,英语:96 河北省,石家庄市

2 Lucy,'小红' 语文:98,数学:100,英语:96 河北省,邯郸市

3 Bob,'小玉' 语文:98,数学:100,英语:96 山东省,潍坊市

load data local inpath 'demo' into table demo;

select score['语文'] from demo;

select addr.province from demo;
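A few more ways to read the complex columns of demo, as a sketch; the aliases t, n, subject and points are made up for the illustration.

```sql
-- array: elements are indexed from 0, size() gives the number of elements
select name[0], size(name) from demo;

-- map: list all keys and all values
select map_keys(score), map_values(score) from demo;

-- explode an array into one output row per element
select id, n
from demo
lateral view explode(name) t as n;

-- explode a map into one output row per key/value pair
select id, subject, points
from demo
lateral view explode(score) t as subject, points;
```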

create table demo1(

id int,

name array<string>,

score map<string,double>,

addr struct<col1:string,col2:string>

)

row format delimited

fields terminated by '\t'

collection items terminated by ','

map keys terminated by ':';

insert into table demo1

select * from (

select 7,array('James','小敏'),

map('语文',92.0,'数学',100),

struct('河北省','武安市')

)t;

Create an external table:
create external table demo2(

id int,

name string,

salary double

)

row format delimited

fields terminated by '\t'

location '/tmp/guodandan/hive/external';

vi demo2

1 Jane 900.2

2 Mary 345.7

3 Bob 789.4

4 Lily 678.8

load data local inpath 'demo2' into table demo2;

Drop table demo2;

dfs -ls /tmp/guodandan/hive/external; //the data file is still at the external location after the drop

//re-create the table with the same statement

create external table demo2(

id int,

name string,

salary double

)

row format delimited

fields terminated by '\t'

location '/tmp/guodandan/hive/external';

select * from demo2; //the data is available again

Partitions: avoid full table scans

create table log(

uid int,

url string

)

partitioned by (dt string)

row format delimited

fields terminated by '\t';

insert into table log

partition(dt=20190204)

select * from (

select 10010,'http://www.360.com'

union all

select 10086,'http://www.10086.com'

union all

select 10000,'http://www.189.com'

)t;
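With the partition in place, a query that filters on dt reads only that partition's directory. A sketch against the log table; the warehouse path assumes the table was created in the guodandan database, and dt='20190205' is an illustrative value.

```sql
-- list the partitions that exist for the table
show partitions log;

-- filtering on the partition column means only the dt=20190204 directory is read
select uid, url from log where dt='20190204';

-- partitions can also be added and dropped explicitly
alter table log add partition (dt='20190205');
alter table log drop partition (dt='20190205');

-- each partition is a subdirectory named dt=<value> under the table directory
dfs -ls /apps/hive/warehouse/guodandan.db/log;
```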

create table log1(

uid int,

url string

)

partitioned by (dt string,province string)

row format delimited

fields terminated by '\t';

insert into table log1

partition(dt='20190204',province='ShanDong')

select * from (

select 1,'http://abc.com'

union all
