Hive简介
- Facebook开发构建与Hadoop之上的数据仓库,可以结构化数据文件映射为一张数据库表
- 提供完整的SQL查询功能,将SQL语句转换为MapReduce任务进行
- 适合静态数据分析,数据变化不频繁
- 不支持记录级别的更新插入或者删除,不支持事务
- 提供ETL数据提取转化加载,可以维护海量数据,对数据挖掘后形成任意报告表单
- Hive SQL 和 Mysql 更接近
- 分区表 按照分区条件在表目录下将数据文件分类归档,目录名是partition条件
Hive安装
1.解压安装包
2.设置环境变量
3.修改hive-site.xml配置文件
4.将mysql连接包放入hive安装路径下的lib包 /usr/local/hive/lib
5.启动hive
mysql安装
1.更新软件源
2.安装mysql
3.修改编码 character_set_server
4.给hive用户赋权限(hive 能接入数据库)
hive内部表和外部表的区别
Hive安装
1.解压安装包
下载安装包可能会用到的镜像
http://mirror.bit.edu.cn/apache/hive/
http://mirrors.hust.edu.cn/apache/hive/
http://mirrors.shu.edu.cn/apache/hive/
http://mirrors.tuna.tsinghua.edu.cn/apache/hive/
tar -zxvf apache-hive-1.2.2-bin.tar.gz -C /usr/local/
hadoop@dhjvirtualmachine:/usr/local$ sudo mv apache-hive-1.2.2-bin/ hive
sudo chown -R hadoop:hadoop hive
2.设置环境变量
编辑~/.bashrc文件
vim ~/.bashrc
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
vim ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
source ~/.bashrc
3.修改hive-site.xml配置文件
cd /usr/local/hive/conf
vi hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
</configuration>
5.启动hive
先启动Hadoop和mysql
启动Hadoop
start-all.sh
启动hive
hive
mysql安装
- 更改apt-get 源
sudo apt-get update #更新软件源
关闭Ubuntu的软件和更新中的所有源
删除/var/lib/apt/lists 所有文件
```
rm -rf /var/lib/apt/lists/*
```
修改/etc/apt/sources.list
(注意版本匹配:16.1->yakkety;16.04->xenial;15.10->willy;14.04->trusty;12.04->precise)
```
deb http://mirrors.163.com/ubuntu/ trusty main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ trusty-backports main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty-security main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty-updates main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ trusty-backports main restricted universe multiverse
```
2.安装mysql
sudo apt-get install mysql-server #安装mysql
3.修改编码 character_set_server
启动mysql
service mysql start
检查
sudo netstat -tap | grep mysql
进入mysql shell界面
mysql -u root -p
show variables like "char%"
character_set_server默认为latin1
修改配置文件
vi /etc/mysql/mysql.conf.d/mysqld.cnf
在[mysqld]下添加一行character_set_server=utf8
重启mysql
service mysql restart
进入mysql shell界面
mysql -u root -p
检查character_set_server变量
show variables like "char%"
4.给hive用户赋权限(hive 能接入数据库)
进入mysql shell界面
mysql -u root -p
创建于hive对接数据库
create database hive;
给hive用户赋权限
grant all on *.* to hive@localhost identified by 'hive';
刷新mysq系统权限关系表
flush privileges;
hive内部表和外部表的区别
1.创建内部表
create table if not exists hive.usr(
name string comment 'username',
pwd string comment 'password',
address struct<street:string,city:string,state:string,zip:int>,
comment 'home address',
identify map<int,tinyint> comment 'number,sex')
comment 'description of the table'
tblproperties('creator'='me','time'='2016.1.1');
2.创建外部表
create external table if not exists usr2(
name string,
pwd string,
address struct<street:string,city:string,state:string,zip:int>,
identify map<int,tinyint>)
row format delimited fields terminated by ','
location '/usr/local/hive/warehouse/hive.db/usr';
区别
当执行加载数据到表后
load data local inpath '/home/duhj/data/duhj.txt' into table usr;
duhj.text会被加载到HDFS分布式文件系统中,hive将hdfs数据放入表中,表的数据放在配置文指定的路径(hive-site.xml配置文件的hive.metastore.warehouse.dir属性)
当删除这张内部表的时候,会将表的元数据清除,而外部表在创建的时候location 指定了表数据存放位置(表数据不由hive管理),并不会删除该位置下的数据。
数据管理:内部表由hive管理,删表 hdf上数据也删
外部表由Hadoop管理,删表 hdf 数据还在
对表修改: 内部表 同步元数据
外部表 需要修复 MSCK REPAIR TABLE tablename;
数据导入导出 Sqoop
hive 导入mysql
bin/sqoop export
--connect jdbc:mysql://localhost:3306/database
--username root
--password root
--table user_log
--export_dir '/user/hive/warehouse/somedb.db/inner_user_log'
--fields-interminated-by ','
注:
string -> vchar
mysql导出hive
bin/sqoop export
--connect jdbc:mysql://localhost:3306/database
--username root
--password root
--query 'select id,name,age from user_log'
--target-dir /user/hive/warehouse/user_log
--delete-target-dir
--num-mappers
--compress
--compression-code org.apache.hadoop.io.compress.snappycodec
--fields-termied-by '\t'
hive 再将/user/hive/warehouse/user_log 加载到表中