Installing and Using Hive

Infernal Affairs

Article link: https://blog.csdn.net/Smple_BOY/article/details/126358061

Hive 1.2 uses MapReduce as its underlying execution engine.
Installing Hive 1.2
Extract apache-hive-1.2.2-bin.tar.gz to /usr/local.
Add a hive-site.xml file to /usr/local/apache-hive-1.2.2-bin/conf
with the following contents:
<configuration>
<property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.56.103:3306/hive</value>
        <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
   <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
   <description>Driver class name for a JDBC metastore</description>
</property>
<property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>root</value>
   <description>username to use against metastore database</description>
</property>
<property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>az63091919</value>
   <description>password to use against metastore database</description>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive3/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>

<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
    Enforce metastore schema version consistency.
    True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic
          schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
          proper metastore schema migration. (Default)
    False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
</property>


<property>
<name>tez.queue.name</name>
<value>hivequeue</value>
</property>

<property>
<name>mapreduce.job.queuename</name>
<value>hivequeue</value>
</property>
</configuration>

Edit hive-env.sh:
export HADOOP_HOME=/usr/local/hadoop-2.8.5
export HIVE_CONF_DIR=/usr/local/apache-hive-1.2.2-bin/conf
export HIVE_AUX_JARS_PATH=/usr/local/apache-hive-1.2.2-bin/conf

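The first four properties in hive-site.xml specify, in order, the MySQL connection URL, the MySQL JDBC driver class, the MySQL user name, and the MySQL password.
Copy the MySQL JDBC driver jar into /usr/local/apache-hive-1.2.2-bin/lib.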
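
Before initializing the metastore it can help to confirm that hive-site.xml actually contains the four connection properties. The following is a minimal sketch, not part of the original setup: the function name missing_metastore_props is hypothetical, and the check is a plain text match rather than real XML parsing.

```shell
#!/bin/bash
# Sketch: print the metastore connection properties that are absent from a hive-site.xml.
# missing_metastore_props is a hypothetical helper; it matches text, it does not parse XML.

missing_metastore_props() {
    local site_file="$1" prop
    for prop in \
        javax.jdo.option.ConnectionURL \
        javax.jdo.option.ConnectionDriverName \
        javax.jdo.option.ConnectionUserName \
        javax.jdo.option.ConnectionPassword
    do
        # -F: fixed string, -q: quiet; print the name only when it is not found
        grep -Fq "<name>${prop}</name>" "$site_file" || echo "$prop"
    done
}

# Default path follows the install location used in this article.
site_file="${1:-/usr/local/apache-hive-1.2.2-bin/conf/hive-site.xml}"
if [ -f "$site_file" ]; then
    missing_metastore_props "$site_file"
else
    echo "no hive-site.xml at $site_file" >&2
fi
```

No output means all four properties are present; each printed name is a property that still needs to be added.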

Add HADOOP_HOME to /etc/profile.

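Start HDFS with start-dfs.sh, then open http://node2:50070 to confirm that HDFS is working.
Start YARN with start-yarn.sh, then open http://node1:8088 to confirm that YARN is working.
Log in to the standby node and start the backup ResourceManager there: yarn-daemon.sh start resourcemanager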

Log in to MySQL: mysql -uroot -paz63091919
drop database if exists hive;
create database hive;
alter database hive character set latin1;

Initialize the Hive metastore schema in MySQL:
bin/schematool -dbType mysql -initSchema

Start Hive:
bin/hive

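In the current Linux user's home directory, create a .hiverc file and put the settings in it (Hive comments start with --):
-- print column headers in query results
set hive.cli.print.header=true;
-- show the name of the database currently in use
set hive.cli.print.current.db=true;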

--------------------------------------------

-------------------------------------------

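Hive Metastore: how it works and how to configure it
There are three modes.

Embedded mode: the Hive service, the metastore service, and the Derby database all run in the same process. Metadata is stored in the embedded Derby database, and no separate metastore service needs to be started.

Local mode: metadata is no longer stored in embedded Derby but in another database such as MySQL. The Hive service and the metastore service still run in the same process; MySQL is a separate process, either on the same machine or on a remote one. This is a multi-user mode that allows several client users to connect to one database, and it is typically used for shared Hive access inside a company. Every user must have access to MySQL, which means every client user needs to know the MySQL user name and password.
Add the following to hive-site.xml (hive.metastore.local was removed in Hive 0.10; on later releases local mode is assumed whenever hive.metastore.uris is empty):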
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>

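Remote mode
The Hive service and the metastore run in different processes, possibly on different machines. This mode requires hive.metastore.uris to be set to the URL of the metastore server (releases before Hive 0.10 also need hive.metastore.local set to false).

A remote metastore has to be started as a separate service, and every client then points to that metastore service in its configuration file. Clients connect through beeline and do not need to know the database password.

Merely connecting to a remote MySQL server does not make it "remote mode": remote refers to whether the metastore and the Hive service run in the same process.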

<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>

<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop2:9083</value>
</property>
Start the metastore service:
nohup bin/hive --service metastore &
Start the HiveServer2 service:
nohup bin/hive --service hiveserver2 &
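
Because both services are started in the background, a client may try to connect before the metastore is listening. The sketch below splits a hive.metastore.uris value into host and port and probes the port. The helper names parse_thrift_uri and port_open are hypothetical, it assumes a single URI (not a comma-separated list), and it relies on bash's /dev/tcp redirection and the coreutils timeout command being available.

```shell
#!/bin/bash
# Sketch: split a hive.metastore.uris value into host and port, then probe the port.
# parse_thrift_uri and port_open are hypothetical helper names.

parse_thrift_uri() {
    local uri="$1"
    local hostport="${uri#thrift://}"        # strip the scheme
    echo "${hostport%%:*} ${hostport##*:}"   # prints "host port"
}

port_open() {
    # Returns 0 if a TCP connection to host $1, port $2 succeeds within 3 seconds.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Probe only when a URI is given on the command line, e.g. thrift://hadoop2:9083
if [ -n "${1:-}" ]; then
    read -r host port <<< "$(parse_thrift_uri "$1")"
    if port_open "$host" "$port"; then
        echo "metastore reachable at $host:$port"
    else
        echo "cannot reach $host:$port"
    fi
fi
```

The same port_open helper could be pointed at the HiveServer2 port before starting beeline.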

-----------------------------------------

Specifying the YARN queue when starting Hive
hive --hiveconf mapreduce.job.queuename=queue

The queue can also be changed with a statement after Hive has started:
hive> set mapreduce.job.queuename=queue;

-------------------------------------

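Running Hive as a service
Start the Hive service in the background:
nohup hiveserver2 1>/dev/null 2>&1 &
It listens on port 10000 by default.

Start the Hive client beeline,
then enter !connect jdbc:hive2://node4:10000
Enter the HDFS user name: root
Press Enter at the password prompt to leave it empty.

Alternatively, connect in a single command:
bin/beeline -u jdbc:hive2://node4:10000 -n root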
----------------------------------------------------------------------------------------------------------------

Running Hive from scripts
hive -e "select * from t_user"
vi t_order_etl.sh

#!/bin/bash
hive -e "select * from db_order"
hive -e "select * from t_user"
hql="create table demo as select * from db_order"
hive -e "$hql"

Alternatively, put the SQL in an .hql file and run it with hive -f xxx.hql:
vi xxx.hql
select * from db_order;
select * from t_user;
----------------------------------------------------------------------------------------------------------------

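show databases; -- list databases
create database big24; -- create a database
This creates the directory big24.db under the warehouse directory in HDFS, for example /user/hive/warehouse/big24.db

use big24; -- switch to the big24 database

create table t_big24(id int,name string,age int,sex string); -- create a table
This creates the table directory in HDFS, for example /user/hive/warehouse/big24.db/t_big24

Hive's default field delimiter is ^A ('\001'). To type it in the vi editor on Linux, press Ctrl+V and then Ctrl+A.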
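
Typing ^A by hand is error-prone, so a test file for t_big24 can also be generated with printf. This is a small sketch; the file name t_big24.dat and the sample rows are made up for illustration.

```shell
#!/bin/bash
# Sketch: write a sample data file that uses Hive's default field delimiter \001 (Ctrl+A).
# The file name and the two sample rows are hypothetical.

data_file="${TMPDIR:-/tmp}/t_big24.dat"
sep="$(printf '\001')"   # the literal Ctrl+A byte

# Columns follow t_big24: id, name, age, sex
printf '%s\001%s\001%s\001%s\n' 1 zhangsan 18 male   >  "$data_file"
printf '%s\001%s\001%s\001%s\n' 2 lisi     20 female >> "$data_file"

# Sanity check: split on the same delimiter and print the field count of each line
awk -F "$sep" '{ print NF }' "$data_file"
```

Since t_big24 was created without a row format clause, a file like this can be loaded as is, for example with load data local inpath.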

---------------------------------------------------------------------------------------------------------------

Hive create-table statement
create table t_order(id string,create_time string,amount float,uid string)
row format delimited
fields terminated by ',';
This sets the field delimiter of the data files to a comma.

Dropping a table
drop table t_order;

--------------------------------------------------------------------------------------------------------

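Hive internal (managed) tables and external tables

External table: the user specifies the table's HDFS directory.
create external table t_access(ip string,url string,access_time string)
row format delimited
fields terminated by ','
location '/access/log';
The location clause names the directory.
Dropping an external table does not delete its directory or data files.

------------------------------

Internal table: Hive maintains the data directory in HDFS itself.
Dropping an internal table deletes the data directory together with the data files.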

--------------------------------------------------------------------------------------------------------

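Partitioned tables
/user/hive/warehouse/t_pv_log/day=2017-09-16/
/user/hive/warehouse/t_pv_log/day=2017-09-17/
A query for the data of 2017-09-16 reads only the matching directory.
A query for all the data reads the whole t_pv_log directory.

create table t_pv_log(ip string,url string,commit_time string)
partitioned by(day string)
row format delimited
fields terminated by ',';

The partition column is also returned by queries on the table. It must not have the same name as a column already in the table definition.

Load data from a local path. Note that the path must be on the machine where HiveServer2 runs:
load data local inpath '/root/hivetest/pv.log' into table t_pv_log partition(day='20170915');
If the file is in HDFS rather than on the local file system:
load data inpath '/access.log.2017-08-06.log' into table t_pv_log partition(day='20170806');
In this case the file in HDFS is moved, not copied.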
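
Loading one partition per day lends itself to a small loop. The sketch below generates one load statement per day and runs them in a single hive -f call. The helper name build_load_stmt, the file name load_pv.hql, and the /root/hivetest/pv-<day>.log naming scheme are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: generate one LOAD statement per day for t_pv_log and run them with one hive -f call.
# build_load_stmt, load_pv.hql and the pv-<day>.log naming scheme are hypothetical.

build_load_stmt() {
    local day="$1" file="$2"
    printf "load data local inpath '%s' into table t_pv_log partition(day='%s');\n" "$file" "$day"
}

hql_file="${TMPDIR:-/tmp}/load_pv.hql"
: > "$hql_file"   # create the file, or truncate an old one

for day in 20170915 20170916 20170917; do
    build_load_stmt "$day" "/root/hivetest/pv-${day}.log" >> "$hql_file"
done

if command -v hive >/dev/null 2>&1; then
    hive -f "$hql_file"
else
    cat "$hql_file"   # no hive on PATH: just show the generated statements
fi
```

Collecting the statements into one file avoids starting a separate Hive session for each day.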