Hive Operations

I. Hive Configuration

Preliminaries:

MySQL and Hive can be installed on a single virtual machine.
Hive is used mostly for big-data analysis; note that an idle connection is dropped automatically after a long time.
If MySQL and Hive are not on the same virtual machine, remote access to MySQL must be enabled.
Hive gives the data stored in HDFS a structure.
Creating a database in Hive is equivalent to creating a directory in HDFS.
HBase stores its data on Hadoop (HDFS) and supports only row-level transactions.
Hive runs its queries as MapReduce jobs; it supports table-level transactions, but not transactions across multiple tables.
HBase does not query through MapReduce; it has its own retrieval mechanism.
Hive is typically used for offline data processing, so it is comparatively slow.
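
You can see the database-to-directory mapping for yourself with a minimal sketch like the following (assuming the default warehouse location /user/hive/warehouse; the database name demo_db is made up):

hive -e "CREATE DATABASE demo_db;"
hdfs dfs -ls /user/hive/warehouse     # a demo_db.db directory now exists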

1. First, install MySQL on the virtual machine

1) Download the MySQL yum repository package, mysql57-community-release-el7-11.noarch.rpm, from the MySQL site.

2) Upload it to the virtual machine and install from the command line (note: operate as the root user):

① rpm -Uvh mysql57-community-release-el7-11.noarch.rpm

rpm: the RPM software package manager

② yum repolist all | grep mysql (list the available MySQL versions)
③ yum -y install mysql-community-server (install MySQL)

④ systemctl start mysqld (start MySQL)
⑤ grep 'temporary password' /var/log/mysqld.log (view the initial password; note: it is 12 characters long)
⑥ mysql -uroot -p (log in with the temporary password), then change the password to P@ssw0rd:
ALTER USER 'root'@'localhost' IDENTIFIED BY 'P@ssw0rd';
⑦ mysql -uroot -pP@ssw0rd (log in to MySQL with the new password)
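
If MySQL and Hive end up on different machines (see the preliminaries), the account used by the metastore also needs remote-access rights. A minimal sketch, assuming root/P@ssw0rd is used as the metastore account as in the hive-site.xml below (MySQL 5.7 syntax):

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'P@ssw0rd';
FLUSH PRIVILEGES;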

3) Other MySQL operations:

systemctl status mysqld (check MySQL's status)
systemctl stop mysqld (stop MySQL)
rpm -qa | grep mysql (list installed MySQL packages via the rpm package manager)
yum -y remove mysql (uninstall MySQL)

2. Download and install Hive, then configure and run it

1) Hive download address: apache.fayea.com/hive

Official site: hive.apache.org

2) Upload and extract Hive, then rename the extracted directory

Extract: tar -zxvf apache-hive-2.3.0-bin.tar.gz
Rename: mv apache-hive-2.3.0-bin hive (note: so that Hive can later be started by simply typing hive and pressing Enter)
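
For the bare hive command to work from any directory, Hive's bin directory must also be on the PATH. A hedged sketch for the hadoop user's ~/.bashrc, assuming Hive was extracted to /home/hadoop/hive as above:

export HIVE_HOME=/home/hadoop/hive
export PATH=$PATH:$HIVE_HOME/bin
# apply the change: source ~/.bashrc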

3) Configure Hive

① cd hive
② cd conf
③ cp hive-default.xml.template hive-site.xml (this creates the configuration file, which must then be edited)

④ The edited hive-site.xml looks as follows:
[hadoop@hadoop-node2 conf]$ cat hive-site.xml 
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
       http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
  <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
  <!-- WARNING!!! Any changes you make to this file will be ignored by Hive.   -->
  <!-- WARNING!!! You must make your changes in hive-site.xml instead.         -->
  <!-- Hive Execution Parameters -->
	<property>
		<name>hive.exec.local.scratchdir</name>
		<value>/home/hadoop/hive/tmp/hive</value>
		<description>Local scratch space for Hive jobs</description>
	</property>
	<property>
		<name>hive.downloaded.resources.dir</name>
		<value>/home/hadoop/hive/tmp/${hive.session.id}_resources</value>
		<description>Temporary local directory for added resources in the remote file system.</description>
	</property>
	<property>
		<name>hive.querylog.location</name>
		<value>/home/hadoop/hive/tmp/hive</value>
		<description>Location of Hive run time structured log file</description>
	</property>
	<property>
		<name>hive.server2.logging.operation.log.location</name>
		<value>/home/hadoop/hive/tmp/hive/operation_logs</value>
		<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
	</property>	
	<property>
		<name>hive.exec.scratchdir</name>
		<value>/tmp/hive</value>
		<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
	</property>	
	<property>
		<name>hive.scratch.dir.permission</name>
		<value>700</value>
		<description>The permission for the user specific scratch directories that get created.</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>P@ssw0rd</value>
		<description>password to use against metastore database</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://localhost:3306/hive_metadata?createDatabaseIfNotExist=true&amp;characterEncoding=utf8</value>
		<description>
		  JDBC connect string for a JDBC metastore.
		  To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
		  For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
		</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
		<description>Driver class name for a JDBC metastore</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
		<description>Username to use against metastore database</description>
	</property>
	<property>
		<name>datanucleus.schema.autoCreateAll</name>
		<value>true</value>
		<description>Auto creates necessary schema on a startup if one doesn't exist. Set this to false, after creating it once.To enable auto create also set hive.metastore.schema.verification=false. Auto creation is not recommended for production use cases, run schematool command instead.</description>
	</property>
	<property>
		<name>hive.metastore.schema.verification</name>
		<value>false</value>
		<description>
		  Enforce metastore schema version consistency.
		  True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default) False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
		</description>
	</property>
</configuration>
⑤ Create Hive's temporary directories

If the temporary directories are not created, the data is saved by default under tmp at the root of the filesystem:
cd /tmp ---> there you can find the runtime data
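
A minimal sketch of creating the local directory referenced by the hive-site.xml above (the HDFS scratch dir /tmp/hive is created by Hive itself, as the config's description notes):

mkdir -p /home/hadoop/hive/tmp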
⑥ Detailed configuration steps (illustrated with screenshots in the original).
⑦ If an error about the missing MySQL JDBC driver appears (shown as a screenshot in the original):
Solution:
Upload a MySQL driver jar into the lib directory under apache-hive-2.3.0-bin, and the problem is solved!
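
A minimal sketch, assuming the Connector/J jar has been downloaded to the current directory (the exact version number here is an assumption; after the rename above, the directory is hive/lib):

cp mysql-connector-java-5.1.44-bin.jar /home/hadoop/hive/lib/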

4) Start and run

① Start the Hive metastore:
hive --service metastore (note: this creates the metastore database in MySQL); it only needs to be started once.
② After the first step completes, log in to MySQL: mysql -uroot -pP@ssw0rd
then list the databases: show databases;
and you can see the newly created database hive_metadata.

③ View the tables in the hive_metadata database:
first: use hive_metadata;

then: show tables;

④ Start the Hive client:
Starting hive:
hive --service metastore (run only once, right after configuration is complete)
hive
set hive.cli.print.current.db=true; (show the current database in the prompt)
Starting hive2:
hiveserver2 (serves concurrent remote clients; run only once, right after configuration is complete)
beeline -u jdbc:hive2://
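
Beeline accepts ordinary HiveQL plus its own ! meta-commands; a short hedged sketch (the host/port form assumes hiveserver2's default port 10000):

beeline -u jdbc:hive2://localhost:10000    # or jdbc:hive2:// for embedded mode, as above
show databases;                            -- ordinary HiveQL works here
!quit                                      -- beeline meta-command to exit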
3. Problems and solutions:

1) SLF4J jar conflict (the error message was shown as a screenshot in the original):

Cause: Hive depends on Hadoop, and both Hive and Hadoop ship this jar, so the two jars conflict.
Solution: in Hive's lib directory, delete or rename slf4j-log4j12-1.7.10.jar.

2)

/home/hadoop/hive/bin/ext/metastore.sh: line 29: export: ` -Dproc_metastore -Dlog4j.configurationFile=hive-log4j2.properties
-Djava.util.logging.config.file=/home/hadoop/hive/conf/parquet-logging.properties ': not a valid identifier

II. Hive Table Operations

1. Creating tables (note: all of the following operations are performed inside Hive)

1) First create a database in Hive: dsj101

CREATE DATABASE IF NOT EXISTS dsj101
COMMENT 'my test hive';

2) Switch to that database with: use dsj101;

3) Create a table: emp

CREATE TABLE IF NOT EXISTS emp (
ename string,
age int,
cishu int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';

4) Load the data:

LOAD DATA LOCAL INPATH '/home/hadoop/hive/examples/files/emp.txt'
OVERWRITE INTO TABLE emp;
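
The loaded file must match the pipe delimiter declared above; a hypothetical sample of emp.txt (the names and numbers are made up):

tom|25|3
jack|30|5
mary|28|2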

5) View the data loaded into emp:

select * from emp;

2. DDL operations:

1) Adding a column

First write a text file: vi persons.txt (its contents were shown as a screenshot in the original)
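
A hypothetical example of what persons.txt might contain (tab-separated, matching the table definition below; the names are made up):

1	zhangsan
2	lisi
3	wangwu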


2) Create a table:

create table persons (id int, name string) row format delimited fields
terminated by '\t';
load data local inpath '/home/hadoop/persons.txt' into table persons;
select * from persons;
alter table persons add columns (age int); -- add a new column (Hive requires the keyword COLUMNS)
select * from persons; -- existing rows show NULL in the new age column

3) UPDATE syntax:

Add the following to the hive-site.xml file:

<property>
    <name>hive.optimize.sort.dynamic.partition</name>
    <value>false</value>
</property>
<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
</property>
<property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
</property>
<property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
</property>
<property>
    <name>hive.in.test</name>
    <value>true</value>
</property>
Create-table statement and requirements:

Hive places specific syntax requirements on tables that use the UPDATE feature:
(1) The table must be created with the buckets (bucketing) property.
(2) The table must use a supported storage format; other formats, such as parquet, are not yet supported. Currently only ORCFileformat and AcidOutputFormat are supported.
(3) The table must be created with the parameter ('transactional' = 'true').
Example:
create table student (id bigint, name string) clustered by (name) into 2 buckets stored as orc TBLPROPERTIES ('transactional' = 'true');

The update statement:
update student set id = 444 where name = 'tom';
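
A short hedged end-to-end check on the transactional student table above (the rows are made up):

insert into student values (1, 'tom'), (2, 'jerry');
update student set id = 444 where name = 'tom';
select * from student;   -- tom's id is now 444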

3. Queries:

1) select current_database();
2) select * from employees where salary > 2000;
3) select employee_name from employees;
4) select distinct department from employees; -- eliminates duplicates
5) select count(*) from employees;
6) select * from employees limit 5,6;
7) select department, salary from employees;
8) select department, avg(salary) from employees group by department;
9) select department, avg(salary) from employees group by department having avg(salary) > 2000;
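
These queries assume an employees table along the following lines (a hypothetical sketch; the columns are inferred from the queries above):

create table employees (
employee_name string,
department string,
salary double
);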

4. For more on DDL and DML, see the Hive LanguageManual on the Apache Hive wiki.


III. Hive Documentation

1. Documentation address: hive.apache.org

2. Path to the documentation: starting from hive.apache.org in the address bar, follow the site's documentation links (the remaining steps were shown as screenshots in the original).
