Hive Operations

I. Hive Configuration

Preliminaries:

MySQL and Hive can be installed on a single virtual machine.
Hive is used mostly for big-data analysis; note that an idle connection is dropped automatically after a long time.
If MySQL and Hive are not on the same virtual machine, remote access to MySQL must be enabled.
Hive gives the data stored in HDFS a structure.
Creating a database in Hive is equivalent to creating a directory in HDFS.
HBase stores its data on Hadoop (HDFS) and supports only row-level transactions.
Hive runs its queries as MapReduce jobs; it supports table-level transactions, but not transactions across multiple tables.
HBase does not query through MapReduce; it has its own retrieval mechanism.
Hive is typically used for offline data processing, so it is comparatively slow.
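
You can see the database-to-directory mapping for yourself with a minimal sketch like the following (assuming the default warehouse location /user/hive/warehouse; the database name demo_db is made up):

hive -e "CREATE DATABASE demo_db;"
hdfs dfs -ls /user/hive/warehouse     # a demo_db.db directory now exists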

1. First, install MySQL on the virtual machine

1) Download the MySQL yum repository package, mysql57-community-release-el7-11.noarch.rpm, from the MySQL site.

2) Upload it to the virtual machine and install from the command line (note: operate as the root user):

① rpm -Uvh mysql57-community-release-el7-11.noarch.rpm

rpm: the RPM software package manager

② yum repolist all | grep mysql (list the available MySQL versions)
③ yum -y install mysql-community-server (install MySQL)

④ systemctl start mysqld (start MySQL)
⑤ grep 'temporary password' /var/log/mysqld.log (view the initial password; note: it is 12 characters long)
⑥ mysql -uroot -p (log in with the temporary password), then change the password to P@ssw0rd:
ALTER USER 'root'@'localhost' IDENTIFIED BY 'P@ssw0rd';
⑦ mysql -uroot -pP@ssw0rd (log in to MySQL with the new password)
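
If MySQL and Hive end up on different machines (see the preliminaries), the account used by the metastore also needs remote-access rights. A minimal sketch, assuming root/P@ssw0rd is used as the metastore account as in the hive-site.xml below (MySQL 5.7 syntax):

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'P@ssw0rd';
FLUSH PRIVILEGES;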

3) Other MySQL operations:

systemctl status mysqld (check MySQL's status)
systemctl stop mysqld (stop MySQL)
rpm -qa | grep mysql (list installed MySQL packages via the rpm package manager)
yum -y remove mysql (uninstall MySQL)

2. Download and install Hive, then configure and run it

1) Hive download address: apache.fayea.com/hive

Official site: hive.apache.org

2) Upload and extract Hive, then rename the extracted directory

Extract: tar -zxvf apache-hive-2.3.0-bin.tar.gz
Rename: mv apache-hive-2.3.0-bin hive (note: so that Hive can later be started by simply typing hive and pressing Enter)
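
For the bare hive command to work from any directory, Hive's bin directory must also be on the PATH. A hedged sketch for the hadoop user's ~/.bashrc, assuming Hive was extracted to /home/hadoop/hive as above:

export HIVE_HOME=/home/hadoop/hive
export PATH=$PATH:$HIVE_HOME/bin
# apply the change: source ~/.bashrc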

3) Configure Hive

① cd hive
② cd conf
③ cp hive-default.xml.template hive-site.xml (this creates the configuration file, which must then be edited)

④ The edited hive-site.xml looks as follows:
[hadoop@hadoop-node2 conf]$ cat hive-site.xml 
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
       http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
  <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
  <!-- WARNING!!! Any changes you make to this file will be ignored by Hive.   -->
  <!-- WARNING!!! You must make your changes in hive-site.xml instead.         -->
  <!-- Hive Execution Parameters -->
	<property>
		<name>hive.exec.local.scratchdir</name>
		<value>/home/hadoop/hive/tmp/hive</value>
		<description>Local scratch space for Hive jobs</description>
	</property>
	<property>
		<name>hive.downloaded.resources.dir</name>
		<value>/home/hadoop/hive/tmp/${hive.session.id}_resources</value>
		<description>Temporary local directory for added resources in the remote file system.</description>
	</property>
	<property>
		<name>hive.querylog.location</name>
		<value>/home/hadoop/hive/tmp/hive</value>
		<description>Location of Hive run time structured log file</description>
	</property>
	<property>
		<name>hive.server2.logging.operation.log.location</name>
		<value>/home/hadoop/hive/tmp/hive/operation_logs</value>
		<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
	</property>	
	<property>
		<name>hive.exec.scratchdir</name>
		<value>/tmp/hive</value>
		<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
	</property>	
	<property>
		<name>hive.scratch.dir.permission</name>
		<value>700</value>
		<description>The permission for the user specific scratch directories that get created.</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>P@ssw0rd</value>
		<description>password to use against metastore database</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://localhost:3306/hive_metadata?createDatabaseIfNotExist=true&amp;characterEncoding=utf8</value>
		<description>
		  JDBC connect string for a JDBC metastore.
		  To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
		  For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
		</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
		<description>Driver class name for a JDBC metastore</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
		<description>Username to use against metastore database</description>
	</property>
	<property>
		<name>datanucleus.schema.autoCreateAll</name>
		<value>true</value>
		<description>Auto creates necessary schema on a startup if one doesn't exist. Set this to false, after creating it once.To enable auto create also set hive.metastore.schema.verification=false. Auto creation is not recommended for production use cases, run schematool command instead.</description>
	</property>
	<property>
		<name>hive.metastore.schema.verification</name>
		<value>false</value>
		<description>
		  Enforce metastore schema version consistency.
		  True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default) False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
		</description>
	</property>
</configuration>
⑤ Create Hive's temporary directories

If the temporary directories are not created, the data is saved by default under tmp at the root of the filesystem:
cd /tmp ---> there you can find the runtime data
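
A minimal sketch of creating the local directory referenced by the hive-site.xml above (the HDFS scratch dir /tmp/hive is created by Hive itself, as the config's description notes):

mkdir -p /home/hadoop/hive/tmp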
⑥ Detailed configuration steps (illustrated with screenshots in the original).
⑦ If an error about the missing MySQL JDBC driver appears (shown as a screenshot in the original):
Solution:
Upload a MySQL driver jar into the lib directory under apache-hive-2.3.0-bin, and the problem is solved!
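
A minimal sketch, assuming the Connector/J jar has been downloaded to the current directory (the exact version number here is an assumption; after the rename above, the directory is hive/lib):

cp mysql-connector-java-5.1.44-bin.jar /home/hadoop/hive/lib/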

4) Start and run

① Start the Hive metastore:
hive --service metastore (note: this creates the metastore database in MySQL); it only needs to be started once.
② After the first step completes, log in to MySQL: mysql -uroot -pP@ssw0rd
then list the databases: show databases;
and you can see the newly created database hive_metadata.

③ View the tables in the hive_metadata database:
first: use hive_metadata;

then: show tables;

④ Start the Hive client:
Starting hive:
hive --service metastore (run only once, right after configuration is complete)
hive
set hive.cli.print.current.db=true; (show the current database in the prompt)
Starting hive2:
hiveserver2 (serves concurrent remote clients; run only once, right after configuration is complete)
beeline -u jdbc:hive2://
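
Beeline accepts ordinary HiveQL plus its own ! meta-commands; a short hedged sketch (the host/port form assumes hiveserver2's default port 10000):

beeline -u jdbc:hive2://localhost:10000    # or jdbc:hive2:// for embedded mode, as above
show databases;                            -- ordinary HiveQL works here
!quit                                      -- beeline meta-command to exit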
3. Problems and solutions:

1) SLF4J jar conflict (the error message was shown as a screenshot in the original):

Cause: Hive depends on Hadoop, and both Hive and Hadoop ship this jar, so the two jars conflict.
Solution: in Hive's lib directory, delete or rename slf4j-log4j12-1.7.10.jar.

2)

/home/hadoop/hive/bin/ext/metastore.sh: line 29: export: ` -Dproc_metastore -Dlog4j.configurationFile=hive-log4j2.properties
-Djava.util.logging.config.file=/home/hadoop/hive/conf/parquet-logging.properties ': not a valid identifier

II. Hive Table Operations

1. Creating tables (note: all of the following operations are performed inside Hive)

1) First create a database in Hive: dsj101

CREATE DATABASE IF NOT EXISTS dsj101
COMMENT 'my test hive';

2) Switch to that database with: use dsj101;

3) Create a table: emp

CREATE TABLE IF NOT EXISTS emp (
ename string,
age int,
cishu int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';

4) Load the data:

LOAD DATA LOCAL INPATH '/home/hadoop/hive/examples/files/emp.txt'
OVERWRITE INTO TABLE emp;
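
The loaded file must match the pipe delimiter declared above; a hypothetical sample of emp.txt (the names and numbers are made up):

tom|25|3
jack|30|5
mary|28|2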

5) View the data loaded into emp:

select * from emp;

2. DDL operations:

1) Adding a column

First write a text file: vi persons.txt (its contents were shown as a screenshot in the original)
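
A hypothetical example of what persons.txt might contain (tab-separated, matching the table definition below; the names are made up):

1	zhangsan
2	lisi
3	wangwu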


2) Create a table:

create table persons (id int, name string) row format delimited fields
terminated by '\t';
load data local inpath '/home/hadoop/persons.txt' into table persons;
select * from persons;
alter table persons add columns (age int); -- add a new column (Hive requires the keyword COLUMNS)
select * from persons; -- existing rows show NULL in the new age column

3) UPDATE syntax:

Add the following to the hive-site.xml file:

<property>
    <name>hive.optimize.sort.dynamic.partition</name>
    <value>false</value>
</property>
<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
</property>
<property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
</property>
<property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
</property>
<property>
    <name>hive.in.test</name>
    <value>true</value>
</property>
Create-table statement and requirements:

Hive places specific syntax requirements on tables that use the UPDATE feature:
(1) The table must be created with the buckets (bucketing) property.
(2) The table must use a supported storage format; other formats, such as parquet, are not yet supported. Currently only ORCFileformat and AcidOutputFormat are supported.
(3) The table must be created with the parameter ('transactional' = 'true').
Example:
create table student (id bigint, name string) clustered by (name) into 2 buckets stored as orc TBLPROPERTIES ('transactional' = 'true');

The update statement:
update student set id = 444 where name = 'tom';
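
A short hedged end-to-end check on the transactional student table above (the rows are made up):

insert into student values (1, 'tom'), (2, 'jerry');
update student set id = 444 where name = 'tom';
select * from student;   -- tom's id is now 444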

3. Queries:

1) select current_database();
2) select * from employees where salary > 2000;
3) select employee_name from employees;
4) select distinct department from employees; -- eliminates duplicates
5) select count(*) from employees;
6) select * from employees limit 5,6;
7) select department, salary from employees;
8) select department, avg(salary) from employees group by department;
9) select department, avg(salary) from employees group by department having avg(salary) > 2000;
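
These queries assume an employees table along the following lines (a hypothetical sketch; the columns are inferred from the queries above):

create table employees (
employee_name string,
department string,
salary double
);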

4. For more on DDL and DML, see the Hive LanguageManual on the Apache Hive wiki.


III. Hive Documentation

1. Documentation address: hive.apache.org

2. Path to the documentation: starting from hive.apache.org in the address bar, follow the site's documentation links (the remaining steps were shown as screenshots in the original).
