Hive is a data warehouse solution built on top of Hadoop. Because Hadoop itself offers excellent scalability and fault tolerance for both storage and computation, a warehouse built with Hive inherits those same properties.
Put simply, Hive adds a SQL layer on top of Hadoop: it translates SQL into MapReduce jobs and runs them on the cluster. This lets data developers and analysts use familiar SQL to run statistics and analysis over massive data sets, instead of going to the trouble of writing MapReduce programs by hand.
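You can see this translation for yourself once the setup below is done: prefix a query with EXPLAIN and Hive prints the MapReduce stages it compiles the SQL into (xxx_log is the demo table created at the end of this post):
>EXPLAIN select ip,count(*) from xxx_log group by ip;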
First, take care of the MySQL side.
Create a database for Hive, plus a hadoop user:
>create database hive;
>CREATE USER 'hadoop'@'%' IDENTIFIED BY 'mysql';
>GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'%' WITH GRANT OPTION;
>flush privileges;
My MySQL server's hostname is mysql (which is why the JDBC URL below reads jdbc:mysql://mysql/hive).
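Before moving on, it's worth confirming the new account actually works; a quick check, assuming the mysql client is installed on the Hive node:
mysql -h mysql -u hadoop -pmysql -e 'show databases;'
If the grants are in place, this lists the hive database without a permission error.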
Install Hive:
Download: wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
Extract: tar -zxf apache-hive-2.3.3-bin.tar.gz
Rename: mv apache-hive-2.3.3-bin hive
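Optionally, put Hive on your PATH so hive and schematool can be run from anywhere; a minimal sketch, assuming a bash shell and the paths used in this post (append to ~/.bashrc, then run source ~/.bashrc):
export HIVE_HOME=/data/bigdata/hive
export PATH=$PATH:$HIVE_HOME/bin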
Configure Hive:
1. cd hive/conf
2. cp hive-env.sh.template hive-env.sh
Edit hive-env.sh:
HADOOP_HOME=/data/bigdata/hadoop-2.7.6
export HIVE_CONF_DIR=/data/bigdata/hive/conf
export HIVE_AUX_JARS_PATH=/data/bigdata/hive/lib
cp hive-default.xml.template hive-site.xml
Edit hive-site.xml:
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>node1</value>
  <description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/data/bigdata/hive/iotmp/${user.name}</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/data/bigdata/hive/iotmp/${hive.session.id}_resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mysql</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://mysql/hive</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>
    Enforce metastore schema version consistency.
    True: Verify that the version information stored in the metastore is compatible with that of the Hive jars. Also disable
    automatic schema migration attempts. Users are required to manually migrate the schema after a Hive upgrade, which
    ensures proper metastore schema migration. (Default)
    False: Warn if the version information stored in the metastore doesn't match that of the Hive jars.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/data/bigdata/hive/iotmp/${user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/data/bigdata/hive/iotmp/${user.name}/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/data/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/data/hive/tmp/</value>
  <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
:wq to save and exit.
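None of the directories referenced above exist yet. Before starting Hive, create the local scratch directory and the two HDFS paths; the permissions follow the descriptions in hive-site.xml (733 for the scratch root, group-writable for the warehouse):
mkdir -p /data/bigdata/hive/iotmp
hdfs dfs -mkdir -p /data/hive/warehouse
hdfs dfs -mkdir -p /data/hive/tmp
hdfs dfs -chmod g+w /data/hive/warehouse
hdfs dfs -chmod 733 /data/hive/tmp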
3. cp hive-log4j2.properties.template hive-log4j2.properties
Edit hive-log4j2.properties:
property.hive.log.dir = /data/hive/logs
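This log path is on the local filesystem; log4j2 will usually create it on first write, but it doesn't hurt to create it up front:
mkdir -p /data/hive/logs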
4. Download the mysql-connector-java-5.1.46.jar package and put it in the hive/lib directory.
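One place to fetch it is Maven Central (any copy of the 5.1.46 driver jar will do; this URL is just one known mirror):
wget -P /data/bigdata/hive/lib https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.46/mysql-connector-java-5.1.46.jar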
5. Initialize the metastore schema:
bin/schematool -initSchema -dbType mysql
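If this succeeds, schematool can report the schema version it just wrote, which is a handy sanity check:
bin/schematool -dbType mysql -info
You should also now see the metastore tables (TBLS, DBS, and so on) inside the hive database in MySQL.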
Start the services:
nohup hive --service metastore > metastore.log 2>&1 &
nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &
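By default the metastore listens on port 9083 and HiveServer2 on port 10000, so you can quickly confirm both are up and then connect through beeline (node1 is the bind host configured earlier):
netstat -nltp | grep -E '9083|10000'
beeline -u jdbc:hive2://node1:10000 -n hadoop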
Enter the Hive client and create a table:
hive
>create table xxx_log (
  ip string, time1 string, time2 string, time3 string, time4 string,
  method string, path string, proc string, status int,
  unknown1 string, unknown2 string, unknown3 string, myip string,
  unknown4 string, unknown5 string, unknown6 string, unknown7 string,
  requesttime string)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE;
Load a pre-formatted nginx log (space-delimited fields, matching the table definition):
>load data local inpath '/data/xxx.access.log' into table xxx_log;
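A quick sanity check that the rows actually landed:
>select * from xxx_log limit 5;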
Finally, you can query it (the syntax is essentially the same as in MySQL):
>select ip,count(*) as num from xxx_log group by ip;
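GROUP BY, ORDER BY and LIMIT combine as you'd expect, so the top 10 clients by request count is one line (watch the console: Hive runs it as MapReduce jobs behind the scenes):
>select ip,count(*) as num from xxx_log group by ip order by num desc limit 10;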