Big Data Primer (15): Hive Introduction and Configuration

沙漏无语
Column: Big Data Primer (大数据入门)
Article link: https://blog.csdn.net/estelle_belle/article/details/83928762
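1. Upload the archive and extract it into the app directory
    tar  -zxvf  FILE   -C   app
2. Running without any configuration
   Start:        ./hive   (from /home/admin/app/hive/bin)
   Create table: create table t_1(id int ,name string);
   List tables:  show tables;
   Quit:         exit;

This generates a metastore_db directory in the current directory.

If you quit and then start hive from the parent directory, the table is no longer visible, because metastore_db exists only under bin. By default Hive uses an embedded Derby database, which creates metastore_db in whatever directory hive is launched from. A further drawback: only one session can be open at a time.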
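
Why the table seems to vanish can be sketched in a few lines. Hive's default Derby connection URL (jdbc:derby:;databaseName=metastore_db;create=true) names the database with a relative path, so it resolves against the launch directory. The helper metastore_location below is hypothetical and only mimics that path resolution; it does not talk to Derby.

```python
from pathlib import PurePosixPath

# Relative database name taken from Hive's default Derby connection URL.
DERBY_DB_NAME = "metastore_db"


def metastore_location(working_dir: str) -> str:
    """Hypothetical helper: where a relative Derby database ends up
    when hive is launched from working_dir."""
    return str(PurePosixPath(working_dir) / DERBY_DB_NAME)


from_bin = metastore_location("/home/admin/app/hive/bin")
from_parent = metastore_location("/home/admin/app/hive")

print(from_bin)     # /home/admin/app/hive/bin/metastore_db
print(from_parent)  # /home/admin/app/hive/metastore_db
```

Two launch directories give two separate metastores, so a table created in one is unknown to the other.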

3. Configuring Hive to use MySQL as the metastore database
    3.1. Set the environment variables (/etc/profile)
        
        JAVA_HOME=/home/admin/app/java/jdk1.7.0_71
        HIVE_HOME=/home/admin/app/hive
        HADOOP_HOME=/home/admin/app/hadoop-2.4.1
        PATH=$HIVE_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
        CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
        export JAVA_HOME HIVE_HOME HADOOP_HOME  PATH CLASSPATH

    3.2. Create the configuration file hive-env.sh (in $HIVE_HOME/conf)
        cp hive-env.sh.template  hive-env.sh
        Edit the file:
        export JAVA_HOME=/home/admin/app/java/jdk1.7.0_71
        export HIVE_HOME=/home/admin/app/hive
        export HADOOP_HOME=/home/admin/app/hadoop-2.4.1
    3.3. Create the configuration file hive-site.xml
        vi  hive-site.xml

    3.4. Contents of hive-site.xml (MySQL must be installed first, or use a MySQL instance running on Windows; either way it must allow remote connections)

    Add the following:
    
<configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://192.168.1.2:3306/hive?createDatabaseIfNotExist=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>username to use against metastore database</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>root</value>
      <description>password to use against metastore database</description>
    </property>
</configuration>
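
A typo in a property name is easy to miss in this file, so a quick check can help before starting Hive. The following is a minimal sketch using only Python's standard library; read_properties and REQUIRED are hypothetical names, and the XML is embedded here only to keep the example self-contained (in practice you would read $HIVE_HOME/conf/hive-site.xml).

```python
import xml.etree.ElementTree as ET

# Same four properties as the hive-site.xml above (descriptions omitted).
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.1.2:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>"""

# Settings the MySQL-backed metastore in this article relies on.
REQUIRED = [
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
]


def read_properties(xml_text: str) -> dict:
    """Hypothetical helper: map each <name> to its <value>."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


props = read_properties(HIVE_SITE)
missing = [name for name in REQUIRED if name not in props]
print(missing)  # []
```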
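4. Copy the MySQL JDBC connector jar into $HIVE_HOME/lib

    sftp> cd /home/admin/app/hive/lib
    sftp> put e:\soft\mysql-connector-java-5.1.28.jar

    Note: when using an external MySQL instance, change the character set of the MySQL database to latin1; otherwise creating tables fails.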

5. Start Hive with bin/hive
    show tables;
    load dat

    show tables;
    create table z_1(id int,name string);   (requires the MySQL database character set to be latin1)

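6. Creating an internal (managed) table (tables are internal by default)
    create table t_order(id int,name string,velocity string,price double)  row format delimited fields terminated by '\t';

(1) How it works
    Every table in Hive has a corresponding directory that stores its data. For example, a table named test is stored in HDFS at /warehouse/test.
    Here warehouse is the data warehouse directory specified by ${hive.metastore.warehouse.dir} in hive-site.xml; the data of all tables (excluding external tables) is kept under this directory.
    
(2) Paths in HDFS

    Default HDFS path: hdfs://ns1/user/hive/warehouse
    Tables in the default database sit directly under warehouse.

    http://192.168.1.113:50070/explorer.html#/user/hive/warehouse

    Create a database with create database wek01; and then create the table t_order_01 in it.
    HDFS path: warehouse/wek01.db/t_order_01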
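
The path rule above can be written down as a small function. This is a sketch under the assumption that neither the database nor the table was created with a custom LOCATION; table_dir is a hypothetical name, and /user/hive/warehouse is the default warehouse directory shown above.

```python
WAREHOUSE = "/user/hive/warehouse"  # default hive.metastore.warehouse.dir


def table_dir(table: str, database: str = "default", warehouse: str = WAREHOUSE) -> str:
    """Hypothetical helper: HDFS directory of a managed table.

    Tables of the default database sit directly under the warehouse;
    any other database gets its own <name>.db subdirectory.
    """
    if database == "default":
        return f"{warehouse}/{table}"
    return f"{warehouse}/{database}.db/{table}"


print(table_dir("t_order"))              # /user/hive/warehouse/t_order
print(table_dir("t_order_01", "wek01"))  # /user/hive/warehouse/wek01.db/t_order_01
```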
    
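(3) Loading data
    
    -- a file on the local VM (it is copied into the table directory)
    load data local inpath '/home/admin/order.txt' into table t_order_01;

    -- a file already in HDFS (it is moved into the table directory)
    load data inpath '/oder3.txt' into table t_order_01;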

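(4) How Hive queries map to HDFS: a query reads the files under the table's directory

    HDFS path of the file after load: warehouse/wek01.db/t_order_01/order.txt

    select * from t_order_01;
    select count(*) from t_order_01;   -- runs as a MapReduce job

    A file can also be placed directly under warehouse/wek01.db/t_order_01/ and is then picked up by queries:
    select * from t_order_01;   -- fields that do not match the table format show up as NULL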
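
The NULL behaviour follows from the schema being applied only when the file is read. The sketch below is a rough simulation of that idea for the t_order columns, not Hive's actual parser: parse_row is a hypothetical helper, and Python's int() and float() are somewhat more lenient than Hive's type conversion.

```python
# Column names and types of t_order, with Python stand-ins for the Hive types.
SCHEMA = [("id", int), ("name", str), ("velocity", str), ("price", float)]


def parse_row(line: str, schema=SCHEMA, delimiter: str = "\t") -> list:
    """Hypothetical helper: split a line and cast each field, using None for NULL."""
    fields = line.rstrip("\n").split(delimiter)
    row = []
    for i, (_, cast) in enumerate(schema):
        if i >= len(fields):
            row.append(None)  # field missing from the line
            continue
        try:
            row.append(cast(fields[i]))
        except ValueError:
            row.append(None)  # value does not fit the column type
    return row


good = parse_row("1\tbike\tfast\t9.5")
bad_type = parse_row("x\tbike\tfast\t9.5")
wrong_delimiter = parse_row("1,bike,fast,9.5")

print(good)             # [1, 'bike', 'fast', 9.5]
print(bad_type)         # [None, 'bike', 'fast', 9.5]
print(wrong_delimiter)  # [None, None, None, None]
```

A comma-separated file is the typical case: the whole line becomes the first field, fails the int cast, and every other column is missing.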
(5) Dropping the table deletes both its metadata and its data.

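7. Creating an external table
(1) How it works
    An external table is organized the same way as an internal table in the metastore, but the actual data is stored quite differently:
    Internal table: creating the table and loading data are two steps (they can be done in a single statement). During loading, the data is moved into the data warehouse directory, and all later access reads it from there. Dropping the table deletes both the data and the metadata.
    External table: there is only one step, since the table is created and pointed at its data at the same time. The data is not moved into the data warehouse directory; the table merely links to the external location. Dropping an external table removes only that link.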
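
The difference on drop can be imitated on the local file system. This is only a toy model, assuming a dictionary as the "metastore" and temporary directories in place of HDFS; drop_table is a hypothetical function, not a Hive API.

```python
import shutil
import tempfile
from pathlib import Path


def drop_table(metastore: dict, name: str) -> None:
    """Toy model: always remove the metadata, delete files only for a managed table."""
    table = metastore.pop(name)
    if not table["external"]:
        shutil.rmtree(table["location"])


# Temporary directories stand in for HDFS.
root = Path(tempfile.mkdtemp())
managed_dir = root / "warehouse" / "t_order"
external_dir = root / "hive_test"
for directory in (managed_dir, external_dir):
    directory.mkdir(parents=True)
    (directory / "order.txt").write_text("1\tbike\tfast\t9.5\n")

metastore = {
    "t_order": {"location": managed_dir, "external": False},
    "t_order_ex": {"location": external_dir, "external": True},
}

drop_table(metastore, "t_order")
drop_table(metastore, "t_order_ex")

managed_data_gone = not managed_dir.exists()
external_data_kept = (external_dir / "order.txt").exists()
print(managed_data_gone, external_data_kept)  # True True

shutil.rmtree(root)  # clean up the temporary directories
```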
(2) Creating the table
    create external table t_order_ex(id int,name string,velocity string,price double)  row format delimited fields terminated by '\t' location '/hive_test';

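8. Creating a partitioned table
(1) How it works
        A partition plays a role similar to a dense index on the partition column in a database.
        In Hive, each partition of a table corresponds to a directory under the table's directory, and all data of that partition is stored there.
        For example, if the test table has two partition columns, date and city,
        then the HDFS subdirectory for date=20130201, city=bj is /warehouse/test/date=20130201/city=bj
        and the one for date=20130202, city=sh is /warehouse/test/date=20130202/city=sh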
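
The directory naming, and why a filter on a partition column only has to read some directories, can be sketched as follows. Both helpers are hypothetical, and the sketch ignores the escaping Hive applies to special characters in partition values.

```python
def partition_dir(table_dir: str, partition_spec: list) -> str:
    """Hypothetical helper: partition_spec is a list of (column, value) pairs
    in the order the columns are declared in PARTITIONED BY."""
    parts = "/".join(f"{column}={value}" for column, value in partition_spec)
    return f"{table_dir}/{parts}"


def prune(partition_dirs: list, column: str, value: str) -> list:
    """Hypothetical helper: keep only the directories matching column=value."""
    return [d for d in partition_dirs if f"{column}={value}" in d.split("/")]


dirs = [
    partition_dir("/warehouse/test", [("date", "20130201"), ("city", "bj")]),
    partition_dir("/warehouse/test", [("date", "20130202"), ("city", "sh")]),
]
to_scan = prune(dirs, "city", "bj")

print(dirs[0])  # /warehouse/test/date=20130201/city=bj
print(to_scan)  # ['/warehouse/test/date=20130201/city=bj']
```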
(2) Creating the table
    create table t_order_pt(id int,name string,velocity string,price double)  
    partitioned by (month string)
    row format delimited fields terminated by '\t';
(3) Loading data
    load data local inpath '/home/admin/order.txt' into table t_order_pt partition (month='201810');
    load data local inpath '/home/admin/order2.txt' into table t_order_pt partition (month='201811');
(4) Querying
    select count(*) from t_order_pt;
    select count(*) from t_order_pt where month='201810';
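(5) Modifying partitions
    alter table partition_table add partition (daytime='2013-02-04',city='bj');
    Then load data into the new partition with load data.

    alter table partition_table drop partition (daytime='2013-02-04',city='bj');
    The metadata and the data files are deleted, but the directory daytime=2013-02-04 remains.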

9. Importing data from MySQL directly into Hive
        sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by '\t'
        sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table user_info --hive-import --hive-overwrite --hive-table user_info --fields-terminated-by '\t'


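10. Hive execution modes

Hive's execution mode is the environment in which jobs run. There are two, local and cluster, selected through mapred.job.tracker.
To switch to local mode:
hive> SET mapred.job.tracker=local;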


11. Ways to start Hive
    1. Command-line mode: run the /hive/bin/hive executable directly, or run # hive --service cli
    2. Web interface (port 9999):
    # hive --service hwi &
    This lets you access Hive from a browser:
    http://hadoop0:9999/hwi/
    3. Remote service (port 10000):
    # hive --service hiveserver &


Creating a partitioned table (further examples)
    Regular vs. partitioned tables: a partitioned table is more convenient when large amounts of data keep being added.
    create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t'; 

    Loading data into partitioned tables
    load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');    
    load data local inpath '/root/data.am' into table beauties partition (nation="USA");    
    select nation, avg(size) from beauties group by nation order by avg(size);      

************************************* FAQ *********************************************************************

Error while creating a table:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes

Fix: change the character set of the MySQL database from utf8 to latin1, for example: alter database hive character set latin1;
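
The arithmetic behind this error is short. MySQL's utf8 reserves up to 3 bytes per character and latin1 uses 1, so the same indexed VARCHAR column needs three times as many key bytes under utf8. The 256-character length below is an illustrative assumption, not a specific column from the metastore schema, and the function names are hypothetical.

```python
MAX_KEY_BYTES = 767  # limit quoted in the error message

# Maximum bytes per character for the two MySQL character sets involved.
BYTES_PER_CHAR = {"utf8": 3, "latin1": 1}


def index_bytes(varchar_len: int, charset: str) -> int:
    """Worst-case key size of an indexed VARCHAR(varchar_len) column."""
    return varchar_len * BYTES_PER_CHAR[charset]


def fits(varchar_len: int, charset: str) -> bool:
    return index_bytes(varchar_len, charset) <= MAX_KEY_BYTES


print(index_bytes(256, "utf8"), fits(256, "utf8"))      # 768 False
print(index_bytes(256, "latin1"), fits(256, "latin1"))  # 256 True
```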