Big Data Study Notes (8) - Hive (Basics)

1. Download Hive:
http://archive.cloudera.com/cdh5/cdh/5/
2. Install MySQL
Reference: https://www.cnblogs.com/starof/p/4680083.html
Note that the my.cnf setting given there is wrong; it should be:
character-set-server=utf8
3. Edit conf/hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
</configuration>

4. Copy mysql-connector-java-5.1.7-bin.jar into /home/hadoop/app/hive-1.1.0-cdh5.7.0/lib.
Manually create the hive database in MySQL:

create database hive character set latin1;

Then run the hive command to enter the Hive shell.
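A quick sanity check after entering the shell (this is just a sketch; it assumes the metastore connection configured above is working):

```sql
-- A fresh install should list at least the `default` database
SHOW DATABASES;

-- Confirm which database you are in before creating tables
USE default;
SHOW TABLES;
```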

Create some tables:

CREATE TABLE helloworld(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

CREATE TABLE emp(empno int, ename string, job string, mgr int, hiredate string, sal double, comm double, depno int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

CREATE TABLE emp2 LIKE emp; -- copies only the table structure, not the data

CREATE TABLE emp3 AS SELECT * FROM emp; -- copies both the table structure and the data
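Since the tables above declare FIELDS TERMINATED BY '\t', the input files must be tab-separated. A made-up /home/hadoop/data/emp.txt matching the emp schema might look like this (columns separated by tabs; \N is Hive's default on-disk representation of NULL):

```
7369	SMITH	CLERK	7902	1980-12-17	800.0	\N	20
7499	ALLEN	SALESMAN	7698	1981-02-20	1600.0	300.0	30
```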

5. Running the data-import statement produced the following error:

hive> load data local inpath '/home/hadoop/data/helloworld.txt' overwrite into table helloworld; 
Loading data to table default.helloworld
Failed with exception Unable to move source file:/home/hadoop/data/helloworld.txt to destination hdfs://hadoop000:8020/user/hive/warehouse/helloworld/helloworld.txt
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

First check whether HDFS failed to start (that was the cause in my case).
If HDFS is running fine, try this workaround:
copy $HADOOP_HOME/share/hadoop/tools/lib/hadoop-distcp-2.6.0.jar into Hive's lib directory.
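Once the load succeeds, it can be verified from the Hive shell (table and warehouse path taken from the example above):

```sql
-- The tab-separated fields should come back parsed into columns
SELECT * FROM helloworld;

-- LOAD DATA LOCAL copies the file into the table's warehouse directory on HDFS
dfs -ls /user/hive/warehouse/helloworld/;
```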

6. Show the current database name at the Hive CLI prompt

set hive.cli.print.current.db=true;

To make this permanent, add to conf/hive-site.xml:

<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>

7. Managed and external tables in Hive
Managed table (MANAGED_TABLE): dropping the table deletes both the data on HDFS and the metadata in the MySQL metastore.
External table (EXTERNAL_TABLE): dropping the table removes only the metastore entry; the data on HDFS is left untouched.
Creating an external table:

CREATE EXTERNAL TABLE emp_external(empno int, ename string, job string, mgr int, hiredate string, sal double, comm double, depno int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'folder path on hdfs';

External tables protect the files on HDFS from accidental deletion.
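One way to see the difference in practice (a sketch; it assumes the emp_external table from above was created with an explicit LOCATION):

```sql
-- Check whether a table is managed or external:
-- the output's "Table Type" row shows MANAGED_TABLE or EXTERNAL_TABLE
DESC FORMATTED emp_external;

-- Dropping the external table removes only the metastore entry;
-- the files stay on HDFS and can be re-attached by recreating
-- the external table with the same LOCATION
DROP TABLE emp_external;
```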
8. Export query results from Hive to a local file

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/data/tmp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
select sal, ename,
case
when sal>0 and sal<1000 then 'LOWER'
when sal>=1000 and sal<=2000 then 'MIDDLE'
when sal>2000 and sal<=4000 then 'HIGH'
else 'HIGHEST' end
from emp;

Official HiveQL documentation:
https://cwiki.apache.org/confluence/display/Hive/Home

9. Hive can define partitions when creating a table:

CREATE TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User')
 COMMENT 'This is the page view table'
 PARTITIONED BY(dt STRING, country STRING)
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\t';

The partitions declared in PARTITIONED BY are essentially labels attached to the table; their values can be completely unrelated to the table's contents. Partition columns are not stored in the data files, but they can be used as query predicates, which speeds up queries.
If you upload files by hand into a partition's storage directory under Hive's warehouse on HDFS, you can run

msck repair table <table_name>;

to generate the metastore entries for those partitions.
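For example, loading a file into one specific partition of the page_view table above (the file path and partition values are made up for illustration):

```sql
-- Each partition gets its own directory, e.g.
-- /user/hive/warehouse/page_view/dt=2023-01-01/country=US/
LOAD DATA LOCAL INPATH '/home/hadoop/data/page_view.txt'
OVERWRITE INTO TABLE page_view
PARTITION (dt='2023-01-01', country='US');

-- List the partitions Hive knows about
SHOW PARTITIONS page_view;

-- Filtering on partition columns prunes the scan to one directory
SELECT count(*) FROM page_view WHERE dt='2023-01-01' AND country='US';
```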
10. Hive supports the array, map, and struct complex data types.
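A sketch of a table that uses all three complex types, together with the collection delimiters they require (the table name, columns, and delimiter choices are illustrative):

```sql
CREATE TABLE user_info(
  name    STRING,
  hobbies ARRAY<STRING>,                     -- e.g. reading,coding
  scores  MAP<STRING, INT>,                  -- e.g. math:90,english:85
  address STRUCT<city:STRING, zip:STRING>    -- e.g. Beijing,100000
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';

-- Element access syntax for each type
SELECT name, hobbies[0], scores['math'], address.city FROM user_info;
```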
11. Full CREATE TABLE syntax

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
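A sketch combining several of the optional clauses from the grammar above (table name, bucket count, and property values are made up):

```sql
CREATE TABLE IF NOT EXISTS logs(
  id  BIGINT,
  msg STRING
)
COMMENT 'Partitioned, bucketed ORC table'
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');
```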