Big Data -- Hive DDL Command Operations

1. To make the Hive CLI display the current database name and column headers, add the following to the hive-site.xml file under $HIVE_HOME/conf:

<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>

<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
</property>

 

After this configuration, the effect looks like this:

hive (default)> select * from student;
OK
student.id student.name
1001 ss1
Time taken: 0.061 seconds, Fetched: 1 row(s)
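If you don't want to edit hive-site.xml, the same two properties can also be turned on for just the current session with `set` (they revert when you exit the shell):

```sql
-- Per-session only; hive-site.xml makes the change permanent
set hive.cli.print.header=true;
set hive.cli.print.current.db=true;
```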

======================================================================================================================================

 

2. To change the HDFS storage path of Hive's default database (default), add the following to the hive-site.xml file under $HIVE_HOME/conf (this block is copied from hive-default.xml.template):

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

 

--Also grant group write permission on the warehouse directory (my Hadoop environment runs in HA mode)

hdfs dfs -chmod g+w /user/hive/warehouse hdfs://bigdata111:9000/

--Adding the property to hive-site.xml under $HIVE_HOME/conf makes it take effect for every session that connects to Hive

<property>
  <name>mapred.reduce.tasks</name>
  <value>3</value>   <!-- the default value is -1 -->
</property>

--Setting the property from the Linux shell when launching Hive only applies to that session: the value holds as long as you stay in the Hive shell, and reverts to the default of -1 once you exit

hive -hiveconf mapred.reduce.tasks=2;

--Setting the property inside the Hive shell likewise only applies to the current session, and is lost when you exit

hive (default)> set mapred.reduce.tasks=2;
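Running `set` with just the property name echoes its current value, which is a quick way to confirm which of the three levels (hive-site.xml, -hiveconf, in-shell set) actually took effect:

```sql
hive (default)> set mapred.reduce.tasks;
mapred.reduce.tasks=2
```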

=====================================================================================================================================

3. Common Hive commands:

3.1. Running Hive commands inside the Hive shell:

--List all databases in Hive

hive (default)> show databases;
OK
database_name
db_test
default
Time taken: 0.096 seconds, Fetched: 2 row(s)

------------------------------------------------------------

--List all tables in the current database (default)

hive (default)> show tables;
OK
tab_name
data
student
sum
Time taken: 0.018 seconds, Fetched: 3 row(s)

-----------------------------------------------------------

--Show the schema of table data

hive (default)> desc data;
OK
col_name data_type comment
id int
name string
Time taken: 0.06 seconds, Fetched: 2 row(s)

-----------------------------------------------------------

--Query the data in table data

hive (default)> select * from data;
OK
data.id data.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.066 seconds, Fetched: 3 row(s)

---------------------------------------------------------------

--Switch databases

hive (default)> use db_test;
OK
Time taken: 0.013 seconds

----------------------------------------------------------------

--Create a managed (internal) table. Hive moves the data into the specified HDFS directory, and dropping a managed table deletes both its metadata and the data stored on HDFS

hive (db_test)> create table if not exists student(id int,name string)
> row format delimited fields terminated by '\t'
> stored as textfile
> location '/db_test.db/student';
OK
Time taken: 0.058 seconds
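You can verify that the table just created is managed rather than external with `desc formatted` (standard Hive; the full output is long, so only the relevant line is noted in the comment):

```sql
-- The "Table Type:" line in the output reads MANAGED_TABLE for this table
hive (db_test)> desc formatted student;
```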

-------------------------------------------------------------------

[root@bigdata113 hivetest]# cat data
1001 zhangshan
1002 lishi
1003 zhaoliu

--Load the contents of the file data under /root/hivetest into the managed table student we just created

hive (db_test)> load data local inpath '/root/hivetest/data' into table student;
Loading data to table db_test.student
Table db_test.student stats: [numFiles=1, totalSize=39]
OK
Time taken: 0.197 seconds

----------------------------------------------------------------------------------

--Query the managed table student we just loaded

hive (db_test)> select * from student;
OK
student.id student.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.073 seconds, Fetched: 3 row(s)

--Check the imported file data for this table on HDFS

hive (db_test)> dfs -lsr /db_test.db hdfs://bigdata111:9000/;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxr-x - root supergroup 0 2019-09-11 15:08 /db_test.db/student
-rwxrwxr-x 3 root supergroup 39 2019-09-11 15:08 /db_test.db/student/data
drwxrwxr-x - root supergroup 0 2019-09-11 15:12 hdfs://bigdata111:9000/db_test.db
drwxrwxr-x - root supergroup 0 2019-09-11 15:08 hdfs://bigdata111:9000/db_test.db/student
-rwxrwxr-x 3 root supergroup 39 2019-09-11 15:08 hdfs://bigdata111:9000/db_test.db/student/data

--View the contents of data

hive (db_test)> dfs -cat /db_test.db/student/data hdfs://bigdata111:9000/;
1001 zhangshan
1002 lishi
1003 zhaoliu

---------------------------------------------------------------------------------------

--Create an external table with the keyword external. Dropping an external table only deletes the metadata (the Hive table definition); the files stored on HDFS are kept

hive (db_test)> create external table if not exists studentout(id int,name string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.054 seconds
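For this table, `desc formatted studentout;` would show Table Type: EXTERNAL_TABLE. Hive can also flip a table between managed and external after creation via tblproperties (standard Hive behavior, not shown in the original post; note the value is case-sensitive):

```sql
-- Convert a managed table to external (the value must be uppercase 'TRUE')
alter table student set tblproperties('EXTERNAL'='TRUE');
-- ...and back to managed
alter table student set tblproperties('EXTERNAL'='FALSE');
```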

-----------------------------------------------------------------------------------

[root@bigdata113 hivetest]# cat dataout
1001 zhangshan
1002 lishi
1003 zhaoliu

--Load the file dataout under /root/hivetest into the external table we just created

hive (db_test)> load data local inpath '/root/hivetest/dataout' into table studentout;
Loading data to table db_test.studentout
Table db_test.studentout stats: [numFiles=1, totalSize=39]
OK
Time taken: 0.204 seconds

-----------------------------------------------------------------------------------

--Query the external table studentout we just created and loaded

hive (db_test)> select * from studentout;
OK
studentout.id studentout.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.044 seconds, Fetched: 3 row(s)

-----------------------------------------------------------------------------------

--Look at the external table's data file dataout on HDFS

hive (db_test)> dfs -cat /db_test.db/studentout/dataout hdfs://bigdata111:9000/;
1001 zhangshan
1002 lishi
1003 zhaoliu

--------------------------------------------------------------------------------

--Drop the external and managed tables. Dropping the external table only removes its metadata, not the HDFS files; dropping the managed table removes both its metadata and the HDFS files

--Drop the managed table

hive (db_test)> drop table student;
OK
Time taken: 0.209 seconds

--Drop the external table

hive (db_test)> drop table studentout;
OK
Time taken: 0.065 seconds

--Check the metadata of the two tables we just dropped

hive (db_test)> show tables;
OK
tab_name
Time taken: 0.012 seconds

--The metadata of both tables is gone; now look at the data on HDFS

hive (db_test)> dfs -lsr /db_test.db/ hdfs://bigdata111:9000/;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxr-x - root supergroup 0 2019-09-11 15:18 /db_test.db/studentout
-rwxrwxr-x 3 root supergroup 39 2019-09-11 15:18 /db_test.db/studentout/dataout
drwxrwxr-x - root supergroup 0 2019-09-11 15:22 hdfs://bigdata111:9000/db_test.db
drwxrwxr-x - root supergroup 0 2019-09-11 15:18 hdfs://bigdata111:9000/db_test.db/studentout
-rwxrwxr-x 3 root supergroup 39 2019-09-11 15:18 hdfs://bigdata111:9000/db_test.db/studentout/dataout

--The managed table's file data has been deleted, but the external table's file dataout is still on HDFS

----------------------------------------------------------------------------------------------------------------------

--Drop a database (cascade also removes any tables it still contains; without it, dropping a non-empty database fails)

hive (default)> drop database db_test cascade;
OK
Time taken: 0.035 seconds
hive (default)> show databases;
OK
database_name
default
Time taken: 0.006 seconds, Fetched: 1 row(s)

-----------------------------------------------------------------------------------------------------------------

--Create a database

hive (default)> create database if not exists db_test location '/db_test.db';
OK
Time taken: 0.03 seconds

-------------------------------------------------------------------------------------------------

--Modify database properties. A database's name and storage location cannot be changed, but you can set key-value properties such as a creation time via dbproperties

hive (default)> alter database db_test set dbproperties('createtime'='20190911');
OK
Time taken: 0.024 seconds

-------------------------------------------------------------------------------------------

--Show a database's details

hive (default)> desc database extended db_test;
OK
db_name comment location owner_name owner_type parameters
db_test hdfs://mycluster/db_test.db root USER {createtime=20190911}
Time taken: 0.019 seconds, Fetched: 1 row(s)

--------------------------------------------------------------------------------------------

3.2. Running Hive commands from the Linux shell

--Show the help for the hive command

[root@bigdata113 hivetest]# hive -help
usage: hive
-d,--define <key=value> Variable subsitution to apply to hive
commands. e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable subsitution to apply to hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the
console)

-------------------------------------------------------------------------

--Query the data of table student in database db_test from the Linux shell

[root@bigdata113 hivetest]# hive --database db_test -e 'select * from student;'

Logging initialized using configuration in file:/opt/module/apache-hive-1.2.1-bin/conf/hive-log4j.properties
OK
Time taken: 0.675 seconds
OK
student.id student.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 1.0 seconds, Fetched: 3 row(s)

---------------------------------------------------------------------------

--Run Hive commands from a SQL script

[root@bigdata113 hivetest]# cat student.sql
select * from student;

--Use the file student.sql to query the data of table student in database db_test

[root@bigdata113 hivetest]# hive --database db_test -f student.sql

Logging initialized using configuration in file:/opt/module/apache-hive-1.2.1-bin/conf/hive-log4j.properties
OK
Time taken: 0.714 seconds
OK
student.id student.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.679 seconds, Fetched: 3 row(s)
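The -S flag from the help output above combines well with -e/-f for scripting; a sketch, assuming the same environment as the examples above:

```shell
# -S (silent) suppresses the OK / "Time taken" messages, leaving only the
# result rows on stdout, so the output can be redirected cleanly to a file
hive -S --database db_test -f student.sql > student.out
```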

---------------------------------------------------------------------------------------

--Prefixing a command with "!" lets you run Linux commands inside the Hive shell

hive (db_test)> !cat /root/hivetest/data;
1001 zhangshan
1002 lishi
1003 zhaoliu
hive (db_test)> !ls /root;
anaconda-ks.cfg
hivetest
rpm
zookeeper.out

------------------------------------------------------------------------------------

--Create a table with Hive's map, array, and struct types and load the matching data

--The table and the loaded data should represent records of the following shape:

{
    "name": "songsong",
    "friends": ["bingbing", "lili"],      // array
    "children": {                         // map
        "xiao song": 18,
        "xiaoxiao song": 17
    },
    "address": {                          // struct
        "street": "hai dian qu",
        "city": "beijing"
    }
}

--First create the table

hive (db_test)> create table if not exists structsum(name string,
> friends array<string>,
> children map<string,int>,
> address struct<street:string,city:string>)
> row format delimited fields terminated by ','
> collection items terminated by '_'
> map keys terminated by ':'
> lines terminated by '\n';
OK
Time taken: 0.109 seconds

--The item delimiter for MAP, STRUCT, and ARRAY values (the collection delimiter) is the "_" character
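To see how those delimiters carve up one row of the input file, here is a minimal sketch in plain Python (no Hive needed; `parse_row` is a hypothetical helper, not part of Hive) that splits a line exactly the way the DDL above declares: fields by ',', collection items by '_', map keys by ':':

```python
def parse_row(line):
    """Split one delimited line into the four declared columns."""
    name, friends, children, address = line.rstrip("\n").split(",")
    return {
        "name": name,
        # array<string>: items separated by the collection delimiter '_'
        "friends": friends.split("_"),
        # map<string,int>: entries by '_', key and value by ':'
        "children": {k: int(v) for k, v in
                     (item.split(":") for item in children.split("_"))},
        # struct<street:string,city:string>: fields by '_', in declared order
        "address": dict(zip(("street", "city"), address.split("_"))),
    }

row = parse_row("songsong,bingbing_lili,xiao song:18_xiaoxiao song:17,hai dian qu_beijing")
print(row["friends"])                    # ['bingbing', 'lili']
print(row["children"]["xiaoxiao song"])  # 17
print(row["address"]["city"])            # beijing
```

This mirrors why the query results below render friends as an array, children as a map, and address as a struct.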

-------------------------------------------------------------------------------------------

--Load the data from the file sum under /root/hivetest into the table structsum we just created

[root@bigdata113 hivetest]# cat sum
songsong,bingbing_lili,xiao song:18_xiaoxiao song:17,hai dian qu_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

hive (db_test)> load data local inpath '/root/hivetest/sum' into table structsum;
Loading data to table db_test.structsum
Table db_test.structsum stats: [numFiles=1, totalSize=142]
OK
Time taken: 0.151 seconds

--Query the data in structsum

hive (db_test)> select * from structsum;
OK
structsum.name structsum.friends structsum.children structsum.address
songsong ["bingbing","lili"] {"xiao song":18,"xiaoxiao song":17} {"street":"hai dian qu","city":"beijing"}
yangyang ["caicai","susu"] {"xiao yang":18,"xiaoxiao yang":19} {"street":"chao yang","city":"beijing"}
Time taken: 0.088 seconds, Fetched: 2 row(s)

--Query the first friend of name songsong (that is, bingbing), the age of xiaoxiao song inside children, and the street and city. To give a column a Chinese alias you must wrap it in backticks (`), the character on the key next to Esc, not single quotes

hive (db_test)> select friends[0]as`朋友`,children['xiaoxiao song']as`年龄`,address.street,address.city from structsum where name="songsong";
OK
朋友 年龄 street city
bingbing 17 hai dian qu beijing
Time taken: 0.097 seconds, Fetched: 1 row(s)

========================================================================================================================================

 

Reprinted from: https://www.cnblogs.com/jeff190812/p/11507312.html
