hive的数据定义相关语法知识

Data Definition
1.创建数据库:
CREATE DATABASE IF NOT EXISTS foo
[LOCATION]'/m/j/jkl'  [COMMENT]'haha' 
[WITH DBPROPERTIES]  ('creator' = 'Mark', 
'date' = '2012-01-02');
**************
  --(分布式默认LOCATION hdfs:///user/hive/warehouse/foo.db 
  等价于
  hdfs://hostname[:port]/user/hive/warehouse/foo.db 
Note the .db extension)
  --(COMMET命令用于在使用DESCRIBE DATABASE命令时,能够
  显示注释信息)
  --(WITH DBPROPERTIES 的键值对信息在使用命令
  DESCRIBE DATABASE EXTENDED foo;时可以显示出来)

  --(如果不加IF NOT EXISTS子句,会抛出异常,当foo存在时)


2.show命令;
SHOW DATABASES LIKE ‘h.*’;
**************

 (正则中的'.*'代表h后的任意个字符)


3.显示当前working DATABASE:
SET hive.cli.print.current.db=true;(反之 false)

示例:hive>………    hive(foo)>……


4.删除DATABASE:
DROP DATABASE IF EXISTS foo CASCADE;
--(IF EXISTS:避免数据库不存在而抛异常)
--(CASCADE 如果此数据库有表时须加此关键字强制删除)

--(RESTRICT如果此数据库有表时拒绝删除)


5.Alter Database:
ALTER DATABASE foo SET DBPROPERTIES(‘editby’=’haah’);

--(如果设置了此键值对属性,则没有方法去删除或unset一个DBPROPERTIES)


6.创建表:
CREATE TABLE IF NOT EXISTS [mydb].employees (
name  STRING [COMMENT]'Employee name',
salary  FLOAT  [COMMENT]'Employee salary',
subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
deductions  MAP<STRING, FLOAT>
[COMMENT]'Keys are deductions names, values are percentages',
address  STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
[COMMENT]  'Home address')
[COMMENT]'Description of the table'
[TBLPROPERTIES] ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...)
[LOCATION] '/user/hive/warehouse/mydb.db/employees';
--(由此可以看出COMMENT可以跟在DATABASE、TABALE、及其属性后面,作为注释来用)
--( IF NOT EXISTS如果已存在表的schema不同与所要创建的表,hive将忽略这差异,也就是说hive只认表名!)
--(TBLPROPERTIES 可以用 SHOW TBLPROPERTIES tblname查看表的键值对属性;
--(可以用 desc tblname(注意:与描述database的差异) 来显示表的各个属性及每个属性描述)

默认有last_modified_by last-usrname和last_modified_time 精确到纪元秒)


7.default database在  /user/hive/warehouse没有自己的目录,所创建的表,直接在此目录下以自己的表名创建目录。


8.拷贝表的schema:
CREATE TABLE IF NOT EXISTS mydb.employees2
LIKE mydb.employees;
--(只拷贝模式,不拷贝数据)

--(可以加入LOCATION子句,而其他属性则取决于原表)


9.SHOW TABLES:
--(如果显示的不是当前database中的表,则用SHOW TABLES IN dbname)
--(SHOW TABLES ‘foo.*’; show子句支持正则查询,注意 .*)

--(不支持以上两种查询同时使用!!)


10.  DESCRIBE EXTENDED
 DESCRIBE EXTENDED mydb.employees;
deductions  map<string,float> Keys are deductions names, values are percentages
address struct<street:string,city:string,state:string,zip:int>  Home address
Detailed Table Information  Table(tableName:employees, dbName:mydb, owner:me,
...
location:hdfs://master-server/user/hive/warehouse/mydb.db/employeesparameters:{creator=me, created_at='2012-01-02 10:00:00',
last_modified_user=me, last_modified_time=1337544510,
comment:Description of the table, ...}, ...)
--(用FORMATTED 可以提供更可读而且更冗长的信息)
--(也可以精确到具体的列 : DESCRIBE mydb.employees.salary;
salary  float  Employee salary  --(此时用EXTENDED 将不会输出更多信息))
--(last_modified_by和 last_modified_time 表属性是自动创建的)
--(此命令也可以查看表的属性: 【内/外表】   tableType:MANAGED_TABLE或 EXTERNAL_TABLE))
10-2. 描述带有分区的extended 表;
 DESCRIBE EXTENDED tablename [ partition (year=2012, month=1, day=2) ];
...
location:s3n://ourbucket/logs/2011/01/02,;


11. 创建外部表:
CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
price_adj_close FLOAT)  comment ‘desc table’
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks'; 

--(comment 选项必须紧跟括号之后,否则出错)


12.复制外部表的模式:
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3
LIKE mydb.employees
LOCATION '/path/to/data';

--(如果你忽略 EXTERNAL,并且这源表是 external, 这新表将是external. 如果你忽略EXTERNAL关键字,并且这源表是managed, 这新表将是managed. However, 如果你写了 EXTERNAL关键字,并且这源表是managed, 这新表将是external.)


13.分区内部表:
CREATE TABLE employees (……列..)
PARTITIONED BY (country STRING, state STRING);
--(注意:‘列’里定义过的属性列,在partitioned by里面就不用定义了,即partitioned by 相当于定义了新属性列)
目录结构: ../employees/country=CA/state=AB
.../employees/country=CA/state=BC
.../employees/country=US/state=AL
.../employees/country=US/state=AK
--(目录结构由分区的层数决定)
--(分区项使用时相当于普通表的列:
SELECT * FROM employees
WHERE country = 'US' AND state = 'IL';)

set hive.mapred.mode=strict/nonstrict;  (strict禁止查询语句不带where字句,否则会出现FAILED: Error in semantic analysis: No partition predicate(谓语) found for Alias "wxptbl_part" Table "wxptbl_part"的错误提示。, nonstrict 可以不带)


14.显示分区:
hive> SHOW PARTITIONS tablename  [PARTITION(….分区键值对…)] ;
country=US/state=AL
country=US/state=AK
DESCRIBE EXTENDED tablename;  (此命令也显示分区keys) ,形如:
partitionKeys:[FieldSchema(name:country, type:string, comment:null),
FieldSchema(name:state, type:string, comment:null)],…

中括号中的partition语句是为了更详细的查找某个分区表分区中的分区(只显示最外层有几个分区时可以不写)


15.导入数据:
LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
[ overwrite ] INTO TABLE employees

PARTITION (country = 'US', state = 'CA');


16. Primitive data types
Type Size Literal syntax examples
TINYINT 1byte signed integer. 20
SMALLINT 2 byte signed integer. 20
INT 4 byte signed integer. 20
BIGINT 8 byte signed integer. 20
BOOLEAN Boolean true or false. TRUE
FLOAT Single precision floating point. 3.14159
DOUBLE Double precision floating point. 3.14159
STRING Sequence of characters. The character
set can be specified. Single or double
quotes can be used.
'Now is the time', "for all
good men"
TIMESTAMP(v0.8.0+) Integer, float, or string. 1327882394(Unix epoch seconds),1327882394.123456789(Unix epoch seconds plus nanoseconds), and
'2012-02-03
12:34:56.123456789'(JDBCcompliant java.sql.Timestamp
format)

BINARY(v0.8.0+) Array of bytes. See discussion below


17. Collection data types
Type Description Literal syntax examples
STRUCT Analogous to a C  structor an “object.” Fields can be accessed using the “dot” notation. For example, if a column nameis of
type STRUCT {first STRING; last STRING}, then
the first name field can be referenced using name.first.
struct('John', 'Doe')
MAP A collection of key-value tuples, where the fields are accessed using array notation (e.g., ['key']). For example, if a column
nameis of type MAPwith key→value pairs
'first'→'John'and 'last'→'Doe', then the last
name can be referenced using name['last'].
map('first', 'John','last', 'Doe')
ARRAY Ordered sequences of the sametype that are indexable using zero-based integers. For example, if a column nameis of type
ARRAYof strings with the value ['John', 'Doe'], then the second element can be referenced using name[1].

array('John', 'Doe')


 18. Hive’s default record and field delimiters
Delimiter Description
\n For text files, each line is a record, so the line feed character separates records.
^A(“control” A) Separates all fields (columns). Written using the octal code \001when explicitly
specified in CREATE TABLEstatements.
^B Separate the elements in an ARRAY or STRUCT, or the key-value pairs in a MAP. Written using the octal code \002when explicitly specified in CREATE TABLE statements.

^C Separate the key from the corresponding value in  MAP key-value pairs. Written using the octal code \003when explicitly specified in CREATE TABLEstatements


19. Customizing Table Storage Formats:
CREATE TABLE employees (
subordinates ARRAY<STRING>,
deductions  MAP<STRING, FLOAT>,
address  STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'

STORED AS TEXTFILE;


20. DROP TABLE IF EXISTS tablename;
--(如果不加IF EXISTS字句,当表不在时将返回error)
--(For managed tables, the table metadata and data are deleted.)

--(For external tables, the metadata is deleted but the data is not.)


21.alter table : (Adding, Modifying, and Dropping a Table Partition)
给表重命名:
ALTER TABLE log_messages RENAME TO logmsgs;
给表添加分区:
ALTER TABLE log_messages ADD IF NOT EXISTS
PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01'
PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02'...;
--(location后面的目录名可以随意写,与partition后面的无关)
改变分区位置:
ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)   SET LOCATION 's3n://ourbucket/logs/2011/01/02';
--( This command does not move the data from the old location, nor does it delete the old data.)
删除表的分区:
ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2);

--(The  IF EXISTS clause is optional, as usual. For managed tables, the data for the partition is deleted, along with the metadata, even if the partition was created using ALTER TABLE … ADD PARTITION. For external tables, the data is not deleted.)


22. Changing Columns:
--(You can rename a column, change its position, type, or comment:)
ALTER TABLE log_messages CHANGE
COLUMN  hms  hours_minutes_seconds INT
COMMENT 'The hours, minutes, and seconds part of the timestamp' AFTER severity;

--(你必须指定老名字,新名字以及类型,即使名字和类型不改变。这关键字COLUMN,COMMENT子句是可选的,如果你不移动列,这ARTER other_column 子句是不必要的。在这个例子中,我们移动列在severity列之后。如果你想要移列到列之前,用FIRST 代替 [ AFTER other_column ] 整个子句。)


23. Adding Columns:
--(You can add new columns to the end of the existing columns, before any partition columns.)
ALTER TABLE log_messages ADD COLUMNS (
app_name  STRING COMMENT 'Application name',

session_id LONG  COMMENT 'The current session id');


24. Deleting or Replacing Columns:
下面的例子移除了所有存在的列并用新指定的列替换它们:
ALTER TABLE log_messages REPLACE COLUMNS (
hours_mins_secs INT  COMMENT 'hour, minute, seconds from timestamp',
severity  STRING COMMENT 'The message severity'
message  STRING COMMENT 'The rest of the message');
--(这声明高效的重命名了 hms列并且移除了  server和process_id列从原来的模式定义.只改变元数据信息。

这 REPLACE声明 只可以被用于表-此表用一个本地的 SerDe模块: DynamicSerDeor MetadataTypedColumnsetSerDe. 重新调用这 SerDe 决定记录怎用被反序列化为列and记录的列怎样被序列化写进存储)


25. Alter Table Properties:
--(You can add additional table properties or modify existing properties, but not remove them:)
ALTER TABLE log_messages SET TBLPROPERTIES (

'notes' = 'The process id is no longer captured; this column is always NULL');


26. Alter Storage Properties:
ALTER TABLE log_messages PARTITION(year = 2012, month = 1, day = 1)   SET FILEFORMAT SEQUENCEFILE;
--(The following example demonstrates how to add new SERDEPROPERTIES for the current SerDe:)
ALTER TABLE table_using_JSON_storage
SET SERDEPROPERTIES (
'prop3' = 'value3',
'prop4' = 'value4');
You  can  alter  the  storage  properties :
 ALTER TABLE stocks
CLUSTERED BY (exchange, symbol)
SORTED BY (symbol)
INTO 48 BUCKETS;

--(The SORTED BY clause is optional, but the CLUSTER BY and INTO … BUCKETS are required.)


27. Miscellaneous Alter Table Statements:
Hive CLI:
hive -e 'ALTER TABLE log_messages TOUCH PARTITION(year = 2012, month = 1, day = 1);'
--(This statement won’t create the table or partition if it doesn’t already exist. Use the appropriate creation commands in that case.)
ALTER TABLE log_messages ARCHIVE
PARTITION(year = 2012, month = 1, day = 1);
--( The ALTER TABLE … ARCHIVE PARTITION statement captures the partition files into a Hadoop archive (HAR) file. This only reduces the number of files in the filesystem, reducing the load on the  NameNode, but doesn’t provide any space savings (e.g., through
compression):)
--(To reverse the operation, substitute UNARCHIVE for  ARCHIVE. This feature is only available for individual partitions of partitioned tables.)
various protections are available. The following statements prevent the partition from being dropped and queried:
ALTER TABLE log_messages PARTITION(year = 2012, month = 1, day = 1) ENABLE NO_DROP;
ALTER TABLE log_messages PARTITION(year = 2012, month = 1, day = 1) ENABLE OFFLINE;
--(To reverse either operation, replace ENABLE with DISABLE. These operations also can’t be used with nonpartitioned tables.)

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值