-
LazySimpleSerDe: The default SerDe that’s used with the TEXTFILE format; it would be used with our_first_table from the following listing.
(A) $ $HIVE_HOME/bin hive --service cli (B) hive> set hive.cli.print.current.db=true; (C) hive (default)> CREATE DATABASE ourfirstdatabase; OK Time taken: 3.756 seconds (D) hive (default)> USE ourfirstdatabase; OK Time taken: 0.039 seconds (E) hive (ourfirstdatabase)> CREATE TABLE our_first_table ( > FirstName STRING, > LastName STRING, > EmployeeId INT); OK Time taken: 0.043 seconds hive (ourfirstdatabase)> quit; (F) $ ls /home/biadmin/Hive/warehouse/ourfirstdatabase.db our_first_table
It also would be used with data_types_table from the following listing.
$ ./hive --service cli hive> CREATE DATABASE data_types_db; OK Time taken: 0.119 seconds hive> USE data_types_db; OK Time taken: 0.018 seconds (1)Hive> CREATE TABLE data_types_table ( (2) > our_tinyint TINYINT COMMENT '1 byte signed integer', (3) > our_smallint SMALLINT COMMENT '2 byte signed integer', (4) > our_int INT COMMENT '4 byte signed integer', (5) > our_bigint BIGINT COMMENT '8 byte signed integer', (6) > our_float FLOAT COMMENT 'Single precision floating point', (7) > our_double DOUBLE COMMENT 'Double precision floating point', (8) > our_decimal DECIMAL COMMENT 'Precise decimal type based (9) > on Java BigDecimal Object', (10) > our_timestamp TIMESTAMP COMMENT 'YYYY-MM-DD HH:MM:SS.fffffffff" (11) > (9 decimal place precision)', (12) > our_boolean BOOLEAN COMMENT 'TRUE or FALSE boolean data type', (13) > our_string STRING COMMENT 'Character String data type', (14) > our_binary BINARY COMMENT 'Data Type for Storing arbitrary (15) > number of bytes', (16) > our_array ARRAY<TINYINT> COMMENT 'A collection of fields all of (17) > the same data type indexed BY (18) > an integer', (19) > our_map MAP<STRING,INT> COMMENT 'A Collection of Key,Value Pairs (20) > where the Key is a Primitive (21) > Type and the Value can be (22) > anything. The chosen data (23) > types for the keys and values (24) > must remain the same per map', (25) > our_struct STRUCT<first : SMALLINT, second : FLOAT, third : STRING> (26) > COMMENT 'A nested complex data (27) > structure', (28) > our_union UNIONTYPE<INT,FLOAT,STRING> (29) > COMMENT 'A Complex Data Type that can (30) > hold One of its Possible Data (31) > Types at Once') (32) > COMMENT 'Table illustrating all Apache Hive data types' (33) > ROW FORMAT DELIMITED (34) > FIELDS TERMINATED BY ',' (35) > COLLECTION ITEMS TERMINATED BY '|' (36) > MAP KEYS TERMINATED BY '^' (37) > LINES TERMINATED BY 'n' (38) > STORED AS TEXTFILE (39) > TBLPROPERTIES ('creator'='Bruce Brown', 'created_at'='Sat Sep 21 20:46:32 EDT 2013'); OK Time taken: 0.886 seconds
-
ColumnarSerDe: Used with the RCFILE format.
-
RegexSerDe: The regular expression SerDe, which ships with Hive to enable the parsing of text files, RegexSerDe can form a powerful approach for building structured data in Hive tables from unstructured blogs, semi-structured log files, e-mails, tweets, and other data from social media. Regular expressions allow you to extract meaningful information (an e-mail address, for example) with HiveQL from an unstructured or semi-structured text document incompatible with traditional SQL and RDBMSs.
-
HBaseSerDe: Included with Hive to enables it to integrate with HBase. You can store Hive tables in HBase by leveraging this SerDe.
-
JSONSerDe: A third-party SerDe for reading and writing JSON data records with Hive. You can quickly find (via Google and GitHub) two JSON SerDes by searching online for the phrase json serde for hive.
-
AvroSerDe: Included with Hive so that you can read and write Avro data in Hive tables.
DEFINING TABLE RECORD FORMATS IN HIVE
最新推荐文章于 2024-11-03 16:58:42 发布