Create a new table over data already in HDFS (an external table maps the existing files in place; nothing is copied):
CREATE EXTERNAL TABLE webpage
(page_id SMALLINT,
name STRING,
assoc_files STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/loudacre/webpage';
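Since the CREATE statement only records metadata, a quick sanity check confirms that Hive/Impala can actually parse the tab-delimited files sitting under the LOCATION path. A minimal sketch:

```sql
-- The external table is metadata only; these statements read the
-- files under /loudacre/webpage directly.
DESCRIBE webpage;
SELECT * FROM webpage LIMIT 5;
```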
In Impala or Hive, create an Avro table from the previously generated files already placed in HDFS:
CREATE EXTERNAL TABLE accounts_avro
STORED AS AVRO
LOCATION '/loudacre/accounts_avro'
TBLPROPERTIES ('avro.schema.url'=
'hdfs:/loudacre/sqoop_import_accounts.avsc');
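Note that the column list comes from the .avsc schema file, not from the CREATE statement. It can be worth checking what Hive/Impala derived from the schema before querying the table:

```sql
-- Columns are taken from the Avro schema file referenced in
-- avro.schema.url; DESCRIBE shows what was derived from it.
DESCRIBE accounts_avro;
SHOW CREATE TABLE accounts_avro;
```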
Create a Parquet-format table from the existing accounts_avro table (in Hive, combining EXTERNAL with CREATE TABLE ... AS SELECT raises an error, so drop EXTERNAL when running it there):
CREATE EXTERNAL TABLE accounts_parquet STORED AS PARQUET
LOCATION '/loudacre/accounts_parquet/'
AS SELECT * FROM accounts_avro;
Create a partitioned table:
CREATE EXTERNAL TABLE accounts_by_areacode (
acct_num INT,
first_name STRING,
last_name STRING,
phone_number STRING)
PARTITIONED BY (areacode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/loudacre/accounts_by_areacode';
Change the Hive settings to enable dynamic partitioning (in Impala this step is generally unnecessary):
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
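For contrast, strict mode only blocks inserts where every partition value is dynamic; an insert that fixes the partition value in the statement works in either mode. A minimal sketch, using '510' purely as an example area code:

```sql
-- Static partition insert: the partition value is fixed in the
-- statement itself, so hive.exec.dynamic.partition.mode can stay
-- at its strict default.
INSERT INTO TABLE accounts_by_areacode
PARTITION (areacode='510')
SELECT acct_num, first_name, last_name, phone_number
FROM accounts_avro
WHERE SUBSTR(phone_number, 1, 3) = '510';
```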
Write the data from the accounts_avro table into the partitioned table:
INSERT INTO TABLE accounts_by_areacode
PARTITION (areacode)
SELECT acct_num, first_name, last_name, phone_number,
       SUBSTR(phone_number, 1, 3) AS areacode
FROM accounts_avro;
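After the insert, each distinct area code becomes its own subdirectory under /loudacre/accounts_by_areacode, and filtering on the partition column reads only that directory. A quick check ('212' is just an illustrative value):

```sql
-- List the partitions created by the dynamic-partition insert.
SHOW PARTITIONS accounts_by_areacode;

-- A filter on the partition column prunes to a single directory.
SELECT acct_num, phone_number
FROM accounts_by_areacode
WHERE areacode = '212';
```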