Hive: Inserting Data and Displaying Detailed Table Information

describe tablename;

describe extended tablename;

describe FORMATTED tablename; shows the most detail, including whether the table is compressed, its field delimiters, and so on

Replacing EXTENDED with FORMATTED provides more readable but also more verbose
output.
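
For example, describe formatted output looks roughly like the following. This is an illustrative excerpt only; the table name, location, and values are made up, and the exact fields vary by Hive version:

hive> describe formatted tablename;
# col_name              data_type
id                      int
name                    string

# Detailed Table Information
Table Type:             MANAGED_TABLE
Location:               hdfs://namenode/user/hive/warehouse/tablename

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
Compressed:             No
Storage Desc Params:
        field.delim             ,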

Creating a table

create table callbacks_part
(
ID INT,
BOOKMARK_STEP_ID INT,
CLIENT_ID INT,
TXN_ID STRING,
SESSION_ID STRING,
IP_ADDRESS STRING,
METHOD STRING,
PAGE_ID INT,
RULE_NAME_ID INT,
CALLBACK_TYPE_ID INT,
PAGE_ELEMENT_ID INT,
CALLBACK_SENSE_ID INT,
CALLBACK_INFO STRING,
NUM_OCCURANCES INT,
UNIQUE_ID STRING,
EXEC_DURATION FLOAT
)
PARTITIONED BY(EXEC_DATE STRING)
CLUSTERED BY(PAGE_ID) SORTED BY(PAGE_ID) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

You don't need to specify "EXEC_DATE STRING," in the column definitions, as it is already declared by "PARTITIONED BY(EXEC_DATE STRING)"; that is why it is omitted from the column list above.
 
Once PARTITIONED BY(EXEC_DATE STRING) is declared, do not list that column again in the CREATE TABLE column definitions.
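
If you do list the partition column a second time, Hive rejects the statement during semantic analysis. The error reads roughly as follows (wording varies by version):

FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns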


The definitive explanation of partitions: the partition column is not part of the table's stored data and does not exist inside the data files at all. It exists only to partition the data; under the table's HDFS directory there will be subdirectories named after it!
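
A quick way to see this, assuming the callbacks_part table above was loaded with a partition value of '2013-11-08' and lives in the default warehouse location:

hive> SHOW PARTITIONS callbacks_part;
exec_date=2013-11-08

-- the partition maps to an HDFS directory, not to a column in the data files:
-- /user/hive/warehouse/callbacks_part/exec_date=2013-11-08/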


hive> LOAD DATA LOCAL INPATH '/home/hadoop/xiaoft/hivedata/' OVERWRITE INTO TABLE gprs_log_zip_bak PARTITION (day='20131108');

Watch out for character encoding when loading: Hive's default SerDe expects UTF-8 text, so files in other encodings (GBK, GB2312, etc.) should be converted to UTF-8 first, or the data will come out garbled.
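
If converting the files is not an option, newer Hive versions let LazySimpleSerDe decode a declared encoding via the serialization.encoding SerDe property; verify the property is supported by your Hive version before relying on it. A sketch against the table used above:

ALTER TABLE gprs_log_zip_bak SET SERDEPROPERTIES ('serialization.encoding'='GBK');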


Loading data into managed tables

Because Hive has no row-level insert, update, or delete operations, the only way to put data into a table is to use bulk load operations.

Alternatively, you can use other tools to write data directly into the correct HDFS directory.
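
A minimal sketch of both bulk paths, using a hypothetical table t (partitioned by day) and a staging table t_staging with matching columns:

-- path 1: move existing files into the table's directory
LOAD DATA LOCAL INPATH '/tmp/data.csv'
OVERWRITE INTO TABLE t PARTITION (day='20131108');

-- path 2: produce the rows with a query over another table
INSERT OVERWRITE TABLE t PARTITION (day='20131108')
SELECT col1, col2 FROM t_staging;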


What to do when the source file format does not match the target table's:

http://stackoverflow.com/questions/16402538/is-there-a-way-to-load-csv-data-into-a-binary-hive-format/16404515#16404515

You can change this in your config files if you want these settings to persist outside the session (including other people's Hive and MapReduce jobs if you are sharing a cluster). I use SnappyCodec because it works with Hive out of the box, is splittable, and gives good compression/decompression for the CPU time spent. You might decide a different codec is more suitable to your needs.

Now how do you apply all these options if all your data is in CSV format? The easiest way is to create a table on top of the CSV files, then create another table with the file format and SerDe you want, and insert the data from the CSV-backed table into the new table (making sure that you are compressing your Hive output with your codec of choice). Under the hood, Hive takes care of reading the data from one format (CSV) and writing it to another (whatever you decided). After this you will have a duplicate of the data, and you can drop the CSV files if you desire.

CREATE EXTERNAL TABLE csv_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
LOCATION '/user/muehlbau/yourData';

CREATE TABLE binary_table (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe'
STORED AS SEQUENCEFILE;

set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

INSERT OVERWRITE TABLE binary_table
SELECT * FROM csv_table;

The example above demonstrates how you could take advantage of all the options available to you, but do not think of it as a default, reasonable use case. Read up on the different file formats / SerDes / compression codecs and do some performance testing to settle on your approach.
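
One cleanup caveat: csv_table above is EXTERNAL, so dropping it removes only the metastore entry; the files under /user/muehlbau/yourData stay on HDFS and have to be deleted separately if you no longer want them.

DROP TABLE csv_table;  -- drops metadata only; the underlying CSV files remain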


INSERT OVERWRITE TABLE gprs_log_zip PARTITION (day='20131208')
SELECT time, uid, site, label, content, flow, cost, net_type, gprs_type,
       keywords, user_agent, multi_dimension, content_name, bearer_type,
       service_type, log_time, url_stop_time, down_flow, url, brand_id,
       product_id, theme_product, validity_code, dev, mod, sys, ver, web,
       cli, act, topics
FROM gprs_log_zip_bak;
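
A quick sanity check after the insert (a sketch; the count depends on your data):

hive> SHOW PARTITIONS gprs_log_zip;
hive> SELECT COUNT(*) FROM gprs_log_zip WHERE day='20131208';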












