1、创建tidanic(泰坦尼克)表:
首先创建taitan的数据库,再创建tidanic的表:
create database taitan;
create table tidanic(
passengerid int,
survived int,
pclass int,
name string,
sex string,
age int,
sibsp int,
parch int,
ticket string,
fare double,
cabin string,
embarked string)
row format delimited fields terminated by ',';
表中的表头信息来自于train.csv文件中的表头,将train.csv利用xftp上传到otp目录下,如图:
train.csv文件的内容如图,应为表头中含有大写字母,所以需要在xshell中新开一个窗口进行表头改写:
使用命令vi train.csv进入文件删除第一行:
使用cat命令查看文件内容:
然后回到hive的窗口进行文件的传输:
dfs -put /opt/train.csv /user/hive/warehouse/taitan.db/tidanic;
然后进行文件的查看:
2、静态分区表:
创建表,分区字段为gender:
create table tidanic_part(
passengerid int,
survived int,
pclass int,
name string)
partitioned by(gender string)
row format delimited fields terminated by ',';
按照字段sex进行分区:
insert overwrite table tidanic_part partition(gender='female') select passengerid,survived,pclass,name from tidanic where sex='female';
insert overwrite table tidanic_part partition(gender='male') select passengerid,survived,pclass,name from tidanic where sex='male';
创建动态分区表:
create table tidanic_dynamic_part1(
passengerid int,
survived int,
name string)
partitioned by(passengerclass string)
row format delimited fields terminated by ',';
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nostrict;
insert overwrite table tidanic_dynamic_part1 partition(passengerclass) select passengerid,survived,nam,pclass from tidanic;