26 Data Analysis Cases — Station 2: Airline Customer Value Analysis with Hive
Required environment
• Python 3.x
• Hadoop 2.7.2
• Hive 2.2.0
Data description
Resource package
Link: https://pan.baidu.com/s/1Uzx5g2r54k9Q2PYK5_DlTQ
Extraction code: irq2
Experiment steps
Step 1: Load the dataset
1. In Hive, create a database named air_data_base.
hive> create database air_data_base;
hive> use air_data_base;
The output is:
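As an optional sanity check (not part of the original steps), you can confirm the new database is visible before creating any tables:

```sql
-- The newly created air_data_base should appear in this list
show databases;
```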
2. In the database created above, create a table named air_data_table.
hive> create table air_data_table(
member_no string,
ffp_date string,
first_flight_date string,
gender string,
ffp_tier int,
work_city string,
work_province string,
work_country string,
age int,
load_time string,
flight_count int,
bp_sum bigint,
ep_sum_yr_1 int,
ep_sum_yr_2 bigint,
sum_yr_1 bigint,
sum_yr_2 bigint,
seg_km_sum bigint,
weighted_seg_km double,
last_flight_date string,
avg_flight_count double,
avg_bp_sum double,
begin_to_first int,
last_to_end int,
avg_interval float,
max_interval int,
add_points_sum_yr_1 bigint,
add_points_sum_yr_2 bigint,
exchange_count int,
avg_discount float,
p1y_flight_count int,
l1y_flight_count int,
p1y_bp_sum bigint,
l1y_bp_sum bigint,
ep_sum bigint,
add_point_sum bigint,
eli_add_point_sum bigint,
l1y_eli_add_points bigint,
points_sum bigint,
l1y_points_sum float,
ration_l1y_flight_count float,
ration_p1y_flight_count float,
ration_p1y_bps float,
ration_l1y_bps float,
point_notflight int
)
row format delimited fields terminated by ',';
The output is:
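Before loading any data, it is worth verifying that the table was created with the intended schema. A quick check (an optional step, not in the original walkthrough) would be:

```sql
-- Confirm the table exists in the current database
show tables;
-- List every column with its declared type; this should match the DDL above
describe air_data_table;
```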
3. Create a folder named aviation_data under the /usr/local directory and upload the dataset file air_data.csv into it. The result is:
4. Load the data into the air_data_table table.
hive> load data local inpath '/usr/local/aviation_data/air_data.csv' into table air_data_table;
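After the load completes, a short sanity check (hypothetical queries, not part of the original steps) confirms that rows arrived and that fields were split on commas as declared in the table's row format:

```sql
-- Row count should match the number of data lines in air_data.csv
select count(*) from air_data_table;
-- Preview a few rows to confirm the comma delimiter parsed correctly
select member_no, ffp_date, flight_count, seg_km_sum
from air_data_table
limit 5;
```

Note that if air_data.csv begins with a header row, that header will be loaded as an ordinary data row, since the table above was not created with tblproperties("skip.header.line.count"="1").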