Hadoop Hands-On Project 4: Data Processing

Contents

Preface

I. Uploading the CSV File

II. Hive

1. Create the Database and Load the File into Hive

2. Create the Main Job Table

3. Create the Summary Tables

III. Sqoop



Preface

For setting up the cluster itself, refer to the earlier articles in this series.


I. Uploading the CSV File

Upload the file to HDFS:

[root@master ~]# hdfs dfs -put ./Newjob.csv /user/hive/warehouse 
[root@master ~]# hdfs dfs -ls  /user/hive/warehouse
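A side note: /user/hive/warehouse is Hive's managed warehouse root, and the LOAD DATA statement later moves the file out of its original location anyway, so a dedicated staging directory is arguably tidier. A minimal alternative sketch (the /user/root/jobdata path is an arbitrary choice; the rest of this walkthrough keeps the original path):

[root@master ~]# hdfs dfs -mkdir -p /user/root/jobdata
[root@master ~]# hdfs dfs -put ./Newjob.csv /user/root/jobdata/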

You can confirm the upload in the HDFS web UI (typically http://master:9870 on Hadoop 3.x, or port 50070 on Hadoop 2.x).

II. Hive

1. Create the Database and Load the File into Hive

The commands are as follows:

[root@master hive]# hive
hive (default)> create database job_update;
hive (default)> use job_update;

2. Create the Main Job Table

The code is as follows. First, create the main table:

hive >
create external table job(
  post string,
  wages string,
  city string,
  experience string,
  education string,
  company string,
  skill string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties ('separatorChar' = ',', 'quoteChar' = '\"', 'escapeChar' = '\\')
stored as textfile
tblproperties ('skip.header.line.count'='1');
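Two things worth knowing about OpenCSVSerde: it returns every column as STRING regardless of declared type (which is why all columns are declared string here), and quoteChar lets comma-containing fields survive parsing. For illustration, a hypothetical row like the following (invented, not from the dataset) would parse into exactly seven fields, with the quoted skill list kept intact:

Data Analyst,"15k-20k/month",Beijing,3-5 years,Bachelor,ExampleCo,"SQL,Python,Hive"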

Load the CSV data from HDFS into the Hive table. Note that LOAD DATA ... INPATH moves (not copies) the file into the table's directory; the path below is where the file was uploaded in step I:

hive (job_update)> load data inpath '/user/hive/warehouse/Newjob.csv' into table job;
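A quick sanity check that the load worked:

hive (job_update)> select count(*) from job;
hive (job_update)> select * from job limit 5;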

3. Create the Summary Tables


-- Distribution of postings by city
CREATE TABLE job_city AS SELECT city, count(city) AS bigdata FROM job GROUP BY city ORDER BY bigdata DESC;

-- Postings by position
CREATE TABLE job_post AS SELECT post, count(post) AS bigdata FROM job GROUP BY post ORDER BY bigdata DESC;

-- Number of postings per company
CREATE TABLE job_company AS SELECT company, count(company) AS bigdata FROM job GROUP BY company ORDER BY bigdata DESC;

-- Skill demand
CREATE TABLE job_skill AS SELECT skill, count(skill) AS bigdata FROM job GROUP BY skill ORDER BY bigdata DESC;

-- Education requirements
CREATE TABLE job_education AS SELECT education, count(education) AS bigdata FROM job GROUP BY education ORDER BY bigdata DESC;

-- Work-experience requirements
CREATE TABLE job_experience AS SELECT experience, count(experience) AS bigdata FROM job GROUP BY experience ORDER BY bigdata DESC;

-- Wages by work experience
CREATE TABLE job_experience_wages AS SELECT experience, wages FROM job GROUP BY experience, wages ORDER BY experience DESC;

-- Wages by education level
CREATE TABLE job_education_wages AS SELECT education, wages FROM job GROUP BY education, wages ORDER BY education DESC;

-- Top 10 positions by wage
CREATE TABLE job_top_wages AS SELECT post, wages FROM job GROUP BY post, wages ORDER BY wages DESC LIMIT 10;
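One caveat on the last statement: wages is a string column, so ORDER BY wages DESC sorts lexicographically ("9k" ranks above "20k") rather than numerically, and the resulting top 10 is not a true salary ranking. A possible numeric variant, sketched under the assumption that each wage value begins with a number (adjust the regex to the actual format in your data):

CREATE TABLE job_top_wages_num AS
SELECT post, wages, cast(regexp_extract(wages, '(\\d+)', 1) AS int) AS wage_num
FROM job GROUP BY post, wages
ORDER BY wage_num DESC LIMIT 10;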

III. Sqoop

Use Sqoop to export the Hive tables into MySQL.
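Note that sqoop-export writes into existing MySQL tables; it does not create them. So the job_data database and each target table must be created in MySQL beforehand. A minimal sketch for one of them (column names and types are assumptions inferred from the Hive CTAS statements above):

mysql> create database if not exists job_data default character set utf8;
mysql> use job_data;
mysql> create table job_city (city varchar(100), bigdata int);

Also mind the delimiters: the raw job table is comma-separated CSV (its quoted fields may additionally need --input-optionally-enclosed-by '\"'), while every CTAS summary table uses Hive's default '\001' field terminator.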

sqoop-export --connect "jdbc:mysql://192.168.61.130:3306/job_data?useUnicode=true&characterEncoding=utf-8" --username root --password 000000 --table job --export-dir /user/hive/warehouse/job_update.db/job --input-fields-terminated-by ',' -m 1

sqoop-export --connect "jdbc:mysql://192.168.61.130:3306/job_data?useUnicode=true&characterEncoding=utf-8" --username root --password 000000 --table job_city --export-dir /user/hive/warehouse/job_update.db/job_city --input-fields-terminated-by '\001' -m 1

sqoop-export --connect "jdbc:mysql://192.168.61.130:3306/job_data?useUnicode=true&characterEncoding=utf-8" --username root --password 000000 --table job_post --export-dir /user/hive/warehouse/job_update.db/job_post --input-fields-terminated-by '\001' -m 1

sqoop-export --connect "jdbc:mysql://192.168.61.130:3306/job_data?useUnicode=true&characterEncoding=utf-8" --username root --password 000000 --table job_experience --export-dir /user/hive/warehouse/job_update.db/job_experience --input-fields-terminated-by '\001' -m 1

sqoop-export --connect "jdbc:mysql://192.168.61.130:3306/job_data?useUnicode=true&characterEncoding=utf-8" --username root --password 000000 --table job_education --export-dir /user/hive/warehouse/job_update.db/job_education --input-fields-terminated-by '\001' -m 1
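The remaining summary tables (job_company, job_skill, job_experience_wages, job_education_wages, job_top_wages) can be exported the same way: same connection string, a matching --table and --export-dir, and the same '\001' field terminator. A quick check in MySQL afterwards:

mysql> use job_data;
mysql> select * from job_city order by bigdata desc limit 5;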
