作业电影评分系统 HIVE实战 正则表达式(限于string)解决了HIVE 源文件多个分隔符的问题

 root@master:/usr/local/hadoop-2.6.0/sbin# start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-namenode-master.out
worker1: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-worker1.out
worker6: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-worker6.out
worker4: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-worker4.out
worker7: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-worker7.out
worker3: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-worker3.out
worker5: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-worker5.out
worker8: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-worker8.out
worker2: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-worker2.out
Starting secondary namenodes [0.0.0.0]



root@master:/usr/local/hadoop-2.6.0/sbin# start-yarn.sh   
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-resourcemanager-master.out
worker4: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-worker4.out
worker7: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-worker7.out
worker8: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-worker8.out
worker2: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-worker2.out
worker3: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-worker3.out
worker5: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-worker5.out
worker6: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-worker6.out
worker1: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-worker1.out


电影表
^(.*)::(.*)::(.*)$
说明: .* 表示任意字符,()分组 


用户表
^(.*)::(.*)::(.*)::(.*)::(.*)$


评分表
^(.*)::(.*)::(.*)::(.*)$




建立电影表
//CREATE TABLE movies(MovieID BigInt, Title String, Genres String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '::';
//LOAD DATA LOCAL INPATH '/usr/local/IMF_testdata/movie20161202/movies.dat' INTO TABLE movies;
//CREATE TABLE movies(MovieID BigInt, Title String, Genres String) ROW FORMAT   serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ('input.regex' = '^(.*)::(.*)::(.*)$' , 'output.format.string' = '%1$s%2$s%3$s') stored as textfile;


//ok
CREATE TABLE movies(MovieID String, Title String, Genres String) ROW FORMAT   serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ('input.regex' = '^(.*)::(.*)::(.*)$' , 'output.format.string' = '%1$s%2$s%3$s') stored as textfile;
LOAD DATA LOCAL INPATH '/usr/local/IMF_testdata/movie20161202/movies.dat' INTO TABLE movies;



//ok
CREATE TABLE users(UserID String, Gender String, Age String, Occupation String, Zipcode String) PARTITIONED BY (dt String) ROW FORMAT  serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ('input.regex' = '^(.*)::(.*)::(.*)::(.*)::(.*)$' , 'output.format.string' = '%1$s%2$s%3$s%4$s%5$s') stored as textfile;


LOAD DATA LOCAL INPATH '/usr/local/IMF_testdata/movie20161202/users.dat' INTO TABLE users PARTITION(dt='20161101');



CREATE TABLE ratings(UserID String, MovieID String, Rating String, Timestamped String) ROW FORMAT  serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ('input.regex' = '^(.*)::(.*)::(.*)::(.*)$' , 'output.format.string' = '%1$s%2$s%3$s%4$s') stored as textfile;
LOAD DATA LOCAL INPATH '/usr/local/IMF_testdata/movie20161202/ratings.dat' INTO TABLE ratings;




 SELECT  users.UserID, users.Age, users.Gender FROM ratings JOIN users ON (ratings.UserID = users.UserID) WHERE ratings.MovieID = 2916 ;



SELECT  /*+MAP JOIN(MOVIES)*/ users.UserID, users.Age, users.Gender,movies.title,movies.gendre  FROM ratings JOIN users ON (ratings.UserID = users.UserID) JOIN movies ON (rating.movie.id = movies .movid) WHERE ratins.MovieID = 2916 ;







  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

段智华

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值