pig进阶

一个练习:

1、student表
1:zhangsan:boy:12:compute
3:lisi:boy:32:math
2:xiaoli:girl:23:meish
模式:
A = load 'student' using PigStorage(':') as (sno:chararray,sname:chararray,ssex:chararray,sage:int,sdept:chararray);
2、课程表(课程名,课程号,先修课程号,学分)
01,English,,4
02,Data structure,05,2
03,DataBase,02,2
04,DB design,03,3
05,C languange,,3
06,Pricinples of NetWork,07,3
07,OS,05,3

模式:
Course(cno:chararray,cname:chararray,cpno:chararray,ccredit:int)
3、学生和课程表对应的选课表(学号,课程号,成绩)
1,01,92
1,03,84
2,01,23
3,05,23
2,05,89
1,04,90
模式:
SC(sno:chararray,cno:chararray,grade:int)

---------------
1、计算每个学生的平均成绩
 1、加载学生表和选课表
 2、对学生表和选课表通过学号进行连接
 3、基于学号对连接生成的表进行分组
 4、计算平均成绩

-- load
A = load '/input/student' using PigStorage(':') as (sno:chararray,sname:chararray,ssex:chararray,sage:int,sdept:chararray);
B = load '/input/course' using PigStorage(',') as(cno:chararray,cname:chararray,ccredit:int);
C = load '/input/SC' using PigStorage(',') as(sno:chararray,cno:chararray,grade:int);
-- join
D = join A by sno,C by sno;
-- group
E = foreach D generate A::sno,sname,grade;
F = group E by (sno,sname);
-- avg
G = foreach F generate group.sname,(SUM(E.grade)/COUNT(E)); -- 此处COUNT可以运行

2、找出不及格成绩的学生(姓名,课程名,成绩)
-- load
A = load '/input/student' using PigStorage(':') as (sno:chararray,sname:chararray,ssex:chararray,sage:int,sdept:chararray);
B = load '/input/course' using PigStorage(',') as(cno:chararray,cname:chararray,ccredit:int);
C = load '/input/SC' using PigStorage(',') as(sno:chararray,cno:chararray,grade:int);
-- 过滤C选课表(不及格的)
D = filter C by grade < 60;
-- join
E = join D by sno,A by sno;
-- join
F = join E by cno,B by cno;
-- output
G = foreach F generate sname,cname,grade;

3、找出修了C language课程的学生(姓名即可)
-- load
A = load '/input/student' using PigStorage(':') as (sno:chararray,sname:chararray,ssex:chararray,sage:int,sdept:chararray);
B = load '/input/course' using PigStorage(',') as(cno:chararray,cname:chararray,ccredit:int);
C = load '/input/SC' using PigStorage(',') as(sno:chararray,cno:chararray,grade:int);

-- 过滤
D = filter B by cname matches 'C.+';
E = foreach D generate cno;
F = join E by cno,C by cno;
G = join F by C::sno,A by sno;
H = foreach G generate A::sname;

-- 嵌套操作(这个下面是有错误的,嵌套的时候对字段的名称需要确定好,对于cno就丢失了)
E = foreach (filter B by cname matches 'C.+') generate cno;
F = join (foreach (filter B by cname matches 'C.+') generate cno) by cno,C by cno;
G = join (join (foreach (filter B by cname matches 'C.+') generate cno) by cno,C by cno) by C::sno,A by sno;
H = foreach (join (join (foreach (filter B by cname matches 'C.+') generate cno) by cno,C by cno) by C::sno,A by sno) generate A::sname;

 

转载于:https://www.cnblogs.com/jsunday/p/3872752.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值