Pig analysis script


-- Load the data
data = LOAD '/user/mapred/PigData.txt' USING PigStorage('|') AS ( imsi:chararray,time:chararray,loc:chararray);


-- Convert the timestamp to ISO format (truncated to the hour)
REGISTER /home/mapred/software/hadoops/pig/pig-0.11.1/contrib/piggybank/java/piggybank.jar;
REGISTER /home/mapred/practise/joda-time-2.0.jar;


DEFINE CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();


toISO = FOREACH data GENERATE imsi, CustomFormatToISO( SUBSTRING(time,0,13),'YYYY-MM-dd HH') AS time:chararray,loc;


-- Group the records by IMSI
grp = GROUP toISO BY imsi;


-- Build pairs of consecutive records
REGISTER /home/mapred/practise/datafu-1.2.0.jar;
DEFINE MarkovPairs datafu.pig.stats.MarkovPairs();


pairs = FOREACH grp
{
    -- Sort each user's records by time, then pair every record with the one that follows it
    sorted = ORDER toISO BY time;
    pair = MarkovPairs(sorted);
    GENERATE FLATTEN(pair) AS (data:tuple(imsi,time,loc),next:tuple(imsi,time,loc) );
};


-- Flatten each pair into plain columns
prj = FOREACH pairs GENERATE data.imsi AS imsi,data.time AS time,next.time AS next_time,data.loc AS loc,next.loc AS next_loc;




-- Keep only pairs whose two records are less than one full day apart
DEFINE ISODaysBetween org.apache.pig.piggybank.evaluation.datetime.diff.ISODaysBetween();


flt = FILTER prj BY ISODaysBetween(next_time, time) == 0L;




-- Count the total number of transitions leaving each location


total_count = FOREACH (GROUP flt BY loc) GENERATE group AS loc,COUNT(flt) AS total;


-- Count the transitions for each (loc, next_loc) pair
pairs_count = FOREACH (GROUP flt BY (loc,next_loc) ) GENERATE FLATTEN(group) AS (loc,next_loc),COUNT(flt) AS cnt;




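-- Join the pair counts with the per-location totals (total_count is the small, replicated side)
jnd = JOIN pairs_count BY loc,total_count BY loc USING 'replicated';

-- Transition probability = pair count / total count for the source location
prob = FOREACH jnd GENERATE pairs_count::loc AS loc, pairs_count::next_loc AS next_loc,(double)cnt/(double)total AS probability;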


-- For each location, keep the 3 most probable next locations
top3 = FOREACH (GROUP prob BY loc)
{
    sorted = ORDER prob BY probability DESC;
    top = LIMIT sorted 3;
    GENERATE FLATTEN(top);
};


STORE top3 INTO 'output';


cat output;
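
To check the logic on a small sample without a cluster, the script's steps can be sketched in plain Python. This is only an illustrative sketch, not part of the original job: the function name `top_transitions`, the in-memory list of `(imsi, time, loc)` tuples, and the assumption that every time string starts with `YYYY-MM-DD HH` are all hypothetical. It follows the same sequence as the script: group by IMSI, sort by time, pair consecutive records, drop pairs a full day or more apart, then compute each transition's probability and keep the top 3 per location.

```python
from collections import Counter, defaultdict
from datetime import datetime


def top_transitions(records, k=3):
    """Return {loc: [(next_loc, probability), ...]} with at most k entries per loc.

    records: iterable of (imsi, time, loc); time is assumed to start with
    'YYYY-MM-DD HH' (an assumption made for this sketch).
    """
    # Group by IMSI and truncate the time to the hour, like SUBSTRING(time,0,13).
    by_imsi = defaultdict(list)
    for imsi, time, loc in records:
        by_imsi[imsi].append((datetime.strptime(time[:13], "%Y-%m-%d %H"), loc))

    pair_counts = Counter()  # (loc, next_loc) -> count, like pairs_count
    totals = Counter()       # loc -> count, like total_count
    for rows in by_imsi.values():
        rows.sort(key=lambda r: r[0])
        # Pair every record with the one that follows it.
        for (t, loc), (next_t, next_loc) in zip(rows, rows[1:]):
            # Keep the pair only if fewer than one whole day separates the records.
            if (next_t - t).days == 0:
                pair_counts[(loc, next_loc)] += 1
                totals[loc] += 1

    # probability = pair count / total count for the source location
    result = defaultdict(list)
    for (loc, next_loc), cnt in pair_counts.items():
        result[loc].append((next_loc, cnt / totals[loc]))

    # Sort by descending probability (ties broken by name) and keep the top k.
    return {
        loc: sorted(items, key=lambda x: (-x[1], x[0]))[:k]
        for loc, items in result.items()
    }
```

Unlike the Pig `ORDER ... DESC` followed by `LIMIT`, this sketch breaks ties alphabetically by `next_loc` so that the output is deterministic.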


 
