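Common MapReduce Algorithm Exercises

Contents:

1 Data deduplication (preprocessing: cleaning, filtering, deduplication)

2 Data sorting

3 Computing averages

4 Single-table join

5 Multi-table join

6 Log parsing

7 Common friends

8 Miscellaneous examples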


1 Data deduplication (preprocessing: cleaning, filtering, deduplication)

2018-3-1 a

2018-3-2 b

2018-3-3 c

2018-3-4 d

2018-3-5 a

2018-3-6 b

2018-3-7 c

2018-3-3 c
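
A minimal sketch of the deduplication pattern, simulated in plain Python rather than on a Hadoop cluster. It assumes the whole line is the deduplication key; the helper `run_job` is a hypothetical stand-in for the map, shuffle/sort and reduce phases, not a Hadoop API.

```python
from itertools import groupby

def map_dedup(line):
    # Emit the whole record as the key; the value carries no information.
    yield line.strip(), None

def reduce_dedup(key, values):
    # All copies of a record arrive in one group, so emit the key once.
    yield key

def run_job(lines, mapper, reducer):
    # Simulate map -> shuffle/sort -> reduce on a single machine.
    pairs = [kv for line in lines if line.strip() for kv in mapper(line)]
    pairs.sort(key=lambda kv: kv[0])
    out = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        out.extend(reducer(key, [v for _, v in group]))
    return out

records = [
    "2018-3-1 a", "2018-3-2 b", "2018-3-3 c", "2018-3-4 d",
    "2018-3-5 a", "2018-3-6 b", "2018-3-7 c", "2018-3-3 c",
]
unique_records = run_job(records, map_dedup, reduce_dedup)
for record in unique_records:
    print(record)
```

The duplicate "2018-3-3 c" is collapsed because the shuffle groups identical keys before the reducer runs.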

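2 Data sorting

  1. Use a single reducer
  2. Use multiple reducers (write a custom partitioner, or sample the input with InputSampler to generate the partition boundaries and use TotalOrderPartitioner)

Variations: grouped sorting and top-K (write each of these yourself as practice)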
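
As one way to approach the top-K variation, the sketch below has every map task keep only its local K largest values and lets a single reducer merge those candidates. It is plain Python, and the way the input is divided into `splits` is a hypothetical assumption made for illustration.

```python
import heapq

def map_top_k(split, k):
    # Each map task keeps only its local K largest values.
    return heapq.nlargest(k, split)

def reduce_top_k(partial_lists, k):
    # A single reducer merges the local candidates into the global top K.
    merged = [value for part in partial_lists for value in part]
    return heapq.nlargest(k, merged)

# Hypothetical division of the input into two map splits.
splits = [[2, 32, 654, 32, 15, 756], [65223, 5956, 22, 650, 92]]
partials = [map_top_k(split, 3) for split in splits]
top3 = reduce_top_k(partials, 3)
print(top3)  # [65223, 5956, 756]
```

Because each mapper forwards at most K values, the single reducer handles only K values per map task instead of the full data set.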

 

 

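For example, given the raw data (the required result below also contains 6, 26 and 54, which are missing from this listing):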

2

32

654

32

15

756

65223

5956

22

650

92

Required result (rank followed by value):

1    2

2    6

3    15

4    22

5    26

6    32

7    32

8    54

9    92

10    650

11    654

12    756

13    5956

14    65223
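
The multi-reducer approach can be sketched as range partitioning: pick split points from a sample, route each value to the reducer that owns its range, sort within each reducer, and concatenate the outputs in partition order. This is a plain-Python simulation under stated assumptions: the input is taken to be the 14 values of the required result, the whole input is used as the sample, and the rank counter is shared across partitions (in a real job each reducer would need the sizes of the preceding partitions to compute its starting rank).

```python
import bisect

def choose_split_points(sample, num_reducers):
    # Pick num_reducers - 1 boundaries from a sorted sample.
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // num_reducers] for i in range(1, num_reducers)]

def partition(value, split_points):
    # Values below the first boundary go to reducer 0, and so on.
    return bisect.bisect_right(split_points, value)

def total_order_sort(values, num_reducers):
    split_points = choose_split_points(values, num_reducers)
    buckets = [[] for _ in range(num_reducers)]
    for value in values:
        buckets[partition(value, split_points)].append(value)
    ranked, rank = [], 0
    for bucket in buckets:              # reducer outputs, in partition order
        for value in sorted(bucket):    # each reducer sorts only its own range
            rank += 1
            ranked.append((rank, value))
    return ranked

values = [2, 32, 654, 32, 15, 756, 65223, 5956, 22, 650, 92, 26, 54, 6]
ranked = total_order_sort(values, 3)
for rank, value in ranked:
    print(rank, value)
```

With `num_reducers=1` the split-point list is empty and the function degenerates to the single-reducer approach.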

3 Computing averages

Raw data:

1)math:

 

张三    88

李四    99

王五    66

赵六    77

 

2)chinese:

 

张三    78

李四    89

王五    96

赵六    67

 

3)english:

 

张三    80

李四    82

王五    84

赵六    86

 

Output (each student's average score):

张三    82

李四    90

王五    82

赵六    76
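
A minimal sketch of the averaging job in plain Python. The mapper emits (name, score) for every line of every subject file and the reducer averages the scores of one student. Integer division is an assumption, chosen because it reproduces the truncated value 76 shown for 赵六; `average_scores` is a hypothetical driver that stands in for the shuffle.

```python
from collections import defaultdict

def map_scores(line):
    # Each input line is "<name> <score>"; emit the name as the key.
    name, score = line.split()
    yield name, int(score)

def reduce_average(name, scores):
    # Integer division reproduces the truncated averages in the expected output.
    yield name, sum(scores) // len(scores)

def average_scores(files):
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for lines in files.values():
        for line in lines:
            for name, score in map_scores(line):
                grouped[name].append(score)
    result = {}
    for name, scores in grouped.items():
        for key, average in reduce_average(name, scores):
            result[key] = average
    return result

files = {
    "math": ["张三 88", "李四 99", "王五 66", "赵六 77"],
    "chinese": ["张三 78", "李四 89", "王五 96", "赵六 67"],
    "english": ["张三 80", "李四 82", "王五 84", "赵六 86"],
}
averages = average_scores(files)
print(averages)
```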

4 Single-table join

Given a child-parent table, output the grandchild-grandparent table.

Sample input:

file:

child              parent

Tom        Lucy

Tom        Jack

Jone        Lucy

Jone        Jack

Lucy       Mary

Lucy       Ben

Jack        Alice

Jack        Jesse

Terry       Alice

Terry       Jesse

Philip      Terry

Philip      Alma

Mark       Terry

Mark       Alma

Output:

grandchild      grandparent

Tom        Alice

Tom        Jesse

Jone        Alice

Jone        Jesse

Tom        Mary

Tom        Ben

Jone        Mary

Jone        Ben

Philip      Alice

Philip      Jesse

Mark       Alice

Mark       Jesse
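
One common way to express this self-join is to emit every row twice, once keyed by the parent and once keyed by the child, so that a person's children and parents meet in the same reduce group. The sketch below simulates that in plain Python; the tag strings and the driver `grandchild_grandparent` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def map_relation(child, parent):
    # Emit each row twice so one person's children and parents share a key.
    yield parent, ("child", child)    # this person has the given child
    yield child, ("parent", parent)   # this person has the given parent

def reduce_grandparents(person, tagged_values):
    # The person's children are grandchildren of the person's parents.
    grandchildren = [v for tag, v in tagged_values if tag == "child"]
    grandparents = [v for tag, v in tagged_values if tag == "parent"]
    for grandchild in grandchildren:
        for grandparent in grandparents:
            yield grandchild, grandparent

def grandchild_grandparent(rows):
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for child, parent in rows:
        for key, value in map_relation(child, parent):
            grouped[key].append(value)
    pairs = []
    for person, values in grouped.items():
        pairs.extend(reduce_grandparents(person, values))
    return pairs

rows = [
    ("Tom", "Lucy"), ("Tom", "Jack"), ("Jone", "Lucy"), ("Jone", "Jack"),
    ("Lucy", "Mary"), ("Lucy", "Ben"), ("Jack", "Alice"), ("Jack", "Jesse"),
    ("Terry", "Alice"), ("Terry", "Jesse"), ("Philip", "Terry"),
    ("Philip", "Alma"), ("Mark", "Terry"), ("Mark", "Alma"),
]
pairs = grandchild_grandparent(rows)
for grandchild, grandparent in pairs:
    print(grandchild, grandparent)
```

The row order may differ from the listing above, since it depends on the order in which keys are grouped.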

5 Multi-table join

Map-side join

Reduce-side join
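
The two strategies can be contrasted with a small plain-Python sketch. The `users` and `orders` tables are hypothetical and not part of the original exercise. The map-side version assumes the small table fits in memory on every mapper; the reduce-side version tags each record with its source table and joins inside the reduce group.

```python
from collections import defaultdict

# Hypothetical tables, introduced only for illustration.
users = [(1, "alice"), (2, "bob")]                   # (user_id, name), small
orders = [(101, 1, 30), (102, 2, 15), (103, 1, 7)]   # (order_id, user_id, amount)

def map_side_join(orders, users):
    # The small table is loaded into memory and probed for every order;
    # no shuffle is needed.
    lookup = dict(users)
    return [(oid, lookup[uid], amount) for oid, uid, amount in orders if uid in lookup]

def reduce_side_join(orders, users):
    # Mappers tag records by source table; the shuffle groups them by join key.
    groups = defaultdict(lambda: {"U": [], "O": []})
    for uid, name in users:
        groups[uid]["U"].append(name)
    for oid, uid, amount in orders:
        groups[uid]["O"].append((oid, amount))
    joined = []
    for uid in sorted(groups):  # each iteration plays the role of one reduce call
        for name in groups[uid]["U"]:
            for oid, amount in groups[uid]["O"]:
                joined.append((oid, name, amount))
    return joined

print(sorted(map_side_join(orders, users)))
print(sorted(reduce_side_join(orders, users)))
```

Both functions produce the same rows; the difference lies in where the matching happens and whether a shuffle is required.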

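6 Log parsing

Simple transformations (such as field truncation and string replacement)

Replacement using an external dictionary

Format conversion (such as converting JSON or XML to plain text)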
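
The three kinds of transformation can be combined in a map-only step, sketched here in plain Python. The record layout, the field names (`ts`, `user`, `city`) and the dictionary `CITY_DICT` are all hypothetical, chosen only to illustrate the idea.

```python
import json

# Hypothetical external dictionary mapping city codes to names.
CITY_DICT = {"bj": "Beijing", "sh": "Shanghai"}

def parse_record(json_line):
    record = json.loads(json_line)
    # Simple transformation: truncate the timestamp to the date and drop dashes.
    day = record["ts"][:10].replace("-", "")
    # External dictionary replacement, with a fallback for unknown codes.
    city = CITY_DICT.get(record["city"], "unknown")
    # Format conversion: JSON in, tab-separated plain text out.
    return "\t".join([day, record["user"], city])

line = '{"ts": "2018-03-01 09:17:19", "user": "u1", "city": "bj"}'
print(parse_record(line))
```

In a real job the dictionary would typically be shipped to every mapper and loaded once during task setup rather than hard-coded.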

7 Common friends

Raw data: each person's friend list

A:B,C,D,F,E,O

B:A,C,E,K

C:F,A,D,I

D:A,E,F,L

E:B,C,D,M,L

F:A,B,C,D,E,O,M

G:A,C,D,E,F

H:A,C,D,E,O

I:A,O

J:B,O

K:A,C,D

L:D,E,F

M:E,F,G

O:A,H,I,J

……

 

Output: the common friends that each person shares with every other person

A-B C,E,

A-C D,F,

A-D E,F,

A-E B,C,D,

A-F B,C,D,E,O,

A-G C,D,E,F,

A-H C,D,E,O,

A-I O,

A-J B,O,

A-K C,D,

A-L D,E,F,

A-M E,F,

B-C A,

B-D A,E,

……
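
A frequently used approach chains two jobs: the first inverts the lists so that each friend maps to the people who list that friend, and the second emits every pair of those people with the shared friend as the value. The sketch below simulates both jobs in plain Python using only the 14 rows listed above; the function name `common_friends` is a hypothetical choice.

```python
from collections import defaultdict
from itertools import combinations

def common_friends(lines):
    # Job 1: invert the lists, mapping each friend to the people who list them.
    listed_by = defaultdict(list)
    for line in lines:
        person, friends = line.split(":")
        for friend in friends.split(","):
            listed_by[friend].append(person)
    # Job 2: every pair of people who list the same friend shares that friend.
    shared = defaultdict(list)
    for friend, people in listed_by.items():
        for pair in combinations(sorted(people), 2):
            shared[pair].append(friend)
    return {pair: sorted(friends) for pair, friends in shared.items()}

lines = [
    "A:B,C,D,F,E,O", "B:A,C,E,K", "C:F,A,D,I", "D:A,E,F,L", "E:B,C,D,M,L",
    "F:A,B,C,D,E,O,M", "G:A,C,D,E,F", "H:A,C,D,E,O", "I:A,O", "J:B,O",
    "K:A,C,D", "L:D,E,F", "M:E,F,G", "O:A,H,I,J",
]
result = common_friends(lines)
for (left, right), friends in sorted(result.items()):
    print(f"{left}-{right} {','.join(friends)}")
```

Sorting the people before pairing ensures that A-B and B-A land on the same key.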

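8 Miscellaneous examples

Qunar written-test question:

The Qunar travel app generates a large volume of access logs every day. Each operation by a user [uuid-x] produces one log record. Assume a user can reach the quote detail page [detail] through several entry points, such as one-way search [search-dancheng] and round-trip search [search-wangfan], then select a flight and complete the purchase by placing an order [submit]. The log format is shown below. Write a Map/Reduce program that meets the following requirement (pseudocode is sufficient).

a) Count how many of the Qunar travel app's orders on 20140510 came from one-way search and how many came from round-trip search.

Sample log (an excerpt for illustration only; the actual daily data volume is very large):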

20140510     09:17:19       uuid-01        search-dancheng       dep=北京&arr=上海&date=20140529&pnvm=0

20140510     09:18:20       uuid-02        search-wangfan       dep=北京&arr=上海&sdate=20140529&edate=20140605

20140510     09:18:23       uuid-01        detail  dep=北京&arr=上海&date=20140529&fcode=CA1810

20140510     09:20:29       uuid-02        detail  dep=北京&arr=上海&date=20140529&fcode=CA1810

20140510     09:21:19       uuid-01        submit         dep=北京&arr=上海&date=20140529&fcode=CA1810&price=1280

20140510     09:23:19       uuid-03        search-dancheng       dep=北京&arr=广州&date=20140529&pnvm=0

20140510     09:25:19       uuid-04        search-dancheng       dep=北京&arr=西安&date=20140529&pnvm=0

20140510     09:25:30       uuid-05        search-dancheng       dep=北京&arr=天津&date=20140529&pnvm=0

20140510     09:26:29       uuid-04        detail  dep=北京&arr=西安&上海&date=20140529&fcode=CA1810

20140510     09:28:19       uuid-06        submit         dep=北京&arr=拉萨&date=20140529&fcode=CA1810&price=2260
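
One possible answer to requirement a), sketched in plain Python rather than as a Hadoop job. The map step filters by date and keys each event by uuid; the reduce step walks one user's events in time order and attributes each submit to that user's most recent search. That attribution rule is an assumption, as is ignoring a submit with no preceding search on the same day (such as uuid-06 in the sample).

```python
from collections import Counter, defaultdict

def count_orders_by_entry(lines, day="20140510"):
    events = defaultdict(list)  # shuffle: group events by uuid
    for line in lines:
        fields = line.split(None, 4)
        if len(fields) < 4 or fields[0] != day:
            continue
        _, time, uuid, action = fields[:4]
        # Lower-case the action in case entry names vary in capitalization.
        events[uuid].append((time, action.lower()))
    counts = Counter({"search-dancheng": 0, "search-wangfan": 0})
    for uuid, user_events in events.items():  # reduce: one user at a time
        last_search = None
        for _, action in sorted(user_events):  # time order within the day
            if action.startswith("search-"):
                last_search = action
            elif action == "submit" and last_search is not None:
                counts[last_search] += 1
    return dict(counts)

sample = """\
20140510 09:17:19 uuid-01 search-dancheng dep=北京&arr=上海&date=20140529&pnvm=0
20140510 09:18:20 uuid-02 search-wangfan dep=北京&arr=上海&sdate=20140529&edate=20140605
20140510 09:18:23 uuid-01 detail dep=北京&arr=上海&date=20140529&fcode=CA1810
20140510 09:20:29 uuid-02 detail dep=北京&arr=上海&date=20140529&fcode=CA1810
20140510 09:21:19 uuid-01 submit dep=北京&arr=上海&date=20140529&fcode=CA1810&price=1280
20140510 09:28:19 uuid-06 submit dep=北京&arr=拉萨&date=20140529&fcode=CA1810&price=2260
""".splitlines()
order_counts = count_orders_by_entry(sample)
print(order_counts)
```

On this excerpt only uuid-01 both searches and submits, so one order is counted for one-way search and none for round-trip search.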

 

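Merging data-update logs for a power company

Log-processing requirements at a certain company:

Query the logs by system and keyword, output the 10 lines that follow each line containing the keyword (or save them to HDFS), and finally display this data on a web page.

(The line containing the keyword has no logical relationship to the 10 lines that follow it; the logs are messy raw data.)

Java application + shell script + Spark JAR

The Java application handles user login; the user then enters parameters such as the system and keyword and submits the query, and the Java application calls a shell script --> submit

The result data is saved to HDFS. The saved file is named with a random number and is finally read and displayed on the web page.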
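
The core of the query, selecting each keyword line together with the 10 lines after it, can be sketched in plain Python as below. This is an illustration only: it works on an in-memory list, includes the matching line itself (an assumption), and leaves out the system filter, the Spark job and the HDFS output. In a distributed setting the lines would first need a global index so that "the next 10 lines" is well defined across partitions.

```python
def lines_after_keyword(lines, keyword, following=10):
    # Collect the index of every matching line plus the `following` lines
    # after it; a set merges overlapping windows so no line is repeated.
    keep = set()
    for index, line in enumerate(lines):
        if keyword in line:
            keep.update(range(index, min(index + following + 1, len(lines))))
    return [lines[index] for index in sorted(keep)]

# Hypothetical log used only to exercise the function.
log = [f"line {i}" for i in range(30)]
log[3] += " ERROR"
log[27] += " ERROR"
selected = lines_after_keyword(log, "ERROR")
print(len(selected))   # 14: lines 3-13 plus lines 27-29
```

A window that runs past the end of the file is simply cut short, as with the match on line 27 here.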

 

Sample data:

15-06-10.23:58:02.321 [pool-22-thread-5] INFO  HttpPostMessageSender   -  HttpPostMessageSender resp statusCode: 200 content:success tradeno:2015061010001000070650683

15-06-10.23:58:02.321 [pool-22-thread-5] INFO  HttpPostMessageSender   - HttpPost 是否发送成功  true

15-06-10.23:58:02.321 [pool-22-thread-5] INFO  NotificationServiceImpl  - ****进行入库操作****

15-06-10.23:58:02.324 [pool-22-thread-5] INFO  NotificationServiceImpl  - ****没有此TRADE_NO,新增Notification****tradeNo=2015061010001000070650683

15-06-10.23:58:02.327 [pool-22-thread-5] INFO  NotifyServiceImpl       - ****enter--saveOrUpdateNotity****

15-06-10.23:58:02.330 [pool-22-thread-5] INFO  NotifyServiceImpl       - ****没有此TRADE_NO,新增Notify****tradeNo=2015061010001000070650683

15-06-10.23:58:02.333 [pool-22-thread-5] INFO  ACCESS                  - 2015061010001000070650683,FINISHED_SUCCESS

15-06-10.23:58:04.250 [pool-20-thread-2] INFO  NPPListener             - Received a new message OutTradeNotify{tradeInfo=TradeInfo{outTradeNo='150610263916206067998', tradeNo='2015061010001000070650827', originalTradeNo='null', bizTradeNo='9488771051', tradeType=TRADE_GENERAL, subTradeType=SALE, payMethod=CASHIERGATEMODE, tradeMoney=Money{currency=CNY, amount=10420}, tradeSubject='消费订单', submitter=ThinCustomer{merchantNo='23077370', customerNo='360080000230773708', customerLoginName='null', customerName='null', customerOutName='null'}, seller=ThinCustomer{merchantNo='23077370', customerNo='360080000230773708', customerLoginName='null', customerName='null', customerOutName='null'}, sellerAccountNo='360080000230773708000811', buyer=ThinCustomer{merchantNo='null', customerNo='360000000260175680', customerLoginName='null', customerName='null', customerOutName='null'}, tradeStatus=TRADE_FINISHED, createdDate=Wed Jun 10 23:57:32 CST 2015, deadlineTime=null, tradeFinishedDate='20150610', tradeFinishedTime='235802', payTool=EXPRESS, bankCode='CEB', exchangeDate='null', exchangeRate='null', returnParams='null', oldGWV60AuthCode='null', oldEXV10TerminalNo='null', clearingCurrency=null, clearingMoney=null, tradeExtInfo=TradeExtInfo{notifyStatus='NOT', outMessageId='null', cardSha1='null', signNo='null', returnParams='null', extendParams='null', pageBackUrl='null', serverNotifyUrl='http://gw.jd.com/payment/notify_chinabankReal.action', notifySmsMoible='null', notifyMailAddress='null', innerMessageFormat='XML', apiMessageFormat='EX_V1.0', requestCharset='UTF-8', encryptType='3DES', signType='MD5', requestModule='null', requestVersion='null', remoteIp='109.145.60.24', receivingChannel='JDSC', requestProtocol='HTTP', requestMethod='null', outTradeDate='20150610', outTradeTime='235731', outTradeIp='109.145.60.24', outRefererHosts='null', retryCount=1}, ext=null}}, OutMessageNotify{apiMessageFormat=null, messageFormat=null, notifyCharset='null', signType='null', 
encryptType='null'}, MessageNotify{responseModule='null', responseCode='null', responseDesc='null'}

15-06-10.23:58:04.253 [pool-20-thread-2] INFO  KeyServiceImpl          - Calling SecurityService to get {} key for merchant {} with codeClass {}23077370KeyTypeEnum{code='3DES', cnName='三DES'}EXPRESS

15-06-10.23:58:04.264 [pool-20-thread-2] INFO  CustomerCenterFacade    - [INVOCATION_LOG_C] 2015-06-10.23:58:04.264;pool-20-thread-2;172.17.92.48:0->172.17.87.47:20996;com.wangyin.customer.api.CustomerCenterFacade:1.1.6.getMerchantCustomerKeys(com.wangyin.customer.common.dto.customer.CustomerParamDTO);***;2015-06-10.23:58:04.253;RESULT:***;11,112,359;

15-06-10.23:58:04.264 [pool-20-thread-2] INFO  KeyServiceImpl          - 获取的3DES 密钥值为20B0984A9B751F0B911A1AEA0738D557AE16548CCE029E2A

15-06-10.23:58:04.264 [pool-20-thread-2] INFO  KeyServiceImpl          - Calling SecurityService to get {} key for merchant {} with codeClass {}23077370KeyTypeEnum{code='SALT', cnName='签名密钥'}EXPRESS

15-06-10.23:58:04.270 [pool-24-thread-4] INFO  NPPListener             - Received a new message OutTradeNotify{tradeInfo=TradeInfo{outTradeNo='22015061023575751670871914', tradeNo='2015061010001000070651406', originalTradeNo='null', bizTradeNo='null', tradeType=TRADE_GENERAL, subTradeType=SALE, payMethod=APIEXPRESSMODE, tradeMoney=Money{currency=CNY, amount=500000}, tradeSubject='消费订单', submitter=ThinCustomer{merchantNo='22843776', customerNo='360080000228437761', customerLoginName='null', customerName='null', customerOutName='null'}, seller=ThinCustomer{merchantNo='22843776', customerNo='360080000228437761', customerLoginName='null', customerName='null', customerOutName='null'}, sellerAccountNo='360080000228437761000811', buyer=null, tradeStatus=TRADE_FINISHED, createdDate=Wed Jun 10 23:57:57 CST 2015, deadlineTime=null, tradeFinishedDate='20150610', tradeFinishedTime='235802', payTool=EXPRESS, bankCode='ICBC', exchangeDate='null', exchangeRate='null', returnParams='22894010', oldGWV60AuthCode='null', oldEXV10TerminalNo='00000002', clearingCurrency=null, clearingMoney=null, tradeExtInfo=TradeExtInfo{notifyStatus='NOT', outMessageId='API.150610.0ddf8c2f7ed94f3e9f741cd44500a866', cardSha1='5D72C7755A82576EE906BAB8314164ABAC513C9C', signNo='201505110010089270009113541', returnParams='22894010', extendParams='null', pageBackUrl='null', serverNotifyUrl='http://jrb-api.d.chinabank.com.cn/notify/quick.htm', notifySmsMoible='null', notifyMailAddress='null', innerMessageFormat='XML', apiMessageFormat='EX_V1.0', requestCharset='UTF-8', encryptType='3DES', signType='MD5', requestModule='null', requestVersion='null', remoteIp='172.17.80.168', receivingChannel='API', requestProtocol='HTTP', requestMethod='POST', outTradeDate='null', outTradeTime='null', outTradeIp='null', outRefererHosts='null', retryCount=1}, ext=null}}, OutMessageNotify{apiMessageFormat=null, messageFormat=null, notifyCharset='null', signType='null', encryptType='null'}, MessageNotify{responseModule='null', 
responseCode='null', responseDesc='null'}

15-06-10.23:58:04.271 [pool-20-thread-2] INFO  CustomerCenterFacade    - [INVOCATION_LOG_C] 2015-06-10.23:58:04.271;pool-20-thread-2;172.17.92.48:0->172.17.91.104:20996;com.wangyin.customer.api.CustomerCenterFacade:1.1.6.getMerchantCustomerKeys(com.wangyin.customer.common.dto.customer.CustomerParamDTO);***;2015-06-10.23:58:04.264;RESULT:***;6,971,110;

15-06-10.23:58:04.272 [pool-20-thread-2] INFO  KeyServiceImpl          - 获取MD5 TOKEN 密钥的值为1qaz2wsx3edc

15-06-10.23:58:04.273 [pool-20-thread-2] INFO  NPPNotifyProcessorImpl  - ApiMessageFormatEX_V1.0

15-06-10.23:58:04.273 [pool-24-thread-4] INFO  KeyServiceImpl          - Calling SecurityService to get {} key for merchant {} with codeClass {}22843776KeyTypeEnum{code='3DES', cnName='三DES'}EXPRESS

15-06-10.23:58:04.273 [pool-20-thread-2] INFO  NPPNotifyProcessorImpl  - 转化为NotificationDTO的结果为: com.wangyin.npp.notify.facade.dto.NotificationDTO@54b27890

15-06-10.23:58:04.273 [pool-20-thread-2] INFO  NotificationServiceImpl  - 准备入库(可能会入库)的 notification=Notification [TRADE_NO=2015061010001000070650827, SOURCE_NAME=NPP_PAYMENT_COMPLETE, FROM_ADDRESS=EXPRESS, FROM_NAME=23077370, TO_ADDRESS=http://gw.jd.com/payment/notify_chinabankReal.action, CHANNEL=HTTP_POST, SUBJECT=null, CONTENT=resp=PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4NCjxDSElOQUJBTks%2BCiAgPFZFUlNJT04%2BMS4wLjA8L1ZFUlNJT04%2BCiAgPE1FUkNIQU5UPjIzMDc3MzcwPC9NRVJDSEFOVD4KICA8VEVSTUlOQUw%2BMDAwMDAwMDE8L1RFUk1JTkFMPgogIDxEQVRBPllFbm10T0Zkb0RBK0tHdmhVZmJBZTlKOVZDOC9ONGx1YW5uMlBTRFF0L0VNTUh3eHR6L29tYi9vdlArTjAybnlsTGdhbUhCVDBZYVpBMUxoSC9iV3RndmoxN0JMTDhPTFc3U3laZmMxMU5sczRqSFdGeUR1UHNsb3F4YU51aFdUUFFDTzljMCtrTFpDZkpuZHB6d2sxN3J4dU5mRGVuYmljZ21kWHphSlhQNElQZzFKQ2h1ZGRRNWdTQTQ4UWVPVEE0UUhJYUsyQVFJNTNZQU03RHdQWFBrZkNPMythRUgvMk5oeGJMRmtYMTEvalJWUUI0NDM1K2FtSm1zclE0UFJ5cVVSWmx6eGVJQk5XNU4xZnZjMUE1NXRVa1RmRjNWc1orWjU2WkdydFoyQzdnQ3BWNkxqOUNDUWlzbjhKMEd3Z2JLS0kvdUMyUVNDTHJOMUl3YU8waSsxUUFIVWdPRGRtTFZHUGxhSTBqTS85UWVmY0Q2R0FjaVJua214R
