Group after reading, and do it fast

【Question】

I am writing a script in Perl but got stuck on one part. Below is a sample of my CSV file.

"MP","918120197922","20150806125001","prepaid","prepaid","3G","2G"
"GJ","919904303790","20150806125002","prepaid","prepaid","2G","3G"
"MH","919921990805","20150806125003","prepaid","prepaid","2G",
"MP","918120197922","20150806125004","prepaid","prepaid","2G",
"MUM","919904303790","20150806125005","prepaid","prepaid","2G","3G"
"MUM","918652624178","20150806125005","prepaid","prepaid","2G","3G"
"MP","918120197922","20150806125005","prepaid","prepaid","2G","3G"

Now I need to take unique records on the basis of the 2nd column (i.e. mobile number), keeping only the record with the latest value in the 3rd column (i.e. timestamp). For example, for mobile number "918120197922":

"MP","918120197922","20150806125001","prepaid","prepaid","3G","2G"
"MP","918120197922","20150806125004","prepaid","prepaid","2G"
"MP","918120197922","20150806125005","prepaid","prepaid","2G","3G"

It should select the 3rd record, as it has the latest timestamp (20150806125005). Please help.

Additional info: Sorry for the inconsistency in the data; I have rectified it now. Yes, the data is in order, which means the latest timestamp appears in the later rows. One more thing: my file is larger than 1 GB, so is there an efficient way to do this? Will awk be faster than Perl in this case?

【Answer】

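The algorithm is not hard: it is just finding the maximum within each group. Parsing text is always slow, though, so the file should be read with multiple threads where possible. The grouping should also use a hash; comparing records by simple traversal is slow. This is somewhat tedious to write in Perl, so I suggest SPL, where the script is simpler: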

     A
1    =file("d:\\source.csv").cursor@qmc()
2    =A1.groups(#2;top(-1;#3):a)
3    =A2.(a).conj()
4    =file("d:\\result.csv").export@c(A3)

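A1: Reads the contents of source.csv, strips the quotation marks, and returns a multi-path cursor.

A2: On the multi-path cursor, each path groups its records by the 2nd column and then selects, for each group, the record with the largest value in the 3rd column.

A3: Concatenates the selected records.

A4: Writes the result of A3 to the file result.csv.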
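
For readers without SPL, the hash-grouping idea can be sketched in Python. This is not the script above, only a minimal single-threaded illustration under stated assumptions: the function names `latest_per_key` and `merge_partials` are hypothetical, the timestamp is assumed to be a fixed-width digit string (so string comparison matches time order), and the sample rows are the ones from the question.

```python
import csv
import io


def latest_per_key(rows, key_col=1, ts_col=2):
    """Keep, for each key, the row with the largest timestamp (hash grouping)."""
    best = {}  # key (mobile number) -> best row seen so far
    for row in rows:
        if len(row) <= max(key_col, ts_col):
            continue  # skip blank or truncated lines
        key = row[key_col]
        kept = best.get(key)
        # Assumption: timestamps are fixed-width digit strings, so comparing
        # them as strings gives chronological order. ">=" lets a later row
        # win when two rows share the same timestamp.
        if kept is None or row[ts_col] >= kept[ts_col]:
            best[key] = row
    return list(best.values())


def merge_partials(partials, key_col=1, ts_col=2):
    """Combine results computed on separate chunks by re-running the same reduction."""
    flat = (row for part in partials for row in part)
    return latest_per_key(flat, key_col, ts_col)


# Sample rows taken from the question.
SAMPLE = '''"MP","918120197922","20150806125001","prepaid","prepaid","3G","2G"
"GJ","919904303790","20150806125002","prepaid","prepaid","2G","3G"
"MH","919921990805","20150806125003","prepaid","prepaid","2G",
"MP","918120197922","20150806125004","prepaid","prepaid","2G",
"MUM","919904303790","20150806125005","prepaid","prepaid","2G","3G"
"MUM","918652624178","20150806125005","prepaid","prepaid","2G","3G"
"MP","918120197922","20150806125005","prepaid","prepaid","2G","3G"
'''

# csv.reader strips the quotation marks and streams one row at a time.
result = latest_per_key(csv.reader(io.StringIO(SAMPLE)))
for row in result:
    print(row)
```

For a real file, the same function can be fed `csv.reader(open(path, newline=""))`, so only one row per distinct mobile number is held in memory. `merge_partials` mirrors the per-path grouping followed by a merge: if the file is split into chunks and each chunk is reduced separately, the partial results can be reduced once more to get the final answer.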
