Notes on a data-processing task

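My company works mainly in the SMS industry, so I deal with mobile phone numbers all the time and get plenty of odd requests. Recently a director handed me one of them: pick out the phone numbers that appear in both of two files. My programming and Excel skills are limited, so I had to find another way. Each file has several fields, but all of the data is structured. The first file looks like this: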

15994710001,2016 /11/3  0:24,53100010
15994710001,2016 /11/3  0:24,53100010
15001313373,2016 /11/3  3:39,53100010
13937713309,2016 /11/3  6:16,53100010
13758943333,2016 /11/3  7:19,53100010
13868044333,2016 /11/3  8:33,53100010
13500732333,2016 /11/3  10:29,53100010
13523072333,2016 /11/3  10:30,53100010
15138132777,2016 /11/3  10:31,53100010
13960985779,2016 /11/3  10:45,53100010
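This file has more than 4,000 lines.
File 2 has more fields. Part of its content happens to be garbled, which conveniently also protects personal privacy.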
"311-SD10658" 2114781676479382330 "," 13703774555 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 15920510111 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 18319609333 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 15221090555 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 13905879555 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 13818586777 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 13916387773 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 13882133333 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 18200980999 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"

Approach:

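Since only the matching numbers are needed, I processed the files on Linux with a few text-processing tools: first reduce each file to one that contains only phone numbers, then do the rest of the processing on those.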

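Either cut or awk can extract the relevant column. I am not very familiar with awk, so I used cut here; just pay attention to the delimiter and to which column number holds the phone number.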
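The extraction step might look like the sketch below. The file names (file1.csv, file2.txt, phones1.txt, phones2.txt) are made up for illustration, and the second row of file2.txt is a hypothetical row added only so that the two samples overlap. It assumes file 1 is comma-separated with the number in column 1, and that file 2 is separated by single spaces with the number in column 4, as in the excerpts above.

```shell
# Work in a scratch directory so the sample files do not clutter anything.
workdir=$(mktemp -d) && cd "$workdir"

# Sample of file 1 (rows taken from the excerpt above).
cat > file1.csv <<'EOF'
15994710001,2016 /11/3  0:24,53100010
15994710001,2016 /11/3  0:24,53100010
13937713309,2016 /11/3  6:16,53100010
EOF

# Sample of file 2: first row from the excerpt, second row hypothetical.
cat > file2.txt <<'EOF'
"311-SD10658" 2114781676479382330 "," 13703774555 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
"311-SD10658" 2114781676479382330 "," 13937713309 "," 11λP50rit "," 1 "," 2016 /11/3  10:07:43 "," 2016 /11/3  10:07:41 "," 0 "," DELIVRD"
EOF

# File 1: comma delimiter, phone number is field 1.
# sort -u also drops repeated numbers and leaves the list sorted.
cut -d, -f1 file1.csv | sort -u > phones1.txt

# File 2: single-space delimiter, phone number is field 4.
cut -d' ' -f4 file2.txt | sort -u > phones2.txt

cat phones1.txt phones2.txt
```

Sorting at this stage is optional for grep, but it keeps the number lists easy to inspect and lets sort-dependent tools be used later.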

After that, grep can do the comparison. I also tried diff, but the results were not satisfactory.

1. Find the lines that the two text files have in common

grep -Ff file1 file2
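A small self-contained sketch of this step, using hypothetical number lists phones1.txt and phones2.txt. With -F alone, grep treats every line of the first file as a fixed substring, so adding -x (match whole lines only) is a safer choice if the lines could ever differ in length. comm -12 is shown as an alternative; it is not what the original post used, and it requires both inputs to be sorted.

```shell
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical, already sorted number lists.
printf '%s\n' 13703774555 13937713309 15994710001 > phones1.txt
printf '%s\n' 13937713309 15920510111 15994710001 > phones2.txt

# -f: read patterns from phones1.txt, -F: fixed strings, -x: whole-line match.
# Prints the lines of phones2.txt that also occur in phones1.txt.
grep -Fxf phones1.txt phones2.txt > common_grep.txt

# comm -12 hides the lines unique to either file, leaving the intersection.
comm -12 phones1.txt phones2.txt > common_comm.txt

cat common_grep.txt
```

One thing to watch for: without -x, a blank line in the pattern file acts as an empty pattern and matches every line, so it is worth making sure the extracted list has no empty lines.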


2. Find the lines that differ: lines that are in file1 but not in file2

grep  -vFf  file2 file1
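The difference can be taken in either direction by swapping the two file arguments. The sketch below uses hypothetical number lists phones1.txt and phones2.txt and again adds -x for whole-line matching. The comm -23 variant is an alternative that was not part of the original post and assumes sorted input.

```shell
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical, already sorted number lists.
printf '%s\n' 13703774555 13937713309 15994710001 > phones1.txt
printf '%s\n' 13937713309 15920510111 15994710001 > phones2.txt

# -v inverts the match: lines of phones1.txt that do NOT occur in phones2.txt.
grep -vFxf phones2.txt phones1.txt > only_in_1.txt

# Swap the arguments for the other direction: in phones2.txt, not in phones1.txt.
grep -vFxf phones1.txt phones2.txt > only_in_2.txt

# comm -23 keeps only the lines unique to the first file.
comm -23 phones1.txt phones2.txt > only_in_1_comm.txt

cat only_in_1.txt only_in_2.txt
```

Note that grep exits with status 1 when it prints nothing, which matters if this runs inside a script that uses set -e.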



This article is reposted from tianya1993's 51CTO blog. Original link: http://blog.51cto.com/dreamlinux/1869844. To repost it, please contact the original author.
