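awk in Practice

Overview
  • Purpose
  • Examples
Purpose
  • String processing
Examples
  • The raw records are shown below; the goal is to drop every record whose last field is "" (an empty quoted string)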
  • t,2013-01-06 00:00:00:155,121.10.83.33,CNGDZJ,2feb5fea7c1365,3028e559281d58,218713,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
    t,2013-01-06 00:00:00:191,59.180.144.158,IN0000,302bc09edb4edf,30397dd29965e0,44321,0,0,0,0,0,"",0,"",0,"photogallery.navbharattimes.indiatimes.com/articleshow/9923463.cms"
    t,2013-01-06 00:00:00:212,116.11.150.223,CNGXYL,2fd4e27d7d41bf,30397ec75a9ee1,218712,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
    t,2013-01-06 00:00:00:212,186.24.39.211,VE0000,2f813ccbe40a57,2fff3625e9ecc4,218710,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
    t,2013-01-06 00:00:00:229,27.24.140.130,CNHB00,302813c563c1a3,30333c8d31e160,218712,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
    t,2013-01-06 00:00:00:222,115.153.179.198,CNJXFZ,3039776bb28aa4,303979b0e2fe3f,277625,0,0,0,0,0,"",0,"",0,"k.adsame.com/sammax/visa2012/300300100100.swf"
    t,2013-01-06 00:00:00:232,221.11.26.246,CNXAXA,30123960b8cb2e,300036fbd1d034,218710,0,0,0,0,0,"",0,"",0,""
    t,2013-01-06 00:00:00:241,182.205.162.23,CNLNSY,302048e883146b,3039764c2c5e63,218711,0,0,0,0,0,"",0,"",0,""
    t,2013-01-06 00:00:00:235,121.63.222.58,CNHBXF,3025c09cca68ba,30397edb90cbd2,218710,0,0,0,0,0,"",0,"",0,""
    t,2013-01-06 00:00:00:253,123.80.109.13,CNXJWL,30304f09977673,30397e92ada324,326504,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/chaoren/201301/300300.swf"
    t,2013-01-06 00:00:00:250,120.70.27.19,CNXJ00,2ff8fd8a55cb76,30397e6bdd3cc4,277628,0,0,0,0,0,"",0,"",0,"k.adsame.com/sammax/visa2012/300300100100.swf"
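  • Assuming the input file is named xyz and the output file is named save, the awk command below keeps only records whose 17th field is longer than 3 characters (an empty "" field has length 2) and appends them to save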
  • awk -F \, 'length($17)>3{print $0}' xyz >> save
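
A slightly more direct variant is sketched below: instead of relying on a length threshold, it compares the last field ($NF) against the literal two-character string "". The temporary directory and the use of > (overwrite) rather than >> (append) are assumptions made for this illustration; the two sample records are copied from the data above.

```shell
# Work in a throwaway directory so no existing files are touched (assumption for this sketch).
workdir=$(mktemp -d)

# Two of the records shown above: one with a URL in the last field, one with "".
cat > "$workdir/xyz" <<'EOF'
t,2013-01-06 00:00:00:155,121.10.83.33,CNGDZJ,2feb5fea7c1365,3028e559281d58,218713,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
t,2013-01-06 00:00:00:232,221.11.26.246,CNXAXA,30123960b8cb2e,300036fbd1d034,218710,0,0,0,0,0,"",0,"",0,""
EOF

# -F, splits each record on commas; $NF is the last field.
# A pattern with no action prints the whole record, so only records
# whose last field is not exactly "" are written to save.
awk -F, '$NF != "\"\""' "$workdir/xyz" > "$workdir/save"

cat "$workdir/save"
```

Using $NF instead of $17 keeps the filter correct if the number of fields changes, as long as the field of interest stays last and no field contains an embedded comma.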

  • Running hadoop fs -ls hdfs://xxxx:9000/AdsameData/compute/CookieInterest/ produces output like the following (only the first few of the 22 entries are shown)
  • Found 22 items
    -rw-r--r--   3 hadoop supergroup          0 2013-01-04 18:27 /AdsameData/compute/CookieInterest/_SUCCESS
    drwxr-xr-x   - hadoop supergroup          0 2013-01-04 16:08 /AdsameData/compute/CookieInterest/_logs
    -rw-r--r--   3 hadoop supergroup 1636656499 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00000
    -rw-r--r--   3 hadoop supergroup 1637816931 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00001
    -rw-r--r--   3 hadoop supergroup 1637583887 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00002
  • To extract only the date and time fields for /AdsameData/compute/CookieInterest/part-r-00001, use the following command
  • hadoop fs -ls hdfs://xxx:9000/AdsameData/compute/CookieInterest/part-r-00001 | awk '{print $6,$7}'
  • The result is
  •  2013-01-04 16:16
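
Depending on the Hadoop version, the listing may begin with a "Found N items" header line, for which $6 and $7 are empty, so awk prints a blank line for it. The sketch below shows one way to skip that header and select a single file from a directory listing. It runs awk against a sample copied from the listing above rather than against a live cluster; the variable names listing and result are hypothetical.

```shell
# Sample text copied from the listing above; in real use it would come
# from `hadoop fs -ls <path>` piped straight into awk.
listing='Found 22 items
-rw-r--r--   3 hadoop supergroup          0 2013-01-04 18:27 /AdsameData/compute/CookieInterest/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2013-01-04 16:08 /AdsameData/compute/CookieInterest/_logs
-rw-r--r--   3 hadoop supergroup 1636656499 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00000
-rw-r--r--   3 hadoop supergroup 1637816931 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00001'

# NF == 8 keeps only full entry lines (permissions, replication, owner,
# group, size, date, time, path) and skips the "Found N items" header.
# $8 ~ /part-r-00001$/ narrows the output to the one file of interest.
result=$(printf '%s\n' "$listing" | awk 'NF == 8 && $8 ~ /part-r-00001$/ {print $6, $7}')

echo "$result"
```

The NF == 8 test assumes paths without spaces; a path containing whitespace would split into more than eight fields.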
