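Overview
- Purpose
- Examples
- String processing
Examples
- The raw records are shown below; the goal is to drop every record whose last field is "" (an empty quoted string)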
t,2013-01-06 00:00:00:155,121.10.83.33,CNGDZJ,2feb5fea7c1365,3028e559281d58,218713,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
t,2013-01-06 00:00:00:191,59.180.144.158,IN0000,302bc09edb4edf,30397dd29965e0,44321,0,0,0,0,0,"",0,"",0,"photogallery.navbharattimes.indiatimes.com/articleshow/9923463.cms"
t,2013-01-06 00:00:00:212,116.11.150.223,CNGXYL,2fd4e27d7d41bf,30397ec75a9ee1,218712,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
t,2013-01-06 00:00:00:212,186.24.39.211,VE0000,2f813ccbe40a57,2fff3625e9ecc4,218710,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
t,2013-01-06 00:00:00:229,27.24.140.130,CNHB00,302813c563c1a3,30333c8d31e160,218712,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
t,2013-01-06 00:00:00:222,115.153.179.198,CNJXFZ,3039776bb28aa4,303979b0e2fe3f,277625,0,0,0,0,0,"",0,"",0,"k.adsame.com/sammax/visa2012/300300100100.swf"
t,2013-01-06 00:00:00:232,221.11.26.246,CNXAXA,30123960b8cb2e,300036fbd1d034,218710,0,0,0,0,0,"",0,"",0,""
t,2013-01-06 00:00:00:241,182.205.162.23,CNLNSY,302048e883146b,3039764c2c5e63,218711,0,0,0,0,0,"",0,"",0,""
t,2013-01-06 00:00:00:235,121.63.222.58,CNHBXF,3025c09cca68ba,30397edb90cbd2,218710,0,0,0,0,0,"",0,"",0,""
t,2013-01-06 00:00:00:253,123.80.109.13,CNXJWL,30304f09977673,30397e92ada324,326504,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/chaoren/201301/300300.swf"
t,2013-01-06 00:00:00:250,120.70.27.19,CNXJ00,2ff8fd8a55cb76,30397e6bdd3cc4,277628,0,0,0,0,0,"",0,"",0,"k.adsame.com/sammax/visa2012/300300100100.swf"
- Assuming the input file is named xyz and the output file is named save, the awk command is
awk -F, 'length($17) > 2 {print $0}' xyz >> save
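The same filter can also be written to compare the last field directly against the two-character string `""` instead of testing its length. The following is a minimal sketch, not the original author's command: the temporary directory and the file names `sample.csv` and `kept.csv` are assumptions made for illustration, and the two records are copied from the sample data above. It also assumes no field contains an embedded comma.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
t,2013-01-06 00:00:00:229,27.24.140.130,CNHB00,302813c563c1a3,30333c8d31e160,218712,0,0,0,0,0,"",0,"",0,"i.adsame.com/sammax/adsame2011/15.swf"
t,2013-01-06 00:00:00:232,221.11.26.246,CNXAXA,30123960b8cb2e,300036fbd1d034,218710,0,0,0,0,0,"",0,"",0,""
EOF

# Keep a record only when its last field ($NF) is not the literal "".
# A pattern with no action prints the whole record.
awk -F, '$NF != "\"\""' "$workdir/sample.csv" > "$workdir/kept.csv"

# Count the surviving records; only the first sample record should remain.
kept_count=$(wc -l < "$workdir/kept.csv" | tr -d ' ')
echo "$kept_count"
```

Using `$NF` rather than `$17` keeps the filter working if the number of columns changes, as long as the field of interest stays last.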
- Running hadoop fs -ls hdfs://xxxx:9000/AdsameData/compute/CookieInterest produces the following output (excerpt)
Found 22 items
-rw-r--r-- 3 hadoop supergroup 0 2013-01-04 18:27 /AdsameData/compute/CookieInterest/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2013-01-04 16:08 /AdsameData/compute/CookieInterest/_logs
-rw-r--r-- 3 hadoop supergroup 1636656499 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00000
-rw-r--r-- 3 hadoop supergroup 1637816931 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00001
-rw-r--r-- 3 hadoop supergroup 1637583887 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00002
- To extract only the modification date and time of /AdsameData/compute/CookieInterest/part-r-00001, use the following command
hadoop fs -ls hdfs://xxx:9000/AdsameData/compute/CookieInterest/part-r-00001 | awk '{print $6,$7}'
- The result is
2013-01-04 16:16
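The field extraction can be tried without a cluster by feeding awk a saved listing. The sketch below assumes the listing format shown earlier, where the date and time are the 6th and 7th whitespace-separated fields. The sample text, including its `Found 1 items` header line, and the `NF >= 8` guard are assumptions added for illustration; the guard skips any line that is not a full file entry.

```shell
# Sample text in the format of the listing above; no Hadoop installation is needed.
listing='Found 1 items
-rw-r--r-- 3 hadoop supergroup 1637816931 2013-01-04 16:16 /AdsameData/compute/CookieInterest/part-r-00001'

# File entries have 8 fields; the header line has only 3 and is skipped.
modified=$(printf '%s\n' "$listing" | awk 'NF >= 8 {print $6, $7}')
echo "$modified"
```

With this sample input the script prints `2013-01-04 16:16`.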