Processing multiple files with awk

free_to_fly

awk reads its input from one of two sources: standard input or files. When it reads from files, it can take several at once.

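For example:
1. Shell pathname expansion: awk '{...}' *.txt

# The shell expands *.txt first, into all the .txt files in the current directory;
# e.g. if the current directory contains 1.txt and 2.txt, the command actually run is awk '{...}' 1.txt 2.txt

2. Naming the files explicitly: awk '{...}' a.txt b.txt c.txt ...

# awk handles multiple files by reading them one after another; in the example above it reads all of a.txt first, then b.txt, and so on.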
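Here is a minimal sketch of that reading order. It assumes a POSIX shell with `mktemp`, and the files `1.txt` and `2.txt` and their contents are made up for illustration. The built-in variable `FILENAME` shows which file each record came from:

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical sample files.
printf 'line A\nline B\n' > 1.txt
printf 'line C\n' > 2.txt

# The shell expands *.txt to "1.txt 2.txt" before awk starts;
# awk then reads 1.txt to the end before it opens 2.txt.
out=$(awk '{ print FILENAME ": " $0 }' *.txt)
printf '%s\n' "$out"
```

It prints `1.txt: line A`, `1.txt: line B` and `2.txt: line C`, in that order.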

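So, when processing multiple files, how can the awk program tell which file it is currently reading, so that it can apply the matching actions?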

########################
#
# Processing 2 files
#
########################

When awk reads exactly two files, two methods are commonly used:

(1) The first is awk 'NR==FNR{...}NR>FNR{...}' file1 file2, or equivalently awk 'NR==FNR{...}NR!=FNR{...}' file1 file2

(2) The second is awk 'NR==FNR{...;next}{...}' file1 file2

Both methods are easy to follow once you know what the awk built-in variables FNR and NR, and the next statement, mean:

FNR The input record number in the current input file. # number of records read so far from the current file

NR The total number of input records seen so far. # total number of records read so far, across all files

next Stop processing the current input record. The next input record is read and processing starts over with the first pattern in the AWK program. If the end of the input data is reached, the END block(s), if any, are executed.
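To see the difference between NR and FNR, the sketch below prints both for every record of two small files. The files `file1` (2 records) and `file2` (3 records) are hypothetical, and the script assumes a POSIX shell with `mktemp`:

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical inputs: file1 has 2 records, file2 has 3.
printf 'x\ny\n' > file1
printf 'p\nq\nr\n' > file2

# NR keeps counting across files; FNR restarts at 1 for each new file.
out=$(awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' file1 file2)
printf '%s\n' "$out"
```

The output is `file1 NR=1 FNR=1`, `file1 NR=2 FNR=2`, `file2 NR=3 FNR=1`, `file2 NR=4 FNR=2`, `file2 NR=5 FNR=3`. NR==FNR holds exactly for the records of the first file.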

awk 'NR==FNR{...}NR>FNR{...}' file1 file2

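# While file1 is being read, FNR (records read from file1) always equals NR (total records read), because file1 is the first file awk reads; so the first action block {...} runs for file1's records.
# While file2 is being read, NR (which also counts all of file1's records) is always greater than FNR, so the second action block {...} runs for file2's records.
# This assumes file1 is not empty: if it is, NR==FNR also holds for every record of file2.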
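The following is a minimal sketch of method (1). The file names and contents are hypothetical, and the labels "first block"/"second block" just show which action ran. It also shows the edge case where the first file is empty: NR==FNR then holds for the second file's records as well.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'a\nb\n' > file1   # hypothetical first file
printf 'c\n' > file2      # hypothetical second file
: > empty                 # an empty file, to show the edge case

prog='NR==FNR { print "first block:", $0 } NR>FNR { print "second block:", $0 }'

# Normal case: file1 records go to the first block, file2 records to the second.
out=$(awk "$prog" file1 file2)
printf '%s\n' "$out"

# Edge case: with an empty first file, NR==FNR is true for every
# record of the second file, so they all land in the first block.
out_empty=$(awk "$prog" empty file2)
printf '%s\n' "$out_empty"
```

The first run prints `first block: a`, `first block: b`, `second block: c`. The second run prints only `first block: c`.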

awk 'NR==FNR{...;next}{...}' file1 file2

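# While file1 is being read, NR==FNR holds, so the first action block runs; because it ends with next, the second block {...} is skipped for those records.
# While file2 is being read, NR==FNR does not hold, so the first block {...} is skipped and only the second block {...} runs.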
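Next is a minimal sketch of method (2) with hypothetical inputs. It runs the same program with and without `next` to show why `next` matters: the second rule has no pattern, so without `next` it would also fire for file1's records.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'a\nb\n' > file1   # hypothetical first file
printf 'c\n' > file2      # hypothetical second file

# With next: after the first block runs, awk moves on to the next record,
# so file1 records never reach the second block.
with_next=$(awk 'NR==FNR { print "first:", $0; next } { print "second:", $0 }' file1 file2)

# Without next: the pattern-less second rule also runs for file1 records.
without_next=$(awk 'NR==FNR { print "first:", $0 } { print "second:", $0 }' file1 file2)

printf '%s\n' "$with_next" '---' "$without_next"
```

With `next`, the output is `first: a`, `first: b`, `second: c`. Without it, `second: a` and `second: b` also appear, each after its `first:` line.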

########################
#
# Processing more than 2 files
#
########################

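When awk processes more than two files, the methods above no longer work: records from the third file onward also satisfy NR>FNR (NR!=FNR), so they cannot be told apart from file2's records. A more general method is needed: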

1. ARGIND # index in ARGV of the file currently being processed (a gawk extension, not available in every awk)

awk 'ARGIND==1{...}ARGIND==2{...}ARGIND==3{...}... ' file1 file2 file3 ...

2. ARGV # array of command-line arguments; ARGV[1] is the first file

awk 'FILENAME==ARGV[1]{...}FILENAME==ARGV[2]{...}FILENAME==ARGV[3]{...}...' file1 file2 file3 ...

3. Compare FILENAME with the file names directly

awk 'FILENAME=="file1"{...}FILENAME=="file2"{...}FILENAME=="file3"{...}...' file1 file2 file3 ...
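Here is a minimal sketch of methods 2 and 3, using three hypothetical one-line files. `ARGIND` exists only in gawk, so the sketch adds a portable stand-in for it. That stand-in is an assumption of this sketch, not from the original: it increments a hypothetical counter `idx` whenever `FILENAME` changes. Unlike `ARGIND`, it does not count files that contain no records.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'a\n' > file1
printf 'b\n' > file2
printf 'c\n' > file3

# Method 2: compare FILENAME with the command-line arguments.
by_argv=$(awk '
    FILENAME == ARGV[1] { print "file1:", $0 }
    FILENAME == ARGV[2] { print "file2:", $0 }
    FILENAME == ARGV[3] { print "file3:", $0 }
' file1 file2 file3)

# Method 3: compare FILENAME with literal file names.
by_name=$(awk '
    FILENAME == "file1" { print "file1:", $0 }
    FILENAME == "file2" { print "file2:", $0 }
    FILENAME == "file3" { print "file3:", $0 }
' file1 file2 file3)

# Portable ARGIND substitute (hypothetical): count how often FILENAME changes.
by_index=$(awk '
    FILENAME != prev { idx++; prev = FILENAME }
    idx == 1 { print "file1:", $0 }
    idx == 2 { print "file2:", $0 }
    idx == 3 { print "file3:", $0 }
' file1 file2 file3)

printf '%s\n' "$by_argv" "$by_name" "$by_index"
```

All three programs print `file1: a`, `file2: b`, `file3: c`. On gawk, the `ARGIND==1{...}` form from method 1 can be used directly instead of the counter.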

########################
#
# Example 1
#
########################

Suppose there are two files, file1 and file2. file1 has 2 columns:

no1 name1
no2 name2
no3 name2
no4 name3
no5 name4
no6 name4
no7 name4
no8 name5
no9 name6
no10 name6

file2 has up to 6 columns, and some lines have empty columns:

name1 data1 dada2 data3 data4 dada5
name2 dada6 data7 dada8
name3 data9 dada10 data11 dada12
name4 data13 dada14
name5 data15 dada16
name6 data17 data18

If column 2 of file1 matches column 1 of file2, merge the two lines into one. The merged data should look like this:

no1 name1 data1 dada2 data3 data4 dada5
no2 name2 dada6 data7 dada8
no3 name2 dada6 data7 dada8
no4 name3 data9 dada10 data11 dada12
no5 name4 data13 dada14
no6 name4 data13 dada14
no7 name4 data13 dada14
no8 name5 data15 dada16
no9 name6 data17 data18
no10 name6 data17 data18

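Program (file2 is given first, so that all of its lines are stored in array a, keyed by name, before file1 is read):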

awk 'NR==FNR{a[$1]=$0}NR>FNR{print $1" "a[$2]}' file2 file1
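The sketch below reproduces example 1 end to end. It writes the sample data listed above to `file1` and `file2` in a temporary directory and runs the program unchanged; the shell wrapper itself is an assumption and needs a POSIX shell with `mktemp`.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

cat > file1 <<'EOF'
no1 name1
no2 name2
no3 name2
no4 name3
no5 name4
no6 name4
no7 name4
no8 name5
no9 name6
no10 name6
EOF

cat > file2 <<'EOF'
name1 data1 dada2 data3 data4 dada5
name2 dada6 data7 dada8
name3 data9 dada10 data11 dada12
name4 data13 dada14
name5 data15 dada16
name6 data17 data18
EOF

# Pass 1 (file2, NR==FNR): store each whole line, keyed by its name ($1).
# Pass 2 (file1, NR>FNR): print the number, then the stored file2 line for $2.
out=$(awk 'NR==FNR{a[$1]=$0}NR>FNR{print $1" "a[$2]}' file2 file1)
printf '%s\n' "$out"
```

It prints one merged line per line of file1, starting with `no1 name1 data1 dada2 data3 data4 dada5`. A name with no entry in file2 would come out as just the number and a trailing space, because `a[$2]` is then the empty string.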

########################
#
# Example 2
#
########################

file1:

sina.com 52.5
sohu.com 42.5
baidu.com 35

file2:

www.news.sina.com sina.com 80
www.over.sohu.com baidu.com 20
www.fa.baidu.com sohu.com 50
www.open.sina.com sina.com 60
www.sport.sohu.com sohu.com 70
www.xxx.sohu.com sohu.com 30
www.abc.sina.com sina.com 10
www.fa.baidu.com baidu.com 50
www.open.sina.com sina.com 60
www.over.sohu.com sohu.com 20

Merged result (each line of file2, with the value that file1 gives for its column-2 domain appended):

www.news.sina.com sina.com 80 52.5
www.over.sohu.com baidu.com 20 35
www.fa.baidu.com sohu.com 50 42.5
www.open.sina.com sina.com 60 52.5
www.sport.sohu.com sohu.com 70 42.5
www.xxx.sohu.com sohu.com 30 42.5
www.abc.sina.com sina.com 10 52.5
www.fa.baidu.com baidu.com 50 35
www.open.sina.com sina.com 60 52.5
www.over.sohu.com sohu.com 20 42.5

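Program (file1 is read first into array a, mapping domain to value; next skips the second rule for those records, and each file2 line is then printed with a[$2] appended):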

awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$2]}' file1 file2
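The same end-to-end check for example 2 is shown below. It writes the sample data listed above to two files and runs the program unchanged. As before, the shell wrapper is a sketch that assumes a POSIX shell with `mktemp`.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

cat > file1 <<'EOF'
sina.com 52.5
sohu.com 42.5
baidu.com 35
EOF

cat > file2 <<'EOF'
www.news.sina.com sina.com 80
www.over.sohu.com baidu.com 20
www.fa.baidu.com sohu.com 50
www.open.sina.com sina.com 60
www.sport.sohu.com sohu.com 70
www.xxx.sohu.com sohu.com 30
www.abc.sina.com sina.com 10
www.fa.baidu.com baidu.com 50
www.open.sina.com sina.com 60
www.over.sohu.com sohu.com 20
EOF

# file1 (NR==FNR): map domain -> value, then next skips the print rule.
# file2: print the whole line plus the value looked up by column 2.
out=$(awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$2]}' file1 file2)
printf '%s\n' "$out"
```

The first line of output is `www.news.sina.com sina.com 80 52.5`. The comma in `print $0,a[$2]` inserts the output field separator (a space by default) between the line and the appended value.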
