I want to use Linux commands to remove duplicate words/strings from a large tab-delimited file.
names john, cnn, mac, tommy, mac, patrick, ngc, discovery, john, cnn, adam, patrick
cities san jose, santa clara, san franscisco, new york, san jose, santa clara
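The above is the file format. I want to remove the duplicate words while keeping the tabs and commas, so that the output looks like this: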
names john, cnn, mac, tommy, patrick, ngc, discovery, adam
cities san jose, santa clara, san franscisco, new york
Any help would be greatly appreciated.
Solution:
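awk 'BEGIN {
    # Split fields on ", " or on a tab
    FS = ", |\t"
}
{
    # Print the row label (first field) followed by a tab
    printf "%s\t", $1
    delim = ""
    for (i = 2; i <= NF; i++) {
        # Print each value only the first time it appears on this line
        if (! ($i in seen)) {
            printf "%s%s", delim, $i
            delim = ", "
        }
        seen[$i]
    }
    printf "\n"
    # Forget the values seen on this line before reading the next one
    delete seen
}' inputfile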
If you are not using GNU AWK (gawk) and your awk cannot delete a whole array with delete seen, use split("", seen) instead.
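The split("", seen) fallback mentioned above can be sketched as follows. This is a minimal sketch, not the original answer's code: the function name dedupe_fields is hypothetical, and it assumes each line has the form label, tab, then values separated by ", ".

```shell
# dedupe_fields is a hypothetical helper name used only for illustration.
# It reads from the files given as arguments, or from stdin if none are given.
dedupe_fields() {
    awk 'BEGIN { FS = ", |\t" }
    {
        printf "%s\t", $1
        delim = ""
        for (i = 2; i <= NF; i++) {
            if (!($i in seen)) {
                printf "%s%s", delim, $i
                delim = ", "
                seen[$i]            # remember this value for the current line
            }
        }
        printf "\n"
        split("", seen)             # portable way to empty the array
    }' "$@"
}

# Usage with one line of the sample data (label and list separated by a tab):
printf 'names\tjohn, cnn, mac, tommy, mac, patrick, ngc, discovery, john, cnn, adam, patrick\n' | dedupe_fields
```

Splitting an empty string into seen leaves the array with no elements, which has the same effect here as delete seen. Run it on the real file with: dedupe_fields inputfile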
Tags: linux, awk, sed
Source: https://codeday.me/bug/20190902/1788376.html