Word Frequency

Latest recommended article published on 2022-07-29 20:10:13

奔跑吧小蜗牛

Category: leetcode    Tags: leetcode, shell

Article link: https://blog.csdn.net/abinge317/article/details/50462455

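Problem: write a shell script that counts how often each word occurs in a text file and prints the words in descending order of frequency. Assume that the words on each line are separated by a single space and that no two words have the same frequency. For example, given the text: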

the day is sunny the the
the sunny is is

the output should be:

the 4
is 3
sunny 2
day 1

The code is as follows:

# Read from the file words.txt and output the word frequency list to stdout.
tr -s ' ' '\n' < words.txt  | sort | uniq -c | sort -nr | awk '{print $2, $1}'

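Explanation: tr replaces each space with "\n", and the -s option squeezes any run of repeated newlines into a single one, so no empty lines appear between words (see the man page for details). The text now has one word per line: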

the
day
is
sunny
the
the
the
sunny
is
is
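
To see what -s contributes, here is a small sketch on a made-up line with repeated spaces (the sample line and the variable names are my own, not part of the problem). It only counts the lines produced with and without -s:

```shell
# Hypothetical sample line with repeated spaces between words.
line='the  day   is sunny'

# Without -s, every space becomes its own newline, which leaves empty lines.
without_squeeze=$(( $(printf '%s\n' "$line" | tr ' ' '\n' | wc -l) ))

# With -s, each run of newlines is squeezed into a single newline.
with_squeeze=$(( $(printf '%s\n' "$line" | tr -s ' ' '\n' | wc -l) ))

echo "lines without -s: $without_squeeze"
echo "lines with -s: $with_squeeze"
```

With -s the count should match the number of words, while without it the extra spaces show up as extra empty lines.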

Next, sort orders the words so that identical words end up on adjacent lines:

day
is
is
is
sunny
sunny
the
the
the
the

Then uniq with the -c option counts the occurrences of each word (real uniq -c output is padded with leading spaces, omitted here):

1 day
3 is
2 sunny
4 the
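
The sort step matters because uniq only merges duplicate lines that are adjacent. A minimal sketch on a made-up three-word input (the input and variable names are assumptions for illustration) compares the number of groups uniq -c reports with and without sorting first:

```shell
# uniq only collapses adjacent duplicates, so on unsorted input "the" is reported twice.
groups_unsorted=$(( $(printf 'the\nis\nthe\n' | uniq -c | wc -l) ))

# After sorting, both occurrences of "the" are adjacent and are counted together.
groups_sorted=$(( $(printf 'the\nis\nthe\n' | sort | uniq -c | wc -l) ))

echo "groups without sort: $groups_unsorted"
echo "groups with sort: $groups_sorted"
```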

Then sort is called again, this time with -nr, to order the lines numerically by count in descending order:

4 the
3 is
2 sunny
1 day
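
The -n flag is what makes sort compare the counts as numbers rather than as text. The following sketch uses made-up, unpadded counts (not taken from the example above) and pins the locale to C so that the plain text comparison is predictable:

```shell
# Hypothetical count lines; 10 is the largest count numerically.
counts=$(printf '9 a\n10 b\n2 c\n')

# Plain reverse sort compares character by character, so "9 a" comes before "10 b".
lexical_top=$(printf '%s\n' "$counts" | LC_ALL=C sort -r | head -n 1)

# Numeric reverse sort compares the leading numbers, so "10 b" comes first.
numeric_top=$(printf '%s\n' "$counts" | LC_ALL=C sort -nr | head -n 1)

echo "top line with sort -r:  $lexical_top"
echo "top line with sort -nr: $numeric_top"
```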

Finally, awk swaps the first and second columns to produce the desired output:

the 4
is 3
sunny 2
day 1
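
To try the whole pipeline without creating words.txt by hand, it can be wrapped in a small function and run against a temporary file. This is just a sketch: word_freq and the temporary file are names I made up for illustration, while the pipeline itself is the one from this post.

```shell
# word_freq is a hypothetical helper name; it wraps the pipeline from this post
# and reads the file given as its first argument.
word_freq() {
    tr -s ' ' '\n' < "$1" | sort | uniq -c | sort -nr | awk '{print $2, $1}'
}

# Recreate the sample text from the problem statement in a temporary file.
tmpfile=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmpfile"

result=$(word_freq "$tmpfile")
echo "$result"

rm -f "$tmpfile"
```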

Alternative: sed can also be used for the first step of putting each word on its own line.

sed -r 's/\s+/\n/g' words.txt  | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{print $2, $1}'

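After the first sed, any line that begins or ends with whitespace yields an empty line in the converted text, so the second sed deletes empty lines. The -r option on the first sed enables extended regular expressions so that + can be used unescaped; \s and the \n in the replacement are supported by GNU sed and may not work in other implementations.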
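
The empty-line issue is easy to reproduce. The sketch below assumes GNU sed (for -r, \s and \n in the replacement) and uses a made-up input line that starts with two spaces; it counts the lines before and after the cleanup step:

```shell
# Hypothetical input line that starts with two spaces.
input='  the day'

# The leading whitespace turns into a newline, so the output starts with an empty line.
lines_raw=$(( $(printf '%s\n' "$input" | sed -r 's/\s+/\n/g' | wc -l) ))

# The second sed deletes empty lines, leaving one line per word.
lines_clean=$(( $(printf '%s\n' "$input" | sed -r 's/\s+/\n/g' | sed '/^$/d' | wc -l) ))

echo "lines after first sed:  $lines_raw"
echo "lines after second sed: $lines_clean"
```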
