Counting word frequencies in a file
Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume that:
words.txt contains only lowercase characters and space ' ' characters.
Each word must consist of lowercase characters only.
Words are separated by one or more whitespace characters.
For example, assume that words.txt has the following content:
the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
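#!/bin/bash
# Count each word in an awk associative array (unset entries start at 0), then
# pipe the "word count" lines to sort: numeric, descending, on column 2.
awk '{ for (i = 1; i <= NF; ++i) ++arr[$i] } END { for (k in arr) print k " " arr[k] | "sort -r -n -k2" }' words.txt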
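A common alternative, not the author's method, builds the same table from standard text tools: tr puts one word per line, sort | uniq -c counts duplicates, and a reverse numeric sort orders by count. The function name word_freq is a hypothetical helper, and the sample input is written to a temporary file (an assumption) so the sketch runs without words.txt.

```shell
#!/bin/bash
# Alternative sketch (not the original solution): count words with standard tools.
# word_freq is a hypothetical helper name; it prints "word count" lines.
word_freq() {
  tr -s '[:space:]' '\n' < "$1" |  # one word per line; runs of whitespace collapse
    grep -v '^$' |                 # drop the empty line left by leading whitespace
    sort | uniq -c |               # "count word", one line per distinct word
    sort -r -n |                   # highest count first
    awk '{ print $2, $1 }'         # swap to "word count"
}

# Usage on the sample input from the problem, written to a temporary file.
tmp=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmp"
word_freq "$tmp"   # prints: the 4, is 3, sunny 2, day 1 (one per line)
rm -f "$tmp"
```

On the real input this would be called as word_freq words.txt. It starts more processes than the single awk command, but each stage does exactly one job.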
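The awk output is piped to sort: -r orders from largest to smallest, -n compares numerically, and -k2 sorts on the second column. To sort by the key (the word) instead, change -k2 to -k1 and drop -n, since the words are not numbers.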
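To see the two key choices side by side, the sketch below counts the sample input once and sorts the result both ways. count_words is a hypothetical helper wrapping the same awk counting loop, and LC_ALL=C is set as an assumption so the alphabetical order is byte-wise and predictable.

```shell
#!/bin/bash
# Sketch: order the counts by column 2 (count) versus column 1 (word).
export LC_ALL=C   # byte-wise text comparison, so the word order is predictable

# count_words is a hypothetical helper: prints "word count" per distinct word.
count_words() {
  awk '{ for (i = 1; i <= NF; ++i) ++c[$i] } END { for (w in c) print w, c[w] }' "$1"
}

tmp=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmp"

count_words "$tmp" | sort -r -n -k2   # by count: the 4, is 3, sunny 2, day 1
count_words "$tmp" | sort -k1,1       # by word:  day 1, is 3, sunny 2, the 4
rm -f "$tmp"
```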
Relevant options from the sort manual page:
-n, --numeric-sort
compare according to string numerical value
-r, --reverse
reverse the result of comparisons
-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of line). See the POS syntax section of the sort manual page.
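The man-page wording is terse, so here is a small sketch on made-up data (not from the original problem) showing why -n matters once counts reach two digits: compared as text, "10" sorts before "2" and "9". LC_ALL=C is an assumption made for predictable byte-wise ordering.

```shell
#!/bin/bash
# Sketch: how -n and -r change the ordering of a count column.
export LC_ALL=C   # byte-wise text comparison, for predictable output
data=$'apple 9\nbanana 10\ncherry 2'

echo "$data" | sort -k2         # as text:            banana 10, cherry 2, apple 9
echo "$data" | sort -n -k2      # numeric, ascending:  cherry 2, apple 9, banana 10
echo "$data" | sort -r -n -k2   # numeric, descending: banana 10, apple 9, cherry 2
```

Without -n the comparison is character by character, which is only correct for counts that all have the same number of digits.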