Word Frequency
Difficulty: Medium
Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
- words.txt contains only lowercase characters and space ' ' characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.
For example, assume that words.txt has the following content:
the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Note:
Don’t worry about handling ties; it is guaranteed that each word’s frequency count is unique.
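For comparison, the output format above (one "word count" pair per line, highest count first) can be produced with a pipeline of standard tools. This is only a minimal sketch under the problem's assumptions (lowercase words separated by spaces); the function name word_freq is made up for illustration and is not part of the problem.

```shell
#!/bin/bash
# word_freq is a hypothetical helper name; it reads the file given as $1,
# or standard input when no argument is given.
word_freq() {
    # 1. tr: put each word on its own line (-s squeezes repeated separators)
    # 2. sort + uniq -c: group identical words and count them
    # 3. sort -rn: order by count, highest first
    # 4. awk: print "word count"; NF == 2 skips any empty entry
    tr -s ' ' '\n' < "${1:-/dev/stdin}" \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk 'NF == 2 { print $2, $1 }'
}

# Demo with the sample content from the problem statement.
printf 'the day is sunny the the\nthe sunny is is\n' | word_freq
```

With a real file this would be called as word_freq words.txt. The sort before uniq -c matters because uniq only merges adjacent identical lines.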
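#!/bin/bash
# Associative arrays require bash 4 or later.
declare -A HashWord
File="words.txt"

function ReadTxtFile
{
    while read -r Line
    do
        # Split the line into words on whitespace.
        Word=(${Line})
        for Var in "${Word[@]}"
        do
            # Append one '1' per occurrence, so the string length is the count.
            # Equivalent to HashWord+=( [${Var}]='1' )
            HashWord[${Var}]=${HashWord[${Var}]}'1'
            echo "Hashword datagroup $Var : ${HashWord[${Var}]}"
        done
    done < "${File}"

    # ${!HashWord[*]} or ${!HashWord[@]} expands to all keys (subscripts).
    for Key in ${!HashWord[*]}
    do
        # ${#...} is the string length, i.e. the word count.
        echo "${Key} ${#HashWord[${Key}]}"
    done
}

### Main Logic
ReadTxtFile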
Output:
root@ubuntu:~/test# ./t11.sh
Hashword datagroup the : 1
Hashword datagroup day : 1
Hashword datagroup is : 1
Hashword datagroup sunny : 1
Hashword datagroup the : 11
Hashword datagroup the : 111
Hashword datagroup the : 1111
Hashword datagroup sunny : 11
Hashword datagroup is : 11
Hashword datagroup is : 111
day 1
is 3
sunny 2
the 4
Or:
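#!/bin/bash
# Usage: ./t11-1.sh words.txt
declare -A HW
File=$1
while read -r line
do
    # Unquoted expansion splits the line into words on whitespace.
    for var in $line
    do
        # Append one '1' per occurrence; the string length is the count.
        HW[$var]=${HW[$var]}'1'
    done
done < "$File"

# ${!HW[*]} expands to all keys of the associative array.
for key in ${!HW[*]}
do
    echo "${key} ${#HW[$key]}"
done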
Output:
root@ubuntu:~/test# ./t11-1.sh words.txt
day 1
is 3
sunny 2
the 4
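Note that both runs above list the words in whatever order bash iterates the array keys, while the problem asks for descending frequency. One possible way to close that gap is to pipe the "word count" lines through sort. The sketch below also swaps the string-of-'1' trick for an integer counter; the name count_words and the temporary sample file are assumptions made for this example only.

```shell
#!/bin/bash
# Requires bash 4+ for associative arrays.
# count_words is a hypothetical helper name; $1 is the input file.
count_words() {
    local -A freq=()
    local line word
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while read -r line || [ -n "$line" ]; do
        # Unquoted $line splits on whitespace (input is lowercase words only).
        for word in $line; do
            # Integer counter instead of a string of '1' characters.
            freq[$word]=$(( ${freq[$word]:-0} + 1 ))
        done
    done < "$1"
    for word in "${!freq[@]}"; do
        echo "$word ${freq[$word]}"
    done
}

# Demo on a temporary copy of the sample input.
sample=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$sample"
# sort -k2,2nr: sort on the second field, numerically, in reverse.
count_words "$sample" | sort -k2,2nr
rm -f "$sample"
```

Because the problem guarantees unique counts, sorting on the count field alone is enough to give a stable order.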
Reference:
${!array[@]} returns all subscripts (keys) of the array: http://blog.csdn.net/baiwz/article/details/25078551
while read line reads one line per iteration and stores it in line; add echo "Word : ${Word[*]}" inside the loop to verify.
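The two notes above can be checked in isolation with a small experiment. This is just an illustrative sketch: the array name demo and its contents are invented for the example, and the keys are sorted before printing because bash does not promise any particular iteration order for associative arrays.

```shell
#!/bin/bash
# Requires bash 4+ for associative arrays.
declare -A demo=( [day]=1 [sunny]=2 )

# ${!demo[@]} expands to the keys, ${demo[@]} to the values.
# Sort before joining so the display order is predictable.
keys=$(printf '%s\n' "${!demo[@]}" | sort | tr '\n' ' ')
values=$(printf '%s\n' "${demo[@]}" | sort | tr '\n' ' ')
echo "keys: $keys"
echo "values: $values"
# ${#demo[@]} is the number of entries.
echo "size: ${#demo[@]}"

# read stores one input line per call in the named variable.
while read -r line; do
    echo "Line : $line"
done <<< $'the day\nis sunny'
```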
python3:
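import pprint

message = 'the day is sunny the the \n the sunny is is'
print(message)
a = []
count = {}
# lines = message.replace('\n', '').split(' ')  # alternative: yields an empty string '' instead of '\n'
# strip('\n') only removes leading/trailing newlines, so the '\n' in the middle
# survives and split(' ') returns it as its own list item.
# message.split() would split on any whitespace and avoid this.
lines = message.strip('\n').split(' ')
a.extend(lines)
print(a)
for word in a:
    count.setdefault(word, 0)
    count[word] = count[word] + 1
pprint.pprint(count)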
Output:
================== RESTART: /Users/valen/Documents/test.py ==================
the day is sunny the the
the sunny is is
['the', 'day', 'is', 'sunny', 'the', 'the', '\n', 'the', 'sunny', 'is', 'is']
{'\n': 1, 'day': 1, 'is': 3, 'sunny': 2, 'the': 4}
>>>
https://zhidao.baidu.com/question/1690382694635348108.html
http://blog.csdn.net/huguangshanse00/article/details/14639871