Parsing Words
We define a word as any sequence of one or more lower-case letters (no digits, no punctuation). Words are separated by whitespace.
Write a function that takes a list of input lines and produces a string that contains the following:
- the count of words in the input
- the word “words”
- each unique word, and the count of times it occurs in the input (listed in alphabetical order, each on its own line, with a space between the word and count)
- the word “letters”
- for every letter from a to z, the letter, and the count of times that letter occurred IN A WORD in the input (listed in alphabetical order, each on its own line, with a space between the letter and count).
Valid words in the input must be separated by whitespace: actual spaces and newlines. If your program finds something that is neither whitespace nor a word, it should skip ahead until it reaches a valid word (or the end of the input). A non-word character adjacent to word characters makes the whole sequence a non-word.
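The skipping rule can be illustrated with a short sketch. It assumes that splitting each line on whitespace and testing every token against the pattern [a-z]+ gives the same result as the character-by-character rule described above. The name valid_words is hypothetical, chosen only for this illustration.

```python
import re

WORD = re.compile(r"[a-z]+")  # one or more lower-case ASCII letters

def valid_words(lines):
    """Return the valid words found in a list of input lines, in order."""
    words = []
    for line in lines:
        # str.split() with no argument splits on any run of whitespace
        # (spaces, tabs, newlines) and never yields empty tokens.
        for token in line.split():
            # fullmatch rejects tokens that mix letters with anything else.
            if WORD.fullmatch(token):
                words.append(token)
    return words

print(valid_words(["the cat's hat\n", "Cat cat2 cat\n"]))
# prints ['the', 'hat', 'cat']
```

Here "cat's", "Cat" and "cat2" are dropped as whole tokens, because each contains at least one character that is not a lower-case letter.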
import sys, string

data = sys.stdin.readlines()
tmp = []             # letters of the token currently being read
findNonWord = False  # True once the current token contains a non-letter
wordList = []
for iLine in data:
    for iChar in iLine:
        if iChar.isspace():
            # Any whitespace (space, tab, newline) ends the current token.
            if not findNonWord and len(tmp) > 0:
                wordList.append(''.join(tmp))
            tmp = []
            findNonWord = False
        elif 'a' <= iChar <= 'z' and not findNonWord:
            tmp.append(iChar)
        else:
            # Any other character invalidates the whole token.
            findNonWord = True
            tmp = []
    # A final line without a trailing newline still ends its last token.
    if not findNonWord and len(tmp) > 0:
        wordList.append(''.join(tmp))
    tmp = []
    findNonWord = False
wordCount = len(wordList)
print(wordCount)
print('words')
wordSet = sorted(set(wordList))
for iWord in wordSet:
    print(iWord + " " + str(wordList.count(iWord)))
print('letters')
for iLetter in string.ascii_lowercase:
    count = 0
    for iWord in wordList:
        count += iWord.count(iLetter)
    print(iLetter + " " + str(count))
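The task statement asks for a function that takes a list of lines and returns a string, whereas the script above reads standard input and prints as it goes. A minimal sketch of the function form follows. It assumes collections.Counter for the tallies, whitespace splitting plus a [a-z]+ full match for word detection, and one item per line in the result (including the count and the two headings). The name word_report is hypothetical and not part of the original code.

```python
import re
import string
from collections import Counter

def word_report(lines):
    """Build the report string described in the task from a list of lines."""
    # Keep only tokens made entirely of lower-case letters.
    words = [tok for line in lines for tok in line.split()
             if re.fullmatch(r"[a-z]+", tok)]
    word_counts = Counter(words)
    letter_counts = Counter("".join(words))  # letters inside valid words only

    out = [str(len(words)), "words"]
    out += [f"{w} {word_counts[w]}" for w in sorted(word_counts)]
    out.append("letters")
    # Counter returns 0 for letters that never occur, so all 26 are listed.
    out += [f"{c} {letter_counts[c]}" for c in string.ascii_lowercase]
    return "\n".join(out)

report = word_report(["a b a\n", "b2 c\n"])
print("\n".join(report.splitlines()[:6]))
# 4
# words
# a 2
# b 1
# c 1
# letters
```

Counting with Counter makes a single pass over the words, while calling wordList.count once per unique word rescans the whole list each time. For small inputs either approach is fine.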