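This code prints the total number of lines, words, and characters in a text file. It works fine and gives the expected output. But I want to count the number of characters in each line and print it like this:-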
Line No. 1 has 58 Characters
Line No. 2 has 24 Characters
Code:-
import string

def fileCount(fname):
    #counting variables
    lineCount = 0
    wordCount = 0
    charCount = 0
    words = []
    #file is opened and assigned a variable
    infile = open(fname, 'r')
    #loop that finds the number of lines in the file
    for line in infile:
        lineCount = lineCount + 1
        word = line.split()
        words = words + word
    #loop that finds the number of words in the file
    for word in words:
        wordCount = wordCount + 1
        #loop that finds the number of characters in the file
        for char in word:
            charCount = charCount + 1
    #returns the variables so they can be called to the main function
    return(lineCount, wordCount, charCount)

def main():
    fname = input('Enter the name of the file to be used: ')
    lineCount, wordCount, charCount = fileCount(fname)
    print ("There are", lineCount,"lines in the file.")
    print ("There are", charCount,"characters in the file.")
    print ("There are", wordCount,"words in the file.")

main()
The loop
for line in infile:
    lineCount = lineCount + 1
counts the lines as a whole, but how do I do this for each individual line?
I am using Python 3.X
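You can use the len function.
But len also counts spaces and tabs. Also, how do I apply it to each line? I would need another loop.
len(re.findall(r'\S', line))
There is no need to use a regex for this.
Python has a super-useful built-in, collections.Counter, which is a dict specialized for counting its input. See my answer. The code is shorter and performs better because there is no need to iteratively append to the list words.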
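To make the len discussion in the comments concrete, here is a minimal sketch of counting non-whitespace characters per line without a regex, inside the same loop that walks the lines. The helper name non_space_counts and the sample lines are my own, made up for illustration; a file object returned by open() can be passed in the same way as the list.

```python
import re

def non_space_counts(lines):
    """Return, for each line, how many characters are not whitespace."""
    return [sum(1 for ch in line if not ch.isspace()) for line in lines]

# Hypothetical sample data; an open file iterates line by line the same way.
sample_lines = ["Hello,\tworld and more\n", "  two  words \n"]

counts = non_space_counts(sample_lines)
for number, count in enumerate(counts, 1):
    print("Line No.", number, "has", count, "Characters")

# Cross-check against the regex version suggested in the comments.
regex_counts = [len(re.findall(r"\S", line)) for line in sample_lines]
print(counts == regex_counts)
```

Plain len(line) would also count the spaces, the tab and the trailing newline, which is why the filter on str.isspace() is there.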
Store all the information in a dictionary, then access it by key.
def fileCount(fname):
    #counting variables
    d = {"lines":0,"words": 0,"lengths":[]}
    #file is opened and assigned a variable
    with open(fname, 'r') as f:
        for line in f:
            # split into words
            spl = line.split()
            # increase count for each line
            d["lines"] += 1
            # add length of split list which will give total words
            d["words"] += len(spl)
            # get the length of each word and sum
            d["lengths"].append(sum(len(word) for word in spl))
    return d

def main():
    fname = input('Enter the name of the file to be used: ')
    data = fileCount(fname)
    print ("There are {lines} lines in the file.".format(**data))
    print ("There are {} characters in the file.".format(sum(data["lengths"])))
    print ("There are {words} words in the file.".format(**data))
    # enumerate over the lengths, outputting char count for each line
    for ind, s in enumerate(data["lengths"], 1):
        print("Line: {} has {} characters.".format(ind, s))

main()
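This code only works for words separated by whitespace, so you need to keep that in mind.
collections.Counter is a specialized dict that counts its input.
Define a set of the allowed characters you want to count; then you can get most of the data with len.
Below, I chose this character set: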
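A quick illustration of that caveat, using made-up sample strings: str.split() with no argument splits on any run of whitespace, so text joined only by punctuation is counted as a single word.

```python
# Hypothetical sample strings, chosen only to show how split() behaves.
samples = ["one two  three", "hello,world", "tab\tseparated\nwords"]

word_lists = [text.split() for text in samples]
for text, words in zip(samples, word_lists):
    print(repr(text), "->", len(words), "word(s):", words)
```

If punctuation should separate words too, the splitting rule would have to change; the counting code around it could stay the same.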
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~']
#Define desired character set
valid_chars = set([chr(i) for i in range(33,127)])
total_lines = total_words = total_chars = 0
line_details = []
with open ('test.txt', 'r') as f:
    for line in f:
        total_lines += 1
        line_char_count = len([char for char in line if char in valid_chars])
        total_chars += line_char_count
        total_words += len(line.split())
        line_details.append("Line %d has %d characters" % (total_lines, line_char_count))
print ("There are", total_lines,"lines in the file.")
print ("There are", total_chars,"characters in the file.")
print ("There are", total_words,"words in the file.")
for line in line_details:
    print (line)
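One thing worth knowing about this character set: range(33, 127) covers only the printable ASCII characters other than the space, so accented letters and other non-ASCII characters are silently left out of the count. A small sketch, with a sample line I made up:

```python
import string

valid_chars = set(chr(i) for i in range(33, 127))

# The same set, spelled with the constants from the string module.
same_set = set(string.ascii_letters + string.digits + string.punctuation)
print(valid_chars == same_set)

# Hypothetical sample line containing one non-ASCII letter.
line = "café au lait\n"
ascii_count = len([ch for ch in line if ch in valid_chars])
non_space_count = sum(1 for ch in line if not ch.isspace())
print(ascii_count, non_space_count)
```

Whether that difference matters depends on the input files; for plain ASCII text the two counts agree.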
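Here is a simpler version using the built-in collections.Counter, which is a dict specialized for counting its input. We can use the Counter.update() method to feed in all the words on each line (unique or not):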
from collections import Counter

def file_count_2(fname):
    line_count = 0
    word_counter = Counter()
    with open(fname, 'r') as infile:
        for line in infile:
            line_count += 1
            word_counter.update( line.split() )
    word_count = 0
    char_count = 0
    for word, cnt in word_counter.items():
        word_count += cnt
        char_count += cnt * len(word)
    print(word_counter)
    return line_count, word_count, char_count
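Notes:
I tested this, and it gives the same counts as your code.
It will be faster because you don't have to iteratively append to the list words (it is better to just hash the unique words and store their counts, which is what Counter does), and there is also no need to iterate and increment charCount every time we see an occurrence of a word.
If you only want word_count and not char_count, you can get it directly with word_count = sum(word_counter.values()), without iterating over word_counter.
PS: naming things word_count, line_count etc. is more Pythonic (PEP-8 style) than wordCount, lineCount; we only use CamelCase for class names, not for variables, functions or methods.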
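As a small sketch of the sum(word_counter.values()) shortcut from the notes, here are the totals derived from a Counter in one expression each. The two sample lines are made up for illustration; the variable names follow the answer.

```python
from collections import Counter

word_counter = Counter()
# Hypothetical sample lines standing in for the lines of a file.
for line in ["the cat saw the dog\n", "the end\n"]:
    word_counter.update(line.split())

# Total words: the sum of all the stored counts.
word_count = sum(word_counter.values())
# Total characters in words: each word's length times how often it occurred.
char_count = sum(cnt * len(word) for word, cnt in word_counter.items())
# Number of distinct words.
unique_words = len(word_counter)

print(word_count, char_count, unique_words)
```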
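Although this answer may be more efficient than the original code, it does not answer the question of "how to count and print the number of characters in each line".
@RolfofSaxony: Actually, it follows the OP's original title and code sample. The title edit was mine, not theirs, an attempt to capture their intent. I have now revised it so that the "each word of each line" wording is clearer.
The question: "Line No. 1 has 58 Characters. Line No. 2 has 24 Characters"?
@RolfofSaxony: Ah, I took the OP's code as the specification of what they wanted and cleaned it up. But they want it extended to give the counts for each line. Let me correct my code...
Note that, going by the comments, spaces and tabs should be excluded.
@RolfofSaxony: Yes, I made a similar comment a few days ago.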
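I was given the task of creating a program that prints the number of characters in a line.
As a programming noob, I found this difficult :(.
Here is what I came up with, along with his response -
This is the core part of your program: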
with open ('data_vis_tips.txt', 'r') as inFile:
    with open ('count_chars_per_line.txt', 'w') as outFile:
        chars = 0
        for line in inFile:
            line = line.strip('\n')
            chars = len(line)
            outFile.write(str(len(line))+'\n')
It can be simplified to:
with open ('data_vis_tips.txt', 'r') as inFile:
    for line in inFile:
        line = line.strip()
        num_chars = len(line)
        print(num_chars)
Note that the strip() function does not need an argument; by default it strips whitespace, and '\n' is whitespace.
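One caveat with the simplified version: strip() with no argument removes all leading and trailing whitespace, not just the newline, so the count can come out smaller than with strip('\n') when a line starts or ends with spaces or tabs. A minimal sketch, using a sample line I made up:

```python
# Hypothetical line with a leading tab and surrounding spaces.
line = "\t  indented text  \n"

raw_length = len(line)                     # everything, newline included
newline_removed = len(line.strip("\n"))    # only newlines at the ends removed
all_stripped = len(line.strip())           # leading/trailing spaces and tabs removed too

print(raw_length, newline_removed, all_stripped)
```

Which one is right depends on whether leading and trailing spaces are supposed to count as characters of the line.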