Sorting words in Python: how do I read, append, and sort all the words of a text file in Python?

Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() function. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.

http://www.pythonlearn.com/code/romeo.txt

Here is my code:

fname = raw_input("Enter file name:")
fh = open(fname)
for line in fh:
    for word in line.split():
        if word in line.split():
            line.split().append(word)
        if word not in line.split():
            continue
print word

For some reason, it only returns the last word of the last line.

What exactly do you expect line.split().append(word) to do?

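I tested your code: as I expected, it prints every word on every line. What do you expect to happen when you call line.split()? Which of your conditions (word in ... or word not in ...) do you expect to be true?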

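Am I misunderstanding the question, or does it ask you, for each line of the file, to split the line into words and then check whether each word in the resulting list is in that same list? Isn't that check redundant and always true? And what is the purpose of appending the word to the end of the list? Wouldn't it then be in the list twice?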

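I think you misunderstand what line.split() does. line.split() returns a list containing the "words" in the string line. Here we interpret a "word" as "a substring delimited by whitespace characters". So if line equals "Hello, World. I <3 Python", then line.split() returns the list ["Hello,", "World.", "I", "<3", "Python"].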
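To make this concrete, here is a minimal sketch, written for Python 3 (the code in this thread is Python 2). It uses the sample string from the paragraph above and shows that every call to line.split() builds a brand-new list, so appending to that temporary list has no lasting effect.

```python
line = "Hello, World. I <3 Python"

# split() with no arguments splits on runs of whitespace.
words = line.split()
print(words)  # ['Hello,', 'World.', 'I', '<3', 'Python']

# Each call returns a new list object, so this append modifies a
# temporary list that is discarded immediately afterwards.
line.split().append("extra")

print(line.split() == words)  # True: the string itself never changed
print(line.split() is words)  # False: a different list object on every call
```

This is why line.split().append(word) in the question appears to do nothing: the list that receives the word is never stored anywhere.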

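When you write for word in line.split(), you are iterating over each element of that list. So the condition word in line.split() will always be true! What you really want is a cumulative list of "words you have already encountered". At the top of the program, you would create it with DiscoveredWords = []. Then, for each word on each line: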

if word not in DiscoveredWords:
    DiscoveredWords.append(word)

Got it? :) Now, since you seem to be new to Python (welcome to the fun, by the way), here is how I would write the code:

fname = raw_input("Enter file name:")
with open(fname) as fh:
    words = [word for line in fh for word in line.strip().split()]
words = list(set(words))
words.sort()

Let's quickly walk through this code so you understand what is happening:

with open(fname) as fh is a handy trick to remember. It makes sure your file gets closed! Once Python leaves the with block, it automatically closes the file for you :D

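words = [word for line in fh for word in line.strip().split()] is another handy trick. It is one of the more concise ways to get a list containing all the words in the file! We tell Python to build the list by taking each line in the file (for line in fh) and then each word in that line (for word in line.strip().split()).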

words = list(set(words)) converts our list to a set and then back to a list. This is a quick way to remove duplicates, because a set in Python contains only unique elements.

Finally, we sort the list with words.sort().
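The walk-through above can be condensed into a small function. The following is only a sketch in Python 3, not the answerer's exact code: the name unique_sorted_words and the sample text are made up for illustration, and io.StringIO stands in for the real file.

```python
import io

def unique_sorted_words(lines):
    """Return the distinct whitespace-separated words in `lines`, sorted."""
    # A set keeps one copy of each word but has no defined order,
    # so sorted() is what produces the final ordering.
    return sorted({word for line in lines for word in line.split()})

# Hypothetical sample text standing in for the real file.
sample = io.StringIO("the sun is the east\nArise fair sun\n")
print(unique_sorted_words(sample))
# ['Arise', 'east', 'fair', 'is', 'sun', 'the']
```

Note that the default string sort places uppercase letters before lowercase ones, which is why 'Arise' comes first. If a case-insensitive order is wanted, passing key=str.lower to sorted() is one option.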

Hope this was helpful and informative :)

sorted(set([w for l in open(fname) for w in l.split()]))

Try the following, which uses set() to build the list of unique words. Each word is also lowercased, so that "The" and "the" are treated as the same word.

import re

word_set = set()
re_nonalpha = re.compile('[^a-zA-Z ]+')

fname = raw_input("Enter file name:")
with open(fname, "r") as f_input:
    for line in f_input:
        line = re_nonalpha.sub(' ', line)  # Convert all non a-z to spaces
        for word in line.split():
            word_set.add(word.lower())

word_list = list(word_set)
word_list.sort()
print word_list

This displays the following list:

['already', 'and', 'arise', 'bits', 'breaks', 'but', 'east', 'envious', 'fair', 'grief', 'has', 'is', 'it', 'juliet', 'kill', 'light', 'many', 'moon', 'pale', 'punctation', 'sick', 'soft', 'sun', 'the', 'this', 'through', 'too', 'way', 'what', 'who', 'window', 'with', 'yonder']

I think the regular expression is overkill. The file the OP is using is already whitespace-delimited and has no punctuation :)

Agreed, but I remember a nearly identical question and text that had subtle punctuation scattered all over the place to trip things up.
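To illustrate what is at stake in this exchange, here is a small Python 3 sketch comparing a plain split() with the regex-based normalization from the answer above. The helper names plain_words and normalized_words and the sample line are hypothetical; the line is not taken from romeo.txt.

```python
import re

# Same pattern as in the answer above: any run of characters that are
# not ASCII letters or spaces is replaced by a single space.
re_nonalpha = re.compile('[^a-zA-Z ]+')

def plain_words(line):
    """Split on whitespace only; punctuation stays attached to words."""
    return line.split()

def normalized_words(line):
    """Strip non-letters, lowercase, then split on whitespace."""
    return re_nonalpha.sub(' ', line).lower().split()

# Hypothetical line with punctuation.
line = "But soft, what light? Soft!\n"
print(plain_words(line))       # ['But', 'soft,', 'what', 'light?', 'Soft!']
print(normalized_words(line))  # ['but', 'soft', 'what', 'light', 'soft']
```

With plain split(), 'soft,' and 'Soft!' count as two different words; after normalization they collapse into 'soft'. The trade-off is that this pattern also breaks words at apostrophes, so "don't" becomes 'don' and 't'.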

Above the loop, add a list in which you will collect the words. Right now, you are just discarding everything.

Your logic is also reversed: you are discarding the words that you should be saving.

words = []
fname = raw_input("Enter file name:")
fh = open(fname)
for line in fh:
    for word in line.split():
        if word not in words:
            words.append(word)
fh.close()
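The loop above collects the words but stops before the sorting and printing steps that the exercise asks for. Here is a sketch of the whole exercise in Python 3, assuming the list-based approach is kept. The function name collect_words and the sample lines are made up for illustration.

```python
def collect_words(lines):
    """Build a list of distinct words, then sort it alphabetically."""
    words = []
    for line in lines:
        for word in line.split():
            # Append only words that are not already in the list.
            if word not in words:
                words.append(word)
    words.sort()  # in-place sort, as the exercise requests
    return words

# Hypothetical sample lines standing in for the file contents.
sample_lines = ["soft what light", "what light breaks"]
print(collect_words(sample_lines))  # ['breaks', 'light', 'soft', 'what']
```

Because an open file can be iterated line by line, the same function should also work with a real file, for example by calling collect_words(fh) inside a with open(fname) as fh: block. The membership test word not in words scans the whole list each time, which is fine for a short file like this one; a set scales better for large inputs.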