You need to create a wrapper function; this is easy enough:
def read_by_tokens(fileobj):
    for line in fileobj:
        for token in line.split():
            yield token
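Note that .readline() does not just read a file character by character until a newline is encountered; the file is read in blocks (a buffer) to improve performance.
The function above reads the file line by line, but yields the results split on whitespace. Use it like this: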
with open('somefilename') as f:
    for token in read_by_tokens(f):
        print(token)
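To try this without a file on disk, here is a minimal sketch that feeds the same generator an in-memory io.StringIO object. The sample text is made up for illustration; it only shows that tokens are yielded across line boundaries and that blank lines produce nothing.

```python
import io


def read_by_tokens(fileobj):
    # Same generator as above: iterate line by line,
    # yield each whitespace-separated token.
    for line in fileobj:
        for token in line.split():
            yield token


# Hypothetical sample input: uneven spacing, a tab, and a blank line.
sample = io.StringIO("alpha beta\n\n  gamma\tdelta \nepsilon")

tokens = list(read_by_tokens(sample))
print(tokens)  # ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
```

Because str.split() with no arguments splits on any run of whitespace, leading and trailing spaces, tabs and the newline itself never show up as tokens.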
Because read_by_tokens() is a generator, you need to either loop directly over the function result, or use the next() function to get tokens one at a time:
with open('somefilename') as f:
    tokenized = read_by_tokens(f)

    # read first two tokens separately
    first_token = next(tokenized)
    second_token = next(tokenized)

    for token in tokenized:
        # loops over all tokens *except the first two*
        print(token)
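One thing to watch for: next() raises StopIteration when the generator is exhausted, so the snippet above fails on input with fewer than two tokens. A possible way around that, sketched here under the assumption that None is an acceptable placeholder for a missing token, is to pass a default as the second argument to next(). The helper name first_two_tokens is hypothetical and used only for illustration.

```python
import io


def read_by_tokens(fileobj):
    # Same generator as above.
    for line in fileobj:
        for token in line.split():
            yield token


def first_two_tokens(fileobj):
    # Hypothetical helper: returns the first two tokens (or None where
    # a token is missing) plus the generator holding the remaining ones.
    tokenized = read_by_tokens(fileobj)
    first = next(tokenized, None)   # default avoids StopIteration
    second = next(tokenized, None)
    return first, second, tokenized


# Input with only one token: no exception, second is None.
first, second, rest = first_two_tokens(io.StringIO("only\n"))
print(first, second, list(rest))  # only None []
```

If a short file should be treated as an error instead, catching StopIteration around the bare next() calls would work as well.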