Reading special characters from a .txt file in Python

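This post looks at a problem encountered when reading a text file that contains special characters (such as 'é') in Python. The author shows code that tries to strip punctuation and convert between encodings, but it fails with 'OverflowError: Python int too large to convert to C long'. The focus is on how to correctly parse text containing special characters and count word frequencies.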

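The purpose of this code is to find the frequency of the words used in a book.

I am trying to read in the text of a book, but the following line keeps breaking my code:

    precious protégés. No, gentlemen; he'll always show 'em a clean pair

Specifically, the é character.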
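To make the role of the é concrete, here is a small sketch of what happens to that line when it passes through `encode("ascii", "replace")`, the call used in `parseString`. It assumes Python 3; the variable names `line`, `encoded` and `first_codes` are only for illustration.

```python
line = "precious protégés. No, gentlemen; he'll always show 'em a clean pair"

# Encoding to ASCII with errors="replace" turns every é into a literal '?'
# and returns a bytes object rather than a str.
encoded = line.encode("ascii", "replace")
print(encoded[:17])  # b'precious prot?g?s'

# Iterating over bytes yields integers, so str(ch) gives decimal digits,
# not the original character.
first_codes = [str(b) for b in encoded[:3]]
print(first_codes)  # ['112', '114', '101']

# Left as a str, the é is an ordinary letter and needs no special casing.
print("é".isalpha())  # True
```

So once the text has been encoded this way, the é is already lost, and the loop no longer sees characters at all.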

Here is my code:

import string

# Create word dictionary from the comprehensive word list
word_dict = {}
def create_word_dict ():
    # open words.txt and populate dictionary
    word_file = open ("./words.txt", "r")
    for line in word_file:
        line = line.strip()
        word_dict[line] = 1

# Removes punctuation marks from a string
def parseString (st):
    st = st.encode("ascii", "replace")
    new_line = ""
    st = st.strip()
    for ch in st:
        ch = str(ch)
        if (n for n in (1,2,3,4,5,6,7,8,9,0)) in ch or ' ' in ch or ch.isspace() or ch == u'\xe9':
            print (ch)
            new_line += ch
        else:
            new_line += ""

    # now remove all instances of 's or ' at end of line
    new_line = new_line.strip()
    print (new_line)
    if (new_line[-1] == "'"):
        new_line = new_line[:-1]
    new_line.replace("'s", "")

    # Conversion from ASCII codes back to useable text
    message = new_line
    decodedMessage = ""
    for item in message.split():
        decodedMessage += chr(int(item))
    print (decodedMessage)
    return new_line

# Returns a dictionary of words and their frequencies
def getWordFreq (file):
    # Open file for reading the book.txt
    book = open (file, "r")

    # create an empty set for all Capitalized words
    cap_words = set()

    # create a dictionary for words
    book_dict = {}
    total_words = 0

    # remove all punctuation marks other than '[not s]
    for line in book:
        line = line.strip()
        if (len(line) > 0):
            line = parseString (line)
            word_list = line.split()

            # add words to the book dictionary
            for word in word_list:
                total_words += 1
                if (word in book_dict):
                    book_dict[word] = book_dict[word] + 1
                else:
                    book_dict[word] = 1
    print (book_dict)
    # close the file
    book.close()

def main():
    wordFreq1 = getWordFreq ("./Tale.txt")
    print (wordFreq1)

main()

The error I get is as follows:

Traceback (most recent call last):
  File "Books.py", line 80, in <module>
    main()
  File "Books.py", line 77, in main
    wordFreq1 = getWordFreq ("./Tale.txt")
  File "Books.py", line 60, in getWordFreq
    line = parseString (line)
  File "Books.py", line 36, in parseString
    decodedMessage += chr(int(item))
OverflowError: Python int too large to convert to C long
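
One plausible reading of this traceback, assuming Python 3: `st.encode("ascii", "replace")` returns a `bytes` object, iterating over it yields integers, and `str(ch)` turns each one into its decimal digits (for example "112" for p). If those digits end up concatenated in `new_line`, `int(item)` can produce a number far outside the range that `chr()` accepts (code points 0 through 0x10FFFF), and the call fails. The sketch below avoids the round trip through ASCII altogether and works on the decoded text directly. The names `parse_string` and `get_word_freq` and the `encoding` parameter are my own, and the default of UTF-8 is an assumption about how Tale.txt is stored.

```python
from collections import Counter


def parse_string(st):
    """Split a line into words, keeping letters (accented ones too) and apostrophes."""
    kept = []
    for ch in st:
        if ch.isalpha() or ch == "'":
            kept.append(ch)
        else:
            # Punctuation, digits and whitespace all act as word separators.
            kept.append(" ")
    words = []
    for word in "".join(kept).split():
        # Drop a possessive 's first, then any apostrophes left at the edges.
        if word.endswith("'s"):
            word = word[:-2]
        word = word.strip("'")
        if word:
            words.append(word)
    return words


def get_word_freq(path, encoding="utf-8"):
    """Return a Counter mapping each word in the file to its frequency."""
    freq = Counter()
    # Assumption: the book is UTF-8 encoded; pass another codec if it is not.
    with open(path, "r", encoding=encoding) as book:
        for line in book:
            freq.update(parse_string(line))
    return freq
```

Calling `get_word_freq("./Tale.txt")` would then return the counts instead of printing them inside the function. Treating digits as separators and leaving capitalization untouched are design choices of this sketch, not something the original code settles.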
