Problem: Python 3 TypeError: a bytes-like object is required, not 'str'
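I ran the following code from the book 机器学习算法原理与编程实践 (Machine Learning Algorithms: Principles and Programming Practice):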
```python
import sys
import os
import jieba


def savefile(savepath, content):
    fp = open(savepath, "wb")
    #content = str.encode(content)
    fp.write(content)  # content is a str here, but a "wb" file only accepts bytes
    fp.close()


def readfile(path):
    fp = open(path, "rb")
    print(path)
    content = fp.read()  # "rb" mode returns bytes, not str
    fp.close()
    return content


corpus_path = "train_corpus_small/"  # unsegmented corpus, one subdirectory per category
seg_path = "train_corpus_seg/"       # output directory for the segmented files
catelist = os.listdir(corpus_path)
for mydir in catelist:
    class_path = corpus_path + mydir + "/"
    seg_dir = seg_path + mydir + "/"
    if not os.path.exists(seg_dir):
        os.makedirs(seg_dir)
    file_list = os.listdir(class_path)
    for file_path in file_list:
        fullname = class_path + file_path
        content = readfile(fullname)
        #content = str(content, encoding='utf-8')
        content = content.strip()
        content = content.replace("\r\n", "").strip()  # bytes.replace() called with str arguments
        content_seg = jieba.cut(content)
        savefile(seg_dir+file_path, " ".join(content_seg))
print("中文语料分词结束!")  # "Chinese corpus segmentation finished!"
```
Error: TypeError: a bytes-like object is required, not 'str'
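The script mixes str and bytes in two places, and both raise this same message. The minimal reproduction below uses made-up sample text. In the loop, the first call to fail is `content.replace("\r\n", "")`, because `readfile()` returns bytes. If that line were fixed, `savefile()` would then fail in the same way: it writes the str produced by `" ".join(...)` to a file opened with `"wb"`.

```python
import tempfile


def reproduce_errors():
    """Trigger both str/bytes mix-ups from the script and collect the error messages."""
    messages = []

    # 1. Writing a str to a binary-mode file, as savefile() does
    with tempfile.TemporaryFile() as fp:  # TemporaryFile opens in "w+b" mode by default
        try:
            fp.write("我 是 中国人")
        except TypeError as exc:
            messages.append(str(exc))

    # 2. Calling bytes.replace() with str arguments, as the loop does on readfile()'s result
    content = "我是\r\n中国人".encode("utf-8")  # stands in for fp.read() in "rb" mode
    try:
        content.replace("\r\n", "")
    except TypeError as exc:
        messages.append(str(exc))

    return messages


for msg in reproduce_errors():
    print(msg)
```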
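I searched online and learned that this code works in Python 2 but not in Python 3. In Python 2, str is itself a byte string, so writing it to a file opened with "wb", or calling replace() on data read with "rb", mixes nothing. In Python 3, str (Unicode text) and bytes (raw binary data) are separate types, and Python never converts between them implicitly. My first attempt was to open the files in text mode instead of binary mode, but that produced encoding errors. I then looked for a way to convert between bytes and str, and after some searching I found the solution below.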
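Text mode does work if the encoding is stated explicitly. Without an `encoding` argument, `open()` uses the platform's locale encoding, for example GBK on a Chinese-locale Windows system. If the corpus files use a different encoding, that is a likely cause of the encoding errors described above. The sketch below assumes the corpus is UTF-8, which the post does not state. The `newline=""` argument leaves `"\r\n"` untranslated on read, so the later `replace("\r\n", "")` still finds something to remove.

```python
def savefile(savepath, content):
    # Text mode with an explicit encoding: write() accepts the str directly
    with open(savepath, "w", encoding="utf-8") as fp:
        fp.write(content)


def readfile(path):
    # Text mode decodes the file and returns str; newline="" keeps "\r\n" as-is
    with open(path, "r", encoding="utf-8", newline="") as fp:
        return fp.read()
```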
Converting str to bytes:
```python
>>> s="我是中国人"
>>> s
'我是中国人'
>>> type(s)
<class 'str'>
>>> b1 = bytes(s, encoding = 'utf-8')
>>> type(b1)
<class 'bytes'>
>>> b1
b'\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe5\x9b\xbd\xe4\xba\xba'
>>> b2 = str.encode(s)
>>> b2
b'\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe5\x9b\xbd\xe4\xba\xba'
>>> b3 = s.encode()
>>> b3
b'\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe5\x9b\xbd\xe4\xba\xba'
```
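Here is a quick check of the session above. All three spellings produce identical bytes, because `str.encode()` defaults to UTF-8. The check also shows that the byte length differs from the character length, since each of these Chinese characters takes three bytes in UTF-8.

```python
s = "我是中国人"

b1 = bytes(s, encoding="utf-8")  # bytes constructor with an explicit encoding
b2 = str.encode(s)               # str.encode called through the class; default is UTF-8
b3 = s.encode()                  # the usual method-call form; same default

print(b1 == b2 == b3)   # True
print(len(s), len(b1))  # 5 15
```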
Converting bytes to str:
```python
>>> s1 = str(b1, encoding = 'utf-8')
>>> s1
'我是中国人'
>>> s2 = bytes.decode(b1)
>>> s2
'我是中国人'
>>> s3 = b1.decode()
>>> s3
'我是中国人'
```
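To apply these conversions to the original script, uncomment its two commented-out lines: decode after `readfile()` and encode inside `savefile()`. The sketch below does the same thing and again assumes UTF-8 files. `segment_corpus` is a hypothetical wrapper introduced here. The segmenter is passed in as `cut` so the logic can be exercised without jieba installed.

```python
import os


def savefile(savepath, content):
    # content is a str; encode it because the file is opened in binary mode
    with open(savepath, "wb") as fp:
        fp.write(content.encode("utf-8"))


def readfile(path):
    # Binary mode returns bytes; decode so callers always get a str
    with open(path, "rb") as fp:
        return fp.read().decode("utf-8")


def segment_corpus(corpus_path, seg_path, cut):
    """Segment every file in corpus_path/<category>/ into seg_path/<category>/.

    Hypothetical helper: `cut` is the word segmenter (e.g. jieba.cut), taken
    as a parameter so this sketch does not depend on jieba being installed.
    """
    for mydir in os.listdir(corpus_path):
        class_path = os.path.join(corpus_path, mydir)
        seg_dir = os.path.join(seg_path, mydir)
        os.makedirs(seg_dir, exist_ok=True)
        for file_path in os.listdir(class_path):
            content = readfile(os.path.join(class_path, file_path))
            # content is a str now, so str.replace with str arguments works
            content = content.replace("\r\n", "").strip()
            savefile(os.path.join(seg_dir, file_path), " ".join(cut(content)))
```

With jieba installed, the equivalent of the original loop would be `segment_corpus("train_corpus_small/", "train_corpus_seg/", jieba.cut)`. The design decodes once where data enters and encodes once where it leaves, so everything in between works purely on str.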