Python3和Python2关于csv文件的读写区别

最新推荐文章于 2023-06-20 07:00:00 发布

cwj1412

最新推荐文章于 2023-06-20 07:00:00 发布

阅读量414

点赞数

文章标签： python 人工智能

本文链接：https://blog.csdn.net/cwj1412/article/details/128340177

版权

今天在处理数据的时候遇到了一个因为python版本不同，导致csv读写出错的问题。

我首先在Python 2.7的环境下抽取特征，按照一定的格式存成csv文件：

def get_detections_from_im(net, im_file, image_id, conf_thresh=0.2):
    im = cv2.imread(im_file)
    scores, boxes, attr_scores, rel_scores = im_detect(net, im)

    # Keep the original boxes, don't worry about the regresssion bbox outputs
    rois = net.blobs['rois'].data.copy()
    # unscale back to raw image space
    blobs, im_scales = _get_blobs(im, None)
    cls_boxes = rois[:, 1:5] / im_scales[0]
    cls_prob = net.blobs['cls_prob'].data
    pool5 = net.blobs['pool5_flat'].data
	......
	
    return {
        'image_id': image_id,
        'image_h': np.size(im, 0),
        'image_w': np.size(im, 1),
        'num_boxes' : len(keep_boxes),
        'boxes': base64.b64encode(cls_boxes[keep_boxes]),
        'features': base64.b64encode(pool5[keep_boxes])
    }

with open(outfile, 'ab') as tsvfile:
	writer = csv.DictWriter(tsvfile, delimiter = '\t', fieldnames = FIELDNAMES)
	......
	
	for im_file,image_id in image_ids:
		writer.writerow(get_detections_from_im(net, im_file, image_id))

然后试图在Python3.7的环境下读取csv文件中的特征，存成字典。下面这段代码是直接从Python 2.7的环境下可运行的代码修改来的：

feature = {}
csv.field_size_limit(sys.maxsize)

with open(input_path, 'r+b') as tsv_in_file:
	reader = csv.DictReader(tsv_in_file, delimiter='\t', fieldnames = FIELDNAMES)
	for item in reader:
		num_boxes = int(item['num_boxes'])
		data = item['features']
		buf = base64.decodestring(data)
		temp = np.frombuffer(buf, dtype=np.float32)
		item['features'] = temp.reshape((num_boxes,-1))
		img_id = item['image_id']
		feature[img_id] = item['features']

不出意料，果然报错了。于是开始debug，顺便了解一下Python3和Python2关于csv文件的读写区别。

报错信息如下：

    row = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

查了一下资料，参考 Python2与Python3的字符编码与解码，发现问题可能出在Python3和Python2读取文件的方式发生了变化。

字符与字节的转换
在计算机中，数据实际是以二进制的形式（字节）进行存放，包括不同语言的文字（即字符）。
这就涉及到了如何将字符和字节进行转换。Unicode定义了每个字符对应的数字编码，比如：“字”对应十进制编码是23383。有了编码后我们就可以进一步将它表示成程序可读的二进制数据，表示的方法有很多种，常见的如：UTF-8、UTF-16、UTF-32，它们采用不同的字节数来储存字符对应的编码。
抛开具体的编码方式不谈，实际就是：计算机中以字节的形式存储字符编码，当我们希望看到真正的字符时，需要对其进行解码；反之为编码的过程。
Python2和Python3对字节和字符的处理
Python2.x的版本下，有字符编码类str（严格上算是字节序列）和字符类unicode，只有当需要的时候才通过编码encode（将字符转为字节）和解码decode（将字节转为字符）进行转换。
而在Python3.x的版本中，合并了这两类为字符类str，同时添加了一个字节类bytes，实现unicode内置，严格区分开字符和字节。

根据 csv库官方文档 reader会根据csvfile生成一个迭代器，每次返回一个字符串string。代码中用到的csv.DictReader和csv.reader的输入和功能基本上差不多，只不过前者可以将每一行的信息和表头的列名对应组合成一个字典dict返回，后者直接返回一个字符串string。在Python2中，由于没有严格区分字节和字符，允许csvfile是一个二进制流文件输入，但在Python3中将出现问题：

Python2中关于reader的解释
Python3中关于reader的解释
因此需要把输入文件的读取形式改为以文本模式读入：

with open(input_path, 'r+b') as tsv_in_file:
	reader = csv.DictReader(tsv_in_file, delimiter='\t', fieldnames = FIELDNAMES)

修改为：

with open(input_path, 'r', encoding='utf8') as tsv_in_file:
	reader = csv.DictReader(tsv_in_file, delimiter='\t', fieldnames = FIELDNAMES)

修改好后再次运行，发现还有问题，新的报错信息如下：

    buf = base64.decodestring(data)
TypeError: expected bytes-like object, not str

由于我们读取方式变成了以字符串形式读入，所以获得的feature也变成了字符串的形式而不是原本的字符编码串（字节形式），因此需要对data进行encode，才能正确转换进行后续操作：

        for item in reader:
            num_boxes = int(item['num_boxes'])
            data = item['features']
            buf = base64.decodestring(data)

修改为：

        for item in reader:
            num_boxes = int(item['num_boxes'])
            data = item['features']
            buf = base64.decodestring(data.encode())

修改后再次运行，已经能成功运行了，就是可能因为decodestring已经在3.1以上版本弃用而收到warning：

DeprecationWarning: decodestring() is a deprecated alias since Python 3.1, use decodebytes()
  buf = base64.decodestring(data.encode())

放着不管也没关系，如果介意就根据提示修改为：

buf = base64.decodebytes(data.encode())

到此debug结束，程序终于跑起来啦~

其他相关参考链接：
https://www.cnpython.com/qa/73410
https://www.cnpython.com/qa/58940
https://blog.csdn.net/qq_37189082/article/details/124562148
https://blog.csdn.net/Sandwichsauce/article/details/88638454
https://www.5axxw.com/questions/content/f9d0kv
https://www.cnblogs.com/springsnow/p/13174511.html