Learn Python 3 the Hard Way: Study Notes for Exercises 17-19


weixin_45938096




Column: Study Notes. Tags: python

Article link: https://blog.csdn.net/weixin_45938096/article/details/104608448


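Summary (auto-generated by CSDN): This post records a file encoding error encountered while learning Python, including attempts to fix a UnicodeDecodeError. It examines the conflict between PowerShell's default file output encoding, UTF-16 (LE), and the UTF-8 that was actually needed, and how to avoid the problem by changing the code. It also introduces basic uses of the os library, such as path operations, process management, and environment parameters, and discusses the difference between opening files in binary and text mode. Finally, it covers creating and calling functions.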

Exercise 17: More File Operations

How to Fix the File Encoding Error

Running the program raised an encoding error (a UnicodeDecodeError).

Original code:

# Import argv (the list of command-line arguments) from the sys module
from sys import argv
# Import the exists function from the os.path module
from os.path import exists

# Unpack argv into the script name and the two file names
script, from_file, to_file = argv

print(f"Copying from {from_file} to {to_file}")

# We could do these two on one line, how?
# Open from_file and assign the file object to in_file
in_file = open(from_file)
# Read the contents of in_file and assign them to indata
indata = in_file.read()

# Print the length of indata (the number of characters read)
print(f"The input file is {len(indata)} bytes long")

# Check whether to_file already exists
print(f"Does the output file exist? {exists(to_file)}")

print("Ready, hit RETURN to continue, hit CTRL-C to abort.")
input()

# Open to_file for writing and assign the file object to out_file
out_file = open(to_file, 'w')
# Write the contents of indata to out_file
out_file.write(indata)

print("Alright, all done.")

# Close the out_file and in_file files
out_file.close()
in_file.close()
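
The comment in the script asks how to do the open() and read() steps on one line. The literal answer is indata = open(from_file).read(), but that leaves closing the file to the garbage collector. The sketch below is one possible variant, not the book's official answer: it uses with blocks so both files are closed automatically, and it passes an explicit encoding instead of relying on the system default. The helper name copy_file and its encoding parameter are hypothetical.

```python
def copy_file(from_file: str, to_file: str, encoding: str = "utf-8") -> int:
    """Copy a text file and return the number of characters copied.

    copy_file is a hypothetical helper; the explicit encoding avoids
    depending on the system's default (locale) encoding.
    """
    # Open and read in one step; the with block closes in_file afterwards.
    with open(from_file, encoding=encoding) as in_file:
        indata = in_file.read()

    # Write the same text to the destination; out_file is closed on exit.
    with open(to_file, "w", encoding=encoding) as out_file:
        out_file.write(indata)

    return len(indata)
```

Called as copy_file("test17.txt", "new_file17.txt"), it returns the character count that the original script prints as "bytes" (the two agree for plain ASCII text).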

PS D:\pythonp> # first make a sample file.
PS D:\pythonp> echo "This is a test file." > test17.txt
PS D:\pythonp> #then look at it.
PS D:\pythonp> cat test17.txt
This is a test file.
PS D:\pythonp> # now run our script on it.
PS D:\pythonp> python ex17.py test17.txt new_file17.txt
Copying from test17.txt to new_file17.txt
Traceback (most recent call last):
  File "ex17.py", line 10, in <module>
    indata = in_file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illegal multibyte sequence

The error is raised at the read() call.

A search suggested that it was probably a file encoding problem.
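
To see why the error points at byte 0xff in position 0, the sketch below rebuilds, as an assumption, the bytes that PowerShell's echo "This is a test file." > test17.txt most likely produced: a UTF-16 LE byte order mark followed by UTF-16 LE text. It then decodes them as GBK, which is what open() falls back to on a Chinese-locale Windows machine when no encoding is given. The sample string and variable names are my own; the real test17.txt may differ slightly.

```python
import codecs
import locale

# Assumed file contents: UTF-16 LE byte order mark (FF FE) + UTF-16 LE text.
text = "This is a test file.\r\n"
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(data[:6])  # b'\xff\xfeT\x00h\x00'

# open() without encoding= uses the locale's preferred encoding; on
# Chinese-locale Windows this is typically cp936, which Python reports as 'gbk'.
print(locale.getpreferredencoding(False))

try:
    data.decode("gbk")
except UnicodeDecodeError as exc:
    # 0xFF can never start a GBK character, so decoding fails at position 0.
    print(exc)
```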

I followed the article below, but it did not fix the problem; the errors from each attempt are shown after it.
Article: "UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 11126: illegal multibyte sequence"

Attempt 1 (failed)

Change line 9 (the open() call, counted without comments) to:

# Set the encoding to gbk
in_file = open(from_file, encoding='gbk')

PS D:\pythonp> python ex17.py test17.txt new_file17.txt
Copying from test17.txt to new_file17.txt
Traceback (most recent call last):
  File "ex17.py", line 10, in <module>
    indata = in_file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illegal multibyte sequence

The error still occurs at the same place.

Attempt 2 (failed)

Change line 9 to:

# Set the encoding to gb18030
in_file = open(from_file, encoding='gb18030')

PS D:\pythonp> python ex17.py test17.txt new_file.txt
Copying from test17.txt to new_file.txt
Traceback (most recent call last):
  File "ex17.py", line 10, in <module>
    indata = in_file.read()
UnicodeDecodeError: 'gb18030' codec can't decode byte 0xff in position 0: illegal multibyte sequence

It fails at the same place again.
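
Attempts 1 and 2 fail for the same reason: the first byte, 0xFF, belongs to a UTF-16 byte order mark and is not a valid first byte in either GBK or GB18030. A quick way to check this kind of problem is to try each candidate codec on the raw bytes with strict error handling, as in the sketch below. The sample bytes are my reconstruction of a PowerShell-redirected file, not the author's actual test17.txt, and try_codecs is a hypothetical helper.

```python
import codecs
from typing import Dict, List, Optional

# Reconstructed sample (assumption): UTF-16 LE BOM + UTF-16 LE text.
data = codecs.BOM_UTF16_LE + "This is a test file.\r\n".encode("utf-16-le")


def try_codecs(raw: bytes, candidates: List[str]) -> Dict[str, Optional[str]]:
    """Return the decoded text per codec, or None if strict decoding fails."""
    results: Dict[str, Optional[str]] = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)  # errors="strict" by default
        except UnicodeDecodeError:
            results[name] = None
    return results


for name, decoded in try_codecs(data, ["gbk", "gb18030", "utf-16"]).items():
    print(f"{name:8} -> {'FAILED' if decoded is None else repr(decoded)}")
```

Only the utf-16 codec, which reads the byte order mark, recovers the text; both Chinese codecs fail on the first byte, matching the two tracebacks.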

Attempt 3 (failed)

Change line 9 to:

# Set the encoding to gb18030 and ignore decoding errors
in_file = open(from_file, encoding='gb18030', errors='ignore')

PS D:\pythonp> python ex17.py test17.txt new_file17.txt
Copying from test17.txt to new_file17.txt
The input file is 44 bytes long
Does the output file exist? False
Ready, hit RETURN to continue, hit CTRL-C to abort.

Traceback (most recent call last):
  File "ex17.py", line 19, in <module>
    out_file.write(indata)
UnicodeEncodeError: 'gbk' codec can't encode character '\u2e84' in position 0: illegal multibyte sequence

The read() call now succeeds, but write() raises a new error, so I went back to the documentation to find the root cause.
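
Attempt 3 only hides the problem: errors='ignore' drops the bytes that cannot be decoded and decodes everything else as GB18030, so indata is no longer the original sentence. The sketch below, again using reconstructed UTF-16 LE sample bytes (an assumption), shows that the result differs from the original text and contains NUL characters coming from UTF-16's zero bytes. The later write() failure is a separate symptom that the traceback makes visible: open(to_file, 'w') also uses the default gbk codec, which cannot encode the character '\u2e84' that GB18030 decoding produced.

```python
import codecs

# Reconstructed sample bytes (assumption): UTF-16 LE BOM + UTF-16 LE text.
original = "This is a test file.\r\n"
data = codecs.BOM_UTF16_LE + original.encode("utf-16-le")

# What Attempt 3 effectively does: decode as GB18030, silently dropping bad bytes.
garbled = data.decode("gb18030", errors="ignore")

print(garbled == original)  # False: the text was not recovered
print("\x00" in garbled)    # True: UTF-16's zero bytes became NUL characters

# Decoding with the codec that matches the byte order mark gives the text back.
print(data.decode("utf-16") == original)  # True
```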

PowerShell's File Output Encoding

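On investigation, the problem is that PowerShell's output redirection (>) writes files as "UTF-16 (LE)" by default (Microsoft calls this encoding "Unicode"), while the script needs the file to be "UTF-8". This default applies to Windows PowerShell 5.1 and earlier; PowerShell 7 defaults to UTF-8 without a byte order mark. The byte 0xff at position 0 is the first byte of the UTF-16 LE byte order mark (FF FE), which neither gbk nor gb18030 can decode.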
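
Given that diagnosis, one way to fix the script without touching PowerShell's settings is to tell open() which encoding each file actually uses. The sketch below assumes the input may carry a UTF-8 or UTF-16 byte order mark (as PowerShell redirection produces) and writes the copy as UTF-8. The helper names guess_encoding and convert_to_utf8, and the fallback to UTF-8 when no BOM is found, are my own illustration rather than the author's fix; UTF-32 files are not handled.

```python
import codecs


def guess_encoding(path: str) -> str:
    """Pick an encoding from the file's byte order mark (hypothetical helper).

    Falls back to UTF-8 when no BOM is present, which is an assumption.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and strips the BOM
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # the utf-16 codec uses the BOM to pick the byte order
    return "utf-8"


def convert_to_utf8(from_file: str, to_file: str) -> str:
    """Copy from_file to to_file, re-encoding the text as UTF-8 (no BOM)."""
    with open(from_file, encoding=guess_encoding(from_file)) as in_file:
        indata = in_file.read()
    with open(to_file, "w", encoding="utf-8") as out_file:
        out_file.write(indata)
    return indata
```

In ex17.py this corresponds to passing encoding=guess_encoding(from_file) to the first open() and encoding='utf-8' to the second, so neither call depends on the gbk default.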

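I looked at the solutions in these reference articles:
Windows PowerShell output file encoding issues
Changing PowerShell's default encoding
Changing PowerShell's default output encoding to UTF-8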

I preferred an approach that leaves the default output encoding unchanged, so I tried the following:

Attempt 1 (failed)

PS D:\pythonp> chcp 65001
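chcp : The term 'chcp' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1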
+ chcp 65001
+ ~~~~
    + CategoryInfo          : ObjectNotFound: (chcp:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

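This fails: running chcp 65001 to switch the current console window's code page to "UTF-8" does not work here. Even where chcp is available, it changes only the console code page, not the encoding that > redirection uses when writing files.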

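Since I did not want to change the default output encoding, the only remaining option was to change how the file is produced, so I tried different output routes.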

Attempt 1: create the txt file with echo in PowerShell (the output method that caused the original error)

PS D:\pythonp> echo "TEST one">test17.1.txt
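
For comparison, another output route that avoids PowerShell redirection entirely is to create the sample file from Python with an explicit encoding. This is a sketch of that idea, not one of the attempts from the original post; the helper name make_sample is hypothetical, and UTF-8 is chosen as the target encoding. Because the sample text is plain ASCII, the resulting file is also readable by the unmodified ex17.py under the gbk default.

```python
from pathlib import Path


def make_sample(path: str, text: str = "This is a test file.\n") -> bytes:
    """Create the sample input file as UTF-8 and return its raw bytes."""
    sample = Path(path)
    sample.write_text(text, encoding="utf-8")  # no byte order mark is written
    return sample.read_bytes()


if __name__ == "__main__":
    raw = make_sample("test17.txt")
    # Plain ASCII bytes, with no FF FE byte order mark at the start.
    print(raw[:4])  # b'This'
```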
