The Chinese-character problem in Python


生年不满百，常怀千岁忧


Tags: web crawler, Visual Studio, Python

Article link: https://blog.csdn.net/qq_45623158/article/details/121582722


# coding:utf-8
# coding:unicode_escape
import re
import codecs

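# While one file is open for reading, open a second file for writing at the same time.
# A backslash splits a line of code that is too long,
# but nothing may follow the \, not even a space.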
with codecs.open("movies.txt", "r", encoding="utf-8") as f, \
        codecs.open("data.txt", "w", encoding="utf-8") as out:
    for line in f:
        new_line = line.replace("\xa0", " ").strip()
        
        # Get the ranking and the title
        temp = new_line.split()
        ranking = temp[0]
        title = temp[1]
        
        # Get the movie's release year
        matched = re.search(r'\s+(\d{4})(\s|\()', new_line)
        year = matched.group(1)

        # A slightly more complex regex extracts the country, tags, rating and number of ratings
        matched = re.match(r".+/\s(.+)\s/\s(.+?)\s+(\d\.\d)\s+(\d+)人评价", new_line)
        country, tag, rating, comment = matched.group(1, 2, 3, 4)
        
        # Join the fields with commas into one line and save it to data.txt
        print("{},{},{},{},{},{},{}".format(ranking,title,rating, year,country,tag,comment),file=out)
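The script above assumes that both regular expressions match every line; if one does not, `matched` is `None` and `.group()` raises `AttributeError`. A minimal sketch of the same parsing step with that case guarded is shown here. The function name `parse_line` and the sample line are hypothetical, since the post does not show the actual layout of movies.txt.

```python
import re

# The same two patterns as in the script, written as raw strings.
YEAR_RE = re.compile(r"\s+(\d{4})(\s|\()")
INFO_RE = re.compile(r".+/\s(.+)\s/\s(.+?)\s+(\d\.\d)\s+(\d+)人评价")


def parse_line(line):
    """Return the seven fields as a tuple, or None if the line does not match."""
    new_line = line.replace("\xa0", " ").strip()
    temp = new_line.split()
    if len(temp) < 2:
        return None
    year_match = YEAR_RE.search(new_line)
    info_match = INFO_RE.match(new_line)
    if year_match is None or info_match is None:
        return None
    country, tag, rating, comment = info_match.group(1, 2, 3, 4)
    return (temp[0], temp[1], rating, year_match.group(1), country, tag, comment)


# Hypothetical sample line (assumed format, not taken from the post).
sample = "1 肖申克的救赎\xa01994 / 美国 / 犯罪 剧情 9.7 2000000人评价"
print(parse_line(sample))
print(parse_line("not a movie line"))
```

Returning `None` for lines that do not match lets the caller skip them instead of stopping halfway through the file.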

Visual Studio 2022

Debugging

[Screenshot: output of the debugging run]

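The cause is the Chinese text on line 24 of the script (counting from the first `# coding` line, that is the `re.match` call, whose pattern ends in `人评价`).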

After removing the Chinese text:

[Screenshot: result after removing the Chinese text]

That fixes it!
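Before deleting the Chinese text, it can help to find every line that contains non-ASCII characters and to check whether the file's bytes really decode as UTF-8. A mismatch between the declared and the saved encoding is one common reason such text trips up tools, though the post does not confirm that this was the cause here. The sketch below uses hypothetical helper names and a hypothetical two-line script.

```python
def find_non_ascii_lines(source_bytes):
    """Return (line_number, raw_line) pairs for lines holding non-ASCII bytes."""
    hits = []
    for number, raw in enumerate(source_bytes.splitlines(), start=1):
        if any(byte > 127 for byte in raw):
            hits.append((number, raw))
    return hits


def is_valid_utf8(source_bytes):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        source_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical two-line script, encoded in two different ways.
script = 'import re\npattern = "人评价"\n'
utf8_bytes = script.encode("utf-8")
gbk_bytes = script.encode("gbk")

print([number for number, _ in find_non_ascii_lines(utf8_bytes)])
print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(gbk_bytes))
```

For a real file, the bytes would come from `open(path, "rb").read()`. If the UTF-8 check fails, re-saving the file as UTF-8 in the editor may allow the Chinese text to stay in place.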
