Python Study Notes (Learn Python 3 the Hard Way) - Exercise 23: Strings, Bytes, and Character Encodings

Article link: https://blog.csdn.net/qq_34595352/article/details/99406897

import sys

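# Expects two command-line arguments: an encoding (e.g. utf-8)
# and an error handler (e.g. strict).
script, encoding, error = sys.argv


def main(language_file, encoding, errors):
    line = language_file.readline()
    # readline() returns an empty string at end of file,
    # which is the exit condition for the recursion.
    if line:
        print_line(line, encoding, errors)
        return main(language_file, encoding, errors)  # recurse for the next line


def print_line(line, encoding, errors):
    next_lang = line.strip()  # strip the trailing '\n'
    raw_bytes = next_lang.encode(encoding, errors=errors)  # encode the string into bytes
    cooked_string = raw_bytes.decode(encoding, errors=errors)  # decode the bytes back into a string

    print(raw_bytes, "<===>", cooked_string)


# Open the file in a with block so that it is closed when main() returns.
with open("languages.txt", encoding="utf-8") as fp:
    main(fp, encoding, error)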

The output is:
[Image: screenshot of the program output]
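
As a rough illustration of the kind of output the script produces, here is a minimal sketch. It assumes a small in-memory stand-in for languages.txt (the three sample lines are illustrative, not the contents of the book's file), uses a plain loop in place of the recursion above, and hard-codes utf-8 and strict rather than reading them from the command line. The helper name round_trip is hypothetical.

```python
import io


def round_trip(line, encoding, errors):
    """Encode a string to bytes, then decode the bytes back to a string."""
    text = line.strip()
    raw_bytes = text.encode(encoding, errors=errors)
    cooked_string = raw_bytes.decode(encoding, errors=errors)
    return raw_bytes, cooked_string


# Hypothetical stand-in for languages.txt (assumed sample data).
sample_file = io.StringIO("English\nEspañol\n中文\n")

results = []
for line in sample_file:  # a loop instead of the recursive main()
    raw_bytes, cooked_string = round_trip(line, "utf-8", "strict")
    results.append((raw_bytes, cooked_string))
    print(raw_bytes, "<===>", cooked_string)
```

Each printed line shows the raw bytes on the left and the decoded string on the right; non-ASCII characters appear as \x.. escapes in the bytes and as readable text once decoded.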
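*Bit: a single 0 or 1.
*Byte: a sequence of 8 bits.
*ASCII: the American Standard Code for Information Interchange.
*Unicode: a universal character encoding standard intended to cover all human languages.
*UTF-8: a variable-length ("compressed") encoding of Unicode. Common (ASCII) characters take 8 bits; when that is not enough, it "escapes" to longer sequences of 16, 24, or 32 bits.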
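
The variable-length behaviour of UTF-8 described above can be checked directly. The following is a small sketch; the four sample characters are chosen here for illustration and do not come from the exercise.

```python
# One character from each UTF-8 length class (illustrative choices).
samples = ["A", "é", "中", "😀"]

byte_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths[ch] = len(encoded)
    # Print the code point rather than the character itself.
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {len(encoded) * 8} bits, {encoded}")

# For ASCII characters, the UTF-8 bytes are identical to the ASCII bytes.
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")
```

The four samples take 1, 2, 3, and 4 bytes respectively, which is why plain English text is the same size in UTF-8 as in ASCII.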

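*In Python 3, a string (str) is a sequence of Unicode characters; it is the basis for displaying and processing text.
*bytes is a raw sequence of bytes, such as the result of encoding a string as UTF-8. The b'' prefix tells Python that you are working with raw bytes.
*To work with raw bytes as text, call .decode() to get a string. Raw bytes carry no information about their encoding; they are just a sequence of bytes, a bunch of numbers, so you have to tell Python which encoding to use (for example, decode them as UTF-8 to get a string).
*When you have a string and Python reports that it cannot encode it, call .encode() with the encoding you want to get the bytes you need.
*DBES (Decode Bytes, Encode Strings): decode bytes to get strings, encode strings to get bytes.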
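
The DBES rule and the effect of the errors argument can be sketched as follows. The sample string is an assumption made for this example; strict, replace, and ignore are standard error handlers accepted by str.encode() and bytes.decode().

```python
text = "Español"  # sample string chosen for illustration

# Encode Strings: str -> bytes.
utf8_bytes = text.encode("utf-8")

# Decode Bytes: bytes -> str. The encoding must be named explicitly,
# because the bytes themselves do not record it.
round_tripped = utf8_bytes.decode("utf-8")

# With errors="strict", encoding fails if the target encoding
# cannot represent a character.
try:
    text.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("strict failed:", exc.reason)

# Other handlers substitute or drop the offending character instead.
replaced = text.encode("ascii", errors="replace")
ignored = text.encode("ascii", errors="ignore")
print(replaced, ignored)

# Decoding with the wrong encoding does not necessarily fail;
# it can silently produce garbled text instead.
wrong = utf8_bytes.decode("latin-1")
print(ascii(wrong))
```

This is why the script takes both an encoding and an error handler on the command line: the same text can round-trip cleanly, lose characters, or raise an exception depending on the pair you choose.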