Strings, Byte Sequences, and Encoding/Decoding

Latest recommended article published on 2022-11-19 08:50:16

庸人自扰665

Category: About Python. Tags: python

Article link: https://blog.csdn.net/zxl061/article/details/122012392

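Strings (character sequences) and byte sequences

Characters
- For historical reasons, defining a character as a Unicode character is still not entirely accurate, but Unicode is where the definition of a character is heading. In Python 3, a str is already a sequence of Unicode characters.
Bytes
- Bytes are the binary form of characters: the result of encoding a string, and what is actually stored in a file or sent over a network.
Code points
- What the computer actually works with and displays is the code point: the number that the Unicode standard assigns to each character.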
```
>>> '你好'.encode("unicode_escape").decode()
'\\u4f60\\u597d'
>>>
>>> '\u4f60\u597d'
'你好'
```
- The Unicode standard writes a code point as 4 to 6 hexadecimal digits after a U+ prefix, for example U+4F60 for 你.
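The mapping between characters and code points can be inspected with the built-in functions `ord` and `chr`. The following is a minimal sketch; the variable names are illustrative and not from the original article.

```python
# Map characters to code points and back with the built-ins ord() and chr().
text = "你好"

# ord() returns the code point of a single character as an integer.
code_points = [ord(ch) for ch in text]

# Format each code point in U+XXXX notation (at least 4 hex digits).
labels = [f"U+{cp:04X}" for cp in code_points]
print(labels)

# chr() goes the other way: code point -> character.
rebuilt = "".join(chr(cp) for cp in code_points)
print(rebuilt == text)
```

The labels match the `\u4f60\u597d` escapes shown in the session above, because an escape sequence is just another way of writing the code point.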
Encoding
- Character sequence (str) -> byte sequence (bytes): encoding (encode)
```
>>> "你好".encode("utf-8")
b'\xe4\xbd\xa0\xe5\xa5\xbd'
```
- Byte sequence (bytes) -> character sequence (str): decoding (decode)
```
>>> b = "你好".encode("utf-8")
>>> b
b'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> b.decode("utf-8")
'你好'
```
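The same string produces different bytes under different encodings, which is why the decoder has to use the codec that the encoder used. A small sketch, assuming the standard-library `gbk` codec as the second encoding (the original only uses UTF-8):

```python
text = "你好"

# Encode the same two characters with two different codecs.
utf8_bytes = text.encode("utf-8")  # 3 bytes per character for these characters
gbk_bytes = text.encode("gbk")     # 2 bytes per character for these characters

print(len(text), len(utf8_bytes), len(gbk_bytes))

# Decoding with the codec that produced the bytes restores the original string.
from_utf8 = utf8_bytes.decode("utf-8")
from_gbk = gbk_bytes.decode("gbk")
print(from_utf8 == text, from_gbk == text)
```

Note that `len()` counts characters for a str and bytes for a bytes object, so the three lengths differ even though the text is the same.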
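Encoding errors
- Mojibake (garbled text) and mixed encodings
  - Detecting the encoding
    
    The encoding cannot be determined with certainty from a byte sequence alone; detection tools estimate the most likely encoding statistically.
```
# Install chardet (run this in a shell, not at the Python prompt)
pip install chardet

# Import chardet and let it guess the encoding of the bytes in b
# (the result is a dict that includes 'encoding' and 'confidence')
>>> import chardet
>>> chardet.detect(b)
```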
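    - Handling mojibake and mixed encodings (here b_2 stands for a byte sequence that contains bytes that are not valid UTF-8)
      - Ignore the bytes that cannot be decoded
        
        >>> b_2.decode("utf-8", errors='ignore')
        '你好'
      - Substitute the replacement character (U+FFFD) for the bytes that cannot be decoded
        
        >>> b_2.decode("utf-8", errors='replace')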
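The three error-handling modes can be compared side by side. This is a sketch only: the article never shows how `b_2` was built, so here it is assumed to be the UTF-8 bytes of "你好" followed by one stray byte.

```python
# Hypothetical damaged input: valid UTF-8 for "你好" plus one invalid byte.
b_2 = "你好".encode("utf-8") + b"\xff"

# The default mode, errors="strict", raises on the first undecodable byte.
try:
    b_2.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decoding failed:", exc.reason)

# errors="ignore" silently drops the undecodable byte.
ignored = b_2.decode("utf-8", errors="ignore")

# errors="replace" puts U+FFFD where the undecodable byte was.
replaced = b_2.decode("utf-8", errors="replace")

print(ignored)
print(replaced)
```

Both lenient modes lose information, so they are best kept for display or logging; when the data matters, finding the real encoding is the safer fix.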

String CRUD operations

Calling dir("") lists the methods available on a string.

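Create

+

>>> a = "a"
>>> id(a)
2018067917656
>>> a = a + "b"
>>> id(a)
2018112569504
>>> a
'ab'

The id changes because strings are immutable: + builds a new string object and a is rebound to it.

+=

a += "b" is shorthand for a = a + "b".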
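Because `+=` rebinds the name rather than modifying the string, any other name that refers to the old string is unaffected. A short sketch (the name `alias` is illustrative):

```python
a = "a"
alias = a          # a second name for the same string object

a += "b"           # builds a new string and rebinds a to it

print(a)           # the new string
print(alias)       # still the original string
print(a is alias)  # the two names now refer to different objects
```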

Retrieve

Getting a character by index

In Python, as in most programming languages, indices start at 0.
```
>>> a = "hello,world"
>>> a[1]
'e'
```

find and index (get the index of a target substring)

>>> a.find("e")
1
>>> a.find("!")      # find returns -1 when nothing is found
-1

# index raises an error when the target is not found
>>> a.index("!")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: substring not found
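The different failure signals of `find` and `index` call for different handling code. A minimal sketch, with illustrative variable names:

```python
a = "hello,world"

# find() signals "not found" with -1, so the result must be checked before use.
pos = a.find("!")
found_with_find = pos != -1

# index() signals "not found" with ValueError, so it is wrapped in try/except.
try:
    a.index("!")
    found_with_index = True
except ValueError:
    found_with_index = False

# When only a yes/no answer is needed, the `in` operator is the simplest test.
found_with_in = "!" in a

print(found_with_find, found_with_index, found_with_in)
```

An unchecked -1 is easy to misuse: `a[a.find("!")]` does not fail, it quietly returns the last character, because -1 is a valid index.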

startswith and endswith

>>> f = "2021-12-18-xxxxxx"
>>> f.startswith("2021-12-18")
True
>>> f = "xxxxxx.jpg"
>>> f.endswith("jpg")
True
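Both methods also accept a tuple of candidates, which is handy for filtering file names. A sketch with made-up file names:

```python
# Hypothetical file names used only for illustration.
names = ["photo.jpg", "notes.txt", "scan.PNG", "2021-12-18-report.pdf"]

# A tuple argument means "ends with any of these".
image_suffixes = (".jpg", ".png")
images = [n for n in names if n.lower().endswith(image_suffixes)]

# Select names that begin with a given date prefix.
dated = [n for n in names if n.startswith("2021-12-18")]

print(images)
print(dated)
```

Including the dot in the suffix (".jpg" rather than "jpg") avoids matching a name that merely happens to end in those letters.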

UPDATE

replace returns a new string; the original string is left unchanged.
```
a.replace("wer", "wor")
```
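A short sketch of how `replace` behaves, assuming a sample string that actually contains "wer" (the article does not show the value of `a` at this point):

```python
a = "hello,werld,werld"   # hypothetical input containing the typo twice

# Without a count, every occurrence is replaced.
fixed_all = a.replace("wer", "wor")

# The optional third argument limits how many occurrences are replaced.
fixed_first = a.replace("wer", "wor", 1)

print(fixed_all)
print(fixed_first)
print(a)  # the original string is untouched
```

To keep the result, assign it back (`a = a.replace(...)`); calling `replace` alone discards the new string.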

split

>>> a = "<<python>>,<<java>>,<<c++>>"
>>> a.split(",")
['<<python>>', '<<java>>', '<<c++>>']

join

>>> b = a.split(",")
>>> b
['<<python>>', '<<java>>', '<<c++>>']
>>> ",".join(b)
'<<python>>,<<java>>,<<c++>>'
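`split` and `join` are inverses when the same separator is used, and `split` takes an optional limit on the number of splits. A minimal sketch with illustrative names:

```python
line = "<<python>>,<<java>>,<<c++>>"

# Splitting and re-joining with the same separator restores the original.
parts = line.split(",")
rejoined = ",".join(parts)

# The second argument (maxsplit) stops after that many splits.
first, rest = line.split(",", 1)

# join works with any separator string, not only the one used to split.
spaced = " | ".join(parts)

print(rejoined == line)
print(first, rest)
print(spaced)
```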

DELETE

strip (remove surrounding whitespace)

>>> a
'                         hello,word                     '
>>> a.strip()
'hello,word'
>>>

lstrip

rstrip

>>> a.lstrip()    # remove leading whitespace
'hello,word                     '
>>> a.rstrip()    # remove trailing whitespace
'                         hello,word'
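The strip family also accepts a set of characters to remove instead of whitespace, and it only ever trims the ends. A small sketch with a made-up input:

```python
raw = "   <<hello,word>>\n"   # hypothetical input with padding and a newline

# With no argument, strip() removes whitespace (including newlines) at both ends.
trimmed = raw.strip()

# With an argument, it removes any of the given characters at both ends.
bare = trimmed.strip("<>")

# Characters in the middle are never removed.
inner = "  a  b  ".strip()

print(repr(trimmed))
print(repr(bare))
print(repr(inner))
```

The argument is treated as a set of characters, not as a prefix or suffix string, so `strip("<>")` removes any run of `<` and `>` at either end.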

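String output and input

Saving to a file

# open() opens a file. In "w" mode a missing file is created,
# but a path whose directory does not exist raises an error.
# Arguments: file name, mode ("r" read, "w" write, "a" append), encoding
output = open("output.txt", "w", encoding="utf-8")
content = "Hello,world"
# Write the content to the file
output.write(content)
# Close the file handle
output.close()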

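Reading a file

# Named input_file so that the built-in input() is not shadowed
input_file = open("output.txt", "r", encoding="utf-8")
# Read the entire contents of the file
content = input_file.read()
print(content)

# A second read() returns an empty string, because the file position is
# already at the end of the file (input_file.seek(0) would rewind it)
content_2 = input_file.read()
print(content_2)
# Close the file handle
input_file.close()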
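The same write and read steps can be written with `with` blocks, which close the file automatically even if an error occurs. The sketch below is one possible arrangement, not the article's own code; it uses a temporary directory (an assumption made here so that no file is left behind) and shows `seek(0)` rewinding the file for a second read.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.txt")

    # The file is closed automatically when the with block ends.
    with open(path, "w", encoding="utf-8") as output_file:
        output_file.write("Hello,world")

    with open(path, "r", encoding="utf-8") as input_file:
        first = input_file.read()    # reads the whole file
        second = input_file.read()   # empty: the position is at end of file
        input_file.seek(0)           # move the position back to the start
        third = input_file.read()    # reads the whole file again

print(repr(first), repr(second), repr(third))
```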

Appending to a file

output = open("output.txt", "a", encoding="utf-8")
content = "\nHello,world"
# Write the content at the end of the file
output.write(content)
# Close the file handle
output.close()
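The difference between "w" and "a" shows up when the file already has content: "w" starts over, "a" adds to the end. A minimal sketch, again using a temporary directory as an assumption of this example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.txt")

    # "w" creates the file, or empties it if it already exists.
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello,world")

    # "a" keeps the existing content and writes after it.
    with open(path, "a", encoding="utf-8") as f:
        f.write("\nHello,world")

    # Read everything back and split it into lines.
    with open(path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```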

Formatted string output

format

By the default order of the arguments

a = "ping"
b = "pong"

print("play pingpong:{},{}".format(a,b))

By explicit argument index

a = "ping"
b = "pong"

print("play pingpong:{0},{1},{0},{1}".format(a,b))

By keyword argument

a = "ping"
b = "pong"

print("play pingpong:{a},{b},{a},{b}".format(a=a,b=b))

By variable name (f-strings, recommended), available only in Python 3.6 and later
```
a = "ping"
b = "pong"

print(f"playing pingpong:{a},{b}")
```
Formatting decimals
```
>>> "{:.2f}".format(3.14159)
'3.14'
```

% operator (older style)

>>> "playing %s %s" % ("ping","pong")
'playing ping pong'
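The part after the colon is a format specification, and the same specification works in `format` and in f-strings. A few common cases as a sketch; the sample values are made up:

```python
pi = 3.14159

two_places = f"{pi:.2f}"         # fixed-point, 2 digits after the decimal point
padded = f"{pi:8.3f}"            # same idea, right-aligned in a field 8 wide
thousands = f"{1234567:,}"       # comma as thousands separator
percent = f"{0.256:.1%}"         # multiply by 100 and append a percent sign
left_aligned = f"{'ping':<6}|"   # left-align text in a field 6 wide

for value in (two_places, padded, thousands, percent, left_aligned):
    print(repr(value))

# The same specification gives the same result with str.format().
same = "{:.2f}".format(pi) == two_places
print(same)
```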

Homework

Practice string encoding and decoding
Practice string CRUD operations
Practice string formatting

For the next section, see the link:
