Python byte encoding: [Python] 08. String formatting, encoding, bytes, and bytearray

weixin_39709178


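I. String formatting

1. String formatting

String formatting is one way of building a string out of pieces.

Earlier we used str.join() and + to concatenate str objects, but neither gives much control over the layout.

There are two ways to format a str: printf style and str.format().

2. printf style

Inherited from the C language.

In [2]: s = 'i love %s'

A str that contains placeholders is a string waiting to be formatted.

Placeholder: % followed by a format control character.

In [3]: s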

Out[3]: 'i love %s'

In [4]: s %('python',)

Out[4]: 'i love python'

In [5]: s %('python')

Out[5]: 'i love python'

In [6]: s %'python'

Out[6]: 'i love python'

In [7]: 'i love %s' %'python'

Out[7]: 'i love python'

The arguments replace the placeholders in order. A new str is returned and the original str is unchanged.

In [9]: 'i love %s, i am %d' % ('python', 18)

Out[9]: 'i love python, i am 18'

In [11]: 'i love %s, i am %d' % ('python' 18)

File "", line 1

'i love %s, i am %d' % ('python' 18)

SyntaxError: invalid syntax

In [13]: 'i love %s, i am %d' % ('python',)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

in ()

----> 1 'i love %s, i am %d' % ('python',)

TypeError: not enough arguments for format string

In [14]: 'i love %s, i am %d' % ('python',"xxj")

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

in ()

----> 1 'i love %s, i am %d' % ('python',"xxj")

TypeError: %d format: a number is required, not str

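A TypeError is raised when the number of placeholders does not match the number of arguments, or when an argument has the wrong type for its placeholder.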

In [25]: 'i love %s, i am %d' % ('python', 18)

Out[25]: 'i love python, i am 18'

In [26]: 'i love %s, i am %d' % ('python', "xxj")

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

in ()

----> 1 'i love %s, i am %d' % ('python', "xxj")

TypeError: %d format: a number is required, not str

In [27]: 'i love %s, i am %d' % (18, 18) # why is there no error here, even though the type does not match?

Out[27]: 'i love 18, i am 18'

In [28]: 'i love %s, i am %d' % ([1, 2], 18)

Out[28]: 'i love [1, 2], i am 18'

In [29]: 'i love %s, i am %d' % (1, 2, 18)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

in ()

----> 1 'i love %s, i am %d' % (1, 2, 18)

TypeError: not all arguments converted during string formatting

In [30]: 'i love %s, i am %d' % ((1, 2), 18)

Out[30]: 'i love (1, 2), i am 18'

%s accepts a str or any other object; a non-str object is converted implicitly by calling str() on it.
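A minimal sketch of the difference between %s and %d, under the assumption that wrapping the argument in a one-element tuple is acceptable style. The Point class and the two helper functions are hypothetical names invented for this illustration.

```python
class Point:
    """Hypothetical class, used only to show that %s calls str()."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point({}, {})".format(self.x, self.y)


def describe(value):
    # %s accepts any object; it is converted with str() before insertion.
    # The one-element tuple keeps a tuple argument from being unpacked.
    return "value is %s" % (value,)


def as_integer(value):
    # %d only accepts numbers; anything else raises TypeError.
    try:
        return "number is %d" % (value,)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


print(describe(18))
print(describe([1, 2]))
print(describe(Point(1, 2)))
print(as_integer(18))
print(as_integer("xxj"))
```

Passing (value,) rather than value is what lets describe((1, 2)) work; with a bare tuple on the right-hand side, each element would be matched against its own placeholder.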

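Syntax:

print("String %format1 %format2 …" % (variable1, variable2, …))

Modifiers may appear between the % and the format character; if present, they must come in this order:

%[(name)][flags][width][.precision]typecode

typecode is the format character above, i.e. one of the characters in the figure (for example s, d, or f).

(name): a key, written in parentheses, of the dictionary supplied after the % operator; it selects one specific item.

flags is one or more of the following:

-: left-align (the default is right-align)

+: include the sign of a number, so positive numbers also carry a "+"

0: pad with zeros

width: a number giving the minimum field width

.: separates the field width from the precision

precision: the maximum number of characters printed for a string, the number of digits after the decimal point for a float, or the minimum number of digits for an integer (padded with leading zeros)

## %s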

In [34]: 'I love %s, i am %d' % ("python", 18)

Out[34]: 'I love python, i am 18'

In [35]: 'I love %-s, i am %d' % ("python", 18)

Out[35]: 'I love python, i am 18'

In [36]: 'I love %-30s, i am %d' % ("python", 18)

Out[36]: 'I love python                        , i am 18'

In [37]: 'I love %30s, i am %d' % ("python", 18)

Out[37]: 'I love                         python, i am 18'

In [38]: 'I love %030s, i am %d' % ("python", 18)

Out[38]: 'I love                         python, i am 18'

In [39]: 'I love %-030s, i am %d' % ("python", 18)

Out[39]: 'I love python                        , i am 18'

In [40]: 'I love %-030.5s, i am %d' % ("python", 18)

Out[40]: 'I love pytho                         , i am 18'

## %d

In [49]: 'I love %s, i am %d' % ("python", 18)

Out[49]: 'I love python, i am 18'

In [50]: 'I love %s, i am %20d' % ("python", 18)

Out[50]: 'I love python, i am                   18'

In [51]: 'I love %s, i am %020d' % ("python", 18)

Out[51]: 'I love python, i am 00000000000000000018'

In [52]: 'I love %s, i am %-20d' % ("python", 18)

Out[52]: 'I love python, i am 18                  '

In [53]: 'I love %s, i am %-20.5d' % ("python", 18)

Out[53]: 'I love python, i am 00018               '

In [54]: 'I love %s, i am %-20.6d' % ("python", 18)

Out[54]: 'I love python, i am 000018              '

## %f

In [43]: 'I love %s, i am %f' % ("python", 18)

Out[43]: 'I love python, i am 18.000000'

In [44]: 'I love %s, i am %20f' % ("python", 18)

Out[44]: 'I love python, i am            18.000000'

In [45]: 'I love %s, i am %020f' % ("python", 18)

Out[45]: 'I love python, i am 0000000000018.000000'

In [46]: 'I love %s, i am %-020f' % ("python", 18)

Out[46]: 'I love python, i am 18.000000           '

In [47]: 'I love %s, i am %-020.5f' % ("python", 18)

Out[47]: 'I love python, i am 18.00000            '

In [48]: 'I love %s, i am %-020.5d' % ("python", 18)

Out[48]: 'I love python, i am 00018               '
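The transcripts above do not exercise the (name) part of the syntax. The sketch below adds it alongside one example per flag; the variable names are chosen only for illustration, and the widths are kept small so the padding is easy to count.

```python
# Mapping key: the right-hand side is a dict and (name) selects one item.
by_name = "I love %(lang)s, i am %(age)d" % {"lang": "python", "age": 18}

# One example per modifier, written in the order the syntax requires.
signed = "%+d" % 18                # '+' flag: positive numbers carry a sign
zero_padded = "%08.3f" % 3.14159   # '0' flag, width 8, 3 digits after the point
left_aligned = "%-8s|" % "py"      # '-' flag: left-aligned in a field of width 8
truncated = "%.3s" % "python"      # precision on %s: at most 3 characters
min_digits = "%.5d" % 18           # precision on %d: at least 5 digits

for result in (by_name, signed, zero_padded, left_aligned, truncated, min_digits):
    print(repr(result))
```

With a dict on the right-hand side every placeholder needs a (name); mixing named and unnamed placeholders in one format string is not supported.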

printf-style formatting is easy to pick up for people coming from other languages, especially C, but it is not the approach Python recommends.

3. str.format()

In [67]: 'I love {}'.format('python')

Out[67]: 'I love python'

str.format() uses curly braces as placeholders.

When str.format() is called, the arguments passed to format() replace the braces.

In [68]: 'I love {}, i am {}'.format('python', 18)

Out[68]: 'I love python, i am 18'

format() accepts a variable number of arguments.

In [70]: 'I love {}, i am {}'.format('python', 18)

Out[70]: 'I love python, i am 18'

In [71]: 'I love {}, i am {}'.format(18, 'python')

Out[71]: 'I love 18, i am python'

In [72]: 'I love {1}, i am {0}'.format(18, 'python')

Out[72]: 'I love python, i am 18'

In [73]: 'I love {1}, i am {1}'.format(18, 'python')

Out[73]: 'I love python, i am python'

In [74]: 'I love {1}, i am {1}'.format(18) # the positional argument that a placeholder's number refers to must exist

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

in ()

----> 1 'I love {1}, i am {1}'.format(18)

IndexError: tuple index out of range

In [75]: 'I love {0}, i am {1}'.format(18)

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

in ()

----> 1 'I love {0}, i am {1}'.format(18)

IndexError: tuple index out of range

In [76]: 'I love {0}, i am {0}'.format(18) # a placeholder number can refer to the same positional argument more than once

Out[76]: 'I love 18, i am 18'

A number inside the placeholder selects a positional argument of format(), and the same positional argument can be used more than once.

In [77]: 'I love {lang}, i am {age}'.format(lang='python', age=18)

Out[77]: 'I love python, i am 18'

In [78]: 'I love {lang}, i am {lang}'.format(lang='python', age=18)

Out[78]: 'I love python, i am python'

In [79]: 'I love {lang}, i am {lang}'.format(lang='python')

Out[79]: 'I love python, i am python'

In [81]: 'My name is {0}, i love {lang}, i am {age}'.format('xxj', lang='python', age=18)

Out[81]: 'My name is xxj, i love python, i am 18'

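An identifier inside the placeholder selects a keyword argument.

Positional and keyword arguments can be used together.

An exception is raised when the placeholders and the arguments do not match.

In [82]: '{} {}'.format(18)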

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

in ()

----> 1 '{} {}'.format(18)

IndexError: tuple index out of range

In [83]: '{} {lang}'.format(18)

---------------------------------------------------------------------------

KeyError Traceback (most recent call last)

in ()

----> 1 '{} {lang}'.format(18)

KeyError: 'lang'

In [84]: '{1} {2}'.format(0, 1, 2)

Out[84]: '1 2'

In [85]: '{1} {2}'.format("a", "b", "c")

Out[85]: 'b c'

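{} consumes the positional arguments in order.

{number} treats the positional arguments as a sequence args and looks up args[i]; when i is not a valid index of args, IndexError is raised.

{keyword} treats the keyword arguments as a dict kwargs and looks up kwargs[k]; when k is not a key of kwargs, KeyError is raised.

In Python 2.6, the number or keyword inside the braces cannot be omitted.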
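The args/kwargs description above can be made concrete by unpacking a real tuple and dict into format(). This is only a sketch; safe_format is a hypothetical helper that reports which lookup failed instead of letting the exception propagate.

```python
args = ("xxj",)                          # positional arguments, looked up by index
kwargs = {"lang": "python", "age": 18}   # keyword arguments, looked up by key

template = "My name is {0}, i love {lang}, i am {age}"
formatted = template.format(*args, **kwargs)


def safe_format(template, *args, **kwargs):
    """Return the formatted string, or the name of the lookup error."""
    try:
        return template.format(*args, **kwargs)
    except (IndexError, KeyError) as exc:
        return type(exc).__name__


print(formatted)
print(safe_format("{} {}", 18))           # second {} has no positional argument
print(safe_format("{} {lang}", 18))       # no keyword argument named lang
print(safe_format("{1} {2}", "a", "b", "c"))
```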

How do you print a literal brace? Double it.

In [90]: '{}'.format(18)

Out[90]: '18'

In [91]: '{{}}'.format(18)

Out[91]: '{}'

In [92]: '{{{}}}'.format(18)

Out[92]: '{18}'

In [93]: '{{{}}}'.format()

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

in ()

----> 1 '{{{}}}'.format()

IndexError: tuple index out of range
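Brace escaping matters most when the output format itself uses braces. The two helpers below are hypothetical and exist only to show how doubled braces and real placeholders combine in one template.

```python
def as_set_literal(value):
    # '{{' and '}}' produce literal braces; the inner '{}' is the placeholder.
    return "{{{}}}".format(value)


def as_json_pair(key, value):
    # The outer doubled braces are literal, the two '{}' are placeholders.
    return '{{"{}": {}}}'.format(key, value)


print(as_set_literal(18))
print(as_json_pair("age", 18))
```

For real JSON output the json module is the safer choice; the second helper does no quoting or escaping of its inputs.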

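II. bytes

1. bytes

bytes is a type newly introduced in Python 3.

str is a sequence of text (characters); bytes is a sequence of bytes.

Text has an encoding (utf-8, gbk, GB18030, and so on); bytes have no notion of encoding.

The encoding of text defines how characters are represented as bytes.

In Python 3, str.encode() and bytes.decode() use utf-8 by default.

Most str operations other than encode have a corresponding bytes version, but the arguments passed in must also be bytes.

In [175]: b = b'i love python'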

In [176]: b.find('o')

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

in ()

----> 1 b.find('o')

TypeError: a bytes-like object is required, not 'str'

In [177]: b.find(b'o')

Out[177]: 3

In [180]: s = '马哥教育'

In [181]: s.encode()

Out[181]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'

In [182]: s.encode().find(b'\xac') # bytes operations work byte by byte

Out[182]: 2

In [184]: b

Out[184]: b'i love python'

In [185]: b.decode() # a method specific to bytes

Out[185]: 'i love python'

In [186]: b.hex() # a method specific to bytes

Out[186]: '69206c6f766520707974686f6e'
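The following sketch puts the str and bytes views of the same text side by side, so the "byte by byte" behaviour is visible in lengths and indexes. Variable names are illustrative; bytes.fromhex() is included as the inverse of hex().

```python
text = "马哥教育"
data = text.encode()                       # utf-8 by default

char_count = len(text)                     # counts characters
byte_count = len(data)                     # counts bytes

char_index = text.find("哥")               # index measured in characters
byte_index = data.find("哥".encode())      # index measured in bytes

first_byte = data[0]                       # indexing bytes yields an int
hex_text = data.hex()                      # bytes -> hexadecimal text
round_trip = bytes.fromhex(hex_text)       # hexadecimal text -> bytes

print(char_count, byte_count)
print(char_index, byte_index)
print(first_byte, hex_text)
```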

2. Converting str to bytes

In [145]: help(str.encode)

Help on method_descriptor:

encode(...)

S.encode(encoding='utf-8', errors='strict') -> bytes

Encode S using the codec registered for encoding. Default encoding

is 'utf-8'. errors may be given to set a different error

handling scheme. Default is 'strict' meaning that encoding errors raise

a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and

'xmlcharrefreplace' as well as any other name registered with

codecs.register_error that can handle UnicodeEncodeErrors.

In [103]: s.encode() # encode the str into bytes

Out[103]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2' # in utf-8, every 3 bytes (each shown as 2 hexadecimal digits) represent one Chinese character

In [106]: type(s.encode)

Out[106]: builtin_function_or_method

In [107]: type(s.encode())

Out[107]: bytes

In [109]: '马'.encode()

Out[109]: b'\xe9\xa9\xac'

In [127]: bin(0xe9) # convert hexadecimal to binary

Out[127]: '0b11101001'

In [128]: bin(0xa9)

Out[128]: '0b10101001'

In [129]: bin(0xac)

Out[129]: '0b10101100'

11101001 10101001 10101100 represents the character 马
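As a worked example of how those three bytes map back to one character, the sketch below strips the UTF-8 marker bits by hand. It assumes the standard 3-byte UTF-8 layout 1110xxxx 10xxxxxx 10xxxxxx; in practice decode() does this for you.

```python
encoded = "马".encode("utf-8")                       # b'\xe9\xa9\xac'
bits = " ".join("{:08b}".format(byte) for byte in encoded)

# Keep only the payload bits (the x positions) and join them together.
b0, b1, b2 = encoded
code_point = (
    ((b0 & 0b00001111) << 12)    # 4 payload bits from the lead byte
    | ((b1 & 0b00111111) << 6)   # 6 payload bits from the first continuation byte
    | (b2 & 0b00111111)          # 6 payload bits from the second continuation byte
)

print(bits)
print(hex(code_point), chr(code_point))
```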

In [133]: s.encode()

Out[133]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'

In [134]: s.encode(gbk)

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

in ()

----> 1 s.encode(gbk)

NameError: name 'gbk' is not defined

In [135]: s.encode("gbk") # different encodings produce different bytes

Out[135]: b'\xc2\xed\xb8\xe7\xbd\xcc\xd3\xfd'

In [136]: s.encode("GBK")

Out[136]: b'\xc2\xed\xb8\xe7\xbd\xcc\xd3\xfd'

In [137]: s.encode("utf8")

Out[137]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'

In [138]: s.encode("utf-8")

Out[138]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'
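The help text for str.encode mentions the errors parameter, which the transcript does not try. A small sketch, using the ascii codec only because it cannot represent Chinese characters and so triggers every handler:

```python
text = "马哥教育"

utf8_bytes = text.encode("utf-8")   # 3 bytes per character here
gbk_bytes = text.encode("gbk")      # 2 bytes per character here

# errors='strict' is the default: an unencodable character raises.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ignored = text.encode("ascii", errors="ignore")              # characters dropped
replaced = text.encode("ascii", errors="replace")            # one '?' per character
xml_refs = "马".encode("ascii", errors="xmlcharrefreplace")   # numeric XML reference

print(len(utf8_bytes), len(gbk_bytes), strict_failed)
print(ignored, replaced, xml_refs)
```

The ignore and replace handlers lose information, so the result can no longer be decoded back to the original text.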

3. Converting bytes to str

In [143]: help(bytes.decode)

Help on method_descriptor:

decode(self, /, encoding='utf-8', errors='strict')

Decode the bytes using the codec registered for encoding.

encoding

The encoding with which to decode the bytes.

errors

The error handling scheme to use for the handling of decoding errors.

The default is 'strict' meaning that decoding errors raise a

UnicodeDecodeError. Other possible values are 'ignore' and 'replace'

as well as any other name registered with codecs.register_error that

can handle UnicodeDecodeErrors.

In [139]: s.encode().decode() # decode() decodes the bytes

Out[139]: '马哥教育'

In [140]: s.encode().decode("gbk") # decoding only works correctly with the same encoding that was used to encode

---------------------------------------------------------------------------

UnicodeDecodeError Traceback (most recent call last)

in ()

----> 1 s.encode().decode("gbk")

UnicodeDecodeError: 'gbk' codec can't decode byte 0xac in position 2: illegal multibyte sequence
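A sketch of the decoding side, again with illustrative variable names: the matching codec round-trips, the wrong codec raises, and the errors parameter trades the exception for lossy output.

```python
data = "马哥教育".encode("utf-8")

same_codec = data.decode("utf-8")          # matching codec: original text

# Decoding utf-8 bytes as gbk fails, as in the transcript above.
try:
    data.decode("gbk")
    wrong_codec_error = None
except UnicodeDecodeError as exc:
    wrong_codec_error = exc.reason

lossy = data.decode("ascii", errors="replace")    # U+FFFD for each bad byte
dropped = data.decode("ascii", errors="ignore")   # bad bytes removed

print(same_codec)
print(wrong_codec_error)
print(len(lossy), repr(dropped))
```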

III. bytearray

1. bytearray

bytearray is the mutable version of bytes.

Both str and bytes are immutable.

In [197]: help(bytearray)

Help on class bytearray in module builtins:

class bytearray(object)

| bytearray(iterable_of_ints) -> bytearray

| bytearray(string, encoding[, errors]) -> bytearray

| bytearray(bytes_or_buffer) -> mutable copy of bytes_or_buffer

| bytearray(int) -> bytes array of size given by the parameter initialized with null bytes

| bytearray() -> empty bytes array

| Construct a mutable bytearray object from:

| - an iterable yielding integers in range(256)

| - a text string encoded using the specified encoding

| - a bytes or a buffer object

| - any object implementing the buffer API.

| - an integer

In [206]: bytearray(10)

Out[206]: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

In [207]: bytearray(b"10")

Out[207]: bytearray(b'10')

In [208]: bytearray(b"abc")

Out[208]: bytearray(b'abc')

In [209]: bytearray("abc")

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

in ()

----> 1 bytearray("abc")

TypeError: string argument without an encoding

In [210]: bytearray("abc", encoding)

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

in ()

----> 1 bytearray("abc", encoding)

NameError: name 'encoding' is not defined

In [211]: bytearray("abc", "utf")

Out[211]: bytearray(b'abc')

In [212]: bytearray("abc", "utf8")

Out[212]: bytearray(b'abc')

In [213]: bytearray("abc", "utf-8")

Out[213]: bytearray(b'abc')

In [215]: bytearray([1, 2])

Out[215]: bytearray(b'\x01\x02')

In [226]: b = bytearray(12) # bytearray is mutable

In [227]: b

Out[227]: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

In [228]: b[3]

Out[228]: 0

In [229]: b[3]= 5

In [230]: b

Out[230]: bytearray(b'\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00')

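Compared with bytes, bytearray adds the insert, append, extend, pop, remove, clear, and reverse methods, and it supports assignment by index.

The arguments of bytearray's insert, append, and remove must be int, because bytearray operates on bytes but Python has no byte type, so an int is used to represent a byte. The int must be in the range 0 to 255 (count also accepts a bytes-like object).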
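A short walk through the mutating methods listed above, as a sketch. The byte values are written as plain ints, with the matching ASCII character noted in each comment.

```python
buf = bytearray(b"abc")

buf.append(100)        # 100 is ord('d')        -> abcd
buf.insert(0, 95)      # 95 is ord('_')         -> _abcd
buf.extend(b"ef")      # bytes-like or ints     -> _abcdef
buf[1] = 65            # item assignment, 'A'   -> _Abcdef
last = buf.pop()       # removes and returns the last byte as an int
buf.remove(99)         # removes the first 99, ord('c')
buf.reverse()          # reverses in place

# Values outside 0 to 255 are rejected.
try:
    buf.append(256)
    out_of_range = False
except ValueError:
    out_of_range = True

print(buf, last, out_of_range)
```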
