Python Basics 008: String Methods, decode and encode, and the Difference Between str and unicode

Article link: https://blog.csdn.net/weixin_39576018/article/details/113511315

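This post covers the common string methods, how to use decode and encode, and when to use str versus unicode.

String methods

# Reversing a string: the slice [::-1] or the built-in reversed()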

In [83]: s = "hello world"

In [84]: s[::-1]

Out[84]: 'dlrow olleh'

In [87]: "".join(reversed(s))

Out[87]: 'dlrow olleh'

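# Indexing and slicing

s = "abcdefg"

print(s)--->prints the whole string

print(s[0])--->prints the first character, a

print(s[0:-1])--->prints abcdef, i.e. everything except the last character

print(s[2:4])--->prints cd; s[m:n] runs from index m to index n-1

print(s[2:])--->from index 2 to the end

print(s*2)--->prints the whole string twice

print(s[::-1])--->reverses the string

print(s[::2])--->takes every second character (step 2), giving aceg

# Converting between strings and integers is also common

e.g. num = "2"; b = int(num)-->b is 2, an integer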
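The slicing rules above can be checked with a small sketch. The variable names here (`text`, `number`, and so on) are illustrative and not from the original post; the name `str` is deliberately avoided so the built-in type stays usable.

```python
# Illustrative sketch of indexing, slicing and str/int conversion.
text = "abcdefg"

first = text[0]             # first character
all_but_last = text[0:-1]   # stops before the last character
middle = text[2:4]          # indices 2 and 3
tail = text[2:]             # index 2 to the end
doubled = text * 2          # repetition
reversed_text = text[::-1]  # step -1 walks backwards
every_other = text[::2]     # step 2 takes indices 0, 2, 4, 6

# String <-> integer conversion.
number = int("2")
back_to_text = str(number)

print(first, all_but_last, middle, tail, every_other, reversed_text)
```

Note that `text[0:-1]` is not a copy of the whole string: a negative stop index counts from the end, so the last character is left out.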

# Methods related to letter case

In [93]: x = "abcdefg"

In [94]: len(x) # length of the string

Out[94]: 7

In [95]: x.upper() # convert the string to upper case

Out[95]: 'ABCDEFG'

In [96]: x.lower() # convert the string to lower case

Out[96]: 'abcdefg'

In [97]: x1 = "heLLo wORld"

In [98]: x1.isupper() # True only if every cased character is upper case

Out[98]: False

In [99]: x1.islower() # True only if every cased character is lower case

Out[99]: False

In [100]: x1.swapcase() # swap case: upper becomes lower and lower becomes upper

Out[100]: 'HEllO WorLD'

In [101]: x.capitalize() # upper-case the first character and lower-case the rest

Out[101]: 'Abcdefg'
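One detail the session above does not show is that capitalize() also lower-cases everything after the first character. A minimal sketch, reusing the mixed-case string from the post (the other variable names are illustrative):

```python
# Case-related methods on a mixed-case string.
x1 = "heLLo wORld"

capitalized = x1.capitalize()  # first char upper, the rest lower
titled = x1.title()            # first letter of each word upper
swapped = x1.swapcase()        # each letter's case flipped

# Non-letters are ignored by isupper()/islower().
digits_ok = "ABC 123".isupper()

print(capitalized, "|", titled, "|", swapped, "|", digits_ok)
```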

# Predicates: startswith and endswith test whether a string begins or ends with a given substring

In [103]: x1.startswith("index")

Out[103]: False

In [104]: x1

Out[104]: 'heLLo wORld'

In [105]: x1.startswith("he")

Out[105]: True
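The session only exercises startswith; endswith works the same way, and both accept a tuple of candidates. A short sketch under that assumption (variable names are illustrative):

```python
# startswith / endswith are case-sensitive and accept a tuple of options.
x1 = "heLLo wORld"

starts_he = x1.startswith("he")
starts_He = x1.startswith("He")           # different case, so no match
ends_ld = x1.endswith("ld")
starts_any = x1.startswith(("index", "he"))  # True if any option matches

print(starts_he, starts_He, ends_ld, starts_any)
```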

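# Search methods

find --->returns the index of the first occurrence of a substring, or -1 if it is not found

index--->like find, it returns the index of the first occurrence, but it raises ValueError when the substring is not found

rfind -->like find, but it searches from the right and returns the index of the last occurrence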

In [106]: s = 'Return the lower index in S where substring sub is found'

In [107]: s.find("in")

Out[107]: 17

In [108]: s.find("hh")

Out[108]: -1

In [109]: s.rfind("is")

Out[109]: 48

In [110]: s.find("is",20) # start searching at the given index

Out[110]: 48

In [111]: s.index("the")

Out[111]: 7
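The difference between find and index only shows up on a miss. The sketch below uses the same sentence as the session; `safe_index` is a hypothetical helper introduced here for illustration, not a built-in.

```python
# find returns -1 on a miss; index raises ValueError instead.
s = 'Return the lower index in S where substring sub is found'


def safe_index(text, sub):
    """Hypothetical helper: like str.index, but returns None on a miss."""
    try:
        return text.index(sub)
    except ValueError:
        return None


first_in = s.find("in")    # first occurrence, inside "index"
last_in = s.rfind("in")    # last occurrence, inside "substring"
missing_find = s.find("hh")
missing_index = safe_index(s, "hh")

print(first_in, last_in, missing_find, missing_index)
```

Because -1 is also a valid index in Python (the last character), checking the result of find before using it as a subscript avoids a subtle bug; index makes the failure explicit instead.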

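# Stripping and splitting

strip--->removes the given characters from the start and end of the string only; characters in the middle are left alone

s = "0000000this is string 0000example....wow!!!0000000"

print(s.strip('0'))

Result: this is string 0000example....wow!!!

lstrip()--->removes whitespace or the given characters from the left end of the string

rstrip()--->removes whitespace or the given characters from the right end of the string

split--->splits a string on a separator and returns a list of the pieces

s = "abcdefg"

s.split("c")

Result: ['ab', 'defg']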
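A minimal sketch comparing the three strip variants and a few split cases. The sample strings other than the one from the post are made up for illustration.

```python
# strip family: the argument is a set of characters, not a prefix/suffix.
raw = "0000000this is string 0000example....wow!!!0000000"

both = raw.strip("0")    # both ends
left = raw.lstrip("0")   # left end only
right = raw.rstrip("0")  # right end only
no_space = "  padded  ".strip()        # no argument: strips whitespace
char_set = "xyhelloyx".strip("xy")     # any mix of 'x' and 'y' is removed

# split: with a separator, empty fields are kept; without one,
# runs of whitespace are treated as a single separator.
fields = "a,b,,c".split(",")
words = "a b  c".split()

print(both)
print(fields, words)
```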

# String formatting with format

1. Empty placeholders or positional indexes

In [114]: "{} is apple".format("apple")

Out[114]: 'apple is apple'

In [115]: "{0} is apple".format("apple")

Out[115]: 'apple is apple'

2. Keyword arguments

In [116]: dic = {"a":1,"b":2,"c":3}

In [118]: "{a} is 1, {b} is 2,{c} is 3,{a} little {c}".format(**dic)

Out[118]: '1 is 1, 2 is 2,3 is 3,1 little 3'

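3. Other format features

In [120]: "{:.2f}".format(3.1415926) # keep two decimal places

Out[120]: '3.14'

In [121]: "{:10.2f}".format(3.1415926) # total width 10, right-aligned and padded with spaces

Out[121]: '      3.14'

In [122]: "{:^10.2f}".format(3.1415926) # ^ centers the value within the width

Out[122]: '   3.14   '

In [124]: "{:_^10.2f}".format(3.1415926) # _ is used as the fill character instead of a space

Out[124]: '___3.14___'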
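The width in a format spec is the total field width, not the number of padding characters. A short sketch that makes the padding visible; the keyword names `name` and `count` are illustrative.

```python
# Format-spec pieces: [fill][align][width][.precision][type]
pi = 3.1415926

right = "{:10.2f}".format(pi)     # default alignment for numbers is right
left = "{:<10.2f}".format(pi)     # < aligns left
center = "{:^10.2f}".format(pi)   # ^ centers
filled = "{:_^10.2f}".format(pi)  # '_' replaces the space as fill

# Positional indexes can be reused; keyword fields are looked up by name.
positional = "{0} and {1}, again {0}".format("a", "b")
keyword = "{name} has {count} items".format(name="cart", count=3)

for value in (right, left, center, filled):
    print(repr(value), len(value))
print(positional)
print(keyword)
```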

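decode and encode

decode-->decoding; encode-->encoding

The first thing to understand is that Python represents text internally as unicode. To convert between encodings you therefore go through unicode as the intermediate form: first decode the string from its original encoding into unicode, then encode the unicode into the target encoding.

decode converts a string in some other encoding into unicode; for example, str1.decode("gb2312") converts the gb2312-encoded string str1 into unicode.

encode converts unicode into a string in some other encoding; for example, str2.encode("gb2312") converts the unicode string str2 into gb2312.

In short: to convert another encoding to utf-8, you must first decode it to unicode and then re-encode it as utf-8; unicode is the go-between.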

In [150]: ss = "中文"

In [151]: l = ss.decode("utf-8") # decode utf-8 bytes into unicode

In [152]: isinstance(l,unicode)

Out[152]: True

In [153]: l = l.encode("utf-8") # encode unicode back into utf-8

In [154]: isinstance(l,unicode)

Out[154]: False
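The session above is Python 2. As a rough equivalent for Python 3, where `str` is the unicode type and `bytes` plays the role of Python 2's `str`, the "decode to unicode, then encode to the target" route might look like the sketch below. The variable names are illustrative, and the gb2312 input is built in the script rather than read from a real file.

```python
# Python 3 sketch: transcode gb2312 bytes to utf-8 via unicode text.
original_text = "\u4e2d\u6587"  # the two characters used in the post

# Pretend these bytes arrived from a gb2312-encoded source.
gb_bytes = original_text.encode("gb2312")

# Step 1: decode the bytes into unicode text.
text = gb_bytes.decode("gb2312")

# Step 2: encode the unicode text into the target encoding.
utf8_bytes = text.encode("utf-8")

print(isinstance(text, str), isinstance(utf8_bytes, bytes))
print(len(gb_bytes), len(utf8_bytes))
```

There is no direct gb2312-to-utf-8 call on the bytes themselves; the unicode text in the middle is what makes the conversion well defined.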

In [155]: import sys

In [156]: print sys.getdefaultencoding() # the default encoding in Python 2 is ascii

ascii

# Changing the default encoding

In [167]: print sys.getdefaultencoding() # get the default encoding

ascii

In [168]: reload(sys)

In [169]: sys.setdefaultencoding("utf8") # change the default encoding

In [170]: print sys.getdefaultencoding()

utf8
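The reload(sys) approach is specific to Python 2. In Python 3 the default is already utf-8, and a common alternative is to name the encoding explicitly at each decode or encode call. A minimal sketch of that idea follows; `decode_or_none` is a hypothetical helper written for this example.

```python
import sys

# In Python 3 this reports 'utf-8'.
default_encoding = sys.getdefaultencoding()


def decode_or_none(data, encoding):
    """Hypothetical helper: decode bytes, or return None if the codec fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


gbk_bytes = "\u4e2d\u6587".encode("gbk")

# Decoding with the wrong codec fails; naming the right one works.
as_ascii = decode_or_none(gbk_bytes, "ascii")
as_gbk = decode_or_none(gbk_bytes, "gbk")

print(default_encoding, as_ascii, as_gbk == "\u4e2d\u6587")
```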

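Using str and unicode

Start with the basic flow: str->decode("coding")->unicode->encode("coding")->str

str and unicode are both subclasses of basestring.

The difference: a str is a byte string, made up of the bytes produced by encoding unicode.

A Chinese character counts as one character in a unicode object, takes 2 bytes in gbk, and typically takes 3 bytes in utf-8.

The default system encoding is gbk on Chinese-language Windows and usually utf-8 on Linux; the default encoding of a .py source file is ascii.

Because .py files default to ascii, a file that contains non-ascii characters needs an encoding declaration at the top:

# -*- coding: utf-8 -*- or #coding=utf-8

If the header declares coding=utf-8, then a = '中文' is encoded as utf-8.

If the header declares coding=gb2312, then a = '中文' is encoded as gb2312.

Rule of thumb

Do not call encode on a str, and do not call decode on a unicode.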

In [8]: a = "中文"

In [14]: u = a.decode("utf-8")

In [15]: type(u)

Out[15]: unicode

In [16]: u = u.encode("utf-8")

In [17]: type(u)

Out[17]: str

In [18]: len(u)

Out[18]: 6
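The length of 6 above is a byte count, since the utf-8 encoded value is a byte string. A Python 3 sketch of the same measurement, where `str` corresponds to Python 2's unicode and `bytes` to Python 2's str (variable names are illustrative):

```python
# Character count versus byte count for the same two-character text.
text = "\u4e2d\u6587"  # the two characters used in the post

char_count = len(text)                  # characters in the unicode text
gbk_len = len(text.encode("gbk"))       # 2 bytes per character here
utf8_len = len(text.encode("utf-8"))    # 3 bytes per character here

# Python 3 enforces the rule of thumb in the type system:
# text has no decode(), and bytes have no encode().
text_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(b"", "encode")

print(char_count, gbk_len, utf8_len, text_has_decode, bytes_has_encode)
```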