Introduction to Common Python Modules: the string Module

谁不小心的

Article link: https://blog.csdn.net/trochiluses/article/details/9232723

#!/usr/bin/python

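One thing up front: most functions in the string module have been obsolete since Python 2.x and are not recommended; use string methods instead. In Python 2 there are two string types, str and unicode. A str holds bytes in some encoding such as UTF-8 or ASCII, and the encoding of the source file can be declared on the second line of the script (right after the interpreter line).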

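1. Overview

1.1 Creating strings

1) Enclose the string in double quotes ("...") or single quotes ('...'). A quote character inside the string needs no backslash escape as long as it is the other kind of quote than the one delimiting the string, for example "it's" or 'say "hi"'.

2) To continue a literal across source lines, end the line with a backslash. The backslash and the line break are both dropped from the value:

big = "this is a long string\
that spans two lines."

3) To put a line break into the value, use \n:

big = "this is a long string\n\
that spans two lines."

A line break can also be written with triple quotes:

bigger = """
this is a long string
that spans two lines
happy
"""

4) To keep the string exactly as written and disable escape processing, use the r prefix:

big = r"this is a long string\n\
that spans two lines."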
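
To see what each of the four literal forms above actually stores, here is a small sketch. It uses Python 3 print() syntax, which is an assumption on my part since the rest of this article targets Python 2; the variable names are only for illustration.

```python
# Backslash at end of line: the literal continues, no line break in the value.
continued = "this is a long string\
that spans two lines."

# \n adds a real line break to the value.
with_newline = "this is a long string\n\
that spans two lines."

# Triple quotes keep the line breaks typed in the source.
triple = """
this is a long string
that spans two lines
happy
"""

# Raw string: backslashes stay in the value as ordinary characters.
raw = r"this is a long string\n\
that spans two lines."

# repr() makes the stored characters visible.
for value in (continued, with_newline, triple, raw):
    print(repr(value))
```

Note that the first form joins the two source lines with nothing in between, so "string" and "that" end up adjacent unless a space is typed before the backslash.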

The string module provides a number of functions for working with string types.

File: string-example-1.py

import string  # the module-level functions used below exist in Python 2 only
text = "Monty Python's Flying Circus"
print "upper", "=>", string.upper(text)  # case conversion
print "lower", "=>", string.lower(text)
print "split", "=>", string.split(text)  # splitting
print "join", "=>", string.join(string.split(text), "+")  # joining
print "replace", "=>", string.replace(text, "Python", "Java")  # replacing
print "find", "=>", string.find(text, "Python"), string.find(text, "Java")  # searching
print "count", "=>", string.count(text, "n")  # counting occurrences
upper => MONTY PYTHON'S FLYING CIRCUS
lower => monty python's flying circus
split => ['Monty', "Python's", 'Flying', 'Circus']
join => Monty+Python's+Flying+Circus
replace => Monty Java's Flying Circus
find => 6 -1
count => 3

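In Python 1.5.2 and earlier, string relied on functions from the strop module to implement its functionality. From Python 1.6 onward, most string operations are available as string methods, and many functions in the string module are just wrappers around the corresponding methods.

2. Using string methods instead of string module functions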

File: string-example-2.py

text = "Monty Python's Flying Circus"
print "upper", "=>", text.upper()
print "lower", "=>", text.lower()
print "split", "=>", text.split()
print "join", "=>", "+".join(text.split())
print "replace", "=>", text.replace("Python", "Perl")
print "find", "=>", text.find("Python"), text.find("Perl")
print "count", "=>", text.count("n")
upper => MONTY PYTHON'S FLYING CIRCUS
lower => monty python's flying circus
split => ['Monty', "Python's", 'Flying', 'Circus']
join => Monty+Python's+Flying+Circus
replace => Monty Perl's Flying Circus
find => 6 -1
count => 3

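Beyond what the string methods cover, the string module also contains conversion functions that turn strings into other types (see string-example-3.py below).

3. Using the string module to convert strings to numbers

To go the other way, from a number to a string, use the str() function:

str(123)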

File: string-example-3.py

import string  # string.atoi and string.atof exist in Python 2 only
print int("4711"),
print string.atoi("4711"),
print string.atoi("11147", 8),  # octal
print string.atoi("1267", 16),  # hexadecimal
print string.atoi("3mv", 36)  # whatever...
print string.atoi("4711", 0),
print string.atoi("04711", 0),
print string.atoi("0x4711", 0)
print float("4711"),
print string.atof("1"),
print string.atof("1.23e5")
4711 4711 4711 4711 4711
4711 2505 18193
4711.0 1.0 123000.0

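In most cases (especially on 1.6 and later) you can use int and float instead of the corresponding string module functions. atoi accepts an optional second argument that gives the number base. If the base is 0, the function looks at the first few characters of the string to decide which base to use: a "0x" prefix means base 16 (hexadecimal) and a "0" prefix means base 8 (octal). The default base is 10 (decimal), which is used when no argument is passed. In 1.6 and later, int accepts a second argument just like atoi. Unlike the string module versions, int and float also accept Unicode string objects.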
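
The same conversions can be sketched with the built-in int and float alone. This is written for Python 3, which is an assumption (the article's examples are Python 2); one difference to be aware of is that with base 0 Python 3 expects the "0o" prefix for octal and rejects a bare leading "0". The variable names are illustrative only.

```python
# Explicit bases, mirroring the atoi calls in string-example-3.py.
decimal = int("4711")
octal = int("11147", 8)
hexadecimal = int("1267", 16)
base36 = int("3mv", 36)

# Base 0: the prefix of the string selects the base.
auto_decimal = int("4711", 0)
auto_octal = int("0o4711", 0)   # Python 3 spelling of an octal literal
auto_hex = int("0x4711", 0)

# Python 3 refuses the old "04711" octal spelling when base is 0.
try:
    int("04711", 0)
    legacy_octal_ok = True
except ValueError:
    legacy_octal_ok = False

as_float = float("1.23e5")

print(decimal, octal, hexadecimal, base36)
print(auto_decimal, auto_octal, auto_hex, legacy_octal_ok)
print(as_float)
```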

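4. A closer look at some functions

4.1 Function prototype: strip

Notation: s is a string, rm is the set of characters to remove.

s.strip(rm) removes characters found in rm from both the start and the end of s

s.lstrip(rm) removes characters found in rm from the start of s

s.rstrip(rm) removes characters found in rm from the end of s

Notes:

4.1.1. When rm is omitted, whitespace is removed by default (including '\n', '\r', '\t', ' ').

For example:

>>> a = ' 123'
>>> a.strip()
'123'
>>> a = '\t\tabc'
>>> a.strip()
'abc'
>>> a = 'sdff\r\n'
>>> a.strip()
'sdff'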

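4.1.2. rm is treated as a set of characters, not as a prefix or suffix: a character at either edge (start or end) is removed as long as it appears in rm, and stripping stops at the first character that does not.

For example:

>>> a = '123abc'
>>> a.strip('21')
'3abc'
>>> a.strip('12')   # same result: the order of the characters in rm does not matter
'3abc'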
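
Because the argument is a set of characters, strip and its variants are easy to misuse for removing a fixed suffix. The sketch below (Python 3 syntax assumed; file name and variable names are made up for illustration) shows the behaviour and one way around it.

```python
# Every character in the argument is stripped from both ends, in any order.
host = "www.example.com".strip("cmowz.")

# Pitfall: rstrip(".txt") strips the characters '.', 't', 'x',
# so it also eats the final 't' of "report".
name = "report.txt"
bad = name.rstrip(".txt")

# Removing an exact suffix: check for it, then slice it off.
good = name[:-len(".txt")] if name.endswith(".txt") else name

# Without an argument only whitespace is removed, on one side or both.
left = "  padded  ".lstrip()
right = "  padded  ".rstrip()

print(host, bad, good, repr(left), repr(right))
```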

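4.2 The split method

The split method of Python strings is used quite often. Suppose we need to store a long piece of data in a structured way so that it can be read back and processed later. JSON is one option, but the data can also be stored in a single field, with some marker separating the parts, and split recovers the parts afterwards.

# the elements of the resulting list object are displayed separated by commas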
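
A minimal sketch of that idea follows. The record layout, the "|" marker and all variable names are hypothetical, chosen only for illustration, and the code assumes Python 3.

```python
# A hypothetical record stored in one field, with "|" as the marker.
record = "1001|alice|alice@example.com"
user_id, name, email = record.split("|")

# With no argument, split() breaks on any run of whitespace.
words = "Monty  Python's\tFlying Circus".split()

# A second argument limits the number of splits.
head, rest = "key=a=b".split("=", 1)

# With an explicit separator, empty fields are kept.
empty_fields = "a,,b".split(",")

print(user_id, name, email)
print(words)
print(head, rest)
print(empty_fields)
```

This only works if the marker can never occur inside a field; when that cannot be guaranteed, a format such as JSON is the safer choice.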

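4.3 The join method

numbers = [1, 2, 3, 4, 5, 6, 7]
','.join(str(i) for i in numbers)   # why does passing str(i) for i in numbers work?

join is a method of the string type: it joins the items of its argument using the string it is called on. In ','.join(...) the caller is ','. Everything in Python is an object, and ',' is a string object; calling its join method concatenates the values of the argument into a new string with commas in between. str(i) for i in numbers is a mapping (a generator expression) that converts every value in the list to a string, which is needed because join only accepts strings. Materialized as a list, [str(i) for i in numbers] gives ['1', '2', '3', '4', '5', '6', '7'].

Note: the module-level function string.join(list), when called without a second argument, uses a single space as the separator.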

sep.join(seq): uses the string sep as the separator and merges all elements of seq (which must be strings) into one new string

>>> li = ['my', 'name', 'is', 'bob']
>>> ' '.join(li)
'my name is bob'
>>> '_'.join(li)
'my_name_is_bob'
>>> '..'.join(li)
'my..name..is..bob'
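
The requirement that every element be a string is the part that trips people up most often. A small sketch, assuming Python 3 and using made-up variable names:

```python
numbers = [1, 2, 3]

# Convert each item first; join accepts any iterable of strings.
joined = ",".join(str(n) for n in numbers)

# The same conversion, materialized as a list.
as_list = [str(n) for n in numbers]

# Passing the integers directly fails.
try:
    ",".join(numbers)
    error = None
except TypeError as exc:
    error = type(exc).__name__

# split with the same separator undoes the join.
round_trip = joined.split(",")

print(joined, as_list, error, round_trip)
```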

4.4 capwords(): capitalizes the first letter of every word in a string
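
A short sketch of string.capwords, assuming Python 3 (where this function is still part of the string module). It is compared with str.title() because the two differ around apostrophes; the sample strings are only for illustration.

```python
import string

title = "monty python's flying circus"

# capwords splits on whitespace, capitalizes each word, and rejoins with single spaces.
capwords_result = string.capwords(title)

# title() starts a new "word" after every non-letter, including the apostrophe.
title_result = title.title()

# Extra whitespace is collapsed and the rest of each word is lowercased.
collapsed = string.capwords("  hello   wORLD ")

print(capwords_result)
print(title_result)
print(repr(collapsed))
```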

4.5 maketrans(): builds a translation table
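
In Python 2 the table is built with string.maketrans; the sketch below assumes Python 3, where the equivalent is the static method str.maketrans, and feeds the table to str.translate. The sample strings are made up for illustration.

```python
# Map each character of the first argument to the character
# at the same position in the second argument.
table = str.maketrans("abc", "xyz")
mapped = "aabbcc".translate(table)

# A third argument lists characters to delete.
delete_vowels = str.maketrans("", "", "aeiou")
stripped = "translation".translate(delete_vowels)

print(mapped, stripped)
```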

4.6 Encoding functions: decode() and encode()

S.encode([encoding[,errors]]) -> object

S.decode([encoding[,errors]]) -> object
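
The signatures above are the Python 2 ones, where both methods exist on str. As a rough sketch under the assumption of Python 3, where encode lives on str (text) and decode on bytes, the round trip and the errors argument look like this; the sample text is arbitrary.

```python
text = "中文"

# Text to bytes, and back again.
data = text.encode("utf-8")
back = data.decode("utf-8")

# errors="replace" substitutes "?" for characters the codec cannot represent.
ascii_replaced = text.encode("ascii", errors="replace")

# The default (errors="strict") raises instead.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(data, back, ascii_replaced, strict_failed)
```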

5. Templates

5.1 Template strings

Templates provide simpler string substitutions as described in PEP 292. Instead of the normal %-based substitutions, Templates support $-based substitutions, using the following rules:

$$ is an escape; it is replaced with a single $. (the delimiter)
$identifier names a substitution placeholder matching a mapping key of "identifier". By default, "identifier" must spell a Python identifier. The first non-identifier character after the $ character terminates this placeholder specification. (a named variable)
${identifier} is equivalent to $identifier. It is required when valid identifier characters follow the placeholder but are not part of the placeholder, such as "${noun}ification". (a named variable enclosed in braces)

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
...
ValueError: Invalid placeholder in string: line 1, col 11
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
...
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

In effect, it is a substitution mechanism.
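
The session above does not exercise the $$ escape or the braced form, so here is a brief sketch of both, assuming Python 3; the template texts are only examples.

```python
from string import Template

# Braces mark where the placeholder name ends.
braced = Template("${noun}ification").substitute(noun="class")

# $$ produces a literal dollar sign, so "$100" no longer breaks parsing.
escaped = Template("Give $who $$100").substitute(who="tim")

# safe_substitute leaves unknown placeholders untouched; a dict works as the mapping.
from_dict = Template("$who likes $what").safe_substitute({"who": "tim"})

print(braced)
print(escaped)
print(from_dict)
```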

5.2 Advanced templates

Advanced usage: you can derive subclasses of Template to customize the placeholder syntax, delimiter character, or the entire regular expression used to parse template strings. To do this, you can override these class attributes:

delimiter – This is the literal string describing a placeholder-introducing delimiter. The default value is $. Note that this should not be a regular expression, as the implementation will call re.escape() on this string as needed.
idpattern – This is the regular expression describing the pattern for non-braced placeholders (the braces will be added automatically as appropriate). The default value is the regular expression [_a-z][_a-z0-9]*.
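
As a minimal sketch of overriding one of these class attributes, the subclass below switches the delimiter from $ to %. The class name PercentTemplate is hypothetical, and Python 3 is assumed.

```python
from string import Template


class PercentTemplate(Template):
    """Template variant that uses % instead of $ (hypothetical example class)."""
    delimiter = "%"


# %who is a placeholder, %% is the escape, and $ is now an ordinary character.
result = PercentTemplate("%who pays $5, 100%% sure").substitute(who="tim")

print(result)
```

Changing only the delimiter keeps the default identifier rules; overriding idpattern in the same way would change which names count as placeholders.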