Python string and text operations

weixin_39722563


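1. Splitting a string into fields when the delimiters are not fixed (for example, a varying number of spaces)

In this case the plain split() method of string objects is not enough; use the more flexible re.split() instead.

>>> line = 'adaead jilil; sese, lsls,aea, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['adaead', 'jilil', 'sese', 'lsls', 'aea', 'foo']

Here \s matches any whitespace character (\S is its opposite) and * means zero or more repetitions. The pattern therefore matches a comma, a semicolon, or a whitespace character, followed by any amount of extra whitespace. The result is a list, the same return type as str.split().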

When the pattern passed to re.split() contains a capture group in parentheses, the matched text (that is, the delimiters) also appears in the result list:

>>> fields = re.split(r'(;|,|\s)\s*', line)
>>> fields
['adaead', ' ', 'jilil', ';', 'sese', ',', 'lsls', ',', 'aea', ',', 'foo']

Keeping the delimiters is useful in some cases, for instance to rebuild an output string:

>>> values = fields[::2]
>>> delimiters = fields[1::2] + ['']
>>> values
['adaead', 'jilil', 'sese', 'lsls', 'aea', 'foo']
>>> delimiters
[' ', ';', ',', ',', ',', '']
>>> line
'adaead jilil; sese, lsls,aea, foo'
>>> ''.join(v + d for v, d in zip(values, delimiters))
'adaead jilil;sese,lsls,aea,foo'

The values and the delimiters above are picked out of the list with slice steps.

If you need parentheses to group alternatives but do not want the delimiters in the result, use a non-capturing group, written (?:...):

>>> line
'adaead jilil; sese, lsls,aea, foo'
>>> re.split(r'(?:,|;|\s)\s*', line)
['adaead', 'jilil', 'sese', 'lsls', 'aea', 'foo']
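One practical detail the examples above do not show: when a line starts or ends with a delimiter, re.split() returns empty strings at the edges. The following is a minimal sketch of a wrapper that filters them out; the name split_fields and the sample input are assumptions made for illustration.

```python
import re

# A comma, semicolon, or whitespace character, followed by optional whitespace.
_DELIMS = re.compile(r'[;,\s]\s*')


def split_fields(line):
    """Split on mixed delimiters and drop any empty fields."""
    return [field for field in _DELIMS.split(line) if field]


# Leading spaces and a trailing ';\n' would otherwise produce '' entries.
print(split_fields('  adaead jilil; sese, lsls,aea, foo;\n'))
```

Note that the filter also drops empty fields in the middle of a line (as in 'a,,b'), which may or may not be what a particular data format needs.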

2. Matching text at the start or end of a string

The simple way is to use str.startswith() or str.endswith().

>>> import os
>>> files = os.listdir('./')
>>> if any(filename.endswith('.py') for filename in files):
...     print('That`s python file.')
... else:
...     print('There`s not python file exists.')
...
That`s python file.
>>> files
['tsTserv.py']

Both methods also accept a tuple of candidates:

>>> [filename for filename in files if filename.endswith(('.py', '.txt'))]
['tsTserv.py', 'locked_account.txt']

The following example shows that multiple choices must be passed as a tuple; a list raises an error:

>>> from urllib.request import urlopen
>>> def read_data(name):
...     if name.startswith(('http:', 'https:', 'ftp:')):
...         return urlopen(name).read()
...     else:
...         with open(name) as f:
...             return f.read()
...
>>> read_data('http://www.baidu.com')  # response bytes omitted
>>> choices = ['http:', 'ftp:']
>>> url = 'http://www.python.org'
>>> url.startswith(choices)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str or a tuple of str, not list
>>> url.startswith(tuple(choices))
True

Other ways to match the start or end of a string are slices and regular expressions:

>>> filename = 'helloworld.py'
>>> filename[-3:] == '.py'
True
>>> url = 'http://www.python.org'
>>> url[:5] == 'http:' or url[:6] == 'https:' or url[:4] == 'ftp:'
True
>>> import re
>>> url
'http://www.python.org'
>>> re.match('http:|https:|ftp:', url)
<_sre.SRE_Match object; span=(0, 5), match='http:'>
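Because startswith() and endswith() accept only a str or a tuple of str, code that keeps its prefixes in a list or set has to convert them first. A minimal sketch of a helper that does the conversion, where has_prefix is a hypothetical name and not part of the standard library:

```python
def has_prefix(text, prefixes):
    """Return True if text starts with any of the given prefixes."""
    # A single string is passed through unchanged.
    if isinstance(prefixes, str):
        return text.startswith(prefixes)
    # Lists, sets, and generators are converted to the tuple startswith() expects.
    return text.startswith(tuple(prefixes))


print(has_prefix('http://www.python.org', ['http:', 'ftp:']))
print(has_prefix('notes.txt', {'http:', 'https:'}))
```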

3. Matching and searching for text

For simple cases use str.find(), str.startswith(), or str.endswith(); for more complicated matching use the re module.

>>> text = 'yes no aggree not aggree'
>>> text1 = '2016-01-31'
>>> text2 = 'jan 31,2016'
>>> text == 'yes'
False
>>> text.find('no')
4
>>> if re.match(r'\d+-\d+-\d+', text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> if re.match(r'\d+-\d+-\d+', text2):
...     print('yes')
... else:
...     print('no')
...
no
>>> # to match the same pattern many times, compile it first
>>> datepatt = re.compile(r'\d+-\d+-\d+')
>>> if datepatt.match(text1):
...     print('yes')
... else:
...     print('no')
...
yes
>>> datepatt.match(text2)
>>> print(datepatt.match(text2))
None
>>> print(datepatt.match(text1))
<_sre.SRE_Match object; span=(0, 10), match='2016-01-31'>

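match() always tries to match at the start of the string and returns as soon as it finds a match; findall() returns every match found anywhere in the text.

Patterns are commonly defined with capture groups, for example:

datepat = re.compile(r'(\d+)-(\d+)-(\d+)')

Capture groups simplify later processing because the content of each group can be extracted separately.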

>>> datepat = re.compile(r'(\d+)-(\d+)-(\d+)')
>>> m = datepat.match('2016-01-28')
>>> m
<_sre.SRE_Match object; span=(0, 10), match='2016-01-28'>
>>> m.group(0)
'2016-01-28'
>>> m.group(1)
'2016'
>>> m.group(2)
'01'
>>> m.group(3)
'28'
>>> m.groups()
('2016', '01', '28')
>>> year, month, day = m.groups()
>>> print(year, month, day)
2016 01 28
>>> text = 'today is 2016-01-28. lesson start 2016-01-01'
>>> datepat.findall(text)
[('2016', '01', '28'), ('2016', '01', '01')]
>>> for year, month, day in datepat.findall(text):
...     print('{}-{}-{}'.format(year, month, day))
...
2016-01-28
2016-01-01

findall() searches the text and returns all matches as a list. To get the matches one at a time instead, use finditer():

>>> datepat.findall(text)
[('2016', '01', '28'), ('2016', '01', '01')]
>>> for m in datepat.finditer(text):
...     print(m.groups())
...
('2016', '01', '28')
('2016', '01', '01')
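Numbered groups become hard to read as patterns grow. The sketch below shows the same date pattern with named groups, combined with finditer() so that each match also reports where it was found. The group names and the helper name find_dates are choices made for this illustration.

```python
import re

# Same date pattern as above, with names instead of group numbers.
datepat = re.compile(r'(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)')


def find_dates(text):
    """Return (start_index, fields) for every date found in text."""
    return [(m.start(), m.groupdict()) for m in datepat.finditer(text)]


text = 'today is 2016-01-28. lesson start 2016-01-01'
for start, fields in find_dates(text):
    print(start, fields['year'], fields['month'], fields['day'])
```

Match objects from finditer() carry position information (start(), end(), span()) that the plain tuples returned by findall() do not.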

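4. Searching and replacing text

To find a pattern in a string and replace it, use str.replace() for simple cases and re.sub() for more complicated ones.

>>> text
'today is 2016-01-28. lesson start 2016-01-01'
>>> re.sub(r'(\d+)-(\d+)-(\d+)', r'\2/\3/\1', text)
'today is 01/28/2016. lesson start 01/01/2016'

The first argument to sub() is the pattern to match and the second is the replacement pattern. A backslash followed by a digit, such as \3, refers to the capture group with that number in the pattern. If the same substitution will be performed many times, compile the pattern first for better performance.

For more complicated substitutions, pass a callback function instead of a replacement string. The callback receives a match object, the same kind of object returned by match() or search(). To find out how many substitutions were made, use re.subn():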

>>> datepat = re.compile(r'(\d+)-(\d+)-(\d+)')
>>> m = datepat.match('2016-01-28')
>>> m
<_sre.SRE_Match object; span=(0, 10), match='2016-01-28'>
>>> m.group(0)
'2016-01-28'
>>> m.groups()
('2016', '01', '28')
>>> text = 'today is 2016-01-28. lesson start 2016-01-01'
>>> def change_date(m):
...     mon_name = month_abbr[int(m.group(2))]
...     return '{} {} {}'.format(m.group(3), mon_name, m.group(1))
...
>>> from calendar import month_abbr
>>> datepat.sub(change_date, text)
'today is 28 Jan 2016. lesson start 01 Jan 2016'
>>> # get the number of substitutions made
>>> newtext, n = datepat.subn(r'\3/\2/\1', text)
>>> newtext
'today is 28/01/2016. lesson start 01/01/2016'
>>> n
2

Case-insensitive search and replace

>>> import re
>>> text4 = 'PYTHON, pYTHON,Python python'
>>> re.findall('python', text4, flags=re.IGNORECASE)
['PYTHON', 'pYTHON', 'Python', 'python']
>>> re.sub('python', 'snake', text4, flags=re.IGNORECASE)
'snake, snake,snake snake'
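In the substitution above every replacement is lowercase, whatever the case of the matched text. One way to carry the case over is to combine re.IGNORECASE with a callback passed to re.sub(). The following is a sketch under that assumption; matchcase is a helper defined here, not a function of the re module.

```python
import re


def matchcase(word):
    """Build a re.sub() callback that copies the case of each match onto word."""
    def replace(m):
        text = m.group()
        if text.isupper():
            return word.upper()
        if text.islower():
            return word.lower()
        if text[0].isupper():
            return word.capitalize()
        # Mixed case with a lowercase first letter: use the word as given.
        return word
    return replace


text4 = 'PYTHON, pYTHON,Python python'
result = re.sub('python', matchcase('snake'), text4, flags=re.IGNORECASE)
print(result)
```

The callback only distinguishes three simple cases (all upper, all lower, capitalized); anything else falls back to the replacement word unchanged.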

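Shortest-match (non-greedy) patterns

Suppose you want to match the text between double quotes. The result may not be what you expect, because * is greedy:

>>> text1 = 'you says "no."'
>>> str_pat = re.compile(r'\"(.*)\"')
>>> str_pat.findall(text1)
['no.']
>>> text2 = 'you says "no.", I say "yes."'
>>> str_pat.findall(text2)
['no.", I say "yes.']

Adding the ? modifier makes the pattern match the shortest possible text:

>>> str_pat = re.compile(r'\"(.*?)\"')
>>> str_pat.findall(text2)
['no.', 'yes.']

The dot matches any single character except a newline. Adding ? after an operator such as * or + forces the matching algorithm to look for the shortest possible match.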
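For delimited text such as quoted strings, a negated character class is a common alternative to the ? modifier: instead of "any character, as few as possible", the pattern says "any character that is not a quote". A short comparison of the three forms, using sample text chosen for this sketch:

```python
import re

text = 'you says "no.", I say "yes."'

greedy = re.findall(r'"(.*)"', text)       # runs to the last quote
lazy = re.findall(r'"(.*?)"', text)        # stops at the nearest quote
negated = re.findall(r'"([^"]*)"', text)   # cannot cross a quote at all

print(greedy)
print(lazy)
print(negated)
```

Unlike the dot, the negated class also matches newlines, so it keeps working when the quoted text spans several lines.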

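Multiline patterns

The dot does not match a newline. The following shows how to work around that:

>>> text1 = '/* this is a comment */'
>>> text2 = '''/* this is a
... multiline comment */
... '''
>>> comment = re.compile(r'/\*(.*?)\*/')
>>> comment.findall(text1)
[' this is a comment ']
>>> comment.findall(text2)
[]
>>> # add support for newlines
>>> comment = re.compile(r'/\*((?:.|\n)*?)\*/')
>>> comment.findall(text2)
[' this is a\nmultiline comment ']

Here (.*?) is a non-greedy capture of whatever lies between /* and */, and (?:.|\n) is a non-capturing group that matches any character, newlines included.

Alternatively, use re.DOTALL, which makes the dot in a regular expression match any character, including a newline:

>>> comment = re.compile(r'/\*(.*?)\*/', re.DOTALL)
>>> comment.findall(text2)
[' this is a\nmultiline comment ']

For complicated patterns, however, it is usually better to write the pattern so that it works correctly without extra flags.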
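The same idea can be used for substitution as well as searching. A minimal sketch that removes C-style comments from a piece of source text with a compiled re.DOTALL pattern; strip_comments and the sample input are assumptions made for this example, and the approach ignores comment markers inside string literals.

```python
import re

# Non-greedy match from '/*' to the nearest '*/'; DOTALL lets '.' cross newlines.
COMMENT = re.compile(r'/\*.*?\*/', re.DOTALL)


def strip_comments(source):
    """Remove every /* ... */ comment, including multiline ones."""
    return COMMENT.sub('', source)


code = 'int x; /* first\n comment */ int y; /* second */'
print(repr(strip_comments(code)))
```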

Normalizing Unicode text

Use the unicodedata module to normalize text before comparing it. The first argument to normalize() specifies the normalization form. NFC means characters should be fully composed (using a single code point where possible); NFD means characters should be fully decomposed into a base character plus combining characters.

>>> s1 = 'Spicy Jalape\u00f1o'
>>> s2 = 'Spicy Jalapen\u0303o'
>>> s1
'Spicy Jalapeño'
>>> s2
'Spicy Jalapeño'
>>> s1 == s2
False
>>> len(s1)
14
>>> len(s2)
15
>>> import unicodedata
>>> t1 = unicodedata.normalize('NFC', s1)
>>> t2 = unicodedata.normalize('NFC', s2)
>>> t1 == t2
True
>>> print(ascii(t1))
'Spicy Jalape\xf1o'
>>> print(ascii(t2))
'Spicy Jalape\xf1o'
>>> t1
'Spicy Jalapeño'
>>> t1 = unicodedata.normalize('NFD', s1)
>>> t1
'Spicy Jalapeño'
>>> ''.join(c for c in t1 if not unicodedata.combining(c))
'Spicy Jalapeno'
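The two operations shown in the transcript, comparing after normalization and removing combining marks, are easy to wrap in small functions. The names equal_normalized and strip_accents below are hypothetical helpers for this sketch.

```python
import unicodedata


def equal_normalized(a, b, form='NFC'):
    """Compare two strings after bringing both into the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def strip_accents(text):
    """Decompose the text (NFD) and drop every combining character."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


s1 = 'Spicy Jalape\u00f1o'    # precomposed n-with-tilde
s2 = 'Spicy Jalapen\u0303o'   # 'n' followed by a combining tilde
print(s1 == s2, equal_normalized(s1, s2))
print(strip_accents(s1))
```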

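Removing unwanted characters from a string

Characters such as whitespace can be removed from the beginning, the end, or the middle of a string. strip() removes characters from the beginning and end and never touches characters in the middle. lstrip() and rstrip() strip from the left and the right respectively. By default these methods remove whitespace, but other characters can be specified.

To remove characters in the middle, use replace() or re.sub():

>>> t = '---------hello========'
>>> t.lstrip('-')
'hello========'
>>> t.rstrip('=')
'---------hello'
>>> t.strip('-=')
'hello'
>>> s = '   Hello World   \n'
>>> s.strip()
'Hello World'
>>> s.lstrip()
'Hello World   \n'
>>> s.rstrip()
'   Hello World'
>>> # replacing characters in the middle
>>> s.replace(' ', '')
'HelloWorld\n'
>>> import re
>>> re.sub(r'\s+', ' ', s)
' Hello World '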

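Sanitizing and cleaning up text

Users sometimes enter text containing diacritics when they register, for example 'pýtĥöñ\fis\tawesome\r\n'. The str.translate() method can be used to clean such text up.

In the example below, a small table first remaps the whitespace characters. Then dict.fromkeys() builds a dictionary whose keys are all the Unicode combining characters and whose values are all None. The input is normalized into decomposed form with unicodedata.normalize(), and translate() then deletes every combining mark.

>>> remap = {
...     ord('\t'): ' ',
...     ord('\f'): ' ',
...     ord('\r'): None
... }
>>> s = 'pýtĥöñ\fis\tawesome\r\n'
>>> a = s.translate(remap)
>>> a
'pýtĥöñ is awesome\n'
>>> # \t and \f have been mapped to spaces and \r has been deleted
>>> import unicodedata
>>> import sys
>>> cmb_chrs = dict.fromkeys(c for c in range(sys.maxunicode)
...                          if unicodedata.combining(chr(c)))
>>> b = unicodedata.normalize('NFD', a)
>>> b
'pýtĥöñ is awesome\n'
>>> b.translate(cmb_chrs)
'python is awesome\n'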

The next example builds a translation table that maps every Unicode decimal digit character to its ASCII equivalent:

>>> import sys
>>> import unicodedata
>>> digitmap = {c: ord('0') + unicodedata.digit(chr(c))
...             for c in range(sys.maxunicode)
...             if unicodedata.category(chr(c)) == 'Nd'}
>>> len(digitmap)
460
>>> x = '\u0661\u0662\u0663'
>>> x.translate(digitmap)
'123'

Another technique for cleaning up text involves the I/O encoding and decoding functions. The idea is to do some preliminary clean-up of the text first and then run it through encode() and decode() to strip or alter it.

>>> a = 'pýtĥöñ is awesome\n'
>>> b = unicodedata.normalize('NFD', a)
>>> b
'pýtĥöñ is awesome\n'
>>> b.encode('ascii', 'ignore').decode('ascii')
'python is awesome\n'

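def clean_spaces(s):
    # for simple clean-ups, str.replace() calls can also be chained
    s = s.replace('\r', '')
    s = s.replace('\t', ' ')
    s = s.replace('\f', ' ')
    return s

In the encode()/decode() example, the normalization step decomposes the original text into base characters plus separate combining marks, and the ASCII encode/decode round trip then simply discards those marks in one go. This approach only makes sense when the end goal is an ASCII representation of the text.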

Aligning strings

For simple cases use the string methods ljust(), rjust(), and center(), or use format() with one of the characters <, >, or ^ followed by the desired width.

>>> text = 'Hello World'
>>> text.ljust(20)
'Hello World         '
>>> text.rjust(20)
'         Hello World'
>>> text.center(20)
'    Hello World     '
>>> text.rjust(20, '=')
'=========Hello World'
>>> text.center(20, '-')
'----Hello World-----'
>>> format(text, '>20')
'         Hello World'
>>> format(text, '<20')
'Hello World         '
>>> format(text, '^20')
'    Hello World     '
>>> # to use a fill character, put it in front of the alignment character
>>> format(text, '-^20s')
'----Hello World-----'
>>> format(text, '-^20')
'----Hello World-----'
>>> # the same format codes work in the format() method when formatting several values
>>> '{:>10s} {:>10s}'.format('Hello', 'World')
'     Hello      World'
>>> # the format() function works with any value
>>> x = 1.2534
>>> format(x, '>10')
'    1.2534'
>>> format(x, '^10.2f')
'   1.25   '
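A typical use of these format codes is lining up columns in plain-text output. The sketch below formats a few rows with fixed widths; the column widths, the sample rows, and the name format_row are all made up for the example.

```python
def format_row(name, qty, price):
    """Left-align the name in 10 columns, right-align qty in 5 and price in 8."""
    return '{:<10}{:>5}{:>8.2f}'.format(name, qty, price)


header = '{:<10}{:>5}{:>8}'.format('item', 'qty', 'price')
rows = [('apple', 3, 1.5), ('kiwi', 12, 0.25)]

print(header)
print('-' * len(header))
for row in rows:
    print(format_row(*row))
```

Because every field has a fixed width, each line has the same length as long as no value is wider than its column; wider values are not truncated and will push the remaining columns to the right.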

Concatenating strings

>>> a = 'beijing'
>>> b = 'is'
>>> c = 'center'
>>> print(a + ':' + b + ':' + c)
beijing:is:center
>>> print(':'.join([a, b, c]))
beijing:is:center
>>> print(a, b, c, sep=':')  # best
beijing:is:center

Interpolating variables in strings

>>> s = '{name} has {n} messages.'
>>> s.format(name='QHS', n=200)
'QHS has 200 messages.'
>>> # if the variables are in scope, combine format_map() with vars()
>>> name = 'QHS'
>>> n = 200
>>> s.format_map(vars())
'QHS has 200 messages.'
>>> # vars() also works with instances
>>> class Info:
...     def __init__(self, name, n):
...         self.name = name
...         self.n = n
...
>>> a = Info('QHS', 200)
>>> s.format_map(vars(a))
'QHS has 200 messages.'
>>> # format() and format_map() raise an error when a variable is missing
>>> s.format(name='QHS')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'n'

One way to avoid this error is to define a dictionary subclass with a __missing__() method:

>>> class safesub(dict):
...     """Leave the placeholder in place when a key is not found."""
...     def __missing__(self, key):
...         return '{' + key + '}'
...
>>> del n
>>> s
'{name} has {n} messages.'
>>> s.format_map(safesub(vars()))
'QHS has {n} messages.'
>>> # the substitution step can be wrapped in a small utility function
>>> import sys
>>> def sub(text):
...     return text.format_map(safesub(sys._getframe(1).f_locals))
...
>>> # which allows code like this
>>> name = 'qhs'
>>> n = 200
>>> print(sub('Hello {name}'))
Hello qhs
>>> print(sub('You have {n} messages.'))
You have 200 messages.
>>> print(sub('Your favorite color is {color}'))
Your favorite color is {color}

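The sub() function uses sys._getframe(1) to get the caller's stack frame, whose f_locals attribute gives access to the caller's local variables. In most code, manipulating stack frames directly is discouraged, but for a utility such as this string substitution function it is very useful. Note also that, in the Python 3.4 used for these examples, f_locals is a dictionary holding a copy of the calling function's local variables. You can change its contents, but the changes have no effect on later variable access. So although reaching into a stack frame may look evil, nothing done to it overwrites or changes the values of the caller's local variables.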
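When inspecting stack frames is not acceptable, the standard library's string.Template offers a similar "leave unknown placeholders alone" behaviour through safe_substitute(). The sketch below is an alternative technique, not the format_map() approach described above, and it uses $name placeholders rather than {name}.

```python
from string import Template

template = Template('$name has $n messages.')

# substitute() raises KeyError when a placeholder has no value.
complete = template.substitute(name='QHS', n=200)

# safe_substitute() leaves placeholders without a value untouched.
partial = template.safe_substitute(name='QHS')

print(complete)
print(partial)
```

The values are passed explicitly here, so nothing depends on which variables happen to be in the caller's scope.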

Reformatting text to a fixed number of columns

Long strings sometimes need to be reformatted to fit a given column width. Use the textwrap module to format the output.

>>> s = "Look into my eyes, look into my eyes, the eyes, the eyes, \
... the eyes, not around the eyes, don't look around the eyes, \
... look into my eyes, you're under."
>>> import textwrap
>>> print(textwrap.fill(s, 50))
Look into my eyes, look into my eyes, the eyes,
the eyes, the eyes, not around the eyes, don't
look around the eyes, look into my eyes, you're
under.
>>> print(textwrap.fill(s, 70))
Look into my eyes, look into my eyes, the eyes, the eyes, the eyes,
not around the eyes, don't look around the eyes, look into my eyes,
you're under.
>>> print(textwrap.fill(s, 40, initial_indent='    '))
    Look into my eyes, look into my
eyes, the eyes, the eyes, the eyes, not
around the eyes, don't look around the
eyes, look into my eyes, you're under.
>>> print(textwrap.fill(s, 40, subsequent_indent='    '))
Look into my eyes, look into my eyes,
    the eyes, the eyes, the eyes, not
    around the eyes, don't look around
    the eyes, look into my eyes, you're
    under.
>>> # to fit the terminal automatically, get its size with os.get_terminal_size()
>>> import os
>>> os.get_terminal_size().columns
196

fill() accepts further optional arguments that control the handling of tabs, sentence endings, and so on.
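os.get_terminal_size() raises an error when there is no terminal attached, for example when output is piped to a file. A minimal sketch that uses shutil.get_terminal_size(), which takes a fallback size for that situation; wrap_for_terminal and its width parameter are assumptions made for this example.

```python
import shutil
import textwrap


def wrap_for_terminal(text, width=None):
    """Wrap text to the given width, or to the terminal width if none is given."""
    if width is None:
        # The fallback is used when the terminal size cannot be determined.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    return textwrap.fill(text, width=width)


s = ("Look into my eyes, look into my eyes, the eyes, the eyes, "
     "the eyes, not around the eyes, don't look around the eyes, "
     "look into my eyes, you're under.")
print(wrap_for_terminal(s, width=40))
```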

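Handling HTML and XML in text

For example, you may want to replace HTML or XML entities such as &entity; or &#code; with their corresponding text, or escape certain characters in text (such as <, >, or &).

>>> # escape the special characters < and > in a string with html.escape()
>>> s = 'Elements are written as "<tag>text</tag>".'
>>> import html
>>> print(s)
Elements are written as "<tag>text</tag>".
>>> print(html.escape(s))
Elements are written as &quot;&lt;tag&gt;text&lt;/tag&gt;&quot;.
>>> # disable escaping of quotes
>>> print(html.escape(s, quote=False))
Elements are written as "&lt;tag&gt;text&lt;/tag&gt;".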

If you are producing ASCII text and want non-ASCII characters to be embedded as character reference entities, pass errors='xmlcharrefreplace' to the relevant I/O functions.

>>> s = 'Spicy Jalapeño'
>>> s.encode('ascii', errors='xmlcharrefreplace')
b'Spicy Jalape&#241;o'
>>> # to turn entities back into text, use an HTML or XML utility;
>>> # html.unescape() replaces HTMLParser().unescape(), which was removed in Python 3.9
>>> import html
>>> s = 'Spicy Jalape&#241;o'
>>> html.unescape(s)
'Spicy Jalapeño'
>>> t = 'The prompt is &gt;&gt;&gt;'
>>> from xml.sax.saxutils import unescape
>>> unescape(t)
'The prompt is >>>'
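Escaping and character references can be combined to produce pure-ASCII markup that decodes back to the original text. The following is a small sketch of that round trip; to_ascii_html is a hypothetical helper name.

```python
import html


def to_ascii_html(text):
    """Escape <, >, and &, then replace non-ASCII characters with &#NNN; references."""
    # Escape first, so the '&' of the generated references is not escaped again.
    escaped = html.escape(text, quote=False)
    return escaped.encode('ascii', errors='xmlcharrefreplace').decode('ascii')


original = 'Jalapeño <hot> & spicy'
encoded = to_ascii_html(original)
decoded = html.unescape(encoded)

print(encoded)
print(decoded == original)
```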

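Tokenizing text

Suppose you want to parse a string from left to right into a stream of tokens.

Take the following text string:

text = 'foo = 23 + 42 * 10'

To tokenize the string, you need to do more than match patterns; you also need a way to identify the kind of pattern. For instance, you might want to turn the string into a sequence of pairs like this:

tokens = [('NAME', 'foo'), ('EQ', '='), ('NUM', '23'), ('PLUS', '+'), ('NUM', '42'), ('TIMES', '*'), ('NUM', '10')]

To do this kind of splitting, the first step is to define all of the possible tokens, including whitespace, as regular expressions with named capture groups: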

>>> import re
>>> NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
>>> NUM = r'(?P<NUM>\d+)'
>>> PLUS = r'(?P<PLUS>\+)'
>>> TIMES = r'(?P<TIMES>\*)'
>>> EQ = r'(?P<EQ>=)'
>>> WS = r'(?P<WS>\s+)'
>>> master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))
>>> master_pat
re.compile('(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)|(?P<NUM>\\d+)|(?P<PLUS>\\+)|(?P<TIMES>\\*)|(?P<EQ>=)|(?P<WS>\\s+)')
>>> # the ?P<NAME> syntax gives a pattern a name for later use

Next, to tokenize, use the scanner() method of the pattern object. It creates a scanner object on which repeated calls to match() step through the target text one match at a time. The following interactive session shows how a scanner object works:

>>> scanner = master_pat.scanner('foo = 42')
>>> scanner.match()
<_sre.SRE_Match object; span=(0, 3), match='foo'>
>>> _.lastgroup, _.group()
('NAME', 'foo')
>>> scanner.match()
<_sre.SRE_Match object; span=(3, 4), match=' '>
>>> _.lastgroup, _.group()
('WS', ' ')
>>> scanner.match()
<_sre.SRE_Match object; span=(4, 5), match='='>
>>> _.lastgroup, _.group()
('EQ', '=')
>>> scanner.match()
<_sre.SRE_Match object; span=(5, 6), match=' '>
>>> _.lastgroup, _.group()
('WS', ' ')
>>> scanner.match()
<_sre.SRE_Match object; span=(6, 8), match='42'>
>>> _.lastgroup, _.group()
('NUM', '42')

In practice, this technique is easy to package into a generator. The generator returns named tuples, so here is a brief look at how those work first:

>>> # named tuples
>>> import collections
>>> Person = collections.namedtuple('Person', 'name age gender')
>>> print('Type of Person:', type(Person))
Type of Person: <class 'type'>
>>> Bob = Person(name='Bob', age=30, gender='male')
>>> print('Representation:', Bob)
Representation: Person(name='Bob', age=30, gender='male')
>>> print(Bob.name)
Bob
>>> print(Bob.name, Bob.age, Bob.gender)
Bob 30 male
>>> print("{} is {} years old {}".format(Bob))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>> print("%s is %d years old %s." % Bob)
Bob is 30 years old male.

The following generator produces the tokens:

>>> from collections import namedtuple
>>> def generate_tokens(pat, text):
...     Token = namedtuple('Token', ['type', 'value'])
...     scanner = pat.scanner(text)
...     for m in iter(scanner.match, None):
...         yield Token(m.lastgroup, m.group())
...
>>> import re
>>> NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
>>> NUM = r'(?P<NUM>\d+)'
>>> PLUS = r'(?P<PLUS>\+)'
>>> TIMES = r'(?P<TIMES>\*)'
>>> EQ = r'(?P<EQ>=)'
>>> WS = r'(?P<WS>\s+)'
>>> master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))
>>> for tok in generate_tokens(master_pat, 'foo = 42'):
...     print(tok)
...
Token(type='NAME', value='foo')
Token(type='WS', value=' ')
Token(type='EQ', value='=')
Token(type='WS', value=' ')
Token(type='NUM', value='42')

If one pattern happens to be a substring of a longer pattern, make sure the longer pattern comes first. For example:

>>> LT = r'(?P<LT><)'
>>> LE = r'(?P<LE><=)'
>>> EQ = r'(?P<EQ>=)'
>>> master_pat = re.compile('|'.join([LE, LT, EQ]))    # correct
>>> # master_pat = re.compile('|'.join([LT, LE, EQ]))  # wrong
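Putting the pieces together, the sketch below is one possible tokenizer that skips whitespace, lists <= before <, and reports characters that no pattern matches. It uses the documented pattern.match(text, pos) call instead of scanner(); the names TOKEN_SPEC, MASTER, and tokenize are choices made for this example.

```python
import re
from collections import namedtuple

Token = namedtuple('Token', ['type', 'value'])

# Order matters: the longer '<=' must be tried before '<'.
TOKEN_SPEC = [
    ('NAME', r'[a-zA-Z_][a-zA-Z_0-9]*'),
    ('NUM', r'\d+'),
    ('LE', r'<='),
    ('LT', r'<'),
    ('EQ', r'='),
    ('PLUS', r'\+'),
    ('TIMES', r'\*'),
    ('WS', r'\s+'),
]
MASTER = re.compile('|'.join('(?P<{}>{})'.format(name, pat)
                             for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield Token tuples for text, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError('unexpected character {!r} at position {}'
                              .format(text[pos], pos))
        pos = m.end()
        if m.lastgroup != 'WS':
            yield Token(m.lastgroup, m.group())


for tok in tokenize('foo <= 23 + 42 * 10'):
    print(tok)
```

Unlike the scanner-based generator, which simply stops at the first character it cannot match, this version raises an error so that invalid input does not pass silently.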

String operations on byte strings

Common text operations such as stripping, searching, and replacing can be performed on byte strings, which support most of the same built-in operations as text strings:

>>> data = b'Hello World'
>>> data[0:5]
b'Hello'
>>> data.split()
[b'Hello', b'World']
>>> data.replace(b'Hello', b'Hello bad')
b'Hello bad World'
>>> # the same applies to byte arrays
>>> data = bytearray(b'Hello World')
>>> data[:5]
bytearray(b'Hello')
>>> data.split()
[bytearray(b'Hello'), bytearray(b'World')]
>>> import re
>>> data = b'qi:heng:shan'
>>> re.split('[:]', data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python3.4/lib/python3.4/re.py", line 200, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: can't use a string pattern on a bytes-like object
>>> re.split(b'[:]', data)
[b'qi', b'heng', b'shan']

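One difference: indexing a text string returns the corresponding character, whereas indexing a byte string returns an integer:

>>> a = 'Hello World'
>>> b = b'Hello World'
>>> a[0]
'H'
>>> b[0]
72
>>> print(b)
b'Hello World'
>>> print(b.decode('ascii'))
Hello World

A byte string must be decoded into a text string before it prints cleanly. Byte strings also have no format() method. To format a byte string, format an ordinary text string first and then encode it as bytes:

>>> '{:10s} {:10s} {:>10s}'.format('python', 'is', 'good').encode('ascii')
b'python     is               good'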
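The examples here were run on Python 3.4. On Python 3.5 and later, bytes objects additionally support %-style formatting, which is assumed in the second half of this sketch. Both routes are shown side by side; format_record is a hypothetical helper name.

```python
def format_record(name, count):
    """Format a text name and an integer count, then encode the result as ASCII."""
    return '{:<10s}|{:>5d}'.format(name, count).encode('ascii')


# Route 1: format as text, then encode (works on any Python 3 version).
via_str = format_record('python', 42)

# Route 2: %-formatting directly on bytes (assumes Python 3.5 or later).
via_percent = b'%-10s|%5d' % (b'python', 42)

print(via_str)
print(via_percent)
```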
