Python Text Processing

Latest recommended article published on 2024-07-31 14:30:10

码上的生活

Tags: python, string

Article link: https://blog.csdn.net/sherlockzoom/article/details/48946769

Basic text operations:

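Parsing data and putting it into the program's internal structures
Transforming data into another, similar form, in which the data itself changes
Generating entirely new data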

In Python, a text string literal can be written with either single quotes ('') or double quotes (""):

'this is a literal string'
Out[1]: 'this is a literal string'

"this is a literal string"
Out[2]: 'this is a literal string'

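With triple quotes, you don't need to insert escaped newlines or line-continuation characters into the text. The string is stored exactly as typed, line breaks included: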

bigger = """
this is an even
bigger string that
spans three lines
"""

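Prefixing a string with r or R makes it a true "raw" string, in which backslashes are not treated as escape characters. Prefixing it with u or U makes it a Unicode string: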

big = r"This is a long string \
    with a backslash and a newline in it"
hello = u'Hello\u0020World'
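
This small Python 3 sketch compares a raw string with a normal one. It also shows that a backslash at the end of a line inside a raw literal keeps both the backslash and the newline. The path-like strings are illustrative only:

```python
# In a normal string "\n" is one newline character; in a raw string it is two characters
normal = "C:\new"
raw = r"C:\new"
print(len(normal), len(raw))  # 5 6

# Raw literal from the example above: the trailing backslash and the newline both stay in the string
big = r"This is a long string \
    with a backslash and a newline in it"
print('\\\n' in big)  # True

# In Python 3 every str is Unicode; the u prefix is still accepted but optional
hello = u'Hello\u0020World'
print(hello == 'Hello World')  # True
```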

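String objects also come with many methods, for example:

s.isdigit()        # True if s is non-empty and all of its characters are digits
s.upper()          # a copy of s converted to upper case
s.count('needle')  # number of non-overlapping occurrences of 'needle' in s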
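
The values below are concrete examples of those three methods, written in Python 3 and using made-up sample strings. They also show that strings are immutable: upper() returns a new string and leaves the original unchanged.

```python
s = '2015'
print(s.isdigit())       # True
print('20a5'.isdigit())  # False

t = 'hej'
upper = t.upper()
print(t, upper)          # hej HEJ  (t itself is unchanged)

print('a needle and another needle'.count('needle'))  # 2
```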

1.1 Processing a String One Character at a Time

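To get a list of the string's characters, call the built-in list with the string as its argument: thelist = list(thestring)
You can also loop over the string directly with a for statement: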

for c in thestring:
    do_something_with(c)

Or build the results with a list comprehension:

results = [do_something_with(c) for c in thestring]

Or use the built-in map function: results = map(do_something_with, thestring)
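
Below is a runnable Python 3 version of the four approaches above. The article leaves do_something_with undefined, so here it is a hypothetical stand-in that upper-cases each character. One difference from Python 2 is that map returns a lazy iterator in Python 3, so the sketch wraps it in list().

```python
thestring = 'abc'

# Hypothetical per-character operation standing in for do_something_with
def do_something_with(c):
    return c.upper()

thelist = list(thestring)  # ['a', 'b', 'c']

looped = []
for c in thestring:
    looped.append(do_something_with(c))

comprehended = [do_something_with(c) for c in thestring]

# Python 3's map is lazy, so materialize it with list()
mapped = list(map(do_something_with, thestring))

print(looped, comprehended, mapped)  # ['A', 'B', 'C'] ['A', 'B', 'C'] ['A', 'B', 'C']
```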

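The built-in functions ord, chr and unichr convert between characters and their numeric codes. Note that str is different: str(97) returns '97', not 'a'.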

print ord('a')
97

print chr(97)
a

print ord(u'\u2002')
8194

print repr(unichr(8224))
u'\u2020'

print map(ord, 'dxlmaoe')
[100, 120, 108, 109, 97, 111, 101]
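
The same conversions in Python 3 look like this. Python 3 has no separate unichr, because chr already covers the full Unicode range. The sketch also converts the list of codes back into the original string:

```python
# ord: character -> integer code point; chr: code point -> character
print(ord('a'))                # 97
print(chr(97))                 # a
print(ord('\u2002'))           # 8194
print(chr(8224) == '\u2020')   # True (Python 3 chr replaces Python 2 unichr)

codes = list(map(ord, 'dxlmaoe'))
print(codes)                   # [100, 120, 108, 109, 97, 111, 101]
print(''.join(map(chr, codes)))  # dxlmaoe

# str() gives the decimal text of the number, not the character
print(str(97), chr(97))        # 97 a
```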

Testing whether an object is string-like

Use the built-ins isinstance and basestring to check quickly and simply whether an object is a plain string or a Unicode object:

def isAstring(anobj):
    return isinstance(anobj, basestring)

An exact type test is another option:

def isExactlyAString(anobj):
    return type(anobj) is type('')

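However, Unicode objects fail this exact type test. basestring is the common base class of the str and unicode types, so the isinstance check accepts both. Neither check, however, recognizes instances of the UserString class from the Python standard library. A more flexible approach is to try a string operation and see whether it succeeds: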

def isStringLike(anobj):
    # Duck typing: treat anything that can be concatenated with '' as a string
    try:
        anobj + ''
    except Exception:
        return False
    return True

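The usual way to check types in Python is so-called duck typing: if it walks like a duck and quacks like a duck, then for the purposes of our application it is a duck.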
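
Python 3 has no basestring, because str is the only text type there, so isinstance(anobj, str) takes over that role. As the following sketch shows, UserString instances still fail an isinstance check but pass the duck-typing test. The function is a snake_case rename of isStringLike above, and the sample values are illustrative:

```python
from collections import UserString

def is_string_like(anobj):
    """Duck-typing test: anything that can be concatenated with '' counts as string-like."""
    try:
        anobj + ''
    except Exception:
        return False
    return True

print(is_string_like('hej'))               # True
print(is_string_like(UserString('hej')))   # True
print(isinstance(UserString('hej'), str))  # False: UserString is not a str subclass
print(is_string_like(42))                  # False: int + str raises TypeError
print(is_string_like(b'hej'))              # False: bytes + str raises TypeError
```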

Aligning strings

You want to align a string to the left, to the right, or centered within a fixed width:

print '|','hej'.ljust(20),'|','hej'.rjust(20),'|','hej'.center(20),'|'
| hej                  |                  hej |         hej          |

print 'hej'.center(20, '+')
++++++++hej+++++++++
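
The same methods work in Python 3, where print is a function. The sketch below builds the bordered line by concatenation so the padding is easy to see. It also shows the equivalent format-spec alignment characters (<, >, ^, with an optional fill character). The widths are arbitrary choices:

```python
s = 'hej'

# Concatenate instead of using print's space separator, so only the padding shows
line = '|' + s.ljust(8) + '|' + s.rjust(8) + '|' + s.center(9) + '|'
print(line)                 # |hej     |     hej|   hej   |

print(s.center(20, '+'))    # ++++++++hej+++++++++

# Format specs: '<' left, '>' right, '^' center; a character before the align sign is the fill
print(f'{s:<8}|{s:>8}|{s:^9}|{s:+^9}')  # hej     |     hej|   hej   |+++hej+++
```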

Trimming whitespace from the ends of a string

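You want a string with no extra whitespace at the beginning or the end. lstrip, rstrip and strip remove it from the left end, the right end, or both ends. When given an argument, they remove those characters instead of whitespace: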

x = '  hej  '

print '|',x.lstrip(), '|', x.rstrip(), '|', x.strip(), '|'
| hej   |   hej | hej |

x = 'xyxxyy hejyx yyx'

print '|'+x.strip('xy')+'|'
| hejyx |
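
The Python 3 sketch below uses repr() to make the remaining spaces visible. The argument to strip is treated as a set of characters, removed in any order and combination, not as a prefix or suffix. In the example, 'yx' inside 'hejyx' survives because stripping stops at the first character not in the set, here the space.

```python
x = '  hej  '
print(repr(x.lstrip()), repr(x.rstrip()), repr(x.strip()))  # 'hej  ' '  hej' 'hej'

# The argument is a set of characters, not a substring
y = 'xyxxyy hejyx yyx'
print('|' + y.strip('xy') + '|')  # | hejyx |

# With no argument, all whitespace is stripped, including tabs and newlines
print(repr('\t hej \n'.strip()))  # 'hej'
```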

Concatenating strings

You have several small strings and want to combine them into one large string. Any of the following works:

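largeString = ''.join(pieces)
largeString = '%s%s something %s yet more' % (small1, small2, small3)
largeString = small1 + small2 + ' something ' + small3 + ' yet more'

import operator
from functools import reduce  # built-in in Python 2; functools.reduce also works in Python 2.6+ and 3
largeString = reduce(operator.add, pieces, '')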
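
Here is a runnable Python 3 comparison, using hypothetical sample pieces, showing that the approaches produce the same text. ''.join is generally the preferred choice for many pieces, because it builds the result in one pass. Repeated + (including reduce with operator.add) can end up copying the growing intermediate string at each step.

```python
import operator
from functools import reduce

pieces = ['small1', 'small2', 'small3']  # hypothetical sample pieces

# Single pass over the pieces
joined = ''.join(pieces)

# Formatting is convenient when the number of pieces is fixed
formatted = '%s%s something %s yet more' % tuple(pieces)

# Same result as join, but each step creates a new intermediate string
reduced = reduce(operator.add, pieces, '')

print(joined)             # small1small2small3
print(formatted)          # small1small2 something small3 yet more
print(reduced == joined)  # True
```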

Reversing a string by characters or by words

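You want to reverse a string character by character, or word by word. Slicing with a step of -1 reverses the characters: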

revchars = astring[::-1]

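To reverse by words, split the string into a list of words, reverse the list in place, and join it back together with spaces:

revwords = astring.split()
revwords.reverse()
revwords = ' '.join(revwords)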

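Or, in a single line:

revwords = ' '.join(astring.split()[::-1])

To reverse word by word without changing the original whitespace, split the string with a regular expression. The capturing group makes re.split keep the whitespace runs in the result: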

import re
revwords = re.split(r'(\s+)', astring)
revwords.reverse()
revwords = ''.join(revwords)

Or, as a single expression: revwords = ''.join(re.split(r'(\s+)', astring)[::-1])
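
The Python 3 sketch below runs all three reversals on a sample string containing a run of three spaces. It shows that split() collapses the whitespace, while the regular-expression version keeps it. The sample string is illustrative:

```python
import re

astring = 'hello   big world'

# Character-by-character reversal
revchars = astring[::-1]
# Word reversal; split() discards the original spacing
revwords = ' '.join(astring.split()[::-1])
# Word reversal that keeps whitespace runs, thanks to the capturing group
revwords_ws = ''.join(re.split(r'(\s+)', astring)[::-1])

print(repr(revchars))     # 'dlrow gib   olleh'
print(repr(revwords))     # 'world big hello'
print(repr(revwords_ws))  # 'world big   hello'
```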

Checking whether a string contains any character from a set

def containsAny(seq, aset):
    for c in seq:
        if c in aset: return True
    return False

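import itertools
def containsAny(seq, aset):
    # itertools.ifilter (Python 2) lazily yields the items of seq that are in aset;
    # returning at the first one avoids scanning the rest of seq
    for item in itertools.ifilter(aset.__contains__, seq):
        return True
    return False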
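
itertools.ifilter no longer exists in Python 3, where the built-in filter is already lazy. The sketch below shows two Python 3 equivalents. One uses any() with a generator expression, which stops at the first match just like the loop above. The other uses filter in place of ifilter. The snake_case names and the vowel set are illustrative choices:

```python
def contains_any(seq, aset):
    # any() short-circuits at the first character found in aset
    return any(c in aset for c in seq)

def contains_any_filter(seq, aset):
    # Python 3's lazy filter plays the role of itertools.ifilter
    for _ in filter(aset.__contains__, seq):
        return True
    return False

vowels = set('aeiou')
print(contains_any('rhythm', vowels))         # False
print(contains_any('python', vowels))         # True
print(contains_any_filter('python', vowels))  # True
```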
