[Repost] Python String Methods Explained in Detail

Reference link: Python String Methods | Set 2 (len, count, center, ljust, rjust, isalpha, isalnum, isspace and join)

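I recently had to process nearly 100,000 records with a large number of strings in them. Assorted special characters and stray spaces caused all kinds of subtle bugs, so I wrote this article to study the str methods in depth.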

class str(basestring)
 |  str(object='') -> string    # builds a string from an object
 |
 |  Return a nice string representation of the object.
 |  If the argument is a string, the return value is the same object.
 |
 |  Method resolution order:
 |      str
 |      basestring
 |      object

Methods defined here: 

 |  capitalize(...)
 |      S.capitalize() -> string
 |
 |      # Returns a copy of the string with its first letter capitalized.
 |      Return a copy of the string S with only its first character
 |      capitalized.

 

>>> test = "abc" 

>>> test.capitalize()

'Abc' 

 

 |  center(...)
 |      S.center(width[, fillchar]) -> string
 |
 |      # Returns a string with the original centered. Both sides are padded with spaces by default, or with a fill character you specify.
 |      Return S centered in a string of length width. Padding is done using the specified fill character (default is a space)

 

>>> a = "abcd"

>>> a.center(8)

'  abcd  '

>>> a.center(8,"*")

'**abcd**'

>>> a.center(3)

'abcd' # unchanged when width is smaller than the string length

 |  count(...)
 |      S.count(sub[, start[, end]]) -> int
 |
 |      # Returns the number of times the substring occurs in S; start and end positions can be given.
 |      Return the number of non-overlapping occurrences of substring sub in string S[start:end].  Optional arguments start and end are interpreted as in slice notation.
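A small sketch of the two points in the description above: matches are counted without overlap, and start/end behave like slice bounds. It is written for Python 3, where count works the same way; the sample string is made up.

```python
text = "abababab"

# Occurrences are counted without overlap: "aba" matches at index 0 and 4.
non_overlapping = text.count("aba")

# start and end act like slice bounds: only text[2:6] == "abab" is searched.
in_slice = text.count("ab", 2, 6)

print(non_overlapping)  # 2
print(in_slice)         # 2
```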

 |  decode(...)
 |      S.decode([encoding[,errors]]) -> object
 |
 |      # Important!
 |      Decodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is 'strict' meaning that encoding errors raise a UnicodeDecodeError. Other possible values are 'ignore' and 'replace' as well as any other name registered with codecs.register_error that is able to handle UnicodeDecodeErrors.

 

 |  encode(...)
 |      S.encode([encoding[,errors]]) -> object
 |
 |      # Important!
 |      Encodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is 'strict' meaning that encoding errors raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace' as well as any other name registered with codecs.register_error that is able to handle UnicodeEncodeErrors.
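A short sketch of the errors argument described for decode and encode. It is written for Python 3, where the two roles are split (str.encode returns bytes and bytes.decode returns str), whereas the Python 2 str documented here carries both methods. The sample text is made up for illustration.

```python
text = "caf\u00e9 \u4f60\u597d"   # "café 你好"

# Round trip through UTF-8, which can represent every character.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# ASCII cannot represent the three non-ASCII characters, so the
# errors argument decides what happens to them.
ignored = text.encode("ascii", "ignore")               # dropped
replaced = text.encode("ascii", "replace")             # replaced by '?'
xml_refs = text.encode("ascii", "xmlcharrefreplace")   # replaced by &#NNN;

try:
    text.encode("ascii")          # default errors='strict'
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ignored)    # b'caf '
print(replaced)   # b'caf? ??'
print(xml_refs)   # b'caf&#233; &#20320;&#22909;'
```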

 

 |  endswith(...)
 |      S.endswith(suffix[, start[, end]]) -> bool
 |
 |      # Checks whether S ends with suffix; positions can be given. Handy as a loop condition, since it saves an == comparison against a slice.
 |      Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
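The tuple form of suffix mentioned above is easy to overlook. A minimal sketch (Python 3 syntax, made-up file names):

```python
filenames = ["report.csv", "notes.txt", "data.CSV", "image.png"]

# Each suffix in the tuple is tried; lower() makes the test case-insensitive.
text_files = [name for name in filenames
              if name.lower().endswith((".csv", ".txt"))]

# start/end restrict the test to a slice: "report.csv"[0:6] == "report".
ends_in_slice = "report.csv".endswith("port", 0, 6)

print(text_files)      # ['report.csv', 'notes.txt', 'data.CSV']
print(ends_in_slice)   # True
```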

 

 |  expandtabs(...)
 |      S.expandtabs([tabsize])-> string
 |
 |      # Expands each tab character into spaces, padding up to the next multiple of tabsize columns (8 by default).
 |      Return a copy of S where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed.
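A tab is not replaced by a fixed number of spaces; it is padded out to the next tab stop. A small sketch with made-up input (Python 3 syntax):

```python
# With tabsize=4 the tab stops are at columns 4, 8, 12, ...
expanded4 = "a\tbc\tdef".expandtabs(4)

# Default tab size is 8: "a" is followed by 7 spaces to reach column 8.
expanded_default = "a\tb".expandtabs()

print(repr(expanded4))          # 'a   bc  def'
print(len(expanded_default))    # 9
```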

 

 |  find(...)
 |      S.find(sub [,start [,end]]) -> int
 |
 |      # Searches S for sub, optionally within start and end; returns the index of sub in S, or -1 if it is not found.
 |      Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end].  Optional arguments start and end are interpreted as in slice notation.  Return -1 on failure.

 

>>> a = "abcdabcd"

>>> a.find("b")

1

>>> a.find("b", 2, 7)

5

>>> a.find("b", 3, 7)

5

>>> a.find("b", 6, 7)

-1 

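 |  format(...)
 |      S.format(*args, **kwargs) -> string
 |
 |      # Formatted string output. There are too many examples to list and the official documentation has plenty, so I only give the ones I have used.
 |      Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}').
 |      # [Note added 2014.06.28: in format, a literal brace is escaped by doubling it ({{ or }}), not with a backslash.]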

 >>> '{:,}'.format(1234567890) # Using the comma as a thousands separator 

'1,234,567,890'

>>> 'Correct answers: {:.2%}'.format(19.5/22) # Expressing a percentage

'Correct answers: 88.64%'

>>> import datetime

>>> d = datetime.datetime(2010, 7, 4, 12, 15, 58)

>>> '{:%Y-%m-%d %H:%M:%S}'.format(d)

'2010-07-04 12:15:58'

>>> # Can stand in for center() and similar methods

>>> '{:<30}'.format('left aligned')

'left aligned                  '

>>> '{:>30}'.format('right aligned')

'                 right aligned'

>>> '{:^30}'.format('centered')

'           centered           '

>>> '{:*^30}'.format('centered')  # use '*' as a fill char

'***********centered***********'

>>> # Accessing arguments by position:

>>> '{0}, {1}, {2}'.format('a', 'b', 'c')

'a, b, c'

>>> '{}, {}, {}'.format('a', 'b', 'c')  # 2.7+ only

'a, b, c'

>>> '{2}, {1}, {0}'.format('a', 'b', 'c')

'c, b, a'

>>> '{2}, {1}, {0}'.format(*'abc')      # unpacking argument sequence

'c, b, a'

>>> '{0}{1}{0}'.format('abra', 'cad')   # arguments' indices can be repeated

'abracadabra' 

Updated 2014.06.28: how do you make Python's format() output a literal {}? See the code below. It looks odd at first, which shows how much I still have to learn.

>>> print "hi {{}} {{key}}".format(key = "string")

hi {} {key}

 |  index(...)
 |      S.index(sub [,start [,end]]) -> int
 |
 |      # Like find, returns the index, but raises an error when sub is not found, whereas find returns -1.
 |      Like S.find() but raise ValueError when the substring is not found.
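The difference between the two matters in conditionals: -1 from find is truthy and 0 is falsy, so testing the result of find directly is a common slip. A minimal sketch follows; the helper name locate is hypothetical and not part of str.

```python
def locate(text, sub):
    """Return the index of sub in text, or None when it is absent."""
    try:
        return text.index(sub)      # raises ValueError when sub is missing
    except ValueError:
        return None


text = "abcdabcd"
found = locate(text, "c")
missing = locate(text, "x")

# find signals failure with -1, which must be compared explicitly.
find_missing = text.find("x")
is_present = text.find("a") != -1   # index 0 is a valid hit

print(found, missing, find_missing, is_present)   # 2 None -1 True
```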

 

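 |  isalnum(...)
 |      S.isalnum() -> bool
 |
 |      # Checks whether every character in S is a letter or a digit [and S has at least one character]; returns True if so. For a Python 2 byte string, Chinese characters, symbols, or an empty string give False (unicode strings treat Chinese characters as alphabetic).
 |      Return True if all characters in S are alphanumeric and there is at least one character in S, False otherwise.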

 

>>> a = "adad1122"

>>> a.isalnum()

True

>>> a = "3123dddaw''[]"

>>> a.isalnum()

False

>>> a = "你好hello"

>>> a.isalnum()

False 

 |  isalpha(...)
 |      S.isalpha() -> bool
 |
 |      # Checks whether all characters are letters [and there is at least one character].
 |      Return True if all characters in S are alphabetic and there is at least one character in S, False otherwise.

 |  isdigit(...)
 |      S.isdigit() -> bool
 |
 |      # Checks whether all characters are digits [and there is at least one character].
 |      Return True if all characters in S are digits and there is at least one character in S, False otherwise.

 

 |  islower(...)
 |      S.islower() -> bool
 |
 |      # Checks whether all letters are lowercase (digits do not affect the result) [and there is at least one cased character].
 |      Return True if all cased characters in S are lowercase and there is at least one cased character in S, False otherwise.

 

 |  isspace(...)
 |      S.isspace() -> bool
 |
 |      # Checks whether all characters are whitespace [and there is at least one character].
 |      Return True if all characters in S are whitespace and there is at least one character in S, False otherwise.

 

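 |  istitle(...)
 |      S.istitle() -> bool
 |
 |      # Checks whether every word in S starts with an uppercase letter and all of its following letters are lowercase [and there is at least one character].
 |      # Many blogs that endlessly copy one another get this wrong. Testing it yourself matters!
 |      Return True if S is a titlecased string and there is at least one character in S, i.e. uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return False otherwise.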

 

>>> a = "Abc"

>>> a.istitle()

True

>>> a = "aBc"

>>> a.istitle()

False

>>> a = "AbC"

>>> a.istitle()

False

>>> a = "Abc Cbc"

>>> a.istitle()

True

>>> a = "Abc cbc"

>>> a.istitle()

False

 

 |  isupper(...)
 |      S.isupper() -> bool
 |
 |      # Checks whether all letters are uppercase (digits do not affect the result) [and there is at least one cased character].
 |      Return True if all cased characters in S are uppercase and there is at least one cased character in S, False otherwise.

 

 |  join(...)
 |      S.join(iterable) -> string
 |
 |      # Used all the time! Joins the items of the iterable using S as the separator. The items must themselves be strings (I had not noticed this before).
 |      Return a string which is the concatenation of the strings in the iterable.  The separator between elements is S.
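The requirement that every item be a string is the usual stumbling block. A minimal sketch (Python 3 syntax, made-up values) of the failure and one common way around it:

```python
values = [1, 2, 3]

# Joining non-string items raises TypeError.
try:
    ",".join(values)
    join_failed = False
except TypeError:
    join_failed = True

# Converting each item with str() first makes the join work.
joined = ",".join(str(v) for v in values)

print(join_failed)   # True
print(joined)        # 1,2,3
```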

 

 |  ljust(...)
 |      S.ljust(width[, fillchar]) -> string
 |
 |      # Returns a string of width characters with S left-aligned; the remainder is padded with fillchar (a space by default).
 |      Return S left-justified in a string of length width. Padding is done using the specified fill character (default is a space).

 

 |  lower(...)
 |      S.lower() -> string
 |
 |      # Returns a copy converted entirely to lowercase.
 |      Return a copy of the string S converted to lowercase.

 

 |  lstrip(...)
 |      S.lstrip([chars]) -> string or unicode
 |
 |      # Removes whitespace from the left side of the string or, if chars is given, any leading characters that appear in chars.
 |      Return a copy of the string S with leading whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping

 

 |  partition(...)
 |      S.partition(sep) -> (head, sep, tail)
 |
 |      # Takes a string argument and returns a 3-element tuple. If sep does not occur in S, the result is (S, '', ''); otherwise the first element is the part to the left of sep, the second is sep itself, and the third is the part to the right of sep.
 |      Search for the separator sep in S, and return the part before it, the separator itself, and the part after it.  If the separator is not found, return S and two empty strings.
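Both cases from the description can be checked directly. A minimal sketch with made-up input (Python 3 syntax):

```python
# Only the first occurrence of the separator is used for the split.
head, sep, tail = "key=value=more".partition("=")

# When the separator is absent, the whole string comes back first,
# followed by two empty strings.
missing = "no separator".partition("=")

print((head, sep, tail))   # ('key', '=', 'value=more')
print(missing)             # ('no separator', '', '')
```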

 

 |  replace(...)
 |      S.replace(old, new[, count]) -> string
 |
 |      # Replace! Without count, every occurrence is replaced; with count, only the first count occurrences are replaced.
 |      Return a copy of string S with all occurrences of substring old replaced by new.  If the optional argument count is given, only the first count occurrences are replaced.

 

 

>>> a = "  213  213 1312  "

>>> a.replace(" ", "")

'2132131312'

>>> a.replace(" ", "", 3)

'213 213 1312  '

>>> a.replace(" ", "", 5)

'2132131312  '

>>> a

'  213  213 1312  ' 

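[Updated 2014.05.22]

When I was starting out, the str methods did not seem very sensible to me, because a is unchanged after calling a.replace(). It took some getting used to. Looking back, the design is quite reasonable.

Why? When you learn about tuple and dict you also learn about immutable objects, and str is one of them. Python is designed this way to guarantee that a does not change: an immutable object can never be modified in place.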
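A brief sketch of what that means in practice, reusing the string from the replace example (Python 3 syntax; the dict is only there to illustrate that an immutable str can serve as a key):

```python
a = "  213  213 1312  "

# replace returns a new string; a itself is left untouched.
b = a.replace(" ", "")
unchanged = (a == "  213  213 1312  ")

# To keep the result under the same name, rebind the name.
a = a.replace(" ", "")

# Because str is immutable and hashable, it can be used as a dict key.
counts = {a: 1}

print(b)           # 2132131312
print(unchanged)   # True
```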

 |  rfind(...)
 |      S.rfind(sub [,start [,end]]) -> int
 |
 |      # Searches and returns the highest index; the search can be limited to a slice. Returns -1 if sub is not found.
 |      Return the highest index in S where substring sub is found, such that sub is contained within S[start:end].  Optional arguments start and end are interpreted as in slice notation.
 |
 |      Return -1 on failure.

 

 |  rindex(...)
 |      S.rindex(sub [,start [,end]]) -> int
 |
 |      # Same as rfind, but raises an error when sub is not found.
 |      Like S.rfind() but raise ValueError when the substring is not found.

 

 |  rjust(...)
 |      S.rjust(width[, fillchar]) -> string
 |
 |      # Returns a string of width characters with S right-aligned; the remainder is padded with fillchar (a space by default).
 |      Return S right-justified in a string of length width. Padding is done using the specified fill character (default is a space)

 

 |  rpartition(...)
 |      S.rpartition(sep) -> (head, sep, tail)
 |
 |      Search for the separator sep in S, starting at the end of S, and return the part before it, the separator itself, and the part after it.  If the separator is not found, return two empty strings and S.
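A minimal sketch of rpartition, using a made-up file name to show that the last separator is the one used, and that S lands in the final slot when the separator is missing (Python 3 syntax):

```python
# The split happens at the last "." in the string.
stem, dot, extension = "archive.tar.gz".rpartition(".")

# Without a separator, the result is two empty strings followed by S.
no_dot = "README".rpartition(".")

print((stem, dot, extension))   # ('archive.tar', '.', 'gz')
print(no_dot)                   # ('', '', 'README')
```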

 

 |  rsplit(...)
 |      S.rsplit([sep [,maxsplit]]) -> list of strings
 |
 |      # Same as split(), except that splitting starts from the end.
 |      Return a list of the words in the string S, using sep as the delimiter string, starting at the end of the string and working  to the front.  If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator.
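The direction only makes a difference when maxsplit is given. A small sketch with a made-up path (Python 3 syntax):

```python
path = "/usr/local/bin/python"

# One split from the right separates the last component.
from_right = path.rsplit("/", 1)

# One split from the left cuts at the first "/" instead.
from_left = path.split("/", 1)

# Without maxsplit both methods give the same list.
same = path.rsplit("/") == path.split("/")

print(from_right)   # ['/usr/local/bin', 'python']
print(from_left)    # ['', 'usr/local/bin/python']
print(same)         # True
```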

 

 |  rstrip(...)
 |      S.rstrip([chars]) -> string or unicode
 |
 |      # Removes whitespace, or the given chars, from the right side of S.
 |      Return a copy of the string S with trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping

 

 |  split(...)
 |      S.split([sep [,maxsplit]]) -> list of strings
 |
 |      # Used all the time! Splits S into a list using sep as the marker (sep occurs in S); often paired with join().
 |      Return a list of the words in the string S, using sep as the delimiter string.  If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.
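The last sentence of the description is where messy data tends to bite: split() and split(" ") are not the same. A minimal sketch with made-up input (Python 3 syntax), including the pairing with join():

```python
# An explicit separator keeps the empty string between two adjacent spaces.
explicit = "a  b".split(" ")

# With no separator, runs of any whitespace count as one separator
# and empty strings are dropped.
default = "  a  b \t c  ".split()

# split() followed by join() normalizes whitespace to single spaces.
normalized = " ".join("  a  b \t c  ".split())

print(explicit)     # ['a', '', 'b']
print(default)      # ['a', 'b', 'c']
print(normalized)   # a b c
```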

 

 |  splitlines(...)
 |      S.splitlines(keepends=False) -> list of strings
 |
 |      Return a list of the lines in S, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
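A short sketch of how splitlines differs from split("\n") on text with mixed line endings (made-up input, Python 3 syntax; keepends is passed positionally):

```python
text = "one\ntwo\r\nthree\n"

# Line boundaries are recognized for both "\n" and "\r\n",
# and no empty trailing item is produced.
lines = text.splitlines()

# Passing a true keepends keeps the line breaks on each item.
with_ends = text.splitlines(True)

# split("\n") leaves the "\r" behind and adds an empty last item.
naive = text.split("\n")

print(lines)       # ['one', 'two', 'three']
print(with_ends)   # ['one\n', 'two\r\n', 'three\n']
print(naive)       # ['one', 'two\r', 'three', '']
```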

 

 |  startswith(...)
 |      S.startswith(prefix[, start[, end]]) -> bool
 |
 |      # Checks whether S starts with prefix, or whether a slice of S starts with prefix.
 |      Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

 

 |  strip(...)
 |      S.strip([chars]) -> string or unicode
 |
 |      # Removes whitespace, or the given chars, from both ends of S.
 |      Return a copy of the string S with leading and trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping

[Updated 2014.07.31: to remove specific characters from the ends of a string, use strip rather than replace(); strip is faster.]
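Two details are worth seeing in code: chars is treated as a set of characters rather than a prefix or suffix, and strip only touches the ends while replace touches every occurrence. A minimal sketch with made-up input (Python 3 syntax):

```python
# Every leading and trailing character found in "w.com" is removed,
# not just the literal text "w.com".
stripped_set = "www.example.com".strip("w.com")

# strip leaves the inner "*" alone; replace removes all of them.
ends_only = "**a*b**".strip("*")
everywhere = "**a*b**".replace("*", "")

print(stripped_set)   # example
print(ends_only)      # a*b
print(everywhere)     # ab
```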

 |  swapcase(...)
 |      S.swapcase() -> string
 |
 |      # Swaps uppercase and lowercase.
 |      Return a copy of the string S with uppercase characters converted to lowercase and vice versa.

 

 |  title(...)
 |      S.title() -> string
 |
 |      # Returns a string in which every word starts with an uppercase letter and the rest are lowercase.
 |      Return a titlecased version of S, i.e. words start with uppercase characters, all remaining cased characters have lowercase.

 

 |  translate(...)
 |      S.translate(table [,deletechars]) -> string
 |
 |      Return a copy of the string S, where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given  translation table, which must be a string of length 256 or None. If the table argument is None, no translation is applied and the operation simply removes the characters in deletechars.
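A sketch of the table and deletechars arguments. One assumption to flag clearly: it uses Python 3's bytes.translate and bytes.maketrans, which mirror the 256-entry table interface described above; in Python 2 the table for a str is built with string.maketrans, and Python 3's own str.translate takes a different kind of table.

```python
# Build a 256-byte table mapping a->x, b->y, c->z.
table = bytes.maketrans(b"abc", b"xyz")

# Map characters through the table.
mapped = b"aabbcc-abc".translate(table)

# With table=None nothing is mapped; the listed bytes are just deleted.
deleted = b"hello world".translate(None, b"lo")

# Deletion is applied first, then the remaining bytes are mapped.
both = b"a-b-c".translate(table, b"-")

print(len(table))   # 256
print(mapped)       # b'xxyyzz-xyz'
print(deleted)      # b'he wrd'
print(both)         # b'xyz'
```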

 

 |  upper(...)
 |      S.upper() -> string
 |
 |      # Converts lowercase letters to uppercase.
 |      Return a copy of the string S converted to uppercase.

 

 |  zfill(...)
 |      S.zfill(width) -> string
 |
 |      # zfill() pads with the character 0 on the left; often used when printing numbers.
 |      Pad a numeric string S with zeros on the left, to fill a field of the specified width.  The string S is never truncated.
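A minimal sketch of zfill on numeric strings, including the sign handling and the "never truncated" rule (made-up values, Python 3 syntax):

```python
padded = "42".zfill(5)

# A leading sign stays in front; the zeros go after it.
negative = "-42".zfill(5)
positive = "+7".zfill(4)

# A string already longer than width is returned unchanged.
too_long = "12345".zfill(3)

print(padded)     # 00042
print(negative)   # -0042
print(positive)   # +007
print(too_long)   # 12345
```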

  |  

  |  ---------------------------------------------------------------------- 

 

 

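str slicing:

 str[0:5]   takes the characters at index 0 through 4 (the first five characters)
 str[:]     takes all characters of the string
 str[4:]    takes from the fifth character to the end
 str[:-3]   takes from the start up to, but not including, the third character from the end
 str[2]     takes the third character
 str[::-1]  creates a string in the reverse order of the original (string reversal)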
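The slices listed above can be checked on a string whose characters equal their own indices. A minimal sketch (Python 3 syntax; the variable is named s to avoid shadowing the built-in str):

```python
s = "0123456789"

first_five = s[0:5]     # indices 0..4; the end index is excluded
everything = s[:]       # a full copy of the string
from_fifth = s[4:]      # fifth character to the end
drop_last3 = s[:-3]     # everything except the last three characters
third_char = s[2]       # single character at index 2
reversed_s = s[::-1]    # step of -1 walks the string backwards

print(first_five)   # 01234
print(from_fifth)   # 456789
print(drop_last3)   # 0123456
print(third_char)   # 2
print(reversed_s)   # 9876543210
```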

  

 

 

  

 

decode & encode

 Not updated for now.

 

 

 Updated irregularly. When reposting, please include this article's address:

 http://blog.csdn.net/zhanh1218/article/details/21826239

  

 

 

   

   

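  This article is an original work by @The_Third_Wave. It is updated irregularly; please point out any errors.

  Follow on Sina Weibo: @The_Third_Wave

  If this post helped you, then for the sake of a healthy web, please bookmark it rather than repost it. If you must repost it, include this footer and the article's address.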
