Python for Data Analysis: String Manipulation (10)

Latest recommended article published on 2024-08-22 00:03:42

bwqiang

Category: Python

Article link: https://blog.csdn.net/bwqiang/article/details/107861899

String Manipulation

String Object Methods

In [124]: val
Out[124]: 'a, b,     guido'

In [125]: val.split(',')
Out[125]: ['a', ' b', '     guido']

# Split on ',' and call x.strip() on each piece to trim the surrounding whitespace
In [126]: pieces = [x.strip() for x in val.split(',')]

In [127]: pieces
Out[127]: ['a', 'b', 'guido']
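
The split-then-strip idiom above can be wrapped in a small helper. This is a minimal sketch; the name `split_fields` and its `sep` parameter are hypothetical, chosen only for illustration.

```python
def split_fields(line, sep=','):
    """Split `line` on `sep` and strip whitespace around each piece.

    Hypothetical helper, not part of the standard library.
    """
    return [piece.strip() for piece in line.split(sep)]


val = 'a, b,     guido'
pieces = split_fields(val)
print(pieces)  # ['a', 'b', 'guido']
```

Splitting on the bare delimiter and stripping afterwards tolerates any amount of whitespace around each field, which splitting on `', '` would not.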

# These substrings can be concatenated with a double-colon delimiter using addition:
In [143]: first, second, third = pieces

In [144]: first + '::' + second + '::' + third
Out[144]: 'a::b::guido'

# A faster and more Pythonic way is to pass a list or tuple to the join method of the string '::':

In [146]: pieces
Out[146]: ['a', 'b', 'guido']

In [147]: '::'.join(pieces)
Out[147]: 'a::b::guido'
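
One detail worth keeping in mind: `join` accepts any iterable, but every element must already be a string. The sketch below assumes a hypothetical `mixed` list to show the usual workaround of converting each item with `str` first.

```python
pieces = ['a', 'b', 'guido']
joined = '::'.join(pieces)
print(joined)  # a::b::guido

# join raises TypeError on non-string items, so convert them first.
mixed = ['a', 1, 2.5]  # hypothetical data for illustration
joined_mixed = '::'.join(str(item) for item in mixed)
print(joined_mixed)  # a::1::2.5
```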

# Python's in keyword tests whether a string contains a substring

In [148]: 'guido' in val
Out[148]: True
# index and find can also locate a substring; both return the position of its first occurrence
In [149]: val.index(',')
Out[149]: 1

In [150]: val.find('p')
Out[150]: -1

# index raises a ValueError when the substring is not found (find returns -1 instead)
In [151]: val.index('w')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-b08f5379f9bf> in <module>()
----> 1 val.index('w')

ValueError: substring not found
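
The difference between the two matters in practice: the -1 returned by `find` is itself a valid index, so using it unchecked silently picks the last character. A minimal sketch of one way to handle this follows; the `locate` wrapper is a hypothetical helper, not a built-in.

```python
def locate(text, sub):
    """Return the index of the first occurrence of `sub`, or None if absent.

    Hypothetical wrapper around str.index for illustration.
    """
    try:
        return text.index(sub)
    except ValueError:
        return None


val = 'a, b,     guido'
print(locate(val, ','))    # 1
print(locate(val, 'w'))    # None
print(val.find('w'))       # -1
print(val[val.find('w')])  # o  (the -1 indexes the last character)
```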


# count returns the number of non-overlapping occurrences of a particular substring:

In [152]: val.count(',')
Out[152]: 2

# replace substitutes every occurrence of one substring with another
In [153]: val.replace(', ', '::')
Out[153]: 'a::b::    guido'
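
Two common variations on `replace` are sketched below: passing an empty string to delete a substring, and using the optional third argument to limit the number of replacements. The variable names are only illustrative.

```python
val = 'a, b,     guido'

# Replacing with the empty string deletes every occurrence.
no_commas = val.replace(',', '')
print(repr(no_commas))   # 'a b     guido'

# The optional third argument caps how many occurrences are replaced.
first_only = val.replace(', ', '::', 1)
print(repr(first_only))  # 'a::b,     guido'
```

Strings are immutable, so `replace` always returns a new string and leaves `val` unchanged.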

Regular Expressions

# The regular expression describing one or more whitespace characters is \s+
In [154]: import re

In [155]: text = "foo    bar\tbaz \tqux"

In [156]: text
Out[156]: 'foo    bar\tbaz \tqux'

In [157]: re.split(r'\s+', text)
Out[157]: ['foo', 'bar', 'baz', 'qux']
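
When the same expression is applied to many strings, compiling it once with `re.compile` gives a reusable regex object. The following sketch also uses `findall` to list the whitespace runs that the pattern matched; the variable names are illustrative.

```python
import re

text = "foo    bar\tbaz \tqux"

# Compile once, then reuse the object for several operations.
whitespace = re.compile(r'\s+')

fields = whitespace.split(text)
print(fields)  # ['foo', 'bar', 'baz', 'qux']

# findall returns the separators themselves, i.e. every run of whitespace.
gaps = whitespace.findall(text)
print(gaps)    # ['    ', '\t', ' \t']
```

The pattern is written as a raw string (`r'\s+'`) so that the backslash reaches the regex engine unchanged.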

Consider the following regular expression, which identifies most email addresses:

In [158]: text = """Dave dave@google.com Steve steve@gmail.com Rob rob@gmail.com"""

In [159]: text
Out[159]: 'Dave dave@google.com Steve steve@gmail.com Rob rob@gmail.com'

In [160]: pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'

# re.IGNORECASE makes the regular expression case-insensitive
In [161]: regex = re.compile(pattern, flags=re.IGNORECASE)

In [162]: regex.findall(text)
Out[162]: ['dave@google.com', 'steve@gmail.com', 'rob@gmail.com']
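
Beyond `findall`, the same compiled object supports `search`, `match`, and `sub`. The sketch below is a variation on the pattern above with parentheses added around its three parts (an assumption made here for illustration), so that each match is broken into user name, domain name, and suffix.

```python
import re

text = "Dave dave@google.com Steve steve@gmail.com Rob rob@gmail.com"

# Same pattern as above, with capture groups added for illustration.
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
regex = re.compile(pattern, flags=re.IGNORECASE)

# search returns only the first match, as a match object.
first = regex.search(text)
print(first.group(0))  # dave@google.com
print(first.span())    # (5, 20)

# match only succeeds at the start of the string, so it returns None here.
print(regex.match(text))  # None

# With groups in the pattern, findall returns a list of tuples.
parts = regex.findall(text)
print(parts[0])  # ('dave', 'google', 'com')

# sub replaces every match with the given string.
redacted = regex.sub('REDACTED', text)
print(redacted)  # Dave REDACTED Steve REDACTED Rob REDACTED
```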