Handling Strings in pandas (Part 1)

alokalalala

Category: Data Analysis. Tags: pandas, Series strings

Article link: https://blog.csdn.net/haiyu94/article/details/75575163

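The official pandas documentation describes the string operations available on Series.
This post tries out some of the basic functions and explains them. If anything here is wrong, please point it out.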

1.Series.str.capitalize()

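Converts each string in a Series or Index so that its first character is upper case and the remaining characters are lower case.
Returns: the converted Series or Index (the original object is not modified).

Example: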

>>> import pandas as pd
>>> df = {'a':[1,2,3],'b':[4,5,6]}
>>> df = pd.DataFrame(df)
>>> print(df)
   a  b
0  1  4
1  2  5
2  3  6
>>> df.index=['f','g','h']
>>> df
   a  b
f  1  4
g  2  5
h  3  6
>>> df.index.str.capitalize()
Index(['F', 'G', 'H'], dtype='object')
>>> df
   a  b
f  1  4
g  2  5
h  3  6
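
The transcript above uses single-letter labels, so it cannot show what happens to the remaining characters. A minimal sketch with made-up sample strings (the values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data, not taken from the original post
s = pd.Series(['hello world', 'PANDAS', 'mIxEd'])

# capitalize(): first character upper case, everything else lower case
capitalized = s.str.capitalize()
print(capitalized.tolist())   # ['Hello world', 'Pandas', 'Mixed']

# upper() is the method that upper-cases every character
print(s.str.upper().tolist())

# .str methods return a new object, so assign the result to keep it
df = pd.DataFrame({'a': [1, 2, 3]}, index=['foo', 'bar', 'baz'])
df.index = df.index.str.capitalize()
print(df.index.tolist())      # ['Foo', 'Bar', 'Baz']
```

Assigning the result back is what makes the change stick, which is why `df` still shows the lower-case index at the end of the transcript.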

2.Series.str.cat(others=None, sep=None, na_rep=None)

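Concatenates the strings in a Series or Index using the given separator.
Returns: the concatenated Series or Index, or a single string when others is not given.
Note: when na_rep is None and others is not given, NaN values are skipped; when others is given, a missing value on either side produces NaN in the result. If na_rep is specified, it is used in place of missing values.

Example: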

>>> df.index.str.cat(['1','2','3'],sep = ',')
Index(['f,1', 'g,2', 'h,3'], dtype='object')
>>> df.index.str.cat([['1','2','3'],['4','5','6']],sep='*')
Index(['f*1*4', 'g*2*5', 'h*3*6'], dtype='object')
>>> df.index.str.cat(sep='*')
'f*g*h'
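
The effect of na_rep is not shown in the transcript, so here is a small sketch of both cases. The sample Series and the `other` list are assumptions made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
s = pd.Series(['a', np.nan, 'c'])

# Without `others` the whole Series collapses into one string
joined = s.str.cat(sep='-')                      # NaN is skipped
joined_filled = s.str.cat(sep='-', na_rep='?')   # NaN is replaced by '?'
print(joined, joined_filled)

# With `others` the concatenation is element-wise
other = ['x', 'y', 'z']
pairwise = s.str.cat(other, sep='-')                     # row 1 stays NaN
pairwise_filled = s.str.cat(other, sep='-', na_rep='?')  # row 1 becomes '?-y'
print(pairwise.tolist())
print(pairwise_filled.tolist())
```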

3.Series.str.center(width, fillchar=' ')

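Pads the left and right sides of each string in a Series or Index with a fill character until the string reaches the given width.
width: the minimum total length after padding; longer strings are left unchanged
Returns: the padded Series or Index

Example: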

>>> df['hh'] = ['d','f','g']
>>> df
   a  b hh
f  1  4  d
g  2  5  f
h  3  6  g
>>> df['hh'].str.center(width = 3,fillchar='&')
f    &d&
g    &f&
h    &g&
Name: hh, dtype: object
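
As a rough illustration of how width behaves, the sketch below pads strings of different lengths to the same width. The sample values are assumptions, chosen so the padding splits evenly:

```python
import pandas as pd

# Hypothetical strings of length 1, 3 and 11
s = pd.Series(['d', 'abc', 'hello world'])

# Pad to a total width of 5 with '*'
centered = s.str.center(5, fillchar='*')
print(centered.tolist())   # ['**d**', '*abc*', 'hello world']

# width is a minimum: strings already longer than 5 are not truncated
print(centered.str.len().tolist())
```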

4.Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)

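Tests whether the given string or regular expression is contained in each string of a Series or Index.
Returns: a boolean Series or Index

Example: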

>>> df
   a  b hh
f  1  4  d
g  2  5  f
h  3  6  g
>>> df['hh'].str.contains('f')
f    False
g     True
h    False
Name: hh, dtype: bool
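
The regex, case and na parameters in the signature change what counts as a match. A minimal sketch, using made-up sample strings, of how they interact:

```python
import numpy as np
import pandas as pd

# Hypothetical data, including one missing value
s = pd.Series(['a.b', 'axb', 'AXB', np.nan])

# regex=True (the default): '.' matches any single character
as_regex = s.str.contains('a.b', na=False)

# regex=False: the pattern is treated as a literal substring
as_literal = s.str.contains('a.b', regex=False, na=False)

# case=False: ignore upper/lower case
ignore_case = s.str.contains('a.b', case=False, na=False)

print(as_regex.tolist())     # [True, True, False, False]
print(as_literal.tolist())   # [True, False, False, False]
print(ignore_case.tolist())  # [True, True, True, False]

# na=False makes the result safe to use as a boolean mask
print(s[as_literal].tolist())
```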

5.Series.str.count(pat, flags=0, **kwargs)

Counts how many times pat (a regular expression) occurs in each string of a Series or Index.

Example:

>>> df['count']=['hello','hello','hel']
>>> df
   a  b hh  count
f  1  4  d  hello
g  2  5  f  hello
h  3  6  g    hel
>>> df['count'].str.count('hel')
f    1
g    1
h    1
Name: count, dtype: int64
>>> df['count'].str.count('l')
f    2
g    2
h    1
Name: count, dtype: int64
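
Because pat is interpreted as a regular expression, special characters need escaping, and matches are counted without overlap. A small sketch on assumed sample strings:

```python
import pandas as pd

# Hypothetical sample strings
s = pd.Series(['hello', 'banana', 'aaaa', 'a.b.c'])

count_an = s.str.count('an')      # plain substring
count_aa = s.str.count('aa')      # non-overlapping: 'aaaa' contains 2, not 3
count_dot = s.str.count(r'\.')    # escaped: counts literal dots
count_any = s.str.count('.')      # unescaped: '.' matches every character

print(count_an.tolist())    # [0, 2, 0, 0]
print(count_aa.tolist())    # [0, 0, 2, 0]
print(count_dot.tolist())   # [0, 0, 0, 2]
print(count_any.tolist())   # [5, 6, 4, 5]
```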

6.Series.str.decode(encoding, errors='strict')

Decodes each bytes value into a string using the specified encoding ('utf-8', 'gbk', and so on).

7.Series.str.encode(encoding, errors='strict')

Encodes each string into bytes using the specified encoding.
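
The post gives no example for decode and encode, so the following is a minimal round-trip sketch. The sample strings are assumptions picked for illustration:

```python
import pandas as pd

# Hypothetical data: one ASCII string and one Chinese string
s = pd.Series(['pandas', '数据'])

# encode(): str -> bytes
encoded = s.str.encode('utf-8')
print(encoded.tolist())

# decode(): bytes -> str, using the same encoding to get the text back
decoded = encoded.str.decode('utf-8')
print(decoded.tolist())    # ['pandas', '数据']

# errors is passed on to Python's codec machinery;
# 'ignore' drops characters that the target encoding cannot represent
ascii_only = s.str.encode('ascii', errors='ignore')
print(ascii_only.tolist())  # [b'pandas', b'']
```

Decoding with a different encoding than the one used for encoding can raise an error or produce garbled text, so the two calls should normally agree.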

8.Series.str.endswith(pat, na=nan)

Tests whether each string ends with the given pat (a literal string, not a regular expression).

Example:

>>> df['count'].str.endswith('lo')
f     True
g     True
h    False
Name: count, dtype: bool
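
A typical use is filtering rows by suffix. The sketch below assumes a made-up Series of file names and uses na=False so that missing values do not break the boolean mask:

```python
import numpy as np
import pandas as pd

# Hypothetical file names, including one missing value
files = pd.Series(['report.csv', 'notes.txt', 'datacsv', np.nan])

# pat is matched literally, so '.' means a real dot here
is_csv = files.str.endswith('.csv', na=False)
print(is_csv.tolist())     # [True, False, False, False]

# Use the boolean result to select matching rows
csv_files = files[is_csv]
print(csv_files.tolist())  # ['report.csv']
```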

9.Series.str.extract(pat, flags=0, expand=None)

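Extracts the capture groups of the given regular expression from the first match in each string.
expand : bool, default False in the pandas version used here (current releases default to True)
    If True, return a DataFrame.
    If False, return a Series/Index when there is one capture group, or a DataFrame when there are several.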

Example:
>>> s = pd.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN
>>> s.str.extract(r'([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3
>>> s.str.extract(r'[ab](\d)', expand=True)
     0
0    1
1    2
2  NaN
>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object
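
Named groups make the result easier to work with, because the group names become column names. A minimal sketch reusing the same Series; the group names `letter` and `digit` are chosen here for illustration:

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# Named capture groups become the column names of the result
parts = s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)', expand=True)
print(parts)

# Rows that do not match ('c3') are filled with NaN
print(parts.iloc[2].isna().all())

# One group with expand=False gives a Series instead of a DataFrame
digits = s.str.extract(r'[ab](\d)', expand=False)
print(type(digits).__name__)
```

Passing expand explicitly keeps the return type the same regardless of which pandas version is installed.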

10.Series.str.extractall(pat, flags=0)

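For each string in the Series, extracts the capture groups from every match of the regular expression, not just the first one.

Example: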
>>> s= pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
>>> s
A    a1a2
B      b1
C      c1
dtype: object
>>> s.str.extractall(r'[ab](\d)')
         0
  match   
A 0      1
  1      2
B 0      1
>>> s.str.extractall(r"[ab](?P<digit>\d)")
        digit
  match      
A 0         1
  1         2
B 0         1
>>> s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
        letter digit
  match             
A 0          a     1
  1          a     2
B 0          b     1
>>> s.str.extractall(r"(?P<letter>[ab])?(?P<digit>\d)")
        letter digit
  match             
A 0          a     1
  1          a     2
B 0          b     1
C 0        NaN     1
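
The result of extractall has a two-level index: the original label and a `match` level that numbers the matches within each string. One possible way to work with it, sketched on the same Series:

```python
import pandas as pd

s = pd.Series(['a1a2', 'b1', 'c1'], index=['A', 'B', 'C'])
matches = s.str.extractall(r'(?P<letter>[ab])(?P<digit>\d)')

# The second index level is named 'match'
print(matches.index.names)

# Keep only the first match of each string
first = matches.xs(0, level='match')
print(first)

# Count how many matches each original row produced;
# rows with no match ('C') do not appear at all
per_row = matches.groupby(level=0).size()
print(per_row.to_dict())   # {'A': 2, 'B': 1}
```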