A Summary of NumPy String Arrays

Original article, last modified 2023-09-15 12:31:15

Licensed under CC 4.0 BY-SA

First published 2022-11-21 09:08:46

Contents

- List of string functions
- Function notes

The char module in NumPy (numpy.char) wraps a number of functions for processing string arrays.

List of string functions

Category	Functions
Creation	array, asarray, chararray
Operations	add, multiply
Padding	center, ljust, rjust, zfill
Case conversion	lower, upper, capitalize, title, swapcase
Stripping	lstrip, rstrip, strip
Replacement	expandtabs, replace, translate
Splitting	rsplit, split, splitlines
Encoding/decoding	decode, encode
Comparison	equal, not_equal, greater, less, greater_equal, less_equal
Character class tests	isalpha, isalnum, isdecimal, isdigit, islower, isspace, isnumeric, istitle, isupper
Prefix/suffix tests	endswith, startswith
Counting	str_len, count
Searching	find, index, rfind, rindex

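These functions overlap heavily with the methods of Python's built-in str type. The main difference is that str methods operate on a single string, whereas the functions in numpy.char all take string arrays as their operands. For a str method XX, the numpy.char function of the same name is equivalent to calling str.XX once on every string in the array.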
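To make the correspondence concrete, here is a minimal sketch that compares numpy.char.upper with a plain list comprehension over str.upper. The sample array words is made up for illustration.

```python
import numpy as np

# Illustrative sample data (not from the original article)
words = np.array(["tiny", "cool", "numpy"])

# Element-wise call through numpy.char
upper_np = np.char.upper(words)

# The same result spelled out with the plain str method
upper_py = np.array([w.upper() for w in words])

print(upper_np)
print(np.array_equal(upper_np, upper_py))
```

The two arrays should compare equal, which is all the "same name, applied per element" rule claims.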

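The function names also follow a pattern: the l and r prefixes indicate whether the operation works from the left or from the right, and the is prefix marks a test on the character class of each string, whose return value is always a boolean array.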
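A small sketch of both naming patterns, using made-up sample arrays: the is-functions return one boolean per element, while find and rfind differ only in the direction of the search.

```python
import numpy as np

# Illustrative sample data
codes = np.array(["abc", "123", "a1b2"])

# is* functions return a boolean array, one flag per element
digit_mask = np.char.isdigit(codes)
alpha_mask = np.char.isalpha(codes)
print(digit_mask, alpha_mask)

paths = np.array(["a-b-a", "b-a"])

# find searches from the left, rfind from the right; both return indices
first_a = np.char.find(paths, "a")
last_a = np.char.rfind(paths, "a")
print(first_a, last_a)
```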

The rest of this article gives a brief walkthrough of the individual functions.

Function notes

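array and asarray are both conversion functions that turn an input such as a list of strings into a string array. chararray, by contrast, creates a string array from a given shape: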

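from numpy import char as npc
# chararray stores byte strings by default (unicode=False), hence the b'' prefix
a = npc.chararray((3, 3), itemsize=5)
a[:] = 'abc'
print(a)
'''
[[b'abc' b'abc' b'abc']
 [b'abc' b'abc' b'abc']
 [b'abc' b'abc' b'abc']]
'''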
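For the two conversion functions, a minimal sketch could look like the following. The list names is made up for illustration; the point is only that a plain Python list goes in and a chararray comes out, which then also offers the string operations as methods.

```python
from numpy import char as npc

# Illustrative input list
names = ["tiny", "cool"]

arr = npc.array(names)    # builds a chararray from the list
same = npc.asarray(arr)   # converts only if needed

print(type(arr).__name__, arr.dtype)
print(arr.upper())        # chararray exposes the string functions as methods
```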

add and multiply are the array versions of string concatenation and string repetition.
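As a quick illustration with made-up sample arrays, add concatenates two arrays element by element and multiply repeats every string a given number of times:

```python
import numpy as np
from numpy import char as npc

# Illustrative sample data
first = np.array(["ab", "cd"])
second = np.array(["12", "34"])

joined = npc.add(first, second)     # element-wise concatenation
repeated = npc.multiply(first, 3)   # each string repeated 3 times

print(joined)
print(repeated)
```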

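For the padding functions, take center as an example. Its call form is center(a, width[, fillchar]), where width is the total width of the resulting string and fillchar is the padding character (a space by default). zfill is slightly special: it pads a numeric string with zeros on the left.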

from numpy import char as npc
x = ["tiny", "cool"]
xc = npc.center(x,15)
print(xc)
# Output: ['      tiny     ' '      cool     ']
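The other padding functions follow the same call pattern. The sketch below uses an arbitrary width of 8 and '*' as the fill character, both chosen only for illustration, and then shows zfill on short numeric strings.

```python
from numpy import char as npc

x = ["tiny", "cool"]

left = npc.ljust(x, 8, "*")    # text on the left, padding on the right
right = npc.rjust(x, 8, "*")   # text on the right, padding on the left
nums = npc.zfill(["7", "42"], 4)  # left-pad with zeros to width 4

print(left)
print(right)
print(nums)
```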

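For the stripping functions, take strip as an example. Its call form is strip(a[, chars]), where chars is the set of characters to remove; by default whitespace is removed. The strings generated above happen to have spaces on both sides, so they are a convenient way to try strip out: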

>>> print(npc.strip(xc))
['tiny' 'cool']
>>> print(npc.lstrip(xc))
['tiny     ' 'cool     ']
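The chars argument is treated as a set of characters rather than a literal prefix or suffix, as with str.strip. A small sketch with made-up input:

```python
from numpy import char as npc

# Illustrative sample data
raw = ["--tiny--", "**cool--"]

both = npc.strip(raw, "-*")        # drop any '-' or '*' from both ends
right_only = npc.rstrip(raw, "-")  # drop '-' from the right end only

print(both)
print(right_only)
```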

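The splitting functions break every string in the array into a list of strings, so the original string array becomes an object array of lists. splitlines splits each string by line; the other functions accept a separator. Taking split as an example: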

>>> import numpy as np
>>> x = np.array(['abc','cde','def'])
>>> npc.split(x,'a')
array([list(['', 'bc']), list(['cde']), list(['def'])], dtype=object)
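The difference between split and rsplit only shows up when the number of splits is limited. The sketch below uses made-up path-like strings and passes 1 as the third argument (maxsplit), then applies splitlines to a string containing a newline.

```python
import numpy as np
from numpy import char as npc

# Illustrative sample data
paths = np.array(["a/b/c", "d/e"])

left_first = npc.split(paths, "/", 1)    # split once, starting from the left
right_first = npc.rsplit(paths, "/", 1)  # split once, starting from the right
lines = npc.splitlines(np.array(["one\ntwo", "three"]))

print(left_first)
print(right_first)
print(lines)
```

Each result is an object array whose elements are ordinary Python lists, so the lists may have different lengths.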