pandas Data Operations

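String Methods

A Series object provides a set of string-processing methods through its str attribute, which makes it easy to apply an operation to every element of the array. Missing values (NaN) stay NaN in the result.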

import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d','c_d_e',np.nan,'f_g_h'])
t
0    a_b_c_d
1      c_d_e
2        NaN
3      f_g_h
dtype: object
t.str.cat(['A','B','C','D'],sep=',') # concatenate element-wise with another list, joined by ','
0    a_b_c_d,A
1      c_d_e,B
2          NaN
3      f_g_h,D
dtype: object
t.str.split('_') # split each string on '_'
0    [a, b, c, d]
1       [c, d, e]
2             NaN
3       [f, g, h]
dtype: object
t.str.get(0) # get the element at the given position (here, the first character)
0      a
1      c
2    NaN
3      f
dtype: object
t.str.replace("_", ".") # replace every '_' with '.'
0    a.b.c.d
1      c.d.e
2        NaN
3      f.g.h
dtype: object
t.str.pad(10, fillchar="?") # pad on the left to width 10 (side="left" is the default)
0    ???a_b_c_d
1    ?????c_d_e
2           NaN
3    ?????f_g_h
dtype: object
t.str.pad(10, side="right", fillchar="?") # pad on the right to width 10
0    a_b_c_d???
1    c_d_e?????
2           NaN
3    f_g_h?????
dtype: object
t.str.center(10, fillchar="?") # pad on both sides to width 10
0    ?a_b_c_d??
1    ??c_d_e???
2           NaN
3    ??f_g_h???
dtype: object
t.str.find('d') # lowest index of the substring, searching from the left; -1 if not found
0    6.0
1    2.0
2    NaN
3   -1.0
dtype: float64
t.str.rfind('d') # highest index of the substring, searching from the right; -1 if not found
0    6.0
1    2.0
2    NaN
3   -1.0
dtype: float64
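
In the example above, find and rfind return the same positions because 'd' occurs at most once in each string. The following minimal sketch uses a hypothetical Series u (not part of the original data) and searches for '_', which occurs several times, so the two methods should differ.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: '_' appears more than once in the first two strings
u = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'fgh'])

first = u.str.find('_')   # lowest index of '_', or -1 if absent
last = u.str.rfind('_')   # highest index of '_', or -1 if absent

# Show both results side by side; the NaN row stays NaN in both columns
print(pd.DataFrame({'find': first, 'rfind': last}))
```

For 'a_b_c_d' the first underscore is at index 1 and the last at index 5, while 'fgh' contains none, so both methods give -1.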

Transposing Data (Swapping Rows and Columns)

dates = pd.date_range('20130101',periods=10)
dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
               '2013-01-09', '2013-01-10'],
              dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(10,4),index=dates,columns=['A','B','C','D'])
df.head()
                   A         B         C         D
2013-01-01 -0.665173  0.516813  0.745156 -0.303295
2013-01-02 -0.953574  2.125147  0.238382 -0.400209
2013-01-03 -0.233966  2.066662  0.331000 -2.802471
2013-01-04  2.038273  0.982127 -1.096000 -1.051818
2013-01-05 -1.438657 -1.208042 -0.375673  0.384522
df.head().T # transpose: rows become columns and columns become rows
   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05
A   -0.665173   -0.953574   -0.233966    2.038273   -1.438657
B    0.516813    2.125147    2.066662    0.982127   -1.208042
C    0.745156    0.238382    0.331000   -1.096000   -0.375673
D   -0.303295   -0.400209   -2.802471   -1.051818    0.384522
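
Because the frame above is filled with random numbers, the effect of .T can be easier to check on fixed values. This is a minimal sketch with a hypothetical 3x2 frame named small (not from the original post): after transposing, the old column labels become the index and the old index becomes the column labels.

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible
small = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

flipped = small.T  # rows and columns swap places

print(small.shape, flipped.shape)    # shape goes from (3, 2) to (2, 3)
print(list(flipped.index))           # former column labels
print(list(flipped.columns))         # former index labels
print(flipped.loc['B', 'y'])         # same cell as small.loc['y', 'B']
```

Transposing twice returns the original layout.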

Applying a Function to the Data

df.head().apply(np.cumsum) # cumsum: running (cumulative) sum down each column
                   A         B         C         D
2013-01-01 -0.665173  0.516813  0.745156 -0.303295
2013-01-02 -1.618747  2.641960  0.983537 -0.703504
2013-01-03 -1.852713  4.708622  1.314537 -3.505975
2013-01-04  0.185560  5.690749  0.218537 -4.557793
2013-01-05 -1.253098  4.482707 -0.157135 -4.173271
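
By default, apply passes each column to the function. The sketch below, on a hypothetical integer frame named small (not from the original post), compares the default with axis=1, which passes each row instead, and with the built-in DataFrame.cumsum, which should match the default apply(np.cumsum) result.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data so the running sums are easy to verify by hand
small = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

by_column = small.apply(np.cumsum)        # default axis=0: accumulate down each column
by_row = small.apply(np.cumsum, axis=1)   # axis=1: accumulate across each row
builtin = small.cumsum()                  # built-in method, accumulates down columns

print(by_column)
print(by_row)
```

In by_column, column A accumulates to 1, 3, 6. In by_row, column B holds A + B for each row: 11, 22, 33.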

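Frequency

value_counts counts how many times each value occurs, similar to a histogram. The result is sorted by count, most frequent first.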

s = pd.Series(np.random.randint(0, 7, size=10))
s
0    3
1    3
2    1
3    6
4    3
5    3
6    5
7    2
8    1
9    0
dtype: int32
s.value_counts()
3    4
1    2
6    1
5    1
2    1
0    1
dtype: int64
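
A short sketch of two common variations, rebuilding the ten values printed above as a fixed list so the result is reproducible: normalize=True returns relative frequencies instead of counts, and sort_index orders the result by value rather than by count. The variable names are illustrative only.

```python
import pandas as pd

# The ten values from the random Series shown above, written out explicitly
s = pd.Series([3, 3, 1, 6, 3, 3, 5, 2, 1, 0])

counts = s.value_counts()                 # occurrences, most frequent first
shares = s.value_counts(normalize=True)   # fraction of all rows per value
by_value = counts.sort_index()            # same counts, ordered by value

print(shares)
print(by_value)
```

Here the value 3 appears in 4 of 10 rows, so its share is 0.4. The value 4 never occurs and is therefore absent from the result.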