Python: Pandas Basics


coconut milk?


Article link: https://blog.csdn.net/supersusususu/article/details/107224353


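Indexing

1. Check whether the index contains duplicate labels:

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(5), index=list('abcda'))  # the label 'a' appears twice

Ans = s.index.is_unique  # False, because 'a' is duplicated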

2. Handle duplicate index labels:

Keep the first value for each label:

s.groupby(s.index).first()

Sum the values that share a label:

s.groupby(s.index).sum()
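The duplicate-handling steps above can be sketched with fixed values so the results are easy to follow. The numbers and the variable names `first`, `total`, and `deduped` are chosen for illustration only; `Index.duplicated` is shown as one further way to drop repeated labels.

```python
import pandas as pd

# Illustrative data: the label 'a' appears at positions 0 and 4
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("abcda"))

print(s.index.is_unique)  # False

# Collapse duplicate labels by grouping on the index itself
first = s.groupby(s.index).first()  # keeps 1.0 for 'a'
total = s.groupby(s.index).sum()    # 1.0 + 5.0 for 'a'

# Alternative: filter with a boolean mask that marks repeated labels
deduped = s[~s.index.duplicated(keep="first")]
print(first, total, deduped, sep="\n")
```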

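3. Multi-level index (MultiIndex): represent three-dimensional data in a two-dimensional structure

a = [['a', 'a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 3, 1, 2, 2, 3]]
t = list(zip(*a))  # pair the two lists into (level1, level2) tuples

print(t)

index = pd.MultiIndex.from_tuples(t, names=['level1', 'level2'])
print(index)

s = pd.Series(np.random.rand(7), index=index)
print(s)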
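As a rough illustration of how a two-level index encodes an extra dimension, the sketch below selects by the outer level, by a full key, and then pivots the inner level into columns with `unstack`. The integer values are made up for the example.

```python
import pandas as pd

levels = [["a", "a", "a", "b", "b", "c", "c"], [1, 2, 3, 1, 2, 2, 3]]
index = pd.MultiIndex.from_arrays(levels, names=["level1", "level2"])
s = pd.Series(range(7), index=index)  # illustrative values 0..6

sub = s.loc["a"]        # all rows under outer label 'a', indexed by level2
one = s.loc[("b", 2)]   # a single value addressed by both levels

# Move the inner level into columns; combinations that do not exist become NaN
wide = s.unstack("level2")
print(wide)
```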

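4. Swap index levels (df is assumed to have a two-level row index with levels named 'row-1' and 'row-2'):

df2 = df.swaplevel('row-1', 'row-2')

5. Sort by an index level:

# sortlevel() has been removed from pandas; sort_index(level=...) replaces it
df2.sort_index(level=1)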
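A small sketch of swapping and sorting levels, assuming a DataFrame whose row index has levels named 'row-1' and 'row-2' (the data and the column name `x` are hypothetical). It uses `sort_index(level=...)`, which current pandas provides in place of the older `sortlevel`.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", 2), ("a", 1), ("b", 2), ("b", 1)], names=["row-1", "row-2"]
)
df = pd.DataFrame({"x": np.arange(4)}, index=idx)  # hypothetical data

# Swap the two levels; the rows themselves stay in the same order
df2 = df.swaplevel("row-1", "row-2")

# Sort by the new outer level (remaining levels are used as tie-breakers)
df3 = df2.sort_index(level=0)
print(df3)
```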

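6. Turn a column into the index, and turn the index back into a column:

df.set_index('c')  # use column 'c' as the index

df.reset_index()  # move the index back into an ordinary column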
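The round trip can be sketched as follows, with a hypothetical two-column frame. Both calls return new objects and leave the original DataFrame unchanged.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "c": ["x", "y", "z"]})  # illustrative data

indexed = df.set_index("c")        # 'c' becomes the row index
restored = indexed.reset_index()   # 'c' returns as a regular column

print(indexed)
print(restored)
```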

Group operations

Split --> apply --> combine:

# key: an array or Series of group labels aligned with df
df['data1'].groupby(key).mean()
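A minimal sketch of the split-apply-combine pattern, assuming the grouping key is another column of the same frame. The column names `key1` and `data1` and the values are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"key1": ["a", "a", "b", "b", "a"], "data1": [1.0, 2.0, 3.0, 4.0, 6.0]}
)

# Split data1 by the labels in key1, take the mean of each piece,
# and combine the results into a Series indexed by group label
means = df["data1"].groupby(df["key1"]).mean()
print(means)
```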

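# Group columns by mapping each column name to a group label
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'orange', 'e': 'blue'}
grouped = df.groupby(mapping, axis=1)  # axis=1 is deprecated since pandas 2.1; there, use df.T.groupby(mapping)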
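Grouping columns through a dict can be sketched like this. Because the `axis=1` argument of `groupby` is deprecated in recent pandas releases, this example transposes the frame, groups its rows, and transposes back; the data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], columns=list("abcde"))
mapping = {"a": "red", "b": "red", "c": "blue", "d": "orange", "e": "blue"}

# After transposing, the column names are row labels, so the dict maps
# each of them to a group; the final .T restores the original orientation
by_color = df.T.groupby(mapping).sum().T
print(by_color)
```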

Grouping with a function:

The function is called once per index label, and its return value is used as the group key.

def _group_key(idx):
    print(idx)
    return idx
df.groupby(_group_key)
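To make the effect of a key function visible, the sketch below groups rows by the length of their index label using the built-in `len`. The labels and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"n": [1, 2, 3, 4]}, index=["ab", "cd", "efg", "hij"])

# len is called on every index label; labels of equal length share a group
by_len = df.groupby(len).sum()
print(by_len)
```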

Aggregation functions

Built-in aggregations:

df.groupby('key1').describe()

Custom aggregation functions:

def peak_range(s):
    print(type(s))
    return s.max() - s.min()
grouped.agg(peak_range)
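A self-contained sketch of a custom aggregation, assuming a frame grouped by a `key1` column (the data is illustrative). It also shows that a custom function can be mixed with a built-in one by passing a list to `agg`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key1": list("aabb"),
        "data1": [1.0, 5.0, 2.0, 4.0],
        "data2": [10.0, 30.0, 20.0, 25.0],
    }
)

def peak_range(s):
    """Return the spread (max minus min) of one group's values."""
    return s.max() - s.min()

grouped = df.groupby("key1")
ranges = grouped.agg(peak_range)                    # applied to each column per group
mixed = grouped["data1"].agg(["mean", peak_range])  # built-in and custom together
print(ranges)
print(mixed)
```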

Reading and writing files

1. Irregular separators:

# r'\s+' matches one or more whitespace characters between fields
pd.read_table('ch04/ex3.csv', sep=r'\s+')
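Since the file from the example is not available here, the following sketch feeds an in-memory string with uneven spacing to the parser instead. The content of `raw` is made up; the regular expression is written as a raw string to avoid escape-sequence warnings.

```python
import io
import pandas as pd

# Fields are separated by a varying number of spaces
raw = "A   B  C\n1  2    3\n4    5 6\n"

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```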

2. Handling missing values:

# read_csv assumes comma-separated data (read_table defaults to tabs)
pd.read_csv('ch04/ex5.csv', na_values=['NA', 'NULL', 'foo'])
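The effect of `na_values` can be sketched with a small in-memory CSV (invented data). Strings such as NA and NULL are already treated as missing by default, so in this example the visible difference comes from the extra marker `foo`.

```python
import io
import pandas as pd

raw = "id,value,label\n1,NULL,x\n2,3.5,foo\n3,NA,y\n"

# Every listed string is parsed as a missing value (NaN)
df = pd.read_csv(io.StringIO(raw), na_values=["NA", "NULL", "foo"])
print(df)
print(df.isna().sum())  # number of missing entries per column
```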

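3. Reading in chunks:

tr = pd.read_csv('ch04/ex6.csv', chunksize=1000)
result = pd.Series(dtype='float64')  # empty accumulator with an explicit dtype
for chunk in tr:
    # add this chunk's counts to the running total
    result = result.add(chunk['key'].value_counts(), fill_value=0)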
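The chunked counting loop can be tried without a large file by reading from an in-memory string with a small `chunksize`. The generated data and the chunk size of 4 are arbitrary choices for this sketch.

```python
import io
import pandas as pd

# Nine rows with keys a, b, c, a, b, c, a, a, b
raw = "key,val\n" + "".join(f"{k},{i}\n" for i, k in enumerate("abcabcaab"))

reader = pd.read_csv(io.StringIO(raw), chunksize=4)  # yields DataFrames of up to 4 rows

result = pd.Series(dtype="float64")
for chunk in reader:
    # fill_value=0 treats keys missing on either side as a count of zero
    result = result.add(chunk["key"].value_counts(), fill_value=0)

result = result.sort_values(ascending=False)
print(result)
```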

4. Writing data (without the index):

df.to_csv('ch04/ex_out.csv', index=False)
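To see what `index=False` leaves out, this sketch writes to an in-memory buffer instead of a file. The frame and its row labels are hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, index=["r1", "r2"])

buf = io.StringIO()
df.to_csv(buf, index=False)  # the row labels r1 and r2 are not written
text = buf.getvalue()
print(text)
```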

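Time series

Converting a timestamp-indexed series to a period-indexed series:

# 'M' means month end; pandas 2.2 and later spell this alias 'ME'
s = pd.Series(np.random.randn(5), index=pd.date_range('2020-07-09', periods=5, freq='M'))

s.to_period()

Note that the conversion is lossy: a period stores only the month, so converting it back with to_timestamp() does not recover the original day (by default it returns the start of each period).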
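The loss of the day component can be sketched with a few hand-picked dates (illustrative values). Converting to monthly periods and back yields the first day of each month, not the original dates.

```python
import pandas as pd

idx = pd.to_datetime(["2020-07-09", "2020-08-15", "2020-09-30"])
s = pd.Series([1.0, 2.0, 3.0], index=idx)

p = s.to_period("M")     # PeriodIndex: only year and month are kept
back = p.to_timestamp()  # defaults to the start of each period

print(p.index)
print(back.index)
```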

Resampling:

# the how= argument has been removed; call the aggregation on the resampler instead
ts.resample('5min').sum()
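A short sketch of downsampling, assuming a series sampled once per minute (the twelve values are invented). Each 5-minute bin is labeled by its start time, and the last bin here is only partly filled.

```python
import pandas as pd

idx = pd.date_range("2020-07-09 00:00", periods=12, freq="min")
ts = pd.Series(range(12), index=idx)

# Bins: 00:00-00:04, 00:05-00:09, 00:10-00:14
five = ts.resample("5min").sum()
print(five)
```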
