pandas usage notes

hhggggghhh

Article link: https://blog.csdn.net/weixin_35834894/article/details/108930322

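Summary (auto-generated by CSDN): This article covers using pandas for date handling, resampling, grouping and aggregation, modifying data, and plotting. It shows how to convert timestamps to local dates, set a date index, build a date range, and downsample and upsample. It also shows grouping by one or several columns with aggregation, changing the type of a DataFrame column, filling missing values, and merging data with merge.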

References

pandas (Chinese documentation)

Dates

Convert timestamps to local dates

pd.to_datetime(list(data['timestamp']), unit='s', utc=True).tz_convert('Asia/Shanghai').strftime("%Y-%m-%d %H:%M:%S")
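The same conversion can also be done on the column itself through the .dt accessor. A minimal sketch, assuming the 'timestamp' column holds epoch seconds; the two sample values here are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: epoch seconds in a 'timestamp' column.
data = pd.DataFrame({"timestamp": [1474646400, 1474646700]})

local = (
    pd.to_datetime(data["timestamp"], unit="s", utc=True)  # tz-aware UTC Series
    .dt.tz_convert("Asia/Shanghai")                         # convert to UTC+8
    .dt.strftime("%Y-%m-%d %H:%M:%S")                       # format as strings
)
print(local.tolist())
```

Passing utc=True first matters: tz_convert only works on timezone-aware values.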

Convert the index to a DatetimeIndex

# data.index must contain date-like values
data.index = pd.DatetimeIndex(data.index)
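A date index is what makes resampling and date-string selection possible. A small sketch, assuming the index starts out as date strings (the sample frame is hypothetical):

```python
import pandas as pd

# Hypothetical frame whose index is plain strings.
data = pd.DataFrame(
    {"value": [0.25, 0.24]},
    index=["2016-09-24 00:00:00", "2016-09-24 00:05:00"],
)

data.index = pd.DatetimeIndex(data.index)  # strings -> DatetimeIndex
print(type(data.index).__name__)

# With a DatetimeIndex, a whole day can be selected with a date string.
print(data.loc["2016-09-24"])
```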

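Build a date range

# 'timestamp' should already hold datetime values; freq='T' means one minute ('min' in newer pandas)
pd.date_range(data['timestamp'].min(), data['timestamp'].max(), freq='T')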
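If the column still holds epoch seconds, it is probably safer to convert it before building the range, since date_range would otherwise read the integers as nanoseconds. A sketch under that assumption, with made-up sample values:

```python
import pandas as pd

# Hypothetical epoch-second values, converted to datetimes first.
ts = pd.to_datetime(pd.Series([1474646400, 1474646700]), unit="s")

# One entry per minute between the first and last timestamp (both inclusive).
minutes = pd.date_range(ts.min(), ts.max(), freq="min")
print(len(minutes))
print(minutes[0], minutes[-1])
```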

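Resampling

Related CSDN material

Downsampling

data.resample('H').mean()

To take the value at a fixed interval instead of averaging, pandas has a built-in method, asfreq. The same thing can be done with .loc, but the asfreq approach below is clearly more convenient.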

data.asfreq('30T').head()
	timestamp	value
2016-09-24 00:00:00	1474646400	0.254939
2016-09-24 00:30:00	1474648200	0.233063
2016-09-24 01:00:00	1474650000	0.177441
2016-09-24 01:30:00	1474651800	0.118127
2016-09-24 02:00:00	1474653600	0.097237

data.head()
timestamp	value
2016-09-24 00:00:00	1474646400	0.254939
2016-09-24 00:05:00	1474646700	0.247444
2016-09-24 00:10:00	1474647000	0.238677
2016-09-24 00:15:00	1474647300	0.240315
2016-09-24 00:20:00	1474647600	0.218478
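The difference between the two approaches is easier to see on a tiny series. A sketch with synthetic 5-minute data (the values 0 to 11 are made up): resample(...).mean() averages each 30-minute bin, while asfreq just picks the row that falls on each 30-minute mark.

```python
import pandas as pd

# Synthetic 5-minute data: 00:00, 00:05, ..., 00:55 with values 0..11.
idx = pd.date_range("2016-09-24 00:00", periods=12, freq="5min")
data = pd.DataFrame({"value": range(12)}, index=idx)

mean_30 = data.resample("30min").mean()  # average of each 30-minute bin
pick_30 = data.asfreq("30min")           # the row at each 30-minute mark

print(mean_30)
print(pick_30)
```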

Upsampling
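Upsampling goes the other way: the index gets a finer frequency, and the new rows have to be filled somehow. A minimal sketch on synthetic 30-minute data, showing two common choices (forward fill and linear interpolation); neither is necessarily what the author used:

```python
import pandas as pd

# Synthetic 30-minute data: 00:00, 00:30, 01:00.
idx = pd.date_range("2016-09-24 00:00", periods=3, freq="30min")
data = pd.DataFrame({"value": [0.0, 3.0, 6.0]}, index=idx)

# Forward fill: each new 10-minute row repeats the last known value.
filled = data.resample("10min").ffill()

# Insert empty rows with asfreq(), then interpolate linearly between known points.
interp = data.resample("10min").asfreq().interpolate()

print(filled)
print(interp)
```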

Plotting

Adjust subplot spacing and subplot size. Subplot size is derived from the size of the parent figure.

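import matplotlib.pyplot as plt

plt.figure(figsize=(20, 10))  # create the figure first; subplots are sized from it
plt.subplot(311)
plt.title('every hour will get a data mean point')
data.resample('H').mean()['value'].plot()
plt.subplot(312)
plt.title('every hour will get a data point')
data.loc[data.index[::60], 'value'].plot()  # every 60th row; hourly only if the data is 1-minute
plt.subplot(313)
plt.title('raw data')
data['value'].plot()
plt.subplots_adjust(wspace=0, hspace=3)  # adjust spacing once the subplots exist
plt.show()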

Grouping and aggregation

Grouping

import pandas as pd
 
df = pd.DataFrame({'Country':['China','China', 'India', 'India', 'America', 'Japan', 'China', 'India'], 
                   'Income':[10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                    'Age':[5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})

 Age  Country  Income
0  5000    China   10000
1  4321    China   10000
2  1234    India    5000
3  4010    India    5002
4   250  America   40000
5   250    Japan   50000
6  4500    China    8000
7  4321    India    5000

Group by a single column

df_gb = df.groupby('Country')
for index, data in df_gb:
    print(index)
    print(data)
Output
America
   Age  Country  Income
4  250  America   40000
China
    Age Country  Income
0  5000   China   10000
1  4321   China   10000
6  4500   China    8000
India
    Age Country  Income
2  1234   India    5000
3  4010   India    5002
7  4321   India    5000
Japan
   Age Country  Income
5  250   Japan   50000

Group by multiple columns

df_gb = df.groupby(['Country', 'Income'])
for (index1, index2), data in df_gb:
    print((index1, index2))
    print(data)
 
Output
 
('America', 40000)
   Age  Country  Income
4  250  America   40000
('China', 8000)
    Age Country  Income
6  4500   China    8000
('China', 10000)
    Age Country  Income
0  5000   China   10000
1  4321   China   10000
('India', 5000)
    Age Country  Income
2  1234   India    5000
7  4321   India    5000
('India', 5002)
    Age Country  Income
3  4010   India    5002
('Japan', 50000)
   Age Country  Income
5  250   Japan   50000

Aggregation

df_agg = df.groupby('Country').agg(['min', 'mean', 'max'])
print(df_agg)
Output
   Age                    Income                     
          min         mean   max    min          mean    max
Country                                                     
America   250   250.000000   250  40000  40000.000000  40000
China    4321  4607.000000  5000   8000   9333.333333  10000
India    1234  3188.333333  4321   5000   5000.666667   5002
Japan     250   250.000000   250  50000  50000.000000  50000

num_agg = {'Age':['min', 'mean', 'max'], 'Income':['min', 'max']}
print(df.groupby('Country').agg(num_agg))
Output
      Age                    Income       
          min         mean   max    min    max
Country                                       
America   250   250.000000   250  40000  40000
China    4321  4607.000000  5000   8000  10000
India    1234  3188.333333  4321   5000   5002
Japan     250   250.000000   250  50000  50000

num_agg = {'Age':['min', 'mean', 'max']}
print(df.groupby('Country').agg(num_agg))
Output
  Age                   
          min         mean   max
Country                         
America   250   250.000000   250
China    4321  4607.000000  5000
India    1234  3188.333333  4321
Japan     250   250.000000   250
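The aggregated result has two-level column names such as ('Age', 'mean'), which can be awkward to work with. A small sketch, reusing the df from above, of picking one statistic and of flattening the column names; the flattened naming scheme is just one possible choice:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["China", "China", "India", "India", "America", "Japan", "China", "India"],
    "Income": [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
    "Age": [5000, 4321, 1234, 4010, 250, 250, 4500, 4321],
})

df_agg = df.groupby("Country").agg(["min", "mean", "max"])

# Select a single statistic with a (column, function) tuple.
age_mean = df_agg[("Age", "mean")]
print(age_mean)

# Flatten the two-level column names into plain strings like 'Age_mean'.
flat = df_agg.copy()
flat.columns = ["_".join(col) for col in flat.columns]
print(flat.columns.tolist())
```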

Merging

merge

data1.merge(data,left_on='timestamp',right_on='timestamp',how='left')
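With how='left', every row of the left table is kept, and rows without a match on the right get NaN. A sketch with two made-up tables; when the key has the same name on both sides, on='timestamp' is a shorter equivalent of left_on/right_on:

```python
import pandas as pd

# Hypothetical tables sharing a 'timestamp' key.
data1 = pd.DataFrame({"timestamp": [1, 2, 3], "a": [10, 20, 30]})
data = pd.DataFrame({"timestamp": [1, 3], "b": [0.1, 0.3]})

merged = data1.merge(data, on="timestamp", how="left")
print(merged)  # the row with timestamp 2 has NaN in column 'b'
```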

Modifying table data

Change the data type of a column

import pandas as pd
import matplotlib.pyplot as plt
import read_data1  # a local data-loading module, not part of pandas

data_read = read_data1.ReadData()

data = data_read.read_data(10427, '2019-07-25 12:00:00', '2019-09-20 15:00:00')
print(data.shape)
data['value'] = data['value'].astype('float64')  # cast the column to float64
print(data.dtypes)  # confirm the new dtype
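Since read_data1 is not generally available, here is a self-contained sketch of the same cast, assuming the 'value' column arrives as strings (the sample values are made up):

```python
import pandas as pd

# Hypothetical column that was read in as strings.
data = pd.DataFrame({"value": ["0.25", "0.24", "0.23"]})
print(data.dtypes)

data["value"] = data["value"].astype("float64")  # strings -> float64
print(data.dtypes)
```

astype raises an error if any entry cannot be parsed; pd.to_numeric(data["value"], errors="coerce") is an alternative that turns such entries into NaN instead.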

Fill a column by interpolation

%matplotlib inline
data2['value'].plot()
plt.show()

data2['value'].interpolate().plot()
plt.show()

Interpolation can also fill values based on time.
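The difference shows up when the timestamps are unevenly spaced. A minimal sketch with a made-up series: the default linear method treats rows as equally spaced, while method='time' weights by the actual time gaps (this requires a DatetimeIndex).

```python
import pandas as pd

# Unevenly spaced timestamps with one missing value in the middle.
idx = pd.to_datetime(["2016-09-24 00:00", "2016-09-24 00:10", "2016-09-24 00:30"])
s = pd.Series([0.0, float("nan"), 3.0], index=idx)

linear = s.interpolate()                # by position: halfway between 0 and 3
by_time = s.interpolate(method="time")  # by time: 10 of 30 minutes elapsed

print(linear)
print(by_time)
```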
