Merging data with different time frequencies using pandas join

Lupin123123

Last modified 2022-04-18 08:06:46


Category: data science. Tags: pandas, data science, big data, python

First published 2022-04-18 01:14:16

Article link: https://blog.csdn.net/qq_28972011/article/details/124240180


import pandas as pd
import numpy as np
import datetime


First, give the monthly data a time index at month granularity.

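# Build 165 month-end timestamps; end='2020/5' makes 2020-04-30 the last one.
# Note: pandas 2.2+ deprecates the 'M' alias in favour of 'ME'.
times = pd.date_range(periods=165, freq='M', end='2020/5')

# The monthly data lists the newest month first, so reverse the order
# and format each timestamp as a 'YYYY-MM' string.
times = times[::-1].strftime('%Y-%m')

# Use the month strings as the index of the monthly data.
df = pd.read_excel(r'data by month.xlsx')
df = df.set_index(times)

df.to_excel(r'./1.xlsx')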
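
The index construction above can be tried on a small range without the Excel file. This is a minimal sketch, not the original data: the `monthly` frame and its `value` column are made up, and the month-start alias `'MS'` is swapped in for `'M'` only because it is spelled the same across pandas versions. The resulting 'YYYY-MM' labels are the same either way.

```python
import pandas as pd

# Five month-start timestamps ending at April 2020 (assumed toy range).
months = pd.date_range(end='2020-04-01', periods=5, freq='MS')

# Reverse so the newest month comes first, then format as 'YYYY-MM' strings.
labels = months[::-1].strftime('%Y-%m')

# Hypothetical stand-in for the monthly spreadsheet, newest row first.
monthly = pd.DataFrame({'value': [5, 4, 3, 2, 1]}, index=labels)
print(monthly)
```

The number of generated labels has to equal the number of rows in the data, which is why the original code hard-codes `periods=165`.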

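For the DataFrame that serves as the base of the merge (the weekly data), add an extra column to act as the sort key.

Without it, the row order of the joined result can come out wrong.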

df1 = pd.read_excel(r'./data by week.xlsx', header=1, usecols=range(0,4))

date_col = "时间(该日期所在“周”的煤炭价格)年/月/日"
dates = pd.to_datetime(df1[date_col])

# Keep the full dates; they become the index of the final result.
tmp = dates.dt.strftime('%Y-%m-%d')

# Reduce each date to 'YYYY-MM' so it matches the monthly index of df.
df1[date_col] = dates.dt.strftime('%Y-%m')
# Row counter that records the original order of the weekly rows.
df1['num'] = np.arange(len(df1))
df1.set_index(date_col, inplace=True)

df.head(3)

	山西	内蒙古	陕西	发电量
2020-04	8595.4	8527.6	5600.6	190.1
2020-03	9479.4	8906.1	5710.2	201.1
2020-02	7596.0	8422.0	5490.0	225.0

df1.head(3)

	价格低值(元/吨)	价格高值(元/吨)	价格平均值(元/吨)	num
时间(该日期所在“周”的煤炭价格)年/月/日
2020-04	470	480	475.0	0
2020-04	475	485	480.0	1
2020-04	490	495	492.5	2
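
The month key and row counter built for the weekly data can be illustrated on toy values. This is only a sketch under assumptions: the frame `weekly` and the column names `date`, `price` and `month` are invented for the example and do not come from the spreadsheet.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly data, newest first.
weekly = pd.DataFrame({
    'date': ['2020-04-10', '2020-04-03', '2020-03-27'],
    'price': [510.0, 532.5, 540.0],
})

dates = pd.to_datetime(weekly['date'])
# Join key: every week of the same month gets the same 'YYYY-MM' label.
weekly['month'] = dates.dt.strftime('%Y-%m')
# Helper column that remembers the original row order.
weekly['num'] = np.arange(len(weekly))
weekly = weekly.set_index('month')
print(weekly)
```

Several weekly rows now share one index label, so the index is no longer unique. The `num` column is what allows the original order to be restored after the join.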

Merge with df.join(), then sort.

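# Left join on the index: every weekly row picks up the values of its month.
result = df1.join(df)
# Restore the original weekly order, then drop the helper column.
result.sort_values(by='num', ascending=True, inplace=True)
result.drop(columns='num', inplace=True)
# Replace the month index with the full dates saved in tmp.
result.set_index(tmp, inplace=True)
result.to_excel(r'./2.xlsx')
result.head()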

	价格低值(元/吨)	价格高值(元/吨)	价格平均值(元/吨)	山西	内蒙古	陕西	发电量
时间(该日期所在“周”的煤炭价格)年/月/日
2020-04-30	470	480	475.0	8595.4	8527.6	5600.6	190.1
2020-04-24	475	485	480.0	8595.4	8527.6	5600.6	190.1
2020-04-17	490	495	492.5	8595.4	8527.6	5600.6	190.1
2020-04-10	505	515	510.0	8595.4	8527.6	5600.6	190.1
2020-04-03	530	535	532.5	8595.4	8527.6	5600.6	190.1
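
The whole join, sort and re-index sequence can be reproduced end to end on toy data. The sketch below assumes two small made-up frames (`monthly` with an `output` column, `weekly` with `date` and `price`); only the technique mirrors the steps above, not the actual figures.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly table indexed by 'YYYY-MM' strings.
monthly = pd.DataFrame({'output': [190.1, 201.1]},
                       index=['2020-04', '2020-03'])

# Hypothetical weekly table, newest first.
weekly = pd.DataFrame({
    'date': pd.to_datetime(['2020-04-10', '2020-04-03', '2020-03-27']),
    'price': [510.0, 532.5, 540.0],
})

# Save the full dates, add the order column, and switch to a month index.
daily_labels = weekly['date'].dt.strftime('%Y-%m-%d')
weekly['num'] = np.arange(len(weekly))
weekly.index = weekly['date'].dt.strftime('%Y-%m')
weekly = weekly.drop(columns='date')

# Left join on the index: each weekly row receives its month's values.
result = weekly.join(monthly)

# Restore the weekly order, drop the helper, and put the full dates back.
result = result.sort_values('num').drop(columns='num')
result.index = pd.Index(daily_labels.to_numpy(), name='date')
print(result)
```

Both April rows carry the same monthly `output` value, which matches the repeated monthly figures in the real result above.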
