Plotting multiple box plots in Python: box plots for continuous data in Python

I have a CSV file with two columns:

col1: timestamp data (yyyy-mm-dd hh:mm:ss.ms, covering 8 months)

col2: heat data (a continuous variable)

Since there are almost 50k records, I would like to partition col1 (the timestamp column) into months or weeks and then draw box plots of the heat data for each period.

I tried this in R, but it takes a long time, so I need help doing it in Python. I think I need to use seaborn.boxplot.

Please guide.

Solution

Group by Frequency then plot groups

import numpy as np

import pandas as pd

from matplotlib import pyplot as plt

# assumes NO header line in csv

df = pd.read_csv(r'\file\path', names=['time','temp'], parse_dates=[0])
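As a minimal sketch of what that call does, the snippet below parses a tiny in-memory stand-in for the file. The sample rows are invented for illustration, and the timestamp layout is assumed to match the yyyy-mm-dd hh:mm:ss.ms format described in the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample rows standing in for the real file (no header line).
csv_text = (
    "2011-01-01 00:00:00.000,12.5\n"
    "2011-01-01 01:00:00.000,47.1\n"
    "2011-01-01 02:00:00.000,33.8\n"
)

# names= supplies the column labels because the file has no header;
# parse_dates=[0] converts the first column to datetimes.
sample = pd.read_csv(StringIO(csv_text), names=['time', 'temp'], parse_dates=[0])

print(sample.dtypes)
```

If the 'time' column still shows up as object dtype with the real file, the timestamps probably do not all share one format.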

I will use some fake data, 30 days of hourly samples.

heat = np.random.random(24*30) * 100

dates = pd.date_range('1/1/2011', periods=24*30, freq='h')

df = pd.DataFrame({'time':dates,'temp':heat})

Set the timestamps as the DataFrame's index

df = df.set_index('time')

Now group by the period you want; seven days for this example

gb = df.groupby(pd.Grouper(freq='7D'))
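The question asks about months or weeks rather than fixed seven-day windows. A minimal sketch, assuming the calendar-based aliases 'MS' (month start) and 'W' (weeks ending on Sunday) are what is wanted; the 90 days of fake data and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Fake data: 90 days of hourly samples starting 1 Jan 2011.
rng = np.random.default_rng(0)
dates = pd.date_range('2011-01-01', periods=24 * 90, freq='60min')
df = pd.DataFrame({'temp': rng.random(len(dates)) * 100}, index=dates)

# Calendar months, labelled by the first day of each month.
monthly = df.groupby(pd.Grouper(freq='MS'))
# Calendar weeks (pandas' default 'W' alias ends each week on Sunday).
weekly = df.groupby(pd.Grouper(freq='W'))

# Number of samples that fall into each group.
month_sizes = monthly.size()
week_sizes = weekly.size()
print(month_sizes)
```

Either grouped object can be used in place of gb in the plotting code.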

Now you can plot each group separately

for g, week in gb:
    # week.plot()
    week.boxplot()
    plt.title(f'Week Of {g.date()}')
    plt.show()
    plt.close()

You can also draw all the groups on a single set of axes. I didn't realize you could do this, but it is pretty cool

ax = gb.boxplot(subplots=False)

plt.setp(ax.xaxis.get_ticklabels(),rotation=30)

plt.show()

plt.close()
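If more control over the tick labels is needed, the same picture can be assembled by hand with matplotlib. This is only a sketch of an alternative, not part of the approach above: it collects each week's values into a list and passes them to Axes.boxplot. The variable names and the choice of the week's start date as the label are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Fake data: 30 days of hourly samples.
rng = np.random.default_rng(1)
dates = pd.date_range('2011-01-01', periods=24 * 30, freq='60min')
df = pd.DataFrame({'temp': rng.random(len(dates)) * 100}, index=dates)

labels, samples = [], []
for start, week in df.groupby(pd.Grouper(freq='7D')):
    if week.empty:
        continue  # skip windows that contain no samples
    labels.append(str(start.date()))
    samples.append(week['temp'].to_numpy())

fig, ax = plt.subplots(figsize=(8, 4))
artists = ax.boxplot(samples)            # one box per seven-day window
ax.set_xticklabels(labels, rotation=30)  # label each box with its start date
ax.set_ylabel('temp')
```

Call plt.show() afterwards to display the figure.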

For the next part, use a longer fake data set: 300 days of hourly samples.

heat = np.random.random(24*300) * 100

dates = pd.date_range('1/1/2011', periods=24*300, freq='h')

df = pd.DataFrame({'time':dates,'temp':heat})

df = df.set_index('time')

To partition the data into five time periods and then get weekly boxplots of each:

Determine the total timespan, divide it by five, create a frequency alias from the result, then group by it.

dt = df.index[-1] - df.index[0]

dt = dt/5

alias = f'{dt.total_seconds()}s'

gb = df.groupby(pd.Grouper(freq=alias))

Each group is a DataFrame so iterate over the groups; create weekly groups from each and boxplot them.

for g, d_frame in gb:
    gb_tmp = d_frame.groupby(pd.Grouper(freq='7D'))
    ax = gb_tmp.boxplot(subplots=False)
    plt.setp(ax.xaxis.get_ticklabels(), rotation=90)
    plt.show()
    plt.close()

There might be a better way to do this; if so, I'll post it, or maybe someone will feel free to edit this. It looks like this could leave the last group without a full set of data. ...
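One way to check for a short last group is to count the samples per group. The sketch below rebuilds the 300 days of fake hourly data and prints the group sizes; it assumes the lowercase 's' seconds alias that recent pandas releases expect, and the variable names are illustrative only. With this data the final timestamp sits exactly on a bin edge, so it should end up alone in a sixth group.

```python
import numpy as np
import pandas as pd

# Fake data: 300 days of hourly samples.
rng = np.random.default_rng(2)
dates = pd.date_range('2011-01-01', periods=24 * 300, freq='60min')
df = pd.DataFrame({'temp': rng.random(len(dates)) * 100}, index=dates)

# One fifth of the total timespan, expressed in whole seconds.
span = df.index[-1] - df.index[0]
step_seconds = int(span.total_seconds() / 5)

# Count how many samples land in each group.
sizes = df.groupby(pd.Grouper(freq=f'{step_seconds}s')).size()
print(sizes)
```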

If you know that your data is periodic you can just use slices to split it up.

n = len(df) // 5

for tmp_df in (df[i:i+n] for i in range(0, len(df), n)):
    gb_tmp = tmp_df.groupby(pd.Grouper(freq='7D'))
    ax = gb_tmp.boxplot(subplots=False)
    plt.setp(ax.xaxis.get_ticklabels(), rotation=90)
    plt.show()
    plt.close()
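When the number of rows is not a multiple of five, the slicing above produces a short sixth chunk. A possible workaround, sketched here as an alternative rather than as the method above, is to split the row positions with numpy.array_split, which always returns exactly five nearly equal pieces. The 7203-row fake data set is chosen only to show the remainder case.

```python
import numpy as np
import pandas as pd

# Fake data whose length (7203) is deliberately not divisible by five.
rng = np.random.default_rng(3)
dates = pd.date_range('2011-01-01', periods=7203, freq='60min')
df = pd.DataFrame({'temp': rng.random(len(dates)) * 100}, index=dates)

# Plain slicing with a fixed step leaves a small leftover chunk.
n = len(df) // 5
slice_sizes = [len(df[i:i + n]) for i in range(0, len(df), n)]

# array_split spreads the remainder over the first chunks instead.
chunks = [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), 5)]
chunk_sizes = [len(c) for c in chunks]

print(slice_sizes)
print(chunk_sizes)
```

Each element of chunks is a DataFrame, so it can be grouped with pd.Grouper(freq='7D') and plotted exactly like tmp_df.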
