Forecast at energy(Smart meters in London)

在这里插入图片描述

To better follow the energy consumption, the government wants energy suppliers to install smart meters in every home in England, Wales and Scotland. There are more than 26 million homes for the energy suppliers to get to, with the goal of every home having a smart meter by 2020.
This roll out of meter is lead by the European Union who asked all member governments to look at smart meters as part of measures to upgrade our energy supply and tackle climate change. After an initial study, the British government decided to adopt smart meters as part of their plan to update our ageing energy system.
In this dataset, you will find a refactorised version of the data from the London data store, that contains the energy consumption readings for a sample of 5,567 London Households that took part in the UK Power Networks led Low Carbon London project between November 2011 and February 2014. The data from the smart meters seems associated only to the electrical consumption.


目标: 综合考虑气候、时间、季节、节日以及需求响应等因素,实现负荷预测功能。

数据源: https://www.kaggle.com/jeanmidev/smart-meters-in-london

处理流程:

  1. 将所有数据进行合并
  2. 根据每户每天的能源消耗数据,对不一致的住户统计数据进行规范化处理
  3. 探索天气状况等因素和能源消耗之间的关系
  4. 将英国假日数据添加到日水平数据中作为指标
  5. 拟合SARIMAX模型
  6. 拟合LSTM模型

目录结构:
在这里插入图片描述

环境:

Keras==2.0.2
TensorFlow==1.15.5
scikit-learn==0.24.1


数据整合

def first():
    for num in range(0,112):
        df = pd.read_csv("data/daily_dataset/block_"+str(num)+".csv")
        df = df[['day','LCLid','energy_sum']]
        df.reset_index()
        df.to_csv("data/hc/hc_"+str(num)+".csv")

    fout= open("data/energy.csv","a")
    for line in open("data/hc/hc_0.csv"):
        fout.write(line)

    ###TODO 能源数据
    # 预测未来的能源需求,因此只取能源总量,即给定家庭每天的总能源使用量。
    for num in range(0,112):
        f = open("data/hc/hc_"+str(num)+".csv")
        f.readline() # skip the header
        for line in f:
             fout.write(line)
        f.close()
    fout.close()
first()

各个家庭的数据收集是不同的,因此我们将使用“每个家庭的能源”作为预测的目标,而不是仅仅使用能源。
然而有相当多的独特的家庭,所以需要多次出来,我们的最终目标是预测整体消费预测,而不是在家庭水平。

energy = pd.read_csv('data/energy.csv')
housecount = energy.groupby('day')[['LCLid']].nunique()
#print(housecount.head(4))
energy = energy.groupby('day')[['energy_sum']].sum()
energy = energy.merge(housecount, on = ['day'])
energy = energy.reset_index()
energy.count()
energy.day = pd.to_datetime(energy.day,format='%Y-%m-%d').dt.date
energy['avg_energy'] = energy['energy_sum']/energy['LCLid']
#print(energy.describe())

在这里插入图片描述

天气信息

weather = pd.read_csv('data/weather_daily_darksky.csv')
print(weather.head(4))

在这里插入图片描述

# 每日级别的天气信息是使用数据集中的 darksky api
weather['day']=  pd.to_datetime(weather['time']) # day is given as timestamp
weather['day']=  pd.to_datetime(weather['day'],format='%Y%m%d').dt.date
# 选择数字变量
weather = weather[['temperatureMax', 'windBearing', 'dewPoint', 'cloudCover', 'windSpeed',
       'pressure', 'apparentTemperatureHigh', 'visibility', 'humidity',
       'apparentTemperatureLow', 'apparentTemperatureMax', 'uvIndex',
       'temperatureLow', 'temperatureMin', 'temperatureHigh',
       'apparentTemperatureMin', 'moonPhase','day']]
weather = weather.dropna()
### 天气状况与用电量的关系
weather_energy = energy.merge(weather,on='day')
#print(weather_energy.head(2))

在这里插入图片描述

#***1.  温度 ***
# 我们可以看到能量和温度有一个反比关系
# 在低温时,很可能通过加热器等增加能源消耗。
plt.rcParams['font.sans-serif'] = ['SimHei'] # 步骤一(替换sans-serif字体)
plt.rcParams['axes.unicode_minus'] = False   # 步骤二(解决坐标轴负数的负号显示问题)

def drow1():
    fig, ax1 = plt.subplots(figsize = (20,5))
    ax1.plot(weather_energy.day, weather_energy.temperatureMax, color = 'tab:orange')
    ax1.plot(weather_energy.day, weather_energy.temperatureMin, color = 'tab:pink')
    ax1.set_ylabel('Temperature')
    ax1.legend()
    ax2 = ax1.twinx()
    ax2.plot(weather_energy.day,weather_energy.avg_energy,color = 'tab:blue')
    ax2.set_ylabel('Average Energy/Household',color = 'tab:blue')
    ax2.legend(bbox_to_anchor=(0.0, 1.02, 1.0, 0.102))
    plt.title('能耗和温度')
    fig.tight_layout()
    plt.show()

在这里插入图片描述

#***2.  湿度 ***
# 湿度和平均能耗似乎有相同的趋势。
def drow2():
    fig, ax1 = plt.subplots(figsize = (20,5))
    ax1.plot(weather_energy.day, weather_energy.humidity, color = 'tab:orange')
    ax1.set_ylabel('Humidity',color = 'tab:orange')
    ax2 = ax1.twinx()
    ax2.plot(weather_energy.day,weather_energy.avg_energy,color = 'tab:blue')
    ax2.set_ylabel('Average Energy/Household',color = 'tab:blue')
    plt.title('能耗和湿度')
    fig.tight_layout()
    plt.show()

在这里插入图片描述

#***3.云层值  Cloud Cover***
def drow3():
    fig, ax1 = plt.subplots(figsize = (20,5))
    ax1.plot(weather_energy.day, weather_energy.cloudCover, color = 'tab:orange')
    ax1.set_ylabel('Cloud Cover',color = 'tab:orange')
    ax2 = ax1.twinx()
    ax2.plot(weather_energy.day,weather_energy.avg_energy,color = 'tab:blue')
    ax2.set_ylabel('Average Energy/Household',color = 'tab:blue')
    plt.title('Energy Consumption and Cloud Cover')
    fig.tight_layout()
    plt.show()

在这里插入图片描述

#***4.能见度 Visibility***
#> 能见度因素似乎不会影响能源消耗,
# 因为能见度很可能是一个户外因素,它的增加或减少不太可能影响家庭内部的能源消耗。
def drow4():
    fig, ax1 = plt.subplots(figsize = (20,5))
    ax1.plot(weather_energy.day, weather_energy.visibility, color = 'tab:orange')
    ax1.set_ylabel('Visibility',color = 'tab:orange')
    ax2 = ax1.twinx()
    ax2.plot(weather_energy.day,weather_energy.avg_energy,color = 'tab:blue')
    ax2.set_ylabel('Average Energy/Household',color = 'tab:blue')
    plt.title('Energy Consumption and Visibility')
    fig.tight_layout()
    plt.show()

在这里插入图片描述

#***5. 风速 Wind Speed***
#>  像能见度一样,风速似乎是一个户外因素,并不影响能源消耗。
def drow5():
    fig, ax1 = plt.subplots(figsize = (20,5))
    ax1.plot(weather_energy.day, weather_energy.windSpeed, color = 'tab:orange')
    ax1.set_ylabel('Wind Speed',color = 'tab:orange')
    ax2 = ax1.twinx()
    ax2.plot(weather_energy.day,weather_energy.avg_energy,color = 'tab:blue')
    ax2.set_ylabel('Average Energy/Household',color = 'tab:blue')
    plt.title('Energy Consumption and Wind Speed')
    fig.tight_layout()
    plt.show()

在这里插入图片描述


#***6. UV Index 紫外线指数***
#> 紫外线指数与能源消耗成反比关系
def drow6():
    fig, ax1 = plt.subplots(figsize=(20, 5))
    ax1.plot(weather_energy.day, weather_energy.uvIndex, color='tab:orange')
    ax1.set_ylabel('UV Index', color='tab:orange')
    ax2 = ax1.twinx()
    ax2.plot(weather_energy.day, weather_energy.avg_energy, color='tab:blue')
    ax2.set_ylabel('Average Energy/Household', color='tab:blue')
    plt.title('Energy Consumption and UV Index')
    fig.tight_layout()
    plt.show()

在这里插入图片描述

#***7. dewPoint 露点***
#> 露点是湿度和温度的函数,因此它有类似的关系。
def drow7():
    fig, ax1 = plt.subplots(figsize = (20,5))
    ax1.plot(weather_energy.day, weather_energy.dewPoint, color = 'tab:orange')
    ax1.set_ylabel('Dew Point',color = 'tab:orange')
    ax2 = ax1.twinx()
    ax2.plot(weather_energy.day,weather_energy.avg_energy,color = 'tab:blue')
    ax2.set_ylabel('Average Energy/Household',color = 'tab:blue')
    plt.title('Energy Consumption and Dew Point')
    fig.tight_layout()
    plt.show()

在这里插入图片描述

天气变量和能源消耗之间的相关性:

  • 能量与湿度高度正相关,与温度高度负相关。
  • 露点、紫外线指数显示与温度多重共线性,故弃用
  • 云层和能见度显示与湿度多重共线性,故弃用
  • 压力和月相与能量的相关性最小,故弃用
  • 风速与能量相关性较低

聚类分析

因为天气信息有很多变量,但不是所有的变量都有用。
通过聚类分析,是否可以基于温度、降水等颗粒天气数据定义一天的天气。

# scaling
scaler = MinMaxScaler()
weather_scaled = scaler.fit_transform(weather_energy[['temperatureMax','humidity','windSpeed']])

# optimum K
Nc = range(1, 20)
kmeans = [KMeans(n_clusters=i) for i in Nc]
score = [kmeans[i].fit(weather_scaled).score(weather_scaled) for i in range(len(kmeans))]
# 肘部曲线
def drow8():
    plt.plot(Nc,score)
    plt.xlabel('Number of Clusters')
    plt.ylabel('Score')
    plt.title('Elbow Curve')
    plt.show()

在这里插入图片描述

kmeans = KMeans(n_clusters=3, max_iter=600, algorithm = 'auto')
kmeans.fit(weather_scaled)
weather_energy['weather_cluster'] = kmeans.labels_

# 天气变量的关系
def drow9():
    plt.figure(figsize=(20,5))
    plt.subplot(1, 3, 1)
    plt.scatter(weather_energy.weather_cluster,weather_energy.temperatureMax)
    plt.title('Weather Cluster vs. Temperature')
    plt.subplot(1, 3, 2)
    plt.scatter(weather_energy.weather_cluster,weather_energy.humidity)
    plt.title('Weather Cluster vs. Humidity')
    plt.subplot(1, 3, 3)
    plt.scatter(weather_energy.weather_cluster,weather_energy.windSpeed)
    plt.title('Weather Cluster vs. WindSpeed')
    plt.show()

在这里插入图片描述

def drow10():
    fig, ax1 = plt.subplots(figsize = (10,7))
    ax1.scatter(weather_energy.temperatureMax,
                weather_energy.humidity,
                s = weather_energy.windSpeed*10,
                c = weather_energy.weather_cluster)
    ax1.set_xlabel('Temperature')
    ax1.set_ylabel('Humidity')
    plt.show()

在这里插入图片描述

ARIMAX

### UK Bank Holidays
holiday = pd.read_csv('data/uk_bank_holidays.csv')
holiday['Bank holidays'] = pd.to_datetime(holiday['Bank holidays'],format='%Y-%m-%d').dt.date
#holiday.head(4)
weather_energy = weather_energy.merge(holiday, left_on = 'day',right_on = 'Bank holidays',how = 'left')
weather_energy['holiday_ind'] = np.where(weather_energy['Bank holidays'].isna(),0,1)
### ARIMAX
weather_energy['Year'] = pd.DatetimeIndex(weather_energy['day']).year
weather_energy['Month'] = pd.DatetimeIndex(weather_energy['day']).month
weather_energy.set_index(['day'],inplace=True)
#** 7-3 train-test split**
model_data = weather_energy[['avg_energy','weather_cluster','holiday_ind']]

train = model_data.iloc[0:(len(model_data) - 30)]
test = model_data.iloc[len(train):(len(model_data) - 1)]

train['avg_energy'].plot(figsize=(25, 4))
test['avg_energy'].plot(figsize=(25, 4))
test.head(1)
# **ACF PACF **
plot_acf(train.avg_energy,lags=100)
plt.show()

plot_pacf(train.avg_energy,lags=50)
plt.show()

在这里插入图片描述
在这里插入图片描述

t = sm.tsa.adfuller(train.avg_energy, autolag='AIC')
pd.Series(t[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])


# 函数差分
def difference(dataset, interval):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset.iloc[i] - dataset.iloc[i - interval]
        diff.append(value)
    return diff
t  = sm.tsa.adfuller(difference(train.avg_energy,1), autolag='AIC')
pd.Series(t[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])


# 季节性成分相当低,但趋势相当强劲,夏季(4月至9月)用电量有明显下降。这可能是由于夏季白天变长。

s = sm.tsa.seasonal_decompose(train.avg_energy,freq=12)

s.seasonal.plot(figsize=(20,5))
s.trend.plot(figsize=(20,5))
s.resid.plot(figsize=(20,5))


endog = train['avg_energy']
exog = sm.add_constant(train[['weather_cluster','holiday_ind']])

#mod = sm.tsa.statespace.SARIMAX(endog=endog, exog=exog, order=(7,1,1),seasonal_order=(1,1, 0, 12),trend='c')
mod = sm.tsa.statespace.SARIMAX(endog=endog, order=(7,1,1),seasonal_order=(1,1, 0, 12),trend='c')
model_fit = mod.fit()
model_fit.summary()


#**Model Fit**
train['avg_energy'].plot(figsize=(25,10))
model_fit.fittedvalues.plot()
plt.show()

在这里插入图片描述

#**预测**

predict = model_fit.predict(start = len(train),end = len(train)+len(test)-1)
#predict = model_fit.predict(start = len(train),end = len(train)+len(test)-1,exog = sm.add_constant(test[['weather_cluster','holiday_ind']]))
test['predicted'] = predict.values
#print(test.tail(5))
'''
            avg_energy  weather_cluster  holiday_ind  predicted
day                                                            
2014-02-23   11.673756                0            0  11.556584
2014-02-24   10.586235                0            0  10.708762
2014-02-25   10.476498                0            0  11.445902
2014-02-26   10.375366                0            0  11.869046
2014-02-27   10.537250                0            0  11.484142

'''
test['residual'] = abs(test['avg_energy']-test['predicted'])
MAE = test['residual'].sum()/len(test)
MAPE = (abs(test['residual'])/test['avg_energy']).sum()*100/len(test)
print("MAE:", MAE) #平均绝对误差: 0.5851807015623742 值越小,说明预测模型拥有更好的精确度。
print("MAPE:", MAPE) #平均绝对百分比误差: 5.237486909890385  预测结果较真实结果平均偏离 5%

model_fit.resid.plot(figsize= (30,5))
model_fit.fittedvalues.plot(figsize = (30,5))
test.predicted.plot()
#print(test['predicted'].tail(5))
'''
2014-02-23    11.556584
2014-02-24    10.708762
2014-02-25    11.445902
2014-02-26    11.869046
2014-02-27    11.484142
'''

在这里插入图片描述

LSTM


dataframe = weather_energy.loc[:,'avg_energy']
dataset = dataframe.values
dataset = dataset.astype('float32')

# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1
    df = pd.DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

reframed = series_to_supervised(dataset, 7,1)
print(reframed.head(3))

reframed['weather_cluster'] = weather_energy.weather_cluster.values[7:]
reframed['holiday_ind']= weather_energy.holiday_ind.values[7:]

reframed = reframed.reindex(['weather_cluster', 'holiday_ind','var1(t-7)', 'var1(t-6)', 'var1(t-5)', 'var1(t-4)', 'var1(t-3)','var1(t-2)', 'var1(t-1)', 'var1(t)'], axis=1)
reframed = reframed.values

#Normalization

scaler = MinMaxScaler(feature_range=(0, 1))
reframed = scaler.fit_transform(reframed)

# 分割训练集和测试集
train = reframed[:(len(reframed)-30), :]
test = reframed[(len(reframed)-30):len(reframed), :]

train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# 将输入重塑为3D[samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
#**Modelling**

# design network
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=50, batch_size=72, verbose=2, shuffle=False)
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.legend()
pyplot.show()

在这里插入图片描述

# make a prediction
yhat = model.predict(test_X)

test_X = test_X.reshape(test_X.shape[0], test_X.shape[2])

# invert scaling for forecast
inv_yhat = np.concatenate((yhat, test_X), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)


# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = np.concatenate((test_y, test_X), axis=1)
inv_y = scaler.inverse_transform(inv_y)


#Performance
act = [i[9] for i in inv_y] # last element is the predicted average energy
pred = [i[9] for i in inv_yhat] # last element is the actual average energy

# calculate RMSE
import math
rmse = math.sqrt(mean_squared_error(act, pred))
print('Test RMSE: %.3f' % rmse)

predicted_lstm = pd.DataFrame({'predicted':pred,'avg_energy':act})
predicted_lstm['avg_energy'].plot(figsize=(25,10),color = 'red')
predicted_lstm['predicted'].plot(color = 'blue')
plt.show()

在这里插入图片描述

  • 4
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

考古学家lx(李玺)

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值