Power Consumption Forecasting Model (Linear Model)

Power Consumption Forecasting

# Load the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
# Read the data
os.chdir("D:\\LengPY\\AI电力能耗预测")
data = pd.read_csv('zhenjiang_power.csv')  # main training data
data_9 = pd.read_csv('zhenjiang_power_9.csv')  # second training file, concatenated with the first below
print(data.shape)
print(data_9.shape)
(885486, 3)
(43620, 3)
data.head()
   user_id record_date  power_consumption
0        1  2015-01-01             1135.0
1        1  2015-01-02              570.0
2        1  2015-01-03             3418.0
3        1  2015-01-04             3968.0
4        1  2015-01-05             3986.0
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 885486 entries, 0 to 885485
Data columns (total 3 columns):
user_id              885486 non-null int64
record_date          885486 non-null object
power_consumption    885486 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 20.3+ MB
data.describe()
             user_id  power_consumption
count  885486.000000       8.854860e+05
mean      727.500000       2.619980e+03
std       419.733783       3.154743e+04
min         1.000000       1.000000e+00
25%       364.000000       4.200000e+01
50%       727.500000       2.610000e+02
75%      1091.000000       8.250000e+02
max      1454.000000       1.310016e+06

Concatenate data and data_9

train_df = pd.concat([data,data_9])
D:\Anaconda\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  """Entry point for launching an IPython kernel.
# Inspect the concatenated result
train_df.shape
(929106, 3)
train_df.head()
   power_consumption record_date  user_id
0             1135.0  2015-01-01        1
1              570.0  2015-01-02        1
2             3418.0  2015-01-03        1
3             3968.0  2015-01-04        1
4             3986.0  2015-01-05        1

Count the distinct user_id values, i.e. how many companies consume electricity

len(train_df['user_id'].unique())
#train_df['user_id'].nunique()
1454

Goal: forecast the total daily power consumption of the entire high-tech zone

First, convert record_date to datetime format

train_df.loc[:,'record_date'] = pd.to_datetime(train_df['record_date'])
train_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 929106 entries, 0 to 43619
Data columns (total 3 columns):
power_consumption    929106 non-null float64
record_date          929106 non-null datetime64[ns]
user_id              929106 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 28.4 MB

Group the data by date and sum the power consumption for each day

train_df = train_df[['record_date','power_consumption']].groupby('record_date').agg('sum')
train_df.head()
             power_consumption
record_date
2015-01-01           2900575.0
2015-01-02           3158211.0
2015-01-03           3596487.0
2015-01-04           3939672.0
2015-01-05           4101790.0
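As a minimal sketch of what this aggregation does, the toy frame below (made-up values, not the competition data) holds two users over two days. Grouping by date and summing collapses it to one row per day.

```python
import pandas as pd

# Toy data (hypothetical values): two users, two days.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "record_date": pd.to_datetime(["2015-01-01", "2015-01-02",
                                   "2015-01-01", "2015-01-02"]),
    "power_consumption": [10.0, 20.0, 5.0, 7.0],
})

# One row per date, summing consumption across all users.
daily = toy.groupby("record_date")["power_consumption"].sum()
print(daily)
```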

Restore record_date as an ordinary column

train_df = train_df.reset_index()
train_df.head()
  record_date  power_consumption
0  2015-01-01          2900575.0
1  2015-01-02          3158211.0
2  2015-01-03          3596487.0
3  2015-01-04          3939672.0
4  2015-01-05          4101790.0

Take a quick look at power consumption over the whole time span

%matplotlib inline
train_df['power_consumption'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x20e6f763c18>

[Figure: daily total power consumption over the full time span, plotted against the row index]

train_df[(train_df['record_date']>='2015-09-01')&(train_df['record_date']<='2015-10-31')]['power_consumption'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x20e6f820be0>

[Figure: daily total power consumption from 2015-09-01 to 2015-10-31, plotted against the row index]

Make the x-axis more informative by using the dates as the index
%matplotlib inline
tmp_df = train_df[(train_df['record_date']>='2015-09-01')&(train_df['record_date']<='2015-10-31')].copy()
tmp_df = tmp_df.set_index(['record_date'])
tmp_df['power_consumption'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x20e6f89dda0>

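[Figure: daily total power consumption from 2015-09-01 to 2015-10-31, with dates on the x-axis]

Conclusion: power consumption drops noticeably around October 1, 2015 (the National Day holiday)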

Add the test data
train_df.tail()
    record_date  power_consumption
634  2016-09-26          4042132.0
635  2016-09-27          4287965.0
636  2016-09-28          4086998.0
637  2016-09-29          3941842.0
638  2016-09-30          3783264.0
Create the test set: 31 days starting from October 1, 2016
test_df = pd.date_range('2016-10-01',periods=31,freq='D')
test_df = pd.DataFrame(test_df)
test_df.head()
           0
0 2016-10-01
1 2016-10-02
2 2016-10-03
3 2016-10-04
4 2016-10-05
test_df.columns = ['record_date']
# Set a placeholder value for the target
test_df.loc[:,'power_consumption']=0
test_df.head(15)
   record_date  power_consumption
0   2016-10-01                  0
1   2016-10-02                  0
2   2016-10-03                  0
3   2016-10-04                  0
4   2016-10-05                  0
5   2016-10-06                  0
6   2016-10-07                  0
7   2016-10-08                  0
8   2016-10-09                  0
9   2016-10-10                  0
10  2016-10-11                  0
11  2016-10-12                  0
12  2016-10-13                  0
13  2016-10-14                  0
14  2016-10-15                  0

Concatenate everything into a single dataset for feature engineering

total_df = pd.concat([train_df,test_df])
# Inspect the tail of the concatenated data
total_df.tail()
   record_date  power_consumption
26  2016-10-27                0.0
27  2016-10-28                0.0
28  2016-10-29                0.0
29  2016-10-30                0.0
30  2016-10-31                0.0
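The tail shows index labels 26 to 30 because `pd.concat` keeps each frame's original row labels, so the labels 0 to 30 now appear twice. The steps that follow do not depend on unique labels, but if a clean 0..n-1 index is wanted, `ignore_index=True` is one way to get it. A small sketch on toy frames (the names `train_toy` and `test_toy` are illustrative only):

```python
import pandas as pd

# Two tiny stand-ins for train_df and test_df.
train_toy = pd.DataFrame({
    "record_date": pd.date_range("2016-09-29", periods=2, freq="D"),
    "power_consumption": [3941842.0, 3783264.0],
})
test_toy = pd.DataFrame({
    "record_date": pd.date_range("2016-10-01", periods=3, freq="D"),
    "power_consumption": 0.0,
})

# Default behaviour: row labels are carried over and therefore repeat.
stacked = pd.concat([train_toy, test_toy])

# With ignore_index=True the result is relabelled 0..n-1.
renumbered = pd.concat([train_toy, test_toy], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```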

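Construct time features

Build a set of features that carry strong calendar information:

  1. Day of the week
  2. Day of the month (beginning or end of the month)
  3. Day of the year (seasonal information)
  4. Month of the year (season) and the year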
total_df.loc[:,'week'] = total_df['record_date'].apply(lambda x:x.dayofweek)
total_df.loc[:,'day'] = total_df['record_date'].apply(lambda x:x.day)
total_df.loc[:,'month'] = total_df['record_date'].apply(lambda x:x.month)
total_df.loc[:,'year'] = total_df['record_date'].apply(lambda x:x.year)
total_df.head()
  record_date  power_consumption  week  day  month  year
0  2015-01-01          2900575.0     3    1      1  2015
1  2015-01-02          3158211.0     4    2      1  2015
2  2015-01-03          3596487.0     5    3      1  2015
3  2015-01-04          3939672.0     6    4      1  2015
4  2015-01-05          4101790.0     0    5      1  2015
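The same columns can also be derived without `apply`, using the vectorized `.dt` accessor on a datetime column. The sketch below runs on a five-day toy range. It additionally derives a `dayofyear` column for item 3 of the list, which the code above does not build, so treat that column as an optional extra rather than part of the original pipeline.

```python
import pandas as pd

dates = pd.DataFrame({"record_date": pd.date_range("2015-01-01", periods=5, freq="D")})

# Vectorized equivalents of the apply(lambda ...) calls.
dates["week"] = dates["record_date"].dt.dayofweek   # Monday=0 ... Sunday=6
dates["day"] = dates["record_date"].dt.day
dates["month"] = dates["record_date"].dt.month
dates["year"] = dates["record_date"].dt.year

# Optional extra (not in the original code): day of the year, 1..366.
dates["dayofyear"] = dates["record_date"].dt.dayofyear

print(dates)
```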

Add weekend features

# Flags for weekend, Saturday and Sunday, all initialized to 0
total_df.loc[:,'weekend']=0
total_df.loc[:,'weekend_sat']=0
total_df.loc[:,'weekend_sun']=0
total_df.head(10)
  record_date  power_consumption  week  day  month  year  weekend  weekend_sat  weekend_sun
0  2015-01-01          2900575.0     3    1      1  2015        0            0            0
1  2015-01-02          3158211.0     4    2      1  2015        0            0            0
2  2015-01-03          3596487.0     5    3      1  2015        0            0            0
3  2015-01-04          3939672.0     6    4      1  2015        0            0            0
4  2015-01-05          4101790.0     0    5      1  2015        0            0            0
5  2015-01-06          4149164.0     1    6      1  2015        0            0            0
6  2015-01-07          4161928.0     2    7      1  2015        0            0            0
7  2015-01-08          4182622.0     3    8      1  2015        0            0            0
8  2015-01-09          4153509.0     4    9      1  2015        0            0            0
9  2015-01-10          3913704.0     5   10      1  2015        0            0            0
total_df.loc[(total_df['week']>4),'weekend']=1
total_df.loc[(total_df['week']==5),'weekend_sat']=1
total_df.loc[(total_df['week']==6),'weekend_sun']=1
total_df.head(10)
  record_date  power_consumption  week  day  month  year  weekend  weekend_sat  weekend_sun
0  2015-01-01          2900575.0     3    1      1  2015        0            0            0
1  2015-01-02          3158211.0     4    2      1  2015        0            0            0
2  2015-01-03          3596487.0     5    3      1  2015        1            1            0
3  2015-01-04          3939672.0     6    4      1  2015        1            0            1
4  2015-01-05          4101790.0     0    5      1  2015        0            0            0
5  2015-01-06          4149164.0     1    6      1  2015        0            0            0
6  2015-01-07          4161928.0     2    7      1  2015        0            0            0
7  2015-01-08          4182622.0     3    8      1  2015        0            0            0
8  2015-01-09          4153509.0     4    9      1  2015        0            0            0
9  2015-01-10          3913704.0     5   10      1  2015        1            1            0
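The three flags can also be computed in one pass by casting boolean comparisons to integers, which avoids the initialize-then-overwrite pattern. A minimal sketch on a toy `week` series (Thursday through Monday):

```python
import pandas as pd

week = pd.Series([3, 4, 5, 6, 0], name="week")  # Thu, Fri, Sat, Sun, Mon

# Each comparison yields a boolean Series; astype(int) turns it into 0/1.
flags = pd.DataFrame({
    "weekend": (week > 4).astype(int),
    "weekend_sat": (week == 5).astype(int),
    "weekend_sun": (week == 6).astype(int),
})
print(flags)
```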

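Add week-of-month information (four buckets per month)

def week_of_month(day):
    if day in range(1,8):
        return 1
    if day in range(8,15):
        return 2
    if day in range(15,22):
        return 3
    else:
        return 4
# Note: the function is applied to the 'week' column (day of week, 0-6) rather than 'day',
# so only the values 1 and 4 occur (4 on Mondays); the outputs below reflect this
total_df.loc[:,'week_of_month'] = total_df['week'].apply(lambda x:week_of_month(x))
total_df.head()
  record_date  power_consumption  week  day  month  year  weekend  weekend_sat  weekend_sun  week_of_month
0  2015-01-01          2900575.0     3    1      1  2015        0            0            0              1
1  2015-01-02          3158211.0     4    2      1  2015        0            0            0              1
2  2015-01-03          3596487.0     5    3      1  2015        1            1            0              1
3  2015-01-04          3939672.0     6    4      1  2015        1            0            1              1
4  2015-01-05          4101790.0     0    5      1  2015        0            0            0              4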
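The call above feeds the `week` column into `week_of_month`. Since those values run from 0 to 6, only buckets 1 and 4 ever appear. If the intent was to bucket by day of month (an assumption, the original does not say so), a sketch could look like the following. Switching the real pipeline to this would change the dummy columns, the score and the predictions reported further down.

```python
import pandas as pd

def week_of_month(day):
    """Map a day of month (1-31) to a bucket 1-4; days 22-31 share bucket 4."""
    if 1 <= day <= 7:
        return 1
    if 8 <= day <= 14:
        return 2
    if 15 <= day <= 21:
        return 3
    return 4

# Boundary days of each bucket.
days = pd.Series([1, 7, 8, 14, 15, 21, 22, 31], name="day")
buckets = days.apply(week_of_month)
print(buckets.tolist())
```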

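Add the ten-day period of the month (early, middle, late)

def period_of_month(day):
    if day in range(1,11):
        return 1
    if day in range(11,21):
        return 2
    else:
        return 3
# Note: the input is the 'week' column (0-6), not 'day', so only the values 1 and 3 occur
total_df.loc[:,'period_of_month'] = total_df['week'].apply(lambda x:period_of_month(x))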

Add first half / second half of the month

def period2_of_month(day):
    if day in range(1,16):
        return 1
    else:
        return 2
# Note: the input is the 'week' column (0-6), not 'day', so the value 2 occurs only on Mondays
total_df.loc[:,'period2_of_month'] = total_df['week'].apply(lambda x:period2_of_month(x))
total_df.head()
  record_date  power_consumption  week  day  month  year  weekend  weekend_sat  weekend_sun  week_of_month  period_of_month  period2_of_month
0  2015-01-01          2900575.0     3    1      1  2015        0            0            0              1                1                 1
1  2015-01-02          3158211.0     4    2      1  2015        0            0            0              1                1                 1
2  2015-01-03          3596487.0     5    3      1  2015        1            1            0              1                1                 1
3  2015-01-04          3939672.0     6    4      1  2015        1            0            1              1                1                 1
4  2015-01-05          4101790.0     0    5      1  2015        0            0            0              4                3                 2
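Both period features are plain binnings of the day of month, so `pd.cut` is a compact alternative to hand-written `if` chains. The sketch below assumes the input should be the day of month (1-31), which differs from the `week` column used above. The bin edges are right-inclusive, so `(0, 10]` covers days 1 to 10.

```python
import pandas as pd

day = pd.Series([1, 10, 11, 15, 16, 20, 21, 31], name="day")

# Ten-day period: 1 = days 1-10, 2 = days 11-20, 3 = days 21-31.
period_of_month = pd.cut(day, bins=[0, 10, 20, 31], labels=[1, 2, 3]).astype(int)

# Half of month: 1 = days 1-15, 2 = days 16-31.
period2_of_month = pd.cut(day, bins=[0, 15, 31], labels=[1, 2]).astype(int)

print(period_of_month.tolist())
print(period2_of_month.tolist())
```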

Flag statutory holidays (here only October 1-7, the National Day week)

total_df.loc[:,'festival'] = 0
total_df.loc[(total_df.month==10)&(total_df.day<8), 'festival']=1
total_df.head(15)
   record_date  power_consumption  week  day  month  year  weekend  weekend_sat  weekend_sun  week_of_month  period_of_month  period2_of_month  festival
 0  2015-01-01          2900575.0     3    1      1  2015        0            0            0              1                1                 1         0
 1  2015-01-02          3158211.0     4    2      1  2015        0            0            0              1                1                 1         0
 2  2015-01-03          3596487.0     5    3      1  2015        1            1            0              1                1                 1         0
 3  2015-01-04          3939672.0     6    4      1  2015        1            0            1              1                1                 1         0
 4  2015-01-05          4101790.0     0    5      1  2015        0            0            0              4                3                 2         0
 5  2015-01-06          4149164.0     1    6      1  2015        0            0            0              1                1                 1         0
 6  2015-01-07          4161928.0     2    7      1  2015        0            0            0              1                1                 1         0
 7  2015-01-08          4182622.0     3    8      1  2015        0            0            0              1                1                 1         0
 8  2015-01-09          4153509.0     4    9      1  2015        0            0            0              1                1                 1         0
 9  2015-01-10          3913704.0     5   10      1  2015        1            1            0              1                1                 1         0
10  2015-01-11          3635468.0     6   11      1  2015        1            0            1              1                1                 1         0
11  2015-01-12          4011329.0     0   12      1  2015        0            0            0              4                3                 2         0
12  2015-01-13          3969860.0     1   13      1  2015        0            0            0              1                1                 1         0
13  2015-01-14          4225259.0     2   14      1  2015        0            0            0              1                1                 1         0
14  2015-01-15          4106437.0     3   15      1  2015        0            0            0              1                1                 1         0
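The rule above hard-codes one holiday as a month/day condition. If more holidays were to be added later, one possible approach is a list of inclusive date ranges. In the sketch below `holiday_ranges` is a hypothetical name, and it lists only the National Day week that the original rule already covers.

```python
import pandas as pd

dates = pd.Series(pd.date_range("2016-09-29", "2016-10-09", freq="D"))

# Inclusive (start, end) ranges. Only the October 1-7 week from the original
# rule is listed; any other holiday would have to be added by hand.
holiday_ranges = [
    (pd.Timestamp("2015-10-01"), pd.Timestamp("2015-10-07")),
    (pd.Timestamp("2016-10-01"), pd.Timestamp("2016-10-07")),
]

festival = pd.Series(0, index=dates.index)
for start, end in holiday_ranges:
    # between() is inclusive on both ends by default.
    festival[dates.between(start, end)] = 1

print(festival.tolist())
```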
total_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 670 entries, 0 to 30
Data columns (total 13 columns):
record_date          670 non-null datetime64[ns]
power_consumption    670 non-null float64
week                 670 non-null int64
day                  670 non-null int64
month                670 non-null int64
year                 670 non-null int64
weekend              670 non-null int64
weekend_sat          670 non-null int64
weekend_sun          670 non-null int64
week_of_month        670 non-null int64
period_of_month      670 non-null int64
period2_of_month     670 non-null int64
festival             670 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(11)
memory usage: 73.3 KB
total_df.columns
Index(['record_date', 'power_consumption', 'week', 'day', 'month', 'year',
       'weekend', 'weekend_sat', 'weekend_sun', 'week_of_month',
       'period_of_month', 'period2_of_month', 'festival'],
      dtype='object')
var_to_encoding = [ 'week', 'day', 'month', 'year',
       'weekend', 'weekend_sat', 'weekend_sun', 'week_of_month',
       'period_of_month', 'period2_of_month']
var_to_encoding
['week',
 'day',
 'month',
 'year',
 'weekend',
 'weekend_sat',
 'weekend_sun',
 'week_of_month',
 'period_of_month',
 'period2_of_month']
dummy_df = pd.get_dummies(total_df,columns=var_to_encoding)
dummy_df.head()
# Re-encode: turn each multi-valued categorical column into 0/1 indicator columns
  record_date  power_consumption  festival  week_0  week_1  week_2  week_3  week_4  week_5  week_6  ...  weekend_sat_0  weekend_sat_1  weekend_sun_0  weekend_sun_1  week_of_month_1  week_of_month_4  period_of_month_1  period_of_month_3  period2_of_month_1  period2_of_month_2
0  2015-01-01          2900575.0         0       0       0       0       1       0       0       0  ...              1              0              1              0                1                0                  1                  0                   1                   0
1  2015-01-02          3158211.0         0       0       0       0       0       1       0       0  ...              1              0              1              0                1                0                  1                  0                   1                   0
2  2015-01-03          3596487.0         0       0       0       0       0       0       1       0  ...              0              1              1              0                1                0                  1                  0                   1                   0
3  2015-01-04          3939672.0         0       0       0       0       0       0       0       1  ...              1              0              0              1                1                0                  1                  0                   1                   0
4  2015-01-05          4101790.0         0       1       0       0       0       0       0       0  ...              1              0              1              0                0                1                  0                  1                   0                   1

5 rows × 67 columns

dummy_df.columns
Index(['record_date', 'power_consumption', 'festival', 'week_0', 'week_1',
       'week_2', 'week_3', 'week_4', 'week_5', 'week_6', 'day_1', 'day_2',
       'day_3', 'day_4', 'day_5', 'day_6', 'day_7', 'day_8', 'day_9', 'day_10',
       'day_11', 'day_12', 'day_13', 'day_14', 'day_15', 'day_16', 'day_17',
       'day_18', 'day_19', 'day_20', 'day_21', 'day_22', 'day_23', 'day_24',
       'day_25', 'day_26', 'day_27', 'day_28', 'day_29', 'day_30', 'day_31',
       'month_1', 'month_2', 'month_3', 'month_4', 'month_5', 'month_6',
       'month_7', 'month_8', 'month_9', 'month_10', 'month_11', 'month_12',
       'year_2015', 'year_2016', 'weekend_0', 'weekend_1', 'weekend_sat_0',
       'weekend_sat_1', 'weekend_sun_0', 'weekend_sun_1', 'week_of_month_1',
       'week_of_month_4', 'period_of_month_1', 'period_of_month_3',
       'period2_of_month_1', 'period2_of_month_2'],
      dtype='object')
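To make the encoding step concrete, here is a minimal sketch of `pd.get_dummies` on a toy frame. Columns named in `columns=` are replaced by one indicator column per observed value, and the remaining columns are kept unchanged at the front.

```python
import pandas as pd

toy = pd.DataFrame({"week": [3, 4, 5], "festival": [0, 0, 1]})

# 'week' is expanded into week_3, week_4, week_5; 'festival' is left as-is.
encoded = pd.get_dummies(toy, columns=["week"])
print(encoded.columns.tolist())
print(encoded)
```

Only values that actually occur get a column, which is why the frame above contains `week_of_month_1` and `week_of_month_4` but no `week_of_month_2` or `week_of_month_3`.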
# Split into training and test sets
train_X = dummy_df[dummy_df.record_date<'2016-10-01']
train_y = dummy_df[dummy_df.record_date<'2016-10-01']['power_consumption']
test_X = dummy_df[dummy_df.record_date>='2016-10-01']
train_X.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 639 entries, 0 to 638
Data columns (total 67 columns):
record_date           639 non-null datetime64[ns]
power_consumption     639 non-null float64
festival              639 non-null int64
week_0                639 non-null uint8
week_1                639 non-null uint8
week_2                639 non-null uint8
week_3                639 non-null uint8
week_4                639 non-null uint8
week_5                639 non-null uint8
week_6                639 non-null uint8
day_1                 639 non-null uint8
day_2                 639 non-null uint8
day_3                 639 non-null uint8
day_4                 639 non-null uint8
day_5                 639 non-null uint8
day_6                 639 non-null uint8
day_7                 639 non-null uint8
day_8                 639 non-null uint8
day_9                 639 non-null uint8
day_10                639 non-null uint8
day_11                639 non-null uint8
day_12                639 non-null uint8
day_13                639 non-null uint8
day_14                639 non-null uint8
day_15                639 non-null uint8
day_16                639 non-null uint8
day_17                639 non-null uint8
day_18                639 non-null uint8
day_19                639 non-null uint8
day_20                639 non-null uint8
day_21                639 non-null uint8
day_22                639 non-null uint8
day_23                639 non-null uint8
day_24                639 non-null uint8
day_25                639 non-null uint8
day_26                639 non-null uint8
day_27                639 non-null uint8
day_28                639 non-null uint8
day_29                639 non-null uint8
day_30                639 non-null uint8
day_31                639 non-null uint8
month_1               639 non-null uint8
month_2               639 non-null uint8
month_3               639 non-null uint8
month_4               639 non-null uint8
month_5               639 non-null uint8
month_6               639 non-null uint8
month_7               639 non-null uint8
month_8               639 non-null uint8
month_9               639 non-null uint8
month_10              639 non-null uint8
month_11              639 non-null uint8
month_12              639 non-null uint8
year_2015             639 non-null uint8
year_2016             639 non-null uint8
weekend_0             639 non-null uint8
weekend_1             639 non-null uint8
weekend_sat_0         639 non-null uint8
weekend_sat_1         639 non-null uint8
weekend_sun_0         639 non-null uint8
weekend_sun_1         639 non-null uint8
week_of_month_1       639 non-null uint8
week_of_month_4       639 non-null uint8
period_of_month_1     639 non-null uint8
period_of_month_3     639 non-null uint8
period2_of_month_1    639 non-null uint8
period2_of_month_2    639 non-null uint8
dtypes: datetime64[ns](1), float64(1), int64(1), uint8(64)
memory usage: 59.9 KB
drop_columns = ['record_date','power_consumption']
train_X = train_X.drop(drop_columns, axis=1)
test_X = test_X.drop(drop_columns, axis=1)
train_X.head()

   festival  week_0  week_1  week_2  week_3  week_4  week_5  week_6  day_1  day_2  ...  weekend_sat_0  weekend_sat_1  weekend_sun_0  weekend_sun_1  week_of_month_1  week_of_month_4  period_of_month_1  period_of_month_3  period2_of_month_1  period2_of_month_2
0         0       0       0       0       1       0       0       0      1      0  ...              1              0              1              0                1                0                  1                  0                   1                   0
1         0       0       0       0       0       1       0       0      0      1  ...              1              0              1              0                1                0                  1                  0                   1                   0
2         0       0       0       0       0       0       1       0      0      0  ...              0              1              1              0                1                0                  1                  0                   1                   0
3         0       0       0       0       0       0       0       1      0      0  ...              1              0              0              1                1                0                  1                  0                   1                   0
4         0       1       0       0       0       0       0       0      0      0  ...              1              0              1              0                0                1                  0                  1                   0                   1

5 rows × 65 columns

Build the linear model

from sklearn.linear_model import RidgeCV
linear_reg = RidgeCV(alphas=[0.2,0.5,0.8], cv=5)
linear_reg.fit(train_X,train_y)
RidgeCV(alphas=array([0.2, 0.5, 0.8]), cv=5, fit_intercept=True,
    gcv_mode=None, normalize=False, scoring=None, store_cv_values=False)

Evaluate the model; score returns the R^2 value (computed here on the training data itself)

linear_reg.score(train_X,train_y)
0.537404260499189
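An R^2 measured on the same rows the model was fitted on tends to be optimistic. For a forecasting task, one common alternative is to hold out the most recent stretch of days and score on that. The sketch below illustrates the idea on synthetic data (a made-up weekly pattern plus noise, not the Zhenjiang data), and every variable name in it is illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score

# Synthetic daily series (hypothetical): lower values on weekends, plus noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=200, freq="D")
dow = dates.dayofweek.to_numpy()
y = 100.0 - 20.0 * (dow >= 5) + rng.normal(0.0, 2.0, size=len(dates))

# One-hot encode the day of week, as in the feature engineering above.
X = pd.get_dummies(pd.Series(dow), prefix="week").astype(float)

# Hold out the last 30 days, mimicking a forecast of the following month.
X_fit, X_val = X.iloc[:-30], X.iloc[-30:]
y_fit, y_val = y[:-30], y[-30:]

model = RidgeCV(alphas=[0.2, 0.5, 0.8], cv=5)
model.fit(X_fit, y_fit)

val_r2 = r2_score(y_val, model.predict(X_val))
print(f"hold-out R^2: {val_r2:.3f}")
```

Applied to the real data, the same split would mean fitting on rows before some cutoff date and scoring on the rows after it, before refitting on everything for the final October forecast.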
prdictions = linear_reg.predict(test_X)

Predict power consumption for October

prdictions
array([3213269.92309517, 3123678.42816242, 3488047.72956573,
       3549293.25537305, 3483007.25110293, 3526193.91861047,
       3553449.2940896 , 3822642.14908673, 3647254.75965857,
       3920055.32252978, 3963349.38044719, 3993343.10094771,
       3971175.731758  , 3963793.56136557, 3774211.69037114,
       3657509.20919986, 3949241.60693345, 4011946.71989673,
       4016839.22480092, 3917574.35561122, 3946839.7769619 ,
       3891639.02982068, 3766109.02571362, 4043628.80876831,
       4073053.0043004 , 4071511.5183789 , 4071951.35102406,
       3950797.94209951, 3954010.07971396, 3730478.65367573,
       3908992.01170212])
test_df.head()
  record_date  power_consumption
0  2016-10-01                  0
1  2016-10-02                  0
2  2016-10-03                  0
3  2016-10-04                  0
4  2016-10-05                  0
test_df.loc[:,'power_consumption'] = prdictions
test_df.head()
  record_date  power_consumption
0  2016-10-01       3.213270e+06
1  2016-10-02       3.123678e+06
2  2016-10-03       3.488048e+06
3  2016-10-04       3.549293e+06
4  2016-10-05       3.483007e+06
