电力负荷短期预测模型(基于ARIMA)

利用20天分时段电力数据,通过R语言进行数据预处理、聚类与预测。采用系统聚类法与K-means对客户进行分类,并基于历史数据预测未来电力负荷。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

电力分析与预测

根据提供的客户的20天的分时段数据,进行分析:

要求1:根据数据对客户进行聚类分析;

要求2:根据数据对客户进行负荷预测。

一.导入数据

# 安装库专用

# 通过如下命令设定镜像
options(repos = 'http://mirrors.ustc.edu.cn/CRAN/')
# 查看镜像是否修改
getOption('repos')
# 尝试下载R包
#若有需要,进行安装
#install.packages('forecast')

‘http://mirrors.ustc.edu.cn/CRAN/’

Installing package into 'C:/Users/天涯过客/Documents/R/win-library/4.0'
(as 'lib' is unspecified)

also installing the dependencies 'fracdiff', 'urca'




package 'fracdiff' successfully unpacked and MD5 sums checked
package 'urca' successfully unpacked and MD5 sums checked
package 'forecast' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\天涯过客\AppData\Local\Temp\Rtmpop8xQR\downloaded_packages
#设置工作路径
setwd("D:/LengPY")
#导入数据
library(readxl)
data1_6<-read_excel("10.1-10.6日数据.xlsx",sheet=1)
data7_13<-read_excel("10.7-10.13日数据.xlsx",sheet=1)
data14_20<-read_excel("10.14-10.20日数据.xlsx",sheet=1)
head(data1_6,3)
A tibble: 3 × 98
iddate00:00-00:1500:15-00:3000:30-00:4500:45-01:0001:00-01:1501:15-01:3001:30-01:4501:45-02:00...21:30-21:4521:45-22:0022:00-22:1522:15-22:3022:30-22:4522:45-23:0023:00-23:1523:15-23:3023:30-23:4523:45-00:00
<chr><chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>...<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
客户42020-10-010.6190.6190.6190.6190.6190.6190.6190.619...0.5810.5810.5810.5810.5810.5810.6190.6190.6190.619
客户52020-10-013.2103.2103.2103.2103.2103.2103.2103.210...3.2103.2103.2103.2103.2103.2103.2103.2103.2103.210
客户82020-10-010.0000.0000.0000.0000.0000.0000.0000.000...0.0000.0000.0000.0000.0000.0000.0000.0000.0000.000

二.数据的预处理

#head(data7_13,3)
#head(data14_20,3)
### 可发现维度一致,将其合并
data<-rbind(data1_6,data7_13,data14_20)
head(data)
str(data)#查看数据类型

A tibble: 6 × 98
iddate00:00-00:1500:15-00:3000:30-00:4500:45-01:0001:00-01:1501:15-01:3001:30-01:4501:45-02:00...21:30-21:4521:45-22:0022:00-22:1522:15-22:3022:30-22:4522:45-23:0023:00-23:1523:15-23:3023:30-23:4523:45-00:00
<chr><chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>...<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
客户4 2020-10-01 0.619 0.619 0.619 0.619 0.619 0.619 0.619 0.619... 0.581 0.581 0.581 0.581 0.581 0.581 0.619 0.619 0.619 0.619
客户5 2020-10-01 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210... 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210 3.210
客户8 2020-10-01 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000... 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
客户7 2020-10-01 0.625 0.625 0.625 0.625 0.625 0.625 0.625 0.625... 0.635 0.635 0.635 0.635 0.635 0.635 0.625 0.625 0.625 0.625
客户89 2020-10-0122.27822.27822.27822.27822.27822.27822.27822.278...14.47014.47014.47014.47014.47014.47022.27822.27822.27822.278
客户1602020-10-0165.20048.48065.84062.56049.12064.56062.56065.840...84.56085.92085.92083.92084.56067.20098.00083.92084.56098.640
tibble [3,900 x 98] (S3: tbl_df/tbl/data.frame)
 $ id         : chr [1:3900] "客户4" "客户5" "客户8" "客户7" ...
 $ date       : chr [1:3900] "2020-10-01" "2020-10-01" "2020-10-01" "2020-10-01" ...
 $ 00:00-00:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 00:15-00:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 00:30-00:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 00:45-01:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 01:00-01:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 01:15-01:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 01:30-01:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 01:45-02:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 02:00-02:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 02:15-02:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 02:30-02:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 02:45-03:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 03:00-03:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 03:15-03:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 03:30-03:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 03:45-04:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 04:00-04:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 04:15-04:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 04:30-04:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 04:45-05:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 05:00-05:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 05:15-05:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 05:30-05:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 05:45-06:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 06:00-06:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 06:15-06:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 06:30-06:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 06:45-07:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 07:00-07:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 07:15-07:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 07:30-07:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 07:45-08:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 08:00-08:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 08:15-08:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 08:30-08:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 08:45-09:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 09:00-09:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 09:15-09:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 09:30-09:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 09:45-10:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 10:00-10:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 10:15-10:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 10:30-10:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 10:45-11:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 11:00-11:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 11:15-11:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 11:30-11:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 11:45-12:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 12:00-12:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 12:15-12:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 12:30-12:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 12:45-13:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 13:00-13:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 13:15-13:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 13:30-13:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 13:45-14:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 14:00-14:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 14:15-14:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 14:30-14:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 14:45-15:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 15:00-15:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 15:15-15:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 15:30-15:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 15:45-16:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 16:00-16:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 16:15-16:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 16:30-16:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 16:45-17:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 17:00-17:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 17:15-17:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 17:30-17:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 17:45-18:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 18:00-18:15: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 18:15-18:30: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 18:30-18:45: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 18:45-19:00: num [1:3900] 0.608 3.611 0 0.624 9.262 ...
 $ 19:00-19:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 19:15-19:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 19:30-19:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 19:45-20:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 20:00-20:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 20:15-20:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 20:30-20:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 20:45-21:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 21:00-21:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 21:15-21:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 21:30-21:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 21:45-22:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 22:00-22:15: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 22:15-22:30: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 22:30-22:45: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 22:45-23:00: num [1:3900] 0.581 3.21 0 0.635 14.47 ...
 $ 23:00-23:15: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 23:15-23:30: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 23:30-23:45: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
 $ 23:45-00:00: num [1:3900] 0.619 3.21 0 0.625 22.278 ...
#检查是否有缺失值
## 可视化查看数据是否有缺失值
library(VIM)
aggr(data)

在这里插入图片描述

经检查,不存在缺失值,可直接进行分析。

# 按行计算总和
#library(dyplr)
library(tidyverse)
data<-data %>%mutate(rowsum = rowSums(.[3:98]))#添加一列,计算行和

三.基本描述性统计

#不同时间总负荷量对比
date_rowsum<-aggregate(x =data$rowsum, by= list(data$date), FUN =sum)
colnames(date_rowsum)<-c('date','sum')
date_rowsum
A data.frame: 20 × 2
datesum
<chr><dbl>
2020-10-01 93125.13
2020-10-02 89564.60
2020-10-03 89715.30
2020-10-04 91119.79
2020-10-05 95884.30
2020-10-06 95558.53
2020-10-07 97199.34
2020-10-08 97358.87
2020-10-09 91890.29
2020-10-10 94850.10
2020-10-11 95424.02
2020-10-12100501.56
2020-10-13100307.05
2020-10-14101899.14
2020-10-15105264.88
2020-10-16105435.29
2020-10-17100331.14
2020-10-18103870.77
2020-10-19 99513.48
2020-10-20 95456.16
date_rowsum$date<-as.Date(date_rowsum$date)
plot(date_rowsum,type = "o", col = "red", xlab = "date", ylab = "sum",
   main = "date_sum")

在这里插入图片描述

可发现在国庆节期间,电力负荷较低,可能与休假导致用电量下降有关,同时用电之间存在一定的周期性关系,推测是由于周末等因素导致周期性,可对此周期进行提取进一步分析。

#不同用户总负荷量对比
id_rowsum<-aggregate(x =data$rowsum, by= list(data$id), FUN =sum)
colnames(id_rowsum)<-c('id','sum')
head(id_rowsum)
A data.frame: 6 × 2
idsum
<chr><dbl>
1客户10 880.640
2客户100 1343.552
3客户101 6545.280
4客户102 5985.280
5客户103 1501.560
6客户10494735.159
summary(id_rowsum)#计算用户数量
      id                 sum          
 Length:176         Min.   :     0.0  
 Class :character   1st Qu.:   325.4  
 Mode  :character   Median :   892.7  
                    Mean   : 11047.0  
                    3rd Qu.:  3533.8  
                    Max.   :164139.1  

可知:样本有176名用户,其中平均总用电负荷11047,最高164139.1

hist(id_rowsum$sum, col = rgb(1,0,0,0.2))

在这里插入图片描述

可发现,大部分客户消耗电力负荷在20000以内,数据分布呈现偏态分布,用电量高的用户占小部分。可根据电力负荷量使用量等对客户进行分类,实行不同的政策。

#不同用户不同时间负荷量对比
date_id_rowsum<-aggregate(x =data$rowsum, by= list(data$date,data$id), FUN =sum)
colnames(date_id_rowsum)<-c('date','id','sum')
head(date_id_rowsum)
A data.frame: 6 × 3
dateidsum
<chr><chr><dbl>
12020-10-01客户1044.032
22020-10-02客户1044.032
32020-10-03客户1044.032
42020-10-04客户1044.032
52020-10-05客户1044.032
62020-10-06客户1044.032
library(dplyr)
cdata<-as.tibble(data)
#转换格式,便于处理
#计算1-20日各时段总功率情况
hourdata<-data%>%select(-id)%>% summarise(across(contains(":"),sum,na.rm=TRUE))
hourdata<-as.data.frame(hourdata)
#进行转置
hourdatat<-t(hourdata)
hourdatat<-as.data.frame(hourdatat)
colnames(hourdatat)<-c('sum')
hourdatat$time<-rownames(hourdatat)
head(hourdatat)

A data.frame: 6 × 2
sumtime
<dbl><chr>
00:00-00:1522090.4800:00-00:15
00:15-00:3022195.2300:15-00:30
00:30-00:4522047.5300:30-00:45
00:45-01:0022128.3700:45-01:00
01:00-01:1522152.1801:00-01:15
01:15-01:3022407.3601:15-01:30
barplot(hourdatat$sum,names.arg=hourdatat$time,xlab="time",ylab="sum",col="blue",
main="sum_hour",border="red")

在这里插入图片描述

可根据时段统计,确定用电峰谷情况,可根据高峰期与低谷期进行区别定价和计划供电。本例由于样本点较少,故以上信息仅供参考。

四.构建特征,模型准备

#计算每用户各时段平均功率情况
hour_meanid<-cdata%>%select(-date)%>% group_by(id)%>%summarise(across(everything(),mean,na.rm=TRUE))
head(hour_meanid)
A tibble: 6 × 98
id00:00-00:1500:15-00:3000:30-00:4500:45-01:0001:00-01:1501:15-01:3001:30-01:4501:45-02:0002:00-02:15...21:45-22:0022:00-22:1522:15-22:3022:30-22:4522:45-23:0023:00-23:1523:15-23:3023:30-23:4523:45-00:00rowsum
<chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>...<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
客户10 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400... 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 44.0320
客户100 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375... 0.63815 0.63815 0.63815 0.63815 0.63815 0.28375 0.28375 0.28375 0.28375 67.1776
客户10110.6960010.36000 9.4080010.4160010.3040010.6400010.3600010.36000 9.96800... 0.00000 0.00000 0.00000 0.00000 0.0000010.3040010.9200010.0240010.80800 327.2640
客户102 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600... 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 299.2640
客户103 0.83400 0.76200 0.70200 0.80400 0.75600 0.80700 0.73500 0.82500 0.72300... 0.84600 0.78000 0.82200 0.81000 0.82800 0.76200 0.85200 0.77400 0.81000 75.0780
客户10423.3102524.5307521.3847525.4107522.8367524.0247522.1107523.6696023.85660...25.5803024.2603024.7003024.6673024.6673024.7373020.7773024.7373022.097302368.3790

①系统聚类法

## 系统聚类及可视化
hc1 <- hclust(dist(hour_meanid[,-98]),method = "ward.D2")
##  可视化结果
par(family = "STKaiti",cex = 0.45)
Warning message in dist(hour_meanid[, -98]):
"强制改变过程中产生了NA"
plot(hc1,hang = -1)
rect.hclust(hc1, k=3, border="red") 

在这里插入图片描述

library(ggplot2)
library(gridExtra)
library(ggdendro)
library(cluster)
library(ggfortify)
ggdendrogram(hc1, segments = T,rotate = F, theme_dendro = FALSE,size = 4)+
  theme_bw()+theme(axis.text.x = element_text(size = 5,angle = 90))

在这里插入图片描述

在R中运行,可以得到高清图,因为编译器问题,图片可能比较糊

②K-means聚类

## 计算组内平方和  组间平方和
tot_withinss <- vector()
betweenss <- vector()
for(ii in 1:15){
  k1 <- kmeans(hour_meanid[,c(-1,-98)],ii)
  tot_withinss[ii] <- k1$tot.withinss
  betweenss[ii] <- k1$betweenss
}

kmeanvalue <- data.frame(kk = 1:15,
                         tot_withinss = tot_withinss,
                         betweenss = betweenss)


p1 <- ggplot(kmeanvalue,aes(x = kk,y = tot_withinss))+
  theme_bw()+
  geom_point() + geom_line() +labs(y = "value") +
  ggtitle("Total within-cluster sum of squares")+
  theme(plot.title = element_text(hjust = 0.5))+
  scale_x_continuous("kmean 聚类个数",kmeanvalue$kk)

p2 <- ggplot(kmeanvalue,aes(x = kk,y = betweenss))+
  theme_bw()+
  geom_point() +geom_line() +labs(y = "value") +
  ggtitle("The between-cluster sum of squares") +
  theme(plot.title = element_text(hjust = 0.5))+
  scale_x_continuous("kmean 聚类个数",kmeanvalue$kk)

grid.arrange(p1,p2,nrow=2)

在这里插入图片描述

可知,可分为3-4类左右

set.seed(245)
k3 <- kmeans(hour_meanid[,c(-1,-98)],4)
summary(k3)
             Length Class  Mode   
cluster      176    -none- numeric
centers      384    -none- numeric
totss          1    -none- numeric
withinss       4    -none- numeric
tot.withinss   1    -none- numeric
betweenss      1    -none- numeric
size           4    -none- numeric
iter           1    -none- numeric
ifault         1    -none- numeric
k3
K-means clustering with 4 clusters of sizes 6, 14, 12, 144

Cluster means:
  00:00-00:15 00:15-00:30 00:30-00:45 00:45-01:00 01:00-01:15 01:15-01:30
1  62.3225417  61.5598583  63.6116083  60.9971833  63.6551667  63.0593333
2  11.0446536  11.1182536  10.5954179  10.9049750  10.8082393  10.9635143
3  27.1211361  27.7766042  27.0752431  27.9496597  27.0779917  28.0800056
4   0.6796588   0.6750685   0.6726699   0.6789852   0.6641741   0.6706109
  01:30-01:45 01:45-02:00 02:00-02:15 02:15-02:30 02:30-02:45 02:45-03:00
1  62.3378667  61.9536917  62.5135250  63.8845000  61.6807417  62.3915167
2  10.8759214  10.8956643  10.7552357  10.9384250  10.8762357  10.7623750
3  27.1717583  27.4775153  27.5150958  27.8758236  27.9836847  27.5832681
4   0.6772897   0.6643105   0.6722411   0.6603515   0.6549296   0.6562005
  03:00-03:15 03:15-03:30 03:30-03:45 03:45-04:00 04:00-04:15 04:15-04:30
1  60.9340500  62.9090583  61.6450583  60.8104000    61.76591  62.3655333
2  10.8683857  11.0325750  10.7480857  10.8762000    10.91756  10.7136857
3  27.7674792  28.0245264  27.7221507  27.6713938    27.59359  27.7191694
4   0.6507282   0.6617699   0.6492977   0.6531032     0.65239   0.6486241
.....

Clustering vector:
  [1] 4 4 2 4 4 3 3 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4
 [38] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 4 1 4 2 4 4 4 4 4
 [75] 4 4 4 4 4 4 4 4 4 4 2 4 2 4 3 4 4 4 4 4 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 4 4
[112] 4 4 4 4 4 4 2 2 4 4 4 4 4 4 4 4 4 4 4 4 2 4 4 4 4 4 2 4 4 4 4 4 4 4 3 4 4
[149] 1 3 2 4 3 3 4 4 4 2 2 4 1 4 4 1 3 4 4 4 3 2 3 4 1 4 4 2

Within cluster sum of squares by cluster:
[1] 91916.57 33987.51 60979.15 15200.70
 (between_SS / total_SS =  92.7 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      
#将标签写入
hour_meanid$cluster<-k3$cluster
head(hour_meanid)
A tibble: 6 × 99
id00:00-00:1500:15-00:3000:30-00:4500:45-01:0001:00-01:1501:15-01:3001:30-01:4501:45-02:0002:00-02:15...22:00-22:1522:15-22:3022:30-22:4522:45-23:0023:00-23:1523:15-23:3023:30-23:4523:45-00:00rowsumcluster
<chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>...<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><int>
客户10 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400... 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 0.45400 44.03204
客户100 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375 0.28375... 0.63815 0.63815 0.63815 0.63815 0.28375 0.28375 0.28375 0.28375 67.17764
客户10110.6960010.36000 9.4080010.4160010.3040010.6400010.3600010.36000 9.96800... 0.00000 0.00000 0.00000 0.0000010.3040010.9200010.0240010.80800 327.26402
客户102 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600... 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 3.08600 299.26404
客户103 0.83400 0.76200 0.70200 0.80400 0.75600 0.80700 0.73500 0.82500 0.72300... 0.78000 0.82200 0.81000 0.82800 0.76200 0.85200 0.77400 0.81000 75.07804
客户10423.3102524.5307521.3847525.4107522.8367524.0247522.1107523.6696023.85660...24.2603024.7003024.6673024.6673024.7373020.7773024.7373022.097302368.37903

可发现:1,2,3,4类别以此对应电力负荷从大到小,其中1类的电力负荷用量最大,4类负荷小,且大部分用户都是4类,高用电的客户较少,符合常理。

#查看类别分布情况
table(k3$cluster)
  1   2   3   4 
  6  14  12 144 
## 对聚类结果可视化
clusplot(hour_meanid[,c(-1,-98)],k3$cluster,main = "kmean cluster number=4")

在这里插入图片描述

## 可视化轮廓图,表示聚类效果
sis1 <- silhouette(k3$cluster,dist(hour_meanid[,c(-1,-98)],method = "euclidean"))


plot(sis1,main = " kmean silhouette",
     col = c("red", "green", "blue","orange"))

在这里插入图片描述

根据可视化,可看出分类合理

最后的分类情况为:

cluster<-hour_meanid[,c(1,98,99)]
head(cluster)
A tibble: 6 × 3
idrowsumcluster
<chr><dbl><int>
客户10 44.03204
客户100 67.17764
客户101 327.26402
客户102 299.26404
客户103 75.07804
客户1042368.37903

五.构建特征,建立预测模型

#计算每天各时段总功率情况
hour_perdata<-data%>%select(-id)%>% group_by(date)%>%summarise(across(contains(":"),sum,na.rm=TRUE))
head(hour_perdata)
A tibble: 6 × 97
date00:00-00:1500:15-00:3000:30-00:4500:45-01:0001:00-01:1501:15-01:3001:30-01:4501:45-02:0002:00-02:15...21:30-21:4521:45-22:0022:00-22:1522:15-22:3022:30-22:4522:45-23:0023:00-23:1523:15-23:3023:30-23:4523:45-00:00
<chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>...<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
2020-10-011072.6391029.3411126.3941006.4681149.8251049.2461081.7041043.2981035.710...903.034925.398 892.032895.551916.666848.7361061.0421039.8321070.7161108.838
2020-10-021053.2301061.4321071.5091066.8051064.5431082.8071077.7491059.8691061.881...854.297890.661 889.655926.648934.576840.2781080.1171047.2431068.4151046.172
2020-10-031066.0821041.9771063.4211035.0971065.6151051.4901083.9081086.6581094.954...887.702882.407 906.146880.330927.592933.9591020.2491063.9681034.5391070.460
2020-10-041020.1911019.6301024.692 998.3851051.1361027.0531039.1581028.7851043.948...888.178872.275 900.867888.164846.700944.6161009.187 990.4221063.6721011.367
2020-10-051045.1601042.0541027.8711040.6561029.1981024.3411063.3391048.0331053.538...949.303986.0681005.798914.553975.165955.135 998.0241213.0501093.7031096.247
2020-10-061135.0021150.8901084.8391116.2691113.4801106.4951085.2471134.3221125.822...889.987898.923 866.585919.430972.184913.5581072.3521091.4111104.9611133.166

从日期中找到特征,如节假日,周末等

library(tseries)
library(forecast)
Warning message:
"package 'forecast' was built under R version 4.0.4"
Registered S3 methods overwritten by 'forecast':
  method                 from     
  autoplot.Arima         ggfortify
  autoplot.acf           ggfortify
  autoplot.ar            ggfortify
  autoplot.bats          ggfortify
  autoplot.decomposed.ts ggfortify
  autoplot.ets           ggfortify
  autoplot.forecast      ggfortify
  autoplot.stl           ggfortify
  autoplot.ts            ggfortify
  fitted.ar              ggfortify
  fortify.ts             ggfortify
  residuals.ar           ggfortify
hour_perdata$date<-as.Date(hour_perdata$date)#转为时间格式
library(lubridate)

hour_perdata$day<-day(hour_perdata$date)#提取天
hour_perdata$week<-weekdays(as.Date(hour_perdata$date))
head(hour_perdata)

A tibble: 6 × 102
date00:00-00:1500:15-00:3000:30-00:4500:45-01:0001:00-01:1501:15-01:3001:30-01:4501:45-02:0002:00-02:15...22:45-23:0023:00-23:1523:15-23:3023:30-23:4523:45-00:00dayweekweek01week02fes
<date><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>...<dbl><dbl><dbl><dbl><dbl><int><chr><dbl><dbl><dbl>
2020-10-011072.6391029.3411126.3941006.4681149.8251049.2461081.7041043.2981035.710...848.7361061.0421039.8321070.7161108.8381星期四001
2020-10-021053.2301061.4321071.5091066.8051064.5431082.8071077.7491059.8691061.881...840.2781080.1171047.2431068.4151046.1722星期五001
2020-10-031066.0821041.9771063.4211035.0971065.6151051.4901083.9081086.6581094.954...933.9591020.2491063.9681034.5391070.4603星期六011
2020-10-041020.1911019.6301024.692 998.3851051.1361027.0531039.1581028.7851043.948...944.6161009.187 990.4221063.6721011.3674星期日101
2020-10-051045.1601042.0541027.8711040.6561029.1981024.3411063.3391048.0331053.538...955.135 998.0241213.0501093.7031096.2475星期一001
2020-10-061135.0021150.8901084.8391116.2691113.4801106.4951085.2471134.3221125.822...913.5581072.3521091.4111104.9611133.1666星期二001
#将是否周末的信息二值化
hour_perdata$week01 <- ifelse(hour_perdata$week =="星期日" ,1,0)
hour_perdata$week02 <- ifelse(hour_perdata$week =="星期六" ,1,0)
#head(hour_perdata)
#将节假日二值化,比如10.1-10.8是国庆节
hour_perdata$fes <- ifelse(hour_perdata$day <=8 ,1,0)
head(hour_perdata)
A tibble: 6 × 102
date00:00-00:1500:15-00:3000:30-00:4500:45-01:0001:00-01:1501:15-01:3001:30-01:4501:45-02:0002:00-02:15...22:45-23:0023:00-23:1523:15-23:3023:30-23:4523:45-00:00dayweekweek01week02fes
<date><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>...<dbl><dbl><dbl><dbl><dbl><int><chr><dbl><dbl><dbl>
2020-10-011072.6391029.3411126.3941006.4681149.8251049.2461081.7041043.2981035.710...848.7361061.0421039.8321070.7161108.8381星期四001
2020-10-021053.2301061.4321071.5091066.8051064.5431082.8071077.7491059.8691061.881...840.2781080.1171047.2431068.4151046.1722星期五001
2020-10-031066.0821041.9771063.4211035.0971065.6151051.4901083.9081086.6581094.954...933.9591020.2491063.9681034.5391070.4603星期六011
2020-10-041020.1911019.6301024.692 998.3851051.1361027.0531039.1581028.7851043.948...944.6161009.187 990.4221063.6721011.3674星期日101
2020-10-051045.1601042.0541027.8711040.6561029.1981024.3411063.3391048.0331053.538...955.135 998.0241213.0501093.7031096.2475星期一001
2020-10-061135.0021150.8901084.8391116.2691113.4801106.4951085.2471134.3221125.822...913.5581072.3521091.4111104.9611133.1666星期二001
#write_csv(hour_perdata,"hour_perdata.csv")

①预测未来一天,各时段的电力负荷

#将时间表转置
hour_perdatat<-t(hour_perdata)
hour_perdatat<-as.data.frame(hour_perdatat)
colnames(hour_perdatat)<-hour_perdatat[1,]
hour_perdatat<-hour_perdatat[-1,]
#hour_perdatat$hour<-rownames(hour_perdatat)
#rownames(hour_perdatat)<-c(1:96)
#head(hour_perdatat)
#更改列名
rownames(hour_perdatat)<-hour_perdatat[,21]
head(hour_perdatat)
A data.frame: 6 × 21
2020-10-012020-10-022020-10-032020-10-042020-10-052020-10-062020-10-072020-10-082020-10-092020-10-10...2020-10-122020-10-132020-10-142020-10-152020-10-162020-10-172020-10-182020-10-192020-10-20hour
<chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>...<chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
00:00-00:151072.6391053.2301066.0821020.1911045.1601135.0021137.5151152.5251067.732 994.568...1151.4241110.6971115.4001212.6651165.6431182.8251135.2291150.4231071.96600:00-00:15
00:15-00:301029.3411061.4321041.9771019.6301042.0541150.8901184.9081133.8511065.854 956.716...1199.7211162.5431110.8751141.0271207.8531162.5151105.9801180.0141121.97100:15-00:30
00:30-00:451126.3941071.5091063.4211024.6921027.8711084.8391106.9571172.472 995.550 950.803...1120.1321151.4711130.6771203.0131201.9451197.6361152.1691163.5451093.05300:30-00:45
00:45-01:001006.4681066.8051035.097 998.3851040.6561116.2691157.1461102.3311095.6031008.611...1159.1931169.9061062.5711165.5591151.6021166.3221193.8871174.5721171.42300:45-01:00
01:00-01:151149.8251064.5431065.6151051.1361029.1981113.4801092.4361127.4081033.743 961.246...1204.5881095.2891152.5651192.5921277.8481084.2281141.2211189.0061062.35101:00-01:15
01:15-01:301049.2461082.8071051.4901027.0531024.3411106.4951175.8561156.8871083.5121037.840...1198.1931125.7231120.4671232.8791196.5881169.7401212.2201192.1881125.76201:15-01:30
#write_csv(hour_perdatat,"hour_perdatat.csv")
#导入整理后数据
timedata<- read.csv("hour_perdatat.csv",,encoding='UTF-8')
head(timedata)
A data.frame: 6 × 2
hoursum
<chr><dbl>
100:00-00:151072.639
200:15-00:301029.341
300:30-00:451126.394
400:45-01:001006.468
501:00-01:151149.825
601:15-01:301049.246
timedata$sum<- ts(timedata$sum,start = timedata$sum[1],frequency = 96)
## 可视化序列
autoplot(timedata$sum)+ggtitle("电力负荷数量变化趋势")

在这里插入图片描述

auto.arima(timedata$sum)
Series: timedata$sum 
ARIMA(4,1,2)(0,1,0)[96] 

Coefficients:
          ar1      ar2      ar3      ar4      ma1      ma2
      -0.9814  -0.3626  -0.1371  -0.0666  -0.0126  -0.3784
s.e.   0.2001   0.1060   0.0771   0.0390   0.1994   0.1503

sigma^2 estimated as 3525:  log likelihood=-10028.98
AIC=20071.97   AICc=20072.03   BIC=20110.52
## 白噪声检验
Box.test(timedata$sum,type ="Ljung-Box")
	Box-Ljung test

data:  timedata$sum
X-squared = 1198.6, df = 1, p-value < 2.2e-16

p-value < 2.2e-16,说明不是白噪声

## 平稳性检验,单位根检验
adf.test(timedata$sum)
Warning message in adf.test(timedata$sum):
"p-value smaller than printed p-value"




	Augmented Dickey-Fuller Test

data:  timedata$sum
Dickey-Fuller = -8.312, Lag order = 12, p-value = 0.01
alternative hypothesis: stationary

p-value = 0.01,说明数据是平稳的

由之前自动建模可知适合的模型为:
Series: timedata$sum
ARIMA(4,1,2)(0,1,0)[96]

Coefficients:
ar1 ar2 ar3 ar4 ma1 ma2
-0.9814 -0.3626 -0.1371 -0.0666 -0.0126 -0.3784
s.e. 0.2001 0.1060 0.0771 0.0390 0.1994 0.1503

sigma^2 estimated as 3525: log likelihood=-10028.98
AIC=20071.97 AICc=20072.03 BIC=20110.52

## 对数据建立ARIMA(4,1,2)(0,1,0)[96]模型,并预测后面的数据

ARIMA <- arima(timedata$sum, c(4, 1, 2),
              seasonal = list(order = c(0, 1, 0),period = 96))
summary(ARIMA)
Call:
arima(x = timedata$sum, order = c(4, 1, 2), seasonal = list(order = c(0, 1, 
    0), period = 96))

Coefficients:
          ar1      ar2      ar3      ar4      ma1      ma2
      -0.9814  -0.3626  -0.1371  -0.0666  -0.0126  -0.3784
s.e.   0.2001   0.1060   0.0771   0.0390   0.1994   0.1503

sigma^2 estimated as 3513:  log likelihood = -10028.98,  aic = 20071.97

Training set error measures:
                     ME     RMSE      MAE        MPE     MAPE      MASE
Training set 0.09120194 57.75377 39.33659 -0.1172248 3.877619 0.9060558
                     ACF1
Training set 0.0008442811
Box.test(ARIMA$residuals,type ="Ljung-Box")
## p-value = 0.9705,此时,模型的残差已经是白噪声数据,数据中的信息已经充分的提取出来了
	Box-Ljung test

data:  ARIMA$residuals
X-squared = 0.0013707, df = 1, p-value = 0.9705

p-value = 0.9705,此时,模型的残差已经是白噪声数据,数据中的信息已经充分的提取出来了

# 可视化模型的预测值和这是值之间的差距
par(family = "STKaiti")
plot(forecast(ARIMA,h=96),shadecols="oldstyle")
points(timedata$sum,col = "red")
lines(timedata$sum,col = "red")

在这里插入图片描述

#输出未来一天的预测值
fore<-as.data.frame(forecast(ARIMA,h=96))
head(fore)
A data.frame: 6 × 5
Point ForecastLo 80Hi 80Lo 95Hi 95
<dbl><dbl><dbl><dbl><dbl>
1092.6391098.4811022.5231174.438 982.31391214.648
1092.6491155.9221079.9641231.8811039.75341272.091
1092.6601133.8671055.7421211.9921014.38501253.349
1092.6701206.4241126.3091286.5391083.89801328.950
1092.6811102.0901020.1721184.008 976.80671227.373
1092.6911161.5221077.0761245.9691032.37221290.673
label<-as.data.frame(rownames(hour_perdatat))
label<-label[c(-101,-100,-99,-98,-97),]
fore$label<-label
rownames(fore)<-fore$label
fore
A data.frame: 96 × 6
Point ForecastLo 80Hi 80Lo 95Hi 95label
<dbl><dbl><dbl><dbl><dbl><chr>
00:00-00:151098.48081022.52341174.438 982.31391214.64800:00-00:15
00:15-00:301155.92231079.96351231.8811039.75341272.09100:15-00:30
00:30-00:451133.86681055.74191211.9921014.38501253.34900:30-00:45
00:45-01:001206.42391126.30851286.5391083.89801328.95000:45-01:00
01:00-01:151102.08961020.17161184.008 976.80671227.37301:00-01:15
01:15-01:301161.52251077.07561245.9691032.37221290.67301:15-01:30
01:30-01:451191.80771105.80641277.8091060.28001323.33501:30-01:45
01:45-02:001127.94351039.79951216.088 993.13891262.74801:45-02:00
02:00-02:151120.84141030.98791210.695 983.42231258.26002:00-02:15
02:15-02:301163.51821071.77341255.2631023.20661303.83002:15-02:30
02:30-02:451118.05521024.55971211.551 975.06611261.04402:30-02:45
02:45-03:001135.71891040.47171230.966 990.05101281.38702:45-03:00
03:00-03:151141.99141045.02531238.957 993.69451290.28803:00-03:15
03:15-03:301124.36861025.72421223.013 973.50511275.23203:15-03:30
03:30-03:451165.71771065.41041266.0251012.31101319.12403:30-03:45
03:45-04:001168.66681066.73421270.5991012.77431324.55903:45-04:00
04:00-04:151117.27101013.73141220.811 958.92091275.62104:00-04:15
04:15-04:301159.81461054.69691264.932 999.05091320.57804:15-04:30
04:30-04:451108.26681001.59211214.941 945.12201271.41204:30-04:45
04:45-05:001143.08761034.87921251.296 977.59711308.57804:45-05:00
05:00-05:151116.90231007.18151226.623 949.09881284.70605:00-05:15
05:15-05:301167.56181056.34901278.775 997.47651337.64705:15-05:30
05:30-05:451194.98291082.29801307.6681022.64621367.32005:30-05:45
05:45-06:001125.40821011.27011239.546 950.84901299.96705:45-06:00
06:00-06:151178.46061062.88761294.0341001.70701355.21406:00-06:15
06:15-06:301131.31541014.32511248.306 952.39421310.23706:15-06:30
06:30-06:451181.85541063.46491300.2461000.79271362.91806:30-06:45
06:45-07:001130.17751010.40291249.952 946.99811313.35706:45-07:00
07:00-07:151026.6404 905.49771147.783 841.36871211.91207:00-07:15
07:15-07:30 962.4825 839.98691084.978 775.14171149.82307:15-07:30
.....................
16:30-16:451011.7984 846.87201176.725759.56521264.03216:30-16:45
16:45-17:00 951.7214 785.79881117.644697.96461205.47816:45-17:00
17:00-17:15 947.1024 780.18951114.015691.83111202.37417:00-17:15
17:15-17:30 980.5094 812.61201148.407723.73241237.28617:15-17:30
17:30-17:45 983.0574 814.18131151.934724.78361241.33117:30-17:45
17:45-18:00 979.1334 809.28421148.983719.37141238.89517:45-18:00
18:00-18:15 995.6674 824.85061166.484734.42571256.90918:00-18:15
18:15-18:301012.9484 841.16951184.727750.23531275.66218:15-18:30
18:30-18:451025.9084 853.17281198.644761.73211290.08518:30-18:45
18:45-19:001042.0334 868.34631215.721776.40191307.66518:45-19:00
19:00-19:151079.0614 904.42801253.695811.98271346.14019:00-19:15
19:15-19:301010.4204 834.84591185.995741.90231278.93919:15-19:30
19:30-19:451034.4144 857.90371210.925764.46461304.36419:30-19:45
19:45-20:00 997.5734 820.13151175.015726.19941268.94819:45-20:00
20:00-20:151036.5914 858.22311214.960763.80061309.38220:00-20:15
20:15-20:301030.1404 850.85051209.430755.94021304.34120:15-20:30
20:30-20:451031.2764 851.06971211.483755.67401306.87920:30-20:45
20:45-21:001014.4834 833.36441195.602737.48581291.48120:45-21:00
21:00-21:151060.6284 878.60181242.655782.24271339.01421:00-21:15
21:15-21:30 986.6754 803.74561169.605706.90851266.44221:15-21:30
21:30-21:451031.7954 847.96691215.624750.65401312.93721:30-21:45
21:45-22:001006.3614 821.63861191.084723.85221288.87121:45-22:00
22:00-22:151007.0594 821.44661192.672723.18901290.93022:00-22:15
22:15-22:30 992.1524 805.65381178.651706.92731277.37822:15-22:30
22:30-22:45 960.7194 773.33921148.100674.14611247.29322:30-22:45
22:45-23:001027.5134 839.25571215.771739.59811315.42922:45-23:00
23:00-23:151176.6744 987.54331365.806887.42341465.92623:00-23:15
23:15-23:301156.0004 965.99991346.001865.41971446.58123:15-23:30
23:30-23:451201.34741010.48151392.213909.44321493.25223:30-23:45
23:45-00:001162.6794 970.95201354.407869.45761455.90123:45-00:00

预测结果如上

②预测未来几天总体电力负荷

可结合周末,季节,月份,年份,节假日,温度等信息,进行建模,利用支持向量机,随机森林或线性模型等建立回归模型,对某天的总额进行预测建模。

由于本例数据为10.01-10.20,时间跨度短,且数据量较小,从理论上讲,预测结果不会太好,故考虑以其他方法进行建模,如回归分析,灰色预测等

#根据按天统计的电力负荷量可得
date_rowsum
plot(date_rowsum,type = "o", col = "red", xlab = "date", ylab = "sum",
   main = "date_sum")
A data.frame: 20 × 2
datesum
<date><dbl>
2020-10-01 93125.13
2020-10-02 89564.60
2020-10-03 89715.30
2020-10-04 91119.79
2020-10-05 95884.30
2020-10-06 95558.53
2020-10-07 97199.34
2020-10-08 97358.87
2020-10-09 91890.29
2020-10-10 94850.10
2020-10-11 95424.02
2020-10-12100501.56
2020-10-13100307.05
2020-10-14101899.14
2020-10-15105264.88
2020-10-16105435.29
2020-10-17100331.14
2020-10-18103870.77
2020-10-19 99513.48
2020-10-20 95456.16

在这里插入图片描述

在以“天”的维度上,由于国庆节假日影响,不能看出明显的规律,但能看出存在一定周期性,不适用回归模型和灰色预测。

如果需要精确到客户,则提取每个客户每天各时段的数据,套用上述模型即可得到。
但由于单个客户随机性极强,且个人用电较小,细致分析意义不大,可针对高耗电用户进行分析。其余从宏观角度预测以进行电力调配即可。


评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值