GWR(地理加权回归)预测分析中国各省份开关窗情况(R语言)

英文版链接:https://blog.csdn.net/qq_42686550/article/details/103882263
本文章,使用GIS中的GWR(地理加权回归)(Geographically weighted regression)来预测95个城市的楼宇开关窗情况.
有以下几个步骤:
气泡图,OLS,Moran’s I(莫兰系数)和GWR

气泡图

首先是制作气泡地图的代码 (R语言):

library(ggplot2)
library(ggmap)
library(maptools)
library(maps)
library(dplyr)
library(viridis)
library(mapproj)
library(readr)

China <- map_data("world") %>% filter(region=="China") #Region of China
myData <- read.csv('cw/ex.csv') # Read file
myBreaks <- c(0.1,0.3, 0.5,0.7, 1)

# Plot map
map <- 
  ggplot() +
  geom_polygon(data=China,aes(long, y = lat, group = group), fill="white", alpha=0.3) +
  geom_point(data=myData, aes(x=longitude, y = latitude, size=open.rate, color=open.rate, alpha=open.rate), shape=20, stroke=FALSE) +
  scale_size_continuous(name="Opening rate", trans="sqrt", range=c(1,12), breaks=myBreaks) +
  scale_alpha_continuous(name="Opening rate", trans="sqrt", range=c(0.1, 0.6), breaks=myBreaks) +
  scale_color_viridis(option="magma", trans="sqrt", breaks=myBreaks, name="Opening rate" ) + #Transformation "sqrt"
  theme_void()  + coord_map() +  xlim(70,149)+ ylim(18,55)+
  guides( colour = guide_legend()) +
  ggtitle("Window opening rate in China") +
  theme(
    legend.position = c(0.85, 0.8),
    text = element_text(color = "#f5f5f2"), # Choose colour
    plot.background = element_rect(fill = "#4e4d47", color = NA), 
    panel.background = element_rect(fill = "#4e4d47", color = NA), 
    legend.background = element_rect(fill = "#4e4d47", color = NA),
    plot.title = element_text(size= 16, hjust=0.1, color = "#f5f5f2", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
  )
plot(map)

在这里插入图片描述
气泡图展示了全国95个城市全年开关窗的百分比,有四个自变量(温度湿度等)和一个因变量(开窗率)

OLS模型

首先是我们需要的lib:

library(tmap)
library(readr)
library(rgdal)
library(ggplot2)
library(ggmap)
library(maptools)
library(maps)
library(dplyr)
library(viridis)
library(mapproj)
library(spdep)
library(car)
library(spgwr)

如果你需要下载中国的shapefile可以从这里下载.
首先来看一下传统的OLS模型的代码和效果:

chinaSHP <- readOGR("exe/gadm36_CHN_1.shp")# Read SHP file
dataWindow <- read_csv("cw/data.csv") # Read data
model <- lm(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
            +`Atmospheric pressure`,data=dataWindow) # OLS model
summary(model)
plot(model) # Plot result
durbinWatsonTest(model) #Autocorrelation
lm(formula = `open rate` ~ `Dry-bulb temperature` + `Wind speed` + 
    `External relative humidity` + `Atmospheric pressure`, data = dataWindow)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.153347 -0.035615  0.009812  0.031448  0.251990 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  -0.2491505  0.0600809  -4.147 7.62e-05 ***
`Dry-bulb temperature`        0.0569267  0.0024493  23.242  < 2e-16 ***
`Wind speed`                  0.0036693  0.0069773   0.526  0.60026    
`External relative humidity`  0.0005466  0.0005036   1.085  0.28060    
`Atmospheric pressure`       -0.0027743  0.0008336  -3.328  0.00127 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06105 on 90 degrees of freedom
Multiple R-squared:  0.8957,	Adjusted R-squared:  0.8911 
F-statistic: 193.3 on 4 and 90 DF,  p-value: < 2.2e-16

可以看出有些变量 并不是显著的(风速等),但是这些变量是在OLS模型中的也就是在global模型中的,再局部模型中这些变量可能会变成显著的,因此在此我们将这些变量通通保留。这时我们再验证一下遍历将相关性是否独立:

 lag Autocorrelation D-W Statistic p-value
   1      0.06111587      1.846829   0.376
 Alternative hypothesis: rho != 0

可以看出D-W测试结果为1.84,似乎这是OK的。但是!!! 我们要用的是GWR,这个独立性可能在空间上是非独立的,因此我们需要检测残差在空间上的分布:

dataWindow$residuals <- residuals(model) 

#Function to plot residuals
bubblefunc <- function(myData,size,name,myBreaks){
  China <- map_data("world") %>% filter(region=="China")
  map <- 
    ggplot() +
    geom_polygon(data=China,aes(long, y = lat, group = group), fill="white", alpha=0.3) +
    geom_point(data=myData, aes(x=longitude, y = latitude, size=size, color=size, alpha=size), shape=20, stroke=FALSE) +
    scale_size_continuous(name=name,  trans="identity",range=c(1,12),breaks=myBreaks) +
    scale_alpha_continuous(name=name, trans="identity", range=c(0.1, 0.6),breaks=myBreaks) +
    scale_color_viridis(option="magma", trans="identity", name=name,breaks=myBreaks ) + 
    theme_void()  + coord_map() +  xlim(70,149)+ ylim(18,55)+
    guides( colour = guide_legend()) +
    ggtitle(paste("Window opening rate" ,name,"in China")) +
    theme(
      legend.position = c(0.85, 0.8),
      text = element_text(color = "#f5f5f2",size=15), # Choose colour
      plot.background = element_rect(fill = "#4e4d47", color = NA), 
      panel.background = element_rect(fill = "#4e4d47", color = NA), 
      legend.background = element_rect(fill = "#4e4d47", color = NA),
      plot.title = element_text(size= 16, hjust=0.1, color = "#f5f5f2", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
    )
  plot(map)
bubblefunc(dataWindow,dataWindow$residuals,"Residuals",c(-0.2,-0.1,0,0.1,0.3)) # Plot 

在这里插入图片描述
可以看出似乎南部地区的残差存在聚集性,也就是说高残差靠近附近的高残差,可能存在空间非独立性。这时我们用莫兰系数来测试一下.

Morans’I

xy=dataWindow[,c(13,14)]
chinasf <- st_as_sf(x=xy,coords = c("longitude", "latitude"))
chinasp <- as(chinasf, "Spatial") # Transform to "spatial"

coordsW <- coordinates(chinasp) # calculate the centroids of all Wards in London
plot(coordsW)
knn_wards <- knearneigh(coordsW, k=4)# nearest neighbours
LWard_knn <- knn2nb(knn_wards)
plot(LWard_knn, coordinates(coordsW), col="blue") #Plot
#create a spatial weights matrix object from  weight
Lward.knn_4_weight <- nb2listw(LWard_knn, style="C")
#moran's I test on the residuals
moran.test(dataWindow$residuals, Lward.knn_4_weight)

在这里插入图片描述

	Moran I test under randomisation

data:  dataWindow$residuals  
weights: Lward.knn_4_weight    

Moran I statistic standard deviate = 2.7069, p-value = 0.003396
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.166649889      -0.010638298       0.004289717 

可以看出global Morans’ I为0.167. 莫兰系数一般在-1到1之间,0表示没用空间聚集。而0.167说明我们在这个模型中,有着较弱的空间相关性,因此我们需要使用GWR来解决这个问题。

GWR模型

#calculate kernel bandwidth
GWRbandwidth <- gwr.sel(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
                     +`Atmospheric pressure`,data=dataWindow,coords=coordsW,adapt=T)
#run the gwr model
GWRModel <- gwr(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
                +`Atmospheric pressure`,data=dataWindow,coords=coordsW,adapt=GWRbandwidth,
                hatmatrix = TRUE,se.fit = TRUE)
GWRModel #print the results of the model
results<-as.data.frame(GWRModel$SDF)
names(results)
gwr(formula = `open rate` ~ `Dry-bulb temperature` + 
    `Wind speed` + `External relative humidity` + 
    `Atmospheric pressure`, data = dataWindow, coords = coordsW, 
    adapt = GWRbandwidth, hatmatrix = TRUE, se.fit = TRUE)
Kernel function: gwr.Gauss 
Adaptive quantile: 0.0333945 (about 3 of 95 data points)
Summary of GWR coefficient estimates at data points:
                                     Min.     1st Qu.      Median     3rd Qu.        Max.  Global
X.Intercept.                  -1.55295044 -0.56679395 -0.30426784 -0.12435906  0.40684111 -0.2492
X.Dry.bulb.temperature.        0.02109119  0.05192010  0.06482929  0.06990573  0.08456590  0.0569
X.Wind.speed.                 -0.09409310 -0.00837505 -0.00171803  0.00572094  0.02819445  0.0037
X.External.relative.humidity. -0.00539565 -0.00085167  0.00093522  0.00260762  0.01414508  0.0005
X.Atmospheric.pressure.       -0.02081367 -0.00529484 -0.00299928  0.00036854  0.01054743 -0.0028
Number of data points: 95 
Effective number of parameters (residual: 2traceS - traceS'S): 55.28598 
Effective degrees of freedom (residual: 2traceS - traceS'S): 39.71402 
Sigma (residual: 2traceS - traceS'S): 0.03666693 
Effective number of parameters (model: traceS): 44.0667 
Effective degrees of freedom (model: traceS): 50.9333 
Sigma (model: traceS): 0.03237767 
Sigma (ML): 0.02370744 
AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -266.3886 
AIC (GWR p. 96, eq. 4.22): -397.3086 
Residual sum of squares: 0.05339408 
Quasi-global R2: 0.9834058 

可以看出,一些因变量存在正负数差异,并随着空间变化而变化,我们用地图来可视化一下。

dataWindow$coeTemperature <- results$X.Dry.bulb.temperature.
bubblefunc(dataWindow,dataWindow$coeTemperature,'coeTemperature',c(0.02,0.05, 0.06,0.07, 0.08))

在这里插入图片描述
可以看出,温度这个因变量再南部有着较高的比重。但是我们仍需考虑这些系数的显著性。在文本中,如果一个系数比标准大2倍,那么我们就认为此时这个系数具有显著性。

#statistically significant test
sigTest_Temp = abs(GWRModel$SDF$"`Dry-bulb temperature`") - 2 * GWRModel$SDF$"`Dry-bulb temperature`_se"
dataWindow$GWRTemp <- sigTest_Temp
bubblefunc(subset(dataWindow,dataWindow$GWRTemp>0),subset(dataWindow$coeTemperature,dataWindow$GWRTemp>0),'GWRTemp ',c(0.02,0.05, 0.06,0.07, 0.08))

在这里插入图片描述
看上去没有太大变化,说明大多数地区的温度变量都是显著的,此时同理我们再来看一下其他变量的分布。
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
很明显有些变量在一些地区并不是显著的,并且存在地区差异,因此我们认为在只考虑这四个因变量的时候,开关窗行为存在地区上的差异,并具体表现如上图所示。

Github链接:https://github.com/Connor666/Geographic-Information-Systems-and-Science-assessment

评论 8
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值