A GWR approach to find influencing factors of window opening in different regions of China (in R)

中文版会在我写完之后发布,现在只有英文版.
请勿转载,谢谢
中文版链接:https://blog.csdn.net/qq_42686550/article/details/104350221

This tutorial is fully made by author, do not reprint without permit
This tutorial aims to use GWR to find the coefficients of influcing factors of window opening in China. It has been divided into these parts :
Bubble map, OLS model, Moran’s I, GWR

Bubble map

The bubble map code in R is shown in the bellow:

library(ggplot2)
library(ggmap)
library(maptools)
library(maps)
library(dplyr)
library(viridis)
library(mapproj)
library(readr)

China <- map_data("world") %>% filter(region=="China") #Region of China
myData <- read.csv('cw/ex.csv') # Read file
myBreaks <- c(0.1,0.3, 0.5,0.7, 1)

# Plot map
map <- 
  ggplot() +
  geom_polygon(data=China,aes(long, y = lat, group = group), fill="white", alpha=0.3) +
  geom_point(data=myData, aes(x=longitude, y = latitude, size=open.rate, color=open.rate, alpha=open.rate), shape=20, stroke=FALSE) +
  scale_size_continuous(name="Opening rate", trans="sqrt", range=c(1,12), breaks=myBreaks) +
  scale_alpha_continuous(name="Opening rate", trans="sqrt", range=c(0.1, 0.6), breaks=myBreaks) +
  scale_color_viridis(option="magma", trans="sqrt", breaks=myBreaks, name="Opening rate" ) + #Transformation "sqrt"
  theme_void()  + coord_map() +  xlim(70,149)+ ylim(18,55)+
  guides( colour = guide_legend()) +
  ggtitle("Window opening rate in China") +
  theme(
    legend.position = c(0.85, 0.8),
    text = element_text(color = "#f5f5f2"), # Choose colour
    plot.background = element_rect(fill = "#4e4d47", color = NA), 
    panel.background = element_rect(fill = "#4e4d47", color = NA), 
    legend.background = element_rect(fill = "#4e4d47", color = NA),
    plot.title = element_text(size= 16, hjust=0.1, color = "#f5f5f2", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
  )
plot(map)

在这里插入图片描述
The result of the data distribution. The dataset consists of 95 cities with window opening rate (dependent variable), and four other variables.

OLS model

First the library in R we need:

library(tmap)
library(readr)
library(rgdal)
library(ggplot2)
library(ggmap)
library(maptools)
library(maps)
library(dplyr)
library(viridis)
library(mapproj)
library(spdep)
library(car)
library(spgwr)

If you need to download the shapefile of China, you can find from here.
First let’s look at the OLS model:

chinaSHP <- readOGR("exe/gadm36_CHN_1.shp")# Read SHP file
dataWindow <- read_csv("cw/data.csv") # Read data
model <- lm(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
            +`Atmospheric pressure`,data=dataWindow) # OLS model
summary(model)
plot(model) # Plot result
durbinWatsonTest(model) #Autocorrelation
lm(formula = `open rate` ~ `Dry-bulb temperature` + `Wind speed` + 
    `External relative humidity` + `Atmospheric pressure`, data = dataWindow)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.153347 -0.035615  0.009812  0.031448  0.251990 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  -0.2491505  0.0600809  -4.147 7.62e-05 ***
`Dry-bulb temperature`        0.0569267  0.0024493  23.242  < 2e-16 ***
`Wind speed`                  0.0036693  0.0069773   0.526  0.60026    
`External relative humidity`  0.0005466  0.0005036   1.085  0.28060    
`Atmospheric pressure`       -0.0027743  0.0008336  -3.328  0.00127 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06105 on 90 degrees of freedom
Multiple R-squared:  0.8957,	Adjusted R-squared:  0.8911 
F-statistic: 193.3 on 4 and 90 DF,  p-value: < 2.2e-16

It can be seen that some variable (wind speed,humidity) is not significant,the p-value is higher than 0.05. However, these variable is kept because these variable may not significant in overall model, but significant in each local model in GWR.

 lag Autocorrelation D-W Statistic p-value
   1      0.06111587      1.846829   0.376
 Alternative hypothesis: rho != 0

DW statistics for our model is 1.84, it seems that it shoud be fine. However!!!
WE ARE USING GWR
Thus, there may have spatial-autocorrelation. Let’s have a look:

dataWindow$residuals <- residuals(model) 

#Function to plot residuals
bubblefunc <- function(myData,size,name,myBreaks){
  China <- map_data("world") %>% filter(region=="China")
  map <- 
    ggplot() +
    geom_polygon(data=China,aes(long, y = lat, group = group), fill="white", alpha=0.3) +
    geom_point(data=myData, aes(x=longitude, y = latitude, size=size, color=size, alpha=size), shape=20, stroke=FALSE) +
    scale_size_continuous(name=name,  trans="identity",range=c(1,12),breaks=myBreaks) +
    scale_alpha_continuous(name=name, trans="identity", range=c(0.1, 0.6),breaks=myBreaks) +
    scale_color_viridis(option="magma", trans="identity", name=name,breaks=myBreaks ) + 
    theme_void()  + coord_map() +  xlim(70,149)+ ylim(18,55)+
    guides( colour = guide_legend()) +
    ggtitle(paste("Window opening rate" ,name,"in China")) +
    theme(
      legend.position = c(0.85, 0.8),
      text = element_text(color = "#f5f5f2",size=15), # Choose colour
      plot.background = element_rect(fill = "#4e4d47", color = NA), 
      panel.background = element_rect(fill = "#4e4d47", color = NA), 
      legend.background = element_rect(fill = "#4e4d47", color = NA),
      plot.title = element_text(size= 16, hjust=0.1, color = "#f5f5f2", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
    )
  plot(map)
}
bubblefunc(dataWindow,dataWindow$residuals,"Residuals",c(-0.2,-0.1,0,0.1,0.3)) # Plot residuals

在这里插入图片描述
The south part of China has higher residual and the pink areas are close to other pink areas. In other words, we may have some spatial autocorrelation. Let’s use moran’s I to test more systematically.

Moran’s I

xy=dataWindow[,c(13,14)]
chinasf <- st_as_sf(x=xy,coords = c("longitude", "latitude"))
chinasp <- as(chinasf, "Spatial") # Transform to "spatial"

coordsW <- coordinates(chinasp) # calculate the centroids of all Wards in London
plot(coordsW)
knn_wards <- knearneigh(coordsW, k=4)# nearest neighbours
LWard_knn <- knn2nb(knn_wards)
plot(LWard_knn, coordinates(coordsW), col="blue") #Plot
#create a spatial weights matrix object from  weight
Lward.knn_4_weight <- nb2listw(LWard_knn, style="C")
#moran's I test on the residuals
moran.test(dataWindow$residuals, Lward.knn_4_weight)

在这里插入图片描述

	Moran I test under randomisation

data:  dataWindow$residuals  
weights: Lward.knn_4_weight    

Moran I statistic standard deviate = 2.7069, p-value = 0.003396
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.166649889      -0.010638298       0.004289717 

Moran I is within range of -1 to 1, and 0 means no spatial autocorrelation. Here, Moran’s I is 0.167, which implies that we may have some weak spatial autocorrelation in our residuals. Therefore, GWR is used to address this problem.

GWR model

#calculate kernel bandwidth
GWRbandwidth <- gwr.sel(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
                     +`Atmospheric pressure`,data=dataWindow,coords=coordsW,adapt=T)
#run the gwr model
GWRModel <- gwr(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
                +`Atmospheric pressure`,data=dataWindow,coords=coordsW,adapt=GWRbandwidth,
                hatmatrix = TRUE,se.fit = TRUE)
GWRModel #print the results of the model
results<-as.data.frame(GWRModel$SDF)
names(results)
gwr(formula = `open rate` ~ `Dry-bulb temperature` + 
    `Wind speed` + `External relative humidity` + 
    `Atmospheric pressure`, data = dataWindow, coords = coordsW, 
    adapt = GWRbandwidth, hatmatrix = TRUE, se.fit = TRUE)
Kernel function: gwr.Gauss 
Adaptive quantile: 0.0333945 (about 3 of 95 data points)
Summary of GWR coefficient estimates at data points:
                                     Min.     1st Qu.      Median     3rd Qu.        Max.  Global
X.Intercept.                  -1.55295044 -0.56679395 -0.30426784 -0.12435906  0.40684111 -0.2492
X.Dry.bulb.temperature.        0.02109119  0.05192010  0.06482929  0.06990573  0.08456590  0.0569
X.Wind.speed.                 -0.09409310 -0.00837505 -0.00171803  0.00572094  0.02819445  0.0037
X.External.relative.humidity. -0.00539565 -0.00085167  0.00093522  0.00260762  0.01414508  0.0005
X.Atmospheric.pressure.       -0.02081367 -0.00529484 -0.00299928  0.00036854  0.01054743 -0.0028
Number of data points: 95 
Effective number of parameters (residual: 2traceS - traceS'S): 55.28598 
Effective degrees of freedom (residual: 2traceS - traceS'S): 39.71402 
Sigma (residual: 2traceS - traceS'S): 0.03666693 
Effective number of parameters (model: traceS): 44.0667 
Effective degrees of freedom (model: traceS): 50.9333 
Sigma (model: traceS): 0.03237767 
Sigma (ML): 0.02370744 
AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -266.3886 
AIC (GWR p. 96, eq. 4.22): -397.3086 
Residual sum of squares: 0.05339408 
Quasi-global R2: 0.9834058 

It can be seen that the value of coefficient of each variable is different in each localation. sometimes it’s positive and sometimes it’s negative. Let’s make a map for temperature this variable:

dataWindow$coeTemperature <- results$X.Dry.bulb.temperature.
bubblefunc(dataWindow,dataWindow$coeTemperature,'coeTemperature',c(0.02,0.05, 0.06,0.07, 0.08))

在这里插入图片描述
The south part of region has higher coefficient than other areas. However, we still need to take significance into account. In this study, if a coefficient estimate is more than 2 standard errors away from zero, then it is “statistically significant”.

#statistically significant test
sigTest_Temp = abs(GWRModel$SDF$"`Dry-bulb temperature`") - 2 * GWRModel$SDF$"`Dry-bulb temperature`_se"
dataWindow$GWRTemp <- sigTest_Temp
bubblefunc(subset(dataWindow,dataWindow$GWRTemp>0),subset(dataWindow$coeTemperature,dataWindow$GWRTemp>0),'GWRTemp ',c(0.02,0.05, 0.06,0.07, 0.08))

在这里插入图片描述
A lot of varables in temperature seems signficant. Similary, let’s look at other varables:
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
It can be seen that the coefficient have significant different in other varibales. Therefore, the influencing factors of window have spatial variations.

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
GWR是地理加权回归(Geographically Weighted Regression)的缩写,是一种用于分析空间数据的地理统计模型。它考虑了数据在空间上的非平稳性和异质性,能够更准确地揭示变量之间的空间关系。 在R语言中,我们可以使用gwr包来实现GWR模型的随机模拟。首先,我们需要安装gwr包并加载它。然后,我们可以准备我们的空间数据,并选择适当的解释变量和响应变量。 接下来,我们可以使用gwr函数来进行GWR模型的拟合。该函数需要输入解释变量、响应变量和空间坐标数据。我们可以选择一些参数,如核函数类型和带宽选择方法,来完善我们的模型。然后,我们可以运行该函数并得到估计的模型。 在我们得到模型之后,我们可以使用summary函数来查看模型的摘要信息,例如每个变量的估计系数、标准误差和显著性水平。此外,我们还可以通过绘制GWR估计的地图来可视化模型的结果,观察变量在空间上的变化和影响。 需要注意的是,GWR模型的拟合和解释比传统的全局回归模型更复杂,因为它考虑了空间的影响。因此,在使用GWR模型进行分析时,我们需要谨慎地选择参数和解释结果,以获得合理和可靠的结论。 总之,GWR地理加权模型是一种有效的地理统计方法,可以通过R语言gwr包进行随机模拟。通过合理选择参数和解释结果,我们可以利用GWR模型更好地理解变量之间的空间关系。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值