MTHM506/COMM511 - Statistical Data Modelling
Topic 3 - Introduction
Preliminaries
In this session, we will start Topic 3 and introduce Generalised Additive Models, a more flexible class
of models often seen as an extension of Generalised Linear Models. These notes refer to Topics
3.1-3.8 from the lecture notes. For this session, we need the mgcv package to fit Generalised Additive
Models. We use the install.packages() function to download and install the most recent version of the
package and the library() function to load it into the R session.
# Installing required packages
install.packages("mgcv")
# Loading required packages into the library
library(mgcv)
Introduction
In Topic 2, we learned about Generalised Linear Models (GLMs), a framework for building more
general models for different types of data. In Topic 3, we extend the GLM framework so that we can
model the mean function using smooth functions of the covariates. Let's formalise this in a similar way to
GLMs. A Generalised Additive Model (GAM) has a response variable $Y_i$ which again comes from the
exponential family of distributions

$$Y_i \sim \text{EF}(\theta_i, \phi)$$

Examples of exponential family distributions that we have seen are the Normal, Binomial, Poisson, Negative
Binomial, Exponential and Gamma. Remember, $\theta_i$ is called the location parameter and $\phi$ is called the
scale/dispersion parameter. The location parameter relates to the mean of the distributions in this family
and the dispersion relates to the variance. Again we will see that the variance is not independent of the
mean (see Slides 4-5 in Topic 2 Notes): we are working with probability distributions for which there may
be a mean-variance relationship, so that the variance is a scaled function of the mean. For the Poisson
distribution, for instance, $\text{Var}(Y_i) = \mu_i$.
In GLMs we specified a function of the mean $E(Y_i) = \mu_i$ as follows:

$$g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_p x_{p,i}$$

where $\eta_i$ is called the linear predictor (the part of the model in which we relate the response $y_i$ to the
covariates $x_i$). It relates to the mean of the distribution $\mu_i$ through a function $g(\cdot)$, the "link function".
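To make this concrete, here is a minimal sketch (not from the notes; the variable names and data are
illustrative only) of a Poisson GLM with a log link, fitted to some quickly simulated data.
# Simulating a covariate and a Poisson response with log(mu) = 0.5 + 1.2x
set.seed(1)
x <- runif(50)
y <- rpois(50, lambda = exp(0.5 + 1.2 * x))
# Fitting the GLM: g(mu_i) = log(mu_i) = beta_0 + beta_1*x_i
glm(y ~ x, family = poisson(link = "log"))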
Now, in GAMs, we want to replace this linear predictor with a sum of unknown functions of our covariates

$$g(\mu_i) = \eta_i = \sum_{p} f_p(x_{p,i})$$

where the $f_p(\cdot)$ are unknown (smooth, continuous) functions of the covariates $x_{p,i}$. The idea of GAMs
is that we want to fit these unknown functions rather than individual parameters ($\beta$). The easiest way to do
this is to express the functions $f_p(\cdot)$ in a linear way using basis functions. We have seen an example (see Poisson
GLMs) where we fit a polynomial function as our linear predictor, so a function $f_p(\cdot)$ with a polynomial basis
would look like:

$$f_p(x_i) = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \cdots + \beta_q x_i^q$$
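As an illustration (a sketch reusing the simulated x and y from above), such a polynomial linear predictor
can be fitted directly as a GLM by including polynomial terms of the covariate.
# Cubic polynomial linear predictor in a Poisson GLM:
# log(mu) = beta_0 + beta_1*x + beta_2*x^2 + beta_3*x^3
glm(y ~ x + I(x^2) + I(x^3), family = poisson)
# Equivalently, using R's built-in (orthogonal) polynomial basis
glm(y ~ poly(x, 3), family = poisson)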
More generally we write $f_p(\cdot)$ linearly, as a sum of basis functions $b_j(\cdot)$:

$$f_p(x_i) = \beta_{p,0} + \sum_{j=1}^{q} \beta_{p,j}\, b_{p,j}(x_i)$$

where, for the polynomial basis above, $b_1(x_i) = x_i$, $b_2(x_i) = x_i^2$, $\ldots$, $b_j(x_i) = x_i^j$.
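To see what the basis representation means in practice, here is a minimal sketch (again reusing the
simulated x and y from above) that builds a polynomial basis matrix by hand; because $f_p$ is linear in the
$\beta$ coefficients, they can be estimated with the standard GLM machinery.
# Each column of B is one basis function evaluated at the data:
# b_1(x) = x, b_2(x) = x^2, b_3(x) = x^3
B <- cbind(x, x^2, x^3)
# f(x) = beta_0 + sum_j beta_j * b_j(x) is linear in the betas,
# so the betas are just GLM coefficients (the intercept is beta_0)
glm(y ~ B, family = poisson)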
There are certain questions that arise from this. Is this the most sensible way of doing it? How do we
decide what q (the number of basis functions) should be? Can we do this as part of the inference? Let's
consider these questions with an example on some simulated data:
# We will simulate some data