
MTHM506/COMM511 - Statistical Data Modelling

Topic 3 - Introduction

Preliminaries

In this session, we will start Topic 3 and introduce Generalised Additive Models, another, more flexible class

of models that are often seen as an extension to Generalised Linear Models. These notes refer to Topics

3.1-3.8 from the lecture notes. In this session, we need the mgcv package to help us fit Generalised Additive

Models. We use the install.packages() function to download and install the most recent version of the package, and the library() function to load it into the R session.

# Installing required packages

install.packages("mgcv")

# Loading required packages into the library

library(mgcv)

Introduction

In Topic 2, we learned about Generalised Linear Models (GLMs), a framework for building more general models and for modelling different data types. In Topic 3, we extend the GLM framework so that we can

model the mean function using smooth functions of the covariates. Let’s formalise this in a similar way to

GLMs. A Generalised Additive Model (GAM) has a response variable Yi, which again comes from the

exponential family of distributions

Yi ~ EF (θi, φ)

Examples that we have seen of exponential family distributions are Normal, Binomial, Poisson, Negative

Binomial, Exponential and Gamma. Remember, θi is called the location parameter and φ is called the

scale/dispersion parameter. The location parameter relates to the mean of the distributions in this family

and the dispersion relates to the variance. Again we will see that the variance will not be independent of the

mean (see Slides 4-5 in Topic 2 Notes). We are working with probability distributions for which there may be a mean-variance relationship, so the variance is a scaled function of the mean.
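As a quick illustration of this mean-variance relationship (a sketch, not part of the original notes): for the Poisson distribution the variance equals the mean, Var(Yi) = μi, with dispersion φ = 1. We can check this by simulation:

```r
# For the Poisson, Var(Y) = E(Y): the variance is a function of the mean
set.seed(1)
y <- rpois(1e5, lambda = 4)

mean(y)  # close to 4
var(y)   # also close to 4, illustrating Var(Y) = mu
```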

In GLMs we specified a function of the mean E(Yi) = μi as follows

g(μi) = ηi = β0 + β1x1,i + · · ·+ βpxp,i

where ηi is called the linear predictor (the part of the model that relates the response yi to the covariates xi).

It relates to the mean of the distribution μi through a function g(·), the “link-function”.
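For instance (an illustrative sketch with made-up data, not an example from the notes), a Poisson GLM with the log link g(μi) = log(μi) can be fitted in R with glm():

```r
# Hypothetical covariate and Poisson response with log(mu) = 0.5 + 1.2 * x
set.seed(2)
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))

# Fit the GLM: g(mu) = log(mu) = beta0 + beta1 * x
fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)  # estimates should be near the true values (0.5, 1.2)
```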

Now, in GAMs, we want to replace this linear predictor with a sum of unknown functions of our covariates

g(μi) = ηi = f1(x1,i) + f2(x2,i) + · · · + fp(xp,i)

where fp(·) are a series of unknown (smooth, continuous) functions of our covariates xp,i. The idea of GAMs is that we want to fit these unknown functions rather than individual parameters (β). The easiest way to do this is to express our functions fp(·) in a linear way using basis functions. We have seen an example (see Poisson GLMs) where we fit a polynomial function as our linear predictor, so a function fp(·) with a polynomial basis would look like:

fp(xi) = β0 + β1xi + β2xi^2 + β3xi^3 + · · · + βqxi^q

More generally, we write fp(·) linearly as a sum of basis functions bp,j(·)

fp(xi) = βp,0 + ∑_{j=1}^{q} βp,j bp,j(xi)

where b1(xi) = xi, b2(xi) = xi^2, . . ., bj(xi) = xi^j. There are certain questions that arise from this. Is this the most sensible way of doing this? How do we decide what q (the number of basis functions) should be? Can we do this as part of the inference? Let's consider it with an example on some simulated data:

# We will simulate some data
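A minimal sketch of how such a simulated-data example might proceed (the data-generating function, sample size, and noise level here are assumptions, not the values used in the notes): simulate noisy observations of a smooth function, then compare a fixed polynomial basis of degree q against an mgcv smooth, where the penalty chooses the effective flexibility for us.

```r
library(mgcv)

# Simulate noisy observations of an assumed smooth function
set.seed(3)
n <- 200
x <- runif(n)
f <- sin(2 * pi * x)             # true (unknown, in practice) smooth function
y <- f + rnorm(n, sd = 0.3)

# Option 1: a polynomial basis with q = 5, fitted as a linear model
fit_poly <- lm(y ~ poly(x, degree = 5))

# Option 2: a penalised spline basis, with smoothness chosen by mgcv
fit_gam <- gam(y ~ s(x), family = gaussian)
summary(fit_gam)
```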
