Functional Data Analysis Notes - 1: Introduction and Basis Expansion

阅读材料:Ramsay, J.O., Silverman, B.W. (2005) Functional Data Analysis (2nd Edition) Section 1.1, 1.2, 1.3, 1.4, 1.6.1

1 Functional data

Functional Data: Multivariate data with an ordering on the dimensions. (Muller, 2006)

  • Key assumption is smoothness:
    y i j = x i ( t i j ) + ϵ i j y_{ij} = x_i(t_{ij}) + \epsilon_{ij} yij=xi(tij)+ϵij
    with
    y i j y_{ij} yij: the ith individual in the jth time piont;
    t t t: in a continuum (usually time);
    x i ( t ) x_i(t) xi(t): functions for different i i i.
  • Highest quality data from monitoring equipment. Noiser and less frequent data can also be used.

Eg. Handwriting Data: Measures of position of nib of a pen writing “fda”. 20 replications, measurements taken at 200 Hz.
Handwriting Data

Fanctional data is complex.

  • Requires more thought/judgement than a t-test.
  • Data needs pre-processing.
  • Parametric inference is rarely available/appropriate.
  • Most approaches are computationally challenging.

FDA is relatively new.

  • First named in Dalzell & Ramsay, 1991.
  • Relatively little penetration into applied fields.
  • Several competing methodologies from different schools of thought (we focus on one).
  • Limited public software/resourses - changed over the last decade.
  • Initial development on data analysis rather than inference.

Necessities for Functional Data:

  • must believably derive from a smooth process;
  • process should not be easily parameterizable (should not be able to write down a formula);
  • enough data to resolve the essential features of the process (peaks, zero-crossings, speed… will depend on application);
  • some repetition in the process;
  • do not need equally spaced or perfect measurements.

2 From discrete to functional data

2.1 Two methods

  • Representing non-parametric continuous-time functions.
    • Basis-expansion methods:
      x ( t ) = ∑ i = 1 k ϕ i ( t ) c i x(t) = \sum_{i = 1}^k \phi_i (t) c_i x(t)=i=1kϕi(t)ci
      for pre-defined ϕ i ( t ) \phi_i (t) ϕi(t) and coefficient c i c_i ci.
    • Several basis systems available: focus on Fourior and B-splines.
  • Reducing noise in measurements.
    • Smoothing penalties:
      c = a r g m i n ∑ i = 1 n ( y i − x ( t i ) ) 2 + λ ∫ [ L x ( t ) ] 2 d t c = argmin \sum_{i = 1}^n (y_i - x(t_i))^2 + \lambda \int [Lx(t)]^2 dt c=argmini=1n(yix(ti))2+λ[Lx(t)]2dt
    • L x ( t ) Lx(t) Lx(t) measures “roughness” of x x x
    • λ \lambda λ: a “smoothing parameter” that trades-off fit to the y i y_i yi and roughness; must be chosen.

2.2 Basis expansions

Consider only one record

y i = x ( t i ) + ϵ y_i = x(t_i) + \epsilon yi=x(ti)+ϵ

representing

x ( t ) = ∑ j = 1 k c j ϕ j ( t i ) = Φ ( t ) T c x(t) = \sum_{j=1}^k c_j \phi_j (t_i) = \Phi (t)^T \mathbf{c} x(t)=j=1kcjϕj(ti)=Φ(t)Tc

Φ ( t ) \Phi (t) Φ(t) is a basis system for x x x.

Eg. Terms for curvature in linear regression

y i = β 0 + x i β 1 + x i 2 β 2 + x i 3 β 3 + . . . + ϵ i y_i = \beta_0 + x_i \beta_1 + x_i^2 \beta_2 + x_i^3 \beta_3 + ... + \epsilon_i yi=β0+xiβ1+xi2β2+xi3β3+...+ϵi

implies

x ( t ) = β 0 + x i β 1 + x i 2 β 2 + x i 3 β 3 + . . . x(t) = \beta_0 + x_i \beta_1 + x_i^2 \beta_2 + x_i^3 \beta_3 + ... x(t)=β0+xiβ1+xi2β2+xi3β3+...

Polynomials are unstable; Fourier basis and B-splines wil be more useful.

2.2.1 The Fourier basis

Add up these basis plots.

fourier-1

fourier-2

fourier-3

Advantages:

  • Only alternative to monomial bases until the middle of the 20th century;
  • Excellent computational properties, especially if the observations are equally spaced;
  • Natural for describing periodic data.

Disadvantages:

  • The functions are periodc which can be a problem for data with growth curves etc.

2.2.2 B-splines

P ( u ) = ∑ i = 0 k P i B i , j ( u ) ,       u ∈ [ u j − 1 , u k + 1 ] , P(u) = \sum_{i=0}^k P_i B_{i, j} (u), ~~~~~ u \in [u_{j-1}, u_{k+1}], P(u)=i=0kPiBi,j(u),     u[uj1,uk+1],

Splines ( B i , j B_{i, j} Bi,j): polynomial segmetns joined end-to-end.

  • Segments are constrained to be smooth at the join.

Knotes ( P i P_i Pi): the pointes at which the segments join.

The system is difined by

  • the order m m m (order = degree + 1) of the polynomial segments (splines);
  • the location of the knots.

b-splines-1

b-splines-2

b-splines-3

b-spline-local

Properties:

  • Number of basis functions = order ( m m m) + number of interior knots.
  • Derivatives up to m − 2 m - 2 m2 are continuous.
  • B-spline basis functions are positive over at most m m m adjacent intervals → \rightarrow fast computation for even thousands of basis functions.
  • Sum of all B-splines in a basis is always 1; can fit any polynomial of order m m m.
  • Most popular choice is order 4, implying continuous second derivatives. Second derivatives have straight-line segments.

Choosing the number of knots and order:

  • The order of the spline should be at least m + 2 m + 2 m+2 if you are interested in m m m derivatives.
  • Knots are often equally spaced (a useful default).
  • Place more knotes where you know there is strong curvature, and fewer where the function changes slowly.
  • Be sure there is at least one data point in every interval.

2.2.3 Other basis

The fda library in R also allows the following basis:

  • Constant: ϕ ( t ) = 1 \phi (t) = 1 ϕ(t)=1.
  • Power: t λ 1 , t λ 2 , t λ 3 , . . . t^{\lambda_1}, t^{\lambda_2}, t^{\lambda_3}, ... tλ1,tλ2,tλ3,..., powers are distinctbut not necessarily integers or positive.
  • Exponential: e λ 1 t , e λ 2 t , e λ 3 t , . . . e^{\lambda_1 t}, e^{\lambda_2 t}, e^{\lambda_3 t}, ... eλ1t,eλ2t,eλ3t,....

Other possible basis include:

  • Wavelets: especially for sharp, local features.
  • Empirical: functional principal components.
  • Designer: dynamic models, tailoring a basis to data can be more efficient.

2.2.4 Summary

  • Basis expansions: just like adding different independent variables in linear regression
  • Fourier basis: classical, common in signal processing etc. Great for periodic functions. Must be infinitely differentiable.
  • B-spline basis: locally polynomial. Allows control of smoothness and accuracy. Local definition ) good numerics.
  • Other basis systems also exist.
  • What is best depends on the data.

3 R example

3.1 New functions

We need the package "fda". (Package “fda”.pdf)

Functions usedDetials
matplot(matrixA, matrixB)matrixA would be on the x-axis while matrixB on the y-axis. The number of rows of matrixA and matrixB should match.
create.fourier.basis(vectorRange, numberBasis)Create Fourier basis
create.bspline.basis(vectorRange, numberBasis, numberOrder, breaks = numberKnots)Create B-spline basis
eval.basis(vectorTimepoint, basis)Evaluate the basis. Outcome is a vector with vectorTimepoint as rows and basis as colnumns.

library("fda")

# Plot the temperature data of the 35 cities
matplot(CanadianWeather$dailyAv[, , 1], , type = "l", xlab = "Days")

matplot

# Load the dataset "CanadianWeather"
data(CanadianWeather)

# Temperature and precipitation are contained in the dailyAV element, and we
# will extract them.
temp <- CanadianWeather$dailyAv[, , 1]
precip <- CanadianWeather$dailyAv[, , 2]

# We need corresponding time points. I'll put them half-way through a day. 
# This is because the period is over 0:365 and we'd like the 365 data points 
# to be about symmetric in that period.
daytime <- (1:365) - 0.5

# This is a bit fine for plotting purposes, so we'll also create a vector of
# points to use for plotting every 5 days
day5 <- seq(0, 365, 5)


# Plot these data
matplot(daytime, temp, type='l')
matplot(daytime, precip, type='l')

# We can also plot by region; Atlantic, Pacific, Central and North.
matplot(daytime, temp, type = 'l', col = as.factor(CanadianWeather$region))
matplot(daytime, precip, type = 'l',col = as.factor(CanadianWeather$region))
Temperature of the 35 cities
Precipitation of the 35 cities
Grouped by region
# Perform fda

# Define a basis system
# We'll always need the range.
dayrng = c(0, 365)

3.2 Fourier basis

# 1/ Fourier Basis with 365 basis functions
fbasis = create.fourier.basis(dayrng, 365)

# Plot a basis with just 5 components
plot(create.fourier.basis(dayrng, 5))

在这里插入图片描述

# Try a simple linear regression of the first temperature record on the first
# 20 basis functions.
fb.values <- eval.basis(day5, fbasis)
# The 74 by 365 matrix that results has days as rows, bases as columns.
dim(fb.values)
plot(day5, fb.values[, 1])
plot(day5, fb.values[, 2])
# Extract the first temperature record
ex.temp <- temp[, 1]
plot(ex.temp, main = "Daily Temperature of St. Johns")

ex.temp

# Run a linear regression on temperature with the first 5 basis functions.
# In this case I need to evaluate the basis at the observation times.
Xmat <- eval.basis(daytime, fbasis)
Xmat <- Xmat[, 1:5]  # First 5 basis functions
matplot(Xmat, type = "l")

Xmat

ex.temp.mod <- lm(ex.temp ~ Xmat)

# Plot the fitted values
plot(daytime, ex.temp)
lines(daytime, ex.temp.mod$fitted, col = 2, lwd = 2)
title(main = "Daily Temperature of St. Johns with smoothing (5 basis functions")

fit_5

# Check the residuals
plot(daytime, ex.temp.mod$resid)

resid

There’s some clear autocorrelation. More basis functions may be warranted.
Repeat the above with different numbers of basis functions (say 20, 50, 100, 200, 365) to find the one give a reasonable smooth.

Xmat <- eval.basis(daytime, fbasis)
Xmat <- Xmat[, 1:20]  # First 20 basis functions
ex.temp.mod <- lm(ex.temp ~ Xmat)

# Plot the fitted values
plot(daytime, ex.temp)
lines(daytime, ex.temp.mod$fitted, col = 2, lwd = 2)
title(main = "Daily Temperature of St. Johns with smoothing (20 basis functions)")

fit_20

# Check the residuals
plot(daytime, ex.temp.mod$resid)

resid_20

3.3 B-splines

# 2/ B-spline bases with knots every 5 days

# First of all define a knot sequence; this will be the same as day5.
knots <- day5

# We'll use fourth-order B-splines
norder <- 4

# This implies the number of basis functions
nbasis = length(knots) + norder - 2

# Define the basis
bbasis <- create.bspline.basis(dayrng, nbasis, norder, knots)

# If in doubt, we can obtain
bbasis$nbasis  # number of basis functions
bbasis$rangeval  # basis range

# Plot the basis
plot(bbasis)

bbasis

# We can look at a smaller number
plot(create.bspline.basis(dayrng, nbasis = 12, norder))

bbasis-12

# We can also look at the inner product of these
in.mat = inprod(bbasis, bbasis)
par(mfrow=c(1, 1))
image(in.mat)

inner_product

It is zero outside of a diagonal band This can help computation a great deal.

Change the order of the basis and observe how the width of the support of the basis changes and how its smoothness properties change.

bbasis <- create.bspline.basis(dayrng, nbasis = 20, norder)
# Run a linear regression on temperature with the first 20 basis functions. 
# In this case I need to evaluate the basis at the observation times.
Xmat <- eval.basis(daytime, bbasis)
Xmat <- Xmat[, 1:20]  # First 20 basis functions
matplot(Xmat, type = "l")

Xmat_b

ex.temp.mod <- lm(ex.temp ~ Xmat)

# Plot the fitted values
plot(daytime, ex.temp)
lines(daytime, ex.temp.mod$fitted, col = 2, lwd = 2)
title(main="Daily Temperature of St. Johns with smoothing")

b_fitted

# Check the residuals
plot(daytime,ex.temp.mod$resid)

b_residual

  • 4
    点赞
  • 10
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
1 Hahn-Banach Theorems and Introduction to Convex Conjugation 1 1.1 Hahn-Banach Theorem - Analytic Form . . . . . . . . . . . . . . 1 1.1.1 Theorems on Extension of Linear Functionals . . . . . . . 1 1.1.2 Applications of the Hahn-Banach Theorem . . . . . . . . 3 1.2 Hahn-Banach Theorems - Geometric Versions . . . . . . . . . . . 5 1.2.1 Definitions and Preliminaries . . . . . . . . . . . . . . . . 5 1.2.2 Separation of a Point and a Convex Set . . . . . . . . . . 6 1.2.3 Applications (Krein-Milman Theorem) . . . . . . . . . . . 8 1.3 Introduction to the Theory of Convex Conjugate Functions . . . 9 2 Baire Category Theorem and Its Applications 13 2.1 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.1.1 Reminders on Banach Spaces . . . . . . . . . . . . . . . . 13 2.1.2 Bounded Linear Transformations . . . . . . . . . . . . . . 13 2.1.3 Duals and Double Duals . . . . . . . . . . . . . . . . . . . 15 2.2 The Baire Category Theorem . . . . . . . . . . . . . . . . . . . . 16 2.3 The Uniform Boundedness Principle . . . . . . . . . . . . . . . . 17 2.4 The Open Mapping Theorem and Closed Graph Theorem . . . . 18 3 Weak Topology 21 3.1 General Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2 Frechet Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.3 Weak Topology in Banach Spaces . . . . . . . . . . . . . . . . . . 24 3.4 Weak-* Topologies (X,X) . . . . . . . . . . . . . . . . . . . . 28 3.5 Reflexive Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.6 Separable Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.7.1 Lp Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.7.2 PDE’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4 Bounded (Linear) Operators and Spectral Theory 37 4.1 Topologies on Bounded Operators . . . . . . . . . . . . . . . . . 37 4.2 Adjoint . . .
Scientists today collect samples of curves and other functional observations. This monograph presents many ideas and techniques for such data. Included are expressions in the functional domain of such classics as linear regression, principal components analysis, linear modelling, and canonical correlation analysis, as well as specifically functional techniques such as curve registration and principal differential analysis. Data arising in real applications are used throughout for both motivation and illustration, showing how functional approaches allow us to see new things, especially by exploiting the smoothness of the processes generating the data. The data sets exemplify the wide scope of functional data analysis; they are drwan from growth analysis, meterology, biomechanics, equine science, economics, and medicine.The book presents novel statistical technology while keeping the mathematical level widely accessible. It is designed to appeal to students, to applied data analysts, and to experienced researchers; it will have value both within statistics and across a broad spectrum of other fields. Much of the material is based on the authors' own work, some of which appears here for the first time.Jim Ramsay is Professor of Psychology at McGill University and is an international authority on many aspects of multivariate analysis. He draws on his collaboration with researchers in speech articulation, motor control, meteorology, psychology, and human physiology to illustrate his technical contributions to functional data analysis in a wide range of statistical and application journals.Bernard Silverman, author of the highly regarded "Density Estimation for Statistics and Data Analysis," and coauthor of "Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach," is Professor of Statistics at Bristol University. His published work on smoothing methods and other aspects of applied, computational, and theoretical statistics has been recognized by the Presidents' Award of the Committee of Presidents of Statistical Societies, and the award of two Guy Medals by the Royal Statistical Society.

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值