The Model Complexity Myth

The Model Complexity Myth

(or, Yes You Can Fit Models With More Parameters Than Data Points)

An oft-repeated rule of thumb in any sort of statistical model fitting is "you can't fit a model with more parameters than data points". This idea appears to be as wide-spread as it is incorrect. On the contrary, if you construct your models carefully, you can fit models with more parameters than datapoints, and this is much more than mere trivia with which you can impress the nerdiest of your friends: as I will show here, this fact can prove to be very useful in real-world scientific applications.

A model with more parameters than datapoints is known as an under-determined system, and it's a common misperception that such a model cannot be solved in any circumstance. In this post I will consider this misconception, which I call the model complexity myth. I'll start by showing where this model complexity myth holds true, first from from an intuitive point of view, and then from a more mathematically-heavy point of view. I'll build from this mathematical treatment and discuss how underdetermined models may be addressed from a frequentist standpoint, and then from a Bayesian standpoint. (If you're unclear about the general differences between frequentist and Bayesian approaches, I might suggest reading my posts on the subject). Finally, I'll discuss some practical examples of where such an underdetermined model can be useful, and demonstrate one of these examples: quantitatively accounting for measurement biases in scientific data.

The Root of the Model Complexity Myth

While the model complexity myth is not true in general, it is true in the specific case of simple linear models, which perhaps explains why the myth is so pervasive. In this section I first want to motivate the reason for the underdetermination issue in simple linear models, first from an intuitive view, and then from a more mathematical view.

I'll start by defining some functions to create plots for the examples below; you can skip reading this code block for now:

In [1]:
# Code to create figures
%matplotlib inline
import matplotlib.pyplot as plt import numpy as np plt.style.use('ggplot') def plot_simple_line(): rng = np.random.RandomState(42) x = 10 * rng.rand(20) y = 2 * x + 5 + rng.randn(20) p = np.polyfit(x, y, 1) xfit = np.linspace(0, 10) yfit = np.polyval(p, xfit) plt.plot(x, y, 'ok') plt.plot(xfit, yfit, color='gray') plt.text(9.8, 1, "y = {0:.2f}x + {1:.2f}".format(*p), ha='right', size=14); def plot_underdetermined_fits(p, brange=(-0.5, 1.5), xlim=(-3, 3), plot_conditioned=False): rng = np.random.RandomState(42) x, y = rng.rand(2, p).round(2) xfit = np.linspace(xlim[0], xlim[1]) for r in rng.rand(20): # add a datapoint to make model specified b = brange[0] + r * (brange[1] - brange[0]) xx = np.concatenate([x, [0]]) yy 
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值