Are regression models useful?

Lots of businesses try using regression models. The basic idea is great – use the value of one or more variables that you may be able to control to predict the value of another variable (most likely a metric) that you’d like to optimize (probably push as high as possible).

analytics - regression model scatter plot

In its basic form – a regression model is simply the equation for a line drawn through a scatter plot. Each point represents an instance of one variable’s response given the state of another variable (in a simple regression). For a multiple regression you can’t really see a scatter plot because you have multiple inputs represented.

The special thing about the line is that it’s the line that minimizes the total squared vertical distance between the dots and the line.

regression least squares

Above – think of the green dots as actual points where one variable met the other (e.g. sales was $5,000 and marketing spend was $2,000 – that’s one dot, with sales and spend on different axes). The maroon line minimizes the sum of squared vertical distances between the points and the line. How does it do this, you wonder?

http://mathworld.wolfram.com/LeastSquaresFitting.html

Well – you need some partial derivatives and fairly sophisticated linear algebra. I doubt most applied math professors can derive the equations without referring to them beforehand. In reality you use MS Excel, R, SAS or some other statistical analysis software. Khan Academy has a good video that gives some grounding in the concept.
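For a single input the result of all those partial derivatives is small enough to type out by hand. Here is a minimal sketch in plain Python, assuming one predictor and no missing values. The function name `fit_line` and the spend/sales figures are made up for illustration; they are not real data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x.

    Setting the partial derivatives of sum((y - a - b*x)^2) with respect
    to a and b to zero gives:
        b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
        a = mean_y - b * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical monthly figures (illustrative only).
spend = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
sales = [3200.0, 3900.0, 5000.0, 5600.0, 7100.0]

intercept, slope = fit_line(spend, sales)
print(f"sales ~ {intercept:.0f} + {slope:.2f} * spend")
```

Excel, R and SAS do the same thing under the hood, plus the matrix version for multiple inputs and all the diagnostics.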

So great – why aren’t we using regressions to predict everything about any business, precisely forecast sales, budgets and resource allocation and the whole 9 yards?

Well – think about that line for a second. How many points are actually on the line? There’s a good chance that in a fairly sparse plot, only a handful are. If you spend $2,000 on marketing and only come up with $4,000 in sales instead of $5,000…the model can say nothing about this shortfall. If the model is only giving you estimates – might you fare better if you just looked at what happened in the recent past and made an educated guess?
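To put a number on that shortfall, here is a small sketch of what the model can and cannot tell you. The coefficients and the three historical months are hypothetical, picked only so that $2,000 of spend predicts $5,000 in sales as in the example above.

```python
def predict(intercept, slope, x):
    """Point estimate from the fitted line y = intercept + slope * x."""
    return intercept + slope * x


def rmse(xs, ys, intercept, slope):
    """Root mean squared residual: the typical size of the model's miss."""
    errors = [y - predict(intercept, slope, x) for x, y in zip(xs, ys)]
    return (sum(e ** 2 for e in errors) / len(errors)) ** 0.5


# Hypothetical fitted coefficients (an assumption, not a real model).
intercept, slope = 1000.0, 2.0

predicted = predict(intercept, slope, 2000.0)  # the point on the line
actual = 4000.0                                # what really happened
residual = actual - predicted                  # the part the line cannot explain

# Hypothetical history, used to see how far off the line usually is.
past_spend = [1000.0, 2000.0, 3000.0]
past_sales = [3500.0, 4000.0, 7500.0]
typical_miss = rmse(past_spend, past_sales, intercept, slope)

print(f"predicted {predicted:.0f}, actual {actual:.0f}, residual {residual:.0f}")
print(f"typical miss on past data: {typical_miss:.0f}")
```

If the typical miss is about as big as the error of your educated guess, the line is not buying you much.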

PROBLEMS WITH REGRESSION MODELS:

  1. The relationship between the variables must be linear – you’re not going to find a good y = a + bx model when y actually depends on x^2, as the curve is quadratic (a parabola) rather than a straight line. The scatterplot usually tells you whether the relationship is linear. Common sense prevails most of the time though.
  2. There’s no way to know from the model alone whether the variable you’re trying to predict is actually driven by the variable you expect to predict it. This is the old correlation vs causation issue. Sometimes the relationships may be obvious, but in a multiple regression model it may not be as clear-cut, and some of the input variables may be correlated with each other (multicollinearity), which throws off some key assumptions.
  3. The model errors (the residuals: the vertical distances between the points and the line) should be roughly normally distributed. Outliers break that assumption and, because the distances are squared, can drag the line toward themselves depending on their position on the plot. Are those outliers relevant? Your call.
  4. Regression models use historical data to predict future data – in a growing business many things are changing, including marketing spend, competition, margins, macroeconomic factors, site design, etc. Looking at a regression model to forecast sales may be less useful than looking at the same month last year and multiplying it by the % you want to grow by.
  5. Before statistical software packages, running a regression was a laborious process (remember that thing about partial derivatives and matrices?). People only inspected relationships that were really ‘worth inspecting’. Nowadays there’s quite a bit of data snooping and simply running regressions because one can. The more variables you inspect, the odds of finding a reasonably good-looking model by chance alone are, well…reasonably good!
  6. Again – they are imprecise. While the algorithm minimizes the distance between the points and the line…the line cannot go through all the points, since it’s impossible to account for all the conditions that led to each particular observation in the data set (e.g., the site went down one day while ad spend kept running).
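
Two of the problems above (1 and 5) are easy to see in a few lines of code. This is a rough sketch in plain Python, not a rigorous test: the data is synthetic, the names are invented for the example, and the random seed and sample sizes are arbitrary assumptions.

```python
import random


def fit_line(xs, ys):
    """Least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, 1.0 - ss_res / syy


# Problem 1: fit a straight line to y = x^2 on a symmetric range.
# The relationship is perfect, yet the line is flat and explains nothing.
xs = [float(x) for x in range(-5, 6)]
ys = [x ** 2 for x in xs]
a, b, r2_curve = fit_line(xs, ys)
print(f"line through a parabola: slope {b:.2f}, R^2 {r2_curve:.2f}")

# Problem 5: regress pure noise on many unrelated noise variables
# and keep only the best-looking fit, which is what data snooping does.
rng = random.Random(0)          # arbitrary seed, for repeatability
n_obs, n_candidates = 20, 200   # arbitrary sizes for the illustration
target = [rng.gauss(0, 1) for _ in range(n_obs)]
r2_values = []
for _ in range(n_candidates):
    candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
    r2_values.append(fit_line(candidate, target)[2])

best_r2 = max(r2_values)
median_r2 = sorted(r2_values)[n_candidates // 2]
print(f"noise vs noise: median R^2 {median_r2:.2f}, best R^2 {best_r2:.2f}")
```

The best of the noise-on-noise fits will look a lot more convincing than the typical one, even though none of the candidates has anything to do with the target.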

I think regression is a great tool for exploring conditional relationships between variables, but don’t use the model blindly. Ask yourself, ‘why does it predict this…?’ Why does it under-predict in Q1 and over-predict in Q2? You may find more interesting insights in evaluating the causes of its shortcomings than in asking whether it’s accurate or not. If a stupid algorithm could predict what will happen with your business…why would it need you!?

Time for dinner (on average anyway),
