R squared or coefficient of determination

These are my study notes on machine learning. I write them in English because I want to improve my writing skills. Thanks for reading, and if I have made any mistakes, please let me know.

R squared, also known as the coefficient of determination, is a standard way to measure how good the best-fit line actually is.

In data analysis and machine learning, error is usually defined as the distance between a data point's y value (the observation) and the regression line's y value (the prediction).

Why square the error?

First, we don't want positive and negative errors to cancel out: simply summing raw errors could give a total of zero even for a bad fit, which would be absurd.
The other reason is that squaring punishes outliers. Powers of 4, 6, 8, or even larger would also work: the larger the power, the tougher the constraint on outliers, but it costs more computational resources and time.
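As a quick numerical sketch of that outlier effect (the numbers here are made up purely for illustration):

```python
# How the exponent changes an outlier's share of the total error.
errors = [1.0, 1.0, 1.0, 10.0]  # the last value is an outlier

abs_total = sum(abs(e) for e in errors)       # power 1: 13.0
sq_total = sum(e ** 2 for e in errors)        # power 2: 103.0
quartic_total = sum(e ** 4 for e in errors)   # power 4: 10003.0

# Fraction of the total error contributed by the outlier alone:
print(abs(errors[-1]) / abs_total)       # ~0.77
print(errors[-1] ** 2 / sq_total)        # ~0.97
print(errors[-1] ** 4 / quartic_total)   # ~0.9997
```

The higher the power, the more the single outlier dominates the total, which is exactly the "tougher constraint" described above.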

Why use the y-mean?

The squared error about the y-mean represents the data's variance. Informally, variance measures how far a set of (random) numbers is spread out from its average value.
https://en.wikipedia.org/wiki/Variance
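For a concrete feel of variance, here is the small worked example from that Wikipedia page:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population variance: mean of squared deviations from the mean.
# mean = 5, squared deviations sum to 32, and 32 / 8 = 4.
print(np.var(data))  # 4.0
```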

Equation

R² = 1 − SE_ŷ / SE_ȳ

where S denotes the squaring operation and E the summation: SE_ŷ is the sum of squared errors of the regression line, and SE_ȳ is the sum of squared errors of the mean line.
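A minimal implementation of this formula might look like the following (the function names are my own, not from any library):

```python
import numpy as np

def squared_error(ys_orig, ys_line):
    # SE: sum of squared differences between the observed y values
    # and the compared line's y values.
    return np.sum((ys_line - ys_orig) ** 2)

def coefficient_of_determination(ys_orig, ys_line):
    # R^2 = 1 - SE(regression line) / SE(mean line)
    y_mean_line = np.full_like(ys_orig, np.mean(ys_orig))
    se_regression = squared_error(ys_orig, ys_line)
    se_mean = squared_error(ys_orig, y_mean_line)
    return 1 - se_regression / se_mean
```

A perfect fit gives R² = 1, while a line that just predicts the mean everywhere gives R² = 0.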

Drawback

R squared is not always a useful way to measure error; it depends on your goals. If you care about predicting exact future values, R squared is very useful. If you are only interested in predicting motion or direction, it should not carry as much weight. Besides, R squared also depends on the variance of the data.

If the variance is low, R squared comes out close to 1. We can check this with a little code: a random dataset generator that lets us change the variance directly. The correlation argument describes the relationship among the data in our graph; False means the data have no correlation.

import random
import numpy as np

def create_dataset(hm, variance, step=2, correlation=False):
    # Generate hm noisy points. correlation may be 'pos' (upward
    # trend), 'neg' (downward trend), or False (no trend at all).
    val = 1
    ys = []
    for i in range(hm):
        # Scatter each point around the current trend value.
        y = val + random.randrange(-variance, variance)
        ys.append(y)
        if correlation == 'pos':
            val += step
        elif correlation == 'neg':
            val -= step

    xs = [i for i in range(len(ys))]
    return np.array(xs, dtype=np.float64), np.array(ys, dtype=np.float64)

Plugging this function into our simple regression program lets us test it easily; the full program is in my other article.
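To illustrate the variance effect end to end, here is a self-contained sketch that combines the generator with an ordinary least-squares fit and the R squared computation (the `best_fit_line` helper is my own stand-in for the regression program mentioned above):

```python
import random
import numpy as np

def create_dataset(hm, variance, step=2, correlation=False):
    # Same generator as above.
    val = 1
    ys = []
    for _ in range(hm):
        ys.append(val + random.randrange(-variance, variance))
        if correlation == 'pos':
            val += step
        elif correlation == 'neg':
            val -= step
    xs = np.arange(len(ys), dtype=np.float64)
    return xs, np.array(ys, dtype=np.float64)

def best_fit_line(xs, ys):
    # Ordinary least squares for one feature: slope and intercept.
    m = ((np.mean(xs) * np.mean(ys) - np.mean(xs * ys))
         / (np.mean(xs) ** 2 - np.mean(xs ** 2)))
    b = np.mean(ys) - m * np.mean(xs)
    return m, b

def r_squared(ys, ys_line):
    se_line = np.sum((ys - ys_line) ** 2)
    se_mean = np.sum((ys - np.mean(ys)) ** 2)
    return 1 - se_line / se_mean

random.seed(42)  # make the noise reproducible
results = {}
for variance in (5, 40):
    xs, ys = create_dataset(40, variance, step=2, correlation='pos')
    m, b = best_fit_line(xs, ys)
    results[variance] = r_squared(ys, m * xs + b)
    print(f"variance={variance:2d}  r^2={results[variance]:.3f}")
```

With the same upward trend, the low-variance dataset should score noticeably closer to 1 than the high-variance one, which is the claim made above.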
