Data Mining Day 14, 15 - CS229-WEEK2 Linear Regression


偲偲粑


Category: Data Mining

Article link: https://blog.csdn.net/weixin_43329319/article/details/97782966


1. Mind map for this section

(Figure: mind map of this section, image not preserved)

2. Implementing gradient descent in Python

Reference article
The data come from the course's homework exercise. Plot of the cost J:

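```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Assumption: df holds ex1data1.txt (no header row): population, profit
df = pd.read_csv('ex1data1.txt', header=None, names=['Population', 'Profit'])

# Two-parameter cost J(theta0, theta1) for the single-feature case
def cost(theta0, theta1, x, y):
    h = theta0 + theta1 * np.asarray(x)
    return ((h - np.asarray(y)) ** 2).sum() / (2 * len(y))

# Use these for your exercise
theta0s = np.linspace(-1, 1, 50)
theta1s = np.linspace(0, 1.5, 50)
COST = np.empty(shape=(50, 50))
# Meshgrid for parameters: T0S[j, i] = theta0s[i], T1S[j, i] = theta1s[j]
T0S, T1S = np.meshgrid(theta0s, theta1s)
# For each parameter combination compute the cost, indexed like the meshgrid
for i in range(50):
    for j in range(50):
        COST[j, i] = cost(theta0s[i], theta1s[j], df.Population, df.Profit)

# Make the 3D surface plot (add_subplot replaces the removed gca(projection=...))
fig2 = plt.figure(figsize=(15, 10))
ax = fig2.add_subplot(projection='3d')
ax.plot_surface(T0S, T1S, COST)
plt.show()
```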

(Figure: 3D surface plot of the cost J over theta0 and theta1, image not preserved)
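For reference, these are the squared-error cost being plotted and the batch gradient descent update from the course, in the notation used by the code in this post (a column of ones is prepended to $X$, so $x_0^{(i)} = 1$, and all $\theta_j$ are updated simultaneously):

$$
h_\theta(x) = \theta^\top x = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n
$$

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m}(X\theta - y)^\top(X\theta - y)
$$

$$
\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
\quad\Longleftrightarrow\quad
\theta := \theta - \frac{\alpha}{m}X^\top(X\theta - y)
$$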
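Finally, drawing on various reference implementations, I put together a linear regression that handles multiple features (parameters).
A few takeaways, details I would not have noticed without writing the code myself:
- Batch gradient descent: every iteration has to sum over all training examples.
- Computing the gradient with NumPy arrays or matrices is convenient, but keeping the dimensions straight gets confusing.
- Feature scaling and mean normalization should be applied after y has been split off; y itself does not need to be normalized.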
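The second and third points are easy to check on a tiny example. The sketch below uses made-up toy numbers (not the course data): it splits off y, scales only the feature columns, prepends the bias column, and prints the shape of each intermediate, so the vectorized batch gradient $X^\top(X\theta - y)/m$ can be compared with an explicit loop over all $m$ examples.

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, n = 2 features, plus targets y
X_raw = np.array([[1.0, 10.0],
                  [2.0, 30.0],
                  [3.0, 20.0],
                  [4.0, 40.0]])
y = np.array([3.0, 7.0, 6.0, 10.0])

# Split y off first, then scale only the features; keep mu/sigma for new inputs
mu = X_raw.mean(axis=0)
sigma = X_raw.std(axis=0)
X = np.insert((X_raw - mu) / sigma, 0, 1, axis=1)  # (m, n+1): bias column first

theta = np.zeros(X.shape[1])          # (n+1,)
residual = X @ theta - y              # (m,)   h(x) - y for every example
grad = X.T @ residual / X.shape[0]    # (n+1,) batch gradient over all m examples

# The same gradient as an explicit loop over the examples, for comparison
grad_loop = np.zeros_like(theta)
for i in range(X.shape[0]):
    grad_loop += (X[i] @ theta - y[i]) * X[i]
grad_loop /= X.shape[0]

print(X.shape, residual.shape, grad.shape)   # (4, 3) (4,) (3,)
print(np.allclose(grad, grad_loop))          # True
```

Keeping theta as a 1-D array of shape (n+1,) avoids the transposes that make the matrix version confusing: `X @ theta` is already a vector of length m.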
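Following the course's approach, the code below lets you experiment with $\alpha$: enter one or more values separated by commas. If you enter only one value, the final theta is printed as well.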

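```python
import math

import numpy as np
import matplotlib.pyplot as plt


def normalize_data(data):
    """Feature scaling + mean normalization, column by column.

    Returns a scaled copy plus the per-column mean and std, which are
    needed later to scale new inputs the same way.
    """
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)
    return (data - mu) / sigma, mu, sigma


# Partial derivatives of the cost with respect to each theta_j
def batch_cost_theta(x, y, theta):
    h_y = x.dot(theta) - y          # residuals h(x) - y, shape (m,)
    partial = x.T.dot(h_y)          # sum over all m examples, shape (n+1,)
    return partial / x.shape[0]


# Cost J(theta) = 1/(2m) * sum((h(x) - y)^2)
def cost(x, y, theta):
    h_y = x.dot(theta) - y
    return (h_y * h_y).sum() / x.shape[0] / 2


def gradient_descent(x, y, alpha=0.1):
    max_epochs = 1000   # maximum number of iterations
    counter = 0         # iteration counter
    # Initialize theta and the current cost
    theta = np.zeros(x.shape[1])
    J = cost(x, y, theta)
    costs = [J]         # record every cost value
    # Convergence threshold: stop when J barely changes between two iterations
    convergence_thres = 0.000001
    cprev = J + 10      # initial value so the loop runs at least once
    while (np.abs(cprev - J) > convergence_thres) and (counter < max_epochs):
        cprev = J
        # Update all theta_j simultaneously
        theta = theta - alpha * batch_cost_theta(x, y, theta)
        J = cost(x, y, theta)
        costs.append(J)  # record J
        counter += 1     # increment the iteration count
    return {'theta': theta, 'costs': costs}


def main(data):
    # Input: one or more alpha values separated by commas
    alist = input('Enter alpha value(s) to test (comma-separated): ').split(',')
    x = data[:, :-1]
    y = data[:, -1]
    # Scale the features only, after y has been split off; y is left as-is
    x, mu, sigma = normalize_data(x)
    # Prepend a column of ones for the intercept term theta_0
    x = np.insert(x, 0, 1, axis=1)
    alist_len = len(alist)
    # Lay out the subplots according to how many alpha values were entered
    if alist_len <= 1:
        n, m = 1, 1
    elif alist_len <= 4:
        n, m = 2, 2
    else:
        n, m = 3, math.ceil(alist_len / 3)
    plt.figure(figsize=(15, 10))
    for i in range(alist_len):
        alpha = float(alist[i])
        descend = gradient_descent(x, y, alpha=alpha)
        plt.subplot(n, m, i + 1)
        plt.scatter(range(len(descend['costs'])), descend['costs'])
        plt.ylabel('J')
        plt.title('alpha = %f' % alpha)
        # If only one alpha value was entered, print the final theta
        if alist_len == 1:
            print(descend['theta'])
    plt.show()
    return descend['theta'], mu, sigma


if __name__ == '__main__':
    data = np.loadtxt('ex1data2.txt', delimiter=',')
    main(data)
    # With a single feature (e.g. ex1data1.txt), replace the call above with the
    # lines below to plot the fitted line; x is scaled with the same mu and sigma:
    # x = data[:, :-1]
    # y = data[:, -1]
    # theta, mu, sigma = main(data)
    # plt.scatter(x, y)
    # z = theta[0] + theta[1] * (x - mu) / sigma
    # plt.plot(x, z, color='red')
    # plt.show()
```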
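The script reads the $\alpha$ values interactively through `input()`. As a sketch of the same experiment without user input, the code below runs a fixed number of batch gradient descent steps for several $\alpha$ values on synthetic single-feature data (hypothetical: y = 3 + 2x plus noise; the helper `run_gd` is not from the post). With one feature that has been mean-normalized with the population standard deviation, $X^\top X/m$ is the identity, so the error shrinks by a factor $|1 - \alpha|$ per step: $\alpha = 1$ lands on the minimum in one step here, while $\alpha = 2.5$ diverges.

```python
import numpy as np


def cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X theta - y)^2)."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))


def run_gd(X, y, alpha, iters=50):
    """Hypothetical helper: fixed-length batch gradient descent, returns all costs."""
    theta = np.zeros(X.shape[1])
    costs = [cost(X, y, theta)]
    for _ in range(iters):
        theta = theta - alpha * X.T @ (X @ theta - y) / len(y)
        costs.append(cost(X, y, theta))
    return costs


# Synthetic single-feature data (hypothetical): y = 3 + 2x + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 + 2 * x + rng.normal(0, 1, size=100)
x = (x - x.mean()) / x.std()               # normalize the feature only, not y
X = np.column_stack([np.ones_like(x), x])  # prepend the column of ones

results = {alpha: run_gd(X, y, alpha) for alpha in [0.01, 0.1, 1.0, 2.5]}
for alpha, costs in results.items():
    print(f"alpha={alpha}: J after {len(costs) - 1} iterations = {costs[-1]:.4g}")
```

Expect $\alpha = 0.01$ to still be well above the minimum after 50 iterations, $\alpha = 0.1$ and $\alpha = 1.0$ to be essentially converged, and $\alpha = 2.5$ to blow up, which is the pattern the subplots in `main()` are meant to reveal.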

3. Normal equation implementation

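```python
import numpy as np
from numpy.linalg import inv

# Normal equation: theta = (X^T X)^-1 X^T y
data = np.loadtxt('ex1data1.txt', delimiter=',')
x = data[:, :-1]
y = data[:, -1]
x = np.insert(x, 0, 1, axis=1)   # prepend the column of ones for theta_0
k = inv(x.T.dot(x))
theta = k.dot(x.T).dot(y)
print(theta)
```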
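The code above evaluates $\theta = (X^\top X)^{-1}X^\top y$ literally with `inv`. In NumPy, `np.linalg.solve` (solving $X^\top X\,\theta = X^\top y$) or `np.linalg.lstsq` (least squares on $X$ directly) are generally the numerically safer choices. The sketch below, on hypothetical synthetic data with two features on very different scales, checks that the three agree and that gradient descent on mean-normalized features predicts the same value as the normal equation on raw features, provided a new example is scaled with the same `mu` and `sigma`.

```python
import numpy as np

# Hypothetical synthetic data: two features on very different scales
rng = np.random.default_rng(1)
X_raw = np.column_stack([rng.uniform(500, 4500, 50), rng.integers(1, 6, 50)])
y = 50 + 0.1 * X_raw[:, 0] + 10 * X_raw[:, 1] + rng.normal(0, 5, 50)
X = np.insert(X_raw, 0, 1, axis=1)           # prepend the column of ones

# Normal equation three ways (no feature scaling, no alpha needed)
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y         # literal (X^T X)^-1 X^T y
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)      # solve X^T X theta = X^T y
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares on X directly

# Batch gradient descent on mean-normalized features for comparison
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
Xs = np.insert((X_raw - mu) / sigma, 0, 1, axis=1)
theta_gd = np.zeros(Xs.shape[1])
for _ in range(5000):
    theta_gd -= 0.1 * Xs.T @ (Xs @ theta_gd - y) / len(y)

# The two theta vectors live on different feature scales, so compare predictions;
# a new example must be scaled with the SAME mu and sigma before using theta_gd
x_new = np.array([1650.0, 3.0])
pred_ne = np.r_[1.0, x_new] @ theta_solve
pred_gd = np.r_[1.0, (x_new - mu) / sigma] @ theta_gd
print(pred_ne, pred_gd)
```

The trade-off is the one from the course: the normal equation needs no $\alpha$, no iterations and no feature scaling, but it solves an $(n+1)\times(n+1)$ system, which becomes expensive when the number of features is very large.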
