Machine Learning Study Report 1

I. Univariate Linear Regression

1. Model

f_{w,b}(x) = wx + b

2. Notation

m denotes the number of training examples

x denotes the feature / input variable

y denotes the target / output variable

(x, y) denotes a training example

(x^{(i)}, y^{(i)}) denotes the i-th training example

3. Cost Function

J(w,b)=\frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)})-y^{(i)}\right)^{2}

4. Objective

Find the values of w and b that minimize J(w,b).

5. Gradient Descent

w = w - \alpha \frac{\partial}{\partial w}J(w,b), \qquad \frac{\partial}{\partial w}J(w,b) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)})-y^{(i)}\right)x^{(i)}

b = b - \alpha \frac{\partial}{\partial b}J(w,b), \qquad \frac{\partial}{\partial b}J(w,b) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)})-y^{(i)}\right)

Update the parameters simultaneously (a NumPy sketch follows below):

tmp_w = w - \alpha \frac{\partial}{\partial w}J(w,b)

tmp_b = b - \alpha \frac{\partial}{\partial b}J(w,b)

w = tmp_w

b = tmp_b
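
To make the update rule concrete, here is a minimal NumPy sketch of batch gradient descent with simultaneous updates for the univariate model (x_train, y_train, alpha, and num_iters are hypothetical names; the assignment below uses its own matrix-based version):

import numpy as np

# Minimal sketch of batch gradient descent for f_{w,b}(x) = w*x + b.
def gradient_descent_1d(x_train, y_train, alpha=0.01, num_iters=1000):
    m = len(x_train)
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        err = w * x_train + b - y_train      # f_{w,b}(x^(i)) - y^(i)
        dj_dw = (err * x_train).sum() / m    # dJ/dw
        dj_db = err.sum() / m                # dJ/db
        # simultaneous update: both gradients are computed before w and b change
        w, b = w - alpha * dj_dw, b - alpha * dj_db
    return w, b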

6. Algorithm Workflow

(1) Choose the features and design the hypothesis function

(2) Define the cost function

(3) Run gradient descent

7. Andrew Ng Machine Learning: Week 1 Assignment

Suppose you are the CEO of a restaurant franchise. A text file contains a data table of city populations and the profit of a store in each city. Fit a linear regression model to this data and use it to predict the profit of opening a new store in a given city.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load and prepare the dataset
data = pd.read_csv('ex1data1.txt', header=None, names=['population', 'profit'])
# print(data)
# data.plot(kind='scatter', x='population', y='profit', figsize=(12, 8))
# plt.show()
data.insert(0, 'ones', 1)  # add a column of ones for the bias/intercept term
# print(data)

x = data.iloc[:, 0:2]  # first two columns: the ones column and population (features)
y = data.iloc[:, 2:3]  # last column: profit (target)
x = np.matrix(x.values)   # convert x to a matrix
y = np.matrix(y.values)   # convert y to a matrix
# print(x)
# print(y)
theta = np.zeros((2, 1))  # initialize theta = [intercept, slope] as a 2x1 zero matrix
# print(theta)
# print(x @ theta)

# Cost function
def cost_func(x, y, theta):
    cost_matrix = np.power((x @ theta - y), 2)   # squared errors
    return np.sum(cost_matrix) / (2 * len(x))
# print(cost_func(x, y, theta))

# Gradient descent
def gradient_descent(x, y, theta, alpha, update_times):
    cost_list = []
    for i in range(update_times):
        temp0 = theta[0] - alpha * (np.sum(x @ theta - y)) / len(x)                        # intercept update
        temp1 = theta[1] - alpha * (np.multiply((x @ theta - y), x[:, 1]).sum()) / len(x)  # slope update
        theta[0] = temp0
        theta[1] = temp1
        cost_list.append(cost_func(x, y, theta))   # track the cost at each iteration
    return theta, cost_list


theta_last, cost_list = gradient_descent(x, y, theta, 0.01, 10000)
# print(theta_last)
print(theta_last[1], theta_last[0])   # learned slope and intercept

data.plot(kind="scatter", x="population", y="profit", figsize=(8, 5),)
x = np.linspace(data.population.min(), data.population.max(), 100)
y = theta_last[1] * x + theta_last[0]   # fitted regression line
plt.plot(x, y)
plt.show()

Result: the fitted regression line plotted over the population/profit scatter data.

II. Multivariate Linear Regression

1. Model

f_{w,b}(\vec{x}) = w_{1}x_{1}+w_{2}x_{2}+\dots+w_{n}x_{n}+b

(Essentially the same as univariate linear regression; the difficulty lies in the data processing. A vectorized sketch of the prediction is shown below.)
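
As a quick illustration, the multivariate prediction is just a dot product plus a bias; a minimal sketch with made-up weights and a made-up example follows:

import numpy as np

# Hypothetical weights and example: f_{w,b}(x) = w1*x1 + ... + wn*xn + b = w . x + b
w = np.array([0.5, -1.2, 3.0])     # one weight per feature (n = 3)
b = 4.0
x_vec = np.array([2.0, 1.0, 0.5])  # one example with n = 3 feature values
f_wb = np.dot(w, x_vec) + b        # dot product plus bias
print(f_wb)                        # 5.3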

2. Feature Scaling

Goal: reshape the cost function so that gradient descent converges faster.

(1) Scale each feature by dividing its values by the feature's maximum value

(2) Mean normalization (a small code sketch follows after the definitions below):

        x_{i} = \frac{x_{i}-\mu_{i}}{s_{i}}

        x_{i}: the original value of the feature

        \mu_{i}: the mean of the feature over the training set

        s_{i}: the feature's range (maximum value minus minimum value)
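
A minimal NumPy sketch of mean normalization (the feature matrix X and its values are hypothetical):

import numpy as np

# Mean normalization: x_i := (x_i - mu_i) / s_i, with s_i = max - min per feature.
def mean_normalize(X):
    mu = X.mean(axis=0)                 # per-feature mean
    s = X.max(axis=0) - X.min(axis=0)   # per-feature range
    return (X - mu) / s

X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
print(mean_normalize(X))

Note that the assignment code below uses z-score normalization, (x - mean) / std, instead; both serve the same purpose of putting the features on comparable scales.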

3. Andrew Ng Machine Learning: Week 2 Assignment

Implement linear regression with multiple variables to predict housing prices.

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["font.sans-serif"] = ['Microsoft YaHei']  # font that can also render CJK characters
plt.rcParams["axes.unicode_minus"] = False  # render minus signs correctly with this font

path = 'ex1data2.txt'
data2 = pd.read_csv(path, header=None, names=['size', 'Bedrooms', 'Price'])
# print(data2)
# data2.head()

# Feature scaling: z-score normalization, (x - mean) / std
data2 = (data2 - data2.mean()) / data2.std()
# print(data2)
data2.insert(0, 'one', 1)  # column of ones for the bias/intercept term
cols = data2.shape[1]
# print(cols)
x2 = data2.iloc[:, 0:cols-1]
y2 = data2.iloc[:, cols-1:cols]
x2 = np.matrix(x2)
y2 = np.matrix(y2)
theta = np.matrix(np.zeros((cols-1, 1)))
# print(x2)
# print(y2)
# print(theta)

# Cost function
def cost_func(x2, y2, theta):
    temp = np.power(x2 @ theta - y2, 2)   # squared errors
    cost = np.sum(temp) / (2 * len(x2))
    return cost

# print(x2, y2, theta)

# Gradient descent
def gradient_descent(x2, y2, theta, alpha, iters):
    cost_list = []
    for i in range(iters):
        # vectorized alternative:
        # theta = theta - (alpha/len(x2)) * np.dot((np.dot(x2, theta) - y2).T, x2[:]).T
        temp0 = theta[0] - (x2 @ theta - y2).sum() * (alpha / len(x2))                             # bias
        temp1 = theta[1] - (np.multiply((x2 @ theta - y2), x2[:, 1:2])).sum() * (alpha / len(x2))  # size weight
        temp2 = theta[2] - (np.multiply((x2 @ theta - y2), x2[:, 2:3])).sum() * (alpha / len(x2))  # bedrooms weight
        theta[0] = temp0
        theta[1] = temp1
        theta[2] = temp2
        cost_list.append(cost_func(x2, y2, theta))

    return theta, cost_list

alpha_list = [0.003, 0.03, 0.3]
iters = 200

fig, ax = plt.subplots()
for alpha in alpha_list:
    theta = np.matrix(np.zeros((3, 1)))
    _, costs = gradient_descent(x2, y2, theta, alpha, iters)
    ax.plot(np.arange(iters), costs, label=alpha)
ax.legend()

ax.set(xlabel='Iterations',
       ylabel='Cost',
       title='Cost vs. number of iterations for different learning rates')
plt.show()




III. Logistic Regression

1. The Logistic (Sigmoid) Function

f_{\vec{w},b}(\vec{x}) = g(z) = g(\vec{w}\cdot\vec{x}+b) = \frac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}}

In logistic regression we predict y = 1 when f_{\vec{w},b}(\vec{x}) \ge 0.5, and y = 0 when f_{\vec{w},b}(\vec{x}) < 0.5.

From the S-shaped sigmoid curve we know that g(z) = 0.5 when z = 0,

g(z) > 0.5 when z > 0, and g(z) < 0.5 when z < 0.

Since z = \vec{w}\cdot\vec{x}+b, the model predicts y = 1 when \vec{w}\cdot\vec{x}+b \ge 0,

and y = 0 when \vec{w}\cdot\vec{x}+b < 0.

Example:

f_{\vec{w},b}(\vec{x}) = g(w_{1}x_{1}+w_{2}x_{2}+b)

with parameters b = -3, w_{1} = 1, w_{2} = 1 (i.e. the parameter vector is [-3, 1, 1]). Then the model predicts y = 1 whenever -3 + x_{1} + x_{2} \ge 0, i.e. x_{1} + x_{2} \ge 3. The line x_{1} + x_{2} = 3 is the decision boundary of the model, separating the region predicted as 1 from the region predicted as 0.
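
A minimal NumPy check of this example (the two sample points are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# w = [1, 1], b = -3  =>  decision boundary x1 + x2 = 3
w = np.array([1.0, 1.0])
b = -3.0
points = np.array([[1.0, 1.0],   # x1 + x2 = 2 < 3  -> predict 0
                   [2.0, 2.0]])  # x1 + x2 = 4 >= 3 -> predict 1
probs = sigmoid(points @ w + b)
preds = (probs >= 0.5).astype(int)
print(probs, preds)   # approximately [0.269 0.731] [0 1]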

2. Cost Function

J(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right)+(1-y^{(i)})\log\left(1-f_{\vec{w},b}(\vec{x}^{(i)})\right)\right]

3. Gradient Descent

w_{j} = w_{j} - \alpha \frac{\partial}{\partial w_{j}}J(\vec{w},b), \qquad \frac{\partial}{\partial w_{j}}J(\vec{w},b) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)x_{j}^{(i)}

b = b - \alpha \frac{\partial}{\partial b}J(\vec{w},b), \qquad \frac{\partial}{\partial b}J(\vec{w},b) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)

Update the parameters simultaneously:

tmp_w = w - \alpha \frac{\partial}{\partial w}J(w,b)

tmp_b = b - \alpha \frac{\partial}{\partial b}J(w,b)

w = tmp_w

b = tmp_b

4. A Possible Problem: Overfitting

(1) Collect more training data

(2) Use only a subset of the features

(3) Reduce the magnitude of the parameters

(4) Regularization

J(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right)+(1-y^{(i)})\log\left(1-f_{\vec{w},b}(\vec{x}^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}w_{j}^{2}

w_{j} = w_{j} - \alpha \frac{\partial}{\partial w_{j}}J(\vec{w},b), \qquad \frac{\partial}{\partial w_{j}}J(\vec{w},b) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)x_{j}^{(i)}+\frac{\lambda}{m}w_{j}

b = b - \alpha \frac{\partial}{\partial b}J(\vec{w},b), \qquad \frac{\partial}{\partial b}J(\vec{w},b) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)

The main idea of regularization is to add a term involving the parameters to the cost function, weighted by λ. In this way the parameter values are kept as small as possible while the cost function still converges.

If λ is very large, then in order to keep the cost function small, all of the w values (but not b) will shrink to some extent. If λ is too large, the w values (excluding b) all approach 0, and all we obtain is a horizontal line parallel to the x-axis. So for regularization to work well we need to choose a reasonable value of λ.
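
The assignment code below does not use regularization; as a rough illustration only, here is a minimal NumPy sketch of the regularized cost and gradients described above (X, y, w, b, and lam are hypothetical inputs):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Regularized logistic cost: cross-entropy + (lambda / 2m) * sum of w_j^2 (b is not regularized)
def regularized_cost(X, y, w, b, lam):
    m = len(y)
    f = sigmoid(X @ w + b)
    cross_entropy = -(y * np.log(f) + (1 - y) * np.log(1 - f)).mean()
    return cross_entropy + (lam / (2 * m)) * np.sum(w ** 2)

# Gradients: only the w gradient gets the extra (lambda / m) * w_j term
def regularized_gradient(X, y, w, b, lam):
    m = len(y)
    err = sigmoid(X @ w + b) - y
    dj_dw = X.T @ err / m + (lam / m) * w
    dj_db = err.mean()
    return dj_dw, dj_db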

5. Andrew Ng Machine Learning: Week 3 Assignment (Logistic Regression)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

path = 'ex2data1.txt'
data = pd.read_csv(path, header=None, names=['exam 1', 'exam 2', 'admitted'])
print(data.head())

positive = data[data['admitted'].isin([1])]
negative = data[data['admitted'].isin([0])]
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(positive['exam 1'], positive['exam 2'], s=50, c='b', marker='o', label='admitted')
ax.scatter(negative['exam 1'], negative['exam 2'], s=50, c='r', marker='x', label='not admitted')
ax.legend()
ax.set_xlabel('exam 1 score')
ax.set_ylabel('exam 2 score')
# plt.show()


def sigmoid(z):
    return 1 / (1 + np.exp(-z))

nums = np.arange(-10, 10, step=1)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(nums, sigmoid(nums), 'r')
# plt.show()

def cost(theta, x, y):
    theta = np.matrix(theta)
    x = np.matrix(x)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(x * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(x * theta.T)))
    return np.sum(first - second) / (len(x))

def gradientdescent(x, y, theta, alpha, epoch):
    temp = np.matrix(np.zeros(theta.shape))
    m = x.shape[0]
    costs = np.zeros(epoch)

    for i in range(epoch):
        A = sigmoid(np.dot(x, theta.T))
        temp = theta - (alpha / m) * (A - y).T * x
        theta = temp
        costs[i] = cost(theta, x, y)

    return theta, costs

data.insert(0, 'Ones', 100)  # column of 100s; after dividing x by 100 below it becomes the usual ones column
cols = data.shape[1]  # number of columns
x = data.iloc[:, 0:cols - 1]
print(x.head())
y = data.iloc[:, cols - 1:cols]
print(y.head())
# theta = np.zeros(X.shape[1])
theta = np.ones(3)
print(theta)
x = np.matrix(x)
x = x / 100  # rough normalization: all columns scaled by 100 (exam scores fall in [0, 1]; the Ones column becomes 1)
y = np.matrix(y)
theta = np.matrix(theta)
print(theta)

alpha = 0.3
epoch = 10000
origin_cost = cost(theta, x, y)  # initial cost before training
final_theta, costs = gradientdescent(x, y, theta, alpha, epoch)
print(final_theta)
print(costs)

def predict(theta, X):
    probability = sigmoid(np.dot(X, theta.T))
    return [1 if x >= 0.5 else 0 for x in probability]

predictions = predict(final_theta, x)
correct = [1 if a == b else 0 for (a, b) in zip(predictions, y)]
accuracy = sum(correct) / len(x)
print(accuracy)

