Statistical Learning Methods 02: The Perceptron

Top Secret

Last modified 2023-04-12 18:45:34

Category: Machine Learning. Article tags: java servlet html

First published 2022-06-21 10:31:37

Article link: https://blog.csdn.net/m0_55196097/article/details/125386347

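1. A brief look at the perceptron

For the theory, see also:
Statistical Learning Methods: The Perceptron (happy__19's blog, CSDN)

2. Notes compiled from Dr. Jian's lectures (perceptron)

2.1 Model and learning strategy

2.2 Gradient descent algorithm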
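
For reference, the update rule used in the code in this post can be derived from the usual perceptron loss. The block below is a short sketch in standard notation, which is an assumption on my part rather than the lecture's own notation: M is the set of currently misclassified points and \eta is the learning rate (`l_rate` in the code).

```latex
% Model: the sign of a linear score
f(x) = \operatorname{sign}(w \cdot x + b)

% Loss: summed over the misclassified points only, for which y_i (w \cdot x_i + b) \le 0
L(w, b) = -\sum_{x_i \in M} y_i \,(w \cdot x_i + b)

% Gradients of the loss
\nabla_w L = -\sum_{x_i \in M} y_i x_i, \qquad
\nabla_b L = -\sum_{x_i \in M} y_i

% Stochastic step: take one misclassified point (x_i, y_i) and move against its gradient
w \leftarrow w + \eta\, y_i x_i, \qquad
b \leftarrow b + \eta\, y_i
```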

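2.2.1 Stochastic gradient descent code

    # Stochastic gradient descent: update on one misclassified point at a time.
    # Note: this loop sweeps the samples in order rather than picking one at random.
    def fit(self, X_train, y_train):
        converged = False
        while not converged:
            wrong_count = 0
            for d in range(len(X_train)):
                X = X_train[d]
                y = y_train[d]
                # A point with y_i * (w . x_i + b) <= 0 is misclassified
                if y * self.sign(X, self.w, self.b) <= 0:
                    self.w = self.w + self.l_rate * y * X  # w = w + learning rate * y * x
                    self.b = self.b + self.l_rate * y  # b = b + learning rate * y
                    wrong_count += 1
            if wrong_count == 0:
                converged = True
        return 'Perceptron Model!'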
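
To see the update rule run end to end without the Iris data, here is a minimal standalone sketch. It makes several assumptions that differ from the method above: the three toy points are made up for illustration, the weights start at zero, the learning rate is 1, and a `max_epochs` cap (not in the original code) guards against looping forever on data that is not linearly separable.

```python
import numpy as np


def train_perceptron(X, y, lr=1.0, max_epochs=1000):
    """Primal-form perceptron trained by sweeping the samples in order.

    Returns (w, b, epochs_used). Training stops once a full sweep makes
    no update, or after max_epochs sweeps.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        updates = 0
        for xi, yi in zip(X, y):
            # A point is misclassified when y_i * (w . x_i + b) <= 0.
            if yi * (np.dot(w, xi) + b) <= 0:
                w = w + lr * yi * xi
                b = b + lr * yi
                updates += 1
        if updates == 0:
            return w, b, epoch
    return w, b, max_epochs


# Toy data (an assumption for illustration): two positive points, one negative.
X_toy = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y_toy = np.array([1, 1, -1])

w_toy, b_toy, epochs_toy = train_perceptron(X_toy, y_toy)
print("w:", w_toy, "b:", b_toy, "sweeps:", epochs_toy)
```

On these three points the loop stops with w = (1, 1) and b = -3, so the separating line is x1 + x2 - 3 = 0.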

2.3 Primal form of the perceptron

2.4 Dual form of the perceptron
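
In the dual form, the weight vector is never stored directly. Instead each sample keeps a coefficient alpha_i that grows every time that sample triggers an update, and w is recovered as the sum of alpha_i * y_i * x_i. The sketch below is one plausible way to write it, not code from the original notes; the function name, the toy data, the zero initialization, and the `max_epochs` cap are all assumptions for illustration.

```python
import numpy as np


def train_perceptron_dual(X, y, lr=1.0, max_epochs=1000):
    """Dual-form perceptron; returns (alpha, b, w)."""
    n = X.shape[0]
    gram = X @ X.T  # gram[j, i] = x_j . x_i, computed once up front
    alpha = np.zeros(n)
    b = 0.0
    for _ in range(max_epochs):
        updates = 0
        for i in range(n):
            # Score of x_i: sum_j alpha_j * y_j * (x_j . x_i) + b
            score = np.dot(alpha * y, gram[:, i]) + b
            if y[i] * score <= 0:
                alpha[i] += lr
                b += lr * y[i]
                updates += 1
        if updates == 0:
            break
    w = (alpha * y) @ X  # recover the primal weights from alpha
    return alpha, b, w


# Toy data (an assumption for illustration): two positive points, one negative.
X_dual = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y_dual = np.array([1, 1, -1])

alpha_dual, b_dual, w_dual = train_perceptron_dual(X_dual, y_dual)
print("alpha:", alpha_dual, "b:", b_dual, "w:", w_dual)
```

Because the samples enter only through inner products, the Gram matrix can be precomputed, which is the main practical appeal of the dual form.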

3. Perceptron in practice

3.1 Loading the Iris dataset

from sklearn.datasets import load_iris  # Iris dataset bundled with sklearn.datasets
# Common data analysis packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


iris = load_iris()  # load the data
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['label'] = iris.target  # use iris.target as the label

df.columns = [
    'sepal length', 'sepal width', 'petal length', 'petal width', 'label'
]  # the four Iris features plus the label

print(df.label.value_counts())  # print each Iris class and its sample count
"""
print(df.label.value_counts())
# In the output below, labels 0, 1, 2 stand for the three Iris species
0    50
1    50
2    50
Name: label, dtype: int64
"""

plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')
plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.show()

3.1.1 Analyzing the Iris dataset

Python Data Analysis 03: Pandas (Top Secret's blog, CSDN)

Machine Learning: Analyzing the Iris Dataset (perfect_young's blog, CSDN)

4. Implementation code

4.1 A simple reproduction

from sklearn.datasets import load_iris  # Iris dataset bundled with sklearn.datasets
# Common data analysis packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


iris = load_iris()  # load the data
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['label'] = iris.target  # use iris.target as the label

df.columns = [
    'sepal length', 'sepal width', 'petal length', 'petal width', 'label'
]  # set the column names: the four Iris features plus the label

print(df.label.value_counts())  # print each Iris class and its sample count
"""
print(df.label.value_counts())
# In the output below, labels 0, 1, 2 stand for the three Iris species
0    50
1    50
2    50
Name: label, dtype: int64
"""
# Visualize the data
plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')
plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.show()

# Data preprocessing: keep the first 100 rows (classes 0 and 1) and two features
data = np.array(df.iloc[:100, [0, 1, -1]])
X, y = data[:, :-1], data[:, -1]
print(len(y))
y = np.array([1 if i == 1 else -1 for i in y])  # map the labels to +1 / -1

print("len(data[0]):", len(data[0]))  # len(data[0]): 3
w = np.ones(len(data[0]) - 1, dtype=np.float32)
print("w:", w)  # w: [1. 1.]

# Perceptron model
# Assumes linearly separable, two-class data
# The model is linear: w . x + b, here with two features
class Model:
    def __init__(self):
        self.w = np.ones(len(data[0]) - 1, dtype=np.float32)  # initial weights w: [1. 1.]
        self.b = 0  # initial bias b = 0
        self.l_rate = 0.1  # learning rate
        # self.data = data

    # Compute the raw linear score w . x + b (despite the name, no sign is taken)
    def sign(self, x, w, b):
        y = np.dot(x, w) + b
        return y

    # Stochastic gradient descent: update on one misclassified point at a time.
    # Note: this loop sweeps the samples in order rather than picking one at random.
    def fit(self, X_train, y_train):
        converged = False
        while not converged:
            wrong_count = 0
            for d in range(len(X_train)):
                X = X_train[d]
                y = y_train[d]
                # A point with y_i * (w . x_i + b) <= 0 is misclassified
                if y * self.sign(X, self.w, self.b) <= 0:
                    self.w = self.w + self.l_rate * y * X  # w = w + learning rate * y * x
                    self.b = self.b + self.l_rate * y  # b = b + learning rate * y
                    wrong_count += 1
            if wrong_count == 0:
                converged = True
        return 'Perceptron Model!'

    def score(self):
        pass

# Instantiate the perceptron model
perceptron = Model()
perceptron.fit(X, y)  # run the gradient descent training loop

# Plot the learned decision boundary w[0]*x + w[1]*y + b = 0 over the training data
x_points = np.linspace(4, 7, 10)
y_ = -(perceptron.w[0] * x_points + perceptron.b) / perceptron.w[1]
plt.plot(x_points, y_)

plt.plot(data[:50, 0], data[:50, 1], 'o', color='blue', label='0')
plt.plot(data[50:100, 0], data[50:100, 1], 'o', color='orange', label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.show()
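
The `score` method in the class above is left as a stub. One possible way to fill that gap is sketched here as two standalone functions, so it runs without the Iris data. The function names, the demo weights, and the demo points are all hypothetical, and mapping a score of exactly 0 to -1 is an arbitrary choice.

```python
import numpy as np


def predict(X, w, b):
    """Return +1 / -1 labels from the sign of w . x + b (a score of 0 maps to -1)."""
    return np.where(X @ w + b > 0, 1, -1)


def accuracy(X, y, w, b):
    """Fraction of samples whose predicted label equals the true label."""
    return float(np.mean(predict(X, w, b) == y))


# Hypothetical weights and points, chosen only to exercise the functions.
w_demo = np.array([1.0, -1.0])
b_demo = 0.0
X_demo = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 0.5], [0.5, 3.0]])
y_demo = np.array([1, -1, 1, 1])  # the last label deliberately disagrees with the line

pred_demo = predict(X_demo, w_demo, b_demo)
acc_demo = accuracy(X_demo, y_demo, w_demo, b_demo)
print(pred_demo, acc_demo)
```

With the trained model from this section, the same functions could be called as `accuracy(X, y, perceptron.w, perceptron.b)`.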

4.2 Using the built-in Perceptron model from sklearn.linear_model

4.2.1 Imports

import sklearn
from sklearn.linear_model import Perceptron  # the Perceptron model bundled with sklearn.linear_model
from sklearn.datasets import load_iris  # Iris dataset bundled with sklearn.datasets

# Common data analysis packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

4.2.2 Loading and preprocessing the data

iris = load_iris()  # load the data
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['label'] = iris.target  # use iris.target as the label

df.columns = [
    'sepal length', 'sepal width', 'petal length', 'petal width', 'label'
]  # set the column names: the four Iris features plus the label

# Data preprocessing: keep the first 100 rows (classes 0 and 1) and two features
data = np.array(df.iloc[:100, [0, 1, -1]])
X, y = data[:, :-1], data[:, -1]
print(len(y))
y = np.array([1 if i == 1 else -1 for i in y])  # map the labels to +1 / -1

4.2.3 Applying the sklearn.linear_model Perceptron

clf = Perceptron(fit_intercept=True,
                 max_iter=1000,
                 shuffle=True)
clf.fit(X, y)  # fit the model; parameters are updated by stochastic gradient descent

Drawing the perceptron's decision line:

# Draw the perceptron's decision line
x_points = np.arange(4, 8)
y_ = -(clf.coef_[0][0]*x_points + clf.intercept_[0])/clf.coef_[0][1]
plt.plot(x_points, y_)
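
The expression for `y_` comes from solving the boundary equation w0*x + w1*y + b = 0 for y, which only works when w1 is nonzero. A small sketch of that step as a helper follows; the function name and the numeric weights are hypothetical and not taken from a fitted model.

```python
import numpy as np


def boundary_y(w, b, x):
    """Solve w[0]*x + w[1]*y + b = 0 for y; requires w[1] != 0."""
    w = np.asarray(w, dtype=float)
    if w[1] == 0:
        raise ValueError("vertical boundary: w[1] is zero, cannot solve for y")
    return -(w[0] * np.asarray(x, dtype=float) + b) / w[1]


# Hypothetical weights and bias, chosen only to check the algebra.
w_line = [2.0, -4.0]
b_line = 1.0
xs_line = np.array([4.0, 6.0])
ys_line = boundary_y(w_line, b_line, xs_line)
print(ys_line)
```

For a fitted binary sklearn model, the inputs would be `clf.coef_[0]` and `clf.intercept_[0]`.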

Full code:

import sklearn
from sklearn.linear_model import Perceptron  # the Perceptron model bundled with sklearn.linear_model
from sklearn.datasets import load_iris  # Iris dataset bundled with sklearn.datasets

# Common data analysis packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

iris = load_iris()  # load the data
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['label'] = iris.target  # use iris.target as the label

df.columns = [
    'sepal length', 'sepal width', 'petal length', 'petal width', 'label'
]  # set the column names: the four Iris features plus the label

# Data preprocessing: keep the first 100 rows (classes 0 and 1) and two features
data = np.array(df.iloc[:100, [0, 1, -1]])
X, y = data[:, :-1], data[:, -1]
print(len(y))
y = np.array([1 if i == 1 else -1 for i in y])  # map the labels to +1 / -1


print(sklearn.__version__)  # check the installed sklearn version

clf = Perceptron(fit_intercept=True,
                 max_iter=1000,
                 shuffle=True)
clf.fit(X, y)  # fit the model; parameters are updated by stochastic gradient descent

# Weights assigned to the features.
print(clf.coef_)

# Intercept: constants in the decision function.
print(clf.intercept_)

# Visualization
# Figure size
plt.figure(figsize=(10, 10))

plt.title('Iris linearly separable data example')
plt.scatter(data[:50, 0], data[:50, 1], c='b', label='Iris-setosa')
plt.scatter(data[50:100, 0], data[50:100, 1], c='orange', label='Iris-versicolor')

# Draw the perceptron's decision line
x_points = np.arange(4, 8)
y_ = -(clf.coef_[0][0]*x_points + clf.intercept_[0])/clf.coef_[0][1]
plt.plot(x_points, y_)

# Remaining plot settings
plt.grid(False)  # hide the grid
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()  # show the legend
plt.show()
