Learning Regression and Classification

Latest recommended article published 2023-03-28 15:25:28

kb_pycittate

Category: An Introduction to Statistical Learning. Tags: python, markdown, machine learning, regression analysis, classifier training

Article link: https://blog.csdn.net/weixin_40360666/article/details/78208811

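This article introduces the background of scientific computing in Python and covers Markdown syntax and Python environment setup. It focuses on datasets and on simple and multiple linear regression, and also touches on classification, including the use of KNN and SVM models. It provides code examples and performance evaluation.

Summary generated automatically by CSDN.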

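Installing Python and Regression/Classification Experiments

Python's scientific computing capabilities are now very strong. R is also a language for scientific computing, but because Python is more general-purpose, Python is used here to present the content of the statistical learning course.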

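Markdown and the concise syntax of extended Markdown
Installing Python and configuring the environment
Datasets
Simple linear regression
Multiple linear regression
Classification
Rich keyboard shortcuts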

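Keyboard shortcuts

Bold: Ctrl + B
Italic: Ctrl + I
Quote: Ctrl + Q
Insert link: Ctrl + L
Insert code: Ctrl + K
Insert image: Ctrl + G
Promote heading: Ctrl + H
Ordered list: Ctrl + O
Unordered list: Ctrl + U
Horizontal rule: Ctrl + R
Undo: Ctrl + Z
Redo: Ctrl + Y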

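Markdown and its extensions

Markdown is a lightweight markup language that lets people write documents in an easy-to-read, easy-to-write plain-text format and then convert them into richly formatted HTML pages. (Source: Wikipedia)

Simple symbols mark headings of different levels, set text in bold or italics, create links, and so on. For the detailed syntax, see the editor's help.

This editor supports Markdown Extra, which adds many useful features. For details, see [Github][2].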

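Installing Python and configuring the environment

Windows:

1. Download the installer
https://www.python.org/downloads/ (choose the 32-bit or 64-bit installer to match your machine)

2. Install
Default installation path: C:\python27

3. Configure the environment variable
Right-click Computer > Properties > Advanced system settings > Advanced > Environment Variables > in the second list box, find the row whose variable name is Path and double-click it > append the Python installation directory to the variable value, separated from the existing entries by a semicolon (;).

Linux (Ubuntu):

Nothing to install; Python ships with the system.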

Installing and using pip, Python's package manager

Windows:
Download the pip installer script get-pip.py from https://pip.pypa.io/en/latest/installing.html#id7 and run it with: python get-pip.py
Linux (Ubuntu): sudo apt-get install python-pip
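
After the steps above, it can help to confirm that the interpreter and pip are actually usable. The following is a minimal sketch, assuming a Python 3 interpreter (shutil.which and f-strings are not available in the Python 2.7 setup described above); the helper name describe_environment is hypothetical and introduced only for illustration.

```python
import importlib.util
import shutil
import sys


def describe_environment():
    """Collect basic facts about the running interpreter (hypothetical helper)."""
    return {
        # Version of the interpreter executing this script
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Full path of the interpreter executable
        "executable": sys.executable,
        # shutil.which returns None when no "python" command is found on PATH
        "python_on_path": shutil.which("python"),
        # True when the pip package can be located by the import system
        "pip_installed": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If python_on_path prints None on Windows, the Path variable from step 3 probably does not yet include the installation directory.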

Interactive computing environments

• IPython, an advanced Python console: http://ipython.org/
• Jupyter, notebooks in the browser: http://jupyter.org/
• Anaconda (https://www.anaconda.com/download/)
• WinPython (https://winpython.github.io/)
• Spyder
• PyCharm: a paid IDE that is pleasant to use; the free Community edition is recommended.

Commonly used libraries:

• pandas, statsmodels, seaborn for statistics
• sympy for symbolic computing
• scikit-image for image processing
• scikit-learn for machine learning

Note: I personally recommend Anaconda2. It bundles Jupyter Notebook and Spyder, which make a very convenient Python working environment, Jupyter Notebook in particular.
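
To see which of the libraries listed above are present in the current environment, something like the following sketch may be useful. It assumes Python 3; the helper name missing_libraries and the LIBRARIES list are illustrative choices, not part of any library. Note that scikit-image is imported as skimage and scikit-learn as sklearn.

```python
import importlib.util

# Import names of the libraries mentioned in this section
LIBRARIES = ["pandas", "statsmodels", "seaborn", "sympy", "skimage", "sklearn"]


def missing_libraries(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_libraries(LIBRARIES)
print("missing:", missing if missing else "none")
```

Anything reported as missing can usually be installed with pip, or comes preinstalled with an Anaconda distribution.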

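Datasets

For datasets, see https://archive.ics.uci.edu/ml/, the UCI Machine Learning Repository. Commonly used machine learning datasets include iris and Boston. scikit-learn is an open-source machine learning module for Python (http://scikit-learn.org/dev/).

The core of machine learning is: use data to answer questions.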

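Load the iris and Boston housing price datasets and view the Boston description:

from sklearn import datasets

# Iris: a small classification dataset
iris = datasets.load_iris()
# print(iris.data)

# Boston housing prices.
# Note: load_boston was removed in scikit-learn 1.2, so this call
# requires an older scikit-learn version.
boston = datasets.load_boston()
print(boston.DESCR)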

boston.data holds the 506 × 13 feature matrix;
boston.target holds the corresponding 506 prices, one per sample.
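
The same data and target attributes exist on the other datasets bundled with scikit-learn. As a small sketch that also runs on recent scikit-learn versions (where load_boston is no longer available), the iris dataset is used here as a stand-in:

```python
from sklearn import datasets

# Iris is bundled with scikit-learn, so no download is needed
iris = datasets.load_iris()

print(iris.data.shape)     # feature matrix: (150, 4)
print(iris.target.shape)   # one class label per sample: (150,)
print(iris.feature_names)  # names of the four feature columns
```

On a version that still provides load_boston, the same two shape checks should report (506, 13) and (506,), matching the description above.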

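Simple linear regression

Run a simple linear regression on the Boston dataset.

1) Libraries

sklearn.datasets, sklearn.linear_model, numpy (numpy.random, numpy.linalg) lm(), matplotlib

2) Requirements and steps

a) Split the dataset into a training set and a test set. Fit a simple linear regression with sklearn.linear_model.LinearRegression() to understand the relationship between the predictor and the response: how strong it is, and whether the correlation is positive or negative.
b) Plot the response against the predictor and draw the least-squares regression line.
c) Use the evaluation facility built into the LinearRegression model and print the results.
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
d) Train and predict on the Boston housing price data with both LinearRegression and SGDRegressor, and report the evaluation results.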
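
Steps a) to d) could be sketched roughly as follows. This is an illustration under stated assumptions, not the course's reference solution: because load_boston is absent from recent scikit-learn releases, the bundled diabetes dataset stands in for Boston, the predictor "bmi" is an arbitrary choice, and the helpers evaluate and plot_fit are hypothetical names introduced here.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def evaluate(model, X_test, y_test):
    """Return R^2, MSE and MAE of a fitted model on held-out data."""
    y_pred = model.predict(X_test)
    return {
        "r2": r2_score(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
    }


def plot_fit(model, x, y, xlabel, ylabel):
    """(b) Scatter the response against one predictor and draw the fitted line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    order = x[:, 0].argsort()
    plt.scatter(x[:, 0], y, s=10, alpha=0.6)
    plt.plot(x[order, 0], model.predict(x[order]), color="red")
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.show()


# Stand-in data (assumption): diabetes instead of Boston
data = load_diabetes()
X, y = data.data, data.target

# (a) Hold out 25% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# (a) Simple linear regression: one predictor ("bmi") against the response
col = list(data.feature_names).index("bmi")
x_train, x_test = X_train[:, [col]], X_test[:, [col]]
simple = LinearRegression().fit(x_train, y_train)
print("slope:", simple.coef_[0], "intercept:", simple.intercept_)

# (c) score() is the model's built-in evaluation; it returns R^2
print("built-in R^2:", simple.score(x_test, y_test))
print("metrics:", evaluate(simple, x_test, y_test))

# (d) LinearRegression vs. SGDRegressor on all predictors.
# SGD is sensitive to feature scale, so standardize with training statistics.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
ols = LinearRegression().fit(X_train_s, y_train)
sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0).fit(X_train_s, y_train)
print("LinearRegression:", evaluate(ols, X_test_s, y_test))
print("SGDRegressor:", evaluate(sgd, X_test_s, y_test))

# plot_fit(simple, x_test, y_test, "bmi", "response")  # uncomment to plot
```

The sign of the slope indicates whether the predictor and the response are positively or negatively related, and R^2 on the test set gives a rough measure of how strong the relationship is. Fixing random_state makes the split and the SGD run reproducible.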

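Splitting the dataset and simple linear regression
First, read the documentation of the linear regression class:
from sklearn import linear_model
help(linear_model.LinearRegression)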
class LinearRegression(LinearModel, sklearn.base.RegressorMixin)

 |  Parameters
 |  ----------
 |  fit_intercept : boolean, optional
 |      whether to calculate the intercept for this model. If set
 |      to false, no intercept will be used in calculations
 |      (e.g. data is expected to be already centered).
 |  
 |  normalize : boolean, optional, default False
 |      If True, the regressors X will be normalized before regression.
 |      This parameter is ignored when `fit_intercept` is set to False.
 |      When the regressors are normalized, note that this makes the
 |      hyperparameters learnt more robust and almost independent of the number
 |      of samples. The same property is not valid for standardized data.
 |      However, if you wish to standardize, please use
 |      `preprocessing.StandardScaler` before calling `fit` on an estimator
 |      with `normalize=False`.
 |  
 |  copy_X : boolean, optional, default True
 |      If True, X will be copied; else, it may be overwritten.
 |  
 |  n_jobs : int, optional, default 1
 |      The number of jobs to use for the computation.
 |      If -1 all CPUs are used. This will only provide speedup for
 |      n_targets > 1 and sufficient large problems.
 |  
 |  Attributes
 |  ----------
 |  coef_ : array, shape (n_features, ) or (n_targets, n_features)
 |      Estimated coefficients for the linear regression problem.
 |      If multiple targets are passed during the fit (y 2D), this
 |      is a 2D array of shape (n_targets, n_features), while if only
 |      one target is passed, this is a 1D array of length n_features.
 |  
 |  residues_ : array, shape (n_targets,) or (1,) or empty
 |      Sum of residuals. Squared Euclidean 2-norm for each target passed
 |      during the fit. If the linear regression problem is under-determined
 |      (the number of linearly independent rows of the training matrix is less
 |      than its number of linearly independent columns), this is an empty
 |      array. If the target vector passed during the fit is 1-dimensional,
 |      this is a (1,) shape array.
 |  
 |      .. versionadded:: 0.18
 |  
 |  intercept_ : array
 |      Independent term in the linear model.
 |  
 |  Notes
 |  -----
 |  From the implementation point of view, this is just plain Ordinary
 |  Least Squares (scipy.linalg.lstsq) wrapped as a predictor object.
 |  
 |  Method resolution order:
 |      LinearRegression
 |      LinearModel
 |      abc.NewBase
 |      sklearn.base.BaseEstimator
 |      sklearn.base.RegressorMixin
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
 |  
 |  fit(self, X, y, sample_weight=None)
 |      Fit linear model.
 |      
 |      Parameters
 |      ----------
 |      X : numpy array or sparse matrix of shape [n_samples,n_features]
 |          Training data
 |      
 |      y : numpy array of shape [n_samples, n_targets]
 |          Target values
 |      
 |      sample_weight : numpy array of shape [n_samples]
 |          Individual weights for each sample
 |