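Installing Python, with Regression and Classification Experiments
Python's scientific computing capabilities are now very powerful. R is also a scientific computing language, but because Python is more general-purpose, the material of this statistical learning course is presented in Python.
- Markdown and the concise syntax of extended Markdown
- Installing and configuring Python
- Datasets
- Simple linear regression
- Multiple linear regression
- Classification
- A rich set of keyboard shortcuts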
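Keyboard shortcuts
- Bold: Ctrl + B
- Italic: Ctrl + I
- Quote: Ctrl + Q
- Insert link: Ctrl + L
- Insert code: Ctrl + K
- Insert image: Ctrl + G
- Promote heading: Ctrl + H
- Ordered list: Ctrl + O
- Unordered list: Ctrl + U
- Horizontal rule: Ctrl + R
- Undo: Ctrl + Z
- Redo: Ctrl + Y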
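Markdown and extensions
Markdown is a lightweight markup language that lets people write documents in an easy-to-read, easy-to-write plain-text format and then convert them into richly formatted HTML pages. (Wikipedia)
Simple symbols mark headings, make text bold or italic, create links, and so on; see the editor's help for the full syntax.
This editor supports Markdown Extra, which adds many useful features. For details, see [Github][2].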
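Installing and configuring Python
Windows:
- 1. Download the installer
https://www.python.org/downloads/ (choose the 32-bit or 64-bit version to match your computer)
- 2. Install
Default installation path: C:\python27
- 3. Configure the environment variable
[Right-click Computer] → [Properties] → [Advanced system settings] → [Advanced] → [Environment Variables] → [in the second box (System variables), find the row whose variable name is Path and double-click it] → [append the Python installation directory (and, for pip, its Scripts subdirectory) to the variable value, separated by ;]
Linux (Ubuntu):
No installation needed; Python comes preinstalled.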
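A quick way to confirm the installation and the PATH change is to ask Python which interpreter is running and whether the shell can find a `python` executable on PATH. This is a minimal sketch assuming Python 3.3+ (for `shutil.which`); the helper name `check_python_setup` is hypothetical.

```python
import shutil
import sys


def check_python_setup():
    """Report the running interpreter and whether `python` is reachable via PATH."""
    return {
        "version": sys.version.split()[0],  # e.g. "3.11.4"
        "executable": sys.executable,        # full path of this interpreter
        # shutil.which searches PATH the way the shell does;
        # None means the install directory has not been added to PATH
        "python_on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for key, value in check_python_setup().items():
        print(key, ":", value)
```

If `python_on_path` prints `None`, step 3 has not taken effect; open a new command window after editing Path, since already-open windows keep the old value.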
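Installing and using pip, the Python package manager
Windows:
Download the pip installer get-pip.py (download page: https://pip.pypa.io/en/latest/installing.html#id7) and run it with python get-pip.py. Installers for Python 2.7.9+ and 3.4+ already include pip.
Linux (Ubuntu): sudo apt-get install python-pip (use python3-pip for Python 3)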
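When several Pythons are installed (for example a system Python and Anaconda), running pip as `python -m pip` ties it to one specific interpreter, so packages land where your code will look for them. The sketch below, assuming Python 3, checks whether pip is available to the current interpreter and builds that command; `pip_available` and `pip_install_command` are hypothetical helper names.

```python
import importlib.util
import sys


def pip_available():
    """True if pip is installed for the interpreter running this script."""
    return importlib.util.find_spec("pip") is not None


def pip_install_command(*packages):
    """Build `python -m pip install ...` for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    print("pip available:", pip_available())
    cmd = pip_install_command("numpy", "scikit-learn")
    print("would run:", " ".join(cmd))
    # To actually install, uncomment:
    # import subprocess; subprocess.run(cmd, check=True)
```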
Interactive computing environments
- IPython, an advanced Python console: http://ipython.org/
- Jupyter, notebooks in the browser: http://jupyter.org/
- Anaconda (https://www.anaconda.com/download/)
- WinPython (https://winpython.github.io/)
- Spyder
- PyCharm: a paid but very capable IDE; the free Community edition is recommended.
Commonly used libraries:
- pandas, statsmodels, seaborn for statistics
- sympy for symbolic computing
- scikit-image for image processing
- scikit-learn for machine learning
Note: I personally recommend Anaconda2. It is a Python distribution that bundles Jupyter Notebook and Spyder and is very convenient to use, Jupyter Notebook in particular.
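Before starting the experiments, it can help to check which of the libraries listed above are actually installed. This is a minimal sketch assuming Python 3; the import names (`skimage`, `sklearn`) differ from the package names, and the helper `missing_libraries` is hypothetical.

```python
import importlib.util

# Import names of the libraries listed above: scikit-image is imported as
# skimage and scikit-learn as sklearn. numpy and matplotlib are added because
# the regression exercises use them.
LIBRARIES = ["pandas", "statsmodels", "seaborn", "sympy", "skimage", "sklearn",
             "numpy", "matplotlib"]


def missing_libraries(names=LIBRARIES):
    """Return the names that the current interpreter cannot find."""
    # find_spec locates a top-level module without importing it; None = not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed libraries are installed.")
```

Anything reported missing can be installed with pip, or with conda when using Anaconda.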
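Datasets
For datasets, see the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/. Commonly used machine learning datasets include iris and Boston. scikit-learn is an open-source machine learning module for Python (http://scikit-learn.org/dev/).
The core of machine learning: use data to answer questions.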
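Load the Boston housing dataset and view its description:
from sklearn import datasets
iris = datasets.load_iris()
# print(iris.data)
# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this line needs an older scikit-learn release
boston = datasets.load_boston()
print(boston.DESCR)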
boston.data holds the 506 × 13 feature matrix;
boston.target holds the corresponding 506 house prices (a 1-D array of shape (506,)).
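Simple linear regression uses just one of the 13 columns as the predictor, and scikit-learn estimators expect X as a 2-D array of shape (n_samples, n_features), so the selected column must stay 2-D. The sketch below uses random stand-in arrays with the Boston shapes (an assumption, because `load_boston` is unavailable in recent scikit-learn); column index 5 is RM, the average number of rooms, in the Boston feature order.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in arrays with the same shapes as boston.data and boston.target
data = rng.normal(size=(506, 13))
target = rng.normal(size=506)

RM = 5  # column index of RM (average rooms per dwelling) in the Boston data

x_1d = data[:, RM]                  # shape (506,): recent sklearn fit() rejects 1-D X
x_2d = data[:, [RM]]                # shape (506, 1): a 2-D column, what fit() expects
x_alt = data[:, RM].reshape(-1, 1)  # equivalent way to get the 2-D column

print(x_1d.shape, x_2d.shape, target.shape)  # (506,) (506, 1) (506,)
```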
Simple linear regression
Perform simple linear regression on the Boston dataset.
- 1) Libraries
sklearn.datasets, sklearn.linear_model (the counterpart of R's lm()), numpy (numpy.random, numpy.linalg), matplotlib
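- 2) Requirements and steps
a) Split the data into a training set and a test set. Use sklearn.linear_model.LinearRegression() to fit a simple linear regression and examine the relationship between the predictor and the response: how strong it is, and whether it is positive or negative.
b) Plot the response against the predictor and draw the least-squares regression line.
c) Evaluate the model with LinearRegression's built-in evaluation (its score method, which returns R²) and with the metrics below, and print the results:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
d) Train LinearRegression and SGDRegressor separately on the Boston housing data, predict with each, and report the evaluation results.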
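Steps a), c) and d) can first be sketched with numpy alone (numpy.random for the split, numpy.linalg for least squares), which shows what LinearRegression, an SGD regressor and the three metrics compute. This is a minimal sketch on synthetic data (an assumption, so it runs without the Boston loader). The helpers `train_test_split_np`, `ols_fit`, `sgd_fit` and `metrics` are hypothetical, and the SGD loop is plain constant-step gradient descent, not SGDRegressor's exact algorithm, which by default also applies L2 regularization and a decaying learning rate.

```python
import numpy as np


def train_test_split_np(x, y, test_size=0.25, seed=0):
    """Shuffle indices with numpy.random and split into train/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    return x[train], x[test], y[train], y[test]


def ols_fit(x, y):
    """Ordinary least squares via numpy.linalg.lstsq; returns (intercept, slope)."""
    A = np.column_stack([np.ones_like(x), x])  # design matrix with columns [1, x]
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    return coef[0], coef[1]


def sgd_fit(x, y, lr=0.005, epochs=50, seed=0):
    """Plain stochastic gradient descent on squared error, one sample at a time."""
    rng = np.random.default_rng(seed)
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            err = (b0 + b1 * x[i]) - y[i]  # residual for one sample
            b0 -= lr * err                 # gradient of 0.5*err**2 w.r.t. b0
            b1 -= lr * err * x[i]          # gradient of 0.5*err**2 w.r.t. b1
    return b0, b1


def metrics(y_true, y_pred):
    """R^2, MSE and MAE, the quantities r2_score, mean_squared_error and
    mean_absolute_error compute."""
    resid = y_true - y_pred
    mse = np.mean(resid ** 2)
    mae = np.mean(np.abs(resid))
    r2 = 1 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, mse, mae


# Synthetic data (assumption): y = 3 + 2x + noise
rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=500)

x_tr, x_te, y_tr, y_te = train_test_split_np(x, y)
for name, fit in [("OLS", ols_fit), ("SGD", sgd_fit)]:
    b0, b1 = fit(x_tr, y_tr)
    r2, mse, mae = metrics(y_te, b0 + b1 * x_te)
    print(f"{name}: intercept={b0:.3f} slope={b1:.3f} "
          f"R2={r2:.3f} MSE={mse:.3f} MAE={mae:.3f}")
```

A positive slope means a positive relationship, and R² on the test set measures its strength. With scikit-learn, the same fit is `LinearRegression().fit(X_train, y_train)` on a 2-D `X_train`, and the metrics come from the `sklearn.metrics` functions imported above.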
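- Splitting the dataset and simple linear regression
First, read the documentation of the linear regression class:
from sklearn import linear_model
help(linear_model.LinearRegression)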
class LinearRegression(LinearModel, sklearn.base.RegressorMixin)
| Parameters
| ----------
| fit_intercept : boolean, optional
| whether to calculate the intercept for this model. If set
| to false, no intercept will be used in calculations
| (e.g. data is expected to be already centered).
|
| normalize : boolean, optional, default False
| If True, the regressors X will be normalized before regression.
| This parameter is ignored when `fit_intercept` is set to False.
| When the regressors are normalized, note that this makes the
| hyperparameters learnt more robust and almost independent of the number
| of samples. The same property is not valid for standardized data.
| However, if you wish to standardize, please use
| `preprocessing.StandardScaler` before calling `fit` on an estimator
| with `normalize=False`.
|
| copy_X : boolean, optional, default True
| If True, X will be copied; else, it may be overwritten.
|
| n_jobs : int, optional, default 1
| The number of jobs to use for the computation.
| If -1 all CPUs are used. This will only provide speedup for
| n_targets > 1 and sufficient large problems.
|
| Attributes
| ----------
| coef_ : array, shape (n_features, ) or (n_targets, n_features)
| Estimated coefficients for the linear regression problem.
| If multiple targets are passed during the fit (y 2D), this
| is a 2D array of shape (n_targets, n_features), while if only
| one target is passed, this is a 1D array of length n_features.
|
| residues_ : array, shape (n_targets,) or (1,) or empty
| Sum of residuals. Squared Euclidean 2-norm for each target passed
| during the fit. If the linear regression problem is under-determined
| (the number of linearly independent rows of the training matrix is less
| than its number of linearly independent columns), this is an empty
| array. If the target vector passed during the fit is 1-dimensional,
| this is a (1,) shape array.
|
| .. versionadded:: 0.18
|
| intercept_ : array
| Independent term in the linear model.
|
| Notes
| -----
| From the implementation point of view, this is just plain Ordinary
| Least Squares (scipy.linalg.lstsq) wrapped as a predictor object.
|
| Method resolution order:
| LinearRegression
| LinearModel
| abc.NewBase
| sklearn.base.BaseEstimator
| sklearn.base.RegressorMixin
| __builtin__.object
|
| Methods defined here:
|
| __init__(self, fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
|
| fit(self, X, y, sample_weight=None)
| Fit linear model.
|
| Parameters
| ----------
| X : numpy array or sparse matrix of shape [n_samples,n_features]
| Training data
|
| y : numpy array of shape [n_samples, n_targets]
| Target values
|
| sample_weight : numpy array of shape [n_samples]
| Individual weights for each sample
|