Machine Learning in Python: Lab Notes

notmenotme

Category: Knowledge Notes

Article link: https://blog.csdn.net/weixin_40367307/article/details/89488622

Machine Learning in Python

- - Understand Your Data With Visualization

3.19

print("Zeroth Value: %d" % mylist[0])

print("Zeroth Value:" , mylist[0])
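The two calls above render the value differently. A minimal sketch, assuming `mylist` holds numbers (the values below are made up for illustration): `%d` converts the value to an integer, while passing it as a second argument to `print` uses its normal string form, separated by a space.

```python
mylist = [1.5, 2, 3]  # hypothetical values, chosen so the difference is visible

# %d formats the value as an integer, so 1.5 is truncated to 1
formatted = "Zeroth Value: %d" % mylist[0]

# print(a, b) writes str(a) and str(b) joined by a single space
joined = " ".join(["Zeroth Value:", str(mylist[0])])

print(formatted)
print("Zeroth Value:", mylist[0])
```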

3.31 Line Plot

(Just comparing marker sizes here; I was bored.)

plt.plot([1, 2, 3])

plt.plot(numpy.array([1, 1, 4]))

3.32 Scatter Plot

plt.scatter(x,y)

plt.scatter(x,y,x)

plt.scatter(x,y,y)

plt.scatter(x,y,z)
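The four calls above differ only in the third positional argument, which `plt.scatter` treats as `s`, the marker size. A small sketch of that, assuming made-up values for `x`, `y` and `z` (the notes do not show them) and a non-interactive backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no viewer window
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 4, 6]
z = [10, 100, 400]  # hypothetical marker sizes (in points squared)

default_points = plt.scatter(x, y)    # every marker uses the default size
sized_points = plt.scatter(x, y, z)   # same as plt.scatter(x, y, s=z)

# The sizes actually attached to the second set of markers
sizes = sized_points.get_sizes()
print(sizes)
```

Writing the keyword explicitly, `plt.scatter(x, y, s=z)`, makes the intent easier to read than the positional form.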

3.33 Pandas Series

print(myseries[0])
print(myseries['a'])
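The two lookups above reach the same element by position and by label. A sketch, assuming `myseries` has the string index `a`, `b`, `c` (the values are illustrative); `.iloc` is used for the positional lookup because recent pandas releases warn when `myseries[0]` is used as a position on a labeled index:

```python
import pandas as pd

# Assumed layout: three values labeled 'a', 'b', 'c'
myseries = pd.Series([1, 2, 3], index=["a", "b", "c"])

by_label = myseries["a"]        # label-based lookup
by_position = myseries.iloc[0]  # explicit position-based lookup

print(by_label, by_position)
```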

————————————

[Figure: Chapter 3 summary]
————————————————————————————————————————————————————————
4.3 Load CSV Files with NumPy

ValueError: could not convert string to float

To load a dataset that contains values other than floats,
pass dtype:

data = loadtxt(raw_data, delimiter=",", dtype=str)
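A self-contained sketch of the same idea, using an in-memory file with made-up rows instead of the real dataset. Note that `numpy.str` was only an alias for the built-in `str` and no longer exists in current NumPy, so the built-in is passed here:

```python
from io import StringIO
import numpy as np

# Made-up rows: two numeric columns followed by a text column
raw_data = StringIO("5.1,3.5,setosa\n4.9,3.0,versicolor\n")

# dtype=str keeps every field as text, so the text column no longer breaks parsing
data = np.loadtxt(raw_data, delimiter=",", dtype=str)

# Convert only the numeric columns back to floats afterwards
numeric = data[:, :2].astype(float)

print(data.shape)
print(numeric)
```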

4.5 loading a CSV URL using NumPy

ValueError: Wrong number of columns at line 161

Line 161 probably has more columns than the column count established by the earlier lines. The error is raised in read_data() in npyio.py (around line 1058):

            if len(vals) != N:
                line_num = i + skiprows + 1
                raise ValueError("Wrong number of columns at line %d" % line_num)

eg.

1 2
3 4 5
6 7

I don't know how to fix this.
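One possible workaround, offered as a sketch rather than a confirmed fix for the original file: `numpy.genfromtxt` accepts `invalid_raise=False`, which skips rows whose column count differs from the first row and emits a warning instead of raising. The three-line example above is reused as the input.

```python
from io import StringIO
import warnings
import numpy as np

ragged = "1 2\n3 4 5\n6 7\n"  # the middle row has one column too many

# loadtxt refuses the inconsistent row
try:
    np.loadtxt(StringIO(ragged))
    load_failed = False
except ValueError:
    load_failed = True

# genfromtxt can drop the offending row instead of failing
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning about the skipped row
    kept = np.genfromtxt(StringIO(ragged), invalid_raise=False)

print(load_failed)
print(kept)
```

Skipping rows silently loses data, so it is worth inspecting the bad line first to see whether the file itself is malformed.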

NumPy tutorial

4.7 loading a CSV file using Pandas

To look at row 0 of a pandas DataFrame, use data[0:1]. What type does that return?
NumPy arrays and Python lists use data[0] instead.
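To answer the type question, here is a short sketch with a made-up two-row CSV: slicing a DataFrame with `data[0:1]` returns a one-row DataFrame, while `data.iloc[0]` returns that row as a Series.

```python
from io import StringIO
import pandas as pd

csv_text = "1,2,3\n4,5,6\n"  # made-up data, no header row
data = pd.read_csv(StringIO(csv_text), header=None)

first_row_frame = data[0:1]      # row slice: still a DataFrame
first_row_series = data.iloc[0]  # single row by position: a Series
first_row_array = data.values[0] # NumPy-style indexing on the underlying array

print(type(first_row_frame).__name__, first_row_frame.shape)
print(type(first_row_series).__name__)
print(first_row_array)
```

On a DataFrame, `data[0]` selects the column labeled `0`, not the first row, which is why the slice form is needed.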

4.9 loading a CSV URL using Pandas

names supplies the column labels, so a column can be selected by name, e.g. data['preg'].

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

To look at line 161: print(data[161:162])
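A sketch of how `names` and the row slice fit together, reading from an in-memory string rather than the URL. The two rows are illustrative values in the same nine-column layout, so the slice shown is `data[1:2]` instead of `data[161:162]`.

```python
from io import StringIO
import pandas as pd

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# Illustrative rows only; the real file would be read from its URL or path
csv_text = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"
data = pd.read_csv(StringIO(csv_text), names=names)

preg = data['preg']      # one column, selected by the label given in names
second_row = data[1:2]   # one row, selected by a positional slice

print(preg.tolist())
print(second_row)
```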

[Figure: Chapter 4 summary]

————————————————————————————————————————————————————————

5.13 Skew of Univariate Distributions

What skew means

Skew refers to a distribution that is assumed Gaussian (normal or bell curve) that is shifted or squashed in one direction or another. Many machine learning algorithms assume a Gaussian distribution. Knowing that an attribute has a skew may allow you to perform data preparation to correct the skew and later improve the accuracy of your models.

The central tendency of a skewed distribution is usually described by the median.

Peak shifted left (long tail on the right): right-skewed, positive skew
Peak shifted right (long tail on the left): left-skewed, negative skew
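The sign convention can be checked with pandas' `skew()`. This is a small sketch on made-up numbers, not the book's dataset: a sample with a long right tail has positive skew and a mean pulled above the median, and mirroring it flips the sign.

```python
import pandas as pd

# Made-up sample: most values are small, one large value forms a right tail
right_skewed = pd.Series([1, 1, 1, 2, 2, 3, 10])
left_skewed = -right_skewed  # mirror image, so the tail is on the left

pos = right_skewed.skew()
neg = left_skewed.skew()
mean, median = right_skewed.mean(), right_skewed.median()

print(pos, neg)
print(mean, median)  # the tail drags the mean above the median
```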

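Compared with a normal distribution, a skewed distribution has two characteristics:

First, it is asymmetric (hence "skewed").

Second, as the sample size grows, the distribution of its sample mean tends toward a normal distribution.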
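The second point can be illustrated by simulation. This sketch assumes an exponential distribution as the skewed source, with a fixed seed, and compares the skew of the raw values with the skew of means taken over samples of 50:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# 2000 samples of size 50 drawn from a strongly right-skewed distribution
raw = rng.exponential(scale=1.0, size=(2000, 50))

raw_skew = pd.Series(raw.ravel()).skew()        # skew of the individual values
mean_skew = pd.Series(raw.mean(axis=1)).skew()  # skew of the 2000 sample means

print(raw_skew, mean_skew)  # the means are far closer to symmetric
```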

[Figure: Chapter 5 summary]
————————————————————————————————————————————————————————

Understand Your Data With Visualization

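Reference book: 《Python数据可视化之matplotlib实践》 (Python Data Visualization: matplotlib in Practice)
To see the kinds of charts matplotlib can produce, browse its gallery; clicking a chart there shows the code used to generate it.
See also the documentation for colorbar().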

6.3 Box and Whisker Plots
6.4

# Correlation Matrix Plot
from matplotlib import pyplot
from pandas import read_csv
import numpy
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(filename, names=names)
correlations = data.corr()
# plot correlation matrix
fig = pyplot.figure()  # create a new figure explicitly (plotting commands would otherwise start one automatically); pyplot.show() closes the figure being worked on, and later plotting starts a new one
ax = fig.add_subplot(111)  # 111 means m=1, n=1, p=1: split the canvas into an m x n grid and draw in cell p
cax = ax.matshow(correlations, vmin=-1, vmax=1)  # plot a matrix or an array as an image
fig.colorbar(cax)  # add a colorbar to the plot
ticks = numpy.arange(0, 9, 1)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(names)
ax.set_yticklabels(names)
pyplot.show()  # open the matplotlib viewer

6.6

from pandas.tools.plotting import scatter_matrix

Error: ModuleNotFoundError: No module named 'pandas.tools'

In newer pandas versions the function lives in pandas.plotting, so the import becomes:

from pandas.plotting import scatter_matrix
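A short usage sketch with the working import, assuming random data in place of the real dataset and an off-screen backend. `scatter_matrix` returns a square grid of axes, one per pair of columns:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no viewer window
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Random stand-in data; the column names are arbitrary
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])

axes = scatter_matrix(data)  # 3 columns give a 3 x 3 grid of subplots
print(axes.shape)
```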

——————————————————————————————————————————————————————

7 pre-processing

The rescale output does not look right.
I don't understand why.
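The notes do not say what went wrong, so this is only a reference point for checking results by hand: a plain NumPy sketch of min-max rescaling, where each column is mapped onto a target range. The function name `rescale` and the sample matrix are hypothetical; scikit-learn's `MinMaxScaler` applies the same formula with its default settings.

```python
import numpy as np

def rescale(X, low=0.0, high=1.0):
    """Map each column of X linearly onto [low, high] (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns: avoid dividing by zero
    return low + (X - col_min) / col_range * (high - low)

# Made-up data with columns on very different scales
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])
rescaled = rescale(X)
print(rescaled)
```

After rescaling, every column should have minimum 0 and maximum 1; if a column does not, the scaler was probably fitted on different data from the data it transformed.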

Data processing
——————————————————————————————————————————————————————

8 Feature Selection

PCA
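PCA computes the covariance matrix of the data matrix, finds its eigenvalues and eigenvectors, and keeps the eigenvectors that correspond to the k largest eigenvalues (that is, the directions of greatest variance) as the columns of a projection matrix. Multiplying the data by this matrix maps it into the new space, which reduces the number of features.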
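The steps just described can be written out directly in NumPy. This is a minimal sketch on random data, with a hypothetical function name `pca_project`; it follows the covariance and eigenvector description above and is not necessarily how a library implementation computes it.

```python
import numpy as np

def pca_project(X, k):
    """Project X onto its k highest-variance directions (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    W = eigvecs[:, order]                   # projection matrix, one eigenvector per column
    return Xc @ W, eigvals[order]

# Random stand-in data: 100 samples, 3 correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.1]])
Z, top_eigvals = pca_project(X, 2)
print(Z.shape)
print(top_eigvals)
```

The variance of each projected column equals the corresponding eigenvalue, which is why selecting the largest eigenvalues keeps the most variance.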

——————————————————————————————————————————————————————

10 Machine Learning Algorithm Performance Metrics

Classification and regression

Classification Metrics
Accuracy: has limits on where it applies
Logarithmic Loss: the smaller the better
AUC
Confusion Matrix
Classification Report
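A small NumPy sketch of the first few metrics, on made-up labels where 9 of 10 samples belong to class 0. It shows why accuracy has limits (a model that always predicts 0 still scores 0.9), why lower log loss is better, and what the confusion matrix exposes. The helper names are hypothetical; scikit-learn provides its own versions of these metrics.

```python
import numpy as np

y_true = np.array([0] * 9 + [1])   # imbalanced: nine negatives, one positive
y_pred = np.zeros(10, dtype=int)   # a "model" that always predicts class 0

# Accuracy looks good even though the positive class is never found
accuracy = (y_true == y_pred).mean()

def log_loss(y, p, eps=1e-15):
    """Binary log loss for predicted probabilities p of class 1."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Confident, correct probabilities score lower (better) than 50/50 guesses
confident = log_loss(y_true, np.where(y_true == 1, 0.9, 0.1))
guessing = log_loss(y_true, np.full(10, 0.5))

def confusion(y, pred):
    """2 x 2 confusion matrix: rows are true classes, columns are predictions."""
    m = np.zeros((2, 2), dtype=int)
    for t, p in zip(y, pred):
        m[t, p] += 1
    return m

matrix = confusion(y_true, y_pred)
print(accuracy, confident, guessing)
print(matrix)  # the bottom row shows the missed positive
```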

Regression Metrics

————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————