What I learned from studying machine learning code

Latest recommended article published on 2024-09-09 23:28:21

maxchet

Category: Study Notes (python). Tags: python, artificial intelligence, numpy

Article link: https://blog.csdn.net/macchet/article/details/109266526

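Studying machine learning is also a chance to consolidate my Python and get familiar with its libraries. I am recording some commonly used points here for later reference.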

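1. pd.read_csv() reads a CSV file and returns its contents as a DataFrame

import pandas as pd
# Raw string, so the backslashes in the Windows path are not treated as escape sequences
# Column names: 人口 = population, 利润 = profit
df = pd.read_csv(r'D:\machine learning\Linear learing\data1\ex1data1.csv', names=['人口', '利润'])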
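
To see what the names argument does without needing the file on disk, here is a minimal sketch that reads two header-less rows from an in-memory string. The variable names csv_text and sample, and the English column labels, are placeholders chosen for this illustration.

```python
import io
import pandas as pd

# Two header-less rows standing in for a CSV file on disk
csv_text = "6.1101,17.5920\n5.5277,9.1302\n"

# names= supplies the column labels, so the first row is treated as data
sample = pd.read_csv(io.StringIO(csv_text), names=["population", "profit"])

print(sample.shape)          # (2, 2)
print(list(sample.columns))  # ['population', 'profit']
```

Without names=, pandas would take the first row as the header and the frame would have only one data row.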

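2. head() returns the first rows of a DataFrame
# Pass the number of rows n in the parentheses; if it is omitted, n defaults to 5

df.head()   # first five rows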

output:
       人口       利润
0  6.1101  17.5920
1  5.5277   9.1302
2  8.5186  13.6620
3  7.0032  11.8540
4  5.8598   6.8233
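
A small sketch of the n argument, using a made-up ten-row frame (the name frame and its single column are assumptions for illustration only):

```python
import pandas as pd

frame = pd.DataFrame({"value": range(10)})  # illustrative 10-row frame

first_five = frame.head()    # default n=5
first_three = frame.head(3)  # explicit n

print(len(first_five), len(first_three))  # 5 3
```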

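3. info() prints a concise summary of the DataFrame: the index range, each column with its non-null count and dtype, and the memory usage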

df.info()

output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 2 columns):
人口 97 non-null float64
利润 97 non-null float64
dtypes: float64(2)
memory usage: 1.6 KB

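4. sns.lmplot() draws a regression plot in seaborn. lmplot has many parameters worth studying.

import matplotlib.pyplot as plt
import seaborn as sns
# Recent seaborn versions take x and y as keywords and use height in place of the older size parameter
sns.lmplot(x='人口', y='利润', data=df, height=6, fit_reg=False)
# fit_reg: whether to fit a regression model; with fit_reg=True the fitted line is drawn over the scatter plot
plt.show()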

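5. np.power()
np.power(x, y) with numbers x and y: returns x raised to the power y
np.power(x, y) with lists (or arrays) x and y: raises each element of x to the power of the corresponding element of y

np.power([2,3],[3,4])

output: array([ 8, 81])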
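
The two cases above, plus the mixed case of an array base with a scalar exponent, can be sketched as follows (the variable names are chosen for this illustration):

```python
import numpy as np

scalar_result = np.power(2, 3)             # 2 ** 3
array_result = np.power([2, 3], [3, 4])    # [2 ** 3, 3 ** 4]
broadcast_result = np.power([1, 2, 3], 2)  # the scalar exponent applies to every element

print(scalar_result)              # 8
print(array_result.tolist())      # [8, 81]
print(broadcast_result.tolist())  # [1, 4, 9]
```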

6. insert() inserts an object at a given position in a list
list.insert(index, obj)   index: the position to insert at; obj: the object to insert
(1) index=0 inserts at the head of the list
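
A short sketch of list.insert with a few different index values; the list values is made up for illustration:

```python
values = [10, 20, 30]

values.insert(0, 5)             # index 0: insert at the head
values.insert(2, 15)            # insert before the element currently at index 2
values.insert(len(values), 40)  # index equal to the length behaves like append

print(values)  # [5, 10, 15, 20, 30, 40]
```

Note that insert modifies the list in place and returns None.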

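7. shape belongs to numpy. It is an attribute of an array (ndarray), not a function, so it is written without parentheses. It gives the shape of the array (for a 2-D array, the number of rows and columns) as a tuple. For example:
import numpy as np
# Every row must have the same length to form a 2-D array; the last row is padded to five elements here
x = np.array([[1,2,3,4,5],[6,7,8,9,10],[10,9,8,7,6],[5,4,3,2,1]])
(1) Print the number of rows and columns
print(x.shape) # result: (4, 5)
(2) Print only the number of rows, [0]
print(x.shape[0]) # result: 4
(3) Print only the number of columns, [1]
print(x.shape[1]) # result: 5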

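8. iloc selects or slices data purely by integer row and column position. It is an indexer used with square brackets, not a function call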
https://blog.csdn.net/qq_37089628/article/details/87469403?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522160359550019725225045061%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=160359550019725225045061&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2_allfirst_rank_v2~rank_v28-2-87469403.first_rank_ecpm_v3_pc_rank_v2&utm_term=iloc%E5%87%BD%E6%95%B0&spm=1018.2118.3001.4187
Example:

import pandas as pd
mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
df = pd.DataFrame(mydict)

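(1) Indexing by row
(1) An integer as the index: df.iloc[n] selects row n
(2) A list as the index: selects the rows whose positions appear in the list. For example, [0,2] selects rows 0 and 2, not rows 0 through 2
(3) A slice as the index: selects several rows. df.iloc[:2]  # first two rows
(4) A boolean list as the index: its length must match the number of rows; True selects a row, False skips it
df.iloc[[True, False, True]]  # selects the first and third rows
(5) A callable as the index:
df.iloc[lambda x: x.index % 2 == 0]  # selects the rows whose index is even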
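
The five kinds of row index can be checked side by side on the three-row frame from the example. This is a sketch; the result variable names are chosen here for illustration.

```python
import pandas as pd

mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
df = pd.DataFrame(mydict)

single = df.iloc[0]                                # one row, returned as a Series
by_list = df.iloc[[0, 2]]                          # rows 0 and 2 only
by_slice = df.iloc[:2]                             # rows 0 and 1
by_mask = df.iloc[[True, False, True]]             # rows 0 and 2
by_callable = df.iloc[lambda x: x.index % 2 == 0]  # rows with an even index

print(single['a'])                # 1
print(by_list['a'].tolist())      # [1, 1000]
print(by_slice['a'].tolist())     # [1, 100]
print(by_mask['a'].tolist())      # [1, 1000]
print(by_callable['a'].tolist())  # [1, 1000]
```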

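(2) Indexing by row and column together. This works like row-only indexing and supports the same five kinds of index; a "," separates the row selection from the column selection.

Note: the numbers are row and column positions, which start at 0, so a position is the ordinal row or column number minus 1. The first row, first column is therefore df.iloc[0,0]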

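df.iloc[0, 1]  # the value at row position 0, column position 1
df.iloc[[0, 2], [1, 3]]  # rows 0 and 2, columns 1 and 3
df.iloc[1:3, 0:3]  # rows 1-2, columns 0-2; slices include the start and exclude the end
df.iloc[:, [True, False, True, False]]  # all rows, first and third columns
df.iloc[:, lambda df: [0, 2]]  # first and third columns of the DataFrame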
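
As a quick check of the row-and-column forms, the sketch below rebuilds the same three-row frame and prints what each selection returns. The result variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2, 'c': 3, 'd': 4},
                   {'a': 100, 'b': 200, 'c': 300, 'd': 400},
                   {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}])

cell = df.iloc[0, 1]               # scalar: row 0, column 'b'
corners = df.iloc[[0, 2], [1, 3]]  # rows 0 and 2, columns 'b' and 'd'
block = df.iloc[1:3, 0:3]          # rows 1-2, columns 'a' to 'c'
masked = df.iloc[:, [True, False, True, False]]  # all rows, columns 'a' and 'c'

print(cell)                    # 2
print(corners.values.tolist()) # [[2, 4], [2000, 4000]]
print(block.values.tolist())   # [[100, 200, 300], [1000, 2000, 3000]]
print(list(masked.columns))    # ['a', 'c']
```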

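9. numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Returns evenly spaced numbers over the specified interval, i.e. an arithmetic sequence.
(1) num : int, optional. Number of samples to generate; the default is 50. Must be non-negative.
(2) endpoint : bool, optional. If True, stop is the last sample; if False, stop is not included.
(3) retstep : bool, optional
If True, return (samples, step), where step is the spacing between samples (see the example).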

np.linspace(2.0, 3.0, num=5, retstep=True)

(array([ 2. , 2.25, 2.5 , 2.75, 3. ]), 0.25)
(4) dtype : dtype, optional
The type of the output array. If dtype is not given, the data type is inferred from the other input arguments.
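
The effect of endpoint and retstep can be sketched like this, reusing the interval from the example above (the variable names are chosen for illustration):

```python
import numpy as np

with_stop = np.linspace(2.0, 3.0, num=5)                     # stop is the last sample
without_stop = np.linspace(2.0, 3.0, num=5, endpoint=False)  # stop is left out
samples, step = np.linspace(2.0, 3.0, num=5, retstep=True)   # also returns the spacing

print(with_stop)     # 2.0, 2.25, 2.5, 2.75, 3.0
print(without_stop)  # approximately 2.0, 2.2, 2.4, 2.6, 2.8
print(step)          # 0.25
```

With endpoint=False the same number of samples is spread over the half-open interval, so the spacing shrinks from 0.25 to 0.2.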

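# ax is a matplotlib Axes (for example from fig, ax = plt.subplots()); x and f hold the x values and the predicted values computed beforehand
ax.plot(x, f, 'r', label='Prediction')  # plot f against x as a red line; label is the text shown in the legend
ax.scatter(df.人口, df.利润, label='Training Data')  # scatter plot with population on the x-axis and profit on the y-axis, plus its legend text
ax.legend(loc=2)  # legend() shows the legend; loc sets its position, and loc=2 is the upper left
ax.set_xlabel('Population')  # x-axis label
ax.set_ylabel('Profit')  # y-axis label
ax.set_title('Predicted Profit vs. Population Size')  # plot title
plt.show()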

(1) np.linalg.inv(): matrix inverse
(2) np.linalg.det(): matrix determinant (a scalar)
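
A minimal sketch of both calls on a made-up 2x2 matrix A, checking that the matrix times its inverse gives the identity:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

det_A = np.linalg.det(A)  # 4*6 - 7*2 = 10, up to rounding error
inv_A = np.linalg.inv(A)  # raises LinAlgError if A is singular

# A times its inverse should be the identity matrix, up to rounding error
print(np.allclose(A @ inv_A, np.eye(2)))  # True
print(np.isclose(det_A, 10.0))            # True
```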

import scipy.optimize as opt
result = opt.fmin_tnc(func=cost, x0=theta, fprime=gradient, args=(X, y))
See https://blog.csdn.net/weixin_30797199/article/details/95163586?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522160559520819725225000091%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=160559520819725225000091&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2_alltop_click~default-1-95163586.first_rank_ecpm_v3_pc_rank_v2&utm_term=opt.fmin_tnc&spm=1018.2118.3001.4449
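
The notes do not show cost or gradient, so the sketch below supplies stand-ins to make the call runnable: a least-squares cost and its gradient on a tiny made-up data set. These definitions and the data are assumptions for illustration, not the functions used in the original exercise. fmin_tnc returns a tuple of the solution, the number of function evaluations, and a return code.

```python
import numpy as np
import scipy.optimize as opt

def cost(theta, X, y):
    """Least-squares cost of the linear model X @ theta (illustrative stand-in)."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

def gradient(theta, X, y):
    """Gradient of cost with respect to theta."""
    return X.T @ (X @ theta - y) / len(y)

# Made-up data following y = 1 + 2 * x; the first column of X is the intercept term
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = np.zeros(2)

result = opt.fmin_tnc(func=cost, x0=theta, fprime=gradient, args=(X, y))
theta_opt = result[0]  # result is (solution, number of evaluations, return code)
print(theta_opt)       # should be close to [1, 2]
```

The extra arguments in args are passed to both func and fprime after theta, which is why cost and gradient share the signature (theta, X, y).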

astype