TensorFlow and Python Learning II
I. Usage of Python-related functions
1. range: half-open interval (closed on the left, open on the right)
range(start, stop[, step])
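For example (the stop value is never included):
list(range(1, 10, 2))   # -> [1, 3, 5, 7, 9]; 10 is excluded
list(range(5))          # -> [0, 1, 2, 3, 4]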
II. Training deep neural networks
1. Choose a non-linear activation function
2. Forward propagation of the signal: matrix multiplication
$$y=f(wx+b)$$
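A minimal NumPy sketch of this forward step (the layer sizes and the ReLU activation are illustrative assumptions, not from the note itself):

import numpy as np

def relu(z):
    return np.maximum(0, z)          # a non-linear activation

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))          # weights: 3 inputs -> 4 units (assumed sizes)
b = np.zeros((4, 1))                 # biases
x = rng.normal(size=(3, 1))          # one input sample

y = relu(W @ x + b)                  # y = f(wx + b)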
3. Compute the current error
$$\delta(z_j)=\frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}}, \qquad H(y,\hat{y})=-\sum_y y\log(\hat{y})$$
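A small NumPy sketch of this error computation, i.e. softmax over the output logits followed by cross-entropy against a one-hot label (the values are made up for illustration):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))        # shift for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])        # output-layer logits (made-up values)
y = np.array([1.0, 0.0, 0.0])        # one-hot ground truth

y_hat = softmax(z)                   # delta(z_j) = e^{z_j} / sum_k e^{z_k}
loss = -np.sum(y * np.log(y_hat))    # H(y, y_hat) = -sum_y y * log(y_hat)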
4. Reduce the error
Gradient descent (taking partial derivatives) and setting the learning rate.
$$\frac{\partial loss}{\partial w^{(o)}} = \frac{\partial loss}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial O} \cdot \frac{\partial O}{\partial w^{(o)}}$$
$$\eta=\eta_s\cdot decay\_rate^{\frac{step\_count}{decay\_count}}$$
$$w_{new}=w_{old}-\eta \frac{\partial loss}{\partial w}$$
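A plain-NumPy sketch of the decayed learning rate and the weight update (eta_s, decay_rate, decay_count and the gradient values are made-up numbers for illustration):

import numpy as np

eta_s = 0.1                           # initial learning rate (assumed value)
decay_rate = 0.96                     # assumed
decay_count = 100                     # steps per decay interval (assumed)
step_count = 250                      # current training step

w_old = np.array([0.5, -0.3])         # current weights
grad = np.array([0.2, 0.1])           # d loss / d w, from backpropagation

eta = eta_s * decay_rate ** (step_count / decay_count)   # eta = eta_s * decay_rate^(step_count/decay_count)
w_new = w_old - eta * grad                                # w_new = w_old - eta * d loss / d w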
5. Iterate repeatedly
III. Existing problems
1. The loss function is not necessarily convex, so gradient descent may only find a local optimum rather than the global one.
[Solution] Train several models at the same time and combine them to pick the best solution.
2. Overfitting and underfitting
- Overfitting can be addressed by introducing regularization (a common form is shown below).
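A common form of the regularized loss, the same idea that appears later in the regularized logistic-regression cost ($\lambda$ is the regularization strength):

$$loss_{reg} = loss + \frac{\lambda}{2m}\sum_j w_j^2$$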
IV. How TensorFlow runs
1. Computation graph
2. Session
During a computation, the run function can evaluate several tensors in a single call, whereas eval can only evaluate one tensor at a time.
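A minimal sketch of the difference, written against the TensorFlow 1.x session API (on TF 2.x this would need tf.compat.v1 with eager execution disabled):

import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a + b

with tf.Session() as sess:
    print(sess.run([a, b, c]))       # run: evaluates several tensors in one call
    print(c.eval())                  # eval: evaluates exactly one tensor (uses the default session)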
3. Model persistence: saving the model and restoring the model
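A sketch of saving and restoring with tf.train.Saver (1.x session API; the checkpoint path is just an example):

import tensorflow as tf

w = tf.Variable(tf.zeros([2, 2]), name='w')
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, './model/my_model.ckpt')     # save the model (illustrative path)

with tf.Session() as sess:
    saver.restore(sess, './model/my_model.ckpt')  # restore the saved variables
    print(sess.run(w))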
20210701
1. Replacing the pip source: the Tsinghua PyPI mirror can be used
pip install pip -U
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
That said, the Douban mirror seems to be the fastest?
pip --default-timeout=100 install <package-name> -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
2. Upgrading pandas
If the install fails with "Could not install packages due to an EnvironmentError", it can be worked around with the --user option.
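For example, for the pandas upgrade mentioned above:
pip install --user --upgrade pandas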
3. Simple data visualization for analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
path='ex2data2.txt'
# path, no header row, column names
data2=pd.read_csv(path,header=None,names=['Test 1','Test 2','Accepted'])
print(data2.head())
# data visualization
positive=data2[data2['Accepted'].isin([1])] # isin filters rows; [1] is a list
negative=data2[data2['Accepted'].isin([0])]
fig,ax=plt.subplots(figsize=(12,8))
# c = color; the first two arguments are the (x, y) data
ax.scatter(positive['Test 1'],positive['Test 2'],s=50,c='b',marker='o',label='Accepted')
ax.scatter(negative['Test 1'],negative['Test 2'],s=50,c='r',marker='x',label='Rejected')
ax.legend() # add the legend
ax.set_xlabel('Test 1 Score')
ax.set_ylabel('Test 2 Score')
plt.show()
20210703
I. Logistic regression
1. The mathematical basis of the new cost function: maximizing the likelihood function is equivalent to minimizing the loss function.
$$L(w)=\prod_i \left(p(x_i)\right)^{y_i} \cdot \left[1-p(x_i)\right]^{1-y_i}$$
Taking the logarithm, negating, and averaging over the training samples gives the current cost function.
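Written out, that step is (with $p(x_i)=h_\theta(x_i)$):

$$\log L(w)=\sum_{i}\left[y_i\log p(x_i)+(1-y_i)\log\left(1-p(x_i)\right)\right],\qquad J(\theta)=-\frac{1}{m}\log L(w)$$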
2. Derivation of the gradient descent update
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_{\theta}(x^{(i)})\right)\right]$$
Hence

$$\frac{\partial}{\partial \theta_j}J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\frac{x_j^{(i)}}{1+e^{\theta^T x^{(i)}}}-(1-y^{(i)})\frac{x_j^{(i)}e^{\theta^T x^{(i)}}}{1+e^{\theta^T x^{(i)}}}\right] =\frac{1}{m}\sum_{i=1}^{m}\left[h_{\theta}(x^{(i)})-y^{(i)}\right]x_j^{(i)}$$
II. Complete binary classification code
# -*- coding: utf-8 -*-
"""
Created on Thu Jul 1 16:15:33 2021
@author: LaiAng80586
"""
# logistic regression for classification
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt  # for finding the best parameters

def sigmoid(z):
    return 1/(1+np.exp(-z))

nums=np.arange(-10.0,10.0,step=0.01)
print(sigmoid(nums))
fig,a=plt.subplots()
a.plot(nums,sigmoid(nums),'r')  # note: no assignment on this line
plt.show()

def cost(theta,X,y,learningRate):  # regularized cost, to avoid overfitting
    theta=np.matrix(theta)
    X=np.matrix(X)
    y=np.matrix(y)
    first=np.multiply(-y,np.log(sigmoid(X*theta.T)))
    second=np.multiply(1-y,np.log(1-sigmoid(X*theta.T)))
    reg=(learningRate/(2*len(X)))*np.sum(np.power(theta[:,1:theta.shape[1]],2))
    return np.sum(first-second)/(len(X))+reg  # len(X) is the size of the training set

def gradient(theta,X,y,learningRate):  # theta_0 (index 0) is not regularized
    theta=np.matrix(theta)
    X=np.matrix(X)
    y=np.matrix(y)
    parameters=int(theta.ravel().shape[1])  # ravel flattens the array to 1D
    grad=np.zeros(parameters)
    error=sigmoid(X*theta.T)-y
    for i in range(parameters):
        term=np.multiply(error,X[:,i])
        if i==0:
            grad[i]=np.sum(term)/len(X)
        else:
            grad[i]=(np.sum(term)/len(X))+((learningRate/len(X))*theta[0,i])
    return grad

def predict(theta,X):
    probability=sigmoid(X*theta.T)
    return [1 if x>=0.5 else 0 for x in probability]

path='ex2data2.txt'
# path, no header row, column names
data2=pd.read_csv(path,header=None,names=['Test 1','Test 2','Accepted'])
print(data2.head())

# data visualization
positive=data2[data2['Accepted'].isin([1])]  # isin filters rows; [1] is a list
negative=data2[data2['Accepted'].isin([0])]
fig,ax=plt.subplots(figsize=(12,8))
# c = color; the first two arguments are the (x, y) data
ax.scatter(positive['Test 1'],positive['Test 2'],s=50,c='b',marker='o',label='Accepted')
ax.scatter(negative['Test 1'],negative['Test 2'],s=50,c='r',marker='x',label='Rejected')
ax.legend()  # add the legend
ax.set_xlabel('Test 1 Score')
ax.set_ylabel('Test 2 Score')
plt.show()

# feature mapping: polynomial terms of the two scores up to the given degree
degree=5
x1=data2['Test 1']
x2=data2['Test 2']
data2.insert(3,'Ones',1)
for i in range(1,degree):
    for j in range(0,i):
        data2['F'+str(i)+str(j)]=np.power(x1,i-j)*np.power(x2,j)
data2.drop('Test 1',axis=1,inplace=True)  # modify in place rather than creating a new object
data2.drop('Test 2',axis=1,inplace=True)
print(data2.head())

# setting up the data
cols=data2.shape[1]
X2=data2.iloc[:,1:cols]  # all rows, only the feature columns
y2=data2.iloc[:,0:1]
X2=np.array(X2.values)
y2=np.array(y2.values)
theta2=np.zeros(11)
learningRate=1

# examine the cost and gradient functions
print(cost(theta2,X2,y2,learningRate))
print(gradient(theta2,X2,y2,learningRate))

# prediction
result2=opt.fmin_tnc(func=cost,x0=theta2,fprime=gradient,args=(X2,y2,learningRate))
print(result2)
theta_min=np.matrix(result2[0])
predictions=predict(theta_min,X2)
correct=[1 if ((a==1 and b==1) or (a==0 and b==0)) else 0 for (a,b) in zip(predictions,y2)]
accuracy=100*sum(map(int,correct))/len(correct)  # percentage of correct predictions
print('accuracy={0}%'.format(accuracy))
20210712
I. Vectorized gradient descent computation
This is not the normal-equation method from linear regression; it simply replaces the per-parameter assignment loop of gradient descent with matrix multiplication, as sketched below.
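A minimal NumPy sketch of the vectorized form, where one matrix product replaces the per-parameter loop in gradient() above (names and shapes are illustrative):

import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))

def gradient_vec(theta, X, y):
    # X: (m, n) design matrix, y: (m, 1) labels, theta: (n, 1) parameters
    m = len(X)
    error = sigmoid(X @ theta) - y        # (m, 1)
    return (X.T @ error) / m              # (n, 1), computed in a single matrix product

def gradient_descent_step(theta, X, y, eta):
    return theta - eta * gradient_vec(theta, X, y)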
II. Understanding a few concepts
Logistic regression
: Build a model over multiple features to perform binary classification.
One-vs-all
: Several binary classifiers; each one only decides whether a sample belongs to its class or not. After the sigmoid, each classifier outputs a different probability, and the class with the largest probability is chosen (a rough sketch follows).
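A rough sketch of one-vs-all prediction, assuming K already-trained parameter vectors stacked in all_theta (names and shapes are illustrative):

import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))

def predict_one_vs_all(all_theta, X):
    # all_theta: (K, n) -- one row of parameters per binary classifier
    # X: (m, n) design matrix
    probs = sigmoid(X @ all_theta.T)      # (m, K): each classifier's probability
    return np.argmax(probs, axis=1)       # pick the class with the largest probability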
20210713
I. Neural networks
For K-class classification, the label and the final output are K-dimensional vectors, and the cost function sums over all K output components; this is where it differs from logistic regression.
Forward propagation: compute the hypothesis $h_{\theta}(x)$.
Backpropagation: compute the partial derivatives of the cost function, $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$.
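A minimal NumPy sketch of forward propagation for a network with one hidden layer and a K-dimensional output (the layer sizes and the use of sigmoid activations throughout are assumptions for illustration):

import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))

def forward(x, Theta1, Theta2):
    # x: (n,) input; Theta1: (hidden, n+1); Theta2: (K, hidden+1)
    a1 = np.concatenate(([1.0], x))      # add the bias unit
    a2 = sigmoid(Theta1 @ a1)            # hidden-layer activations
    a2 = np.concatenate(([1.0], a2))     # bias unit for the output layer
    return sigmoid(Theta2 @ a2)          # h_theta(x): a K-dimensional vector

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(5, 4))         # 3 inputs (+ bias) -> 5 hidden units (assumed sizes)
Theta2 = rng.normal(size=(3, 6))         # 5 hidden (+ bias) -> K = 3 outputs
print(forward(rng.normal(size=3), Theta1, Theta2))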