I am trying to implement a linear regression model in TensorFlow, with the additional constraint (coming from the domain) that the W and b terms must be non-negative.
I believe there are a couple of ways to do this.
> We can modify the cost function to penalize negative weights [Lagrangian approach] [see: TensorFlow - best way to implement weight constraints]
> We can compute the gradients ourselves and project the updated weights onto [0, infinity) [projected gradient approach]
Approach 1: Lagrangian
When I try the first approach, I often end up with a negative b.
I modified the cost function from:
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
to:
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
nn_w = tf.reduce_sum(tf.abs(W) - W)
nn_b = tf.reduce_sum(tf.abs(b) - b)
constraint = 100.0*nn_w + 100*nn_b
cost_with_constraint = cost + constraint
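Note that tf.abs(W) - W equals 2*max(-W, 0), i.e. the penalty is a hinge on the negative part of each variable. An equivalent way to write it (a minimal sketch, reusing the W and b above) is:
# Equivalent penalty: |w| - w == 2*max(-w, 0), a hinge on the negative part
nn_w = 2.0 * tf.reduce_sum(tf.nn.relu(-W))
nn_b = 2.0 * tf.reduce_sum(tf.nn.relu(-b))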
Keeping the coefficients of nn_w and nn_b very high leads to instability and very large cost values.
Here is the complete code.
import numpy as np
import tensorflow as tf

n_samples = 50
train_X = np.linspace(1, 50, n_samples)
train_Y = 10*train_X + 6 + 40*np.random.randn(50)

X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(np.random.randn(), name="weight")
b = tf.Variable(np.random.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Gradient descent
learning_rate = 0.0001

# Initializing the variables
init = tf.global_variables_initializer()

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
nn_w = tf.reduce_sum(tf.abs(W) - W)
nn_b = tf.reduce_sum(tf.abs(b) - b)
constraint = 1.0*nn_w + 100*nn_b
cost_with_constraint = cost + constraint
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_with_constraint)

training_epochs = 200
with tf.Session() as sess:
    sess.run(init)
    # Fit all training data
    cost_array = np.zeros(training_epochs)
    W_array = np.zeros(training_epochs)
    b_array = np.zeros(training_epochs)
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})
        W_array[epoch] = sess.run(W)
        b_array[epoch] = sess.run(b)
        cost_array[epoch] = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
Below are the mean values of b from 10 different runs.
0 -1.101268
1 0.169225
2 0.158363
3 0.706270
4 -0.371205
5 0.244424
6 1.312516
7 -0.069609
8 -1.032187
9 -1.711668
Clearly, the first approach is not optimal. Moreover, there is a lot of art involved in choosing the penalty coefficients.
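One variant that might reduce the sensitivity to the coefficient (a sketch I have not validated on this problem; rho is a hypothetical penalty weight, not from the code above): square the constraint violation, so the restoring gradient grows with the size of the violation instead of being a fixed constant.
# Quadratic (exterior) penalty: the gradient scales with the violation,
# so a moderate fixed coefficient may suffice. rho is a hypothetical value.
rho = 10.0
nn_w_sq = tf.reduce_sum(tf.square(tf.nn.relu(-W)))
nn_b_sq = tf.reduce_sum(tf.square(tf.nn.relu(-b)))
cost_with_constraint = cost + rho * (nn_w_sq + nn_b_sq)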
Approach 2: Projected gradient
I then wanted to try the second approach, which should be more effective.
gr = tf.gradients(cost, [W, b])
We compute the gradients manually and update W and b ourselves.
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            W_del, b_del = sess.run(gr, feed_dict={X: x, Y: y})
            W_val, b_val = sess.run([W, b])
            # Take a gradient step, then project the result onto [0, infinity)
            W.load(max(0.0, W_val - learning_rate * W_del), sess)
            b.load(max(0.0, b_val - learning_rate * b_del), sess)
This approach seems very slow.
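Part of the slowness is presumably the per-sample Python round-trips. A sketch that builds the projected-gradient step once as graph ops, so each iteration is a single sess.run call (the names gr_W, gr_b, and proj_step are my own):
# Build the projected-gradient update once, outside the training loop:
# compute gradients, take a descent step, clip back onto [0, infinity).
gr_W, gr_b = tf.gradients(cost, [W, b])
with tf.control_dependencies([gr_W, gr_b]):  # evaluate gradients before assigning
    proj_step = tf.group(
        tf.assign(W, tf.maximum(0.0, W - learning_rate * gr_W)),
        tf.assign(b, tf.maximum(0.0, b - learning_rate * gr_b)))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(proj_step, feed_dict={X: x, Y: y})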
I would like to know whether there is a better way to run the second approach, or a way to guarantee the results of the first one. Can we somehow make the optimizer ensure that the learned weights are non-negative?
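For that last question, one pattern I believe works with the stock optimizer (a sketch; clip_op is my own naming): let the optimizer take its usual unconstrained step, then clip the variables back onto the feasible set via a control dependency, which amounts to projected gradient descent.
# Unconstrained optimizer step, then project W and b onto [0, infinity).
# Running clip_op triggers train_op first via the control dependency.
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.control_dependencies([train_op]):
    clip_op = tf.group(tf.assign(W, tf.maximum(0.0, W)),
                       tf.assign(b, tf.maximum(0.0, b)))
Each sess.run(clip_op, feed_dict={X: x, Y: y}) then performs one constrained update. Alternatively, non-negativity can be enforced by construction by reparameterizing, e.g. defining the effective weight as tf.nn.softplus (or tf.square) of an unconstrained variable.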
Edit: How to do this in Autograd