Lua + Torch: plotting, finding minima, and more (machine learning basics)


Important note: the plotting code later in this tutorial must be run under Lua 5.1. Under Lua 5.2/5.3 it breaks because unpack was moved to table.unpack, and the gnuplot maintainer has not responded to the corresponding issue (see https://github.com/torch/gnuplot/issues/41); at present there is no clean fix.
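
A commonly suggested shim for the general Lua 5.2+ unpack breakage (not guaranteed to fix this particular gnuplot issue, since it depends on how the package resolves unpack internally) is to restore the global alias before requiring the package:

-- Lua 5.2 moved the global unpack() into table.unpack; many older packages
-- still call the global, so restoring the alias sometimes helps:
unpack = unpack or table.unpack
require 'gnuplot'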

Here are five simple hands-on steps to get started with Torch!
This tutorial assumes that the torch package has already been required via

require 'torch'

or that you are using the REPL th (which requires it automatically).

1. Define a positive definite quadratic form

We rely on a few torch functions here:

  • rand(), which creates a tensor with entries drawn from a uniform distribution
  • t(), which transposes a tensor (note that it returns a new view of the same storage; see the small demo after this list)
  • dot(), which performs a dot product between two tensors
  • eye(), which returns an identity matrix
  • the * operator over matrices (which performs a matrix-vector or matrix-matrix multiplication)
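
The claim that t() returns a view rather than a copy can be checked directly. A small demo (not part of the original tutorial; globals are used so the lines can be pasted into th one by one):

M = torch.rand(2, 2)
Mt = M:t()       -- a transposed view of M; no data is copied
M[1][2] = 42
print(Mt[2][1])  -- prints 42: the change to M is visible through Mt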

We first make sure the random seed is the same for everyone

torch.manualSeed(1234)
-- choose a dimension
N = 5

-- create a random NxN matrix
A = torch.rand(N, N)

-- make it symmetric positive
A = A*A:t()

-- make it definite
A:add(0.001, torch.eye(N))

-- add a linear term
b = torch.rand(N)

-- create the quadratic form
function J(x)
   return 0.5*x:dot(A*x)-b:dot(x)
end

Printing the function value (here at a random point) is easily done with:

print(J(torch.rand(N)))
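
As a quick sanity check (an addition, not from the original tutorial), you can confirm that A is indeed positive definite by inspecting its eigenvalues with torch.symeig:

-- all eigenvalues of a symmetric positive definite matrix are strictly positive
e = torch.symeig(A)
print(string.format('smallest eigenvalue of A: %g', e:min()))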

2. Find the exact minimum

We can invert the matrix (which might not be numerically optimal):

xs = torch.inverse(A)*b
print(string.format('J(x^*) = %g', J(xs)))
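
Since the gradient of J vanishes at the minimum, the minimizer satisfies the linear system A x* = b, so a numerically friendlier alternative to forming the inverse is to solve that system directly, for example with torch.gesv (a sketch; gesv expects a 2D right-hand side):

xs2 = torch.gesv(b:view(N, 1), A):view(N)
print(string.format('J(x^*) via gesv = %g', J(xs2)))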

3. Search the minimum by gradient descent

We first define the gradient w.r.t. x of J(x):

function dJ(x)
  return A*x-b
end
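
A standard way to verify a hand-written gradient (added here as a debugging aid, not part of the original tutorial) is to compare it against central finite differences:

xc  = torch.rand(N)
eps = 1e-6
g   = dJ(xc)
for i = 1, N do
   -- perturb one coordinate at a time and compare slopes
   local xp = xc:clone(); xp[i] = xp[i] + eps
   local xm = xc:clone(); xm[i] = xm[i] - eps
   local fd = (J(xp) - J(xm)) / (2*eps)
   print(string.format('dim %d: analytic %g vs numeric %g', i, g[i], fd))
end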

We then define some current solution:

x = torch.rand(N)

And then apply gradient descent (with a given learning rate lr) for a while:

lr = 0.01
for i=1,20000 do
  x = x - dJ(x)*lr
  -- we print the value of the objective function at each iteration
  print(string.format('at iter %d J(x) = %f', i, J(x)))
end

You should see

...
at iter 19995 J(x) = -3.135664
at iter 19996 J(x) = -3.135664
at iter 19997 J(x) = -3.135665
at iter 19998 J(x) = -3.135665
at iter 19999 J(x) = -3.135665
at iter 20000 J(x) = -3.135666
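
Instead of running for a fixed number of iterations, one can stop once the gradient norm falls below a tolerance (a small variation on the loop above; the threshold and iteration cap here are arbitrary choices):

x    = torch.rand(N)
lr   = 0.01
tol  = 1e-9
iter = 0
repeat
   g = dJ(x)
   x:add(-lr, g)   -- in-place update: x = x - lr*g
   iter = iter + 1
until g:norm() < tol or iter >= 100000
print(string.format('stopped after %d iterations, J(x) = %f', iter, J(x)))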

4. Using the optim package

Want to use more advanced optimization techniques, like conjugate gradient
or LBFGS? The optim package is there for that purpose! First, we need to
install it:

luarocks install optim

A word on local variables

In practice, it is never a good idea to use global variables. Use local
everywhere. In our examples we have defined everything globally, so that
the snippets can be cut-and-pasted into the interpreter command line.
Indeed, a local definition like:

local A = torch.rand(N, N)

will only be available in the current scope, which, when running the interpreter, is limited
to the current input line; subsequent lines would not have access to this local.

In Lua, one can define a scope with the do...end directives:

do
   local A = torch.rand(N, N)
   print(A)
end
print(A)

If you cut-and-paste this into the command line, the first print shows a
5x5 matrix (because the local A is defined for the duration of the
do...end scope), while the second print refers to the global A, which is
nil unless it was defined earlier (as we did in step 1).

Defining a closure with an upvalue

We need to define a closure which returns both J(x) and dJ(x). Here we
define a scope with do...end, such that the local variable neval is an
upvalue to JdJ(x): only JdJ(x) will be aware of it. Note that in a
script one would not need the do...end scope, as the scope of neval
would extend to the end of the script file (rather than to the end of
the line, as on the command line).

do
   local neval = 0
   function JdJ(x)
      local Jx = J(x)
      neval = neval + 1
      print(string.format('after %d evaluations J(x) = %f', neval, Jx))
      return Jx, dJ(x)
   end
end

Training with optim

The package is not loaded by default, so let’s require it:

require 'optim'

We first define a state for conjugate gradient:

state = {
   verbose = true,
   maxIter = 100
}

and now we train:

x = torch.rand(N)
optim.cg(JdJ, x, state)

You should see something like:

after 120 evaluations J(x) = -3.136835
after 121 evaluations J(x) = -3.136836
after 122 evaluations J(x) = -3.136837
after 123 evaluations J(x) = -3.136838
after 124 evaluations J(x) = -3.136840
after 125 evaluations J(x) = -3.136838
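
Like the other optim routines, optim.cg returns the optimized point together with a table of recorded function values; assuming that convention, the final loss can be read off directly:

xstar, fs = optim.cg(JdJ, torch.rand(N), state)
-- fs holds the function values recorded during optimization
print(string.format('final J(x) = %f', fs[#fs]))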

5. Plot

Plotting can be achieved in various ways. For example, one could use the
recent iTorch package. Here, we are
going to use gnuplot.

luarocks install gnuplot

Store intermediate function evaluations

We slightly modify the closure we had previously, so that it stores
intermediate function evaluations (as well as the real time elapsed in
training so far):

evaluations = {}
time = {}
timer = torch.Timer()
neval = 0
function JdJ(x)
   local Jx = J(x)
   neval = neval + 1
   print(string.format('after %d evaluations, J(x) = %f', neval, Jx))
   table.insert(evaluations, Jx)
   table.insert(time, timer:time().real)
   return Jx, dJ(x)
end

Now we can train it:

state = {
   verbose = true,
   maxIter = 100
}

x0 = torch.rand(N)
cgx = x0:clone() -- make a copy of x0
timer:reset()
optim.cg(JdJ, cgx, state)

-- we convert the evaluations and time tables to tensors for plotting:
cgtime = torch.Tensor(time)
cgevaluations = torch.Tensor(evaluations)

Add support for stochastic gradient descent

Let’s add training with stochastic gradient descent, using optim:

evaluations = {}
time = {}
neval = 0
state = {
  -- optim.sgd reads learningRate; the original tutorial's lr field is silently ignored
  learningRate = 0.1
}

-- we start from the same starting point as for CG
x = x0:clone()

-- reset the timer!
timer:reset()

-- note that the SGD optimizer requires us to write the loop ourselves;
-- the JdJ closure already records each evaluation and timestamp, so
-- nothing else needs to happen in the loop body
for i=1,1000 do
  optim.sgd(JdJ, x, state)
end

sgdtime = torch.Tensor(time)
sgdevaluations = torch.Tensor(evaluations)
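
Before plotting, a quick textual comparison: the last entries of the recorded tensors give each method's final loss and total elapsed time (a small addition, not in the original tutorial):

print(string.format('CG : J = %f after %.3f s',
   cgevaluations[cgevaluations:size(1)], cgtime[cgtime:size(1)]))
print(string.format('SGD: J = %f after %.3f s',
   sgdevaluations[sgdevaluations:size(1)], sgdtime[sgdtime:size(1)]))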

Final plot

We can now plot our graphs. A first simple approach is to use gnuplot.plot(x, y).
Here we precede it with gnuplot.figure() to make sure plots are on different figures.

require 'gnuplot'
gnuplot.figure(1)
gnuplot.title('CG loss minimisation over time')
gnuplot.plot(cgtime, cgevaluations)

gnuplot.figure(2)
gnuplot.title('SGD loss minimisation over time')
gnuplot.plot(sgdtime, sgdevaluations)

A more advanced way, which plots everything on the same graph, is the following. Here we save
everything to a PNG file.

gnuplot.pngfigure('plot.png')
gnuplot.plot(
   {'CG',  cgtime,  cgevaluations,  '-'},
   {'SGD', sgdtime, sgdevaluations, '-'})
gnuplot.xlabel('time (s)')
gnuplot.ylabel('J(x)')
gnuplot.plotflush()
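
Optionally, a background grid and an explicit legend position can be added between gnuplot.plot(...) and gnuplot.plotflush(), assuming the standard torch/gnuplot helpers gnuplot.grid and gnuplot.movelegend are available in your version:

gnuplot.grid(true)                  -- draw a background grid
gnuplot.movelegend('right', 'top')  -- place the legend in the top-right corner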