Linear problems can be solved analytically (in closed form).
Linear Regression
If a prediction problem takes a variable x as input, its target y is a continuous value (a real number or a continuous range of integers), and its prediction function f(x) also outputs continuous values, then it is a regression problem. If the prediction function is linear, y = wx + b, the problem is a linear regression problem.
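For the one-variable case y = wx + b, the analytic (closed-form) least-squares solution mentioned above can be sketched in plain Python. The data points below are made up for illustration:

```python
# Hypothetical 1-D data, chosen to lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares estimates:
#   w = cov(x, y) / var(x)
#   b = mean(y) - w * mean(x)
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

print(w, b)  # 2.0 1.0
```

This recovers the generating parameters exactly because the data is noise-free; with noisy data the same formulas give the best-fit line in the least-squares sense.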
Objective function
The parameters are adjusted according to the loss function.
Learning linear-regression with Torch
https://github.com/torch/demos/blob/master/linear-regression/example-linear-regression.lua
First, require the necessary packages.
'torch': the core Torch package. Apart from tensor operations, it provides convolutions, cross-correlations, basic linear algebra operations, eigenvalues/eigenvectors, etc.
'optim': an optimization library for Torch, providing SGD, Adagrad, Conjugate Gradient, L-BFGS, RProp, and more.
'nn': the neural-network package for Torch. It provides an easy and modular way to build and train simple or complex neural networks using Torch.
We will write the loss to a text file and read from there to plot the loss as training proceeds
optim also provides logging and plotting capabilities via the optim.Logger() function.
1. Create the training data
In all regression problems, some training data needs to be provided.
Here we store the data as a Torch Tensor (2D Array), where each row represents a training sample, and each column a variable. The first column is the target variable, and the others are the input variables.
In this example, we want to predict the amount of corn produced, given the amounts of fertilizer and insecticide used. In other words: fertilizer (肥料) and insecticide (杀虫剂) are our two input variables, and corn (玉米) is our target value.
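The row layout can be sketched in plain Python (the demo itself stores the data in a Torch Tensor; the numbers below are made up for illustration):

```python
# Each row: [corn (target), fertilizer, insecticide] -- hypothetical values.
data = [
    [40.0,  6.0,  4.0],
    [44.0, 10.0,  4.0],
    [46.0, 12.0,  5.0],
]

# Splitting a row into target and inputs, as the training loop will do:
for row in data:
    target, inputs = row[0], row[1:]
    print(target, inputs)
```

The first column is the target and the remaining columns are the inputs, matching the convention described above.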
2. Define the model (predictor)
The model will have one layer (called a module), which takes the 2 inputs (fertilizer and insecticide) and produces the 1 output (corn).
Note that the Linear model specified below has 3 parameters:
1 for the weight assigned to fertilizer
1 for the weight assigned to insecticide
1 for the weight assigned to the bias term
The Neural Network package provides a modular way to build and train simple or complex neural networks using Torch.
Modules are the bricks used to build neural networks. Each module is itself a neural network, but modules can be combined with other networks using containers to create complex neural networks.
The linear model must be held in a container. A sequential container is appropriate since the outputs of each module become the inputs of the subsequent module in the model.
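What the linear module computes can be sketched in plain Python (the demo uses nn.Linear(2, 1) inside nn.Sequential; the parameter values below are hypothetical, just to show the shape of the computation):

```python
# A sketch of what a Linear(2, 1) module computes: one output as a
# weighted sum of the two inputs plus a bias (3 parameters in total).
def linear_forward(inputs, weights, bias):
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical parameter values.
weights = [0.5, -0.2]   # one weight per input (fertilizer, insecticide)
bias = 1.0

print(linear_forward([10.0, 4.0], weights, bias))  # 0.5*10 - 0.2*4 + 1
```

Training will adjust these 3 parameters to minimize the loss defined in the next step.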
3. Define a loss function to be minimized.
We minimize the Mean Square Error (MSE) between the predictions of our linear model and the ground truth available in the dataset.
MSECriterion: creates a criterion that measures the mean squared error between the n elements of the input x and the target y.
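The quantity MSECriterion measures can be written out in plain Python:

```python
# Mean squared error: the mean of squared differences between
# predictions and targets.
def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / 3
```

Because the error is squared, large mistakes are penalized much more heavily than small ones.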
4. Train the model
To minimize the loss defined above, using the linear model defined in 'model', we follow a stochastic gradient descent procedure (SGD).
Given an arbitrarily complex model, we can retrieve its trainable parameters, and the gradients of our loss function w.r.t. these parameters, as follows:
flatParameters, flatGradParameters = model:getParameters()
This function returns two tensors.
One for the flattened learnable parameters, flatParameters.
Another for the gradients of the energy w.r.t. the learnable parameters, flatGradParameters.
In the following code, we define a closure, feval, which computes the value of the loss function at a given point x, and the gradient of that function with respect to x. x is the vector of trainable weights, which, in this example, are all the weights of the linear matrix of our model, plus one bias.
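The structure of such a closure can be sketched in plain Python (the demo's feval is written in Lua against the nn/optim API; the sample values below are hypothetical, and the gradient is derived by hand for the single-sample squared error):

```python
# A sketch of a feval-style closure: given the flattened parameters
# x = [w1, w2, b], return the squared-error loss on one sample and
# its gradient w.r.t. x. Sample values are hypothetical.
sample_inputs = [10.0, 4.0]   # fertilizer, insecticide
sample_target = 40.0          # corn

def feval(x):
    w1, w2, b = x
    prediction = w1 * sample_inputs[0] + w2 * sample_inputs[1] + b
    error = prediction - sample_target
    loss = error ** 2
    # d(loss)/dw_i = 2 * error * input_i,  d(loss)/db = 2 * error
    grad = [2 * error * sample_inputs[0],
            2 * error * sample_inputs[1],
            2 * error]
    return loss, grad

loss, grad = feval([0.0, 0.0, 0.0])
print(loss, grad)  # 1600.0 [-800.0, -320.0, -80.0]
```

An optimizer like SGD only needs this (loss, gradient) pair at the current point; it does not need to know anything else about the model.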
Given the function above, we can now easily train the model using SGD.
For that, we need to define four key parameters:
+ a learning rate: the size of the step taken at each stochastic estimate of the gradient
+ a weight decay, to regularize the solution (L2 regularization)
+ a momentum term, to average steps over time
+ a learning rate decay, to let the algorithm converge more precisely
We're now good to go... all we have left to do is run over the dataset for a certain number of iterations and perform a stochastic update at each iteration. The number of iterations is found empirically here, but should typically be determined using cross-validation.
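The whole SGD loop can be sketched in plain Python (the demo itself calls optim.sgd with the four hyperparameters listed above; this sketch uses only a learning rate, on made-up data, to keep it short):

```python
# Hypothetical rows: [target, x1, x2], generated from y = 2*x1 + 3*x2 + 1.
data = [
    [1.0, 0.0, 0.0],
    [3.0, 1.0, 0.0],
    [4.0, 0.0, 1.0],
    [6.0, 1.0, 1.0],
]

params = [0.0, 0.0, 0.0]   # [w1, w2, b]
learning_rate = 0.1

# One stochastic update per sample, repeated for many epochs.
for epoch in range(2000):
    for target, x1, x2 in data:
        prediction = params[0] * x1 + params[1] * x2 + params[2]
        error = prediction - target
        grad = [2 * error * x1, 2 * error * x2, 2 * error]
        params = [p - learning_rate * g for p, g in zip(params, grad)]

print(params)  # converges toward [2, 3, 1]
```

Adding weight decay, momentum, and learning rate decay (as the demo does via optim.sgd) changes only the update line, not the overall structure of the loop.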
5. Test the trained model.
Now that the model is trained, one can test it by evaluating it on new inputs.