Linear Regression in Mojo with NDBuffer

Linear regression is one of the simplest machine learning algorithms. In this article I will use Mojo's NDBuffer to implement a simple linear regression from scratch, reusing the NDArray struct developed in the previous article. First, import the necessary libraries and the NDArray definition:

# common imports
from String import String
from Bool import Bool
from List import VariadicList
from Buffer import NDBuffer
from List import DimList
from DType import DType
from Pointer import DTypePointer
from TargetInfo import dtype_sizeof, dtype_simd_width
from Index import StaticIntTuple
from Random import rand

alias nelts = dtype_simd_width[DType.f32]()

# NDArray: a thin wrapper that owns a heap allocation and exposes it through NDBuffer
struct NDArray[rank:Int, dims:DimList, dtype:DType]:
    var ndb:NDBuffer[rank, dims, dtype]

    fn __init__(inout self, size:Int):
        # allocate `size` elements and wrap the pointer in an NDBuffer
        let data = DTypePointer[dtype].alloc(size)
        self.ndb = NDBuffer[rank, dims, dtype](data)

    fn __getitem__(self, idxs:StaticIntTuple[rank]) -> SIMD[dtype,1]:
        # scalar load at a multi-dimensional index
        return self.ndb.simd_load[1](idxs)

    fn __setitem__(self, idxs:StaticIntTuple[rank], val:SIMD[dtype,1]):
        # scalar store at a multi-dimensional index
        self.ndb.simd_store[1](idxs, val)
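
Before moving on, here is a quick sanity check of the wrapper. This is just a minimal sketch using the accessors defined above; the 2x2 shape, the demo names, and the values are arbitrary and not part of the original code.

# minimal sketch: exercise __setitem__/__getitem__ on a small 2x2 array
alias demo_rank = 2
var demo = NDArray[demo_rank, DimList(2, 2), DType.f32](2 * 2)
demo[StaticIntTuple[demo_rank](0, 0)] = 1.5
demo[StaticIntTuple[demo_rank](1, 1)] = 2.5
print(demo[StaticIntTuple[demo_rank](0, 0)])  # expected: 1.5
print(demo[StaticIntTuple[demo_rank](1, 1)])  # expected: 2.5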

Let’s assume we want to figure out this function:
$$y = W \cdot X$$
Where:

  • W is the parameter
  • X is the sample design matrix (written out in full after the notes below). Each row is a sample, and each sample is an n-dimensional vector $\boldsymbol{x} \in R^{n}$. If we have m samples then $X \in R^{m \times n}$.
    Here we will deal with a very simple toy problem. We assume $n=3$ and $m=5$. Let's define the $i$-th sample:
    $$\boldsymbol{x}^{(i)} = \begin{bmatrix} x^{(i)}_1 \\ x^{(i)}_2 \\ x^{(i)}_3 \end{bmatrix} \in R^{3 \times 1}, \quad i \in \{ 1, 2, 3, 4, 5 \}$$
    Notes:
  • i is the index of the sample;
  • 1, 2, 3 are the subscripts of the feature dimensions;
  • m is the total number of samples. In this case m=5;
  • n is the dimension of the feature vector. In this case n=3;
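
To make the layout concrete, the design matrix just stacks the five samples as rows (this only restates the definitions above):
$$X = \begin{bmatrix} x^{(1)}_1 & x^{(1)}_2 & x^{(1)}_3 \\ x^{(2)}_1 & x^{(2)}_2 & x^{(2)}_3 \\ \vdots & \vdots & \vdots \\ x^{(5)}_1 & x^{(5)}_2 & x^{(5)}_3 \end{bmatrix} \in R^{5 \times 3}$$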

Let’s generate the dataset:

alias X_rank = 2
alias r1 = 5
alias r2 = 3
var X_size = r1 * r2
var X = NDArray[X_rank, DimList(r1, r2), DType.f32](X_size)
# generate random numbers and copy them into X
var rvs = DTypePointer[DType.f32].alloc(X_size)
rand[DType.f32](rvs, X_size)
for d1 in range(r1):
    for d2 in range(r2):
        X[StaticIntTuple[X_rank](d1, d2)] = rvs.load(d1*r2+d2)*5.0 + 1.0
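
As an optional check (just a sketch, assuming rand fills the buffer with uniform values in [0, 1)), we can print the first sample; after scaling by 5 and shifting by 1 each feature should fall in [1.0, 6.0):

# sketch: print the first sample's three features to eyeball the data
print(X[StaticIntTuple[X_rank](0, 0)],
      X[StaticIntTuple[X_rank](0, 1)],
      X[StaticIntTuple[X_rank](0, 2)])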

Let's define the ground-truth parameter $\boldsymbol{w}$ (together with the bias $b$) that generates the data:
$$\boldsymbol{w} = \begin{bmatrix} 1.1 \\ 2.2 \\ 3.3 \end{bmatrix}, \quad b = 1.8$$

Let's define the ground-truth hypothesis function:
$$y = \boldsymbol{w}^{T} \cdot \boldsymbol{x} + b = \begin{bmatrix} 1.1 \\ 2.2 \\ 3.3 \end{bmatrix} \cdot \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} + b = 1.1 \cdot x_{1} + 2.2 \cdot x_{2} + 3.3 \cdot x_{3} + 1.8$$

Now let's initialize the parameter $\boldsymbol{w}$ that we are going to learn, together with the bias $b$, starting from small values:

alias w_rank = 2
alias w_r1 = 3
alias w_r2 = 1
var w = NDArray[w_rank, DimList(w_r1, w_r2), DType.f32](w_r1 * w_r2)
w[StaticIntTuple[w_rank](0,0)] = 0.01
w[StaticIntTuple[w_rank](1,0)] = 0.02
w[StaticIntTuple[w_rank](2,0)] = 0.03
var b = SIMD[DType.f32, 1](0.0)

Now we can compute the ground-truth labels $y$:

alias y_rank = 1
alias y_r1 = 5
var y = NDArray[y_rank, DimList(y_r1), DType.f32](y_r1)
for d1 in range(y_r1):
    y[StaticIntTuple[y_rank](d1)] = (1.1 * X[StaticIntTuple[X_rank](d1,0)] +
                                     2.2 * X[StaticIntTuple[X_rank](d1,1)] +
                                     3.3 * X[StaticIntTuple[X_rank](d1,2)] + 1.8)

Let's define the function get_batch to fetch a mini-batch from the training dataset:

alias batch_size = 2
alias batch_rank = 2
# batch_idx can only be 0 or 1; the last sample in X is ignored.
fn get_batch(inout batch_X:NDArray[batch_rank, DimList(batch_size, r2), DType.f32],
             inout batch_y:NDArray[y_rank, DimList(batch_size), DType.f32],
             X:NDArray[X_rank, DimList(r1, r2), DType.f32], 
             y:NDArray[y_rank, DimList(y_r1), DType.f32],
             batch_idx:Int):
    # copy batch_size consecutive rows of X (and their labels) into the batch buffers
    for b_idx in range(batch_size):
        batch_y[StaticIntTuple[y_rank](b_idx)] = y[StaticIntTuple[y_rank](batch_size*batch_idx+b_idx)]
        for c_idx in range(r2):
            batch_X[StaticIntTuple[batch_rank](b_idx, c_idx)] = X[StaticIntTuple[X_rank](batch_size*batch_idx+b_idx, c_idx)]
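
Here is a quick illustration of how get_batch is meant to be called (a sketch; the demo_X/demo_y buffers are hypothetical helpers, not part of the original code):

# sketch: fetch the first mini-batch (rows 0 and 1 of X) and check one label
var demo_X = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size * r2)
var demo_y = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
get_batch(demo_X, demo_y, X, y, 0)
print(demo_y[StaticIntTuple[y_rank](0)], y[StaticIntTuple[y_rank](0)])  # the two values should match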

Let's discuss the math behind linear regression. For the $i$-th sample we will omit the $(i)$ superscript for simplicity. The predicted label $\hat{y}$ is:
$$\hat{y} = w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b$$
Since we have the ground-truth label $y$, we define the loss function as:
$$\mathcal{l} = \frac{1}{2}(\hat{y}-y)^{2} = \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2}$$
Following the linear regression recipe, we initialize the parameter $\boldsymbol{w}$ with small (effectively arbitrary) values and $b$ with zero, compute $\hat{y}$ using these parameters, and then evaluate the loss, which measures how good the parameters are. Our task is to find the parameter setting that minimizes the loss:
$$\arg\min_{\boldsymbol{w},b} \mathcal{l}$$

To find the minimum, we compute the gradient of the loss with respect to each parameter and adjust that parameter against the gradient direction. This is the gradient descent algorithm. Let's derive the gradient of the loss with respect to $w_{1}$:
$$\frac{\partial{\mathcal{l}}}{\partial{w_{1}}} = \frac{\partial{ \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big) }}{\partial{w_{1}}} \\ = \frac{\partial{ \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big) }}{\partial{ \big( w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y \big) }} \cdot \frac{\partial{ (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) }}{\partial{w_{1}}} \\ = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1}$$
We used the chain rule in the formula above. The gradients with respect to the other parameters follow in the same way:
$$\frac{\partial{\mathcal{l}}}{\partial{w_{1}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1} \\ \frac{\partial{\mathcal{l}}}{\partial{w_{2}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{2} \\ \frac{\partial{\mathcal{l}}}{\partial{w_{3}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{3} \\ \frac{\partial{\mathcal{l}}}{\partial{b}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)$$
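
All four gradients share the residual $(\hat{y} - y)$, so the same content can be written compactly in vector form:
$$\nabla_{\boldsymbol{w}} \mathcal{l} = (\hat{y} - y) \cdot \boldsymbol{x}, \qquad \frac{\partial \mathcal{l}}{\partial b} = \hat{y} - y$$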

Assuming the learning rate is $\alpha$, we can update the parameters:
$$w_{1} := w_{1} - \alpha \cdot \frac{\partial{\mathcal{l}}}{\partial{w_{1}}} \\ w_{2} := w_{2} - \alpha \cdot \frac{\partial{\mathcal{l}}}{\partial{w_{2}}} \\ w_{3} := w_{3} - \alpha \cdot \frac{\partial{\mathcal{l}}}{\partial{w_{3}}} \\ b := b - \alpha \cdot \frac{\partial{\mathcal{l}}}{\partial{b}}$$

Now we can put everything together and train with mini-batch gradient descent:

let epochs = 1000
var y_hat = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
alias loss_rank = 1
var loss = NDArray[loss_rank, DimList(batch_size), DType.f32](batch_size)
var coff = 0.5   # the 1/2 coefficient of the squared-error loss
var Xi = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size*r2)
var yi = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
var lr = 0.001   # learning rate
for epoch in range(epochs):
    for bidx in range(r1 // batch_size):   # number of full batches; the last sample is dropped
        get_batch(Xi, yi, X, y, bidx)
        # forward pass for the two samples in the batch
        y_hat[StaticIntTuple[y_rank](0)] = (
            w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](0,0)] +
            w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](0,1)] +
            w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](0,2)] + b)
        y_hat[StaticIntTuple[y_rank](1)] = (
            w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](1,0)] +
            w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](1,1)] +
            w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](1,2)] + b)
        # calculate the loss against the mini-batch labels yi
        loss[StaticIntTuple[loss_rank](0)] = coff*(
            (yi[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)])*
            (yi[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)]))
        loss[StaticIntTuple[loss_rank](1)] = coff*(
            (yi[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)])*
            (yi[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)]))
        # gradients, summed over the two samples in the batch
        g_w1 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,0)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,0)])
        w[StaticIntTuple[w_rank](0,0)] -= lr*g_w1
        g_w2 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,1)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,1)])
        w[StaticIntTuple[w_rank](1,0)] -= lr*g_w2
        g_w3 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,2)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,2)])
        w[StaticIntTuple[w_rank](2,0)] -= lr*g_w3
        g_b = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)]) +
               (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)]))
        b -= lr*g_b
        # total batch loss (sum of the two per-sample losses) for logging
        loss_val = loss[StaticIntTuple[loss_rank](0)] + loss[StaticIntTuple[loss_rank](1)]
        print("epoch_", epoch, ": idx=", bidx, " loss=", loss_val,
              "; w1=", w[StaticIntTuple[w_rank](0,0)],
              ", w2=", w[StaticIntTuple[w_rank](1,0)],
              ", w3=", w[StaticIntTuple[w_rank](2,0)], ", b=", b, ";")
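
After training, it is worth comparing the learned parameters with the ground truth (a small sketch; with enough epochs they should drift toward 1.1, 2.2, 3.3 and 1.8, though the exact numbers depend on the random data):

# sketch: print the learned parameters next to the ground-truth values
print("learned  w1=", w[StaticIntTuple[w_rank](0,0)],
      " w2=", w[StaticIntTuple[w_rank](1,0)],
      " w3=", w[StaticIntTuple[w_rank](2,0)], " b=", b)
print("expected w1= 1.1  w2= 2.2  w3= 3.3  b= 1.8")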