Linear Regression in Mojo with NDBuffer

Linear regression is one of the simplest machine learning algorithms. In this article I will use Mojo's NDBuffer to implement a simple linear regression from scratch, reusing the NDArray struct developed in the previous article. First, import the necessary libraries and the NDArray definition:

# common imports
from String import String
from Bool import Bool
from List import VariadicList
from Buffer import NDBuffer
from List import DimList
from DType import DType
from Pointer import DTypePointer
from TargetInfo import dtype_sizeof, dtype_simd_width
from Index import StaticIntTuple
from Random import rand

alias nelts = dtype_simd_width[DType.f32]()

# NDArray: a thin wrapper that owns a heap allocation and exposes it through NDBuffer
struct NDArray[rank:Int, dims:DimList, dtype:DType]:
    var ndb:NDBuffer[rank, dims, dtype]

    fn __init__(inout self, size:Int):
        # allocate `size` elements and wrap the pointer in an NDBuffer
        let data = DTypePointer[dtype].alloc(size)
        self.ndb = NDBuffer[rank, dims, dtype](data)

    fn __getitem__(self, idxs:StaticIntTuple[rank]) -> SIMD[dtype,1]:
        # scalar load at a multi-dimensional index
        return self.ndb.simd_load[1](idxs)

    fn __setitem__(self, idxs:StaticIntTuple[rank], val:SIMD[dtype,1]):
        # scalar store at a multi-dimensional index
        self.ndb.simd_store[1](idxs, val)
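
Before moving on, here is a quick sanity check of the wrapper. This is just a minimal sketch using the accessors defined above; the 2x2 shape, the demo names, and the values are arbitrary and not part of the original code.

# minimal sketch: exercise __setitem__/__getitem__ on a small 2x2 array
alias demo_rank = 2
var demo = NDArray[demo_rank, DimList(2, 2), DType.f32](2 * 2)
demo[StaticIntTuple[demo_rank](0, 0)] = 1.5
demo[StaticIntTuple[demo_rank](1, 1)] = 2.5
print(demo[StaticIntTuple[demo_rank](0, 0)])  # expected: 1.5
print(demo[StaticIntTuple[demo_rank](1, 1)])  # expected: 2.5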

Let’s assume we want to figure out this function:
$$y = W \cdot X$$
Where:

  • W is the parameter
  • X is the sample design matrix (written out in full after the notes below). Each row is a sample, and each sample is an n-dimensional vector $\boldsymbol{x} \in R^{n}$. If we have m samples then $X \in R^{m \times n}$.
    Here we will deal with a very simple toy problem. We assume $n=3$ and $m=5$. Let's define the $i$-th sample:
    $$\boldsymbol{x}^{(i)} = \begin{bmatrix} x^{(i)}_1 \\ x^{(i)}_2 \\ x^{(i)}_3 \end{bmatrix} \in R^{3 \times 1}, \quad i \in \{ 1, 2, 3, 4, 5 \}$$
    Notes:
  • i is the index of the sample;
  • 1, 2, 3 are the subscripts of the feature dimensions;
  • m is the total number of samples. In this case m=5;
  • n is the dimension of the feature vector. In this case n=3;
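
To make the layout concrete, the design matrix just stacks the five samples as rows (this only restates the definitions above):
$$X = \begin{bmatrix} x^{(1)}_1 & x^{(1)}_2 & x^{(1)}_3 \\ x^{(2)}_1 & x^{(2)}_2 & x^{(2)}_3 \\ \vdots & \vdots & \vdots \\ x^{(5)}_1 & x^{(5)}_2 & x^{(5)}_3 \end{bmatrix} \in R^{5 \times 3}$$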

Let’s generate the dataset:

alias X_rank = 2
alias r1 = 5
alias r2 = 3
var X_size = r1 * r2
var X = NDArray[X_rank, DimList(r1, r2), DType.f32](X_size)
# generate random numbers and copy them into X
var rvs = DTypePointer[DType.f32].alloc(X_size)
rand[DType.f32](rvs, X_size)
for d1 in range(r1):
    for d2 in range(r2):
        X[StaticIntTuple[X_rank](d1, d2)] = rvs.load(d1*r2+d2)*5.0 + 1.0
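
As an optional check (just a sketch, assuming rand fills the buffer with uniform values in [0, 1)), we can print the first sample; after scaling by 5 and shifting by 1 each feature should fall in [1.0, 6.0):

# sketch: print the first sample's three features to eyeball the data
print(X[StaticIntTuple[X_rank](0, 0)],
      X[StaticIntTuple[X_rank](0, 1)],
      X[StaticIntTuple[X_rank](0, 2)])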

Let's define the ground-truth parameter $\boldsymbol{w}$ (together with the bias $b$) that generates the data:
$$\boldsymbol{w} = \begin{bmatrix} 1.1 \\ 2.2 \\ 3.3 \end{bmatrix}, \quad b = 1.8$$

Let's define the ground-truth hypothesis function:
$$y = \boldsymbol{w}^{T} \cdot \boldsymbol{x} + b = \begin{bmatrix} 1.1 \\ 2.2 \\ 3.3 \end{bmatrix} \cdot \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} + b = 1.1 \cdot x_{1} + 2.2 \cdot x_{2} + 3.3 \cdot x_{3} + 1.8$$

Now let's initialize the parameter $\boldsymbol{w}$ that we are going to learn, together with the bias $b$, starting from small values:

alias w_rank = 2
alias w_r1 = 3
alias w_r2 = 1
var w = NDArray[w_rank, DimList(w_r1, w_r2), DType.f32](w_r1 * w_r2)
w[StaticIntTuple[w_rank](0,0)] = 0.01
w[StaticIntTuple[w_rank](1,0)] = 0.02
w[StaticIntTuple[w_rank](2,0)] = 0.03
var b = SIMD[DType.f32, 1](0.0)

Now we can compute the ground-truth labels $y$:

alias y_rank = 1
alias y_r1 = 5
var y = NDArray[y_rank, DimList(y_r1), DType.f32](y_r1)
for d1 in range(y_r1):
    y[StaticIntTuple[y_rank](d1)] = (1.1 * X[StaticIntTuple[X_rank](d1,0)] +
                                     2.2 * X[StaticIntTuple[X_rank](d1,1)] +
                                     3.3 * X[StaticIntTuple[X_rank](d1,2)] + 1.8)

Let's define the function get_batch to fetch a mini-batch from the training dataset:

alias batch_size = 2
alias batch_rank = 2
# batch_idx can only be 0 or 1; the last sample in X is ignored.
fn get_batch(inout batch_X:NDArray[batch_rank, DimList(batch_size, r2), DType.f32],
             inout batch_y:NDArray[y_rank, DimList(batch_size), DType.f32],
             X:NDArray[X_rank, DimList(r1, r2), DType.f32], 
             y:NDArray[y_rank, DimList(y_r1), DType.f32],
             batch_idx:Int):
    # copy batch_size consecutive rows of X (and their labels) into the batch buffers
    for b_idx in range(batch_size):
        batch_y[StaticIntTuple[y_rank](b_idx)] = y[StaticIntTuple[y_rank](batch_size*batch_idx+b_idx)]
        for c_idx in range(r2):
            batch_X[StaticIntTuple[batch_rank](b_idx, c_idx)] = X[StaticIntTuple[X_rank](batch_size*batch_idx+b_idx, c_idx)]
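
Here is a quick illustration of how get_batch is meant to be called (a sketch; the demo_X/demo_y buffers are hypothetical helpers, not part of the original code):

# sketch: fetch the first mini-batch (rows 0 and 1 of X) and check one label
var demo_X = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size * r2)
var demo_y = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
get_batch(demo_X, demo_y, X, y, 0)
print(demo_y[StaticIntTuple[y_rank](0)], y[StaticIntTuple[y_rank](0)])  # the two values should match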

Let's discuss the math behind linear regression. For the $i$-th sample we will omit the $(i)$ superscript for simplicity. The predicted label $\hat{y}$ is:
$$\hat{y} = w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b$$
Since we have the ground-truth label $y$, we define the loss function as:
$$\mathcal{l} = \frac{1}{2}(\hat{y}-y)^{2} = \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2}$$
Following the linear regression recipe, we initialize the parameter $\boldsymbol{w}$ with small (effectively arbitrary) values and $b$ with zero, compute $\hat{y}$ using these parameters, and then evaluate the loss, which measures how good the parameters are. Our task is to find the parameter setting that minimizes the loss:
$$\arg\min_{\boldsymbol{w},b} \mathcal{l}$$

To find the minimum, we compute the gradient of the loss with respect to each parameter and adjust that parameter against the gradient direction. This is the gradient descent algorithm. Let's derive the gradient of the loss with respect to $w_{1}$:
$$\frac{\partial{\mathcal{l}}}{\partial{w_{1}}} = \frac{\partial{ \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big) }}{\partial{w_{1}}} \\ = \frac{\partial{ \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big) }}{\partial{ \big( w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y \big) }} \cdot \frac{\partial{ (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) }}{\partial{w_{1}}} \\ = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1}$$
We used the chain rule in the formula above. The gradients with respect to the other parameters follow in the same way:
$$\frac{\partial{\mathcal{l}}}{\partial{w_{1}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1} \\ \frac{\partial{\mathcal{l}}}{\partial{w_{2}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{2} \\ \frac{\partial{\mathcal{l}}}{\partial{w_{3}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{3} \\ \frac{\partial{\mathcal{l}}}{\partial{b}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)$$
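
All four gradients share the residual $(\hat{y} - y)$, so the same content can be written compactly in vector form:
$$\nabla_{\boldsymbol{w}} \mathcal{l} = (\hat{y} - y) \cdot \boldsymbol{x}, \qquad \frac{\partial \mathcal{l}}{\partial b} = \hat{y} - y$$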

Assuming the learning rate is $\alpha$, we can update the parameters:
$$w_{1} := w_{1} - \alpha \cdot \frac{\partial{\mathcal{l}}}{\partial{w_{1}}} \\ w_{2} := w_{2} - \alpha \cdot \frac{\partial{\mathcal{l}}}{\partial{w_{2}}} \\ w_{3} := w_{3} - \alpha \cdot \frac{\partial{\mathcal{l}}}{\partial{w_{3}}} \\ b := b - \alpha \cdot \frac{\partial{\mathcal{l}}}{\partial{b}}$$

Now we can put everything together and train with mini-batch gradient descent:

let epochs = 1000
var y_hat = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
alias loss_rank = 1
var loss = NDArray[loss_rank, DimList(batch_size), DType.f32](batch_size)
var coff = 0.5   # the 1/2 coefficient of the squared-error loss
var Xi = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size*r2)
var yi = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
var lr = 0.001   # learning rate
for epoch in range(epochs):
    for bidx in range(r1 // batch_size):   # number of full batches; the last sample is dropped
        get_batch(Xi, yi, X, y, bidx)
        # forward pass for the two samples in the batch
        y_hat[StaticIntTuple[y_rank](0)] = (
            w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](0,0)] +
            w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](0,1)] +
            w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](0,2)] + b)
        y_hat[StaticIntTuple[y_rank](1)] = (
            w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](1,0)] +
            w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](1,1)] +
            w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](1,2)] + b)
        # calculate the loss against the mini-batch labels yi
        loss[StaticIntTuple[loss_rank](0)] = coff*(
            (yi[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)])*
            (yi[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)]))
        loss[StaticIntTuple[loss_rank](1)] = coff*(
            (yi[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)])*
            (yi[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)]))
        # gradients, summed over the two samples in the batch
        g_w1 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,0)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,0)])
        w[StaticIntTuple[w_rank](0,0)] -= lr*g_w1
        g_w2 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,1)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,1)])
        w[StaticIntTuple[w_rank](1,0)] -= lr*g_w2
        g_w3 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,2)] +
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,2)])
        w[StaticIntTuple[w_rank](2,0)] -= lr*g_w3
        g_b = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)]) +
               (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)]))
        b -= lr*g_b
        # total batch loss (sum of the two per-sample losses) for logging
        loss_val = loss[StaticIntTuple[loss_rank](0)] + loss[StaticIntTuple[loss_rank](1)]
        print("epoch_", epoch, ": idx=", bidx, " loss=", loss_val,
              "; w1=", w[StaticIntTuple[w_rank](0,0)],
              ", w2=", w[StaticIntTuple[w_rank](1,0)],
              ", w3=", w[StaticIntTuple[w_rank](2,0)], ", b=", b, ";")
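
After training, it is worth comparing the learned parameters with the ground truth (a small sketch; with enough epochs they should drift toward 1.1, 2.2, 3.3 and 1.8, though the exact numbers depend on the random data):

# sketch: print the learned parameters next to the ground-truth values
print("learned  w1=", w[StaticIntTuple[w_rank](0,0)],
      " w2=", w[StaticIntTuple[w_rank](1,0)],
      " w3=", w[StaticIntTuple[w_rank](2,0)], " b=", b)
print("expected w1= 1.1  w2= 2.2  w3= 3.3  b= 1.8")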