Abstract
This article shares the usage and basic concepts of the linear regression model with classmates and teachers in the Machine Learning class. The first part covers the mathematical model of linear regression and its matrix expression in a neural network model. The second part shows how to implement a linear regression model from scratch, without using any third-party library (e.g. PyTorch, TensorFlow, and so on). The last part introduces CUDA, the GPU library for parallel programming.
Table of Contents
Basic Concept of Linear Regression Model usage in Neural Network Model
Implementing Simple Linear Model from Scratch in Neural Network Model
Mathematical proof of Jacobian Matrix
C++ Program which implements the SSE regression model
Introduction of CUDA library and Parallel Programming
Advanced questions of Machine Learning
Basic Concept of Linear Regression Model usage in Neural Network Model
Deep learning includes two major parts: the feedforward process and the backpropagation process. The picture above shows the feedforward process, where X1, X2, ..., Xp are the independent variables and Y1, Y2, ..., Yn are the dependent variables. If we restrict the training model to linear regression (i.e. an MSE/SSE cost) and select the activation function f(x) = x, then we can simply calculate the gradient used in the backpropagation process to update the output-layer coefficients (the weighting variables YWi) with the following formula:
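A sketch of that formula, assuming the SSE cost $E=\sum_{j=1}^{n}(y_j-\hat y_j)^2$ and the identity activation, so that the j-th output is $\hat y_j=\sum_{i=1}^{p} w_{ji}x_i$ (writing $w_{ji}$ for the YWi weights):

$$\frac{\partial E}{\partial w_{ji}} = -2\,(y_j - \hat y_j)\,x_i$$

That is, each output weight's gradient is the scaled negative residual of its output node times the input feeding that weight.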
Implementing Simple Linear Model from Scratch in Neural Network Model
If we simplify the neural network model to just one input layer with only one output node, as in the following figure, then this implies that we are calculating the linear regression model of the formula below. In other words, we are trying to find a hyperplane that minimizes the distances to the sample points along the Z axis.
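A sketch of that formula, assuming the two input features are named $x$ and $y$ and the output is $z$:

$$ z = w_1 x + w_2 y + b $$

where $w_1, w_2$ are the weights and $b$ is the bias (the intercept of the plane).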
For the simplified linear regression model, we can use the gradient method to update the weighting values in the backpropagation process. The gradient steps are chosen to minimize the following SSE cost function:
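Written out with the plane model above, the cost over the $N$ samples is

$$ \mathrm{SSE}(w_1, w_2, b) = \sum_{i=1}^{N} \big( z_i - (w_1 x_i + w_2 y_i + b) \big)^2 , $$

and each gradient step moves $(w_1, w_2, b)$ against the partial derivatives of this sum.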
We can lay out the data in matrix form, as in the first picture below, and then express the gradient (the partial derivatives of the cost with respect to w) in Jacobian matrix form, as in the second picture.
Jacobian Matrix:
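A reconstruction in the notation above: stack the $N$ samples into a design matrix $X$ and the parameters into a vector $w$,

$$ X = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots \\ x_N & y_N & 1 \end{bmatrix}, \qquad w = \begin{bmatrix} w_1 \\ w_2 \\ b \end{bmatrix}, \qquad \hat z = Xw . $$

Because $\hat z = Xw$ is linear in $w$, the Jacobian of the predictions with respect to the parameters is the design matrix itself, $\partial \hat z / \partial w = X$.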
Cost function derivative expressed in Matrix form:
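In the same notation, the matrix-form gradient and the resulting gradient-descent update are

$$ \frac{\partial\,\mathrm{SSE}}{\partial w} = -2\,X^{\top} (z - Xw), \qquad w \leftarrow w + 2\alpha\, X^{\top} (z - Xw), $$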
where $\alpha$ is the learning rate.
Mathematical proof of Jacobian Matrix
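A sketch of the derivation in the notation above: write the residual vector $r = z - Xw$, so that $\mathrm{SSE}(w) = r^{\top} r$. The Jacobian of $r$ with respect to $w$ is $\partial r / \partial w = -X$, and by the chain rule

$$ \frac{\partial\,\mathrm{SSE}}{\partial w} = 2 \left( \frac{\partial r}{\partial w} \right)^{\!\top} r = -2\, X^{\top} (z - Xw) , $$

which is exactly the matrix-form gradient used in the previous section.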
C++ Program which implements the SSE regression model
Below are the C++ source code and screenshots for reading 100 random (Xi, Yi, Zi) data points from a CSV file, calculating the interpolation plane, and drawing it as a 3D model. I also use Python's sklearn LinearRegression API to verify that the coefficients and intercept equal the C++ program's calculation results.
[Regression Plane is located between the scatter sample points]:
[Coefficients Comparison between Python and C++ program]:
[Screenshots of Python and C++ source code]:
C++ Part:
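A minimal, self-contained sketch of such a program (my own illustration, not the screenshot code) is below; the file name `data.csv`, the `x,y,z` column order, and the gradient-descent hyper-parameters are assumptions made for illustration:

```cpp
// sse_plane_fit.cpp -- fit z = w1*x + w2*y + b by minimizing the SSE
// with batch gradient descent. File name, column order, and
// hyper-parameters are illustrative assumptions.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::vector<double> x, y, z;

    // Assumed CSV layout: one "x,y,z" triple per line.
    std::ifstream file("data.csv");
    std::string line;
    while (std::getline(file, line)) {
        std::stringstream ss(line);
        std::string cell;
        double v[3];
        int k = 0;
        while (k < 3 && std::getline(ss, cell, ',')) v[k++] = std::stod(cell);
        if (k == 3) { x.push_back(v[0]); y.push_back(v[1]); z.push_back(v[2]); }
    }
    const size_t n = x.size();
    if (n == 0) { std::cerr << "no data read\n"; return 1; }

    double w1 = 0.0, w2 = 0.0, b = 0.0;
    const double alpha = 0.01;   // learning rate (assumed)
    const int epochs = 10000;    // iteration count (assumed)

    for (int e = 0; e < epochs; ++e) {
        double g1 = 0.0, g2 = 0.0, gb = 0.0;
        for (size_t i = 0; i < n; ++i) {
            double r = z[i] - (w1 * x[i] + w2 * y[i] + b);  // residual
            g1 += -2.0 * r * x[i];   // d(SSE)/d(w1)
            g2 += -2.0 * r * y[i];   // d(SSE)/d(w2)
            gb += -2.0 * r;          // d(SSE)/d(b)
        }
        // Average the gradients so alpha does not depend on n.
        w1 -= alpha * g1 / n;
        w2 -= alpha * g2 / n;
        b  -= alpha * gb / n;
    }

    std::cout << "coefficients: " << w1 << ", " << w2
              << "  intercept: " << b << "\n";
    return 0;
}
```

Gradient descent is used here to mirror the backpropagation discussion above; solving the least-squares problem $X^{\top}X\,w = X^{\top}z$ directly would give the same plane in one step, which is essentially what sklearn's LinearRegression computes.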
Introduction of CUDA library and Parallel Programming
The following statements come from the Nvidia official website: CUDA C++ Programming Guide.
The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. The challenge is to develop application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores. The CUDA parallel programming model is designed to overcome this challenge while maintaining a low learning curve for programmers familiar with standard programming languages such as C. At its core are three key abstractions - a hierarchy of thread groups, shared memories, and barrier synchronization - that are simply exposed to the programmer as a minimal set of language extensions.
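To make the thread-hierarchy abstraction concrete, below is a minimal CUDA C++ sketch (my own illustration, not from the guide) that launches a grid of thread blocks to add two vectors; the array size and launch configuration are arbitrary choices:

```cpp
// vec_add.cu -- minimal CUDA example of the grid/block/thread hierarchy.
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard the last partial block
}

int main() {
    const int n = 1 << 20;                 // 1M elements (arbitrary size)
    const size_t bytes = n * sizeof(float);

    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;                   // device buffers
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f (expected 3.0)\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}
```

Each block here holds 256 threads, and the grid is sized so that every array element is covered; the same code runs unchanged on GPUs with widely varying core counts, which is the transparent scaling the guide describes.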
Advanced questions of Machine Learning
Question 1: What is the detailed mathematical proof of the backpropagation process? The hidden-layer part of the proof seems to involve d/dx of f(g(h(x))). What is the matrix form of the hidden-layer Jacobian matrix?
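For reference, that nested derivative expands by the chain rule as

$$ \frac{d}{dx} f\big(g(h(x))\big) = f'\big(g(h(x))\big)\; g'\big(h(x)\big)\; h'(x) , $$

and in the multivariate case each factor becomes a Jacobian matrix, multiplied in the same order.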
Question 2: What normalization techniques should be applied to the independent variables when large matrix operations are needed?