The Implementation of baseline algorithms of Machine Learning

最新推荐文章于 2022-05-25 18:50:54 发布

寒烟凝

最新推荐文章于 2022-05-25 18:50:54 发布

阅读量417

点赞数

分类专栏： python 机器学习＼数据挖掘

本文链接：https://blog.csdn.net/xiazai123time/article/details/79111941

版权

机器学习＼数据挖掘同时被 2 个专栏收录

5 篇文章 0 订阅

订阅专栏

python

1 篇文章 0 订阅

订阅专栏

[THE THEORY COMES FROM HOW TO IMPLEMENT BASELINE MACHINE LEARNING ALGORITHMS FROM SCRATCH WITH PYTHON]

Background

When coming across a problem, we want to use to either classification or regression methods to predict or generate the basic pattern about this specific problem. The commonest way is to build a model to learn the basic pattern based on the labeled data and then build a model ,after which we will make some predictions upon other data without labels based on the model. In this regard, how can we know the performance of the model we built, and with what kinds of methods can we evaluate our model. Here, I will implement two kinds of models called baseline machine learning algorithms with which we can compare the performance of our models or make some improvement based on that.

Random Prediction Algorithm

This algorithm generate the random outcome as observed in the training data for the testing data.

There are several things need to be kept in mind:

(1) The random outcome of testing data is depended on the outcomes training data, so we should first store all of theunique outcomes from training data, and then randomly give thoseunique data to testing data as their prediction outcome;

(2) We should fix the random number seed. Since those generated random numbers decide which outcome from training data to be given to that of the testing data. If we fixed the random number seed, it ensures us to get the same set of random numbers, and the same decisions when we make some predictions.

(3)Assuming the last column of our dataset is the label. so we just need the labels of the training data and then randomly assign them to the testing data as the prediction outcomes.

For example, we want to get the last column of the dataset in python, we could do this.

first, give a dataset.

So, we can see, I generated a 3X4 array as dataset, and assume the last column [3,7,11] as the labels of the dataset.

Then we visit each row and get the last data of the row (for row in test:....print(row[-1])).

At last I will randomly assign these numbers to testing data as their labels.

The algorithm should be this:

The Whole algorithm should be like this:

Zero_Rule_Algorithm

Compared with random_Algorithm to assign the prediction values for testing data, Zero_Rule_Algorithm assign the those values based on the possibilities of these values appeared in the labels of training dataset. For example, in the labels of training dataset, if there 90% are '0' and 10% are '1', we would be more likely to assign 0 as the prediction value for the testing data to gaurantee the 90% accuracy.

Based on which the implementation of the algorithm could be like following:

Firstly, we should store all the labels of training dataset;

Secondly, we should statistic the value that appears most of times in the training dataset;

Finally, assign the value as the prediction value of test dataset.

The whole code is like this:

Regression

Regression problems need to predict a real value. A good default prediction for real values is to predict the central tendency (The center can either be the median or mean of the values). It means that, we can use the mean or median value of the training dataset as prediction of the testing data.

寒烟凝

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
The Implementation of baseline algorithms of Machine Learning

[THE THEORY COMES FROM HOW TO IMPLEMENT BASELINE MACHINE LEARNING ALGORITHMS FROM SCRATCH WITH PYTHON]BackgroundWhen coming across a problem, we want to use to either classification or regression
复制链接

扫一扫

专栏目录