An easy-to-understand look at the top ten commonly used machine learning algorithms

The unknown Word

| The First Column | The Second Column |
| --- | --- |
| Nearest neighbor algorithm | 近邻算法 (nearest neighbor algorithm) |
| K-Nearest neighbor algorithm | \(\sqrt{\sum_{i=1}^n (a_i-b_i)^2}\) |

Decision Tree

At each node, the tree asks a question about some feature. Based on the answer, the data is split into two branches, and each branch continues to ask further questions. These questions are learned from the existing data. When new data arrives, it is routed down the tree to the appropriate leaf according to the answers.

[Figure: a decision tree splitting the data with a question at each node]
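
As a minimal sketch of this idea (assuming scikit-learn and its bundled iris dataset, neither of which appears in the original post), a decision tree can be trained and queried like this:

```python
# A minimal sketch using scikit-learn and its bundled iris data
# (both assumed here; they are not part of the original post).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)           # features and class labels
tree = DecisionTreeClassifier(max_depth=3)  # each internal node asks one question
tree.fit(X, y)                              # the questions are learned from existing data
print(tree.predict(X[:1]))                  # new data is routed down to a leaf
```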

Random Forest

Randomly select data from the source data to form several subsets.

[Figure: random subsets sampled from the source data]

The matrix S is the source data, with rows 1 through N; A, B, C are the columns, and the last column, C, is the category.

[Figure: the source data matrix S]

M sub-matrices are randomly generated from S.

[Figure: M random sub-matrices drawn from S]
From these M subsets, M decision trees are trained.

Put the new data into the M trees to get M classification results; count which category is predicted most often, and use that category as the final prediction.
[Figure: majority vote over the M trees' predictions]
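
A minimal sketch of the same procedure, again assuming scikit-learn and its iris dataset:

```python
# A minimal sketch using scikit-learn (assumed; not part of the original post).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10)  # M = 10 trees
forest.fit(X, y)              # each tree is trained on a random subset of S
print(forest.predict(X[:1]))  # the final answer is the majority vote of the M trees
```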

Logistic regression

When the prediction target is a probability, the value must lie between 0 and 1, that is, greater than or equal to 0 and less than or equal to 1. A simple linear model cannot be used here, because its output is unbounded and falls outside this range.

[Figure: a linear model whose output leaves the 0-1 range]

So at this point it is better to have a model of the following shape.

[Figure: an S-shaped curve bounded between 0 and 1]

So how do you get such a model?
The model needs to satisfy two conditions: greater than or equal to 0, and less than or equal to 1.
For the value to be greater than or equal to 0, one could choose an absolute value or a square; here an exponential function is used, which is always greater than 0.
For the value to be less than or equal to 1, use division: the numerator is the exponential itself and the denominator is itself plus 1, which is always less than 1.

[Figure: building the model from an exponential divided by itself plus 1]

After one more transformation, we get the logistic regression model.

[Figure: the logistic regression model]
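
Written out as formulas (a sketch; the linear score is written here as \(z=w^\top x\), notation assumed rather than taken from the original figures):

\[
p=\frac{e^{z}}{1+e^{z}}=\frac{1}{1+e^{-z}}
\]

The exponential keeps the numerator positive, the division keeps the value below 1, and dividing numerator and denominator by \(e^{z}\) gives the familiar sigmoid form.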

The corresponding coefficients can be obtained by fitting the model to the source data, as in the sketch below.

[Figure: fitting the coefficients from the source data]
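
A minimal fitting sketch, assuming scikit-learn and made-up one-feature data:

```python
# A minimal sketch using scikit-learn; the one-feature data below is made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])  # one feature per sample
y = np.array([0, 0, 0, 1, 1, 1])                          # binary labels

model = LogisticRegression()
model.fit(X, y)                       # the coefficients are fitted from the data
print(model.coef_, model.intercept_)  # the learned w and w0
print(model.predict_proba([[2.0]]))   # probability estimates for a new point
```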

In the end, we obtain the logistic curve.

[Figure: the fitted logistic curve]

SVM (Support Vector Machine)

To separate the two classes, we want to find a hyperplane. The optimal hyperplane is the one that maximizes the margin between the two classes, where the margin is the distance between the hyperplane and the nearest points. As shown below, \(Z_2>Z_1\), so the green hyperplane is better.

[Figure: two candidate hyperplanes with margins Z1 and Z2]

Express this hyperplane as a linear equation: points of one class above the line give a value greater than or equal to 1, and points of the other class give a value less than or equal to -1.

[Figure: the hyperplane as a linear equation with margins at +1 and -1]

The point-to-plane distance is calculated according to the formula in the figure.

[Figure: the point-to-plane distance formula]

So the expression for the total margin is as follows. The goal is to maximize this margin, which requires minimizing the denominator, so it becomes an optimization problem (written out after the figure).

[Figure: the total margin expression]
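
In the standard notation (a sketch; the weight vector \(w\) and intercept \(b\) are assumed symbols, not taken from the figures), the problem reads:

\[
\max_{w,b}\ \frac{2}{\lVert w\rVert}
\quad\Longleftrightarrow\quad
\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w^\top x_i+b)\ge 1 .
\]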

Taking three example points, we find the optimal hyperplane, defining the weight vector as (2,3)-(1,1).

[Figure: three example points and the weight vector]

Obtaining the weight vector as (a, 2a), substitute the two points into the equation: substitute (2,3) with the right-hand side equal to 1, and substitute (1,1) with the right-hand side equal to -1, to solve for a and the intercept \(W_0\), which in turn gives the expression of the hyperplane.

[Figure: solving for a and W0]

After a is solved, substituting it back into (a, 2a) gives the support vector;
substituting a and \(W_0\) into the hyperplane equation gives the support vector machine. The arithmetic is worked out below.
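
Working the example through (a sketch consistent with the setup above):

\[
\begin{aligned}
(2,3):&\quad 2a+6a+W_0=1 \;\Rightarrow\; 8a+W_0=1\\
(1,1):&\quad a+2a+W_0=-1 \;\Rightarrow\; 3a+W_0=-1
\end{aligned}
\]

Subtracting the two equations gives \(5a=2\), so \(a=\tfrac{2}{5}\) and \(W_0=-\tfrac{11}{5}\); the hyperplane is \(\tfrac{2}{5}x_1+\tfrac{4}{5}x_2-\tfrac{11}{5}=0\).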

Naive Bayes

Here is an application in NLP.
Given a paragraph of text, return its emotional classification: is the attitude of this text positive or negative?

[Figure: a text snippet to be classified as positive or negative]

In order to solve this problem, you can just look at some of the words.

[Figure: the key words picked out of the text]

The text is then represented only by those words and their counts.

[Figure: the text represented as word counts]

The original question is: given a sentence, which category does it belong to? Through Bayes' rule, it becomes a simpler, easier question.

[Figure: rewriting the question with Bayes' rule]

The question becomes: what is the probability of this sentence appearing in this category? Of course, don't forget the other two probabilities in the formula.
For example, the probability that the word 'love' appears in the positive case is 0.1, and the probability in the negative case is 0.001.

[Figure: word probabilities under each class]
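
A minimal sketch of the scoring step; the 'love' probabilities come from the example in the text, and every other number is assumed for illustration:

```python
# A minimal sketch of the naive Bayes scoring step. The 'love' probabilities
# come from the example in the text; every other number is assumed.
word_probs = {
    "positive": {"love": 0.1,   "awful": 0.001},
    "negative": {"love": 0.001, "awful": 0.1},
}
priors = {"positive": 0.5, "negative": 0.5}   # assumed equal class priors

def score(words, category):
    # P(category) times the product of P(word | category):
    # the naive independence assumption
    p = priors[category]
    for w in words:
        p *= word_probs[category].get(w, 1e-6)  # small floor for unseen words
    return p

words = ["love"]
print(max(priors, key=lambda c: score(words, c)))  # -> positive
```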

K nearest neighbors

Given a new data point, look at the k points closest to it: whichever class has the majority among them is the class the new data belongs to.
For example, to distinguish between cats and dogs, we judge by the shape of the claws and the sound. The circles and triangles are already classified; which class does the star represent?

[Figure: classified circles and triangles with an unknown star]

When k=3, the three nearest points are those connected by the lines; circles are in the majority, so the star belongs to the cat class.

[Figure: the three nearest neighbors of the star]
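
A minimal sketch of the majority vote, in pure Python; the (claw, sound) feature values below are assumed:

```python
# A minimal k-nearest-neighbors sketch in pure Python; the (claw, sound)
# feature values are assumed for illustration.
import math
from collections import Counter

known = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((1.1, 1.3), "cat"),
         ((3.0, 3.2), "dog"), ((3.1, 2.9), "dog")]

def knn(point, k=3):
    # sort known points by Euclidean distance (the formula from the glossary)
    nearest = sorted(known, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # majority vote among the k neighbors

print(knn((1.0, 1.1)))  # -> cat
```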

K-means

We want to divide a set of data into three categories; the pink values are large and the yellow values are small.
First initialize the centers; here the simplest values 3, 2, 1 are taken as the initial value of each class. For each of the remaining data points, calculate its distance to the three initial values and assign it to the class of the closest one.

[Figure: initial centers 3, 2, 1 and the first assignment]

After classifying, calculate the average of each class and use it as the new round's center point.

[Figure: recomputing each class center as the mean]

After a few rounds, when the group assignments no longer change, you can stop.

[Figures: assignments and centers over successive rounds until they stabilize]
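
A minimal 1-D sketch of this loop in pure Python; the data values are assumed, and the initial centers 3, 2, 1 follow the text:

```python
# A minimal 1-D k-means sketch in pure Python. The data values are assumed;
# the initial centers 3, 2, 1 follow the text.
data = [1.1, 1.9, 3.4, 0.8, 2.2, 3.1, 0.5, 2.8]
centers = [3.0, 2.0, 1.0]

while True:
    # assign every point to the class of its nearest center
    clusters = [[] for _ in centers]
    for x in data:
        i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        clusters[i].append(x)
    # the mean of each class becomes the next round's center
    new_centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    if new_centers == centers:   # nothing changed: stop
        break
    centers = new_centers

print(centers)
```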

Adaboost

Adaboost is one of the boosting methods.
Boosting combines several classifiers with poor classification performance into a better classifier.
In the picture below, neither of the two decision trees performs well on its own, but if we put the same data into both and add the two results together, the credibility of the result increases. A concrete sketch follows the figure.

[Figure: two weak decision trees combined into a stronger classifier]
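
A minimal boosting sketch, assuming scikit-learn and a synthetic dataset (neither is part of the original post):

```python
# A minimal boosting sketch using scikit-learn (assumed; not in the original post).
# AdaBoostClassifier's default weak learner is a depth-1 decision tree ("stump").
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)
boosted = AdaBoostClassifier(n_estimators=50)  # 50 weak trees, combined with weights
boosted.fit(X, y)
print(boosted.score(X, y))                     # far better than any single stump
```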

For example, consider Adaboost for handwriting recognition: many features can be captured from the drawing board, such as the direction of the starting point, the distance between the starting point and the ending point, etc.

[Figure: handwriting features captured on the drawing board]

During training, the weight of each feature is obtained. For example, the beginnings of 2 and 3 are written very similarly, so this feature has little effect on the classification and its weight is small.

[Figure: a low-weight feature shared by 2 and 3]

But this alpha angle is highly recognizable, so the weight of that feature will be larger; the final prediction is the result of weighing all of these features together.

[Figure: the discriminative alpha angle feature]

Neural Network

Neural networks are suitable when an input may fall into at least two categories.
A NN consists of several layers of neurons and the connections between them.
The first layer is the input layer, and the last layer is the output layer.
Both the hidden layers and the output layer have their own classifiers.

[Figure: input, hidden, and output layers of a neural network]

The input is fed into the network and activated; the computed scores are passed to the next layer, activating the subsequent neural layer, and the scores on the output layer's nodes represent the scores for each class. The example below shows a classification result of class 1.
The same input is transmitted to different nodes and yields different results because each node has different weights and biases.
This is forward propagation; a numeric sketch follows the figure.

[Figure: forward propagation producing class scores]
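
A minimal forward-propagation sketch with NumPy; all weights, biases, and the input values are assumed for illustration:

```python
# A minimal forward-propagation sketch with NumPy. All weights, biases and
# the input values are assumed for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))         # the activation function

x = np.array([1.0, 0.5])                    # input layer: 2 features
W1 = np.array([[0.4, -0.6], [0.3, 0.8]])    # weights: input -> hidden
b1 = np.array([0.1, -0.2])                  # hidden-layer biases
W2 = np.array([[0.7, -0.5], [-0.3, 0.9]])   # weights: hidden -> output
b2 = np.array([0.0, 0.1])                   # output-layer biases

h = sigmoid(W1 @ x + b1)                    # activate the hidden layer
scores = sigmoid(W2 @ h + b2)               # one score per output class
print(scores, "-> class", scores.argmax())  # the highest score wins
```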

Markov

A Markov chain consists of states and transitions.
For example, from the phrase 'the quick brown fox jumps over the lazy dog', we can derive a Markov chain.
The steps: first treat each word as a state, then calculate the probability of transitions between states, as in the sketch below.

[Figure: the words of the sentence as states with transitions]
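
A minimal sketch of those two steps in pure Python:

```python
# A minimal sketch: build the transition probabilities from the sentence.
from collections import Counter, defaultdict

words = "the quick brown fox jumps over the lazy dog".split()

counts = defaultdict(Counter)
for a, b in zip(words, words[1:]):   # each adjacent word pair is one transition
    counts[a][b] += 1

# normalize the counts into per-state transition probabilities
chain = {state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
         for state, c in counts.items()}

print(chain["the"])  # -> {'quick': 0.5, 'lazy': 0.5}
```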


[Figure: the resulting Markov chain with transition probabilities]

In everyday life, the suggestion candidates of a keyboard input method work on the same principle, just with a more advanced model.

[Figure: keyboard input-method suggestions]

Reposted from: https://www.cnblogs.com/hugeng007/p/9609679.html
