The Unknown Words
The First Column | The Second Column |
---|---|
Nearest neighbor algorithm | 近邻算法 |
K-nearest neighbor algorithm | \(\sqrt{\sum_{i=1}^n (a_i-b_i)^2}\) |
Decision Tree
According to some features, each node asks a question. Based on the answer, the data is divided into two branches, and the next node asks another question. These questions are learned from the existing data. When new data arrives, it can be routed to the appropriate leaf according to the questions on the tree.
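The node-by-node questioning described above can be sketched as a tiny hand-rolled tree. The feature names, thresholds, and class labels below are made-up illustrations, not taken from the text:

```python
# A minimal decision tree: each internal node asks a yes/no question about
# one feature; each leaf holds a class label. All values here are invented
# for illustration.

def classify(sample, node):
    """Walk the tree until a leaf (a plain class label) is reached."""
    while isinstance(node, tuple):   # internal node: (feature, threshold, left, right)
        feature, threshold, left, right = node
        node = left if sample[feature] <= threshold else right
    return node

# Question 1: is petal_length <= 2.5?  yes -> "setosa"
# Question 2 (otherwise): is petal_width <= 1.7?  yes -> "versicolor", no -> "virginica"
tree = ("petal_length", 2.5,
        "setosa",
        ("petal_width", 1.7, "versicolor", "virginica"))

print(classify({"petal_length": 1.4, "petal_width": 0.2}, tree))  # setosa
print(classify({"petal_length": 5.1, "petal_width": 2.3}, tree))  # virginica
```

Each question splits the data in two; a new sample simply follows the answers down to a leaf.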
Random Forest
Randomly select data from the source data to form several subsets.
The matrix S is the source data, with rows 1 to N; A, B, C are the feature columns, and the last column, C, is the category.
M sub-matrices are randomly generated from S.
These M subsets yield M decision trees. Put the new data into the M trees to get M classification results, count which category appears most often, and use that category as the final prediction.
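The subset-then-vote procedure above can be sketched in a few lines. The data set, the trivial threshold learner standing in for a decision tree, and the per-class resampling (which keeps both classes in every subset, a simplification) are all assumptions for illustration:

```python
import random
from collections import Counter

random.seed(0)

# Source data S: (feature, label) pairs; the values are made up.
S = [(x, "small") for x in (1, 2, 3, 4, 5)] + [(x, "large") for x in (11, 12, 13, 14, 15)]

def bootstrap(data):
    # resample each class with replacement so every subset has both classes
    small = [d for d in data if d[1] == "small"]
    large = [d for d in data if d[1] == "large"]
    return random.choices(small, k=len(small)) + random.choices(large, k=len(large))

def train(subset):
    # trivial base learner: threshold halfway between the two class means
    small = [x for x, y in subset if y == "small"]
    large = [x for x, y in subset if y == "large"]
    t = (sum(small) / len(small) + sum(large) / len(large)) / 2
    return lambda x: "small" if x < t else "large"

M = 7                                             # M random subsets -> M "trees"
forest = [train(bootstrap(S)) for _ in range(M)]

def predict(x):
    votes = Counter(tree(x) for tree in forest)   # M results, majority wins
    return votes.most_common(1)[0][0]

print(predict(2), predict(13))  # small large
```

A real random forest also samples a random subset of features at each split; this sketch only shows the bagging-and-voting part described in the text.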
Logistic regression
When the prediction target is a probability, the value range must satisfy greater than or equal to 0 and less than or equal to 1. A simple linear model cannot be used here, because its output is unbounded while the target must stay within that range.
So it is better to have a model of this shape.
So how do you get such a model?
This model needs to satisfy two conditions: greater than or equal to 0, and less than or equal to 1.
For a value greater than or equal to 0, one could choose an absolute value or a square; here an exponential function is used, which is always greater than 0.
To make it less than or equal to 1, use division: the numerator is the exponential itself and the denominator is itself plus 1, so the ratio is always less than 1.
After another transformation, we get the logistic regression model.
The corresponding coefficients can be obtained from the source data.
In the end, we get the logistic curve.
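The exponential-over-exponential-plus-one construction described above is the sigmoid. A minimal sketch (the coefficients w and b are made-up numbers, not fitted from any data):

```python
import math

def sigmoid(z):
    """Logistic function e^z / (e^z + 1): always strictly between 0 and 1."""
    return math.exp(z) / (math.exp(z) + 1)

# A linear score w*x + b can be any real number; the sigmoid squashes it
# into a valid probability. w and b are illustrative assumptions.
w, b = 2.0, -1.0
for x in (-3.0, 0.0, 3.0):
    p = sigmoid(w * x + b)
    print(f"x={x:+.1f}  p={p:.3f}")
```

Note that the numerator e^z is always greater than 0, and dividing by e^z + 1 keeps the result below 1, exactly the two conditions stated above.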
SVM(Support vector machine)
To separate the two classes, we want a hyperplane. The optimal hyperplane maximizes the margin between the two classes, where the margin is the distance between the hyperplane and the nearest point. As shown below, \(Z_2>Z_1\), so the green hyperplane is better.
Express this hyperplane as a linear equation: points of one class above the line satisfy greater than or equal to 1, and points of the other class satisfy less than or equal to -1.
The point-to-face distance is calculated according to the formula in the figure.
So the expression for the total margin is as follows. The goal is to maximize this margin, which requires minimizing the denominator, so it becomes an optimization problem.
Given the example points, we find the optimal hyperplane. Define the weight vector along the difference of the two classes' points: \((2,3)-(1,1)\).
This gives the weight vector the form \((a,2a)\). Substitute the two points into the hyperplane equation: \((2,3)\) with value 1 and \((1,1)\) with value -1, then solve for \(a\) and the intercept \(W_0\), which in turn gives the expression of the hyperplane.
After solving for \(a\), substituting it into \((a,2a)\) gives the weight vector; the points \((2,3)\) and \((1,1)\) sitting on the margin are the support vectors.
Substituting \(a\) and \(W_0\) into the linear equation yields the hyperplane of the support vector machine.
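The small worked example above can be checked numerically. With \(w=(a,2a)\), the two margin constraints give \(8a+W_0=1\) and \(3a+W_0=-1\):

```python
# SVM margin example from the text: support vectors (1, 1) and (2, 3),
# weight vector along their difference, so w = (a, 2a).
# Constraints: w . (2, 3) + w0 = +1  ->  8a + w0 = 1
#              w . (1, 1) + w0 = -1  ->  3a + w0 = -1
a = 2 / 5          # subtracting the two equations: 5a = 2
w0 = -1 - 3 * a    # back-substitute into 3a + w0 = -1
w = (a, 2 * a)     # weight vector (0.4, 0.8)

def decision(p):
    """Signed value of the hyperplane equation at point p."""
    return w[0] * p[0] + w[1] * p[1] + w0

print(round(decision((2, 3)), 6), round(decision((1, 1)), 6))  # 1.0 -1.0
```

Both support vectors land exactly on the margins, and scaling by 1/a shows the hyperplane is \(x_1 + 2x_2 - 5.5 = 0\).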
Naive Bayes
Here is an application in NLP: given a paragraph of text, return its emotional classification, i.e., whether the attitude of the text is positive or negative.
In order to solve this problem, you can just look at some of the words.
The text will be represented only by some of its words and their counts.
The original question is: given a sentence, which category does it belong to? Through Bayes' rule it becomes a simpler, easier question.
The question becomes: what is the probability of this sentence appearing in this category? Of course, don't forget the other two probabilities in the formula.
For example, the probability that the word "love" appears in the positive case is 0.1, and the probability in the negative case is 0.001.
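A toy version of this classifier can be sketched as follows. The text only supplies the two probabilities for "love"; every other number below (the other word probabilities, the class priors, the floor for unseen words) is an invented assumption:

```python
import math

# Per-class word probabilities P(word | class). Only the "love" values come
# from the text; the rest are made up for illustration.
p_word = {
    "positive": {"love": 0.1, "great": 0.05, "awful": 0.001},
    "negative": {"love": 0.001, "great": 0.002, "awful": 0.08},
}
p_class = {"positive": 0.5, "negative": 0.5}   # assumed equal priors

def classify(words):
    scores = {}
    for c in p_class:
        # naive Bayes: multiply P(class) by P(word|class) for each word;
        # done in log space to avoid underflow, unseen words get a tiny floor
        scores[c] = math.log(p_class[c]) + sum(
            math.log(p_word[c].get(w, 1e-6)) for w in words)
    return max(scores, key=scores.get)

print(classify(["love", "great"]))  # positive
print(classify(["awful"]))          # negative
```

The "naive" part is the independence assumption: the sentence probability is just the product of its word probabilities.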
K nearest neighbors
When a new data point is given, it belongs to whichever class holds the majority among the k points closest to it.
For example, to distinguish between cats and dogs, we judge by the shape of the claws and the sound. The circles and triangles are already classified; which class does the star represent?
When k=3, the three connected points are the three nearest neighbors. Circles are the majority, so the star belongs to the cat class.
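The majority-vote rule above fits in a few lines. The coordinates (standing in for claw shape and sound) and labels are made up for illustration:

```python
from collections import Counter

# Labeled points: ((claw_shape, sound), label); values are invented.
data = [((1.0, 1.1), "cat"), ((1.2, 0.9), "cat"), ((0.8, 1.0), "cat"),
        ((3.0, 3.2), "dog"), ((3.1, 2.9), "dog"), ((2.9, 3.0), "dog")]

def knn(query, k=3):
    # squared Euclidean distance is enough for ranking neighbors
    dist = lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2
    nearest = sorted(data, key=dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # majority class among the k nearest

print(knn((1.0, 1.0)))  # cat
print(knn((3.0, 3.0)))  # dog
```

With k=3, the query near the circle cluster gets three "cat" votes, matching the star example in the text.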
K-means
I want to divide a set of data into three categories; the pink values are large and the yellow values are small.
First initialize: here the simplest values 3, 2, 1 are taken as the initial value of each class. For the rest of the data, compute the distance to each of the three initial values and assign each point to the category of the closest one.
After classifying, calculate the average of each class as the center point for the new round.
After a few rounds, when the grouping no longer changes, you can stop.
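The assign-then-average loop above can be sketched in one dimension, using the initial centers 3, 2, 1 from the text. The data values themselves are made up:

```python
# 1-D k-means: assign each point to the nearest center, recompute each
# center as its group's mean, repeat until the grouping stops changing.
data = [1.0, 1.2, 0.8, 2.1, 1.9, 2.0, 3.1, 2.9, 3.3]   # invented values
centers = [3.0, 2.0, 1.0]                               # initial values from the text

while True:
    # assignment step: each point joins the closest center's group
    groups = [[] for _ in centers]
    for x in data:
        i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        groups[i].append(x)
    # update step: mean of each non-empty group becomes the new center
    new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    if new == centers:     # nothing moved -> the grouping is stable, stop
        break
    centers = new

print(sorted(round(c, 2) for c in centers))  # [1.0, 2.0, 3.1]
```

On this data the centers settle after one update, which is the "group no longer changes" stopping condition.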
Adaboost
Adaboost is one of the boosting methods.
Boosting combines several classifiers with poor individual performance to obtain a better classifier.
In the picture below, neither the left nor the right decision tree is very good on its own, but if we put the same data into both and add the two results together, the combined result is more credible.
For example, in handwriting recognition with Adaboost, many features can be captured on the drawing board, such as the direction of the starting point, the distance between the starting point and the ending point, and so on.
During training, the weight of each feature is obtained. For example, the beginning strokes of 2 and 3 are very similar, so this feature contributes little to the classification and its weight is small.
This alpha angle, however, is very distinctive, so the weight of this feature is larger; the final prediction is the result of considering all these features together.
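The mechanics above, reweighting hard samples and weighting weak classifiers, can be sketched with 1-D threshold "stumps" as the weak learners. The data set, thresholds, and round count are invented; the point is that no single stump gets all six points right, but the boosted combination does:

```python
import math

# (value, label) pairs with labels +1 / -1; values are made up so that
# every single stump misclassifies at least one point.
data = [(1, 1), (2, 1), (3, -1), (4, 1), (5, -1), (6, -1)]

def stumps():
    # candidate weak classifiers: "+1 below t" and "-1 below t"
    for t in (1.5, 2.5, 3.5, 4.5, 5.5):
        yield lambda x, t=t: 1 if x < t else -1
        yield lambda x, t=t: -1 if x < t else 1

def adaboost(rounds=3):
    w = [1 / len(data)] * len(data)          # sample weights, uniform at first
    ensemble = []
    for _ in range(rounds):
        # pick the stump with the lowest weighted error on the current weights
        best, err = None, 1.0
        for h in stumps():
            e = sum(wi for wi, (x, y) in zip(w, data) if h(x) != y)
            if e < err:
                best, err = h, e
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))   # classifier weight
        ensemble.append((alpha, best))
        # reweight: misclassified samples get heavier, correct ones lighter
        w = [wi * math.exp(-alpha * y * best(x)) for wi, (x, y) in zip(w, data)]
        s = sum(w)
        w = [wi / s for wi in w]
    # final prediction: sign of the alpha-weighted vote
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

H = adaboost()
print([H(x) for x, _ in data])  # [1, 1, -1, 1, -1, -1] -- matches all labels
```

The alpha here plays the same role as the feature weights in the handwriting example: more reliable weak classifiers get a bigger say in the final vote.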
Neural Network
Neural networks are suitable for inputs that may fall into at least two categories.
An NN consists of several layers of neurons and the connections between them.
The first layer is the input layer, and the last layer is the output layer.
Both the hidden layer and the output layer have their own classifier
The input is fed into the network and activates the first layer; the computed scores are passed on, activating the subsequent layers, and the scores on the output layer's nodes represent the scores of each class. The example below shows a classification result of class 1.
The same input is transmitted to different nodes and produces different results because the nodes have different weights and biases.
This is forward propagation
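Forward propagation through a tiny 2-input, 2-hidden, 2-output network can be sketched as follows. All weights and biases are made-up numbers chosen so the winning score lands on class 1, as in the example above:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    # each neuron: activation(weighted sum of inputs + its own bias)
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 0.5]                                             # input layer
hidden = layer(x, [[0.8, -0.4], [0.3, 0.9]], [0.1, -0.2])  # hidden layer
output = layer(hidden, [[1.2, -0.6], [-0.7, 1.1]], [0.0, 0.0])

# the class whose output node scores highest wins
print("class", output.index(max(output)) + 1)  # class 1
```

Each node receives the same inputs but applies its own weights and bias, which is why their outputs differ.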
Markov
A Markov chain consists of states and transitions.
For example, build the Markov chain from the phrase 'the quick brown fox jumps over the lazy dog'.
Steps: first make each word a state, then calculate the probability of transition between states.
In everyday life, the candidate words suggested by a keyboard input method work on the same principle; the model is just more advanced.
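The two steps, one state per word, then transition probabilities from word-to-word counts, can be computed directly from the phrase in the text:

```python
from collections import defaultdict

# Build a Markov chain from the example phrase: each word is a state, and
# transition probabilities come from counting which word follows which.
phrase = "the quick brown fox jumps over the lazy dog"
words = phrase.split()

counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(words, words[1:]):   # consecutive word pairs
    counts[a][b] += 1

transitions = {
    state: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
    for state, nxts in counts.items()
}

# "the" is followed by "quick" once and "lazy" once -> probability 0.5 each
print(transitions["the"])  # {'quick': 0.5, 'lazy': 0.5}
```

Every other word appears once and is followed by exactly one word, so all its transitions have probability 1.0; only "the" has a split.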