link address : https://en.wikipedia.org/wiki/Statistical_classification
>>For the unsupervised learning approach, see Cluster analysis<<
In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs (for example, one such sub-population might be the one we call "sheep"), on the basis of a training set of data containing observations (or instances) whose category membership is known.
>>Classification is an example of pattern recognition<<.
In the terminology of machine learning,[1] >>classification is considered an instance of supervised learning<<, i.e. learning where a training set of correctly identified observations is available. The corresponding >>unsupervised procedure is known as clustering<<, and >>involves grouping data into categories based on some measure of inherent similarity or distance<<.
Often, >>the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or features<<. These properties may variously be categorical (e.g. "A", "B", "AB" or "O", for blood type), ordinal (e.g. "large", "medium" or "small"), integer-valued (e.g. the number of occurrences of a particular word in an email) or real-valued (e.g. a measurement of blood pressure) (note: whatever the type, these properties should be quantifiable). >>Other classifiers work by comparing observations to previous observations by means of a similarity or distance function<<.
An algorithm that implements classification, especially in a concrete implementation, is known as a classifier. The term "classifier" sometimes also refers to the mathematical function, >>implemented by a classification algorithm, that maps input data to a category (this looks like y = f(x))<<.
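To make the "classifier as a function y = f(x)" idea concrete, here is a minimal sketch of a distance-based classifier. It uses the nearest-neighbour rule, which is only one possible choice and is not prescribed by the notes above. The names `euclidean` and `make_nearest_neighbor_classifier`, and the toy data, are made up for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def make_nearest_neighbor_classifier(training_set, distance=euclidean):
    """Return a function f(x) that maps a feature vector to a category.

    training_set is a list of (features, category) pairs whose
    category membership is known.
    """
    def classify(x):
        # Find the training observation closest to x and reuse its category.
        _, category = min(training_set, key=lambda pair: distance(pair[0], x))
        return category
    return classify

# Hypothetical toy data: (height_cm, weight_kg) -> label.
training_set = [
    ((70.0, 60.0), "sheep"),
    ((75.0, 65.0), "sheep"),
    ((50.0, 20.0), "dog"),
    ((55.0, 25.0), "dog"),
]
f = make_nearest_neighbor_classifier(training_set)
print(f((72.0, 62.0)))  # sheep
print(f((52.0, 22.0)))  # dog
```

The returned `f` is the "mathematical function" sense of the word classifier, while `make_nearest_neighbor_classifier` plays the role of the algorithm that produces it from the training set.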
Terminology across fields is quite varied. In statistics, where classification is often done with logistic regression or a similar procedure, the properties of observations are termed explanatory variables (or >>independent variables<<, regressors, etc.), and the categories to be predicted are known as outcomes, which are considered to be possible values of the >>dependent variable<< (i.e. a relation is constructed between input and output).
Classification and clustering are examples of the more general problem of pattern recognition, which is the >>assignment of some sort of output value to a given input value<< (i.e. a relation is constructed between input and output).
Another example is >>regression, which assigns a real-valued output to each input<<.
A common subclass of classification is >>probabilistic classification<<. Algorithms of this nature use statistical inference to find the best class for a given instance. Unlike other algorithms, which simply output a "best" class, >>probabilistic algorithms output a probability of the instance being a member of each of the possible classes<<. The best class is normally then selected as the one with the highest probability. However, such an algorithm has numerous advantages over non-probabilistic classifiers:
1. It can output a confidence value associated with its choice (in general, a classifier that can do this is known as a confidence-weighted classifier).
2. Correspondingly, it can abstain when its confidence of choosing any particular output is too low.
3. Because of the probabilities which are generated, probabilistic classifiers can be more effectively incorporated into larger machine-learning tasks, in a way that partially or completely avoids the problem of error propagation.
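The first two points can be sketched in a few lines: given per-class probabilities, pick the most probable class, report its probability as the confidence, and abstain when that confidence is too low. The function name `predict_with_abstention` and the 0.6 threshold are assumptions chosen for illustration; the probabilities would come from whatever probabilistic classifier is in use.

```python
def predict_with_abstention(class_probabilities, threshold=0.6):
    """Return (best_class, confidence), or (None, confidence) when abstaining.

    class_probabilities maps each class label to its estimated probability.
    threshold is a hypothetical cut-off, not a recommended value.
    """
    # The "best" class is the one with the highest probability.
    best_class = max(class_probabilities, key=class_probabilities.get)
    confidence = class_probabilities[best_class]
    if confidence < threshold:
        return None, confidence  # abstain: no class is probable enough
    return best_class, confidence

print(predict_with_abstention({"spam": 0.92, "ham": 0.08}))  # ('spam', 0.92)
print(predict_with_abstention({"spam": 0.55, "ham": 0.45}))  # (None, 0.55)
```

Returning the confidence alongside the label is also what lets a downstream component weigh this prediction instead of treating it as certain, which is the error-propagation point in item 3.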
Mahalanobis distance:
The Mahalanobis distance is a measure of the distance between a point P and a distribution D, introduced by P. C. Mahalanobis in 1936.[1] It is a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D. This distance is zero if P is at the mean of D, and grows as P moves away from the mean: along each principal component axis, it measures the number of standard deviations from P to the mean of D. If each of these axes is rescaled to have unit variance, then Mahalanobis distance corresponds to standard Euclidean distance in the transformed space. Mahalanobis distance is thus unitless and scale-invariant, and takes into account the correlations of the data set.
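In formula form, the distance from a point x to a distribution with mean μ and covariance matrix Σ is sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)). The sketch below works this out for the two-dimensional case only, with the 2x2 matrix inverse written out by hand so that no external library is needed. The function name `mahalanobis_2d` and the example numbers are made up for illustration.

```python
import math

def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance from a 2-D point to a distribution.

    cov is the 2x2 covariance matrix ((a, b), (b, c)), assumed symmetric.
    """
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (_, c) = cov
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # (x - mu)^T cov^{-1} (x - mu), using the closed-form 2x2 inverse.
    squared = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(squared)

mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))  # std. dev. 2 along x, 1 along y
print(mahalanobis_2d((2.0, 0.0), mean, cov))  # 1.0
print(mahalanobis_2d((0.0, 2.0), mean, cov))  # 2.0
```

The two example points are equally far from the mean in Euclidean terms, but the first is one standard deviation away and the second is two, which is the rescaling described above. With an identity covariance matrix the function reduces to ordinary Euclidean distance.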
Bayesian procedures:
Unlike frequentist procedures, Bayesian classification procedures provide a natural way of taking into account any available information about the relative sizes of the sub-populations associated with the different groups within the overall population.[7] Bayesian procedures tend to be computationally expensive and, in the days before Markov chain Monte Carlo computations were developed, approximations for Bayesian clustering rules were devised.[8]
Some Bayesian procedures involve the calculation of group membership probabilities: these can be viewed as providing a more informative outcome of a data analysis than a simple attribution of a single group-label to each new observation.
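As a small illustration of how the relative sizes of the sub-populations enter the calculation, the sketch below applies Bayes' rule: the posterior probability of each group is proportional to its prior (its relative size) times the likelihood of the observation under that group. The function name `group_membership_probabilities` and all numbers are hypothetical, and the likelihoods are assumed to be supplied by some model not shown here.

```python
def group_membership_probabilities(priors, likelihoods):
    """Posterior P(group | observation) via Bayes' rule.

    priors[g]      -- relative size of group g in the overall population
    likelihoods[g] -- P(observation | group g), assumed to be given
    """
    unnormalized = {g: priors[g] * likelihoods[g] for g in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every group")
    # Normalize so the membership probabilities sum to 1.
    return {g: value / total for g, value in unnormalized.items()}

posterior = group_membership_probabilities(
    priors={"common": 0.9, "rare": 0.1},
    likelihoods={"common": 0.2, "rare": 0.8},
)
print({g: round(p, 3) for g, p in posterior.items()})
# {'common': 0.692, 'rare': 0.308}
```

Although the observation is four times more likely under the "rare" group, the large prior of the "common" group still makes it the more probable one, and the output says by how much instead of reporting only a single label.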
Binary and multiclass classification:
Classification can be thought of as two separate problems – binary classification and multiclass classification. In binary classification, a better understood task, only two classes are involved, whereas multiclass classification involves assigning an object to one of several classes.[9] Since many classification methods have been developed specifically for binary classification, >>multiclass classification often requires the combined use of multiple binary classifiers<<.
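One common way to combine binary classifiers into a multiclass one is the one-vs-rest scheme: train one binary scorer per class ("this class" versus "everything else") and pick the class whose scorer is most confident. The notes above do not name a particular scheme, so this is just one option, and the scorers below are hand-written stand-ins for trained models.

```python
def one_vs_rest_predict(binary_scorers, x):
    """Combine per-class binary scorers into one multiclass decision.

    binary_scorers maps each class label to a function returning a score
    for "x belongs to this class" versus "x belongs to any other class".
    """
    # Choose the class whose binary scorer gives the highest score.
    return max(binary_scorers, key=lambda label: binary_scorers[label](x))

# Hypothetical scorers on a single numeric feature (higher = more confident).
scorers = {
    "small": lambda x: -abs(x - 1.0),
    "medium": lambda x: -abs(x - 5.0),
    "large": lambda x: -abs(x - 9.0),
}
print(one_vs_rest_predict(scorers, 4.2))  # medium
print(one_vs_rest_predict(scorers, 8.0))  # large
```

This only works when the scores of the different binary classifiers are comparable with one another, which is one reason probabilistic outputs are convenient here.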
Application domains:
Computer vision:
1. Medical imaging and medical image analysis
2. Optical character recognition
3. Video tracking
Speech recognition
Handwriting recognition
Document classification
Internet search engines
Biometric identification
Biological classification
Statistical natural language processing
Credit scoring
See also:
Artificial intelligence
Binary classification
Class membership probabilities
Classification rule
Compound term processing
Data mining
Data warehouse
Fuzzy logic
Information retrieval
List of datasets for machine learning research
Machine learning
Recommender system