Iris Data Set
Clustering with R
This article is a hands-on introduction to cluster analysis (an unsupervised machine learning technique) in R, using the popular Iris data set.
Let’s brush up on some concepts from Wikipedia
Machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a subset of Artificial Intelligence. Machine learning algorithms build a mathematical model based on sample data, in order to make predictions or decisions without being explicitly programmed to do so.
Supervised Learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.
Unsupervised Learning is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision.
Cluster Analysis or Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
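To make the definition concrete, here is a minimal sketch of one possible clustering run on R's built-in iris data. It assumes k-means as the method (the definition above does not prescribe any particular algorithm) and assumes three clusters; the seed value and the variable names `measurements` and `fit` are illustrative choices only.

```r
# Minimal k-means sketch (k-means is an assumed, illustrative choice of method).
data(iris)

# Keep only the four numeric measurements; the Species label is deliberately left out,
# because clustering works without pre-existing labels.
measurements <- iris[, 1:4]

set.seed(42)  # k-means starts from random centers, so fix the seed for repeatability
fit <- kmeans(measurements, centers = 3, nstart = 20)

# Each flower is assigned a cluster id (1, 2, or 3)
head(fit$cluster)

# Share of the total variation that remains inside the clusters;
# a smaller share means objects in a cluster are more similar to each other
fit$tot.withinss / fit$totss
```

The ratio at the end is one rough way to express "more similar to each other than to those in other groups": it compares the spread within clusters to the spread of the whole data set.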
About the Iris Data Set
The Iris flower data set was introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems". It is perhaps the best-known data set in the pattern recognition literature. It gives the measurements, in centimetres, of four variables (sepal length, sepal width, petal length, and petal width) for 50 flowers from each of 3 species of iris: Iris setosa, Iris versicolor, and Iris virginica.
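The description above can be checked directly in R, which ships a copy of this data as the built-in `iris` data frame. The following is a short sketch using only base R functions; nothing needs to be installed.

```r
# Load the built-in copy of the Iris data set
data(iris)

# Number of rows (flowers) and columns (four measurements plus Species)
dim(iris)

# Column names, types, and the first few values
str(iris)

# Number of flowers per species
table(iris$Species)

# Basic summary statistics for the four measurements (in centimetres)
summary(iris[, 1:4])
```

The `table()` call should show 50 flowers for each of the three species, matching the description.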
So, let’s st