Prediction Using Classification Methods in R


In this analysis I'll build a model that predicts whether a tumor is malignant or benign, based on data from a study on breast cancer. Classification algorithms are used in the modelling process.


The dataset


The data for this analysis cover 569 patients from a study on breast cancer. The data are available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic). The variables were computed from a digitized image of a breast mass and describe characteristics of the cell nuclei present in the image. The variables are the following:


radius (mean of distances from center to points on the perimeter)
texture (standard deviation of gray-scale values)
perimeter
area
smoothness (local variation in radius lengths)
compactness (perimeter² / area - 1.0)
concavity (severity of concave portions of the contour)
concave points (number of concave portions of the contour)
symmetry
fractal dimension ("coastline approximation" - 1)
type (tumor can be either malignant, M, or benign, B)
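
As a minimal sketch of how the data might be loaded in R: the function below assumes the raw wdbc.data file from the UCI repository, which to my knowledge has no header row and holds an ID column, the diagnosis, and then 30 numeric columns (the mean, standard error and "worst" value of each of the ten features listed above). The helper name read_wdbc and the column names are illustrative choices, not taken from the original analysis.

```r
# Hypothetical helper: read the headerless WDBC file and attach column names.
# Assumption: 32 columns = id, diagnosis, then 10 features x 3 statistics.
read_wdbc <- function(path) {
  features <- c("radius", "texture", "perimeter", "area", "smoothness",
                "compactness", "concavity", "concave_points", "symmetry",
                "fractal_dimension")
  stats <- c("mean", "se", "worst")
  # outer() varies the feature fastest: radius_mean, texture_mean, ...,
  # then radius_se, ..., then radius_worst, ...
  feature_cols <- as.vector(outer(features, stats, paste, sep = "_"))
  wdbc <- read.csv(path, header = FALSE,
                   col.names = c("id", "type", feature_cols))
  # Encode the target as a factor: B = benign, M = malignant.
  wdbc$type <- factor(wdbc$type, levels = c("B", "M"))
  wdbc
}
```

Assuming a local copy of the file saved as wdbc.data, calling read_wdbc("wdbc.data") should return a data frame with one row per patient, and table(wdbc$type) then gives the class counts.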

Exploratory Analysis

It is essential to have an overview of the dataset. Below is a box-plot of each predictor against the target variable (tumor type). The log values of the predictors are used instead of the actual values, which makes the plot easier to read.

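
The box-plots described above could be produced along the following lines. This is a sketch in base R, not the original author's code: it assumes a data frame with a factor column named type, an optional id column to skip, and numeric predictors. The function names log_predictors and plot_log_boxplots are hypothetical. Note that a predictor value of zero becomes -Inf after the log transform; boxplot() leaves such values out of the axis range.

```r
# Log-transform every numeric predictor, keeping the target column as-is.
# Assumption: `target` names the class column and `exclude` lists
# non-predictor columns such as a patient id.
log_predictors <- function(data, target = "type", exclude = "id") {
  is_pred <- vapply(data, is.numeric, logical(1)) &
    !(names(data) %in% c(target, exclude))
  logged <- data[, is_pred, drop = FALSE]
  logged[] <- lapply(logged, log)
  cbind(data[target], logged)
}

# Draw one box-plot per log-transformed predictor, split by tumor type.
plot_log_boxplots <- function(data, target = "type", exclude = "id") {
  logged <- log_predictors(data, target, exclude)
  predictors <- setdiff(names(logged), target)
  n <- length(predictors)
  # Lay the panels out in a grid of up to 5 columns; restore settings on exit.
  old <- par(mfrow = c(ceiling(n / 5), min(n, 5)), mar = c(2, 2, 2, 1))
  on.exit(par(old))
  for (p in predictors) {
    boxplot(logged[[p]] ~ logged[[target]], main = p, xlab = "", ylab = "")
  }
  invisible(logged)
}
```

Calling plot_log_boxplots(wdbc) on the full data frame draws one panel per predictor. Predictors whose boxes for B and M barely overlap are the ones most likely to help a classifier separate the two classes.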