Inside AI
During the survey and data-collection step, we do not know which features/attributes strongly influence the output and which have little effect. For this reason, we collect or measure as many plausible attributes as possible at this stage.
As the number of features in the training dataset increases, the machine learning model becomes more complex and more computationally expensive.
The aim is to train a machine learning model with the minimum required features that can still predict data points with acceptable accuracy. We should neither oversimplify the model and lose significant information by pruning important features, nor keep a complex model burdened with redundant or less important ones.
The scikit-learn library provides several methods to simplify the model by reducing the dimensionality of the training dataset, with minimal impact on the model's prediction accuracy.
In this article, I will discuss recursive feature elimination with cross-validated selection to identify the optimal independent variables and reduce the dimensionality of the training dataset.
We will use the breast cancer dataset from scikit-learn to explain Recursive Feature Elimination with Cross-Validation (RFECV).
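Before diving in, here is a minimal sketch of how RFECV can be applied to the breast cancer dataset. The choice of `LogisticRegression` as the base estimator and 5-fold stratified cross-validation are assumptions for illustration; the article may use a different estimator or scoring metric.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Load the breast cancer dataset (30 features, binary target).
X, y = load_breast_cancer(return_X_y=True)

# Base estimator is an assumption for this sketch; any estimator
# exposing coef_ or feature_importances_ works with RFECV.
estimator = LogisticRegression(max_iter=5000)

# Recursively eliminate one feature per step; cross-validation picks
# the feature count that maximizes mean accuracy across folds.
selector = RFECV(
    estimator,
    step=1,
    cv=StratifiedKFold(n_splits=5),
    scoring="accuracy",
)
selector.fit(X, y)

print("Optimal number of features:", selector.n_features_)
print("Selected feature mask:", selector.support_)
```

After fitting, `selector.support_` is a boolean mask over the original columns, and `selector.transform(X)` yields the reduced training dataset containing only the retained features.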