Applying Darwinian Evolution to Feature Selection with Kydavra GeneticAlgorithmSelector


Math almost always has a good answer to questions related to feature selection. However, sometimes good old brute-force algorithms can bring a better and more practical answer into the game.


Genetic algorithms are a family of algorithms inspired by biological evolution. They basically run the cycle cross, mutate, try, developing the best combination of states according to a scoring metric. So, let’s get to the code.

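To make the cross/mutate/try cycle concrete, here is a minimal toy sketch of a genetic search over boolean feature masks. This is not Kydavra’s implementation — the function name, the toy scoring function, and all defaults are invented for illustration:

```python
import random

def genetic_feature_search(score_fn, n_features, nb_children=4, nb_generation=50, seed=0):
    """Tiny cross/mutate/try loop over boolean feature masks (1 = keep the feature)."""
    rng = random.Random(seed)
    # Start from random feature masks.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(nb_children * 2)]
    for _ in range(nb_generation):
        children = []
        for _ in range(len(population)):
            a, b = rng.sample(population, 2)     # pick two parents
            cut = rng.randrange(1, n_features)   # cross: one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n_features)     # mutate: flip one random gene
            child[flip] = 1 - child[flip]
            if any(child):                       # keep at least one feature
                children.append(child)
        # try: score everyone and keep the nb_children best for the next generation
        population = sorted(population + children, key=score_fn, reverse=True)[:nb_children]
    return population[0]

# Toy scoring function: features 0 and 2 are useful, every extra feature costs a bit.
best_mask = genetic_feature_search(lambda m: m[0] + m[2] - 0.1 * sum(m), n_features=6)
```

In a real selector, `score_fn` would train a model on the masked columns and return its metric score; here a hand-made function stands in so the loop itself is easy to follow.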

Using GeneticAlgorithmSelector from the Kydavra library

To install kydavra, just run the following command in the terminal:


pip install kydavra

Now you can import the selector and apply it to your dataset as follows:


from kydavra import GeneticAlgorithmSelector

selector = GeneticAlgorithmSelector()
new_columns = selector.select(model, df, 'target')

As with every Kydavra selector, that’s all. Now let’s try it on the heart disease dataset.


import pandas as pd

df = pd.read_csv('cleaned.csv')

I highly recommend shuffling your dataset before applying the selector, because it evaluates features with a plain metric score (cross_val_score isn’t implemented in this selector yet).


df = df.sample(frac=1).reset_index(drop=True)

Now we can apply our selector. Note that it has some parameters:


  • nb_children (int, default = 4): the number of best children that the algorithm will keep for the next generation.

  • nb_generation (int, default = 200): the number of generations that will be created; technically speaking, the number of iterations.

  • scoring_metric (sklearn scoring metric, default = accuracy_score): the metric used to select the best feature combination.

  • max (boolean, default = True): if set to True, the algorithm will select the combinations with the highest score; if False, the lowest-scoring ones will be chosen.

But for now, we will use the default settings except for scoring_metric: since this is a disease-diagnosis problem, it is better to use precision instead of accuracy.

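A quick toy illustration of why precision matters here (the numbers are invented for the example): a model that raises many false alarms can still post a passable accuracy while its precision collapses.

```python
from sklearn.metrics import accuracy_score, precision_score

# 10 patients, 2 actually sick; the model raises 4 alarms but only 1 is right.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)    # 6 correct out of 10 -> 0.6
prec = precision_score(y_true, y_pred)  # 1 true alarm out of 4 -> 0.25
```

For diagnosis, optimizing precision pushes the selector toward feature sets whose positive predictions can actually be trusted.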

from kydavra import GeneticAlgorithmSelector
from sklearn.metrics import precision_score
from sklearn.ensemble import RandomForestClassifier

selector = GeneticAlgorithmSelector(scoring_metric=precision_score)
model = RandomForestClassifier()

So now let’s find the best features. GAS (short for GeneticAlgorithmSelector) needs an sklearn model to train during the feature-selection process, the data frame itself, and of course the name of the target column:


selected_cols = selector.select(model, df, 'target')

Now let’s evaluate the result. Before feature selection, the precision score of the Random Forest was 0.805. GAS chose the following features:


['age', 'sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']

This gave a precision score of 0.823, which is a good result, knowing that in the majority of cases it is very hard to improve the scoring metric at all.

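The before/after comparison above can be reproduced in spirit on synthetic data. Everything below is a stand-in — the generated columns and the hand-picked subset replace the heart-disease CSV and the GAS-selected features:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart-disease dataset.
X, y = make_classification(n_samples=400, n_features=10, n_informative=4,
                           random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])
X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=0)

def precision_with(cols):
    """Train a fresh Random Forest on a column subset and score its precision."""
    model = RandomForestClassifier(random_state=0).fit(X_train[cols], y_train)
    return precision_score(y_test, model.predict(X_test[cols]))

before = precision_with(list(df.columns))         # all features
after = precision_with(["f0", "f1", "f2", "f3"])  # stand-in for a selected subset
```

Comparing `before` and `after` on a held-out split, as here, is how the 0.805 vs. 0.823 numbers in the article would be obtained.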


If you want to find out more about genetic algorithms, there are some useful links at the bottom of the article. If you tried Kydavra and have issues or feedback, please contact me on Medium or fill in this form.


Made with ❤ by Sigmoid


Useful links:


Translated from: https://towardsdatascience.com/applying-darwinian-evolution-to-feature-selection-with-kydavra-geneticalgorithmselector-378662fd1f5b
