Applying Darwinian Evolution to Feature Selection with Kydavra GeneticAlgorithmSelector


Math almost always has a good answer to questions related to feature selection. However, sometimes good old brute-force algorithms can bring a better and more practical answer into the game.


Genetic algorithms are a family of algorithms inspired by biological evolution. They basically run the cycle cross, mutate, try, developing the best combination of states according to a scoring metric. So, let’s get to the code.

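To make the cross/mutate/try cycle concrete, here is a minimal toy sketch of a genetic search over boolean feature masks. This is not Kydavra’s implementation — the function name, the toy scoring function, and all defaults are invented for illustration:

```python
import random

def genetic_feature_search(score_fn, n_features, nb_children=4, nb_generation=50, seed=0):
    """Tiny cross/mutate/try loop over boolean feature masks (1 = keep the feature)."""
    rng = random.Random(seed)
    # Start from random feature masks.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(nb_children * 2)]
    for _ in range(nb_generation):
        children = []
        for _ in range(len(population)):
            a, b = rng.sample(population, 2)     # pick two parents
            cut = rng.randrange(1, n_features)   # cross: one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n_features)     # mutate: flip one random gene
            child[flip] = 1 - child[flip]
            if any(child):                       # keep at least one feature
                children.append(child)
        # try: score everyone and keep the nb_children best for the next generation
        population = sorted(population + children, key=score_fn, reverse=True)[:nb_children]
    return population[0]

# Toy scoring function: features 0 and 2 are useful, every extra feature costs a bit.
best_mask = genetic_feature_search(lambda m: m[0] + m[2] - 0.1 * sum(m), n_features=6)
```

In a real selector, `score_fn` would train a model on the masked columns and return its metric score; here a hand-made function stands in so the loop itself is easy to follow.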

Using GeneticAlgorithmSelector from the Kydavra library

To install kydavra, just run the following command in the terminal:


pip install kydavra

Now you can import the selector and apply it to your dataset as follows:


from kydavra import GeneticAlgorithmSelector

selector = GeneticAlgorithmSelector()
new_columns = selector.select(model, df, 'target')

As with every Kydavra selector, that’s all. Now let’s try it on the heart disease dataset.


import pandas as pd

df = pd.read_csv('cleaned.csv')

I highly recommend shuffling your dataset before applying the selector, because it evaluates features with a plain metric score (cross_val_score isn’t implemented in this selector yet).


df = df.sample(frac=1).reset_index(drop=True)

Now we can apply our selector. Note that it has some parameters:


  • nb_children (int, default = 4): the number of best children that the algorithm will keep for the next generation.

  • nb_generation (int, default = 200): the number of generations that will be created; technically speaking, the number of iterations.

  • scoring_metric (sklearn scoring metric, default = accuracy_score): the metric used to select the best feature combination.

  • max (boolean, default = True): if set to True, the algorithm will select the combinations with the highest score; if False, the lowest-scoring ones will be chosen.

But for now, we will use the default settings except for scoring_metric: since this is a disease-diagnosis problem, it is better to use precision instead of accuracy.

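A quick toy illustration of why precision matters here (the numbers are invented for the example): a model that raises many false alarms can still post a passable accuracy while its precision collapses.

```python
from sklearn.metrics import accuracy_score, precision_score

# 10 patients, 2 actually sick; the model raises 4 alarms but only 1 is right.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)    # 6 correct out of 10 -> 0.6
prec = precision_score(y_true, y_pred)  # 1 true alarm out of 4 -> 0.25
```

For diagnosis, optimizing precision pushes the selector toward feature sets whose positive predictions can actually be trusted.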

from kydavra import GeneticAlgorithmSelector
from sklearn.metrics import precision_score
from sklearn.ensemble import RandomForestClassifier

selector = GeneticAlgorithmSelector(scoring_metric=precision_score)
model = RandomForestClassifier()

So now let’s find the best features. GAS (short for GeneticAlgorithmSelector) needs an sklearn model to train during the feature-selection process, the data frame itself, and of course the name of the target column:


selected_cols = selector.select(model, df, 'target')

Now let’s evaluate the result. Before feature selection, the precision score of the Random Forest was 0.805. GAS chose the following features:


['age', 'sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']

This gave a precision score of 0.823, which is a good result, knowing that in the majority of cases it is very hard to improve the scoring metric at all.

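The before/after comparison above can be reproduced in spirit on synthetic data. Everything below is a stand-in — the generated columns and the hand-picked subset replace the heart-disease CSV and the GAS-selected features:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart-disease dataset.
X, y = make_classification(n_samples=400, n_features=10, n_informative=4,
                           random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])
X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=0)

def precision_with(cols):
    """Train a fresh Random Forest on a column subset and score its precision."""
    model = RandomForestClassifier(random_state=0).fit(X_train[cols], y_train)
    return precision_score(y_test, model.predict(X_test[cols]))

before = precision_with(list(df.columns))         # all features
after = precision_with(["f0", "f1", "f2", "f3"])  # stand-in for a selected subset
```

Comparing `before` and `after` on a held-out split, as here, is how the 0.805 vs. 0.823 numbers in the article would be obtained.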


If you want to find out more about genetic algorithms, there are some useful links at the bottom of the article. If you tried Kydavra and have issues or feedback, please contact me on Medium or fill in this form.


Made with ❤ by Sigmoid


Useful links:


Translated from: https://towardsdatascience.com/applying-darwinian-evolution-to-feature-selection-with-kydavra-geneticalgorithmselector-378662fd1f5b
