Udacity Intro to Machine Learning: Feature Selection

Latest recommended article published on 2024-04-09 16:27:44

张文彬彬

Category: Intro to Machine Learning Notes

Article link: https://blog.csdn.net/u012084802/article/details/80194125

Exercise: A New Enron Feature

poi_flag_email.py

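    # Flag the message as coming from a POI if any sender address is in
    # poi_email_list; the loop stops at the first match.
    if from_emails:
        ctr = 0
        while not from_poi and ctr < len(from_emails):
            if from_emails[ctr] in poi_email_list:
                from_poi = True
            ctr += 1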
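
The same check can also be written without the counter. Below is a minimal sketch, not the course's reference solution: the function name `flag_from_poi` and the example addresses are made up for illustration, and it assumes `from_emails` and `poi_email_list` are plain lists of address strings.

```python
def flag_from_poi(from_emails, poi_email_list):
    """Return True if any sender address belongs to a known POI."""
    # A set makes each membership test O(1) on average.
    poi_addresses = set(poi_email_list)
    # any() stops at the first match, like the while loop with its flag.
    return any(address in poi_addresses for address in from_emails)


# Made-up addresses, for illustration only.
pois = ["poi.one@example.com", "poi.two@example.com"]
print(flag_from_poi(["someone@example.com", "poi.two@example.com"], pois))  # True
print(flag_from_poi([], pois))  # False
```

An empty sender list gives `False`, which matches the `if from_emails:` guard in the loop version.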

Exercise: Visualizing the New Feature

studentCode.py

    ### you fill in this code, so that it returns either
    ###     the fraction of all messages to this person that come from POIs
    ###     or
    ###     the fraction of all messages from this person that are sent to POIs
    ### the same code can be used to compute either quantity

    ### beware of "NaN" when there is no known email address (and so
    ### no filled email features), and integer division!
    ### in case of poi_messages or all_messages having "NaN" value, return 0.
    # Guard against missing values and a zero denominator before dividing.
    if poi_messages != 'NaN' and all_messages not in ('NaN', 0):
        fraction = float(poi_messages) / float(all_messages)
    else:
        fraction = 0.

    return fraction
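
To show how the fraction is used, here is a small self-contained sketch that applies it to two records. The helper name `compute_fraction`, the people, and all numbers are invented for illustration. The four input keys are assumed to follow the Enron dataset's email features, and the two output keys are names chosen for this example.

```python
def compute_fraction(poi_messages, all_messages):
    """Fraction of messages involving a POI, or 0. when data is missing."""
    if poi_messages == "NaN" or all_messages in ("NaN", 0):
        return 0.
    return float(poi_messages) / float(all_messages)


# Hypothetical records; the values are made up.
data_dict = {
    "PERSON A": {"from_poi_to_this_person": 10, "to_messages": 200,
                 "from_this_person_to_poi": 5, "from_messages": 50},
    "PERSON B": {"from_poi_to_this_person": "NaN", "to_messages": "NaN",
                 "from_this_person_to_poi": "NaN", "from_messages": "NaN"},
}

for name, point in data_dict.items():
    # Share of this person's incoming mail that was sent by POIs.
    point["fraction_from_poi"] = compute_fraction(
        point["from_poi_to_this_person"], point["to_messages"])
    # Share of this person's outgoing mail that was addressed to POIs.
    point["fraction_to_poi"] = compute_fraction(
        point["from_this_person_to_poi"], point["from_messages"])
    print(name, point["fraction_from_poi"], point["fraction_to_poi"])
```

Scaling by each person's total mail volume means someone who simply sends a lot of email does not look suspicious just because their raw counts are large.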

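Beware of feature bugs:

Anyone can make mistakes, so be skeptical of your results! Always be wary of 100% accuracy: extraordinary claims require extraordinary evidence. If a feature tracks your labels too closely, it is very likely a bug! If you are sure it is not a bug, then you largely do not need machine learning; you can assign labels using that feature alone.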
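
One rough way to act on this warning is to test how well each feature predicts the label on its own. The sketch below is an illustration, not something from the course: the function name, the 0.99 cutoff, and the toy data are all assumptions. It finds the best accuracy a single threshold on one numeric feature can reach and flags features that score close to 100%.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy reachable by thresholding a single numeric feature."""
    n = len(values)
    if n == 0:
        return 0.0
    best = 0
    for t in sorted(set(values)):
        # Rule: predict the positive class when value >= t.
        correct = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        # The flipped rule (positive when value < t) gets n - correct right.
        best = max(best, correct, n - correct)
    return best / float(n)


# Toy data, made up for illustration.
labels = [0, 0, 1, 1]
features = {
    "leaky": [0, 0, 1, 1],   # copies the label exactly
    "noisy": [3, 1, 2, 0],   # only loosely related to the label
}

# 0.99 is an arbitrary cutoff chosen for this sketch.
suspicious = [name for name, vals in features.items()
              if best_threshold_accuracy(vals, labels) >= 0.99]
print(suspicious)  # ['leaky']
```

A flagged feature is not proof of a bug, only a prompt to check how that feature was built and whether it was derived from the label.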

Removing features:

When would you ignore a feature: