Python Machine Learning Chapter 1 Giving Computers the Ability to Learn from Data: Study Notes

These are my notes from studying Sebastian Raschka's "Python Machine Learning", kept here for easy reference.

 Chapter 1 Giving Computers the Ability to Learn from Data

This chapter covers the following topics:

1. the general concepts of machine learning

2. the three types of learning and basic terminology

3. the building blocks for successfully designing machine learning systems

4. installing and setting up Python for data analysis and machine learning

 

2. types of machine learning

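2.1 Supervised learning

The main goal of supervised learning is to learn a model from labeled training data and then use it to make predictions. "Supervised" means that the desired output of each sample is already known. In linear regression y = ax + b, for example, the y that corresponds to each x in the sample data is known.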

                                 

classification for predicting class labels

regression for predicting continuous outcomes
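
To make the y = ax + b example concrete, here is a minimal sketch of fitting a line to labeled samples with ordinary least squares. The function name `fit_line` and the toy data are made up for illustration, and only the standard library is used; the book itself works with NumPy and scikit-learn.

```python
def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Labeled training data: every x comes with a known y (here y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
prediction = a * 5.0 + b  # predict y for an unseen x
print(a, b, prediction)
```

Because the toy labels lie exactly on a line, the fitted parameters recover the slope and intercept that generated them; with noisy labels the fit would only approximate them.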

 

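2.2 Unsupervised learning

In supervised learning, the correct answers are known before the model is trained. In reinforcement learning, a reward is defined to measure the agent's actions. In unsupervised learning, we deal with unlabeled data and explore its structure by extracting meaningful information.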

Clustering is an exploratory data analysis technique that allows us to organize a pile of information into meaningful subgroups (clusters) without having any prior knowledge of their group memberships.
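
As one possible illustration of clustering, the sketch below implements a bare-bones k-means in plain Python. k-means is my choice of algorithm here, not something this section prescribes, and the function name `kmeans`, the toy points, and the hand-picked initial centroids are all assumptions for the example.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Very small k-means: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        centroids = new_centroids
    return labels, centroids


# Unlabeled data: two visually separate groups, no group membership given.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

The algorithm never sees a label; the subgroups come only from distances between points. Real implementations usually pick the initial centroids randomly and run several restarts.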

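Dimensionality reduction: high-dimensional data takes up storage space and increases computational cost. Unsupervised dimensionality reduction can remove noise from the data while retaining the most relevant information in fewer dimensions. It is also useful for data visualization: a high-dimensional feature set can be projected down to two or three dimensions and shown in a 2D or 3D plot, for example with the t-SNE algorithm. I will write a separate note on dimensionality reduction later.

2.3 Reinforcement learning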
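
A rough sketch of the idea, using principal component analysis (PCA) rather than t-SNE because it fits in a few lines of plain Python: the code finds the direction of greatest variance by power iteration and projects 2D points onto it, giving a 1D representation. The function name and the toy data are invented for the example.

```python
import math


def first_principal_component(data, n_iter=100):
    """Return the top principal direction and the 1D projection of the data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration; a fixed start vector is an assumption for this sketch,
    # a robust version would start from a random vector.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected


# Two strongly correlated features: most of the information lies along one direction.
data = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
component, projected = first_principal_component(data)
print(component, projected)
```

Each sample is now described by one number instead of two, which is the same reduction that makes 2D or 3D plots of higher-dimensional data possible.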

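The goal is to develop a system that improves its performance based on interactions with the environment. Because the information about the current state of the environment typically includes a reward signal, reinforcement learning can be regarded as a field related to supervised learning. However, the feedback in reinforcement learning is not the correct class label or value; it is a measure of how well the action was measured by a reward function. Through the interaction with the environment, an agent can then use reinforcement learning to learn a series of actions that maximizes this reward via an exploratory trial-and-error approach or deliberative planning.

A chess engine is a typical example of reinforcement learning. The agent decides on a series of moves depending on the state of the board (the environment), and the reward is winning or losing the game.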
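
Chess is too large for a short example, so the sketch below illustrates trial-and-error learning from a reward signal on a much simpler problem: a two-armed bandit with an epsilon-greedy agent. The bandit setting, the function name `run_bandit`, and the reward probabilities are all assumptions chosen for illustration.

```python
import random


def run_bandit(reward_probs, n_steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # how often each action was taken
    values = [0.0] * n_arms  # running estimate of each action's mean reward
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        # The environment answers with a reward, not with a correct label.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values


counts, values = run_bandit([0.2, 0.8])
print(counts, values)
```

The agent is never told which action is correct. It only observes rewards, and its value estimates tend to steer it toward the action that pays off more often.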

                                                         

 

3. a roadmap for building machine learning systems

The diagram below shows a typical workflow for using machine learning in predictive modeling.

 

 

4. Training and selecting a predictive model

Question: how do we know which model performs well on the final test dataset and real-world data if we don't use this test set for the model selection but keep it for the final model evaluation?

Use cross-validation to divide the training data into training and validation subsets in order to estimate the generalization performance of the model. Hyperparameter optimization techniques are frequently used as well to fine-tune the performance of the model.
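
The splitting step of k-fold cross-validation can be sketched as follows. The helper name `k_fold_indices` is hypothetical, and the sketch does not shuffle the samples, which a real workflow usually would; scikit-learn provides ready-made utilities for the same job.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k (train, validation) pairs without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


folds = k_fold_indices(n_samples=10, k=5)
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

Each sample lands in a validation subset exactly once, so averaging the validation scores over the folds gives an estimate of generalization performance without touching the test set.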

Evaluating models and predicting unseen data instances:

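After a model has been selected using the training data, the test data is used to evaluate how well it performs on unseen data, that is, to estimate its generalization error.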

 

5. Installing Python packages

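pip install somepackage    # install a new package; some packages cannot be installed via pip and must be downloaded from their website

pip install somepackage --upgrade   # upgrade an installed package

Alternatively, install Anaconda, which ships with Spyder and Jupyter; both have a clean, easy-to-use interface. Packages can then be installed with the following commands:

conda install somepackage   # install a package

conda update somepackage  # update a package

The main packages used in this book are NumPy, SciPy, scikit-learn, matplotlib, and pandas.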
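
After installing, a quick way to see which of these packages can be imported in the current environment is a check like the one below. This is a small sketch using only the standard library; the helper name `check_packages` is made up, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the packages listed above (scikit-learn imports as "sklearn").
packages = ["numpy", "scipy", "sklearn", "matplotlib", "pandas"]


def check_packages(names):
    """Map each import name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(packages)
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Any package reported as missing can then be installed with the pip or conda commands above.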

 

6. Summary

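Supervised learning is used mainly for two kinds of tasks: classification and regression. A classification model assigns objects to known categories, whereas a regression model predicts a continuous target variable. Unsupervised learning discovers structure in unlabeled data and is also useful for feature preprocessing.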

 

 
