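All About Features
Feature representation, construction, selection, extraction, engineering, learning... so easy to mix up
╮(╯▽╰)╭ Time to sort them out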
feature representation
When representing images, the feature values might correspond to the pixels of an image, while when representing texts the features might be the frequencies of occurrence of textual terms.
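Put plainly: whether the raw data are numbers or strings, they are uniformly converted into numerical features that an algorithm can learn from.
For example, turning a 10×10-pixel color image into three 10×10 matrices (one per RGB channel) counts as feature representation.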
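A minimal sketch of both cases in the quoted definition, assuming NumPy is available. A randomly generated 10×10 RGB array stands in for a real image, and the sentence and four-word vocabulary for the text case are hypothetical. In each case the raw input ends up as a plain numeric vector that a learning algorithm can consume.

```python
import numpy as np
from collections import Counter

# Hypothetical 10x10 RGB image: height x width x channels, values 0-255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(10, 10, 3), dtype=np.uint8)

# Image representation: three 10x10 matrices, one per channel (R, G, B).
channels = image.transpose(2, 0, 1).astype(np.float32)  # shape (3, 10, 10)

# Flatten into a single feature vector of length 3 * 10 * 10 = 300.
image_features = channels.reshape(-1)

# Text representation: term frequencies over a fixed (hypothetical) vocabulary.
text = "the cat sat on the mat"
vocabulary = ["cat", "dog", "mat", "the"]
counts = Counter(text.split())
text_features = np.array([counts[word] for word in vocabulary], dtype=np.float32)

print(channels.shape)        # (3, 10, 10)
print(image_features.shape)  # (300,)
print(text_features)         # [1. 0. 1. 2.]
```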
feature construction
Higher-level features can be obtained from already available features and added to the feature vector; for example, for the study of diseases the feature ‘Age’ is useful and is defined as Age = ‘Year of death’ minus ‘Year of birth’.
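Construction means combining existing features into new ones. In the English example above, age at death (new) is obtained by subtracting the year of birth (original) from the year of death (original).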
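The 'Age' construction can be sketched in plain Python as below. The two records and the helper name `add_age` are hypothetical illustrations, not part of any library; the point is only that the new feature is derived from two features that already exist.

```python
# Hypothetical records, each with two original features.
records = [
    {"year_of_birth": 1920, "year_of_death": 1995},
    {"year_of_birth": 1948, "year_of_death": 2010},
]

def add_age(record):
    """Return a copy of the record with the constructed feature 'age'."""
    new_record = dict(record)  # leave the original record untouched
    new_record["age"] = record["year_of_death"] - record["year_of_birth"]
    return new_record

constructed = [add_age(r) for r in records]
print([r["age"] for r in constructed])  # [75, 62]
```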
feature selection & feature extraction
A preliminary step in many applications of machine learning and pattern recognition consists of selecting a subset of features, or constructing a new and reduced set of features to facilitate learning, and to improve generalization and interpretability.
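Both selection and extraction reduce dimensionality. The main difference is that selection picks a subset of the features and leaves the original features unchanged, whereas extraction condenses them into newly created features. For example, the input to the fully connected layers of a convolutional neural network (often called high-level features) can be regarded as a product of feature extraction.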
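To make the distinction concrete, here is a NumPy sketch on hypothetical random data. Selection is illustrated with a simple variance ranking that keeps 3 of the original columns untouched; extraction is illustrated with PCA (computed via SVD), which builds 3 new columns as linear combinations of all the originals. PCA is swapped in here as a simple stand-in for extraction, not the CNN example from the text, and the variance criterion is only one of many possible selection rules.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: 100 samples, 5 original features; feature 3 is constant.
X = rng.normal(size=(100, 5))
X[:, 3] = 1.0
k = 3

# Feature selection: keep k ORIGINAL columns unchanged.
# Illustrative criterion: the k columns with the highest variance.
variances = X.var(axis=0)
selected_idx = np.sort(np.argsort(variances)[::-1][:k])
X_selected = X[:, selected_idx]

# Feature extraction: create k NEW features mixing all columns.
# PCA via SVD: project the centered data onto the top-k principal axes.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_extracted = X_centered @ Vt[:k].T

print(selected_idx)       # indices of the kept original features
print(X_selected.shape)   # (100, 3)
print(X_extracted.shape)  # (100, 3)
```

Every column of `X_selected` is still one of the original features, while each column of `X_extracted` mixes all five, which is why extracted features are usually harder to interpret.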
feature engineering
Extracting or selecting features is a combination of art and science; developing systems to do so is known as feature engineering. It requires the experimentation of multiple possibilities and the combination of automated techniques with the intuition and knowledge of the domain expert.
Feature engineering encompasses the representation, construction, and selection or extraction steps.
The process of feature engineering:
- Brainstorming or testing features;
- Deciding what features to create;
- Creating features;
- Checking how the features work with your model;
- Improving your features if needed;
- Going back to brainstorming/creating more features until the work is done.
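The loop above can be sketched in miniature as follows. This is an illustrative assumption rather than a prescribed workflow: the synthetic target depends on the product `x1 * x2`, the three candidate feature sets stand in for the brainstorming step, and ordinary least squares scored by holdout mean squared error stands in for "checking how the features work with your model". The function name `holdout_mse` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: the target depends on x1 * x2, which is not a raw feature.
n = 200
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = 3.0 * x1 * x2 + 0.1 * rng.normal(size=n)

def holdout_mse(features, y, split=150):
    """Fit least squares on the first `split` rows, return MSE on the rest."""
    X = np.column_stack([np.ones(len(y))] + features)  # add an intercept column
    coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    residual = y[split:] - X[split:] @ coef
    return float(np.mean(residual ** 2))

# Brainstormed candidate feature sets (illustrative names).
candidates = {
    "raw": [x1, x2],
    "raw + squares": [x1, x2, x1 ** 2, x2 ** 2],
    "raw + interaction": [x1, x2, x1 * x2],
}

# Create each candidate, check it against the model, keep the best one.
scores = {name: holdout_mse(feats, y) for name, feats in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:>18}: holdout MSE = {score:.4f}")
print("best:", best)
```

Because the interaction term is not among the raw features, only the candidate that constructs it lets the linear model fit the target well; a result like this is what sends you back to brainstorm further features.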
feature learning
Automating the process is feature learning, where a machine not only uses features for learning, but learns the features itself.
Feature engineering carried out without relying on human experience can be regarded as feature learning.
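As one hedged illustration of a machine "learning the features itself", the sketch below trains a tiny linear autoencoder with plain gradient descent in NumPy. The data, layer sizes, learning rate and step count are all assumptions chosen for the example; practical feature learning usually relies on deeper, nonlinear models such as the convolutional networks mentioned earlier. The 2-dimensional code `features` is found by optimization rather than designed by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 8-dimensional points lying near a 2-D subspace.
n, d, k = 500, 8, 2
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
X = latent @ mixing + 0.05 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# Linear autoencoder: encoder W_enc (d x k), decoder W_dec (k x d).
W_enc = 0.1 * rng.normal(size=(d, k))
W_dec = 0.1 * rng.normal(size=(k, d))
lr = 0.05

def reconstruction_mse(X, W_enc, W_dec):
    """Mean squared error between X and its encode-then-decode reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial_loss = reconstruction_mse(X, W_enc, W_dec)
for step in range(2000):
    H = X @ W_enc          # current learned features (codes), shape (n, k)
    err = H @ W_dec - X    # reconstruction error, shape (n, d)
    # Gradients of the mean squared error w.r.t. both weight matrices.
    grad_dec = 2 * H.T @ err / (n * d)
    grad_enc = 2 * X.T @ (err @ W_dec.T) / (n * d)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = reconstruction_mse(X, W_enc, W_dec)
features = X @ W_enc       # learned 2-D features for every sample
print(f"reconstruction MSE: {initial_loss:.4f} -> {final_loss:.4f}")
print(features.shape)      # (500, 2)
```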
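Note
The definitions of the feature operations in this post are based on my own understanding and on translations of Wikipedia. If you see things differently, discussion is welcome!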