What is Data Mining?
Discovery of useful, possibly unexpected, patterns in data.
What is a Pattern?
Statistical Patterns & Machine Learning
Recommended reading: Ten Interesting Classic "Big Data" Data Mining Cases
Data Mining Process
(Describe the steps involved in data mining when viewed as a process of knowledge discovery.)
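- Data Cleaning (remove noise and inconsistent data)
- Data Integration (combine multiple data sources)
- Data Selection (retrieve the data relevant to the analysis task from the database)
- Data Transformation (transform or consolidate the data into a form suitable for mining)
- Data Mining Method (apply various methods to extract data patterns)
- Pattern Assessment (use some measure to identify the patterns that are truly valuable)
- Knowledge Representation (use visualization and knowledge representation techniques to present the mined knowledge to the user)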
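The seven steps above can be sketched as a small pipeline. This is only an illustrative sketch: the two record sources, the function names, the pair-counting "mining method", and the 0.5 support threshold are all assumptions made for the example and are not taken from the notes.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical sources of purchase records (made-up data for illustration).
store_a = [
    {"customer": "c1", "items": "bread, milk"},
    {"customer": "c2", "items": "bread, butter, milk"},
    {"customer": "c3", "items": None},             # corrupted record
]
store_b = [
    {"customer": "c4", "items": "Bread , Milk "},  # inconsistent formatting
    {"customer": "c5", "items": "butter"},
]

def clean(records):
    """Data cleaning: drop records whose item list is missing."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the field relevant to the analysis task."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Data transformation: normalize each string into a set of item names."""
    return [{item.strip().lower() for item in s.split(",")} for s in item_strings]

def mine(baskets):
    """Mining method (assumed here): count how often each item pair co-occurs."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def assess(pair_counts, n_baskets, min_support=0.5):
    """Pattern assessment: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }

def represent(patterns):
    """Knowledge representation: render the patterns as readable lines."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
baskets = transform(select(records))
patterns = assess(mine(baskets), len(baskets))
report = represent(patterns)
print("\n".join(report))
```

Each function corresponds to one bullet, so a step can be swapped out (for example, a different mining method) without touching the others.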
What is Data?
Definition
“Data are pieces of information that represent the qualitative or quantitative attributes of a variable or set of variables.
Data are often viewed as the lowest level of abstraction from which information and knowledge are derived.”
Data Types
- Continuous
- Discrete
- Symbolic
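The notes do not define the three types, so the following sketch assumes the common reading: continuous values are real-valued measurements, discrete values are whole-number counts, and symbolic values are category labels. The helper `attribute_type` and the sample columns are hypothetical, and the rule it applies is a rough heuristic rather than a formal definition.

```python
def is_number(value):
    """True for ints and floats, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def attribute_type(values):
    """Guess whether a column is 'continuous', 'discrete', or 'symbolic'."""
    if not all(is_number(v) for v in values):
        return "symbolic"      # labels such as names or categories
    if all(float(v).is_integer() for v in values):
        return "discrete"      # whole-number counts
    return "continuous"        # real-valued measurements

# Made-up example columns.
columns = {
    "temperature_c": [21.5, 19.0, 23.7],
    "num_children": [0, 2, 1],
    "eye_color": ["brown", "blue", "green"],
}
types = {name: attribute_type(vals) for name, vals in columns.items()}
print(types)
```

A heuristic like this can misjudge a column (a continuous measurement that happens to contain only whole numbers would be labelled discrete), so in practice the type is usually declared from domain knowledge.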
Storage
- Physical
- Logical
Major Issues
- Transformation
- Errors and Corruption
What is Big Data?
“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” — Gartner
“Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” —
McKinsey & Company