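Overview
Grading
- 10% attendance
- 50% homework (online hands-on assignments, 5 sessions)
- 40% assessment: final project (Alibaba Tianchi big data competition, https://tianchi.aliyun.com/competition/gameList/activeList)
Submission: English report (0.1), code (0.3), dataset; presentation: 0-5 points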
Five topics
Understanding your data
Description and preparation (data preprocessing)
Typical data mining tasks
- Pattern mining
- Classification
- Cluster analysis
- Outlier detection
Introduction
Why
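- With big data, the ways data is generated, stored, and processed are increasingly diverse.
- Data mining turns data into knowledge; the value comes from the data.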
Evolution
- Before 1600: empirical science
- 1600-1950s: theoretical science
- 1950s-1990s: computational science
- 1990-now: data science
What Is Data Mining?
- Data mining (knowledge discovery from data): extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
- Alternative names: knowledge discovery (mining) in databases (KDD)
KDD Process
Data Cleaning >>> Data Warehouse >>> Task-relevant Data >>> Data Mining >>> Pattern Evaluation >>> Knowledge
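The stages of the KDD process can be sketched as a chain of small functions. This is only a minimal illustration, not a reference implementation: the transactions, the function names (clean, select, mine_pairs, evaluate), and the 0.5 support threshold are all assumptions made up for this example, and counting co-occurring item pairs stands in for whatever mining step a real task would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transactions; None and [] stand for missing records.
raw = [
    ["milk", "bread", "milk"],
    None,
    ["bread", "butter"],
    ["milk", "bread", "butter"],
    [],
    ["milk", "bread"],
]

def clean(records):
    """Data cleaning: drop missing or empty records and duplicate items."""
    return [sorted(set(r)) for r in records if r]

def select(records, relevant):
    """Selection: keep only the task-relevant items in each record."""
    return [[item for item in r if item in relevant] for r in records]

def mine_pairs(records):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for r in records:
        counts.update(combinations(r, 2))
    return counts

def evaluate(counts, n_records, min_support):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {pair: c / n_records
            for pair, c in counts.items()
            if c / n_records >= min_support}

warehouse = clean(raw)  # cleaned records, standing in for the data warehouse
task_data = select(warehouse, {"milk", "bread", "butter"})
knowledge = evaluate(mine_pairs(task_data), len(task_data), min_support=0.5)
print(knowledge)
```

Each function maps to one arrow in the chain, so a stage can be swapped (for example, a different mining step) without touching the others.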
A Web Mining Framework
- Web mining usually involves
  - Data cleaning
  - Data integration from multiple sources
  - Warehousing the data
  - Data cube construction
  - Data selection for data mining
  - Data mining
  - Presentation of the mining results
  - Patterns and knowledge to be used or stored in a knowledge base
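As a rough illustration of the data integration and data cube construction steps listed above, the sketch below merges page-view logs from two sources and pre-aggregates visits over every combination of page and day. The log rows, the build_cube function, and the use of "*" to mean "all values" are assumptions for this example only.

```python
from collections import defaultdict
from itertools import product

# Hypothetical page-view logs from two sources: (page, day, visits).
source_a = [("home", "mon", 3), ("home", "tue", 2)]
source_b = [("docs", "mon", 1), ("home", "mon", 4)]

def build_cube(rows):
    """Sum visits for each (page, day) cell, with '*' meaning 'all values'."""
    cube = defaultdict(int)
    for page, day, visits in rows:
        # Each row contributes to its own cell and to every roll-up of it.
        for p, d in product((page, "*"), (day, "*")):
            cube[(p, d)] += visits
    return dict(cube)

# Integration here is simply concatenating the two sources.
cube = build_cube(source_a + source_b)
print(cube[("home", "mon")])  # 7: visits to "home" on Monday
print(cube[("home", "*")])    # 9: visits to "home" on any day
print(cube[("*", "mon")])     # 8: visits to any page on Monday
print(cube[("*", "*")])       # 10: all visits
```

Precomputing the roll-ups trades memory for fast lookups, which is the usual motivation for building a cube before the selection and mining steps.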