// February 25, 2015
// To be completed later
0. A few interesting articles
The Information Age in Which You Live: Changing the Face of Business
Ilkka Tuomi: Data is More Than Knowledge
Chaim Zins: Conceptual Approaches for Defining Data, Information, and Knowledge
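1. Problem solving
1) What is a problem?
business->CS
How to turn real-world business problems into problems that CS (computer science) can solve
2) Why solve them with CS?
uncertainty->certainty
Management control, decision making
Simon's model
Programmed - CS - e-commerce?
Non-programmed - traditional industry and commerce
Push problems from non-programmed toward programmed
3) IT is more than machines: the higher role of IT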
structure of the info supported problem-solving
more uncertainty->more information
Statistics
Methodologically sound
Desperately trying any remedy (grasping at straws)
big data
3) What business problems are there?
business problem
efficiency
competitive advantage
optimize the business process
better understand the real world
real world
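An ERP system fails whether it is built too well (overfit) or too poorly
Top experts overfit
External data: P&G goes to Baidu rather than to its own marketing department
Query log: brand/product + features
It can also be used to find competitors
4) Three levels
IBM e-commerce lab
M: the relationship between process and data
F: architect
C: coding
M is in the shortest supply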
2. Means of BI
BI can handle large amounts of information to help identify and develop new opportunities.
Making use of new opportunities and implementing an effective strategy can provide a competitive market advantage and long-term stability.[2]
[2] Rud, Olivia (2009). Business Intelligence Success Factors: Tools for Aligning Your Business in the Global Economy. Hoboken, N.J.: Wiley & Sons.
In short:
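Help the enterprise make better use of its existing data and improve the quality of management decisions (better decision-making). *It does not replace decision-making!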
ETL: the E (extract) step is critical and requires domain knowledge:
If you get hold of a patient's data, have you replaced the doctor?
OLAP: simple, general tasks; used for reporting
OLDP: complete, ad-hoc
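As a rough illustration of the simple, general reporting task noted for OLAP, the sketch below rolls a small sales table up along one dimension at a time. The records, the column layout, and the `roll_up` helper are all hypothetical, made up for this example only.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, sales amount)
SALES = [
    ("north", "soap", 120),
    ("north", "shampoo", 80),
    ("south", "soap", 200),
    ("south", "shampoo", 50),
]


def roll_up(rows, key_index):
    """Sum the sales amount (column 2) grouped by the column at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


by_region = roll_up(SALES, 0)   # report by region
by_product = roll_up(SALES, 1)  # report by product
print(by_region)
print(by_product)
```

The same fact rows answer both reports; only the grouping dimension changes, which is why this kind of task is general and easy to standardize.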
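3. KDD Model
1) KDD model vs. quantitative model
Quantitative models can be built without much knowledge, yet they require too much expert knowledge; the objective cannot describe the subjective
Let history tell the future
2) The KDD process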
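Management-oriented problem
Developing an understanding of the application domain: what problem to study
The relevant prior knowledge: given the problem, how to choose methods and techniques
The goals of the end-user: user requirements determine -> the research direction & how results are evaluated
Creating a target data set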
Selecting a data set, or focusing on a subset of variables, or data
samples, on which discovery is to be performed.
(tag mining)
Data cleaning and preprocessing.
Check data completeness and consistency
Missing:
Delete the record
Global constant
Attribute mean
Most probable value
Noise and missing data: binning, regression, clustering
*Removal of noise or outliers.
*Strategies for handling missing data fields.
Accounting for time sequence information and known changes.
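The cleaning options listed above (delete, global constant, attribute mean, most probable value, and binning for noise) can be sketched roughly as follows. This is a minimal sketch: the function names and sample values are hypothetical, and "most probable value" is simplified here to the most frequent observed value, whereas in practice it is often inferred with a model such as regression or a decision tree.

```python
from collections import Counter
from statistics import mean


def fill_missing(values, strategy, constant=0):
    """Handle None entries with one of the listed strategies."""
    present = [v for v in values if v is not None]
    if strategy == "delete":
        return present
    if strategy == "constant":
        fill = constant
    elif strategy == "mean":
        fill = mean(present)
    elif strategy == "most_probable":
        # Simplifying assumption: take the most frequent observed value.
        fill = Counter(present).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort, split into equal-size bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        smoothed.extend([mean(chunk)] * len(chunk))
    return smoothed


ages = [20, None, 30, 30, None, 40]  # hypothetical attribute with gaps
print(fill_missing(ages, "delete"))
print(fill_missing(ages, "constant", constant=-1))
print(fill_missing(ages, "mean"))
print(fill_missing(ages, "most_probable"))
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24], bin_size=3))
```

Deleting records is the simplest option but shrinks the data set; the fill strategies keep every record at the cost of injecting estimated values.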
Data transformation, reduction and projection.
Finding useful features to represent the data depending on the goal
of the task (transform the data as the problem requires).
Reduce the effective number of variables.
Data preprocessing
Aggregation
Sampling
Dimensionality reduction
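The three preprocessing steps just listed can be illustrated with a small standard-library sketch. Everything here is hypothetical: the rows, the helper names, and in particular the variance filter, which stands in for dimensionality reduction only as the simplest possible example (projection methods such as PCA are the more usual choice).

```python
import random
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical records: (store, day, revenue)
ROWS = [("A", 1, 10), ("A", 2, 30), ("B", 1, 20), ("B", 2, 40)]


def aggregate_mean(rows, key_index, value_index):
    """Aggregation: average one column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: mean(vals) for key, vals in groups.items()}


def sample_rows(rows, k, seed=0):
    """Sampling: draw k rows without replacement, reproducibly."""
    return random.Random(seed).sample(rows, k)


def drop_low_variance(matrix, threshold=0.0):
    """Reduction: keep only columns whose variance exceeds threshold."""
    columns = list(zip(*matrix))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    return [[row[i] for i in keep] for row in matrix], keep


print(aggregate_mean(ROWS, 0, 2))
print(sample_rows(ROWS, 2))
reduced, kept = drop_low_variance([[1, 5, 0.2], [1, 7, 0.4], [1, 9, 0.6]])
print(reduced, kept)
```

The first column in the last call is constant, so it carries no information for discovery and is dropped.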
Data that has been processed within a context to give it meaning
OR
Data that has been processed into a form that gives it meaning
data -> (relational table) -> information
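To make the data -> relation -> information chain concrete, here is a small sketch. The raw strings, the `Reading` record, and the 38.0 degree fever threshold are all assumptions chosen for illustration; the point is only that raw values gain meaning once they are placed in a structure and read in a context.

```python
from collections import namedtuple

# Raw data: strings with no meaning of their own (hypothetical values).
RAW = ["p01,38.9", "p02,36.7", "p03,39.4"]

# Relation: each row gets named attributes.
Reading = namedtuple("Reading", ["patient_id", "temperature_c"])

# Context: an assumed threshold for this example only.
FEVER_THRESHOLD_C = 38.0


def to_relation(raw_lines):
    """Parse raw lines into rows of a relation."""
    rows = []
    for line in raw_lines:
        patient_id, temperature = line.split(",")
        rows.append(Reading(patient_id, float(temperature)))
    return rows


def fever_patients(rows, threshold=FEVER_THRESHOLD_C):
    """Information: which patients have a fever under the assumed threshold."""
    return [r.patient_id for r in rows if r.temperature_c >= threshold]


relation = to_relation(RAW)
print(fever_patients(relation))
```

Deciding what to do about the flagged patients still needs domain knowledge, which is the earlier point that holding the data does not replace the doctor.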