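Starting today, I plan to write up my reading notes on SAS DM at a pace of 2-3 posts a day, aiming to finish in about three months, before the short semester begins. Today I tinkered with a piece of code that sums macro variables and realized how much there is still to learn. I want to use these notes to systematically consolidate my data analysis and processing skills. The notes focus mainly on analyzing code and assume the reader already has basic knowledge of SAS BASE, MACRO, and SQL.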
Book studied: Data Preparation for Data Mining Using SAS
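The e-book is easy to download online. I have found that Baidu Wangpan and Sina iAsk have a rich collection of electronic resources, and most commonly used e-textbooks can be found there. Thanks also to Wang in New Zealand, who provided the code and data for this book.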
Overall assessment of the book:
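I have probably downloaded several gigabytes of SAS books onto my hard drive. Most of them cover statistical analysis and are much alike. This book is the most systematic and valuable one I have read on preparing data before analysis. Some less rigorous researchers tend to neglect this work, because many of their results are concocted just to get papers published, purely for their own amusement. Industry is different: once analysis results are biased, they lead to wrong decisions, and the cost is real money and a hard-won reputation, so industry places very heavy emphasis on data preparation before analysis. In short, this work matters greatly to both researchers and industry practitioners.
The book makes heavy use of BASE, MACRO, and SQL, which closely matches the tools most SAS analysts use day to day. Anyone who can read this code and apply it fluently can fairly be said to have mastered SAS BASE. Readers who are not yet comfortable with BASE, MACRO, and SQL will find it somewhat difficult. They can start with another book, SAS.Publishing.Data.Preparation.for.Analytics.Using.SAS, whose content is relatively simpler, with far less coding and far less use of macros. It is also a practice-oriented book. Having mastered both books, you should have the basic data-processing skills that run through the whole process of statistical analysis and data mining.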
Table of contents
1. Introduction
* setting the context of data mining
2. Tasks and Data Flow
* describes what data mining can do and where data preparation fits in
3. Review of Data Mining Modeling Techniques
* an overview of data mining techniques
4. SAS Macros: A Quick Start
* just in case you haven't worked with SAS macros
5. Data Acquisition and Integration
* where you get your data from and how it's pulled together
6. Integrity Checks
* how to make sure the data is correct and even what "correct" means
7. Exploratory Data Analysis
* get to know your data
8. Sampling and Partition
* dealing with large data sets as well as getting ready to validate the models you build
9. Data Transformations
* rarely is your source data in the form most effective for modeling - this chapter describes what can be done to produce the most effective models
10. Binning and Reduction of Cardinality
* make your variables less complex and often times, more presentable and understandable
11. Treatment of Missing Values
* you will have missing values in your data - here are several approaches for dealing with them
12. Predictive Power and Variable Reduction I
* introduces the concept of identifying usefulness of input variables and reducing the required number of variables
13. Analysis of Nominal and Ordinal Variables
* how to evaluate relationships with discrete variables
14. Analysis of Continuous Variables
* how to evaluate relationships with continuous variables
15. Principal Component Analysis
* how to use PCA for variable reduction during data preparation
16. Factor Analysis
* how to use Factor Analysis for variable reduction during data preparation
17. Predictive Power and Variable Reduction II
* defines methods of simplifying and reducing input variables with respect to the target variable
18. Putting It All Together
* a case study showing the application of all these techniques for data preparation in a realistic example
Appendix. Listing of SAS Macros