## Fundamentals

1. Matrices & Linear Algebra Fundamentals
2. Hash Functions, Binary Tree, O(n)
3. Relational Algebra, DB Basics
4. Inner, Outer, Cross, Theta Join
5. CAP Theorem
6. Tabular Data
7. Entropy
8. Data Frames & Series
9. Sharding
10. OLAP
11. Multidimensional Data Model 多维数据模型
12. ETL
13. Reporting vs. BI vs. Analytics
14. JSON & XML
15. NoSQL
16. Regex
17. Vendor Landscape
18. Env Setup
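
As a rough illustration of the join types listed above, here is a minimal sketch in plain Python. The tables `employees` and `departments` and all function names are hypothetical and chosen only for this example; a real database engine implements joins far more efficiently.

```python
# Hypothetical sample tables; each row is a dict.
employees = [
    {"name": "Ann", "dept_id": 1},
    {"name": "Bob", "dept_id": 2},
    {"name": "Cy", "dept_id": None},
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 3, "dept": "Ops"},
]


def cross_join(left, right):
    """Pair every left row with every right row."""
    return [(lrow, rrow) for lrow in left for rrow in right]


def theta_join(left, right, predicate):
    """Cross join filtered by an arbitrary predicate over both rows."""
    return [(lrow, rrow) for lrow, rrow in cross_join(left, right)
            if predicate(lrow, rrow)]


def inner_join(left, right, key):
    """Theta join whose predicate is equality on a shared key (an equi-join)."""
    return theta_join(
        left, right,
        lambda lrow, rrow: lrow[key] is not None and lrow[key] == rrow[key],
    )


def left_outer_join(left, right, key):
    """Inner join plus every unmatched left row, paired with None."""
    result = []
    for lrow in left:
        matches = [rrow for rrow in right
                   if lrow[key] is not None and lrow[key] == rrow[key]]
        if matches:
            result.extend((lrow, rrow) for rrow in matches)
        else:
            result.append((lrow, None))
    return result
```

Calling `inner_join(employees, departments, "dept_id")` keeps only the rows with a matching department, while `left_outer_join` also keeps the unmatched employees.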


## Statistics

1. Pick a Dataset (UCI Repo)
2. Descriptive Statistics (mean, median, range, SD, Var)
3. Exploratory Data Analysis
4. Histograms
5. Percentiles & Outliers
6. Probability Theory
7. Bayes' Theorem
8. Random Variables
9. Cumulative Distribution Function (CDF)
10. Probability Distributions (Normal/Gaussian, Poisson)
11. Skewness
12. ANOVA
13. Probability Density Function (PDF)
14. Central Limit Theorem
15. Monte Carlo Method
16. Hypothesis Testing
17. p-Value
18. Chi-squared Test
19. Estimation
20. Confidence Interval (CI)
21. Maximum Likelihood Estimation (MLE)
22. Kernel Density Estimate
23. Regression
24. Covariance
25. Correlation
26. Pearson Coefficient
27. Causation
28. Least Squares Fit
29. Euclidean Distance
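
Several of the topics above (descriptive statistics, the Pearson coefficient, Euclidean distance) can be tried with nothing but the Python standard library. The sketch below is one possible illustration; the `sample` data and the function names are made up for this example, and population (not sample) variance is assumed.

```python
import math
import statistics

# Hypothetical data, not taken from the UCI repository.
sample = [2, 4, 4, 4, 5, 5, 7, 9]


def describe(values):
    """Return basic descriptive statistics (population variance and SD)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "sd": statistics.pstdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sum of cross-deviations; the 1/n factors cancel in the final ratio.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


summary = describe(sample)
```

Note that `pearson` divides by zero if either sequence is constant; a more defensive version would check for that case.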


## Programming

1. Python Basics
2. Working in Excel
3. R Setup, RStudio
4. R Basics
5. IBM SPSS
6. RapidMiner
7. Variables
8. Vectors
9. Matrices
10. Arrays
11. Factors
12. Lists
13. Data Frames
16. Subsetting Data
17. Manipulate Data Frames
18. Functions
19. Factor Analysis
20. Install Packages


## Machine Learning

1. What is ML?
2. Numerical Variables
3. Categorical Variables
4. Supervised Learning
5. Unsupervised Learning
6. Concepts, Inputs & Attributes
7. Training & Test Data
8. Classifier
9. Prediction
10. Lift
11. Overfitting
12. Bias & Variance
13. Trees & Classification
14. Classification Rate
15. Decision Trees
16. Boosting
17. Naive Bayes Classifiers
18. K-Nearest Neighbour
19. Logistic Regression
20. Ranking
21. Linear Regression
22. Perceptron
23. Hierarchical Clustering
24. K-means Clustering
25. Neural Networks
26. Sentiment Analysis
27. Collaborative Filtering
28. Tagging


## Text Mining / NLP

1. Corpus
2. Named Entity Recognition
3. Text Analysis
4. UIMA
5. Term Document Matrix
7. Term Frequency & Weight
8. Support Vector Machines
9. Association Rules
10. Market Basket Analysis
11. Feature Extraction
12. Using Mahout
13. Using Weka
14. Using NLTK
15. Classify Text
16. Vocabulary Mapping


## Visualization

1. Data Exploration in R (Hist, Boxplot, etc.)
2. Uni, Bi & Multivariate Viz
3. ggplot2
4. Histogram & Pie (Uni)
5. Tree & Tree Map
6. Scatter Plot (Bi)
7. Line Charts (Bi)
8. Spatial Charts
9. Survey Plot
10. Timeline
11. Decision Tree
12. D3.js
13. InfoVis
14. IBM ManyEyes
15. Tableau


## Big Data

1. MapReduce Fundamentals
3. HDFS
4. Data Replication Principles
6. Name & Data Nodes
8. M/R (MapReduce) Programming
10. Flume, Scribe: For Unstructured Data
11. SQL with Pig
12. DWH with Hive
13. Scribe, Chukwa for Weblogs
14. Using Mahout
15. ZooKeeper, Avro
18. rmr
19. Cassandra
20. MongoDB, Neo4j


## Data Ingestion

1. Summary of Data Formats
2. Data Discovery
3. Data Sources & Acquisition
4. Data Integration
5. Data Fusion
6. Transformation & Enrichment
7. Data Survey
9. How Much Data
10. Using ETL


## Data Munging

1. Dimensionality & Numerosity Reduction
2. Normalization
3. Data Scrubbing
4. Handling Missing Values
5. Unbiased Estimators
6. Binning Sparse Values
7. Feature Extraction
8. Denoising
9. Sampling
10. Stratified Sampling
11. Principal Component Analysis
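
Two of the steps above, handling missing values and normalization, are simple enough to sketch directly. This is only one possible approach: it assumes missing values are encoded as `None`, uses mean imputation, and rescales with min-max normalization; both function names are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the present values.

    Assumes at least one value is present.
    """
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


cleaned = fill_missing_with_mean([10, None, 30])
scaled = min_max_normalize(cleaned)
```

Mean imputation is easy to apply but shrinks the variance of the column, so it is worth comparing against alternatives such as dropping rows or median imputation.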


## Toolbox

1. MS Excel w/ Analysis Toolpak
2. Java, Python
3. R, RStudio, Rattle
4. Weka, KNIME, RapidMiner
6. Spark, Storm
7. Flume, Scribe, Chukwa
8. Nutch, Talend, Scraperwiki
9. Webscraper, Flume, Sqoop
10. tm, RWeka, NLTK
11. RHIPE
12. D3.js, ggplot2, Shiny
13. IBM Languageware
14. Cassandra, MongoDB


