Table of contents
- The instance
- 1. Download the data and unzip
- 2. Load and Preparation
- 3. Custom random sampling
- 4. Stratified Sampling
- 5. Comparison of the proportions in stratified sampling
- 6. Comparison of the proportions in stratified and random sampling
- 7. Save train set and test set
- 8. Load the data and name the training set *housing*
- 9. Looking for correlations
- 10. Data Processing
- 11. Tips: Pipeline introduction
- 12. Sklearn data processing
- 13. Model Selection and validation
- 14. Validate the system with the test dataset
The instance
The goal is to predict the house value of a district from attributes such as longitude, latitude, population, and median income.
1. Download the data and unzip
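A minimal sketch of this step, using only the standard library. The URL and the target directory are assumptions (the post does not say where the archive comes from), so replace them with your own source.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed location of the archive; not stated in the original post.
HOUSING_URL = (
    "https://raw.githubusercontent.com/ageron/handson-ml/"
    "master/datasets/housing/housing.tgz"
)


def fetch_housing_data(url=HOUSING_URL, target_dir="datasets/housing"):
    """Download a .tgz archive from url and extract it into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive = target / "housing.tgz"
    urllib.request.urlretrieve(url, archive)  # download the archive
    with tarfile.open(archive) as tgz:
        tgz.extractall(path=target)  # unpack next to the archive
    return target
```

Nothing is downloaded until `fetch_housing_data()` is called.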
2. Load and Preparation
Each row represents one district, and there are 10 attributes.
total_bedrooms contains null values.
ocean_proximity has dtype object (it holds text categories).
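The checks above can be sketched with pandas. The three rows below are an illustrative stand-in for the real file, with one missing total_bedrooms value added on purpose; with the real data you would pass the CSV path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Illustrative sample only; the real housing.csv has many more rows.
SAMPLE_CSV = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.23,37.88,41,880,129,322,126,8.3252,452600,NEAR BAY
-122.22,37.86,21,7099,,2401,1138,8.3014,358500,NEAR BAY
-118.30,34.05,52,1467,190,496,177,7.2574,352100,<1H OCEAN
"""

housing = pd.read_csv(io.StringIO(SAMPLE_CSV))

housing.info()                                      # column types and non-null counts
print(housing.isnull().sum())                       # missing values per attribute
print(housing["ocean_proximity"].value_counts())    # categories of the text column
```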
3. Custom random sampling
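One possible hand-written splitter, as a sketch: shuffle the row positions and cut off a fixed fraction for the test set. The function name, the seed, and the demo frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_train_test(data, test_ratio, seed=42):
    """Randomly split a DataFrame into (train, test) by row position."""
    rng = np.random.default_rng(seed)       # fixed seed gives a repeatable split
    shuffled = rng.permutation(len(data))   # shuffled row positions
    test_size = int(len(data) * test_ratio)
    test_idx = shuffled[:test_size]
    train_idx = shuffled[test_size:]
    return data.iloc[train_idx], data.iloc[test_idx]


# Hypothetical data, only to show the call.
demo = pd.DataFrame({"median_income": np.linspace(0.5, 15.0, 100)})
train_set, test_set = split_train_test(demo, test_ratio=0.2)
print(len(train_set), len(test_set))
```

A fixed seed keeps the split stable between runs, but it will still change if rows are added to the dataset.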
4. Stratified Sampling
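A sketch of stratified sampling with scikit-learn's `StratifiedShuffleSplit`. It assumes the strata come from an income_cat column built by binning median_income; the bin edges and the synthetic incomes are illustrative choices, not values taken from the post.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic incomes standing in for the real median_income column.
rng = np.random.default_rng(0)
housing = pd.DataFrame(
    {"median_income": rng.lognormal(mean=1.2, sigma=0.5, size=1000)}
)

# Bin the continuous income into five categories (assumed bin edges).
housing["income_cat"] = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)

# One split that preserves the income_cat proportions in both parts.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(housing, housing["income_cat"]))
strat_train_set = housing.iloc[train_idx]
strat_test_set = housing.iloc[test_idx]
```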
5. Comparison of the proportions in stratified sampling
The category proportions in the stratified test set look almost the same as in the full dataset.
6. Comparison of the proportions in stratified and random sampling
random
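The comparison can be sketched as one table with the income_cat proportions of the full dataset, a stratified test set, and a purely random test set. The data is synthetic and the bin edges are assumed; `train_test_split` with `stratify=` is used here as a shortcut for the stratified split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
housing = pd.DataFrame(
    {"median_income": rng.lognormal(mean=1.2, sigma=0.5, size=2000)}
)
housing["income_cat"] = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)


def income_cat_proportions(data):
    """Share of rows that fall into each income category."""
    return data["income_cat"].value_counts(normalize=True).sort_index()


_, random_test = train_test_split(housing, test_size=0.2, random_state=42)
_, strat_test = train_test_split(
    housing, test_size=0.2, random_state=42, stratify=housing["income_cat"]
)

compare = pd.DataFrame(
    {
        "overall": income_cat_proportions(housing),
        "stratified": income_cat_proportions(strat_test),
        "random": income_cat_proportions(random_test),
    }
)
# Relative deviation from the overall proportions, in percent.
compare["strat_error_%"] = 100 * (compare["stratified"] / compare["overall"] - 1)
compare["random_error_%"] = 100 * (compare["random"] / compare["overall"] - 1)
print(compare)
```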
Delete the income_cat attribute once the split is done.
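A small sketch of the deletion, assuming the two sets are named strat_train_set and strat_test_set (hypothetical names and values).

```python
import pandas as pd

# Hypothetical split results that still carry the helper column.
strat_train_set = pd.DataFrame({"median_income": [2.1, 5.3], "income_cat": [2, 4]})
strat_test_set = pd.DataFrame({"median_income": [3.7], "income_cat": [3]})

# Drop the column in place from both sets.
for set_ in (strat_train_set, strat_test_set):
    set_.drop("income_cat", axis=1, inplace=True)

print(strat_train_set.columns.tolist())
```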
7. Save train set and test set
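One way to persist the split is plain CSV files. The helper name, file names, and demo rows below are assumptions; the demo writes into a temporary directory so it does not touch your project files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_split(train_set, test_set, out_dir="datasets/housing"):
    """Write both sets to CSV files and return the two paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    train_path = out / "train_set.csv"
    test_path = out / "test_set.csv"
    train_set.to_csv(train_path, index=False)
    test_set.to_csv(test_path, index=False)
    return train_path, test_path


# Hypothetical rows, only to exercise the helper.
train_set = pd.DataFrame(
    {"median_income": [2.1, 5.3, 3.3], "median_house_value": [150000, 320000, 210000]}
)
test_set = pd.DataFrame({"median_income": [3.7], "median_house_value": [200000]})

train_path, test_path = save_split(train_set, test_set, tempfile.mkdtemp())
housing = pd.read_csv(train_path)  # reload the training set under the name housing
```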
8. Load the data and name the training set housing
9. Looking for correlations
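A sketch of the correlation check on synthetic data: compute the pairwise Pearson correlations of the numeric columns and sort them by their relation to median_house_value. The columns and the linear relationship are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
income = rng.uniform(1, 10, n)
housing = pd.DataFrame(
    {
        "median_income": income,
        "total_rooms": rng.integers(500, 5000, n),
        "ocean_proximity": rng.choice(["INLAND", "NEAR BAY"], n),
        # Synthetic target: driven by income plus noise.
        "median_house_value": 40000 * income + rng.normal(0, 30000, n),
    }
)

# Text columns cannot be correlated, so keep the numeric ones only.
corr_matrix = housing.select_dtypes("number").corr()
ranking = corr_matrix["median_house_value"].sort_values(ascending=False)
print(ranking)
```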
Plot the districts of California, colored by median house value.
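A possible version of this plot with pandas and matplotlib. The coordinates and values are random placeholders, so the picture will not resemble California; the marker size (population) and the "jet" colormap are assumed styling choices.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line to get a window

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 300
housing = pd.DataFrame(
    {
        "longitude": rng.uniform(-124.0, -114.0, n),
        "latitude": rng.uniform(32.5, 42.0, n),
        "population": rng.integers(100, 5000, n),
        "median_house_value": rng.uniform(50000, 500000, n),
    }
)

ax = housing.plot(
    kind="scatter",
    x="longitude",
    y="latitude",
    alpha=0.4,
    s=housing["population"] / 100,   # marker size follows population
    c="median_house_value",          # marker color follows house value
    cmap="jet",
    colorbar=True,
    figsize=(10, 7),
)
out_file = Path(tempfile.gettempdir()) / "housing_prices.png"
ax.figure.savefig(out_file)
```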
10. Data Processing
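A sketch of the manual preparation with pandas alone: separate the labels, fill the missing total_bedrooms values with the median, and one-hot encode ocean_proximity. The rows are hypothetical, and filling with the median is one option among several (dropping the rows or the column also works).

```python
import numpy as np
import pandas as pd

# Hypothetical training rows with one missing value.
train = pd.DataFrame(
    {
        "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
        "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
        "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "<1H OCEAN"],
        "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0],
    }
)

# Separate the predictors from the label.
housing = train.drop("median_house_value", axis=1)
housing_labels = train["median_house_value"].copy()

# Fill missing bedrooms with the median of the training set.
median_bedrooms = housing["total_bedrooms"].median()
housing["total_bedrooms"] = housing["total_bedrooms"].fillna(median_bedrooms)

# Turn the text category into one indicator column per value.
housing_encoded = pd.get_dummies(housing, columns=["ocean_proximity"])
print(housing_encoded.columns.tolist())
```

Keep the computed median: the same value is needed later to fill gaps in the test set.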
11. Tips: Pipeline introduction
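To illustrate the idea: a scikit-learn `Pipeline` chains transformers so that one `fit_transform` call runs every step in order. The two steps and the tiny array are illustrative choices.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Each step is a (name, transformer) pair; the steps run in list order.
num_pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="median")),  # fill NaN with column median
        ("scaler", StandardScaler()),                   # zero mean, unit variance
    ]
)

X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])
X_prepared = num_pipeline.fit_transform(X)
print(X_prepared)
```

Call `num_pipeline.transform` on new data (for example the test set) so that it reuses the medians and scaling learned during `fit`.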
12. Sklearn data processing
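A sketch of a full preprocessing step with `ColumnTransformer`, which sends numeric and categorical columns through different transformers and concatenates the results. The column lists and the sample rows are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows with one missing numeric value.
housing = pd.DataFrame(
    {
        "median_income": [8.3, 7.2, 5.6, 3.8],
        "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
        "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "<1H OCEAN"],
    }
)
num_attribs = ["median_income", "total_bedrooms"]
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]
)

full_pipeline = ColumnTransformer(
    [
        ("num", num_pipeline, num_attribs),                             # numeric columns
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs),   # text column
    ]
)

housing_prepared = full_pipeline.fit_transform(housing)
print(housing_prepared.shape)
```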
13. Model Selection and validation
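One common way to compare candidate models is k-fold cross-validation on the training set. The sketch below scores three regressors by RMSE on synthetic data; the choice of models and the data are assumptions and say nothing about which model wins on the real housing data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the prepared housing data.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 3))
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 1, 300)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
    "forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

results = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so MSE is reported negated.
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
    rmse = np.sqrt(-scores)
    results[name] = rmse.mean()
    print(f"{name}: mean RMSE {rmse.mean():.3f} (std {rmse.std():.3f})")
```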
The model with the better performance is shown below.
Using grid search, we can find the best hyperparameters for the random forest algorithm.
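A sketch of such a search with `GridSearchCV`, assuming the model is a `RandomForestRegressor`. The parameter grid and the synthetic data are illustrative, not the values used in the post.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 1, 200)

# Every combination in the grid is evaluated with 3-fold cross-validation.
param_grid = [{"n_estimators": [10, 30], "max_features": [1, 2, 3]}]

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)

print(grid_search.best_params_)
best_model = grid_search.best_estimator_  # refit on all of X, y by default
```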
14. Validate the system with the test dataset
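The final check can be sketched as follows: fit on the training set only, predict on the held-out test set, and report the RMSE. The data and the model settings are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 3))
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 1, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

final_model = RandomForestRegressor(n_estimators=50, random_state=42)
final_model.fit(X_train, y_train)          # the test set is never used for fitting

final_predictions = final_model.predict(X_test)
final_rmse = np.sqrt(mean_squared_error(y_test, final_predictions))
print(f"test RMSE: {final_rmse:.3f}")
```

With a real preprocessing pipeline, apply `transform` (not `fit_transform`) to the test set before predicting.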