Learning and Practicing the auto_ml Automated Machine Learning Framework in Python

I had worked with the auto_ml automated machine learning framework before, but never found the time to write up a short summary for later study. I believe that as machine learning keeps spreading and developing, automated machine learning will play an increasingly important role, because a large share of the time in machine learning and deep learning projects goes into feature engineering, model selection, ensembling, and hyperparameter tuning, and auto_ml offers a good way to approach these problems. There are quite a few AutoML frameworks available today, and studying them all thoroughly takes real time, so here I will simply record my earlier experience with auto_ml.

Since I cannot freely publish the dataset I originally used, I will simply work through the official demo here. When you later switch to your own dataset, you only need a small amount of preprocessing to bring it into the expected format, roughly as sketched below.
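The post does not show that preprocessing step, so here is a minimal sketch of what it might look like. The file name data.csv, the target column label, and the categorical column city are my own assumptions for illustration, not from the original post:

import pandas as pd
from sklearn.model_selection import train_test_split
from auto_ml import Predictor

# Hypothetical dataset: adjust the file name and column names to your data.
df = pd.read_csv('data.csv')

# auto_ml works on plain pandas DataFrames; the target column stays inside
# the training frame and is identified via column_descriptions.
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)

column_descriptions = {
    'label': 'output',      # the column to predict
    'city': 'categorical'   # declare non-numeric columns as categorical
}
ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
ml_predictor.train(train_data)
print(ml_predictor.score(test_data, test_data.label))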

Taking the Boston housing data as an example, a simple small example looks like this:

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

def bostonSimpleFunc():
    '''
    A simple example on the Boston housing data.
    '''
    train_data, test_data = get_boston_dataset()
    column_descriptions = {
        'MEDV': 'output',
        'CHAS': 'categorical'
    }
    ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
    ml_predictor.train(train_data)
    ml_predictor.score(test_data, test_data.MEDV)

The output is as follows:

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.

If you have any issues, or new feature ideas, let us know at http://auto.ml
You are running on version 2.9.10
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'warm_start': True, 'learning_rate': 0.1}
Running basic data cleaning
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'warm_start': True, 'learning_rate': 0.1}


********************************************************************************************
About to fit the pipeline for the model GradientBoostingRegressor to predict MEDV
Started at:
2019-06-12 09:14:59
[1] random_holdout_set_from_training_data's score is: -9.82
[2] random_holdout_set_from_training_data's score is: -9.054
[3] random_holdout_set_from_training_data's score is: -8.48
[4] random_holdout_set_from_training_data's score is: -7.925
[5] random_holdout_set_from_training_data's score is: -7.424
[6] random_holdout_set_from_training_data's score is: -7.051
[7] random_holdout_set_from_training_data's score is: -6.608
[8] random_holdout_set_from_training_data's score is: -6.315
[9] random_holdout_set_from_training_data's score is: -6.0
[10] random_holdout_set_from_training_data's score is: -5.728
[11] random_holdout_set_from_training_data's score is: -5.499
[12] random_holdout_set_from_training_data's score is: -5.288
[13] random_holdout_set_from_training_data's score is: -5.126
[14] random_holdout_set_from_training_data's score is: -4.918
[15] random_holdout_set_from_training_data's score is: -4.775
[16] random_holdout_set_from_training_data's score is: -4.625
[17] random_holdout_set_from_training_data's score is: -4.513
[18] random_holdout_set_from_training_data's score is: -4.365
[19] random_holdout_set_from_training_data's score is: -4.281
[20] random_holdout_set_from_training_data's score is: -4.196
[21] random_holdout_set_from_training_data's score is: -4.133
[22] random_holdout_set_from_training_data's score is: -4.033
[23] random_holdout_set_from_training_data's score is: -4.004
[24] random_holdout_set_from_training_data's score is: -3.945
[25] random_holdout_set_from_training_data's score is: -3.913
[26] random_holdout_set_from_training_data's score is: -3.852
[27] random_holdout_set_from_training_data's score is: -3.844
[28] random_holdout_set_from_training_data's score is: -3.795
[29] random_holdout_set_from_training_data's score is: -3.824
[30] random_holdout_set_from_training_data's score is: -3.795
[31] random_holdout_set_from_training_data's score is: -3.778
[32] random_holdout_set_from_training_data's score is: -3.748
[33] random_holdout_set_from_training_data's score is: -3.739
[34] random_holdout_set_from_training_data's score is: -3.72
[35] random_holdout_set_from_training_data's score is: -3.721
[36] random_holdout_set_from_training_data's score is: -3.671
[37] random_holdout_set_from_training_data's score is: -3.644
[38] random_holdout_set_from_training_data's score is: -3.639
[39] random_holdout_set_from_training_data's score is: -3.617
[40] random_holdout_set_from_training_data's score is: -3.62
[41] random_holdout_set_from_training_data's score is: -3.614
[42] random_holdout_set_from_training_data's score is: -3.643
[43] random_holdout_set_from_training_data's score is: -3.647
[44] random_holdout_set_from_training_data's score is: -3.624
[45] random_holdout_set_from_training_data's score is: -3.589
[46] random_holdout_set_from_training_data's score is: -3.578
[47] random_holdout_set_from_training_data's score is: -3.565
[48] random_holdout_set_from_training_data's score is: -3.555
[49] random_holdout_set_from_training_data's score is: -3.549
[50] random_holdout_set_from_training_data's score is: -3.539
[52] random_holdout_set_from_training_data's score is: -3.571
[54] random_holdout_set_from_training_data's score is: -3.545
[56] random_holdout_set_from_training_data's score is: -3.588
[58] random_holdout_set_from_training_data's score is: -3.587
[60] random_holdout_set_from_training_data's score is: -3.584
[62] random_holdout_set_from_training_data's score is: -3.585
[64] random_holdout_set_from_training_data's score is: -3.589
[66] random_holdout_set_from_training_data's score is: -3.59
[68] random_holdout_set_from_training_data's score is: -3.558
[70] random_holdout_set_from_training_data's score is: -3.587
[72] random_holdout_set_from_training_data's score is: -3.583
[74] random_holdout_set_from_training_data's score is: -3.58
[76] random_holdout_set_from_training_data's score is: -3.578
[78] random_holdout_set_from_training_data's score is: -3.577
[80] random_holdout_set_from_training_data's score is: -3.591
[82] random_holdout_set_from_training_data's score is: -3.592
[84] random_holdout_set_from_training_data's score is: -3.586
[86] random_holdout_set_from_training_data's score is: -3.58
[88] random_holdout_set_from_training_data's score is: -3.562
[90] random_holdout_set_from_training_data's score is: -3.561
The number of estimators that were the best for this training dataset: 50
The best score on the holdout set: -3.539421497275334
Finished training the pipeline!
Total training time:
0:00:01


Here are the results from our GradientBoostingRegressor
predicting MEDV
Calculating feature responses, for advanced analytics.
The printed list will only contain at most the top 100 features.
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+
|    | Feature Name   |   Importance |    Delta |   FR_Decrementing |   FR_Incrementing |   FRD_abs |   FRI_abs |   FRD_MAD |   FRI_MAD |
|----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------|
|  1 | ZN             |       0.0001 |  11.5619 |           -0.0027 |            0.0050 |    0.0027 |    0.0050 |    0.0000 |    0.0000 |
| 13 | CHAS=1.0       |       0.0011 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
| 12 | CHAS=0.0       |       0.0012 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
|  2 | INDUS          |       0.0013 |   3.4430 |            0.0070 |           -0.0539 |    0.0070 |    0.0539 |    0.0000 |    0.0000 |
|  7 | RAD            |       0.0029 |   4.2895 |           -0.7198 |            0.0463 |    0.7198 |    0.0463 |    0.3296 |    0.0000 |
|  5 | AGE            |       0.0145 |  13.9801 |            0.0757 |           -0.0292 |    0.2862 |    0.2393 |    0.0000 |    0.0000 |
|  8 | TAX            |       0.0160 |  82.9834 |            0.9411 |           -0.3538 |    0.9691 |    0.3538 |    0.0398 |    0.0000 |
| 10 | B              |       0.0171 |  45.7266 |           -0.1144 |            0.0896 |    0.1746 |    0.1200 |    0.1503 |    0.0000 |
|  3 | NOX            |       0.0193 |   0.0588 |            0.1792 |           -0.1584 |    0.1996 |    0.2047 |    0.0000 |    0.0000 |
|  9 | PTRATIO        |       0.0247 |   1.1130 |            0.5625 |           -0.2905 |    0.5991 |    0.2957 |    0.4072 |    0.1155 |
|  0 | CRIM           |       0.0252 |   4.4320 |           -0.0986 |           -0.4012 |    0.3789 |    0.4623 |    0.0900 |    0.0900 |
|  6 | DIS            |       0.0655 |   1.0643 |            3.4743 |           -0.2346 |    3.5259 |    0.5256 |    0.5473 |    0.2233 |
| 11 | LSTAT          |       0.3086 |   3.5508 |            1.5328 |           -1.6693 |    1.5554 |    1.6703 |    1.3641 |    1.6349 |
|  4 | RM             |       0.5026 |   0.3543 |           -1.1450 |            1.7191 |    1.1982 |    1.8376 |    0.4338 |    0.8010 |
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+


*******
Legend:
Importance = Feature Importance
     Explanation: A weighted measure of how much of the variance the model is able to explain is due to this column
FR_delta = Feature Response Delta Amount
     Explanation: Amount this column was incremented or decremented by to calculate the feature reponses
FR_Decrementing = Feature Response From Decrementing Values In This Column By One FR_delta
     Explanation: Represents how much the predicted output values respond to subtracting one FR_delta amount from every value in this column
FR_Incrementing = Feature Response From Incrementing Values In This Column By One FR_delta
     Explanation: Represents how much the predicted output values respond to adding one FR_delta amount to every value in this column
FRD_MAD = Feature Response From Decrementing- Median Absolute Delta
     Explanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if decrementing this feature provokes strong changes that are both positive and negative
FRI_MAD = Feature Response From Incrementing- Median Absolute Delta
     Explanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if incrementing this feature provokes strong changes that are both positive and negative
FRD_abs = Feature Response From Decrementing Avg Absolute Change
     Explanation: What is the average absolute change in predicted output values to subtracting one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way
FRI_abs = Feature Response From Incrementing Avg Absolute Change
     Explanation: What is the average absolute change in predicted output values to adding one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way
*******

None


***********************************************
Advanced scoring metrics for the trained regression model on this particular dataset:

Here is the overall RMSE for these predictions:
2.9415706036925924

Here is the average of the predictions:
21.3944468736

Here is the average actual value on this validation set:
21.4882352941

Here is the median prediction:
20.688959488015513

Here is the median actual value:
20.15

Here is the mean absolute error:
2.011340247445387

Here is the median absolute error (robust to outliers):
1.4717184675805761

Here is the explained variance:
0.8821274319123865

Here is the R-squared value:
0.882007483541501
Count of positive differences (prediction > actual):
51
Count of negative differences:
51
Average positive difference:
1.91755182694
Average negative difference:
-2.10512866795


***********************************************


[Finished in 2.8s]
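The legend in the log above describes how auto_ml computes its feature responses: perturb one column by a fixed delta and measure how the predictions move. A minimal sketch of that idea (my own illustration of the technique, not auto_ml's internal code):

import numpy as np

def feature_response(model, X, column, delta):
    '''
    Perturb one column by +/- delta and summarize the prediction changes,
    mirroring the legend above: FRI/FRD are the mean signed responses,
    the *_abs values are mean absolute changes, and the *_MAD values are
    medians of the absolute changes.
    '''
    base = np.asarray(model.predict(X))
    stats = {}
    for sign, name in [(1, 'FRI'), (-1, 'FRD')]:
        X_shift = X.copy()
        X_shift[column] = X_shift[column] + sign * delta
        diff = np.asarray(model.predict(X_shift)) - base
        stats[name] = diff.mean()
        stats[name + '_abs'] = np.abs(diff).mean()
        stats[name + '_MAD'] = np.median(np.abs(diff))
    return stats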

As the author says, auto_ml is built for production use and provides a fairly complete workflow. Here I use the Boston housing data for a complete walkthrough covering train/test splitting, model training, model persistence, model loading, and prediction:

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model

def bostonWholeFunc():
    '''
    A more complete example on the Boston housing data, covering:
    train/test splitting, model training, model persistence,
    model loading, and prediction.
    '''
    train_data, test_data = get_boston_dataset()
    column_descriptions = {
        'MEDV': 'output',
        'CHAS': 'categorical'
    }
    ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
    ml_predictor.train(train_data)
    test_score = ml_predictor.score(test_data, test_data.MEDV)
    file_name = ml_predictor.save()
    trained_model = load_ml_model(file_name)
    predictions = trained_model.predict(test_data)
    print('=====================predictions===========================')
    print(predictions)
    predictions = trained_model.predict_proba(test_data)
    print('=====================predictions===========================')
    print(predictions)

The output is as follows:

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.

If you have any issues, or new feature ideas, let us know at http://auto.ml
You are running on version 2.9.10
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'warm_start': True, 'learning_rate': 0.1}
Running basic data cleaning
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'warm_start': True, 'learning_rate': 0.1}


********************************************************************************************
About to fit the pipeline for the model GradientBoostingRegressor to predict MEDV
Started at:
2019-06-12 09:21:21
[1] random_holdout_set_from_training_data's score is: -9.93
[2] random_holdout_set_from_training_data's score is: -9.281
[3] random_holdout_set_from_training_data's score is: -8.683
[4] random_holdout_set_from_training_data's score is: -8.03
[5] random_holdout_set_from_training_data's score is: -7.494
[6] random_holdout_set_from_training_data's score is: -7.074
[7] random_holdout_set_from_training_data's score is: -6.649
[8] random_holdout_set_from_training_data's score is: -6.374
[9] random_holdout_set_from_training_data's score is: -6.115
[10] random_holdout_set_from_training_data's score is: -5.877
[11] random_holdout_set_from_training_data's score is: -5.566
[12] random_holdout_set_from_training_data's score is: -5.391
[13] random_holdout_set_from_training_data's score is: -5.088
[14] random_holdout_set_from_training_data's score is: -4.911
[15] random_holdout_set_from_training_data's score is: -4.692
[16] random_holdout_set_from_training_data's score is: -4.566
[17] random_holdout_set_from_training_data's score is: -4.379
[18] random_holdout_set_from_training_data's score is: -4.296
[19] random_holdout_set_from_training_data's score is: -4.14
[20] random_holdout_set_from_training_data's score is: -4.009
[21] random_holdout_set_from_training_data's score is: -3.92
[22] random_holdout_set_from_training_data's score is: -3.856
[23] random_holdout_set_from_training_data's score is: -3.81
[24] random_holdout_set_from_training_data's score is: -3.72
[25] random_holdout_set_from_training_data's score is: -3.632
[26] random_holdout_set_from_training_data's score is: -3.601
[27] random_holdout_set_from_training_data's score is: -3.538
[28] random_holdout_set_from_training_data's score is: -3.487
[29] random_holdout_set_from_training_data's score is: -3.459
[30] random_holdout_set_from_training_data's score is: -3.458
[31] random_holdout_set_from_training_data's score is: -3.422
[32] random_holdout_set_from_training_data's score is: -3.408
[33] random_holdout_set_from_training_data's score is: -3.356
[34] random_holdout_set_from_training_data's score is: -3.335
[35] random_holdout_set_from_training_data's score is: -3.323
[36] random_holdout_set_from_training_data's score is: -3.313
[37] random_holdout_set_from_training_data's score is: -3.262
[38] random_holdout_set_from_training_data's score is: -3.236
[39] random_holdout_set_from_training_data's score is: -3.207
[40] random_holdout_set_from_training_data's score is: -3.214
[41] random_holdout_set_from_training_data's score is: -3.198
[42] random_holdout_set_from_training_data's score is: -3.188
[43] random_holdout_set_from_training_data's score is: -3.174
[44] random_holdout_set_from_training_data's score is: -3.164
[45] random_holdout_set_from_training_data's score is: -3.122
[46] random_holdout_set_from_training_data's score is: -3.122
[47] random_holdout_set_from_training_data's score is: -3.109
[48] random_holdout_set_from_training_data's score is: -3.11
[49] random_holdout_set_from_training_data's score is: -3.119
[50] random_holdout_set_from_training_data's score is: -3.113
[52] random_holdout_set_from_training_data's score is: -3.113
[54] random_holdout_set_from_training_data's score is: -3.099
[56] random_holdout_set_from_training_data's score is: -3.102
[58] random_holdout_set_from_training_data's score is: -3.097
[60] random_holdout_set_from_training_data's score is: -3.069
[62] random_holdout_set_from_training_data's score is: -3.061
[64] random_holdout_set_from_training_data's score is: -3.024
[66] random_holdout_set_from_training_data's score is: -2.999
[68] random_holdout_set_from_training_data's score is: -2.999
[70] random_holdout_set_from_training_data's score is: -2.984
[72] random_holdout_set_from_training_data's score is: -2.978
[74] random_holdout_set_from_training_data's score is: -2.96
[76] random_holdout_set_from_training_data's score is: -2.943
[78] random_holdout_set_from_training_data's score is: -2.947
[80] random_holdout_set_from_training_data's score is: -2.938
[82] random_holdout_set_from_training_data's score is: -2.921
[84] random_holdout_set_from_training_data's score is: -2.914
[86] random_holdout_set_from_training_data's score is: -2.91
[88] random_holdout_set_from_training_data's score is: -2.901
[90] random_holdout_set_from_training_data's score is: -2.906
[92] random_holdout_set_from_training_data's score is: -2.892
[94] random_holdout_set_from_training_data's score is: -2.885
[96] random_holdout_set_from_training_data's score is: -2.884
[98] random_holdout_set_from_training_data's score is: -2.894
[100] random_holdout_set_from_training_data's score is: -2.88
[103] random_holdout_set_from_training_data's score is: -2.893
[106] random_holdout_set_from_training_data's score is: -2.889
[109] random_holdout_set_from_training_data's score is: -2.886
[112] random_holdout_set_from_training_data's score is: -2.869
[115] random_holdout_set_from_training_data's score is: -2.875
[118] random_holdout_set_from_training_data's score is: -2.852
[121] random_holdout_set_from_training_data's score is: -2.855
[124] random_holdout_set_from_training_data's score is: -2.848
[127] random_holdout_set_from_training_data's score is: -2.854
[130] random_holdout_set_from_training_data's score is: -2.86
[133] random_holdout_set_from_training_data's score is: -2.857
[136] random_holdout_set_from_training_data's score is: -2.854
[139] random_holdout_set_from_training_data's score is: -2.856
[142] random_holdout_set_from_training_data's score is: -2.854
[145] random_holdout_set_from_training_data's score is: -2.845
[148] random_holdout_set_from_training_data's score is: -2.84
[151] random_holdout_set_from_training_data's score is: -2.838
[154] random_holdout_set_from_training_data's score is: -2.838
[157] random_holdout_set_from_training_data's score is: -2.839
[160] random_holdout_set_from_training_data's score is: -2.837
[163] random_holdout_set_from_training_data's score is: -2.838
[166] random_holdout_set_from_training_data's score is: -2.838
[169] random_holdout_set_from_training_data's score is: -2.84
[172] random_holdout_set_from_training_data's score is: -2.828
[175] random_holdout_set_from_training_data's score is: -2.836
[178] random_holdout_set_from_training_data's score is: -2.834
[181] random_holdout_set_from_training_data's score is: -2.836
[184] random_holdout_set_from_training_data's score is: -2.837
[187] random_holdout_set_from_training_data's score is: -2.86
[190] random_holdout_set_from_training_data's score is: -2.862
[193] random_holdout_set_from_training_data's score is: -2.856
[196] random_holdout_set_from_training_data's score is: -2.855
[199] random_holdout_set_from_training_data's score is: -2.857
[202] random_holdout_set_from_training_data's score is: -2.856
[205] random_holdout_set_from_training_data's score is: -2.86
[208] random_holdout_set_from_training_data's score is: -2.859
[211] random_holdout_set_from_training_data's score is: -2.857
[214] random_holdout_set_from_training_data's score is: -2.855
[217] random_holdout_set_from_training_data's score is: -2.852
[220] random_holdout_set_from_training_data's score is: -2.849
[223] random_holdout_set_from_training_data's score is: -2.853
[226] random_holdout_set_from_training_data's score is: -2.845
[229] random_holdout_set_from_training_data's score is: -2.846
[232] random_holdout_set_from_training_data's score is: -2.849
The number of estimators that were the best for this training dataset: 172
The best score on the holdout set: -2.827876248876794
Finished training the pipeline!
Total training time:
0:00:01


Here are the results from our GradientBoostingRegressor
predicting MEDV
Calculating feature responses, for advanced analytics.
The printed list will only contain at most the top 100 features.
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+
|    | Feature Name   |   Importance |    Delta |   FR_Decrementing |   FR_Incrementing |   FRD_abs |   FRI_abs |   FRD_MAD |   FRI_MAD |
|----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------|
| 12 | CHAS=0.0       |       0.0000 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
|  1 | ZN             |       0.0004 |  11.5619 |           -0.0194 |            0.0204 |    0.0205 |    0.0230 |    0.0000 |    0.0000 |
| 13 | CHAS=1.0       |       0.0005 | nan      |          nan      |          nan      |  nan      |  nan      |  nan      |  nan      |
|  2 | INDUS          |       0.0031 |   3.4430 |            0.1103 |            0.0494 |    0.1565 |    0.1543 |    0.0597 |    0.0000 |
|  7 | RAD            |       0.0059 |   4.2895 |           -0.3558 |            0.0537 |    0.3620 |    0.1431 |    0.3727 |    0.0000 |
|  5 | AGE            |       0.0105 |  13.9801 |            0.2805 |           -0.3050 |    0.5735 |    0.4734 |    0.3615 |    0.2435 |
| 10 | B              |       0.0118 |  45.7266 |           -0.1885 |            0.1507 |    0.3139 |    0.2903 |    0.1688 |    0.0582 |
|  8 | TAX            |       0.0167 |  82.9834 |            1.1477 |           -0.4399 |    1.2920 |    0.4563 |    0.2671 |    0.2617 |
|  9 | PTRATIO        |       0.0247 |   1.1130 |            0.5095 |           -0.2323 |    0.5599 |    0.4590 |    0.2984 |    0.3357 |
|  0 | CRIM           |       0.0284 |   4.4320 |           -0.4701 |           -0.2061 |    0.7788 |    0.4938 |    0.5027 |    0.2806 |
|  3 | NOX            |       0.0298 |   0.0588 |            0.3083 |           -0.1691 |    0.4285 |    0.3968 |    0.0745 |    0.0745 |
|  6 | DIS            |       0.0608 |   1.0643 |            3.4966 |           -0.3628 |    3.5823 |    0.8045 |    0.9935 |    0.3655 |
|  4 | RM             |       0.3571 |   0.3543 |           -1.2174 |            1.4995 |    1.3628 |    1.7090 |    0.7740 |    1.0375 |
| 11 | LSTAT          |       0.4504 |   3.5508 |            1.9849 |           -1.8635 |    2.0343 |    1.9289 |    1.8354 |    1.5375 |
+----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+


*******
Legend:
Importance = Feature Importance
     Explanation: A weighted measure of how much of the variance the model is able to explain is due to this column
FR_delta = Feature Response Delta Amount
     Explanation: Amount this column was incremented or decremented by to calculate the feature reponses
FR_Decrementing = Feature Response From Decrementing Values In This Column By One FR_delta
     Explanation: Represents how much the predicted output values respond to subtracting one FR_delta amount from every value in this column
FR_Incrementing = Feature Response From Incrementing Values In This Column By One FR_delta
     Explanation: Represents how much the predicted output values respond to adding one FR_delta amount to every value in this column
FRD_MAD = Feature Response From Decrementing- Median Absolute Delta
     Explanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if decrementing this feature provokes strong changes that are both positive and negative
FRI_MAD = Feature Response From Incrementing- Median Absolute Delta
     Explanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if incrementing this feature provokes strong changes that are both positive and negative
FRD_abs = Feature Response From Decrementing Avg Absolute Change
     Explanation: What is the average absolute change in predicted output values to subtracting one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way
FRI_abs = Feature Response From Incrementing Avg Absolute Change
     Explanation: What is the average absolute change in predicted output values to adding one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way
*******

None


***********************************************
Advanced scoring metrics for the trained regression model on this particular dataset:

Here is the overall RMSE for these predictions:
2.4474947386663786

Here is the average of the predictions:
21.2925792927

Here is the average actual value on this validation set:
21.4882352941

Here is the median prediction:
20.457423442279662

Here is the median actual value:
20.15

Here is the mean absolute error:
1.844793596155306

Here is the median absolute error (robust to outliers):
1.3340192567295777

Here is the explained variance:
0.9188375538746201

Here is the R-squared value:
0.9183155397464807
Count of positive differences (prediction > actual):
51
Count of negative differences:
51
Average positive difference:
1.64913759477
Average negative difference:
-2.04044959754


***********************************************




We have saved the trained pipeline to a filed called "auto_ml_saved_pipeline.dill"
It is saved in the directory: 
C:\Users\18706\Desktop\myBlogs\auto_ml_use
To use it to get predictions, please follow the following flow (adjusting for your own uses as necessary:


`from auto_ml.utils_models import load_ml_model
`trained_ml_pipeline = load_ml_model("auto_ml_saved_pipeline.dill")
`trained_ml_pipeline.predict(data)`


Note that this pickle/dill file can only be loaded in an environment with the same modules installed, and running the same Python version.
This version of Python is:
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)


When passing in new data to get predictions on, columns that were not present (or were not found to be useful) in the training data will be silently ignored.
It is worthwhile to make sure that you feed in all the most useful data points though, to make sure you can get the highest quality predictions.
=====================predictions===========================
[23.503099796820333, 32.63486484873551, 17.607843570794248, 22.96364141712182, 18.037259790025, 22.154154350077157, 18.157171399351753, 14.490724400217747, 20.91569106207268, 21.371745165599958, 19.978460029298827, 17.617959317911595, 6.657480263073871, 21.259425283809687, 19.30470390603625, 23.54754498054679, 20.616057833202493, 8.569816325663448, 45.01902942229479, 15.319975928505148, 23.84765254861352, 24.49050663723932, 12.344561585629016, 23.24874551694055, 15.137348894013865, 15.067038653704085, 21.674735923166942, 12.88017013620315, 19.43339890697579, 20.933210490656045, 20.235546222120107, 22.99264652948031, 20.45638944287541, 20.50831821637611, 14.026411558432988, 17.14000803427353, 34.322736768893236, 19.82116882409099, 20.757084718131125, 23.523990773770624, 17.92101235838185, 30.745980540024213, 45.09505946725109, 18.76719301853909, 23.69250732281568, 14.627546717865679, 15.404318347865019, 23.856332667077602, 18.597560915078148, 28.295069087679007, 20.335783749261154, 35.49551328178157, 17.049478769941757, 27.36240739278428, 49.168123673644864, 21.919364008618228, 16.431621230418827, 32.50614954154076, 22.60486571683311, 17.190717714534216, 24.86659240393153, 34.726632201151446, 32.56154963374883, 17.991423510542266, 23.19139847589728, 16.3827778391806, 13.763406903575234, 23.041746542718485, 28.897952087920405, 15.16115409656009, 20.54704218671605, 27.630784534960636, 9.265217126500687, 20.218468086624206, 22.678130640115423, 3.978712919679104, 20.458457441683915, 44.47945990229906, 12.603336785642627, 11.482102006681343, 21.066151218556975, 13.559181962607349, 21.19973222974325, 10.447704116792627, 20.110776756244167, 28.928923567731772, 15.527462244687818, 23.24725371877329, 25.743821297087276, 18.04671684265537, 22.950747524482065, 9.088864852661203, 19.075035374223955, 18.42257896844079, 23.564483816162195, 19.647455910849818, 44.12778583727594, 11.427374611849514, 12.040264853009598, 16.998049081305517, 20.25692214075818, 22.80453061159547]
=====================predictions===========================
[[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0]]
[Finished in 3.3s]

On top of the original official example, I added an extra predict_proba call to output class information. Note from the log above that this is only meaningful for classifiers: for this regression model, predict_proba just returns a constant [1, 0] pair for every row.
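If you actually want probabilities, train a classifier instead. Below is a minimal sketch (my own illustration, not from the original post) that derives a hypothetical binary label EXPENSIVE from the same Boston data, so that predict_proba returns real class probabilities:

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset

train_data, test_data = get_boston_dataset()

# Hypothetical binary label: is the price above the training median?
median_price = train_data.MEDV.median()
train_data['EXPENSIVE'] = (train_data.MEDV > median_price).astype(int)
test_data['EXPENSIVE'] = (test_data.MEDV > median_price).astype(int)
# Drop the original target so it cannot leak into the features.
train_data = train_data.drop('MEDV', axis=1)
test_data = test_data.drop('MEDV', axis=1)

column_descriptions = {
    'EXPENSIVE': 'output',
    'CHAS': 'categorical'
}
clf = Predictor(type_of_estimator='classifier', column_descriptions=column_descriptions)
clf.train(train_data)
# Each row now gets [P(class 0), P(class 1)] instead of a constant [1, 0].
print(clf.predict_proba(test_data)[:5])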

The complete program is as follows:

#!usr/bin/env python
#encoding:utf-8
from __future__ import division

'''
__Author__: 沂水寒城
Purpose: learning and practicing auto_ml

GitHub:
    https://github.com/yishuihanhan/auto_ml
Official docs:
    https://auto-ml.readthedocs.io/en/latest/formatting_data.html
'''


from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model



def bostonSimpleFunc():
    '''
    A simple example on the Boston housing data.
    '''
    train_data, test_data = get_boston_dataset()
    column_descriptions = {
        'MEDV': 'output',
        'CHAS': 'categorical'
    }
    ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
    ml_predictor.train(train_data)
    ml_predictor.score(test_data, test_data.MEDV)


def bostonWholeFunc():
    '''
    A more complete example on the Boston housing data, covering:
    train/test splitting, model training, model persistence,
    model loading, and prediction.
    '''
    train_data,test_data = get_boston_dataset()
    column_descriptions = {
      'MEDV': 'output', 
      'CHAS': 'categorical'
    }
    ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
    ml_predictor.train(train_data)
    test_score = ml_predictor.score(test_data, test_data.MEDV)
    file_name = ml_predictor.save()
    trained_model = load_ml_model(file_name)
    predictions = trained_model.predict(test_data)
    print('=====================predictions===========================')
    print(predictions)
    predictions = trained_model.predict_proba(test_data)
    print('=====================predictions===========================')
    print(predictions)


if __name__=='__main__':
    bostonSimpleFunc()

    bostonWholeFunc()

The corresponding GitHub and official documentation links are given at the beginning of the code; take a look if you are interested. Recorded here for future learning!
