1. Data preprocessing
Surprisingly, AutoGluon needs no manual handling of string-typed features; for simplicity, only the yes/no (and gender) columns are mapped to integers here. Nor does the label column need to be split out beforehand — it is simply named later when training.
from autogluon.tabular import TabularDataset, TabularPredictor
from sklearn.utils import resample
from sklearn.model_selection import train_test_split
import pandas as pd

df1 = pd.read_csv('/Users/johnny/Downloads/CreditMaster/heart_2020_cleaned.csv')

# The dataset is heavily imbalanced, so upsample the minority (positive) class
# with replacement until it matches the majority class in size.
healthy = df1[df1['HeartDisease'] == 'No']
unhealthy = df1[df1['HeartDisease'] == 'Yes']
up_sampled = resample(unhealthy, replace=True, n_samples=len(healthy))
df_new = pd.concat([healthy, up_sampled])

# Map the binary string columns to integers.
df_new = df_new.replace({'No': 0, 'Yes': 1})
df_new = df_new.replace({'Male': 0, 'Female': 1})

train, test = train_test_split(df_new, test_size=0.1, random_state=0)
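The effect of the upsampling step above can be sanity-checked on a toy frame. The column name mirrors the real dataset, but the data here is made up for illustration:

```python
import pandas as pd
from sklearn.utils import resample

# Toy frame with the same kind of 'No'/'Yes' imbalance as the real dataset.
df = pd.DataFrame({'HeartDisease': ['No'] * 8 + ['Yes'] * 2})

healthy = df[df['HeartDisease'] == 'No']
unhealthy = df[df['HeartDisease'] == 'Yes']

# Resample the minority class with replacement up to the majority count.
up_sampled = resample(unhealthy, replace=True, n_samples=len(healthy), random_state=0)
df_new = pd.concat([healthy, up_sampled])

print(df_new['HeartDisease'].value_counts().to_dict())  # {'No': 8, 'Yes': 8}
```

After concatenation, both classes contribute the same number of rows, which is why the 700k+ training rows in the log below exceed the original ~320k-row dataset.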
2. Model training
Specify the heart-disease column as the prediction label, then just call fit.
label = 'HeartDisease'
# Note: presets is an argument of fit(), not of the TabularPredictor constructor.
predictor = TabularPredictor(label=label).fit(train, presets='best_quality')

# What happened during training?
results = predictor.fit_summary()
No path specified. Models will be saved in: "AutogluonModels/ag-20220418_130233/"
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20220418_130233/"
AutoGluon Version: 0.2.0
Train Data Rows: 526359
Train Data Columns: 17
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [1, 0]
If 'binary' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping: class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 3317.47 MB
Train Data (Original) Memory Usage: 172.08 MB (5.2% of available memory)
Warning: Data size prior to feature transformation consumes 5.2% of available memory. Consider increasing memory or subsampling the data to avoid instability.
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('float', []) : 4 | ['BMI', 'PhysicalHealth', 'MentalHealth', 'SleepTime']
('int', []) : 9 | ['Smoking', 'AlcoholDrinking', 'Stroke', 'DiffWalking', 'Sex', ...]
('object', []) : 4 | ['AgeCategory', 'Race', 'Diabetic', 'GenHealth']
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 4 | ['AgeCategory', 'Race', 'Diabetic', 'GenHealth']
('float', []) : 4 | ['BMI', 'PhysicalHealth', 'MentalHealth', 'SleepTime']
('int', []) : 9 | ['Smoking', 'AlcoholDrinking', 'Stroke', 'DiffWalking', 'Sex', ...]
2.5s = Fit runtime
17 features in original data used to generate 17 features in processed data.
Train Data (Processed) Memory Usage: 56.85 MB (1.7% of available memory)
Data preprocessing and feature engineering runtime = 2.87s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric argument of fit()
Automatically generating train/validation split with holdout_frac=0.01, Train Rows: 521095, Val Rows: 5264
Fitting model: KNeighborsUnif ...
0.8575 = Validation accuracy score
210.89s = Training runtime
0.8s = Validation runtime
Fitting model: KNeighborsDist ...
0.8733 = Validation accuracy score
222.96s = Training runtime
0.54s = Validation runtime
Fitting model: LightGBMXT ...
0.762 = Validation accuracy score
2.54s = Training runtime
0.03s = Validation runtime
Fitting model: LightGBM ...
[1000] train_set's binary_error: 0.199513 valid_set's binary_error: 0.206117
[2000] train_set's binary_error: 0.177611 valid_set's binary_error: 0.18712
[3000] train_set's binary_error: 0.159547 valid_set's binary