1. A Pipeline for data preparation and model building
Using a Pipeline to minimize data leakage
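A Pipeline addresses data leakage between the training data and the evaluation dataset. During data preparation, every data subset produced by the split must be processed in exactly the same way, for example by standardization (rescaling to zero mean and unit variance). Wrapping the preprocessing step and the model in one Pipeline ensures that the preprocessing is fitted only on the training portion of each split and then applied unchanged to the held-out portion.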
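The leakage-safe workflow described above can be sketched as follows. This is a minimal example under stated assumptions: because the data file is not reproduced here, synthetic data from `make_classification` (768 rows, 8 features) stands in for it, and the step names `standardize` and `lda`, the helper `make_model`, the 10 folds and `random_state=7` are illustrative choices. The second part repeats one fold by hand to show what the Pipeline does internally: the scaler is fitted on the training indices only and then applied to the test indices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data standing in for the Pima file (768 rows, 8 features)
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           n_redundant=0, random_state=7)


def make_model():
    """Hypothetical helper: a standardization step followed by LDA."""
    return Pipeline([
        ("standardize", StandardScaler()),
        ("lda", LinearDiscriminantAnalysis()),
    ])


kfold = KFold(n_splits=10, shuffle=True, random_state=7)
# cross_val_score clones the pipeline and fits it on the training folds only,
# so the scaler's mean/std are recomputed without the held-out fold each time
scores = cross_val_score(make_model(), X, y, cv=kfold)
print("mean accuracy: %.3f" % scores.mean())

# What the Pipeline does for a single fold, written out by hand
train_idx, test_idx = next(kfold.split(X))
scaler = StandardScaler().fit(X[train_idx])  # statistics from training rows only
lda = LinearDiscriminantAnalysis().fit(scaler.transform(X[train_idx]), y[train_idx])
manual = lda.score(scaler.transform(X[test_idx]), y[test_idx])
piped = make_model().fit(X[train_idx], y[train_idx]).score(X[test_idx], y[test_idx])
print("manual: %.3f  pipeline: %.3f" % (manual, piped))
```

Because `cross_val_score` clones and refits the whole Pipeline in each fold, the scaler's mean and standard deviation are never computed from the held-out fold. Standardizing the full dataset once before cross-validation would instead leak those statistics into every evaluation.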
# Data loading
from pandas import read_csv
# Cross-validation utilities
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# Model, pipeline container and preprocessing step
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Local path to the Pima Indians data file
filename='/home/duan/pima indians.txt'
names&#