Steps for training a model with Python:
- Import the data
- Clean the data
- Split the data into training and test sets
- Create a model
- Train the model
- Make predictions
- Score and evaluate the model
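The cleaning step is not shown in the examples that follow. A minimal sketch with pandas, assuming a small made-up table with `age`, `gender`, and `genre` columns rather than the real `music.csv`; what counts as "dirty" depends on the actual data, so dropping missing rows and duplicates is only one possible choice:

```python
import pandas as pd

# Made-up raw data: one missing age, one missing label, one duplicate row.
raw = pd.DataFrame({
    'age':    [20, 23, None, 26, 26, 30],
    'gender': [1, 1, 0, 0, 0, 1],
    'genre':  ['HipHop', 'HipHop', 'Dance', 'Acoustic', 'Acoustic', None],
})

# Drop rows with missing values, then drop exact duplicate rows.
clean = raw.dropna().drop_duplicates().reset_index(drop=True)

# The missing value turned 'age' into floats; restore integers.
clean['age'] = clean['age'].astype(int)

print(clean)
```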
Using DecisionTreeClassifier as an example:
Feed all of the data straight into the model, then predict.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Load the data and separate the features from the label column.
music_data = pd.read_csv('music.csv')
x = music_data.drop(columns=['genre'])
y = music_data['genre']

# Train on all rows, then predict for two new [age, gender] samples.
model = DecisionTreeClassifier()
model.fit(x, y)
predictions = model.predict([[21, 1], [22, 0]])
print(predictions)
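When a model is fitted on a DataFrame, scikit-learn 1.0 and later warn if `predict` receives a plain list, because the feature names are missing. A sketch that passes a DataFrame instead, assuming the feature columns are named `age` and `gender` (the names used for `feature_names` later in these notes) and using made-up rows in place of `music.csv`:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in for music.csv (assumed columns: age, gender, genre).
music_data = pd.DataFrame({
    'age':    [20, 23, 25, 26, 29, 30, 20, 21, 25, 26, 27, 30],
    'gender': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    'genre':  ['HipHop', 'HipHop', 'HipHop', 'Jazz', 'Jazz', 'Jazz',
               'Dance', 'Dance', 'Dance', 'Acoustic', 'Acoustic', 'Acoustic'],
})
x = music_data.drop(columns=['genre'])
y = music_data['genre']

model = DecisionTreeClassifier()
model.fit(x, y)

# Build the query with the same column names and order as the training data.
new_people = pd.DataFrame([[21, 1], [22, 0]], columns=x.columns)
predictions = model.predict(new_people)
print(predictions)
```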
Split the data into training data and test data, train on the training part, then score the model on the test part.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

music_data = pd.read_csv('music.csv')
x = music_data.drop(columns=['genre'])
y = music_data['genre']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

model = DecisionTreeClassifier()
model.fit(x_train, y_train)
predictions = model.predict(x_test)

score_results = accuracy_score(y_test, predictions)
print(score_results)
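With `test_size=0.1` and no fixed seed, the score changes from run to run because a different handful of rows lands in the test set each time. A sketch of one way to make the result repeatable, assuming made-up data and an arbitrary seed of 42; `stratify=y` is an optional extra, not part of the original code, that keeps the genre proportions the same in both parts:

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in for music.csv: 40 rows, 10 per genre.
rows = []
for age in range(20, 40):
    rows.append({'age': age, 'gender': 1,
                 'genre': 'HipHop' if age < 30 else 'Jazz'})
    rows.append({'age': age, 'gender': 0,
                 'genre': 'Dance' if age < 30 else 'Acoustic'})
music_data = pd.DataFrame(rows)

x = music_data.drop(columns=['genre'])
y = music_data['genre']

# random_state fixes the shuffle, so the same rows are held out every run.
# stratify=y keeps the genre proportions equal in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=42, stratify=y)

# A fixed seed on the model makes tie-breaking between splits repeatable too.
model = DecisionTreeClassifier(random_state=42)
model.fit(x_train, y_train)

score = accuracy_score(y_test, model.predict(x_test))
print(len(x_test), score)
```

A single held-out set this small gives a coarse score (each row is worth 25% here), so treat the number as a rough check rather than a precise measurement.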
After training, save the trained model to a file so that it can be loaded later and used for prediction without retraining.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import joblib

music_data = pd.read_csv('music.csv')
x = music_data.drop(columns=['genre'])
y = music_data['genre']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

model = DecisionTreeClassifier()
model.fit(x_train, y_train)

# Write the trained model to a file.
joblib.dump(model, 'music-rec.joblib')

# Load the trained model back from the file and use it to predict.
model = joblib.load('music-rec.joblib')
predictions = model.predict([[21, 1], [22, 0]])
print(predictions)
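In practice the saving and the loading usually live in separate scripts: one trains and dumps the model, the other only loads it and predicts. A sketch of that separation as two hypothetical functions, `train_and_save` and `load_and_predict` (neither name comes from the original notes), using made-up data and a temporary directory in place of `music.csv` and the real output path:

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def train_and_save(path):
    """Train on made-up data (a stand-in for music.csv) and dump the model."""
    data = pd.DataFrame({
        'age':    [20, 23, 25, 26, 29, 30, 20, 21, 25, 26, 27, 30],
        'gender': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
        'genre':  ['HipHop', 'HipHop', 'HipHop', 'Jazz', 'Jazz', 'Jazz',
                   'Dance', 'Dance', 'Dance',
                   'Acoustic', 'Acoustic', 'Acoustic'],
    })
    x = data.drop(columns=['genre'])
    y = data['genre']
    model = DecisionTreeClassifier().fit(x, y)
    joblib.dump(model, path)


def load_and_predict(path, rows):
    """Load a previously dumped model; no training data is needed here."""
    model = joblib.load(path)
    query = pd.DataFrame(rows, columns=['age', 'gender'])
    return model.predict(query)


with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, 'music-rec.joblib')
    train_and_save(model_path)
    file_exists = os.path.exists(model_path)
    predictions = load_and_predict(model_path, [[21, 1], [22, 0]])

print(predictions)
```

Files written by `joblib.dump` are pickle-based, so only load files from a source you trust, and load them with the same scikit-learn version that produced them where possible.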
Export a .dot file and preview it in VS Code to inspect the decision logic the trained model has learned.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import tree

music_data = pd.read_csv('music.csv')
x = music_data.drop(columns=['genre'])
y = music_data['genre']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

model = DecisionTreeClassifier()
model.fit(x_train, y_train)

# Write the fitted tree in Graphviz .dot format.
tree.export_graphviz(model,
                     out_file='music-plot.dot',
                     feature_names=['age', 'gender'],
                     class_names=sorted(y_train.unique()),
                     label='all',
                     rounded=True,
                     filled=True)
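If Graphviz or a .dot preview is not at hand, `sklearn.tree.export_text` prints the same decision rules as plain text. A sketch using made-up data in place of `music.csv`; the fixed `random_state=0` is an arbitrary choice to keep the printed tree stable between runs:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up stand-in for music.csv (assumed columns: age, gender, genre).
music_data = pd.DataFrame({
    'age':    [20, 23, 25, 26, 29, 30, 20, 21, 25, 26, 27, 30],
    'gender': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    'genre':  ['HipHop', 'HipHop', 'HipHop', 'Jazz', 'Jazz', 'Jazz',
               'Dance', 'Dance', 'Dance', 'Acoustic', 'Acoustic', 'Acoustic'],
})
x = music_data.drop(columns=['genre'])
y = music_data['genre']

model = DecisionTreeClassifier(random_state=0)
model.fit(x, y)

# feature_names must be listed in the same order as the training columns.
rules = export_text(model, feature_names=list(x.columns))
print(rules)
```

Each indented line is one test on a feature, and each `class:` line is a leaf, so the output can be read top to bottom as nested if/else rules.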