Data Processing
Steps:
- Import the Data
- Clean the Data
- Split the Data into Training/Test Sets
- Create a Model
- Train the Model
- Make Predictions
- Evaluate and Improve
Libraries:
NumPy
pandas
Matplotlib
scikit-learn (sklearn)
Prerequisites:
1. Download the Anaconda Distribution from anaconda.com
2. Select the video game sales data set from kaggle.com
3. Choose the data set music.csv
Import the Data
Jupyter notebook:
import pandas as pd
music_data = pd.read_csv('music.csv')
music_data
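After loading, a few pandas calls help check what was actually read. The sketch below is only an illustration: music.csv itself is not reproduced in these notes, so it reads a made-up stand-in from a string. The column names age, gender, and genre are taken from the later sections; the rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for music.csv: same column names, made-up rows.
csv_text = """age,gender,genre
20,1,HipHop
23,1,HipHop
26,1,Jazz
20,0,Dance
27,0,Acoustic
"""

music_data = pd.read_csv(io.StringIO(csv_text))

print(music_data.shape)       # (number of rows, number of columns)
print(music_data.head())      # first rows of the table
print(music_data.describe())  # summary statistics for the numeric columns
```

With the real file, replace io.StringIO(csv_text) with 'music.csv'.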
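Prepare the Data
Split the data set into two: the input features (X) and the output labels (y).
In:
X = music_data.drop(columns=['genre'])  # every column except the label
y = music_data['genre']                 # the label column
X
y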
Out:
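Since the contents of music.csv are not shown in these notes, here is a minimal sketch of what the split produces, using a hypothetical five-row table with the same column names. drop returns a new DataFrame, so music_data itself keeps its genre column.

```python
import pandas as pd

# Hypothetical rows standing in for music.csv.
music_data = pd.DataFrame({
    'age': [20, 23, 26, 20, 27],
    'gender': [1, 1, 1, 0, 0],
    'genre': ['HipHop', 'HipHop', 'Jazz', 'Dance', 'Acoustic'],
})

X = music_data.drop(columns=['genre'])  # input features: age, gender
y = music_data['genre']                 # output labels: genre

print(list(X.columns))  # ['age', 'gender']
print(X.shape)          # (5, 2)
print(y.shape)          # (5,)
```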
Learning and Predicting
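scikit-learn is a machine learning library for Python.
Simple and efficient tools for data mining and data analysis
Reusable by everyone in a variety of contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable - BSD license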
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(X,y)
predictions = model.predict([ [21,1],[22,0] ])
predictions
Out:
array(['HipHop', 'Dance'], dtype=object)
Calculating the Accuracy
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
model = DecisionTreeClassifier()
model.fit(X_train,y_train)
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
score
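The score is the fraction of test rows whose predicted genre matches the true genre. The sketch below recomputes that fraction by hand on made-up labels; the accuracy function is a hypothetical helper written for illustration, not part of scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up labels: three of the four predictions are correct.
y_test = ['HipHop', 'Jazz', 'Dance', 'Acoustic']
predictions = ['HipHop', 'Jazz', 'Dance', 'Classical']

score = accuracy(y_test, predictions)
print(score)  # 0.75
```

Because train_test_split shuffles the rows randomly, the score can change from run to run. Passing a fixed random_state to train_test_split makes the split repeatable.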
Persisting Models
import joblib  # sklearn.externals.joblib is no longer available in recent scikit-learn releases
model = joblib.load('music-recommender.joblib')
predictions = model.predict([[21,1]])
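Loading only works if the trained model was saved beforehand. A minimal sketch of the full round trip follows, assuming the standalone joblib package is installed. The training rows are hypothetical, and a temporary directory is used so that no file is left behind.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training rows: [age, gender] -> genre.
X = [[20, 1], [23, 1], [26, 1], [20, 0], [27, 0]]
y = ['HipHop', 'HipHop', 'Jazz', 'Dance', 'Acoustic']

model = DecisionTreeClassifier()
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'music-recommender.joblib')
    joblib.dump(model, path)      # save the trained model to disk
    restored = joblib.load(path)  # read it back without retraining

# The restored model should give the same predictions as the original.
same = list(restored.predict(X)) == list(model.predict(X))
print(same)
```

In a notebook, the dump call would normally run once after training, and later sessions would only call joblib.load.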
Visualizing a Decision Tree
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(X,y)
tree.export_graphviz(model, out_file='music-recommender.dot',
                     feature_names=['age','gender'],
                     class_names=sorted(y.unique()),
                     label='all',
                     filled=True,
                     rounded=True)
This generates a music-recommender.dot file. Drag it into VS Code and install a DOT extension to view it.
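If a DOT viewer is not available, the same tree can be inspected as plain text. This sketch swaps in a different function, export_text from sklearn.tree, and trains on hypothetical rows, so the printed splits will not match those from the real music.csv.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training rows: [age, gender] -> genre.
X = [[20, 1], [23, 1], [26, 1], [20, 0], [27, 0]]
y = ['HipHop', 'HipHop', 'Jazz', 'Dance', 'Acoustic']

model = DecisionTreeClassifier()
model.fit(X, y)

# One line per node: split conditions on age/gender, then the predicted class.
tree_text = export_text(model, feature_names=['age', 'gender'])
print(tree_text)
```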