Note: this article is reproduced from the AI community Venus AI.
For more AI material, see the original site ([www.aideeplearning.cn])
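Predicting the Career Longevity of NBA Rookies
This project uses a binary classification model built with Scikit-learn to predict whether an NBA rookie will last 5 years in the league, given information such as games played, assists, steals, and turnovers.
Dataset source: data.world
We will focus on:
- 1) Feature selection using a correlation heatmap
- 2) Logistic regression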
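The two focus points above can be sketched end to end on synthetic data. This is a minimal illustration, not the article's own pipeline: the helper `select_by_correlation`, the 0.2 threshold, and the toy columns `signal`, `noise`, and `label` are all assumptions made for the example. The same correlation matrix could be passed to `sns.heatmap` for visual inspection.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def select_by_correlation(df, target, threshold=0.2):
    """Hypothetical helper: keep features whose absolute Pearson
    correlation with the target column is at least `threshold`."""
    corr_with_target = df.corr()[target].drop(target)
    return corr_with_target[corr_with_target.abs() >= threshold].index.tolist()


# Synthetic stand-in data (NOT the NBA dataset): one informative
# feature, one pure-noise feature, and a binary label.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
noise = rng.normal(size=n)
label = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)
toy = pd.DataFrame({"signal": signal, "noise": noise, "label": label})

# Step 1: correlation-based feature selection.
selected = select_by_correlation(toy, "label")

# Step 2: logistic regression on the selected features.
X_train, X_test, y_train, y_test = train_test_split(
    toy[selected], toy["label"],
    test_size=0.25, random_state=0, stratify=toy["label"],
)
model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print("selected features:", selected)
print("test accuracy:", round(accuracy, 3))
```

The threshold is a judgment call: a higher value keeps fewer features and risks discarding weak but useful predictors.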
Part 1: Import the Scientific Computing Libraries and Load the Dataset
Import the scientific computing libraries
import pandas as pd  # load and manipulate data
import numpy as np  # numerical helpers (mean, standard deviation)
import matplotlib.pyplot as plt  # draw graphs
import seaborn as sns  # statistical visualization (heatmaps)
from sklearn.model_selection import train_test_split  # split data into training and testing sets
from sklearn.linear_model import LogisticRegression  # logistic regression classifier
from sklearn.preprocessing import MinMaxScaler  # scale features to the [0, 1] range
import sklearn.metrics as metrics  # evaluation metrics
from sklearn.metrics import confusion_matrix  # confusion matrix
from yellowbrick.classifier import ROCAUC  # ROC/AUC visualizer
Load the dataset
nba = pd.read_csv('./data/nba_logreg.csv')
nba.head()
The features in the dataset are described as follows:
Part 2: Data Exploration
# check class imbalance
nba['TARGET_5Yrs'].value_counts()
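Raw counts are easier to interpret next to class proportions. The sketch below shows one way to summarize both; the helper `class_balance` and the eight-element toy series are hypothetical and only stand in for the real `TARGET_5Yrs` column.

```python
import pandas as pd


def class_balance(labels):
    """Hypothetical helper: per-class counts and shares of a label column."""
    counts = labels.value_counts()
    shares = labels.value_counts(normalize=True)
    return pd.DataFrame({"count": counts, "share": shares})


# Toy labels for illustration only (NOT taken from the NBA dataset).
toy_target = pd.Series([1, 1, 1, 0, 1, 0, 1, 1], name="TARGET_5Yrs")

summary = class_balance(toy_target)
# Ratio of the majority class to the minority class.
imbalance_ratio = summary["count"].max() / summary["count"].min()

print(summary)
print("majority/minority ratio:", imbalance_ratio)
```

On the real data the same call would be `class_balance(nba['TARGET_5Yrs'])`. A ratio far from 1 suggests stratifying the train/test split and looking beyond plain accuracy when evaluating the model.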