Overview
In this chapter, I will use PCA for data visualization. Visualizing 2- or 3-dimensional data is not that challenging; here, however, each sample has 13 features.
I will use PCA to reduce that 13-dimensional data to 2 dimensions so that you can plot it and, hopefully, understand it better.
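Here is a rough sketch of the whole reduction. It assumes scikit-learn is installed and expresses the reduction as a StandardScaler followed by PCA(n_components=2). Randomly generated numbers stand in for the 13 Cleveland features, so the data and the names `X`, `reducer` and `X_2d` are illustrative only and are not this chapter's own code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: 152 samples x 13 features
rng = np.random.default_rng(0)
X = rng.normal(size=(152, 13))

# Standardize every feature, then keep only the first two principal components
reducer = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = reducer.fit_transform(X)

print(X_2d.shape)  # (152, 2)

# Fraction of the original variance captured by each of the two components
pca = reducer.named_steps["pca"]
print(pca.explained_variance_ratio_)
```

The two columns of `X_2d` are what you would pass to a scatter plot, for example colouring each point by its label in `ytrain`. `explained_variance_ratio_` tells you how much of the original variance the 2-D view keeps.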
Step 1: Standardize the Data
PCA is affected by scale, so you need to scale the features in your data before applying it. PCA looks for the directions of largest variance, and a feature measured on a large numeric range would otherwise dominate those directions. Use StandardScaler to standardize the dataset’s features onto unit scale (mean = 0 and variance = 1), which many machine learning algorithms also need in order to perform well.
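The effect of scale can be illustrated with a small sketch, assuming scikit-learn and NumPy. It uses two made-up, independent features whose spreads differ by a factor of 100; these are not Cleveland columns. StandardScaler rescales each column as z = (x - mean) / std, and the sketch compares the first principal component before and after that step.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two independent, hypothetical features on very different scales
small = rng.normal(loc=0.0, scale=1.0, size=500)    # spread of about 1 unit
large = rng.normal(loc=0.0, scale=100.0, size=500)  # spread of about 100 units
X = np.column_stack([small, large])

# Raw data: the first principal component points almost entirely along
# the large-scale feature, simply because its variance is much bigger
raw_pc1 = PCA(n_components=1).fit(X).components_[0]
print("raw PC1 weights:", np.round(np.abs(raw_pc1), 3))

# Standardized data: each column becomes z = (x - mean) / std
X_std = StandardScaler().fit_transform(X)
print("means:", np.round(X_std.mean(axis=0), 6))
print("variances:", np.round(X_std.var(axis=0), 6))

# With equal variances, both features get equal weight in the first component
std_pc1 = PCA(n_components=1).fit(X_std).components_[0]
print("standardized PC1 weights:", np.round(np.abs(std_pc1), 3))
```

In practice, the scaler is usually fitted on the training features only and then reused for the test features, e.g. `scaler = StandardScaler().fit(xtrain)` followed by `scaler.transform(xtrain)` and `scaler.transform(xtest)`. This way both sets are scaled with the training set's mean and variance.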
import pandas as pd

## load data
trainSet = pd.read_csv("clevelandtrain.csv")
testSet = pd.read_csv("clevelandtest.csv")
# features: every column except the heart-disease label; labels: that column alone
xtrain = trainSet.drop(columns=["heartdisease::category|0|1"]).to_numpy() # (152, 13)
ytrain = trainSet["heartdisease::category|0|1"].to_numpy() # (152,)
xtest = testSet.drop(columns=["heartdisease::category|0|1"]).to_numpy() # (145, 13)
ytest = testSet["heartdisease::category|0|1"].to_numpy() # (145,)
print("the first 4 raw data is:\n"