PCA Dimensionality Reduction for Classification
What is PCA?
Principal Component Analysis (PCA) is a common feature extraction technique in data science that uses matrix factorization to reduce the dimensionality of data by projecting it into a lower-dimensional space.
Real-world datasets often have too many features. The more features there are, the harder it is to visualize the data and work with it. Often many of the features are correlated and therefore redundant. This is where feature extraction comes into play.
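To make the matrix-factorization idea concrete, here is a minimal NumPy sketch of PCA computed from the singular value decomposition (SVD) of the centered data. It runs on synthetic data (an assumption, not the Ionosphere set) in which six features are noisy mixtures of two hidden factors, so the features are strongly correlated and two components should capture nearly all of the variance. The helper name `pca_svd` is hypothetical; the result is cross-checked against scikit-learn's `PCA`, whose component signs may differ from the hand-rolled version.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_svd(X, n_components):
    """Hypothetical helper: project X onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction is S**2 / (n_samples - 1).
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return X_centered @ components.T, ratio[:n_components]


# Synthetic stand-in: 351 rows, six correlated features driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(351, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(351, 6))

Z, ratio = pca_svd(X, n_components=2)
print(Z.shape)      # (351, 2)
print(ratio.sum())  # share of total variance kept by the two components

# Cross-check with scikit-learn; compare absolute values because signs may flip.
pca = PCA(n_components=2, svd_solver="full")
Z_sk = pca.fit_transform(X)
print(np.allclose(np.abs(Z), np.abs(Z_sk)), np.allclose(ratio, pca.explained_variance_ratio_))
```

The six original columns collapse to two new ones with almost no loss, which is exactly the redundancy described above.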
About the Data:
The dataset used in this article is the Ionosphere Dataset from the UCI Machine Learning Repository. It is a binary classification problem with 351 observations and 34 features.
Preparing the Dataset:
- Importing the necessary libraries and reading the dataset
- Preprocessing the dataset
- Standardization
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load the data: 34 feature columns followed by the class label ('g' or 'b')
data = pd.read_csv("ionosphere.csv", header=None)
X = data.iloc[:, :-1]
# Encode the label as a 1-D integer array: 'g' (good) -> 1, 'b' (bad) -> 0
y = (data.iloc[:, -1] == 'g').astype(int).to_numpy()

# Hold out 30% of the observations for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Standardize: fit the scaler on the training split only, then apply it to both splits
std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)
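`StandardScaler` rescales each feature to zero mean and unit variance, z = (x - mean_train) / std_train, with both statistics computed on the training split only. The sketch below reproduces that transform by hand on random stand-in arrays (an assumption, not the real data) shaped like a 70/30 split of 351 rows, i.e. 245 training and 106 test rows. It also mirrors how `StandardScaler` treats a constant column: the zero standard deviation is replaced by 1, so the column maps to all zeros instead of triggering a division by zero.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random stand-ins for the 245-row training split and 106-row test split.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=5.0, scale=3.0, size=(245, 34))
X_test = rng.normal(loc=5.0, scale=3.0, size=(106, 34))
X_train[:, 1] = 0.0  # a constant feature, to show the zero-variance handling
X_test[:, 1] = 0.0

std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)

# The same computation by hand, using training-split statistics for both splits.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
sigma[sigma == 0] = 1.0      # constant columns are scaled by 1, not 0
manual_train = (X_train - mu) / sigma
manual_test = (X_test - mu) / sigma

print(np.allclose(X_train_std, manual_train), np.allclose(X_test_std, manual_test))
```

Reusing the training statistics on the test split is what keeps test information from leaking into preprocessing.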
Logistic Regression model using all 34 features:
The training data has 34 features.
- After preprocessing, a Logistic Regression model is trained on the training data for binary classification
- Fine-tuning the Logistic Regression model to find the best hyperparameters
- Computing the training and test accuracy and F1 score
![Image for post](https://i-blog.csdnimg.cn/blog_migrate/47775a5e19fc07271f75170bb89f2768.png)
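The training, fine-tuning, and evaluation steps listed above might look like the following sketch. It is not the author's code: it uses synthetic data from `make_classification` with the same shape as the Ionosphere set (351 rows, 34 features, two classes), tunes only the regularization strength `C` of `LogisticRegression` with `GridSearchCV` over a hypothetical grid, and then reports accuracy and F1 on both splits.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the Ionosphere shape: 351 x 34, binary labels.
X, y = make_classification(n_samples=351, n_features=34, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaling inside the pipeline refits the scaler on each cross-validation training fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}  # hypothetical grid
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

best = search.best_estimator_
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = best.predict(X_part)
    print(f"{name}: accuracy={accuracy_score(y_part, pred):.3f}, f1={f1_score(y_part, pred):.3f}")
print("best params:", search.best_params_)
```

Keeping `StandardScaler` inside the pipeline means each validation fold is scaled with statistics from its own training folds, so the cross-validated score is not inflated by leakage.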
- Training