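When a problem calls for reducing a data set with many variables to a few main components, PCA (principal component analysis) is a natural tool. The first step is to read the raw data into Python. The code is as follows (detailed comments in the code help with understanding):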
import openpyxl

# List of text-file names (without the .txt extension), in the same order as the modes below
filenames = []
# Transportation mode corresponding to each file number
excel_file_names = {1: 'walk',
                    2: 'car',
                    3: 'run',
                    4: 'scooter',
                    5: 'bike',
                    6: 'tramway',
                    7: 'bus',
                    8: 'train'}
# Iterate over the filenames; cnt is the 1-based number used to look up the mode
for cnt, name in enumerate(filenames, start=1):
    # Create a new Excel workbook
    wb = openpyxl.Workbook()
    # Rename the default worksheet to 'data' (create_sheet would leave an extra empty sheet behind)
    ws = wb.active
    ws.title = 'data'
    # Open the text file
    with open(f'E:/python_analyse_data/multimodal_transport_analytics/Collecty_data/{name}.txt', 'r') as data_file:
        # Read the contents of the data file line by line
        for contents in data_file:
            # Split each line by comma and convert it into a list
            xlsx_contents = contents.strip().split(',')
            print(xlsx_contents)
            # Append the row to the Excel worksheet
            ws.append(xlsx_contents)
    # Get the corresponding Excel filename based on the counter
    excel_file_name = excel_file_names[cnt]
    # Save the workbook as an Excel file
    wb.save(f'data_to_xlsx{excel_file_name}.xlsx')
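The script above writes every cell as text, because `split(',')` produces strings. Before PCA the values have to become numbers again. The following is only a minimal sketch of that step: the helper names `rows_to_matrix` and `load_matrix` are hypothetical, and it assumes every numeric row has the same number of columns and that any non-numeric row (such as a header) can be skipped.

```python
import numpy as np


def rows_to_matrix(rows):
    """Convert an iterable of rows of text cells into a float matrix.

    Rows that are empty or contain a non-numeric cell are skipped.
    """
    numeric_rows = []
    for row in rows:
        if not row:
            continue  # skip completely empty rows
        try:
            numeric_rows.append([float(cell) for cell in row])
        except (TypeError, ValueError):
            continue  # skip headers, blank cells, or malformed rows
    return np.array(numeric_rows, dtype=float)


def load_matrix(xlsx_path, sheet_name="data"):
    """Read one worksheet into a NumPy matrix (one row per sample)."""
    import openpyxl  # imported here so rows_to_matrix can be used without it

    wb = openpyxl.load_workbook(xlsx_path, read_only=True)
    try:
        ws = wb[sheet_name]
        # values_only=True yields plain tuples of cell values
        return rows_to_matrix(ws.iter_rows(values_only=True))
    finally:
        wb.close()
```

With the files produced earlier, a call such as `M = load_matrix('data_to_xlsxwalk.xlsx')` would give a matrix that can be passed on to the normalization step.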
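Next we apply principal component analysis to reduce the many variables to a few principal components (three in our problem). We first normalize the data with min-max scaling, then use the `PCA` class from scikit-learn (a third-party library, not something built into Python). The code is as follows (detailed comments in the code make it easier to follow):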
import numpy as np
from sklearn.decomposition import PCA

# Sample matrix (placeholder): fill in your data, one row per sample and one column per feature
M = []
M = np.asarray(M, dtype=float)
# Find the minimum and maximum values for each column
min_vals = np.min(M, axis=0)
max_vals = np.max(M, axis=0)
# Min-max normalization: rescale every column to the range [0, 1]
# (assumes no column is constant, otherwise the denominator is zero)
normalized_matrix = (M - min_vals) / (max_vals - min_vals)
# Run PCA on the normalized data, not on the raw matrix
data = normalized_matrix
# Perform Principal Component Analysis (PCA)
pca = PCA()
pca.fit(data)
# Get the principal axes (coefficient matrix); each row is one component
coeff = pca.components_
# Get the projected data matrix
score = pca.transform(data)
# Get the eigenvalues of the principal components
latent = pca.explained_variance_
# Get the explained variance ratio for each principal component (fractions between 0 and 1)
explained = pca.explained_variance_ratio_
# Output the explained variance ratio for the first few principal components
num_components = 3  # Assume output for the first three principal components
print('Explained variance ratio of the first', num_components, 'components:')
print(explained[:num_components])
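Since the matrices above are empty placeholders, here is a small end-to-end sketch that can be run as is. It is an illustration under stated assumptions, not the data from this post: the input is synthetic (six features generated from three hidden factors plus a little noise), the function name `minmax_pca` is hypothetical, and constant columns are handled by leaving them unscaled.

```python
import numpy as np
from sklearn.decomposition import PCA


def minmax_pca(M, n_components=3):
    """Min-max normalize each column of M, then project onto n_components PCs."""
    M = np.asarray(M, dtype=float)
    min_vals = M.min(axis=0)
    ranges = M.max(axis=0) - min_vals
    ranges[ranges == 0] = 1.0  # constant columns: avoid division by zero
    normalized = (M - min_vals) / ranges
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(normalized)  # fit and project in one step
    return scores, pca


# Synthetic stand-in data (an assumption, not the dataset used in this post)
rng = np.random.default_rng(0)
latent_factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 6))
M = latent_factors @ mixing + 0.05 * rng.normal(size=(200, 6))

scores, pca = minmax_pca(M, n_components=3)
print(scores.shape)  # one row per sample, one column per kept component
# Cumulative share of the variance captured by the first 1, 2, 3 components
print(np.cumsum(pca.explained_variance_ratio_))
```

Looking at the cumulative explained variance is one simple way to judge whether three components are enough for your own data: if the last value is far below 1, more components may be needed.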
That is how to extract data from large raw files and run PCA on it. I hope this post is helpful to you!