Reading data from files and running PCA

This post shows how to apply principal component analysis (PCA) to a large dataset in Python, covering data loading, normalization, and a step-by-step PCA implementation with the sklearn library.

When we need to reduce many raw variables down to a few principal components, PCA (principal component analysis) is the tool to reach for. The first step is to read the raw data into Python; the code below does this (the comments explain each step):

import openpyxl

# Define a list of filenames
filenames = []

# Define transportation modes corresponding to each number
excel_file_names = {1: 'walk',
                    2: 'car',
                    3: 'run',
                    4: 'scooter',
                    5: 'bike',
                    6: 'tramway',
                    7: 'bus',
                    8: 'train'}

# Counter for assigning transportation modes
cnt = 1

# Iterate over the list of filenames
for name in filenames:
    # Create a new Excel workbook
    wb = openpyxl.Workbook()

    # Create a worksheet named 'data'
    ws = wb.create_sheet('data')

    # Open the text file
    with open(f'E:/python_analyse_data/multimodal_transport_analytics/Collecty_data/{name}.txt', 'r') as data_file:
        # Read the data file line by line
        for contents in data_file:
            # Split each comma-separated line into a list of fields
            xlsx_contents = contents.strip().split(',')
            print(xlsx_contents)

            # Append the row to the Excel worksheet
            ws.append(xlsx_contents)

    # Get the corresponding Excel filename based on the counter
    excel_file_name = excel_file_names[cnt]

    # Save the workbook as an Excel file
    wb.save(f'data_to_xlsx{excel_file_name}.xlsx')

    # Increment the counter for the next transportation mode
    cnt += 1
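Note that the cells written above are strings, while the PCA step later needs numbers. A minimal sketch of reading a worksheet back into a numeric NumPy array (using a small in-memory workbook with hypothetical sample rows in place of one of the saved `data_to_xlsx<mode>.xlsx` files) could look like this:

```python
import numpy as np
import openpyxl

# Build a small workbook in memory to stand in for one of the
# saved files; the two rows are hypothetical sample data
wb = openpyxl.Workbook()
ws = wb.create_sheet('data')
ws.append(['1.0', '2.5', '3.0'])   # rows were written as strings
ws.append(['4.0', '5.5', '6.0'])

# Read the rows back and convert each cell to float so the matrix
# can be fed to NumPy / sklearn later
rows = [[float(cell) for cell in row]
        for row in ws.iter_rows(values_only=True)]
M = np.array(rows)
print(M.shape)  # (2, 3)
```

The same `iter_rows(values_only=True)` loop works on a worksheet loaded with `openpyxl.load_workbook`.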

Next we use PCA to compress the many original variables into a few principal ones (three, in our problem). First we normalize the data, then we use the PCA implementation from the scikit-learn library. The code is below (again, commented for clarity):

import numpy as np

# Sample data matrix (placeholder: fill with the rows read from the spreadsheets)
M = []

# Find the minimum and maximum values for each column
min_vals = np.min(M, axis=0)  # Find the minimum value for each column
max_vals = np.max(M, axis=0)  # Find the maximum value for each column

# Normalization
normalized_matrix = (M - min_vals) / (max_vals - min_vals)
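To see the min-max formula in action, here is a tiny self-contained example with a hypothetical 3x2 matrix whose two columns are on very different scales:

```python
import numpy as np

# Hypothetical matrix: the second column is 100x the first
M = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

min_vals = np.min(M, axis=0)   # [1., 100.]
max_vals = np.max(M, axis=0)   # [3., 300.]

# Broadcasting subtracts/divides column-wise
normalized = (M - min_vals) / (max_vals - min_vals)
print(normalized)
# Every column is now scaled to [0, 1]:
# [[0.  0. ]
#  [0.5 0.5]
#  [1.  1. ]]
```

One caveat: if a column is constant, `max_vals - min_vals` is zero there and the division produces NaN, so constant columns should be dropped or guarded before normalizing.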

from sklearn.decomposition import PCA

# Data matrix (placeholder: fill with the normalized rows)
data = []

# Perform Principal Component Analysis (PCA)
pca = PCA()
pca.fit(data)

# Get the principal component loading matrix
coeff = pca.components_

# Get the projected data matrix
score = pca.transform(data)

# Get the eigenvalues of the principal components
latent = pca.explained_variance_

# Get the explained variance ratio for each principal component
explained = pca.explained_variance_ratio_

# Output the explained variance ratio for the first few principal components
num_components = 3  # Assume output for the first three principal components
print('Explained variance ratio of the first', num_components, 'components:')
print(explained[:num_components])
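Putting the pieces together, here is a runnable end-to-end sketch on synthetic data (an assumption standing in for the spreadsheet data: 100 samples of 6 features built from 3 underlying signals, so three components should capture almost all the variance):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the spreadsheet data: 6 observed features
# that are noisy linear mixtures of 3 underlying signals
latent_signals = rng.normal(size=(100, 3))
mixing = rng.normal(size=(3, 6))
data = latent_signals @ mixing + 0.01 * rng.normal(size=(100, 6))

# Min-max normalization, as in the snippet above
data = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

# Keep only the first three principal components
pca = PCA(n_components=3)
score = pca.fit_transform(data)

print(score.shape)                           # (100, 3)
print(pca.explained_variance_ratio_.sum())   # close to 1.0 by construction
```

Passing `n_components=3` to the constructor is equivalent to fitting a full `PCA()` and slicing the first three columns of the score matrix, but avoids computing components you will discard.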

That is how to extract data from large files and run PCA on it. I hope this post helps!
