Question 2: Lung Cancer Data
The data.txt file is a 12625 x 56 matrix; each row corresponds to a gene and each column to an individual case. The 56 cases are grouped as follows:
- Columns 1~20: pulmonary carcinoid samples (Carcinoid);
- Columns 21~33: colon cancer metastasis samples (Colon);
- Columns 34~50: normal lung samples (Normal);
- Columns 51~56: small cell carcinoma samples (SmallCell).
Before the following analyses, first center each row of the data (i.e. subtract each row's mean), then transpose the matrix so that rows are cases and columns are genes.
library(tidyverse)
lungcancer = read.table("lungcancer.txt")
## center each row of the data and transpose
data = data.frame(t(
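The preprocessing step described above (center each row, then transpose) can be sketched as follows. This is only a minimal illustration, not the author's code: the matrix `expr` is a small simulated stand-in for the real 12625 x 56 data, and the names `expr`, `centered`, and `label` are assumptions introduced here.

```r
# Hypothetical stand-in for the expression matrix: 200 genes (rows) x 56 cases (columns)
set.seed(1)
expr <- matrix(rnorm(200 * 56), nrow = 200, ncol = 56)

# Center each row: the vector of row means is recycled down the columns,
# so entry [i, j] has the mean of row i subtracted
centered <- expr - rowMeans(expr)

# Transpose so that rows are cases and columns are genes
data <- data.frame(t(centered))

# Attach class labels following the column ranges listed above
data$label <- factor(rep(c("Carcinoid", "Colon", "Normal", "SmallCell"),
                         times = c(20, 13, 17, 6)))

dim(data)          # cases x (genes + label column)
table(data$label)  # number of cases per class
```

After the transpose, every gene column has mean zero, which is the form that PCA and the classifiers expect as input.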
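This post analyzes the 12625 x 56 lung cancer data matrix using principal component analysis (PCA), nominal logistic regression, LDA, and SVM, as well as K-means and hierarchical clustering. The PCA results show that the first three components explain 55.8% of the variance; these components are used to draw the scatter plots. Logistic regression, LDA, and SVM give identical predictions, with an accuracy of 1. However, K-means and hierarchical clustering fail to group the samples sensibly, and the resulting clusters do not match the actual classes.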
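As a rough illustration of the unsupervised part of this workflow, the sketch below runs PCA, K-means, and hierarchical clustering using base R only. It rests on several assumptions: the data are simulated rather than the real lung cancer matrix, the clustering is run on the first three principal component scores (the post may have clustered something else), and complete linkage is an arbitrary choice. The numbers it produces say nothing about the results reported in the post.

```r
set.seed(1)

# Class labels following the column ranges of the original matrix
labels <- factor(rep(c("Carcinoid", "Colon", "Normal", "SmallCell"),
                     times = c(20, 13, 17, 6)))

# Hypothetical stand-in data: 56 cases x 200 genes with a class-specific shift
x <- matrix(rnorm(56 * 200), nrow = 56) + as.integer(labels)

# PCA on the column-centered data (prcomp centers columns by default)
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumsum(var_explained)[3]   # share captured by the first three components

# Scores on the first three components, e.g. for scatter plots
scores <- pca$x[, 1:3]

# K-means with four clusters and several random starts
km <- kmeans(scores, centers = 4, nstart = 20)

# Hierarchical clustering (complete linkage), cut into four groups
hc <- hclust(dist(scores), method = "complete")
hc_groups <- cutree(hc, k = 4)

# Cross-tabulate cluster assignments against the known classes
table(kmeans = km$cluster, truth = labels)
table(hclust = hc_groups, truth = labels)
```

A cross-tabulation in which each cluster spreads over several true classes is what "clusters do not match the actual classes" would look like in practice.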