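Introduction
RNA-seq datasets pose considerable challenges when it comes to identifying biologically relevant features for downstream analysis and data mining. The standard approach is differential gene expression (DGE) analysis, but because it is univariate, its effectiveness can be limited by the data. For complex datasets, an alternative is to use machine learning (ML) tools, which try to capture non-linear relationships between features and focus on generalizability rather than statistical significance. This approach, however, produces multiple feature lists that may perform similarly on classification metrics. There is therefore a clear need for a cohesive workflow that combines robust feature selection across different ML methods with an evaluation of the biological relevance of the resulting feature lists. By considering both sets of criteria, such a combined approach makes it possible to prioritize the best-performing lists.
Today we introduce the GeneSelectR package, which combines machine learning with bioinformatic data-mining methods to improve feature selection. With GeneSelectR, features can be selected from a normalized RNA-seq dataset using a variety of ML methods and user-defined parameters. Their biological relevance is then assessed with Gene Ontology (GO) enrichment analysis, followed by semantic similarity analysis of the resulting GO terms. In addition, the package calculates similarity coefficients and the fractions of GO terms of interest.
GeneSelectR therefore optimizes ML performance and rigorously assesses the biological relevance of the different lists, providing a way to prioritize feature lists according to the biological question. When applied to the TCGA-BRCA dataset, the GeneSelectR workflow generated several feature lists using different ML methods and DGE analysis. Using the various functions in GeneSelectR, these lists could be evaluated on both ML performance and biological relevance. This comprehensive evaluation helps pick the best lists: those that combine strong ML performance with high relevance to the biological question, while keeping the number of features manageable and highly specific.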
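Feature selection procedure
The core function, GeneSelectR(), selects genes using several methods and evaluates their performance with cross-validation. It also supports hyperparameter tuning, permutation feature importance calculation, and more.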
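Permutation feature importance is one of the options mentioned above. The sketch below illustrates the general idea in base R; it is not GeneSelectR's own code, which runs its pipelines in Python through scikit-learn. It fits a logistic regression to simulated data (the genes gene_a, gene_b and gene_c are made up), shuffles one feature at a time, and records the average drop in accuracy. For brevity it scores on the training data. In practice, importance should be computed on held-out data.

```r
# Sketch of permutation feature importance on simulated data (hypothetical
# gene names; a glm stands in for GeneSelectR's scikit-learn pipelines).
set.seed(1)
n_samples <- 200
toy <- data.frame(gene_a = rnorm(n_samples),
                  gene_b = rnorm(n_samples),
                  gene_c = rnorm(n_samples))
# only gene_a carries information about the simulated label
toy$label <- factor(ifelse(toy$gene_a + rnorm(n_samples, sd = 0.5) > 0, "AD", "HC"))

fit <- glm(label ~ ., data = toy, family = binomial)

accuracy <- function(model, data) {
  # glm models P(second factor level), i.e. P("HC") here
  prob <- predict(model, newdata = data, type = "response")
  pred <- ifelse(prob > 0.5, levels(data$label)[2], levels(data$label)[1])
  mean(pred == data$label)
}

permutation_importance <- function(model, data, features, n_repeats = 10) {
  baseline <- accuracy(model, data)
  sapply(features, function(f) {
    drops <- replicate(n_repeats, {
      shuffled <- data
      shuffled[[f]] <- sample(shuffled[[f]])  # break the feature-label link
      baseline - accuracy(model, shuffled)
    })
    mean(drops)  # average accuracy drop over the repeats
  })
}

imp <- permutation_importance(fit, toy, c("gene_a", "gene_b", "gene_c"))
print(round(imp, 3))
```

A large drop means the model relies on that feature. Shuffling the noise genes should barely change accuracy.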
Package installation
First, install the GeneSelectR package:
# install.packages("devtools")
devtools::install_github("dzhakparov/GeneSelectR")
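Installing Anaconda3 on Windows 11
Download link: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2019.10-Windows-x86_64.exe
Configuring the GeneSelectR environment
Quite a few Python packages need to be installed, so this step takes a long time. Be patient.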
GeneSelectR::configure_environment()
Conda is installed.
The conda environment GeneSelectR_env does not exist. Do you want to create it?
1: yes
2: no
Selection: 1
Creating conda environment and installing required packages
+ "D:/Program Files/Anaconda3/condabin/conda.bat" "create" "--yes" "--name" "GeneSelectR_env" "python=3.8" "numpy <= 1.19" "scikit-learn <= 0.22.1" "pandas" "boruta_py" "scikit-optimize" "--quiet" "-c" "conda-forge"
WARNING: A space was detected in your requested environment path
'D:\Program Files\Anaconda3\envs\GeneSelectR_env'
Spaces in paths can sometimes be problematic.
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: D:\Program Files\Anaconda3\envs\GeneSelectR_env
added / updated specs:
- boruta_py
- numpy[version='<=1.19']
- pandas
- python=3.8
- scikit-learn[version='<=0.22.1']
- scikit-optimize
The following packages will be downloaded:
package | build
---------------------------|-----------------
boruta_py-0.3 | py_0 51 KB conda-forge
brotli-1.1.0 | hcfcfb64_1 19 KB conda-forge
brotli-bin-1.1.0 | hcfcfb64_1 20 KB conda-forge
bzip2-1.0.8 | hcfcfb64_5 122 KB conda-forge
ca-certificates-2024.2.2 | h56e8100_0 152 KB conda-forge
certifi-2024.2.2 | pyhd8ed1ab_0 157 KB conda-forge
cycler-0.12.1 | pyhd8ed1ab_0 13 KB conda-forge
fonttools-4.50.0 | py38h91455d4_0 1.8 MB conda-forge
freetype-2.12.1 | hdaf720e_2 498 KB conda-forge
intel-openmp-2024.0.0 | h57928b3_49841 2.2 MB conda-forge
joblib-1.3.2 | pyhd8ed1ab_0 216 KB conda-forge
kiwisolver-1.4.5 | py38hb1fd069_1 54 KB conda-forge
lcms2-2.16 | h67d730c_0 496 KB conda-forge
lerc-4.0.0 | h63175ca_0 190 KB conda-forge
libblas-3.9.0 | 21_win64_mkl 4.8 MB conda-forge
libbrotlicommon-1.1.0 | hcfcfb64_1 69 KB conda-forge
libbrotlidec-1.1.0 | hcfcfb64_1 32 KB conda-forge
libbrotlienc-1.1.0 | hcfcfb64_1 241 KB conda-forge
libcblas-3.9.0 | 21_win64_mkl 4.8 MB conda-forge
libdeflate-1.20 | hcfcfb64_0 152 KB conda-forge
libffi-3.4.2 | h8ffe710_5 41 KB conda-forge
libhwloc-2.9.3 |default_haede6df_1009 2.5 MB conda-forge
libiconv-1.17 | hcfcfb64_2 621 KB conda-forge
libjpeg-turbo-3.0.0 | hcfcfb64_1 804 KB conda-forge
liblapack-3.9.0 | 21_win64_mkl 4.8 MB conda-forge
libpng-1.6.43 | h19919ed_0 339 KB conda-forge
libsqlite-3.45.2 | hcfcfb64_0 849 KB conda-forge
libtiff-4.6.0 | hddb2be6_3 769 KB conda-forge
libwebp-base-1.3.2 | hcfcfb64_0 263 KB conda-forge
libxcb-1.15 | hcd874cb_0 947 KB conda-forge
libxml2-2.12.6 | hc3477c8_1 1.6 MB conda-forge
libzlib-1.2.13 | hcfcfb64_5 54 KB conda-forge
m2w64-gcc-libgfortran-5.3.0| 6 342 KB conda-forge
m2w64-gcc-libs-5.3.0 | 7 520 KB conda-forge
m2w64-gcc-libs-core-5.3.0 | 7 214 KB conda-forge
m2w64-gmp-6.1.0 | 2 726 KB conda-forge
m2w64-libwinpthread-git-5.0.0.4634.697f757| 2 31 KB conda-forge
matplotlib-base-3.5.1 | py38h1f000d6_0 7.3 MB conda-forge
mkl-2024.0.0 | h66d3029_49657 103.5 MB conda-forge
msys2-conda-epoch-20160418 | 1 3 KB conda-forge
munkres-1.1.4 | pyh9f0ad1d_0 12 KB conda-forge
numpy-1.19.0 | py38h72c728b_0 4.9 MB conda-forge
openjpeg-2.5.2 | h3d672ee_0 232 KB conda-forge
openssl-3.2.1 | hcfcfb64_1 7.8 MB conda-forge
packaging-24.0 | pyhd8ed1ab_0 49 KB conda-forge
pandas-1.4.1 | py38h5d928e2_0 11.0 MB conda-forge
pillow-10.2.0 | py38hc375fad_0 39.7 MB conda-forge
pip-24.0 | pyhd8ed1ab_0 1.3 MB conda-forge
pthread-stubs-0.4 | hcd874cb_1001 6 KB conda-forge
pthreads-win32-2.9.1 | hfa6e2cd_3 141 KB conda-forge
pyaml-23.12.0 | pyhd8ed1ab_0 26 KB conda-forge
pyparsing-3.1.2 | pyhd8ed1ab_0 87 KB conda-forge
python-3.8.19 |h4de0772_0_cpython 15.3 MB conda-forge
python-dateutil-2.9.0 | pyhd8ed1ab_0 218 KB conda-forge
python_abi-3.8 | 4_cp38 7 KB conda-forge
pytz-2024.1 | pyhd8ed1ab_0 184 KB conda-forge
pyyaml-6.0.1 | py38h91455d4_1 148 KB conda-forge
scikit-learn-0.22.1 | py38h7208079_1 6.2 MB conda-forge
scikit-optimize-0.9.0 | pyhd8ed1ab_1 74 KB conda-forge
scipy-1.8.0 | py38ha1292f7_1 27.2 MB conda-forge
setuptools-69.2.0 | pyhd8ed1ab_0 460 KB conda-forge
six-1.16.0 | pyh6c4a22f_0 14 KB conda-forge
tbb-2021.11.0 | h91493d7_1 158 KB conda-forge
tk-8.6.13 | h5226925_1 3.3 MB conda-forge
ucrt-10.0.22621.0 | h57928b3_0 1.2 MB conda-forge
unicodedata2-15.1.0 | py38h91455d4_0 362 KB conda-forge
vc-14.3 | hcf57466_18 17 KB conda-forge
vc14_runtime-14.38.33130 | h82b7239_18 732 KB conda-forge
vs2015_runtime-14.38.33130 | hcb4865c_18 17 KB conda-forge
wheel-0.43.0 | pyhd8ed1ab_0 57 KB conda-forge
xorg-libxau-1.0.11 | hcd874cb_0 50 KB conda-forge
xorg-libxdmcp-1.1.3 | hcd874cb_0 66 KB conda-forge
xz-5.2.6 | h8d14728_0 213 KB conda-forge
yaml-0.2.5 | h8ffe710_2 62 KB conda-forge
zstd-1.5.5 | h12be248_0 335 KB conda-forge
------------------------------------------------------------
Total: 263.7 MB
The below NEW packages will be INSTALLED:
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Please restart your R session for the changes to take effect.
Each time you start an analysis, set the correct conda environment before loading GeneSelectR:
GeneSelectR::set_reticulate_python()
# Set RETICULATE_PYTHON to D:\Program Files\Anaconda3\envs\GeneSelectR_env/python.exe for the current R session.
library(GeneSelectR)
# rest of your code
When GeneSelectR is loaded, it reports that additional Bioconductor dependencies are required. Choose 1 (Yes) to install them.
The following Bioconductor packages are required for full functionality of GeneSelectR: simplifyEnrichment.
Do you want to install them now?
1: Yes
2: No
Selection: 1
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories",
package = "BiocManager")' for details.
Replacement repositories:
CRAN: https://mirrors.tuna.tsinghua.edu.cn/CRAN/
Bioconductor version 3.18 (BiocManager 1.30.22), R 4.3.1 (2023-06-16 ucrt)
Installing package(s) 'simplifyEnrichment'
also installing the dependencies ‘NLP’, ‘tm’, ‘org.Hs.eg.db’, ‘slam’, ‘proxyC’
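Prepare sufficient computing resources
I ran this on a workstation using all cores. Getting the results took about an hour, even though this is only a test dataset. If you plan to use this package, make sure you have enough computing resources.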
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 44.2s
[Parallel(n_jobs=-1)]: Done 50 out of 50 | elapsed: 2.2min finished
Performing Permuation Importance Calculation
Fitting the data split: 5
Fitting pipeline for Lasso feature selection method
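Loading the data
The data matrix should be a data frame with samples as rows and genes as columns.
This is a bulk RNA-seq dataset obtained from blood samples of 149 African children, who were grouped into children with atopic dermatitis (AD) and healthy controls (HC). The full dataset also contains a stratification variable for the children's place of residence (urban or rural). This subset contains only the Urban samples. Columns represent genes and rows represent samples.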
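Normalization workflows often store expression matrices with genes as rows and samples as columns, which is the opposite of the layout described above. The minimal sketch below assumes such a genes × samples matrix. It transposes the matrix and attaches the label column by sample ID. The values, the sample IDs S1/S2 and the labels are all hypothetical.

```r
# Hypothetical genes x samples matrix (values and IDs are made up)
toy_expr <- matrix(c(2.1, 3.4, 0.5,
                     1.8, 2.9, 0.7),
                   nrow = 3,
                   dimnames = list(c("ENSG00000174371__EXO1",
                                     "ENSG00000123600__METTL8",
                                     "ENSG00000154124__OTULIN"),
                                   c("S1", "S2")))
toy_labels <- c(S1 = "Urban_AD", S2 = "Urban_Healthy")

# transpose so that samples become rows and genes become columns
toy_df <- as.data.frame(t(toy_expr))
# attach the class label, matched by sample ID rather than by position
toy_df$treatment <- unname(toy_labels[rownames(toy_df)])
toy_df
```

Matching labels by sample ID instead of position guards against silently misaligned labels if the two objects are ordered differently.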
load("./GeneSelectR-master/tests/testthat/fixtures/UrbanRandomSubset.rda")
head(UrbanRandomSubset[, 1:10])
## treatment ENSG00000174371__EXO1 ENSG00000123600__METTL8
## CA002YF Urban_AD 2.984508e+00 1.816292e+00
## CA009ST Urban_AD 0.565690377 2.010920739
## CA010EB Urban_AD 3.6708983671 3.1615995794
## CA011LQ Urban_AD 1.408499e+00 2.674512e+00
## CA014LB Urban_AD 2.5183409223 1.8185148529
## CA015AM Urban_AD 2.6664729139 3.0494281366
## ENSG00000154124__OTULIN ENSG00000006607__FARP2 ENSG00000135686__KLHL36
## CA002YF 4.959833e+00 4.125659e+00 6.584945e+00
## CA009ST 4.601555569 4.873391176 6.677867796
## CA010EB 4.6202078459 4.8954199262 6.0682238886
## CA011LQ 3.909325e+00 3.852403e+00 6.041770e+00
## CA014LB 3.8273512783 4.6382172341 6.3209867192
## CA015AM 4.4946295998 4.4197758295 5.9998300955
## ENSG00000130348__QRSL1 ENSG00000268041__AC010616.1
## CA002YF 4.298858e+00 3.150753e+00
## CA009ST 4.043031406 3.090960290
## CA010EB 4.4417887589 3.3511595489
## CA011LQ 4.185302e+00 3.049633e+00
## CA014LB 4.3729324835 2.6071577494
## CA015AM 4.6819749406 2.6885696358
## ENSG00000163812__ZDHHC3 ENSG00000233041__PHGR1
## CA002YF 4.617841e+00 -2.934457e+00
## CA009ST 5.161640677 -2.934457404
## CA010EB 4.9131957592 -2.9344574043
## CA011LQ 5.061451e+00 -2.318961e+00
## CA014LB 5.1488088821 -2.9344574043
## CA015AM 5.0960550151 -2.9344574043
table(UrbanRandomSubset$treatment)
##
## Urban_AD Urban_Healthy
## 31 29
library(dplyr)
### Feature Selection Procedure Basic Usage
X <- UrbanRandomSubset %>%
dplyr::select(-treatment) # get the feature matrix
y <- UrbanRandomSubset["treatment"] # store the data point label in a separate vector
y <- as.factor(y[, 1])
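Running the analysis
This step takes a long time. njobs = -1 uses all available cores. On my 16-core machine it took about an hour. If you don't have enough computing resources, don't try it: my runs stalled several times.
Also, save the result object as soon as it is produced, so you can reload it later without waiting again.
Default settings: if they are not supplied, GeneSelectR sets up a default set of feature selection methods and hyperparameter grids. By default, four feature selection methods are used:
1. Univariate feature selection;
2. Logistic regression with L1 penalty (Lasso);
3. Boruta;
4. Random forest (RandomForest).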
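Univariate selection scores every gene on its own and keeps the highest-scoring ones. The sketch below computes a one-way ANOVA F statistic per gene and keeps the top k, in the spirit of a scikit-learn-style filter. The scoring function and k that GeneSelectR uses by default are assumptions here. univariate_top_k() and the simulated data are hypothetical.

```r
# Score each gene with a one-way ANOVA F statistic and keep the top k
univariate_top_k <- function(X, y, k) {
  f_stat <- vapply(X, function(g) {
    summary(aov(g ~ y))[[1]][["F value"]][1]
  }, numeric(1))
  names(sort(f_stat, decreasing = TRUE))[seq_len(min(k, length(f_stat)))]
}

# Simulated data: one informative gene and two noise genes (hypothetical)
set.seed(42)
toy_y <- factor(rep(c("AD", "HC"), each = 20))
toy_X <- data.frame(informative = c(rnorm(20, mean = 0), rnorm(20, mean = 2)),
                    noise1 = rnorm(40),
                    noise2 = rnorm(40))
univariate_top_k(toy_X, toy_y, k = 1)
```

Because each gene is scored in isolation, this filter is fast but blind to interactions between genes. The multivariate methods in the list are meant to cover that gap.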
selection_results <- GeneSelectR(X = X, y = y,
                                 njobs = -1, # all cores will be used
                                 calculate_permutation_importance = TRUE,
                                 perform_test_split = TRUE)
saveRDS(selection_results, file = "selection_results.rds") # save the result object as .rds
Once it is saved, you can simply load the result object instead of rerunning the analysis:
selection_results <- readRDS("selection_results.rds") # load the saved .rds file
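Because the run is so slow, it helps to wrap it so that it is only ever computed once. The sketch below uses a hypothetical helper, cached(). It returns the saved .rds file if one exists; otherwise it evaluates the expression and saves the result. R's lazy evaluation means the expensive expression is never run when the file is already there.

```r
# Hypothetical helper: evaluate `expr` only if no cached .rds file exists
cached <- function(path, expr) {
  if (file.exists(path)) {
    return(readRDS(path))   # reuse the saved result
  }
  value <- expr             # lazy evaluation: the expensive call runs here
  saveRDS(value, file = path)
  value
}

# Usage with the GeneSelectR() call from above:
# selection_results <- cached("selection_results.rds",
#                             GeneSelectR(X = X, y = y, njobs = -1,
#                                         calculate_permutation_importance = TRUE,
#                                         perform_test_split = TRUE))
```

Delete the .rds file whenever you change the data or parameters. Otherwise the stale result will be returned.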
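Interpreting the results
The result object
The GeneSelectR() function returns an object of class “PipelineResults” (here, selection_results) with six components:
‘best_pipeline’: A named list containing the parameters of the best-performing pipeline.
‘cv_results’: Cross-validation results for each pipeline.
‘inbuilt_feature_importance’: Aggregated inbuilt feature importance scores.
‘test_metrics’: If perform_test_split is set to TRUE, a data frame of test metrics for each pipeline.
‘cv_mean_score’: A data frame summarizing the mean cross-validation scores.
‘permutation_importance’: If permutation importance was calculated, its mean values.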
Feature importance plots
You can visualize the top features identified by each feature importance method by calling the plotting function:
plot_feature_importance(selection_results, top_n_features = 10)
## $Lasso, $Univariate, $RandomForest, $boruta: one feature importance plot per method
Machine learning metrics
You can plot the performance metrics of the feature selection procedure with the following function:
plot_metrics(selection_results)
# or access it as a dataframe
selection_results@test_metrics
## # A tibble: 4 × 9
## method f1_mean f1_sd recall_mean recall_sd precision_mean precision_sd
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Lasso 0.697 0.127 0.7 0.112 0.725 0.143
## 2 RandomForest 0.694 0.103 0.683 0.109 0.761 0.0901
## 3 Univariate 0.762 0.146 0.75 0.156 0.811 0.135
## 4 boruta 0.691 0.0837 0.683 0.0913 0.719 0.0683
## # ℹ 2 more variables: accuracy_mean <dbl>, accuracy_sd <dbl>
selection_results@cv_mean_score
## method mean_score sd_score
## 1 boruta 0.6285714 0.06298283
## 2 Lasso 0.6650000 0.05293672
## 3 RandomForest 0.6714286 0.03144074
## 4 Univariate 0.7014286 0.04755502
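The F1 score in these tables is the harmonic mean of precision and recall. The sketch below shows the formula; the helper names and confusion counts are hypothetical. Averaging F1 across splits is not the same as plugging the averaged precision and recall into the formula. For Lasso, f1_score(0.725, 0.7) gives about 0.712, while the reported f1_mean is 0.697. This is presumably because the metrics are averaged over the data splits.

```r
# F1 = harmonic mean of precision and recall
f1_score <- function(precision, recall) {
  2 * precision * recall / (precision + recall)
}

# Hypothetical helper: metrics from confusion-matrix counts of the positive class
metrics_from_counts <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  c(precision = precision, recall = recall, f1 = f1_score(precision, recall))
}

toy_metrics <- metrics_from_counts(tp = 8, fp = 2, fn = 4)  # made-up counts
round(toy_metrics, 3)

# Plugging in the averaged Lasso precision and recall from the table above
f1_score(0.725, 0.7)  # about 0.712
```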
Gene list overlap between algorithms
You can also check whether the feature selection lists share genes, using the following command:
overlap <- calculate_overlap_coefficients(selection_results)
overlap
## $inbuilt_feature_importance_coefficient
## $inbuilt_feature_importance_coefficient$overlap
## Lasso Univariate RandomForest boruta
## Lasso 1 1.00 1.00 1.00
## Univariate 1 1.00 0.96 0.78
## RandomForest 1 0.96 1.00 1.00
## boruta 1 0.78 1.00 1.00
##
## $inbuilt_feature_importance_coefficient$jaccard
## Lasso Univariate RandomForest boruta
## Lasso 1.00 0.52 0.94 0.18
## Univariate 0.52 1.00 0.52 0.25
## RandomForest 0.94 0.52 1.00 0.19
## boruta 0.18 0.25 0.19 1.00
##
## $inbuilt_feature_importance_coefficient$soerensen
## Lasso Univariate RandomForest boruta
## Lasso 1.00 0.68 0.97 0.31
## Univariate 0.68 1.00 0.68 0.40
## RandomForest 0.97 0.68 1.00 0.32
## boruta 0.31 0.40 0.32 1.00
##
##
## $permutation_importance_coefficients
## $permutation_importance_coefficients$overlap
## Lasso Univariate RandomForest boruta
## Lasso 1.00 0.37 0.47 0.33
## Univariate 0.37 1.00 0.47 0.33
## RandomForest 0.47 0.47 1.00 0.33
## boruta 0.33 0.33 0.33 1.00
##
## $permutation_importance_coefficients$jaccard
## Lasso Univariate RandomForest boruta
## Lasso 1.00 0.17 0.18 0.03
## Univariate 0.17 1.00 0.26 0.05
## RandomForest 0.18 0.26 1.00 0.06
## boruta 0.03 0.05 0.06 1.00
##
## $permutation_importance_coefficients$soerensen
## Lasso Univariate RandomForest boruta
## Lasso 1.00 0.29 0.31 0.06
## Univariate 0.29 1.00 0.41 0.09
## RandomForest 0.31 0.41 1.00 0.11
## boruta 0.06 0.09 0.11 1.00
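This returns a nested list of matrices containing three types of overlap coefficients (Soerensen-Dice, overlap, and Jaccard) for the inbuilt feature importance and, if calculated, the permutation importance. These coefficients can also be visualized as overlap heatmaps: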
plot_overlap_heatmaps(overlap)
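For two gene lists A and B, the three coefficients are usually defined as follows:
- overlap = |A ∩ B| / min(|A|, |B|)
- Jaccard = |A ∩ B| / |A ∪ B|
- Soerensen-Dice = 2|A ∩ B| / (|A| + |B|)

The base R sketch below assumes GeneSelectR follows these standard definitions. The gene lists and the function name are hypothetical. An overlap coefficient of 1 only says that the smaller list is contained in the larger one. This is consistent with the Lasso row above being all 1, while its Jaccard index with boruta is only 0.18.

```r
# Standard set-overlap coefficients for two gene lists
overlap_coefficients <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_shared <- length(intersect(a, b))
  c(overlap   = n_shared / min(length(a), length(b)),
    jaccard   = n_shared / length(union(a, b)),
    soerensen = 2 * n_shared / (length(a) + length(b)))
}

# Toy lists (made-up selections of genes from the example data)
toy_lasso  <- c("EXO1", "METTL8", "OTULIN")
toy_boruta <- c("EXO1", "METTL8", "FARP2", "KLHL36", "QRSL1")
ov <- overlap_coefficients(toy_lasso, toy_boruta)
round(ov, 2)
```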
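If you have custom lists (for example, differentially expressed gene lists), you can pass them as an argument like this:
# placeholder gene lists: replace "char1" etc. with your own gene IDs
custom_list <- list(custom_list = c("char1", "char2", "char3", "char4", "char5"),
custom_list2 = c("char1", "char2", "char3", "char4", "char5"))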
overlap1 <- calculate_overlap_coefficients(selection_results, custom_lists = custom_list)
plot_overlap_heatmaps(overlap1)
UpSet plot
To get the exact number of genes in each intersection of the feature lists, use the UpSet plot function:
plot_upset(selection_results)
## $inbuilt_importance and $permutation_importance: one UpSet plot each
# plot upset with custom lists
plot_upset(selection_results, custom_lists = custom_list)
## $inbuilt_importance and $permutation_importance: one UpSet plot each
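The bars of an UpSet plot typically count each gene once, in the exact combination of lists it belongs to. The base R sketch below reproduces that counting on toy lists with made-up gene symbols. This can be handy for checking the numbers or exporting them as a table.

```r
# Toy feature lists (made-up gene symbols)
toy_lists <- list(
  Lasso      = c("EXO1", "METTL8", "OTULIN"),
  Univariate = c("EXO1", "FARP2"),
  boruta     = c("EXO1", "METTL8", "KLHL36")
)

all_genes <- sort(unique(unlist(toy_lists)))
# logical membership matrix: genes x lists
membership <- sapply(toy_lists, function(s) all_genes %in% s)
rownames(membership) <- all_genes

# label every gene with the exact combination of lists that contain it
combo <- apply(membership, 1, function(r) paste(names(toy_lists)[r], collapse = " & "))
table(combo)
```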
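GO enrichment analysis
For convenience, the package implements a wrapper function for clusterProfiler GO enrichment, as well as a function for retrieving gene annotations. After running GeneSelectR, obtain the gene annotations as follows: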
proxy <- httr::use_proxy(Sys.getenv("http_proxy"))
httr::set_config(proxy)
AnnotationHub::setAnnotationHubOption("PROXY", proxy) # these three lines are only needed behind an HTTP proxy
ah <- AnnotationHub::AnnotationHub()
# Assuming valid proxy connection through ':1' If you experience connection
# issues consider using 'localHub=TRUE'
human_ens <- AnnotationHub::query(ah, c("Homo sapiens", "EnsDb"))
human_ens <- human_ens[["AH98047"]]
# BiocManager::install('ensembldb')
annotations_ahb <- ensembldb::genes(human_ens, return.type = "data.frame") %>%
dplyr::select(gene_id, gene_name, entrezid, gene_biotype)
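When annotating, I found that the features in the result object selection_results have the format ENSG00000196405__EVL, so they need to be split into ENSG00000196405 or EVL. Three gene ID types are supported: "ENTREZ", "ENSEMBL", and "SYMBOL". The code below keeps only the Ensembl ID part in both importance slots: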
# strip the "__SYMBOL" suffix so that only the Ensembl gene ID remains,
# for every method in both importance slots
for (slot_name in c("inbuilt_feature_importance", "permutation_importance")) {
  importance <- slot(selection_results, slot_name)
  for (method in names(importance)) {
    importance[[method]]$feature <- sub("__.*$", "", importance[[method]]$feature)
  }
  slot(selection_results, slot_name) <- importance
}
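If you prefer to annotate by gene symbol (format = "SYMBOL"), keep the part after the double underscore instead. The minimal sketch below works on two example feature names. Unlike substr(x, 1, 15), these sub() patterns do not assume that the Ensembl ID is exactly 15 characters long.

```r
toy_features <- c("ENSG00000196405__EVL", "ENSG00000174371__EXO1")

ensembl_ids <- sub("__.*$", "", toy_features)  # text before the first "__"
symbols     <- sub("^.*__", "", toy_features)  # text after the last "__"

ensembl_ids
symbols
```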
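GeneSelectR provides a wrapper function that runs GO enrichment analysis with the clusterProfiler package. First annotate the gene lists, then run the GO enrichment analysis with default settings: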
annotations_df <- annotate_gene_lists(pipeline_results = selection_results, annotations_ahb = annotations_ahb,
format = "ENSEMBL")
annotated_GO <- GO_enrichment_analysis(annotations_df)
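This sketch assumes the wrapper relies on clusterProfiler's over-representation analysis, as enrichGO does. Under that assumption, each GO term is tested with a one-sided hypergeometric test. Given a background of N genes, K of which carry the term, the test asks how surprising it is that k of the n selected genes carry it. The counts below are made up. The sketch also shows that this is the same as a one-sided Fisher's exact test. clusterProfiler then adjusts the p-values across terms (Benjamini-Hochberg by default).

```r
# Made-up counts for a single GO term
N <- 20000  # genes in the background (universe)
K <- 400    # background genes annotated to the term
n <- 100    # genes in the selected feature list
k <- 10     # selected genes annotated to the term

# P(X >= k) when n genes are drawn without replacement from the universe
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same test as a one-sided Fisher's exact test on the 2 x 2 table
# (columns: selected / not selected; rows: annotated / not annotated)
tab <- matrix(c(k, n - k, K - k, N - K - (n - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(hypergeometric = p_hyper, fisher = p_fisher)
```

With only about 2 annotated genes expected by chance (n * K / N), observing 10 gives a very small p-value.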
## Visualization of Parent Term Fractions
annot_child_fractions <- compute_GO_child_term_metrics(GO_data = annotated_GO, GO_terms = c("GO:0002376",
"GO:0044419"), plot = TRUE)
Semantic Similarity Analysis
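The final step of the analysis is clustering and semantic similarity analysis of the GO terms from each list. This is done with the simplifyEnrichment R package. To simplify data input, GeneSelectR implements a wrapper around the simplifyGOFromMultipleLists() function: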
#install.packages("magick")
pdf("simplify_enrichment.pdf",h=8,w=10)
hmap <- run_simplify_enrichment(annotated_GO,
method = 'louvain',
measure = 'Resnik',
padj_cutoff=0.05,
ont = "BP")
dev.off()
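measure = 'Resnik' selects Resnik semantic similarity. The similarity of two GO terms is the information content, IC = -log p(term), of their most informative common ancestor, where p(term) is the fraction of annotations falling under that term. The toy ontology and probabilities below are made up and only illustrate the definition.

```r
# Toy ontology: every term listed with all of its ancestors (itself included)
toy_ancestors <- list(
  root = "root",
  A    = c("A", "root"),
  B    = c("B", "root"),
  A1   = c("A1", "A", "root"),
  A2   = c("A2", "A", "root")
)
# made-up probability of an annotation falling under each term
toy_p <- c(root = 1, A = 0.4, B = 0.6, A1 = 0.1, A2 = 0.2)
toy_ic <- -log(toy_p)  # information content

resnik <- function(t1, t2) {
  shared <- intersect(toy_ancestors[[t1]], toy_ancestors[[t2]])
  max(toy_ic[shared])   # IC of the most informative common ancestor
}

resnik("A1", "A2")  # shared ancestors A and root -> -log(0.4)
resnik("A1", "B")   # only the root is shared -> 0
```

Terms that only meet at the root get similarity 0. Siblings under a specific parent score higher, which is what lets simplifyEnrichment group related GO terms into clusters.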
Reference
Damir Zhakparov, Kathleen Moriarty, Damian Roqueiro, Katja Baerenfaller
bioRxiv 2024.01.22.576646; doi: https://doi.org/10.1101/2024.01.22.576646
Kyoho Gene (桓峰基因): helping you succeed!
Kyoho Gene's official website is now online: http://www.kyohogene.com/