ORA/GSA -- 学习记录

69 篇文章 2 订阅
21 篇文章 3 订阅
本文介绍了过表达分析(ORA)方法,用于整合多组RNA-seq数据中的基因集,通过预定义的基因列表解释表型。该方法通过计算样本中基因集的表达分数,提供了比单个差异基因更强的解释力。文章讨论了ORA的优点(如整合通路信息)和缺点(如阈值依赖、基因间相互作用考虑不足),并给出了使用R包clusterProfiler进行KEGG通路富集分析的代码示例。
摘要由CSDN通过智能技术生成

brief

over-representation analysis(ORA),过表“达”分析,就是我们做多分组的RNAseq数据解析后会得到一些差异表达的gene,有些时候是单独拿出一个差异gene去解释表型,缺点是欠缺证据力度。有些人就把一些相关的差异gene放在一块儿解释,比如这些差异gene在某个通路中高表达/低表达,从而引起了这种表型。

gene set(predetermined sets of genes that are related or coordinated in their expression in some way) 相关或者有某种程度关联的基因组成一个预先定义的gene list 。<------ 也就是我对这些来自样本数据解析后得到的基因感兴趣

所以,引申出来一种解释数据的方法,预先定义一个gene set,然后根据样本中属于gene set的gene 的表达量计算出一种分数,这个分数对表型的解释力度优先于一个差异 gene 表达量的解释力度。

ORA 就是一种计算gene set 分数的方法。其过程大致如下:

  • 通过差异分析(limma或DESeq2等R包)基于不同的阈值设定(p-value以及log2FC)得到不同组间的差异表达基因DEG。
  • 将DEG与感兴趣的基因集做交集(KEGG、GO、MSigDB等数据库),得到一些共同的基因。
  • 基于超几何分布的Fisher检验来评估,抽到这些共同的基因的的计数值是否显著高于随机,即待测功能集在基因列表中是否显著富集。<------- 类似于两联表的fisher test

优点:基因集代表了一种通路/功能/调控/代谢等相关基因的集合,通过对基因集的整合分析,可以更好地解释数据和表型的关系。

缺点:1.仅使用了基因数目信息,而没有利用基因表达水平或表达差异值,为了获得感兴趣或者差异表达基因,需要人为的设置阈值
2.ORA法通常仅使用最显著的基因,而忽略差异不显著的基因。在获得感兴趣的基因时, 往往需要选取合适的阈值, 有可能会丢失显著性较低但比较关键的基因, 导致检测灵敏性的降低
3.将基因同等对待,ORA法假设每个基因都是独立的,忽视了基因在通路内部生物学意义的不同(如调控和被调控基因的不同)及基因间复杂的相互作用
4.ORA方法只关心差异表达基因而不关心其上调、下调的方向,也许同一条通路里既有显著高表达的基因,也有显著低表达的基因,因此最后得到的通路结果很难结合表型进行分析

代码演示

检验感兴趣的一些gene 在KEGG通路中是否富集:

# Over-representation testing using clusterProfiler is based on a hypergeometric test (often referred to as Fisher’s exact test) (Yu 2020).
#  For more background on hypergeometric tests, this handy tutorial explains more about how hypergeometric tests work (Puthier and van Helden 2015).
# or refer to https://blog.csdn.net/Luciferchang/article/details/115684092

library(clusterProfiler)

if (!("org.Hs.eg.db" %in% installed.packages())) {
  # Install this package if it isn't installed yet
  BiocManager::install("org.Hs.eg.db", update = FALSE)
}

library(org.Hs.eg.db)
library(AnnotationDbi)

# step1
# Determine our genes of interest list <--------- predetermined gene set
gs <- read.table("../20240305-manual-gene-set.txt")
# get gene id 
gene_ids <- mapIds(org.Hs.eg.db, keys = gs$V1, keytype = "SYMBOL", column = "ENTREZID")

# step2
# Determine our background set gene list <----- all or detected gene from RNAseq
background_set <- rownames(expr) # expr is gene expression matrix and rownames is all gene
background_gene_id <- mapIds(org.Hs.eg.db, keys = background_set, 
                             keytype = "SYMBOL", column = "ENTREZID")
# step3
# get kegg iterm
library(msigdbr)

hs_msigdb_df <- msigdbr(species = "Homo sapiens")
# Filter the human data frame to the KEGG pathways that are included in the curated gene sets
hs_kegg_df <- hs_msigdb_df %>%
  dplyr::filter(
    gs_cat == "C2", # This is to filter only to the C2 curated gene sets
    gs_subcat == "CP:KEGG" # This is because we only want KEGG pathways
)

# step4
# run fisher test
kegg_ora_results <- enricher(
  gene = gene_ids, # A vector of your genes of interest
  pvalueCutoff = 0.1, # Can choose a FDR cutoff
  pAdjustMethod = "BH", # Method to be used for multiple testing correction
  universe = background_gene_id, # A vector containing your background set genes
  # The pathway information should be a data frame with a term name or
  # identifier and the gene identifiers
  TERM2GENE = dplyr::select(
    hs_kegg_df,
    gs_name,
    human_entrez_gene
  )
)

#  visualization
kegg_result_df <- data.frame(kegg_ora_results@result)
kegg_result_df %>%
  dplyr::filter(p.adjust < 0.1)
enrich_plot <- enrichplot::dotplot(kegg_ora_results)

# Note: using enrichKEGG() is a shortcut for doing ORA using KEGG,
#  but the approach we covered here can be used with any gene sets you’d like!

在这里插入图片描述

Usage: ora [-u user] [-i instance#] [] General -u user/pass use USER/PASS to log in -i instance# append # to ORACLE_SID -sid set ORACLE_SID to sid -top # limit some large queries to on # rows - repeat Repeat an coomand time. Sleep between two calls Command are: - execute: cursors currently being executed - longops: run progression monitor - sessions: currently open sessions - stack get process stack using oradebug - cursors [all] : [all] parsed cursors - sharing : print why cursors are not shared - events [px]: events that someone is waiting for - events [read_by_other_session] events that someone is read by other session - ash [duration] [-f ] active session history for specified period e.g. 'ash 30' to display from [now - 30min] to [now] e.g. 'ash 30 10 -f foo.txt' to display a 10 minutes period from [now - 30min] and store the result in file foo.txt - ash_wait_graph [duration] [-f ] PQ event wait graph using ASH data Arguments are the same as for ash except that the output must be shown with the mxgraph tool - ash_sql Show all ash rows group by sampli_time and event for the specified sql_id - [-u ] degree degree of objects for a given user - [-u ] colstats stats for each table, column - [-u ] tabstats stats for each table - params []: view all parameters, even hidden ones - snap: view all snapshots status - bc: view contents of buffer cache - temp: view used space in temp tbs - asm: Show asm space/free space - space []: view used/free space in a given tbs - binds : display bind capture information for specified cursor - fulltext : display the entire SQL text of the specified statement - last_sql_hash []: hash value of the last styatement executed by the specified sid. If no sid speficied, return the last hash_value of user sessions - openv []: display optimizer env parameters for specified cursor - plan []: get explain plan of a particular cursor - pxplan : get explain plan of a particular cursor and all connected cursor slave SQL - wplan []: get explain plan with work area information - pxwplan : get explain plan with work area information of a particular cursor and all connected cursor slave SQL - eplan []: get explain plan with execution statistics - pxeplan : get explain plan with execution statistics of a particular cursor and all connected cursor slave SQL - gplan : get graphical explain plan of a particular cursor using dot specification - webplan get graphical explain plan of a particular [/] cursor using gdl specification []: optional: child_number, default is zero. optional: decorate to print further node information. default is 0, 1 => print further node information such as cost, filter_predicates etc. 2 => in addition to the above, print row vector information sample usage: # ora webplan 4019453623 print more information (decorate 1) # ora webplan 4019453623/1 1 more information, overload! (decorate 2) # ora webplan 4019453623/1 2 using sql_id along with child number instead of hash value # ora webplan aca4xvmz0rzup/3 1 - hash_to_sqlid : get the sql_id of the cursor given its hash value - sqlid_to_hash : get the hash value of the cursor given its (unquoted) sql_id - exptbs: generate export tablespace script - imptbs: generate import tablespace script - smm [limited]: SQL memory manager stats for active workareas - onepass: Run an ora wplan on all one-pass cursors - mpass: Run an ora wplan on all multi-pass cursors - pga: tell how much pga memory is used - pga_detail | -mem : Gives details on how PGA memory is consumed by a process (given its os PID) or by the set of precesses consuming more than MB of PGA memory (-mem option) - pgasnap [] Snapshot the pga advice stats - pgaadv [-s []] [-o graphfile] [-m min_size]: generate a graph from v and display it or store it in a file if the -o option is used. -s [] to diff with a previous snapshot (see pgasnap cmd) -o [graphfile] to store the result in a file instead of displaying it -m [min_size] only consider workareas with a minimum size - pgaadvhist [-f []] display the advice history for all factors or for factor between f_min and f_max - sga: tell how much sga memory is used - sga_stats: tell how sga is dynamically used - sort_usage: tell how temp tablespace is used in detail - sgasnap [] Snapshot the sga advice stats - sgaadv [-s []] [-o graphfile] generate a graph from v and v and store it in a file if the -o option is used. -s [] to diff with a previous snapshot (see sgasnap cmd) -o [graphfile] to store the result in a file instead of displaying it - process []: display process info with pga memory - version: display Oracle version number - cur_mem [ ] display the memory used for a given or all cursors - shared_mem [ ] detailed dump of cursor shared mem allocations - runtime_mem [ ] detailed dump of cursor runtime memory allocations - all_mem [ ] do all of the memory dumps - pstack |all [] run pstack on specified process (or all if 'all' specified) and store files in specified dir ( when not specified) - idxdesc [username] list all indexes for a given user or for a given user and table - segsize [username] list size of all objects(segments) for given user for a given user and object - tempu list temporary ts usage of all users or for a given user - sqlstats [ ] list sql execution stats (like buffer_gets, phy. reads etc) for a given sql_id/hash_value of statement - optstats [username] list optimizer stats for all tables stored in dictionary for a given user or for a given user and table - userVs list all user Views (user_tables, user_indexes etc) - fixedVs list all V$ Views - fixedXs list all X$ Views - px_processes list all px processes (QC and slaves) - cursor_summary summarize stats about (un)pinned cursors - rowcache summarizes row cache statistics - monitor_list lists all the statements that have been monitored - monitor [xml]: wraps dbms_sqltune.report_sql_monitor(). Directly passe the arguments to the PL/SQL procedure. Args are: sql_id, session_id, session_serial, sql_exec_start, sql_exec_id, inst_id, instance_id_filter, parallel_filter, report_level, type. Examples: - monitor xml shows XML report - monitor show last monitored stmt - monitor sql_id=>'8vz99cy9bydv8', session_id=>105 will show monitor info for sql_id 8vz99cy9bydv8 and session_id 105 Use simply ora monitor 8vz99cy9bydv8 to display monitoring information for sql_id 8vz99cy9bydv8. Syntax for parallel filters is: [qc][servers([,] [,] )] Use /*+ monitor */ to force monitoring. - monitor_old [ash_all] [] [qc| [ []]] Old version of SQL monitoring, use a SQL query versus the report_sql_monitor() package. Display monitoring info for the LAST execution of the specified cursor. Cursor response time needs to be at least 5s for monitoring to start (use the monitor hint to force monitoring). Without any parameter, will display monitoring info for the last cursor that was monitored - ash_all will aggregate ash data over all executions of the cursor (useful for short queries that are executed many times). If parallel: - qc to see only data for qc - slave_grp# to see only data for one parallelizer - slave_grp# + slave_set# to see only data for one slave set of one parallelizer, - slave_grp# + slave_set# + slave# to see data only for the specified slave - sql_task [progress | interrupt | history | report ] progress: progress monitoring for executing sql tasks interrupt: interrupt an executing sql task history: print a history of last n executions report: get a sql tune report - sql_use_temp_segment Find Who And What SQL Is Using Temp Segments. - sh Run a shell command. E.g. ora repeat 5 10 sh 'ps -edf | grep DESC' - awr_dbid Show AWR dbid - awr_dbtime [dbid] Show AWR dbtime - awr_dbtime [dbid] [inst] Show AWR dbtime - awr_dbtime_order [dbid] Show AWR dbtime order by desc - awr_sql_elaps_time [dbid] Show AWR SQL elapsed time - awr_sql_elaps_time [dbid] [inst] Show AWR SQL elapsed time - awr_sql_elaps_time_order [dbid] Show AWR SQL elapsed time order by desc - awr_logical_reads_order [dbid] - awr_logical_reads [dbid] Show AWR logical reads M Show AWR logical reads M order by desc - awr_physical_reads [dbid] Show AWR physical reads M - awr_physical_reads_order [dbid] Show AWR physical reads M order by desc - awr_db_cpu_per [dbid] [inst] Show AWR db_cpu_time cpu percent - awr_user_cpu_per [dbid] [inst] Show AWR oracle user_time cpu percent including backgroud process - awr_sql sql_id [dbid] Show AWR sql_id executions, per elapsed time. - awr_fulltext sql_id [dbid] Show AWR sql fulltext - awr_plan sql_id plan_hash [dbid] Show AWR sql plan, if plan_hash is null, show all plans. - awr_binds sql_id end_snap_id [dbid] Show AWR bind values in end_snap_id. - tab_frag owner [frag_percent] Show table fragment. - index_frag owner [frag_percent] Show index fragment. - rman_fullrestore_scripts dest_dbfile_dir Generate rman full database restore scripts - top_buffers_gets Top 10 by buffer gets > 10000 - top_physical_reads Top 10 by Physical Reads (disk_reads > 1000) - top_executions Top 10 by Executions > 100 - top_parse_calls Top 10 by Parse Calls > 1000 - top_sharable_memory Top 10 by Sharable Memory > 1M - top_version_count Top 10 by Version Count > 20 - top_cpu_usage Top 10 by CPU usage (cpu_time) - top_running_time Top 10 by Running Time (first_load_time desc) - create_tbs path size Create test database's tablespace script - create_tbs path size [dbid]Create dbid's test database's tablespace script - hold_txlock Show sessions holding a TX lock - wait_txlock Show sessions waiting a TX lock - rowid Display rowid's file_id, file_name, block info, object info, extent_id Memory: The detailed memory dumps need to have events set to work. The events bellow can be added to the init.ora file event="10277 trace name context forever, level 10" # mutable mem event="10235 trace name context forever, level 4" # shared mem NOTE ==== - Set environment variable ORA_USE_HASH to 1 to get SQL hash values instead of SQL ids - Set environment variable DBUSER to change default connect string which is "/ as sysdba" - Set environment variable ORA_TMP to the default temp directory (default if /tmp when not set)
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值