SCS【32】基于scRNA-seq数据中推断单细胞的eQTLs (eQTLsingle)

f14007862e04569ce21b82fd1ad3eea0a.png


单细胞生信分析教程

桓峰基因公众号推出单细胞生信分析教程并配有视频在线教程,目前整理出来的相关教程目录如下:

Topic 6. 克隆进化之 Canopy

Topic 7. 克隆进化之 Cardelino

Topic 8. 克隆进化之 RobustClone

SCS【1】今天开启单细胞之旅,述说单细胞测序的前世今生

SCS【2】单细胞转录组 之 cellranger

SCS【3】单细胞转录组数据 GEO 下载及读取

SCS【4】单细胞转录组数据可视化分析 (Seurat 4.0)

SCS【5】单细胞转录组数据可视化分析 (scater)

SCS【6】单细胞转录组之细胞类型自动注释 (SingleR)

SCS【7】单细胞转录组之轨迹分析 (Monocle 3) 聚类、分类和计数细胞

SCS【8】单细胞转录组之筛选标记基因 (Monocle 3)

SCS【9】单细胞转录组之构建细胞轨迹 (Monocle 3)

SCS【10】单细胞转录组之差异表达分析 (Monocle 3)

SCS【11】单细胞ATAC-seq 可视化分析 (Cicero)

SCS【12】单细胞转录组之评估不同单细胞亚群的分化潜能 (Cytotrace)

SCS【13】单细胞转录组之识别细胞对“基因集”的响应 (AUCell)

SCS【14】单细胞调节网络推理和聚类 (SCENIC)

SCS【15】细胞交互:受体-配体及其相互作用的细胞通讯数据库 (CellPhoneDB)

SCS【16】从肿瘤单细胞RNA-Seq数据中推断拷贝数变化 (inferCNV)

SCS【17】从单细胞转录组推断肿瘤的CNV和亚克隆 (copyKAT)

SCS【18】细胞交互:受体-配体及其相互作用的细胞通讯数据库 (iTALK)

SCS【19】单细胞自动注释细胞类型 (Symphony)

SCS【20】单细胞数据估计组织中细胞类型(Music)

SCS【21】单细胞空间转录组可视化 (Seurat V5)

SCS【22】单细胞转录组之 RNA 速度估计 (Velocyto.R)

SCS【23】单细胞转录组之数据整合 (Harmony)

SCS【24】单细胞数据量化代谢的计算方法 (scMetabolism)

SCS【25】单细胞细胞间通信第一部分细胞通讯可视化(CellChat)

SCS【26】单细胞细胞间通信第二部分通信网络的系统分析(CellChat)

SCS【27】单细胞转录组之识别标记基因 (scran)

SCS【28】单细胞转录组加权基因共表达网络分析(hdWGCNA)

SCS【29】单细胞基因富集分析 (singleseqgset)

SCS【30】单细胞空间转录组学数据库(STOmics DB)

SCS【31】减少障碍,加速单细胞研究数据库(Single Cell PORTAL)


简  介

eQTL 研究对于理解基因组调控至关重要。遗传变异对基因调控的影响是细胞类型特异性和细胞环境相关的,因此在单细胞水平上研究 eQTL 是至关重要的。理想的解决方案是同时使用来自同一细胞的突变和表达数据。然而,目前这种单细胞配对数据的技术还不成熟。我们提出了一种新的方法,eQTLsingle 仅用 scRNA-seq 数据发现 eQTL,而不需要基因组数据。从 scRNA-seq 数据中检测突变,利用零膨胀负二项 (zero-inflated negative binomial, ZINB) 模型对不同基因型的基因表达进行建模,在单细胞水平上寻找基因型和表型之间的关联。基于胶质母细胞瘤和胶质瘤 scRNA-seq 数据集,eQTLsingle 发现了数百个细胞类型特异性肿瘤相关的 eQTL,其中大多数在批量eQTL研究中无法发现。对发现的 eQTL 的详细分析揭示了重要的潜在调控机制。eQTLsingle 是一个独特而强大的工具,可以利用大量的 scRNA-seq 资源进行单细胞 eQTL 研究。

a58d508d1b2690bf0c3bb0a80a9e2385.png

eQTLsingle has two modes. Cleaned-data mode: take input as SNV matrix and expression matrix from single cells. If you have had SNV and expression information of the same cell before, you can use this mode to discover eQTLs Raw-data mode: take input as raw scRNA-seq data (fastq data). eQTLsingle can generate both SNV matrix and gene matrix, and discovers eQTLs based them

软件包安装

if(!require(devtools)) 
  install.packages("devtools")

devtools::install_github("horsedayday/eQTLsingle", build_vignettes = TRUE)

Cleaned-data mode

eQTLsingle接受两个输入:snvMatrix和expressionMatrix。

snvMatrix:一个数据框,描述 SNV 的基因型(以细胞为单位的基因座),该数据框的行名是SNV id,该数据框的别名是CellId。(1) -1表示数据缺失(没有覆盖足够的读取);(2) 0表示该位点基因型为REF(非突变);(3) 1表示该基因座的基因型为ALT(突变);

expressionMatrix:一个数据框,描述基因表达(细胞的基因表达),该数据框的行名为Geneid,该数据框的别名为CellId。

Step1: Build SNV-gene pairs

只对有效的突变和基因进行eQTL分析。因此,需要首先选择高可靠性的SNV基因对(SNV和基因都覆盖足够的细胞)。

# Load eQTLsingle and test data (SNV matrix and expression matrix)
library(eQTLsingle)
library(knitr)
## Warning: 程辑包'knitr'是用R版本4.2.3 来建造的
data("toy_snvMatrix")
kable(toy_snvMatrix[1:5,])

cell_1cell_2cell_3cell_4cell_5cell_6cell_7cell_8cell_9cell_10cell_11cell_12cell_13cell_14cell_15cell_16cell_17cell_18cell_19cell_20cell_21cell_22cell_23cell_24cell_25cell_26cell_27cell_28cell_29cell_30cell_31cell_32cell_33cell_34cell_35cell_36cell_37cell_38cell_39cell_40cell_41cell_42cell_43cell_44cell_45cell_46cell_47cell_48cell_49cell_50cell_51cell_52cell_53cell_54cell_55cell_56cell_57cell_58cell_59cell_60cell_61cell_62cell_63cell_64cell_65cell_66cell_67cell_68cell_69cell_70cell_71cell_72cell_73cell_74cell_75cell_76cell_77cell_78cell_79cell_80cell_81cell_82cell_83cell_84cell_85cell_86cell_87cell_88cell_89cell_90cell_91cell_92cell_93cell_94cell_95cell_96cell_97cell_98cell_99cell_100cell_101cell_102cell_103cell_104cell_105cell_106cell_107cell_108cell_109cell_110cell_111cell_112cell_113cell_114cell_115cell_116cell_117cell_118cell_119cell_120cell_121cell_122cell_123cell_124cell_125cell_126cell_127cell_128cell_129cell_130cell_131cell_132cell_133cell_134cell_135cell_136cell_137cell_138cell_139cell_140cell_141cell_142cell_143cell_144cell_145cell_146cell_147cell_148cell_149cell_150cell_151cell_152cell_153cell_154cell_155cell_156cell_157cell_158cell_159cell_160cell_161cell_162cell_163cell_164cell_165cell_166cell_167cell_168cell_169cell_170cell_171cell_172cell_173cell_174cell_175cell_176cell_177cell_178cell_179cell_180cell_181cell_182cell_183cell_184cell_185cell_186cell_187cell_188cell_189cell_190cell_191cell_192cell_193cell_194cell_195cell_196cell_197cell_198cell_199cell_200
SNV_101011-11001011101-100111-1011110110111011111011100101-101011010111-101111110-10-1101111110100-100-1111000110011001-111010000100-1000011-1010101011001010011011-11100000000001-1010011110110-100100000010111111111110011
SNV_21-1-1-10-11-11-11-1-1-110-1-1-1-1-1-10-1-111-10-10-10-1-1-110-1-1-100-11-1-11-1-1-1-1010-1-1101-1-1-11-1-1-1-1-111-10-1-10-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-11-1-1-1-1-1-1-1-11-1-1-1-1-1-10-10-10-1-1-1-10-110-1-1-1-10-1-1-1-1-1-1-10-1-10-10-1-1-10-1-1-1-1-1-1-110-11-1-1-1-1-1-11-1-11-1-1-1-1-1-1-1-1-1-1-1-1-10-110-10-1-111-1-10-10001-1
SNV_3110000101111-111-101001100111000100-1001-1001000010001111-11-1-10110000110100-1100-10-101-111101100010-1000011001111000-100111101001011011101111101-100-1010-1010100-10-1100111001100010101-1-101011011110100111001100111001
SNV_401-11-11-100100-1-1-1-10-10-1-1-1-11-1-1-1-1-1-1-1-11-1-10-1-11-1-1-11-1-1-1-101-1-1-1-10-1-1-1-1-100-1-1-10-1-111-1-1-1-1-10-1-1-11-11-1-100-11-1-1-1100-1-1-10-1-1-1-1-110-1-1-1-1-1-1-11-1-11-10-11-1-11-1-1-1-1-1-1-1-1-1-111-1-11-1-1-1-11-110-1-1-1-1-11-1-1-11-1111-1-10-11-1-1-1-1-1-1-1-1-101-101-11-1-1-10-1-1-1-100-11-1-1-1-10-11-1
SNV_50111000111-1110001-1101111110001-1010001111101010101101100-11101100-11111101110-11010010111111-100011-1011100010101100100111011110110-100-11110110-1-1011-111-10111111100-10010110001101110111-10010100010-10110001-1010-10
data("toy_expressionMatrix")
kable(toy_expressionMatrix[1:5,])

cell_1cell_2cell_3cell_4cell_5cell_6cell_7cell_8cell_9cell_10cell_11cell_12cell_13cell_14cell_15cell_16cell_17cell_18cell_19cell_20cell_21cell_22cell_23cell_24cell_25cell_26cell_27cell_28cell_29cell_30cell_31cell_32cell_33cell_34cell_35cell_36cell_37cell_38cell_39cell_40cell_41cell_42cell_43cell_44cell_45cell_46cell_47cell_48cell_49cell_50cell_51cell_52cell_53cell_54cell_55cell_56cell_57cell_58cell_59cell_60cell_61cell_62cell_63cell_64cell_65cell_66cell_67cell_68cell_69cell_70cell_71cell_72cell_73cell_74cell_75cell_76cell_77cell_78cell_79cell_80cell_81cell_82cell_83cell_84cell_85cell_86cell_87cell_88cell_89cell_90cell_91cell_92cell_93cell_94cell_95cell_96cell_97cell_98cell_99cell_100cell_101cell_102cell_103cell_104cell_105cell_106cell_107cell_108cell_109cell_110cell_111cell_112cell_113cell_114cell_115cell_116cell_117cell_118cell_119cell_120cell_121cell_122cell_123cell_124cell_125cell_126cell_127cell_128cell_129cell_130cell_131cell_132cell_133cell_134cell_135cell_136cell_137cell_138cell_139cell_140cell_141cell_142cell_143cell_144cell_145cell_146cell_147cell_148cell_149cell_150cell_151cell_152cell_153cell_154cell_155cell_156cell_157cell_158cell_159cell_160cell_161cell_162cell_163cell_164cell_165cell_166cell_167cell_168cell_169cell_170cell_171cell_172cell_173cell_174cell_175cell_176cell_177cell_178cell_179cell_180cell_181cell_182cell_183cell_184cell_185cell_186cell_187cell_188cell_189cell_190cell_191cell_192cell_193cell_194cell_195cell_196cell_197cell_198cell_199cell_200
gene_100660070000555900000005800000000320003135000570000520410000006500003700646008500550050510000000004704253437449690044041000000000000000000006505200585300049041080000048482700506341000060460556300000000000000053000714303900000005400455200000650000004900000
gene_2082000012708983103008684092770661100000065000000000000000000000097000000114870044000000008307600700083097760000000123000074055008000000111000077000008173008200092000710000790078007600075890000000008506466860900720000090010008906707700001187782009475000820000069000
gene_3082960095007777730007600610000925900850005064000820000000840688070066081094850006806194000085000620797100008900660000787300075000831070901180748779073000000008682000920000088000000750680700067907600890007894099000700008400091010900000008275978608900008287000790085062000000
gene_40850880000008900080012900012900850004713701550009063094000944604500630610012695900566806501210000000000087000000002559620000048428959660000006408314840970000091604800000000077000034000043000155001202200000011100006861877175751040001440006771104045000011405701209101016900000067000070840
gene_5420006900000049000063095000071670005000000560069000460000000000000000735404300690000006100590000004500640069680540066580550000006600000052000004600000615654043000057630847400465244054000000000700420067000546500000007500005800053670000007100000000640430
# select valid SNV-gene pairs
# each genotype at least covers 30 cells
# each gene (expression level > 1) at least covers 30 cells
snv.gene.pair.metadata <- eQTLsingle_build_metadata(snvMatrix = toy_snvMatrix,
                                                    expressionMatrix = toy_expressionMatrix, 
                                                    snv.number.of.cells = 30, 
                                                    expression.min = 1, 
                                                    expression.number.of.cells = 30)

Format of snv.gene.pair.metadata,

SNVid: Id of SNV, represented as CHR__POS;

Num_cells_ref: Num_cells_ref: number of cells in REF group;

Num_cells_alt: Num_cells_alt: number of cells in ALT group;

Ref_cells: cell list of REF group;

Alt_cells: cell list of ALT group;

GeneList: gene list, these genes will be tested with the SNV in this row;

Num_gene: number of genes in gene list;

CellList: whole cell list of REF and ALT group.

实例操作

Step 2: Conduct eQTL analysis on valid SNV-gene pairs.

# conduct eQTL analysis on valid SNV-gene pairs
eQTL.result <- eQTLsingle(toy_expressionMatrix,
                          snv.gene.pair.metadata)
# Suppose we are interested at the pair between 
# SNV with snvid "SNV_3" 
# gene with geneid "gene_16"

Format of eQTL.result,

SNVid: Id of SNV, represented as CHR__POS;

Geneid: gene id of target gene;

sample_size_Ref, sample_size_Alt: number of cells in REF and ALT group;

theta_Ref, theta_Alt, mu_Ref, mu_Alt, size_Ref, size_Alt, prob_Ref, prob_Alt: estimated parameters of ZINB for REF group and ALT group;

total_mean_Ref, total_mean_Alt: mean of the REF group and ALT group;

foldChange: fold change of mean of REF group (total_mean_Ref) with respect to mean of ALT group (total_mean_Alt) chi2LR1: chi square statistic for the test;

pvalue, adjusted_pvalue: pvalue and adjusted pvalue of the test. If adjusted p-value is smaller than some threshold, this SNV shows significant eQTL effect on the target gene;

Remark: to record abnormity

Step 3: Visualize eQTL effect

# Suppose we are interested at the pair between 
# SNV with snvid "SNV_3" 
# gene with geneid "gene_16"


# select out a SNV-gene pair
eQTL.result.target <- eQTL.result[(eQTL.result$SNVid == "SNV_3") &
                                    (eQTL.result$Geneid == "gene_16"),]

kable(eQTL.result.target[1:5,])

SNVidGeneidsample_size_Refsample_size_Alttheta_Reftheta_Altmu_Refmu_Altsize_Refsize_Altprob_Refprob_Alttotal_mean_Reftotal_mean_AltfoldChangechi2LR1pvalueadjusted_pvalueRemark
18SNV_3gene_1691890.0329670.044943859.94318291.11779.24793312.171950.13365780.040133157.96703278.03370.2084892337.070500NA
NANANANANANANANANANANANANANANANANANANANA
NA.1NANANANANANANANANANANANANANANANANANANA
NA.2NANANANANANANANANANANANANANANANANANANA
NA.3NANANANANANANANANANANANANANANANANANANA
# to draw violin plots of gene expression in both REF group and ALT group
eQTLsingle_visualization(eQTL.result.target,toy_expressionMatrix, snv.gene.pair.metadata)

ba4122bf0b4f4c80534455453adec635.png

Raw-data mode

这个可以参考网站,这时就需要有一定 shell 基础的同学,并且对一些生物学软件很熟练,但是还算好做,思路很清晰。

048663465d5999416a9940d7f8ba60db.png

涉及到的软件:

  • samtools

  • featureCounts

  • GATK

  • picard

  • java

  • bam-readcount

  • Python3

  • Pandas

如果您只有FASTQ数据,并且希望使用我们的脚本生成表达式矩阵和SNV矩阵,则可以运行存储在。/inst/中的shell脚本。可以直接从R环境中执行这些脚本,但我们建议您在shell中使用,主要为两部,后面的就可以直接读入软件包 eQTLsingle 中操作了。

  1. 预处理:从FASTQ数据中生成突变谱和表达谱;

  2. SNV质量控制:对选中的SNV进行读取计数,确认有效的突变,以便进行进一步的eQTL分析。

后续分析我们可以结合一些临床预测的方法,比如文章中的后续分析 Discoveries on the rs3169950-PSMB6 cis-eQTL pair in gliomaspheres。

248b332c3bb3f57162cdabe2b4003f95.png

(a)胶质球中REF/ALT型细胞rs3169950位点PSMB6的表达水平 (bonferroni校正p值= 2.93 × 10−12)。

(b) PSMB6基因表达与 基于TCGA和GTEx数据的脑肿瘤(GBM和LGG)和正常脑样本。红色星号表示单因素方差分析检验p值≤0.05。

(c)按PSMB6基因表达水平分层的TCGA GBM和LGG患者Kaplan-Meier生存图患者分为低表达组和 高表达组以PSMB6表达水平中位数作为截止值。

(d) GTEx脑数据中YY1与PSMB6基因表达的相关性。Pearson相关系数为0.57 (p值< 0.001)。

(e) PSMB6上转录因子YY1的ChIP-seq信号和rs3169950调控PSMB6的假设模型 表达式。

Reference
  1. Ma T, Li H, Zhang X. Discovering single-cell eQTLs from scRNA-seq data only. Gene. 2022;829:146520. doi:10.1016/j.gene.2022.146520

利用 Cleaned-data mode 还是很快分析出来,但是从 Raw-data mode 开始做还是要求有一定生信分析基础的,有需求的老师可以联系桓峰基因,关注桓峰基因公众号,轻松学生信,高效发文章!

桓峰基因,铸造成功的您!

未来桓峰基因公众号将不间断的推出单细胞系列生信分析教程,

敬请期待!!

桓峰基因官网正式上线,请大家多多关注,还有很多不足之处,大家多多指正!

http://www.kyohogene.com/

桓峰基因和投必得合作,文章润色优惠85折,需要文章润色的老师可以直接到网站输入领取桓峰基因专属优惠券码:KYOHOGENE,然后上传,付款时选择桓峰基因优惠券即可享受85折优惠哦!https://www.topeditsci.com/

eae2a72591314c612eee5e12e800d1b8.png

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值