The past and present of transcription factor analysis on 10X single-cell data: SCENIC

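This post introduces the SCENIC algorithm, a method for simultaneously reconstructing gene regulatory networks and identifying cell states from single-cell RNA-seq data. It shows how SCENIC uses cis-regulatory analysis to guide the identification of transcription factors and cell states, revealing the biological mechanisms behind cellular heterogeneity. It also explains how SCENIC works, including the use of GENIE3, cis-regulatory motif analysis, and the AUCell algorithm.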

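Many of you have probably analyzed 10X single-cell data by now, and SCENIC has become a common add-on analysis that plenty of people have already run. Today we revisit it to understand the principles and method behind SCENIC more deeply, so that we can properly understand the results it produces.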

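SCENIC was published in Nature Methods in October 2017, which is quite a while ago; as I recall, 10X single-cell sequencing was only just taking off in 2017. The paper is "SCENIC: single-cell regulatory network inference and clustering". Let's work through the software now.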

I. Abstract

We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic.aertslab.org). On a compendium of single-cell data from tumors and brain, we demonstrate that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states. SCENIC provides critical biological insights into the mechanisms driving cellular heterogeneity.
Two questions to keep in mind here: what does "cell states" refer to, and how are the transcription factors identified? With these questions in mind, let's read on.

II. Introduction

The transcriptional state of a cell emerges from an underlying gene regulatory network (GRN) in which a limited number of transcription factors (TFs) and cofactors regulate each other and their downstream target genes. Put plainly, the activity of the regulatory elements determines the transcriptional state of the cell. Recent advances in single-cell transcriptome profiling have provided exciting opportunities for high-resolution identification of transcriptional states and of transitions between states; the paper then gives examples such as cell differentiation. Statistical techniques and bioinformatics methods that are optimized for single-cell RNA-seq have led to new biological insights, but it is still unclear whether specific and robust GRNs underlying stable cell states can be determined (the open problem being that the regulatory networks behind stable cell states are still not well characterized). This may indeed be challenging given that at the single-cell level, gene expression may be partially disconnected from the dynamics of TF inputs on account of stochastic variation of gene expression from transcriptional bursting and other sources. A few methods have been developed that infer coexpression networks from single-cell RNA-seq data, but these methods do not use regulatory sequence analysis to predict interactions between TFs and target genes (put plainly, this is about predicting TF-to-gene relationships).
We reasoned that linking cis-regulatory sequences to single-cell gene expression could overcome dropouts and technical variation and thus optimize the discovery and characterization of cell states (judging by how things look today, I think the software was overly optimistic on this point). To this end, we developed single-cell regulatory network inference and clustering (SCENIC) to map GRNs and then identify stable cell states by evaluating the activity of the GRNs in each cell. The SCENIC workflow consists of three steps.

Steps

In the first step, sets of genes that are coexpressed with TFs are identified using GENIE3. Since the GENIE3 modules are only based on coexpression, they may include many false positives and indirect targets. This part is easy to understand. One thing to note, though: the TFs are known in advance, and as far as I understand, which genes a TF acts on can in principle be derived, so I have some doubts here. Let's keep reading and see.


Second, to identify putative direct-binding targets, each coexpression module is subjected to cis-regulatory motif analysis using RcisTarget. Only modules with significant motif enrichment of the correct upstream regulator are retained, and they are pruned to remove indirect targets lacking motif support. We refer to these processed modules as regulons.
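The pruning logic of this second step can be sketched in a few lines. This is not RcisTarget's implementation: the enrichment scores and the motif-to-gene table are made-up inputs, and the names `prune_module`, `motif_nes`, `motif_to_tf`, `motif_targets`, and the cutoff of 3.0 are all assumptions for illustration. It only shows the rule described above: keep a module only if a motif annotated to its own TF is enriched, then keep only the targets that have motif support.

```python
def prune_module(tf, module_genes, motif_nes, motif_to_tf, motif_targets,
                 nes_threshold=3.0):
    """Return the regulon for `tf`, or None if the module is dropped.

    motif_nes:     {motif: enrichment score of the motif in this module} (made up)
    motif_to_tf:   {motif: TF the motif is annotated to}
    motif_targets: {motif: set of genes carrying the motif near the gene}
    nes_threshold: illustrative cutoff, an assumption of this sketch
    """
    # Motifs that are enriched in the module AND belong to the module's own TF
    supporting = [m for m, nes in motif_nes.items()
                  if nes >= nes_threshold and motif_to_tf.get(m) == tf]
    if not supporting:
        return None  # no enriched motif of the correct regulator

    # Union of the genes supported by any of those motifs
    supported = set()
    for m in supporting:
        supported |= motif_targets.get(m, set())

    # Keep only coexpressed genes that also have motif support
    return sorted(g for g in module_genes if g in supported)


# Toy inputs with placeholder gene, TF and motif names
module = ["g1", "g2", "g3", "g4", "g5"]
motif_nes = {"motif_a": 4.2, "motif_b": 1.1, "motif_c": 3.5}
motif_to_tf = {"motif_a": "TF1", "motif_b": "TF1", "motif_c": "TF2"}
motif_targets = {"motif_a": {"g1", "g2", "g3"},
                 "motif_b": {"g4"},
                 "motif_c": {"g5"}}

regulon = prune_module("TF1", module, motif_nes, motif_to_tf, motif_targets)
print(regulon)  # ['g1', 'g2', 'g3']
```

In this toy case g4 and g5 are coexpressed with TF1 but are dropped, because the only motifs that support them are either not enriched or belong to a different TF.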
Third, as part of SCENIC, we developed the AUCell algorithm to score the activity of each regulon in each cell (I covered AUCell in an earlier post on what the R package AUCell does for single-cell analysis). For a given regulon, comparing AUCell scores across cells makes it possible to identify which cells have significantly higher subnetwork activity. The resulting AUCell score matrix can then be used for downstream analysis.
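To make the scoring idea concrete, here is a simplified AUCell-style score for a single cell. It is a sketch under assumptions, not the AUCell package: genes are ranked by expression, a recovery curve counts how many regulon genes have been seen at each rank, and the area under that curve over the top of the ranking is divided by the best achievable area. The function name `aucell_score`, the deterministic tie-breaking by gene name, and the normalization are my own choices; the 5% default mirrors what I understand to be AUCell's default cutoff.

```python
def aucell_score(expression, gene_set, top_fraction=0.05):
    """Simplified AUCell-style activity score for one cell, in [0, 1].

    expression:   {gene: expression value} for a single cell
    gene_set:     genes of one regulon
    top_fraction: share of the ranking used for the AUC (assumed default)
    """
    # Rank genes from highest to lowest expression; ties broken by name here
    ranking = sorted(expression, key=lambda g: (-expression[g], g))
    max_rank = max(1, int(len(ranking) * top_fraction))

    members = set(gene_set) & set(expression)
    if not members:
        return 0.0

    recovered = 0
    area = 0
    for gene in ranking[:max_rank]:
        if gene in members:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank

    # Best case: all regulon genes sit at the very top of the ranking
    n = min(len(members), max_rank)
    best = n * (n + 1) // 2 + n * (max_rank - n)
    return area / best


# Toy cells with 10 placeholder genes; top_fraction=0.5 keeps the top 5 ranks
regulon_genes = ["g1", "g2", "g3"]
cell_a = {f"g{i}": 11 - i for i in range(1, 11)}  # g1 is highest
cell_b = {f"g{i}": i for i in range(1, 11)}       # g10 is highest

score_a = aucell_score(cell_a, regulon_genes, top_fraction=0.5)
score_b = aucell_score(cell_b, regulon_genes, top_fraction=0.5)
print(score_a, score_b)  # 1.0 0.0
```

Because the score depends only on the ranking within each cell, it does not change under any per-cell rescaling of expression, which is part of why it can be compared across cells.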

Next come some case studies, which we only need to skim.
First, the mouse brain:

This analysis provided 151 regulons, out of 1,046 initial coexpression modules, with significantly enriched motifs for the corresponding TFs (7% of the initial TFs). Scoring regulon activity for each cell revealed the expected cell types alongside a list of potential master regulators for each cell type.
There are several other case studies as well, which I won't go through here; overall the method seems to work reasonably well.

III. Methods: some points worth noting

About GENIE3: it trains random forest models predicting the expression of each gene in the data set and uses as input the expression of the TFs (so for each target gene, the predictors of the model are the TF expression values, not the whole matrix). The different models are then used to derive weights for the TFs, measuring their respective relevance for the prediction of the expression of each target gene (so it really is based on coexpression alone). The highest weights can be translated into TF-target regulatory links. Since GENIE3 uses random-forest regression, it has the added value of allowing complex coexpression relationships between a TF and its candidate targets (to my mind this is also a weakness).
However, the input and normalization approach here does not appear to have been designed specifically for 10X single-cell data, which is something we need to keep an eye on.

As for code, there is an R version (SCENIC).
Personally I recommend the Python version (pySCENIC).

Personal view

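First, the TFs are known, the motifs that TFs act on are known, and the target genes corresponding to those motifs are known as well. So why not infer forward, from highly expressed TFs to their target genes, instead of reasoning in reverse? I wonder whether any fellow readers can answer this question.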

Life is good, and even better with you.
