Unsupervised embedding of single-cell Hi-C data
Abstract
motivation
- significance of single cell Hi-C
- problem in processing Hi-C data: [However, Hi-C data analysis requires methods that take into account the unique characteristics of this type of data]
- summary of the paper's work: [In this work, we explore whether methods that have been developed previously for the analysis of bulk Hi-C data can be applied to scHi-C data. We apply methods designed for analysis of bulk Hi-C data to scHi-C data in conjunction with unsupervised embedding]
result
- conclusion: the HiCRep+MDS combination outperforms the others
- evidence: robust to extremely low per-cell sequencing depth; this robustness improves further when high-coverage and low-coverage cells are projected together; the method can be used to jointly embed cells from multiple published datasets.
Introduction
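- Written from the perspective of how high-throughput DNA sequencing technology has developed at the single-cell level; notes that these technologies provide scientists with the opportunity to understand many aspects of fundamental functional processes in the cell, including gene regulation and DNA replication; closes by pointing out that analysis methods that fail to accurately capture the complexities of these types of data are inadequate.
- Narrows to the analysis problem of cell-to-cell variability introduced by single-cell data, and further points out the limitations of FACS for state sorting, hence the need for methods for unsupervised analysis.
- Lists some methods for analyzing cell-to-cell differences in single-cell RNA-seq and the techniques they use.
- In this work, the focus is on single-cell Hi-C data. The reasons: 1. the sparsity of single-cell Hi-C data; 2. the research value of this type of data.
- Introduces the concept of the CDP (contact decay profile), a feature used to study heterogeneity in single-cell Hi-C data, and the works in which it has been applied.
- Points out the limitation of the CDP as a feature: it ignores the structural information of the genome, which makes it ill-suited to studying TADs and loops.
- Leads into several methods for reproducibility analysis in bulk Hi-C.
- States the paper's conclusion: HiCRep+MDS gives good results and is robust to the number of contacts required per cell.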
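To make the CDP feature mentioned above concrete, here is a minimal Python sketch of how such a profile could be computed for one cell. It is an illustration under stated assumptions, not the procedure used in the paper or in the works it cites: the function name `contact_decay_profile`, the log-spaced binning, and the distance range are all hypothetical choices, and contacts are assumed to be intra-chromosomal pairs of base-pair positions.

```python
import math

def contact_decay_profile(contacts, n_bins=5, min_dist=1_000, max_dist=100_000_000):
    """Fraction of a cell's contacts falling in each log-spaced distance bin.

    contacts: list of (pos1, pos2) intra-chromosomal contact positions in bp.
    The bin layout (log10-spaced between min_dist and max_dist) is an assumption.
    """
    log_min = math.log10(min_dist)
    log_max = math.log10(max_dist)
    width = (log_max - log_min) / n_bins
    counts = [0] * n_bins
    for pos1, pos2 in contacts:
        dist = abs(pos2 - pos1)
        if dist < min_dist or dist > max_dist:
            continue  # ignore contacts outside the profiled distance range
        idx = min(int((math.log10(dist) - log_min) / width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]

# Toy cell with four contacts at different genomic separations.
toy_contacts = [(0, 5_000), (0, 50_000), (100, 50_100), (0, 2_000_000)]
profile = contact_decay_profile(toy_contacts)
```

Because the profile only records how many contacts fall at each separation, two cells with very different TAD or loop structure can share the same profile, which is the limitation noted in the list above.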
Materials and Methods
Basic information about the datasets
Similarity and distance measures for scHi-C contact maps
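- the specific formulas of the reproducibility measures proposed for bulk Hi-C, and the formulas for converting between similarity and distance
- how the angle used for cell ordering is computed
- evaluation of the ordering results; the circular ROC metric is proposed.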
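The three steps listed above (similarity to distance, ordering cells by angle, scoring the ordering on a circle) can be sketched roughly in Python as follows. This is only an illustration under assumptions: the conversion d = sqrt(2 - 2s) is one common choice and is not necessarily the formula given in the paper; `circular_auc` (the best ordinary AUC over all rotations of the circular order) is a simplified stand-in and not the paper's exact circular ROC definition; all function names are hypothetical, and the 2D embedding (for example from MDS) is assumed to be already computed.

```python
import math

def similarity_to_distance(s):
    """Assumed conversion d = sqrt(2 - 2s) for similarities with s = 1 for identical maps."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * s))  # clamp rounding noise below zero

def order_by_angle(points):
    """Return cell indices sorted by angle around the centroid of a 2D embedding."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    angles = [math.atan2(y - cy, x - cx) for x, y in points]
    return sorted(range(len(points)), key=lambda i: angles[i])

def linear_auc(is_positive):
    """AUC of a ranked list: fraction of (positive, negative) pairs with the positive first."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    seen_pos = 0
    correct_pairs = 0
    for flag in is_positive:
        if flag:
            seen_pos += 1
        else:
            correct_pairs += seen_pos  # every earlier positive outranks this negative
    return correct_pairs / (n_pos * n_neg)

def circular_auc(ordered_labels, phase):
    """Simplified circular score: best linear AUC over all start points on the circle."""
    flags = [label == phase for label in ordered_labels]
    return max(linear_auc(flags[k:] + flags[:k]) for k in range(len(flags)))

# Toy usage: four cells on a square, then score a circular ordering of phase labels.
order = order_by_angle([(1, 1), (-1, 1), (-1, -1), (1, -1)])
score = circular_auc(["G1", "G1", "S", "S", "G2", "G2"], "S")
```

The point of a circular score is that a phase occupying one contiguous arc should be rated as perfectly ordered regardless of where the circle is cut, which an ordinary ROC on a fixed linear order does not guarantee.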
Results
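- computation of CROC
- comparison with the results of the Nagano method; a combined method is also proposed and its results validated;
- robustness experiment with few contacts per cell; √
- motivated by improving the ordering accuracy of low-coverage cells: Joint projection with high-coverage cells improves phasing of low-coverage cells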
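- Cell-cycle phased scHi-C data is indicative of replication timing (somewhat biology-oriented) ×
- joint analysis of the Nagano and Flyamer data. From the cell-cycle perspective, only the G1 phase is analyzed, which indirectly supports the effectiveness of the proposed method; an explanation is given for why the Flyamer results are so dispersed: coverage or other causes.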
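The few-contacts-per-cell robustness experiment listed above presumably relies on reducing each cell to a fixed number of contacts. A minimal Python sketch of such downsampling is given below; the function name `downsample_contacts`, the uniform sampling without replacement, and the seed handling are assumptions for illustration, not details taken from the paper.

```python
import random

def downsample_contacts(contacts, target, seed=0):
    """Uniformly sample `target` contacts from one cell without replacement.

    contacts: list of contact records (any hashable or unhashable items).
    Cells that already have `target` contacts or fewer are returned unchanged.
    """
    if target >= len(contacts):
        return list(contacts)
    rng = random.Random(seed)  # local generator keeps the result reproducible
    return rng.sample(contacts, target)

# Toy cell with 1000 distinct contacts, reduced to 100.
toy_cell = [(i, i + 10_000) for i in range(1000)]
low_coverage = downsample_contacts(toy_cell, 100)
```

Repeating the embedding on cells downsampled to progressively smaller targets is one way to check how quickly the ordering quality degrades with coverage.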