Processing immune repertoire V(D)J data (TCR/BCR)


Preface

These are my notes from working with immune repertoire data for the first time.


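I. Immune repertoire

Immune repertoire analysis focuses mainly on the CDR3 sequence of the variable region.

For other background, see:
10× Genomics single-cell immune repertoire VDJ analysis: must-know essentials
Single-cell immune repertoire: principles of TCR gene rearrangement and library preparation methods for TCR sequencing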

We need a minimum of 30x coverage in order to confidently identify unique VDJ sequences as truly unique. In addition, we need to ensure that this coverage amount can be met for all samples, even though we have large variation in the concentration of T-cells within these samples. We assume that a highly diverse sample will have a correspondingly lower average read depth, but need to determine the correlation between initial T-cell concentration and read depth.
ref: T-Cell Concentration and Coverage Depth #10
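
The trade-off in the quote above can be read as simple arithmetic: for a fixed number of reads, the average depth per unique VDJ sequence drops as the number of unique sequences grows. Below is a minimal sketch under that reading. Defining depth as reads per unique clonotype is my assumption, not a definition taken from the referenced discussion, and `mean_depth`, `meets_min_depth`, and all numbers are hypothetical.

```python
def mean_depth(total_reads: int, unique_clonotypes: int) -> float:
    """Average reads per unique clonotype (assumed definition of depth)."""
    if unique_clonotypes <= 0:
        raise ValueError("unique_clonotypes must be positive")
    return total_reads / unique_clonotypes


def meets_min_depth(total_reads: int, unique_clonotypes: int,
                    min_depth: float = 30.0) -> bool:
    """Check the average depth against a minimum (30x in the quote above)."""
    return mean_depth(total_reads, unique_clonotypes) >= min_depth


# Hypothetical samples sequenced to the same total read count.
low_diversity = mean_depth(3_000_000, 20_000)     # fewer unique clonotypes
high_diversity = mean_depth(3_000_000, 200_000)   # ten times more clonotypes

print(low_diversity, meets_min_depth(3_000_000, 20_000))
print(high_diversity, meets_min_depth(3_000_000, 200_000))
```

With the same read count, the more diverse sample falls below the 30x target, which is why the sequencing depth would have to scale with the expected diversity.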


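II. Immune repertoire data processing

1. Assembling the CDR3 sequences

1.1 scTCR/BCR

The most common platform for single-cell immune repertoire sequencing is 10X Genomics; others include Smart-Seq2.

1.1.1 10X Genomics platform

Processing software: Cell Ranger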

cellranger vdj --id=sample_name \
                 --reference=/opt/refdata-cellranger-vdj-GRCh38-alts-ensembl-3.1.0 \
                 --fastqs=/fastq_path \
                 --sample=sample_name \
                 --localcores=8 \
                 --localmem=64 

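The output includes a summary report, which is used for quality control.

1.1.2 Other single-cell sequencing platforms

For platform details, see: Single-cell TCR-Seq technology: more efficient TCR α/β chain pairing analysis
Processing software: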
TraCeR – reconstruction of T cell receptor sequences from single-cell RNA-seq data.
scTCR-seq – an implementation of a pipeline for Single-cell RNAseq package for recovering TCR data in python
TRAPeS – TCR Reconstruction Algorithm for Paired-End Single-cell
VDJPuzzle2 – TCR and BCR reconstruction from scRNA-seq data
MiXCR – a universal software for fast and accurate analysis of raw T- or B- cell receptor repertoire sequencing data
Immunarch – Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R

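1.2 bulk TCR/BCR

1.2.1 Processing workflow

Raw bulk TCR/BCR data goes through the same preprocessing steps as raw RNA-seq data, so they are not recorded in detail here.
1. QC
2. Trim with Trimmomatic, then run FastQC
3. Use MiXCR, Immunarch, or similar tools to obtain the CDR3 sequences
For details, see: Basic methods for processing bulk VDJ sequencing data
The first immune repertoire analysis tutorial on the web [MiXCR+VDJtools+Python+R]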

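1.2.2 Caveats

The FastQC results for TCR/BCR data look somewhat different from those for RNA-seq, so I note the differences here.
1. In Per base sequence content, the proportions of the paired bases (A/T and G/C) are unequal, and there are many overrepresented sequences.
This is because immune repertoire libraries are built by multiplex PCR (mPCR) and are not uniform, so the uneven base composition is normal.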
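2. Per base N content is too high.
Unknown bases are called as N. Use Trimmomatic to remove the Ns until the data meets the FastQC criteria, then continue with the downstream analysis.
Note: many short reads appear after trimming and filtering.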
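
To see which reads drive a high N content before trimming, the fraction of N bases per read can be computed directly. This is a minimal sketch, assuming well-formed 4-line FASTQ records; `read_fastq` and `n_fraction` are hypothetical helpers written for these notes. It is meant for inspection only and does not replace Trimmomatic.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def n_fraction(seq: str) -> float:
    """Fraction of bases in a read that are N (0.0 for an empty read)."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


# Tiny in-memory example instead of a real file.
example = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nNNGTACNN\n+\n!!IIII!!\n"
)
fractions = {header: n_fraction(seq) for header, seq, _ in read_fastq(example)}
print(fractions)
```

For a real gzipped file, the same generator can be fed with `gzip.open(path, "rt")`.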

1.2.3 MiXCR code
mixcr analyze amplicon --species hs \
        --adapters no-adapters \
        --starting-material dna \
        --5-end v-primers \
        --3-end j-primers \
        --receptor-type tcr \
        --productive \
/home/zy/TCR/example_A_1_val_1.fq.gz /home/zy/TCR/example_A_2_val_2.fq.gz analysis
# Some examples use j-primers for --3-end and others use c-primers. I was not sure which to choose, but after trying both the results were identical.
# A senior labmate told me that the --5-end and --3-end parameters can simply be left out.

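2. Filtering for functional CDR3 sequences

Commonly used filters:
1. The sequence is productive
2. The nucleotide sequence length is a multiple of 3, and the amino acid sequence is longer than 4 residues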
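
The two filters above can be sketched as a small function. Several things here are assumptions for illustration: the column names `nSeqCDR3` and `aaSeqCDR3` follow MiXCR's export naming, "productive" is approximated as "no stop codon (`*`) and no frameshift marker (`_`) in the amino acid sequence", and `is_functional_cdr3` and `filter_clonotypes` are hypothetical helpers, not part of any tool.

```python
from typing import Dict, List


def is_functional_cdr3(nt_seq: str, aa_seq: str, min_aa_len_exclusive: int = 4) -> bool:
    """Return True if a CDR3 passes the in-frame, productive, and length checks."""
    in_frame = len(nt_seq) > 0 and len(nt_seq) % 3 == 0
    # Assumed convention: '*' marks a stop codon, '_' an incomplete codon.
    productive = "*" not in aa_seq and "_" not in aa_seq
    long_enough = len(aa_seq) > min_aa_len_exclusive
    return in_frame and productive and long_enough


def filter_clonotypes(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the rows whose CDR3 passes is_functional_cdr3."""
    return [r for r in rows if is_functional_cdr3(r["nSeqCDR3"], r["aaSeqCDR3"])]


rows = [
    {"nSeqCDR3": "TGTGCCAGCAGTTTC", "aaSeqCDR3": "CASSF"},  # 15 nt, 5 aa: kept
    {"nSeqCDR3": "TGTGCCAGCAGTTT", "aaSeqCDR3": "CASS_F"},  # 14 nt, out of frame: dropped
    {"nSeqCDR3": "TGTGCCTGATTC", "aaSeqCDR3": "CA*F"},      # stop codon, 4 aa: dropped
]
kept = filter_clonotypes(rows)
print([r["aaSeqCDR3"] for r in kept])
```

The length check is kept as a parameter because "longer than 4" is a common choice in these notes, not a fixed rule.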

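Other filters
For example, expression abundance (TPM):
I have seen several papers discard assemblies with alpha chain TPM < 10 or beta chain TPM < 15, but I am not sure whether this is an accepted standard or a cutoff the authors set from their own experience.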
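
As a minimal sketch of this abundance filter, the snippet below drops alpha chains with TPM below 10 and beta chains with TPM below 15. The record layout (`cell`, `chain`, `tpm`) and the helper `passes_tpm_filter` are hypothetical, and the cutoffs are simply the values reported in the papers quoted in this section, not a recommendation.

```python
# Cutoffs taken from the papers quoted in these notes; chains below them are discarded.
MIN_TPM = {"alpha": 10.0, "beta": 15.0}


def passes_tpm_filter(chain: str, tpm: float) -> bool:
    """Keep a chain unless its TPM is below the cutoff for its chain type."""
    return tpm >= MIN_TPM[chain]


# Hypothetical per-chain assembly records.
assemblies = [
    {"cell": "c1", "chain": "alpha", "tpm": 8.0},    # below 10: dropped
    {"cell": "c1", "chain": "beta", "tpm": 120.0},   # kept
    {"cell": "c2", "chain": "alpha", "tpm": 10.0},   # exactly 10: kept
    {"cell": "c2", "chain": "beta", "tpm": 14.9},    # below 15: dropped
]
kept = [a for a in assemblies if passes_tpm_filter(a["chain"], a["tpm"])]
print([(a["cell"], a["chain"]) for a in kept])
```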

Results section: In total, we detected full TCR sequences for 94% (3,792/4,032) T cells, with at least one paired productive TCR α-β chain for subsequent analyses (Table S5). While most cells expressed unique TCR α and β alleles, nonunique α and/or β could be detected in a fraction of T cells. After eliminating non-productive alleles (e.g., out-of-frame transcripts) or low-abundance TCRs (Figure S6A), we found that 84% (3,174/3,792) contained unique and productive α chains and 94% (3,559/3,792) unique and productive β chains (Figure S6B), in agreement with previous reports

Methods section:
TCR analysis
The TCR sequences for each single T cell were assembled by the TraCeR method from single cell RNA-Seq data, leading to the identification of the CDR3 sequence, the rearranged TCR genes, and their expression abundance (transcripts per million, TPM). First, we discard those cells with no obvious TCR forms. Then we arrange TCR alpha and beta chain respectively with the following steps. The first TCR alpha (beta) chain was defined as follows: 1) keep all single T cells in which only one productive TCR alpha and beta chain was present. 2) if more than one TCR alpha or beta chain were identified in one T cell, we kept only the cells in which a dominant form of alpha and beta was detected. Often, one alpha/beta chain was productive and the other chain was non-productive, or the expression level of one was far higher than the alternative allele, and the productive or dominant form was identified. Next, we filtered out the second TCR alpha chains with TPM less than 10 and beta chains with TPM less than 15 to eliminate the biological and bioinformatics error based on the histogram analysis for the expression distribution (Figure S6A). From a total 4032 cells with successfully assembled TCR sequences, we identified the TCR alpha/beta pairs for 3792 cells.

cite: Zheng C, Zheng L, Yoo JK, et al. Landscape of Infiltrating T Cells in Liver Cancer Revealed by Single-Cell Sequencing. Cell. 2017;169(7):1342-1356.e16. doi:10.1016/j.cell.2017.05.035
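
The chain selection described in the Methods quoted above can be approximated as follows. This is a simplified sketch of my reading of that text, not the authors' code: the `Chain` record and `pick_dominant` are hypothetical, and the `min_fold` cutoff is an assumption, since the quoted text only says the expression level of one chain was "far higher" without giving a number.

```python
from typing import List, NamedTuple, Optional


class Chain(NamedTuple):
    cdr3: str
    productive: bool
    tpm: float


def pick_dominant(chains: List[Chain], min_fold: float = 10.0) -> Optional[Chain]:
    """Pick the dominant chain of one type (alpha or beta) for a single cell.

    Returns None when no chain is productive or no productive chain dominates.
    """
    productive = [c for c in chains if c.productive]
    if not productive:
        return None
    if len(productive) == 1:
        # One productive chain, the rest (if any) non-productive.
        return productive[0]
    ranked = sorted(productive, key=lambda c: c.tpm, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    # Hypothetical rule for "far higher": top TPM at least min_fold times the next.
    return top if top.tpm >= min_fold * runner_up.tpm else None


one_productive = pick_dominant([Chain("CASSF", True, 200.0), Chain("CASSL", False, 50.0)])
clear_winner = pick_dominant([Chain("CAVRD", True, 300.0), Chain("CAVSG", True, 12.0)])
ambiguous = pick_dominant([Chain("CAVRD", True, 40.0), Chain("CAVSG", True, 30.0)])
print(one_productive, clear_winner, ambiguous)
```

A cell would then be kept only if both its alpha and beta chain lists yield a dominant chain.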

TCR analysis
To reduce false positive assembly, we filtered out TCR assemblies with alpha chain TPM < 10 or beta chain TPM < 15…Only productive (that is, in frame) TCR alpha–beta pairs were considered to define the dominant TCR of a single cell.

cite: Guo, Xinyi et al. “Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing.” Nature medicine vol. 24,7 (2018): 978-985. doi:10.1038/s41591-018-0045-3


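Summary

This post is only a brief record of what I learned while processing immune repertoire data for the first time; more will be added later.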