Cutadapt

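Raw reads from the sequencer usually still contain adapter sequences. If the adapters are not removed, they cause large errors in downstream genome assembly and read alignment. The raw data therefore needs adapter trimming, and low-quality reads need to be filtered out. Cutadapt is a classic tool for this job, and it can trim adapters from paired-end reads.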

[Figure: principle of adapter removal]
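The principle can be shown with a few lines of shell. This sketch is a deliberately simplified model and is not Cutadapt's algorithm. It only recognizes an exact, complete copy of the adapter. Cutadapt also tolerates sequencing errors and partial adapter matches at the read ends. The function names trim_3prime and trim_5prime and the read are hypothetical. AGATCGGAAGAGC is the common beginning of the Illumina TruSeq adapters.

```shell
# Simplified model of adapter trimming: exact, full-length matches only.
# (Cutadapt itself also allows mismatches, indels and partial matches.)

# 3' adapter (as with -a): remove the adapter and every base after it.
trim_3prime() {
  local seq=$1 adapter=$2
  printf '%s\n' "${seq%%"$adapter"*}"
}

# 5' adapter (as with -g): remove the adapter and every base before it.
trim_5prime() {
  local seq=$1 adapter=$2
  printf '%s\n' "${seq#*"$adapter"}"
}

insert=GATTACAGATTACAGATTAC      # hypothetical 20-bp DNA fragment
adapter=AGATCGGAAGAGC            # start of the Illumina TruSeq adapters

# Fragment shorter than the read length: sequencing runs into the 3' adapter.
trim_3prime "${insert}${adapter}ACACGT" "$adapter"   # prints GATTACAGATTACAGATTAC
# Adapter at the 5' end of the read.
trim_5prime "${adapter}${insert}" "$adapter"         # prints GATTACAGATTACAGATTAC
# No adapter present: the read is returned unchanged.
trim_3prime "$insert" "$adapter"                     # prints GATTACAGATTACAGATTAC
```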

Download and installation

  • Installing with pip
pip install --user --upgrade cutadapt

# After installation, add the directory holding the cutadapt executable to PATH
# (with --user, pip puts it in ~/.local/bin on Linux)
echo 'export PATH=$PATH:$HOME/.local/bin' >> ~/.bashrc
  • Installing with conda
conda install -c bioconda cutadapt
  • Installing from source

If neither of the two methods above works, install from source. Download page: https://pypi.python.org/pypi/cutadapt/

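# Download and unpack the source package
wget -c https://files.pythonhosted.org/packages/a3/30/4a889a6916d7480c153774777e634b89865f95cb02f2c3209762c7ef984b/cutadapt-4.1.tar.gz
tar -zxvf cutadapt-4.1.tar.gz
cd cutadapt-4.1
# Build and install into the user directory (pip also fetches the build dependencies)
python3 -m pip install --user .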
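Whichever method you use, a quick check confirms that the shell can find the program. This sketch assumes python3 is the interpreter that pip used. `python3 -m site --user-base` prints the base directory for --user installs, and the executables go into its bin subdirectory.

```shell
# Directory where `pip install --user` puts executables (~/.local/bin on Linux).
user_bin="$(python3 -m site --user-base)/bin"
echo "pip --user executables: $user_bin"

# Check that the shell can find cutadapt and report its version.
if command -v cutadapt >/dev/null 2>&1; then
  echo "found $(command -v cutadapt), version $(cutadapt --version)"
else
  echo "cutadapt is not on PATH; try: export PATH=\"\$PATH:$user_bin\"" >&2
fi
```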

Usage

version 3.4

Copyright (C) 2010-2021 Marcel Martin <marcel.martin@scilifelab.se>

cutadapt removes adapter sequences from high-throughput sequencing reads.

Usage:
    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq    # single-end adapter trimming

For paired-end reads:
    cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq    # paired-end adapter trimming
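A concrete single-end run may make the usage line easier to follow. This sketch uses a tiny hypothetical FASTQ file and treats AGATCGGAAGAGC, the common prefix of the Illumina TruSeq adapters, as the 3' adapter. Substitute the adapter of your own library. The cutadapt call runs only if cutadapt is on PATH.

```shell
# One-read toy FASTQ (hypothetical data): a 20-bp insert followed by adapter.
seq=GATTACAGATTACAGATTACAGATCGGAAGAGCACACGT
qual=$(printf '%s' "$seq" | tr ACGTN IIIII)     # one 'I' (Phred 40) per base
printf '@read1\n%s\n+\n%s\n' "$seq" "$qual" > toy.fastq

if command -v cutadapt >/dev/null 2>&1; then
  # -a: 3' adapter; trimmed reads go to -o, the summary report to stdout.
  cutadapt -a AGATCGGAAGAGC -o toy.trimmed.fastq toy.fastq > toy.report.txt
  sed -n 2p toy.trimmed.fastq    # expected: GATTACAGATTACAGATTAC
else
  echo "cutadapt not found on PATH" >&2
fi
```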

Parameters

General options

-j CORES, --cores CORES Number of CPU cores to use. Use 0 to auto-detect. Default: 1    # number of CPU cores
-m LEN[:LEN2], --minimum-length LEN[:LEN2] Discard reads shorter than LEN. Default: 0
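A small awk filter shows the effect of -m. It drops FASTQ records whose sequence is shorter than a threshold. This is only an illustrative model, not Cutadapt's code. In Cutadapt the length filter runs after trimming, so a read that was long on input can still be discarded once its adapter is removed. The file names are hypothetical.

```shell
# Toy FASTQ (hypothetical data): one 12-bp read and one 4-bp read.
printf '%s\n' '@long' 'ACGTACGTACGT' '+' 'IIIIIIIIIIII' \
              '@short' 'ACGT' '+' 'IIII' > lengths.fastq

# Model of "-m 10": keep a 4-line record only if its sequence has >= 10 bases.
# (Cutadapt call for comparison: cutadapt -m 10 -j 0 -o kept.fastq lengths.fastq,
#  where -j 0 auto-detects the number of cores.)
awk -v min=10 '
  { rec[NR % 4] = $0 }                  # buffer the 4 lines of the current record
  NR % 4 == 0 {
    if (length(rec[2]) >= min)          # rec[2] holds the sequence line
      printf "%s\n%s\n%s\n%s\n", rec[1], rec[2], rec[3], rec[0]
  }' lengths.fastq > kept.fastq

sed -n 1p kept.fastq                # prints @long
awk 'END { print NR }' kept.fastq   # prints 4 (one record left)
```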

Common adapter options

The -a, -g and -b options remove adapter sequences from each read. For paired-end data they act on read 1 only.

  -a ADAPTER, --adapter ADAPTER Sequence of an adapter ligated to the 3' end (paired data: of the first read). The adapter and subsequent bases are trimmed. If a '$' character is appended ('anchoring'), the adapter is only found if it is a suffix of the read.
  # removes an adapter ligated to the 3' end, together with all bases after it
  -g ADAPTER, --front ADAPTER Sequence of an adapter ligated to the 5' end (paired data: of the first read). The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a '^' character is prepended ('anchoring'), the adapter is only found if it is a prefix of the read.
  # removes an adapter ligated to the 5' end, together with all bases before it
  -b ADAPTER, --anywhere ADAPTER Sequence of an adapter that may be ligated to the 5' or 3' end (paired data: of the first read). Both types of matches as described under -a and -g are allowed. If the first base of the read is part of the match, the behavior is as with -g, otherwise as with -a. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to!
  # removes an adapter that may be ligated to either end
  -n COUNT, --times COUNT Remove up to COUNT adapters from each read. Default: 1
  # maximum number of adapters removed from each read
  -u LENGTH, --cut LENGTH  Remove bases from each read (first read only if paired). If LENGTH is positive, remove bases from the beginning. If LENGTH is negative, remove bases from the end. Can be used twice if LENGTHs have different signs. This is applied *before* adapter trimming.
  # fixed number of bases to cut from each read: positive LENGTH cuts from the start, negative from the end
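The sign of -u decides which end is cut, and the cut happens before adapter trimming. A small model makes the rule concrete. The shell function cut_bases is hypothetical and mimics -u on a single sequence string. A positive LENGTH removes bases from the start, a negative one from the end. Cutadapt cuts the quality string the same way, but this sketch handles the sequence only.

```shell
# Model of -u LENGTH applied to one sequence string (illustration only).
cut_bases() {
  local seq=$1 n=$2 keep
  if [ "$n" -ge 0 ]; then
    # Positive LENGTH: drop the first n bases.
    printf '%s\n' "$seq" | cut -c "$((n + 1))-"
  else
    # Negative LENGTH: drop the last -n bases.
    keep=$(( ${#seq} + n ))
    if [ "$keep" -gt 0 ]; then
      printf '%s\n' "$seq" | cut -c "1-$keep"
    else
      printf '\n'                      # everything was cut away
    fi
  fi
}

cut_bases ACGTACGTAA 3                     # like -u 3:       prints TACGTAA
cut_bases ACGTACGTAA -2                    # like -u -2:      prints ACGTACGT
cut_bases "$(cut_bases ACGTACGTAA 3)" -2   # like -u 3 -u -2: prints TACGT
```

With real data the last case corresponds to `cutadapt -u 3 -u -2 -o out.fastq in.fastq`, where the file names are hypothetical.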

Paired-end adapter options

The -A/-G/-B/-U options work like -a/-g/-b/-u, but they apply to read 2 of paired-end data.

  -A ADAPTER            3' adapter to be removed from second read in a pair.    # 3' adapter of read 2
  -G ADAPTER            5' adapter to be removed from second read in a pair.    # 5' adapter of read 2
  -B ADAPTER            5'/3' adapter to be removed from second read in a pair.
  -U LENGTH             Remove LENGTH bases from second read in a pair.
  -p FILE, --paired-output FILE Write second read in a pair to FILE.
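This paired-end sketch combines the options above. The two one-read FASTQ files and the helper make_fastq are hypothetical. The adapter sequences are the ones commonly listed for Illumina TruSeq libraries: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA for read 1 and AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT for read 2. Check them against your own kit. Each toy read ends in only part of its adapter, which Cutadapt handles as a partial 3' match.

```shell
# Write a one-read FASTQ file with an all-'I' quality string (hypothetical data).
# usage: make_fastq NAME SEQUENCE FILE
make_fastq() {
  printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr ACGTN IIIII)" > "$3"
}

ADAPT1=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA   # TruSeq adapter for read 1
ADAPT2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT   # TruSeq adapter for read 2

# Each read = 20-bp insert + the first 18 bases of its adapter.
make_fastq pair1 GATTACAGATTACAGATTACAGATCGGAAGAGCACACG in1.fastq
make_fastq pair1 TTGCCATGGACCTTAGCCAAAGATCGGAAGAGCGTCGT in2.fastq

if command -v cutadapt >/dev/null 2>&1; then
  # -a/-A: 3' adapters of read 1/read 2; -m 10 discards pairs with a read under 10 bp.
  cutadapt -a "$ADAPT1" -A "$ADAPT2" -m 10 \
    -o out1.fastq -p out2.fastq in1.fastq in2.fastq > pe.report.txt
  sed -n 2p out1.fastq    # expected: GATTACAGATTACAGATTAC
  sed -n 2p out2.fastq    # expected: TTGCCATGGACCTTAGCCAA
fi
```

Cutadapt writes both output files in the same read order, so the pairs stay synchronized for downstream aligners.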

Output options

--quiet               Print only error messages.
-o FILE, --output FILE Write trimmed reads to FILE. FASTQ or FASTA format is chosen depending on input. Summary report is sent to standard output. Use '{name}' for demultiplexing (see docs). Default: write to standard output
  --fasta               Output FASTA to standard output even on FASTQ input.
  -Z                    Use compression level 1 for gzipped output files (faster, but uses more space)
  --info-file FILE      Write information about each read and its adapter matches into FILE. See the documentation for the file format.
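The help text above says the summary report goes to standard output when the reads are written with -o. That makes it easy to keep the report as a log. This sketch uses hypothetical file names. It also shows --fasta and --quiet with the reads sent to standard output.

```shell
# One-read toy FASTQ (hypothetical data).
seq=GATTACAGATTACAGATTACAGATCGGAAGAGC
printf '@r1\n%s\n+\n%s\n' "$seq" "$(printf '%s' "$seq" | tr ACGTN IIIII)" > in.fastq

if command -v cutadapt >/dev/null 2>&1; then
  # Reads go to out.fastq; the summary report (stdout) is saved as a log file.
  cutadapt -a AGATCGGAAGAGC -o out.fastq in.fastq > cutadapt.log
  # --fasta writes FASTA to stdout even for FASTQ input;
  # --quiet suppresses everything except error messages.
  cutadapt --quiet --fasta -a AGATCGGAAGAGC in.fastq > out.fasta
  head -n 2 out.fasta    # expected: >r1, then GATTACAGATTACAGATTAC
fi
```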

In short, -a and -A give the 3' adapters of read 1 and read 2, while -g and -G give their 5' adapters.

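Gzip-compressed FASTQ and FASTA files are supported as input. Older releases let you state the input format explicitly with the -f option where needed. Newer releases detect the format from the file contents.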
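A compressed file can be passed to Cutadapt without unpacking it first. This sketch assumes a hypothetical gzip-compressed FASTA file with one read. Cutadapt chooses gzip output when the output file name ends in .gz, and -Z selects the fast compression level 1.

```shell
# Hypothetical gzip-compressed FASTA file holding one read.
printf '>r1\n%s\n' GATTACAGATTACAGATTACAGATCGGAAGAGC | gzip > reads.fasta.gz
gzip -t reads.fasta.gz && echo "reads.fasta.gz is a valid gzip file"

if command -v cutadapt >/dev/null 2>&1; then
  # No manual decompression: cutadapt reads the .gz file directly, keeps the
  # FASTA format, and compresses the output because its name ends in .gz.
  cutadapt -a AGATCGGAAGAGC -Z -o trimmed.fasta.gz reads.fasta.gz > gz.report.txt
  gzip -dc trimmed.fasta.gz    # expected: >r1, then GATTACAGATTACAGATTAC
fi
```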

Q&A:

  1. Installation may fail with error: command 'gcc' failed with exit status 1. If the detailed error output shows that Python.h is missing, install the Python development package (python-dev / python-devel).

Solution (using CentOS as an example):

yum search python3 | grep -i devel
yum install python3-devel
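Before you rerun the installation, you can confirm that the headers are where the compiler will look for them. This sketch assumes python3 is the interpreter used for the install. `sysconfig.get_paths()["include"]` is the standard-library way to ask for that interpreter's header directory.

```shell
# Where does this python3 expect its C headers, and is Python.h there?
inc_dir=$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["include"])')
echo "Python include directory: $inc_dir"

if [ -f "$inc_dir/Python.h" ]; then
  echo "Python.h found: development headers are installed"
else
  echo "Python.h missing: install python3-devel (CentOS) or python3-dev (Debian/Ubuntu)" >&2
fi

# The build also needs a C compiler.
command -v gcc >/dev/null 2>&1 || echo "gcc not found: install it too (e.g. yum install gcc)" >&2
```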

