Filter FASTA files

Use a regular expression for filtering sequences by id from a FASTA file, e.g. just certain chromosomes from a genome. There are other tools as part of bigger packages to install (and no regex support), mostly awk-based awkward (sorry for the pun) bash solutions, and scripts using packages that one needs to install and with still no support for regular expressions. This however is a simple, straightforward little python script for a simple task. It doesn’t do anything else and doesn’t need anything but a stock python installation. Based on the FASTA reader snippet

Download here. 

Usage:

python FASTAfilter.py [-h] regex infile outfile

From a FASTA-file with multiple >entries, filter by sequence ids using a
regex.

positional arguments:
regex Regex to filter entry ids, e.g. ‘chr[1-4]’. Note that the id does not contain the initial > character.
infile A FASTA input file, usually with multiple entries.
outfile The new file with only the matching entries.

optional arguments:
-h, –help show this help message and exit

 

INSTALL:

cd /data/software
wget http://dm516.user.srcf.net/fastafilter/FASTAfilter.zip
unzip FASTAfilter.zip
easy_install argparse

 

USAGE:

python FASTAfilter.py   [1-9,10,11,12,13,14,15,16,17,18,X]  \
/dat2/INPUT.fa \
/dat2/OUTPUT.fa

 

Error:

Traceback (most recent call last):
  File "FASTAfilter.py", line 3, in <module>
    import argparse
ImportError: No module named argparse


Solution:

run "easy_install argparse" as root user.

 

http://dm516.user.srcf.net/?p=314

data_dir='/public/work/Personal/wuxu/qiantao_17' for file1 in ${data_dir}/*.fasta; do for file2 in ${data_dir}/*.fasta; do if [ "$file1" != "$file2" ]; then touch snp_indel.end.sh && cat snp_indel.end.sh && \ export PATH=/public/work/Personal/pangshuai/software/conda/miniconda3/bin/:${PATH} && \ nucmer --mum -t 8 -g 1000 -p ${file1##*/}.${file2##*/}.ref_based.nucmer $file1 $file2 && \ delta-filter -1 -l 200 ${file1##*/}.${file2##*/}.ref_based.nucmer.delta > ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter && \ dnadiff -d ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter -p ${file1##*/}.${file2##*/}.ref_based.nucmer && \ show-coords -rcloT ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter > ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter.coords && \ show-coords -THrd ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter > ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter.syri.coords && \ show-snps -ClrTH ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter > ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter.snp && \ show-diff ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter > ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter.inv && \ perl /public/work/Pipline/Structural_Variation/pipeline/2.1.1/bin/filter_the_MUmmer_SNP_file.pl ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter.snp ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter.snp.SNPs ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter.snp.Insertions ${file1##*/}.${file2##*/}.ref_based.nucmer.delta.filter.snp.Deletions 10000000 && \ touch snp_indel.end.tmp && \ mv snp_indel.end.tmp snp_indel.end && \ sleep 10 fi done done ,增加一个判断,使/public/work/Personal/wuxu/qiantao_17路径下以.fasta结尾的文件两两一组不分前后只组合一次,然后再执行touch 后面的代码
06-03
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值