II. Submit the assembly to NCBI
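1. Prepare the data
You first need the assembly as an uncompressed FASTA file (not .gz); every later step works from this file. The format looks like this: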
>Scaffold633
TCATTTCTCCACTCTCGATGAACAAATCTGGAGGGATTTTTTTTCATTCC
ACTCAATAGGTTGTCTATAAAGGTGTGATTCGTGGAACTTCTTCACACAG
CAGCTAGTCTATATAATACAGAAGATCG
>Scaffold553
AAAAAATTTTTTTTTTAAACTATCATCTCATGGATCAGCAGCAATTCTGA
GTGTAACGTCTTCATTAAATGCGTATATAAATTTGCATAAAGATATGCGA
CCAATATTGAGCCTGGAAATATATGCGCAGAGTGCAAAATTGTGTTTTTT
GATCGGTTAATTAAAGG
>Scaffold641
GTTTCCCAGTAGGTCTCTCCCGCTACGGCGTCCGCACGAACGCGATCTGC
CCTCGTGCCCGCACCGCCATGACGGCAGAAGCCTTCGGCGAGAACAACAC
CGGCGTCGTCGGCCTCGATCCGCTTGCACCCGAGCGCGTCGCGACCCTGG
TCAGCTACCTCGCATCCCCCGATTCCGACGAGATCAACGGACAGGTCTTC
GTCGTCTACGGCAAGATGGTGGCGTTGATGGAAGCACCCAAGGTCGAGAA
CCGTTTCGACGCAGCCGGATCCGCGTTCACCGTCGAAGAACTCGGTGGCC
AGCTCTCGTCTTACTTCTCCGGCCGTGGGCCGTACGAGACCTACTGGGAA
AC
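Before moving on, it can help to confirm that the file really is uncompressed FASTA. The following is a minimal sketch, not part of the original workflow or of any NCBI tool: the helper names is_gzipped, count_records and check_fasta are made up for illustration, and it assumes a shell with head, od, tr and grep available. It only checks the gzip magic bytes and the first header line; it does not validate the sequence characters.

```shell
#!/bin/sh
# Hypothetical helpers for a quick sanity check of a FASTA file.

# is_gzipped FILE: succeed (exit 0) if FILE starts with the gzip magic bytes 1f 8b.
is_gzipped() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# count_records FILE: print the number of header lines (lines starting with ">").
count_records() {
    grep -c '^>' "$1"
}

# check_fasta FILE: fail if FILE is gzip-compressed or does not begin with
# a ">" header line; otherwise print the number of sequences found.
check_fasta() {
    if is_gzipped "$1"; then
        echo "ERROR: $1 is gzip-compressed; decompress it first" >&2
        return 1
    fi
    first=$(head -n 1 "$1")
    case "$first" in
        '>'*) ;;
        *)  echo "ERROR: $1 does not start with a '>' header line" >&2
            return 1 ;;
    esac
    echo "$1: $(count_records "$1") sequences"
}
```

After loading these functions into the shell, running check_fasta Ascaris_suum.scaf.fa should print the file name followed by the number of scaffolds, or an error message if the file still needs to be decompressed.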
2. Process the data
This takes several steps:
(1) Generate the .greater, short.list and ZERO_BASE_COUNT files
perl ../scaf_filter_2k.pl Ascaris_suum.scaf.fa
Source of scaf_filter_2k.pl:
#!/usr/bin/perl
use strict;
use warnings;
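# Input FASTA file, given as the first command-line argument
my $file=shift;
#my $cutoff=shift;
# Names of the three output files
my $outfile="short.list";
my $outfile2="$file.greater";
my $outfile3="ZERO_BASE_COUNT";
# Open the input for reading and the outputs for writing (three-argument open)
open IN,"<",$file or die "Cannot read $file: $!";
open OUT,">",$outfile or die "Cannot write $outfile: $!";
open OUT1,">",$outfile2 or die "Cannot write $outfile2: $!";
open OUT2,">",$outfile3 or die "Cannot write $outfile3: $!";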
$