本帖最后由 hecm129 于 2014-06-05 19:03 编辑
我要找到位于染色体上某两个位点之间的全部基因序列,
loc.txt
1-1 chr1 192967435 195777777
1-2 chr3 10073404 10076403
1-3 chr1 27070214 27900131
...
data.fasta
>AC213524.3_FG003 seq=gene; coord=1:193514391..193515883:-1
ATGACTGTGTTGCATCATTCCCATTTGCCAACTGACACAAAGGTAGGTTGAGGAAGTATG
GCTAGACTATTTTGACAGCCAAGCAGGATAAGGAAGGAAAGAGGGAGAGTTTCTGCTACA
ACAGAAACATAGAAACAGCCTACTTCTGAATTCAAAAAAGAAAAACTCAACCGCACAATT
TTAATAGCTAACTATTATCTTTAAAAGTTTCAAACAGGGCTTAAGAATGAAAAAAAATAT
TATCATATAATCCATGATTGTGTATATATAAAAGTGAACGGATAATTTCTGAAGAAGTGA
TTCTGTCAATCAAGCTGATAACAGCTTTTTACTCATCCGTAGAGGCAGAGCGAAAGGGCT
GGTGGCCGCACACGAGCGTCAGCACCGCCGAGATCGCGGCCTTGTGCCGCGCGTCAGAGC
GAGTCACGCAGACGTGTAGCGGGCGGTGGTGCGCCTGCGGGCATCGCAGGTCTCTCATCG
TCTCGTTCACCTACCGCAAGCACCGACGGGGCGGGGAATCGCATGGTCGGTGTTCCACCC
ATCGCCCGGGGCGAACGGCTGGACGGGTGGGTCGGTGTTCCGGTGGCCGACCACCTGGCG
GGTCGCGCGATCGCAAGGGGCAGGCGCCTCTGTCTCGGCTCGGCCCAGTCACCATCTCTC
...
写如下程序:
#! usr/bin/perl -w
use strict;
use Bio::Perl;
open LOC,"loc.txt" or die "can't open file1";
open OUT,"> genes.obj" or die "can't open file2";
my (%genes);
my $seqin=Bio::SeqIO->new(-file=>"data.fasta",-format=>'fasta');
my $seqout=Bio::SeqIO->new(-file=>"> obj.fa",-format=>'fasta');
while(){
chomp;
my ($markers, $chr, $start, $end)=split/\s+/;
print "$markers, $chr, $start, $end\n";
while(my $seq=$seqin->next_seq){
my $id=$seq->display_id;
my $desc=$seq->desc;
if($desc=~/seq=gene; coord=(\d+):(\d+)..(\d+):.*/){
if(($chr==$1) &&( $start<=$2 ) &&($end>=$3)){
print OUT "$id\t$1\t$2\t$3\t$start\t$end\n";
# print "$1, $2, $3, $start, $end\n";
$seqout->write_seq($seq);
}
}
}
}
但运行的结果只能得到“1-1 chr1 192967435 195777777”的结果,从打印结果来看,while()能读出第二条位置及以后的位点信息,但是while(my $seq=$seqin->next_seq)循环在while()读出第二条位置时就不能再重新读入文件了 ,请假高手问题出现在哪儿了?有什么解决办法?