Python简单实现samtools tview可视化功能
问题分析:
字符串匹配可视化程序的设计与实现
Samtools方法:
// Using Samtools
samtools view -bS data/result1.sam > data/result.bam
samtools sort data/result.bam data/result.sort
samtools index data/result.sort.bam data/result.sort.bam.bai
samtools tview data/result.sort.bam data/ref.fa
使用Samtools效果图:
Python简单实现此功能:
一. 使用bowtie2进行预处理:
示例代码:
//产生result.sam文件
./bowtie/bowtie2-build data/ref.fa ref
./bowtie/bowtie2 -x ref -U data/reads1.fq -S data/result.sam
//产生排序号的文件,按照开始位置排序
sed -i '1,3d' data/result.sam
sort -n -k 4 -t $'\t' data/result.sam > data/result_sorted.sam
awk '$4!=0' data/result_sorted.sam > data/result_new_sorted.sam
more data/result_new_sorted.sam
二. 选择合适的结构体:
class part:
def __init__(self,id,begin,content):
self.id=id
self.begin=int(begin)
self.content=content
self.end=int(begin)+len(content)-1
三. 读取文件,存入结构体数组:
array=[]
file = open("data/result_new_sorted.sam")
for line in file:
line=line.split()
a=part(line[0],line[3],line[9])
array.append(a)
file.close()
四. 按照互补,补充原则,补充一行字符串后,写入文件:
str=[]
num_array=len(array)
fo=open("data/temp.sam","w+")
while(num_array>0):
i=0
end_flags=0
str_t=''
while(i<num_array):
if (array[i].begin > end_flags):
str_t += ' ' * (array[i].begin - end_flags - 1)
str_t= str_t + ''.join(array[i].content)
end_flags = array[i].end
array.remove(array[i])
num_array = num_array - 1
i=i-1
num_array=len(array)
i=i+1
fo.write(str_t+"\n")
str.append(str_t)
num_array=len(array)
fo.close()
这里补充一下:每一行字符串所编写的原则,假设reads1的起始位置是1,长度包含5个字符,即其结束位置为:begin+content-1,也就是1+5-1=5,然后移除结构体数组中这个元素,继续遍历,寻找第一个起始位置大于结束位置的reads,重复以上步骤,并将完成的每一行写入文件中,便于后续读取。
五. 显示动态刷新输出:
我的电脑系统为Ubuntu,于是采用了evdev第三方库来获取键盘按钮
按right键向右刷新60位
按right键向左刷新60位
按空格键可以跳转到指定范围
按ctrl可切换显示模式:基因<------>匹配情况,完全匹配表现为,
import keyboard
from evdev import InputDevice
from select import select
dev = InputDevice('/dev/input/event3')
#这里的路径地址,对于每一台Linux系统的电脑存在差异,具体可以看其他相关帖子的内容
state_r=0
sum=""
os.system('clear')
str_array=[]
fo=open("data/temp.sam","r+")
for str_line in fo:
str_array.append(str_line)
print len(str_array)
fo.close()
with open ('data/ref.fa') as file:
next(file)
for line in file:
sum=sum+line.replace('\n','')
small=0
max=60
line="<".rjust(61)
line1=list(line)
print len(line1)
while True:
line = "<".rjust(61)
if (small<0):
small = 0
max = 60
line1 = list(line)
for i in range(small+1,max+1):
if (i%10==1):
local=i
for j in range(1,len(str(i))+1):
c=str(i)[j-1]
line1[(local-1)%60]=c
local=local+1
line=''.join(line1)
print line
str_0=sum
str_sum=[]
print(sum[int(small):int(int(max))])
if (state_r==0):
for index in range(0,len(str_array)):
print (str_array[index])[small:max]
elif state_r==1:
for index_2 in range(0, len(str_array)):
str_temp = ""
for index_3 in range(small, max):
if (str_array[index_2][index_3] == sum[index_3]):
str_temp += ","
else:
str_temp += str_array[index_2][index_3]
print (str_temp)
str_sum.append(str_temp)
select([dev], [], [])
for event in dev.read():
if event.value == 0 and event.code == 106:
print("right")
os.system('clear')
small = small + 60
max = max + 60
break
if event.value == 0 and event.code == 105:
print("left")
os.system('clear')
small = small - 60
max = max - 60
break
if event.value == 0 and event.code == 57:
os.system('clear')
print("goto:")
user_index=raw_input()
user_index=user_index.split(" ")[-1]
ux=int(user_index)/60
small = ux*60
max = (ux+1)*60
break
if event.value == 0 and event.code == 16:
exit(0)
break
if event.value == 0 and event.code == 29:
if state_r==0:
state_r=1
elif state_r==1:
state_r=0
break
六. 运行结果演示:
比对成功模式 基因显示模式