Python: Converting rows in a file to columns

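I'm a Python beginner trying to write a script that takes a tab-delimited text file as input and converts specified rows into columns. Here are some example lines from the file:

1 chr1 1008376 1258657 250281 4628 666 2832 565 16.6323226376 83.3676773624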

1 chr1 1258657 1516806 258149 2544 601 1481 231 13.4929906542 86.5070093458

1 chr1 1516806 1766886 250080 1652 590 936 63 6.30630630631 93.6936936937

1 chr1 1766886 2017159 250273 5030 1608 2698 362 11.8300653595 88.1699346405

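Essentially, the file goes through a list of regions (columns 2-3) on a chromosome for one individual (column 0) and gives a statistic for each region (column 9); columns are numbered from 0. The file lists all regions for individual 1 first, then individual 2, and so on to the last individual. There are 20 individuals in the file.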
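As a minimal sketch of the layout described above, the snippet below splits one of the sample lines and keeps only the columns whose meaning is stated (0, 1, 2-3 and 9). The `Window` record type and the `parse_line` function are hypothetical names introduced for illustration, and the sketch assumes the fields are separated by single tabs.

```python
from collections import namedtuple

# Hypothetical record type: only the meaning of columns 0-3 and 9 is taken
# from the description; the remaining columns are ignored here.
Window = namedtuple("Window", ["individual", "chrom", "start", "end", "score"])


def parse_line(line):
    """Split one tab-delimited line and keep the columns of interest (0-based)."""
    fields = line.rstrip("\n").split("\t")
    return Window(
        individual=int(fields[0]),
        chrom=fields[1],
        start=int(fields[2]),
        end=int(fields[3]),
        score=fields[9],  # kept as text so the value can be written back unchanged
    )


# First sample line from the post, with tabs restored (assumed delimiter).
sample = "1\tchr1\t1008376\t1258657\t250281\t4628\t666\t2832\t565\t16.6323226376\t83.3676773624"
record = parse_line(sample)
print(record)
```

Keeping the score as a string avoids changing its printed precision when it is written to the new file.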

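I'd like a new file that leaves out column 0 and columns 4-8 and has new columns holding each individual's score for the region on that row (the region is now columns 1-2). So column 3 is individual 1's score (formerly column 9), column 4 is individual 2's score for the same region, and so on. Each row therefore has the chromosome (chr1) as column 0, then the region (columns 1-2), followed by 20 columns with the 20 individuals' scores.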

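The scores are currently arranged by row, so the file has a lot of lines. The values in columns 1-3 are the same for every individual, so there's no issue with regions not overlapping, and all individuals have the same number of rows. In other words, columns 2+3 are repeated 20 times in the file.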
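The conversion relies on every individual listing the same regions in the same order, so it may be worth checking that before reshaping anything. The sketch below is one way to do so; the function names and the demo rows are made up for illustration, and header lines are assumed to be recognisable because their first field is not a number.

```python
from collections import defaultdict


def regions_by_individual(lines):
    """Map each individual (column 0) to its ordered list of (chrom, start, end)."""
    regions = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip a header or a malformed line
        regions[fields[0]].append(tuple(fields[1:4]))
    return regions


def same_regions_everywhere(lines):
    """Return True when all individuals list identical regions in identical order."""
    per_individual = list(regions_by_individual(lines).values())
    return all(r == per_individual[0] for r in per_individual[1:])


# Made-up demo rows: two individuals, two regions each, unused columns set to 0.
demo = [
    "1\tchr1\t10\t20\t10\t0\t0\t0\t0\t30423\t0",
    "1\tchr1\t20\t30\t10\t0\t0\t0\t0\t40556\t0",
    "2\tchr1\t10\t20\t10\t0\t0\t0\t0\t73476\t0",
    "2\tchr1\t20\t30\t10\t0\t0\t0\t0\t43657\t0",
]
print(same_regions_everywhere(demo))
```

If the check fails, simply appending scores in file order would misalign the columns, and a missing-value placeholder would be needed instead.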

If this is too convoluted or dense, the simple example below illustrates the problem.

Here is a simple dummy example of what I want:

Original file:

^{pr2}$

Changed to:

chr1 10 20 30423 73476 34656.5

chr1 20 30 40556 43657 90848
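One possible way to get from rows to columns is to read the file once and collect the scores in a dictionary keyed by region. The sketch below is not the original poster's code: `pivot_scores` and `make_row` are hypothetical names, and the input rows are invented so that they produce the two output lines shown above (the unused columns are padded with 0). It assumes the individuals appear in order in the file, so the scores for each region are appended in individual order.

```python
def pivot_scores(lines):
    """Collect column 9 of every line under its (chrom, start, end) key."""
    table = {}  # dicts keep insertion order (Python 3.7+), so regions stay in file order
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip a header or a malformed line
        key = (fields[1], fields[2], fields[3])
        table.setdefault(key, []).append(fields[9])
    # One output row per region: chrom, start, end, then one score per individual.
    return [list(key) + scores for key, scores in table.items()]


def make_row(individual, start, end, score):
    """Hypothetical helper that builds an 11-column input line for the demo."""
    cols = [individual, "chr1", start, end, end - start, 0, 0, 0, 0, score, 0]
    return "\t".join(str(c) for c in cols)


# Invented input: three individuals, two regions each.
demo = [
    make_row(1, 10, 20, "30423"),
    make_row(1, 20, 30, "40556"),
    make_row(2, 10, 20, "73476"),
    make_row(2, 20, 30, "43657"),
    make_row(3, 10, 20, "34656.5"),
    make_row(3, 20, 30, "90848"),
]
for row in pivot_scores(demo):
    print("\t".join(row))
```

Because the key includes the chromosome as well as the start and end positions, the same sketch would also work on a file that covers more than one chromosome.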

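So if any Python users have tips on converting rows to columns, I'd appreciate them, even if you don't have time to solve this particular problem. I find row-to-column conversion especially tricky, particularly when it is conditional on a value in a column (column 0 here).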

Please let me know if I can clarify the question. Any help or comments are appreciated.

So, an update: thanks for your comments. Here is what I have come up with so far:

    path = '/mnt/genotyping/Alex/wholegenome/LROH/LROHSplitbyChrom/Filtered_by_MappingQuality20/SimpleHomozygosityScore/HomozygosityStatisticsTameratsalllanesMinMQ20chr20'
    ListofData = []  # make list
    individual = 1  # only interested in first individual to get list of windows for the chromosome

    with open(path) as infile:
        for line in infile:
            fields = line.rstrip().split("\t")
            if "chr" in line and int(fields[0]) == individual:  # "chr" test avoids header
                ListofData.extend(fields[2:5])  # add start, end and size of window to list

    # once iterated through windows, split the list into sets of three, making it one list per line
    lol = [ListofData[i:i+3] for i in range(0, len(ListofData), 3)]  # list of lists divided into 3's

    for i in lol:  # for set of 3 in list
        with open(path) as infile:
            for line in infile:
                if "chr" in line:  # avoids header
                    fields = line.rstrip().split("\t")
                    if fields[2] == i[0]:  # if start position in line matches start position in i
                        i.append(fields[9])  # add homozygosity score to list
        print("\t".join(i))  # every individual has been read: print the finished row

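I go through the file, get the information I want from columns 2-4 and put it into a list. Then I split this list into groups of three, one group per output row.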

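Then, in the second loop, for each group of three (that is, for each list within the list), I try to go through the file and, whenever the first item of the group equals the start position on the line (fields[2]), add the score in fields[9] to that group.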

After that, all I have to do is print the lists one after another to get what I want.
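Since the goal is a new file rather than printed output, the finished lists could be written with the standard `csv` module instead of `print`. This is only a sketch: `write_rows` is a hypothetical name, the rows are the dummy values from the example earlier in the post, and an in-memory buffer stands in for the real output file, whose name is not given.

```python
import csv
import io


def write_rows(rows, handle):
    """Write each list as one tab-delimited line to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Dummy rows taken from the small example in the post.
rows = [
    ["chr1", "10", "20", "30423", "73476", "34656.5"],
    ["chr1", "20", "30", "40556", "43657", "90848"],
]

# io.StringIO stands in for a real file here; with an actual output path one
# would pass a handle opened as open(some_path, "w", newline="") instead.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue(), end="")
```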