python根据第一列合并文件_根据python中两个文件的列坐标合并文件

I have a file called snp.txt that looks like this:

chrom chromStart chromEnd name strand observed

chr1 259 260 rs72477211 + A/G single

chr1 433 433 rs56289060 + -/C insertion

chr1 491 492 rs55998931 + C/T single

chr1 518 519 rs62636508 + C/G single

chr1 582 583 rs58108140 + A/G single

I have a second file gene.txt

chrom chromStart chromEnd tf_title tf_score

chr1 200 270 NFKB1 123

chr1 420 440 IRF4 234

chr1 488 550 BCL3 231

chr1 513 579 TCF12 12

chr1 582 583 BAD170 89

The final output I want is: output.txt

chrom chromStart chromEnd name strand observed tf_title tf_score

chr1 259 260 rs72477211 + A/G NFKB1 123

chr1 433 433 rs56289060 + -/C IRF4 234

chr1 491 492 rs55998931 + C/T BCL3 231

chr1 518 519 rs62636508 + C/G TCF12 12

chr1 582 583 rs58108140 + A/G BAD170 89

The key thing I want to be able to do is to look at gene.txt and check if the rsnumber in the name column of snp.txt is in the same region established by chrom, chromStart and chromEnd.

For example:

In the first row of snp.txt

the rsid rs72477211 is on chr1 between positions 259 and 260.

Now in gene.txt, NFKB1 is also on chr1 but between positions 200 and 270,

this means that rsid rs72477211 is located the NFKB1 region, so this is noted in output txt.

I am unable to do this in using pandas merge function and I'm not sure where to even start.

the files are extremely large so a loop would be highly inefficient.

Can someone please help? Thanks!

解决方案

If it fits in memory, you can merge the two dataframes with an outer method base only on chrom column, then filter your result by doing the range inclusion math:

df = snp.merge(gene, how='outer', on='chrom')

df = df[(df.chromStart_x>=df.chromStart_y) & (df.chromEnd_x<=df.chromEnd_y)]

You can eventually remove the duplicate columns:

del test['chromStart_y']

del test['chromEnd_y']

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值