Merging files by the first column in Python: merging two files based on column coordinates

Latest recommended article published on 2022-03-27 09:06:20

weixin_39999859

Article tags: python merge files by first column

I have a file called snp.txt that looks like this:

chrom chromStart chromEnd name strand observed
chr1 259 260 rs72477211 + A/G single
chr1 433 433 rs56289060 + -/C insertion
chr1 491 492 rs55998931 + C/T single
chr1 518 519 rs62636508 + C/G single
chr1 582 583 rs58108140 + A/G single

I have a second file, gene.txt:

chrom chromStart chromEnd tf_title tf_score
chr1 200 270 NFKB1 123
chr1 420 440 IRF4 234
chr1 488 550 BCL3 231
chr1 513 579 TCF12 12
chr1 582 583 BAD170 89

The final output I want is output.txt:

chrom chromStart chromEnd name strand observed tf_title tf_score
chr1 259 260 rs72477211 + A/G NFKB1 123
chr1 433 433 rs56289060 + -/C IRF4 234
chr1 491 492 rs55998931 + C/T BCL3 231
chr1 518 519 rs62636508 + C/G TCF12 12
chr1 582 583 rs58108140 + A/G BAD170 89

The key thing I want to do is check, for each rs number in the name column of snp.txt, whether it falls inside one of the regions in gene.txt, where a region is defined by chrom, chromStart and chromEnd.

For example:

In the first row of snp.txt, the rsid rs72477211 is on chr1 between positions 259 and 260.

In gene.txt, NFKB1 is also on chr1, but between positions 200 and 270.

This means that rsid rs72477211 is located in the NFKB1 region, so this is noted in output.txt.
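The containment rule in this example can be written down as a small predicate. This is only a sketch of the rule as described; the function name `is_within` and its argument names are made up for illustration and do not come from the question.

```python
def is_within(snp_chrom, snp_start, snp_end, region_chrom, region_start, region_end):
    """Return True if the SNP interval lies entirely inside the region.

    Both intervals must be on the same chromosome, and the SNP's start and
    end must fall within the region's start and end (bounds inclusive).
    """
    return (
        snp_chrom == region_chrom
        and region_start <= snp_start
        and snp_end <= region_end
    )


# rs72477211 (chr1:259-260) against NFKB1 (chr1:200-270)
print(is_within("chr1", 259, 260, "chr1", 200, 270))  # True
# rs72477211 (chr1:259-260) against IRF4 (chr1:420-440)
print(is_within("chr1", 259, 260, "chr1", 420, 440))  # False
```

Applying this check row by row is exactly the loop the question wants to avoid, but it pins down the condition that any vectorized approach has to reproduce.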

I am unable to do this using the pandas merge function, and I'm not sure where to even start.

The files are extremely large, so a loop would be highly inefficient.

Can someone please help? Thanks!

Solution

If the data fits in memory, you can merge the two dataframes with an outer join based only on the chrom column, then filter the result by applying the range-inclusion check:

df = snp.merge(gene, how='outer', on='chrom')

df = df[(df.chromStart_x>=df.chromStart_y) & (df.chromEnd_x<=df.chromEnd_y)]
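To try the merge and filter on the sample rows from the question, the sketch below builds both dataframes inline instead of reading the files. It assumes the six named columns of snp.txt only; the unlabeled seventh field (single/insertion) is left out because it has no header and does not appear in the desired output.

```python
import pandas as pd

# Sample rows from snp.txt (unlabeled seventh field omitted)
snp = pd.DataFrame({
    "chrom": ["chr1"] * 5,
    "chromStart": [259, 433, 491, 518, 582],
    "chromEnd": [260, 433, 492, 519, 583],
    "name": ["rs72477211", "rs56289060", "rs55998931", "rs62636508", "rs58108140"],
    "strand": ["+"] * 5,
    "observed": ["A/G", "-/C", "C/T", "C/G", "A/G"],
})

# Sample rows from gene.txt
gene = pd.DataFrame({
    "chrom": ["chr1"] * 5,
    "chromStart": [200, 420, 488, 513, 582],
    "chromEnd": [270, 440, 550, 579, 583],
    "tf_title": ["NFKB1", "IRF4", "BCL3", "TCF12", "BAD170"],
    "tf_score": [123, 234, 231, 12, 89],
})

# Pair every SNP with every region on the same chromosome. Columns present in
# both frames get the default suffixes: _x for snp, _y for gene.
df = snp.merge(gene, how="outer", on="chrom")

# Keep only the pairs where the SNP interval lies inside the region.
df = df[(df.chromStart_x >= df.chromStart_y) & (df.chromEnd_x <= df.chromEnd_y)]

pairs = sorted(zip(df["name"], df["tf_title"]))
print(pairs)
```

On this sample the filter keeps six rows rather than five: rs62636508 (518-519) lies inside both BCL3 (488-550) and TCF12 (513-579), so it is reported once per matching region. Overlapping regions will always produce one row per match unless a tie-breaking rule is added.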

Finally, you can remove the duplicate columns:

del df['chromStart_y']

del df['chromEnd_y']
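If the merged frame does not fit in memory, one possible variation is to read snp.txt in chunks and apply the same merge and filter to each chunk. The sketch below is an assumption-laden illustration, not part of the original answer: the helper name `annotate_in_chunks` and the `chunksize` value are hypothetical, gene.txt is assumed to be small enough to load whole, both files are assumed to be whitespace-delimited, and the sample again omits the unlabeled seventh field of snp.txt. Passing `suffixes=("", "_gene")` keeps the SNP coordinates under their original names, so only the region columns need to be dropped.

```python
import io

import pandas as pd


def annotate_in_chunks(snp_source, gene, chunksize=100_000):
    """Annotate SNPs with the regions that contain them, one chunk at a time.

    snp_source: path or file-like object for the whitespace-delimited SNP table.
    gene: DataFrame of regions, already loaded in memory (assumption).
    """
    parts = []
    for chunk in pd.read_csv(snp_source, sep=r"\s+", chunksize=chunksize):
        # Pair each SNP in this chunk with every region on the same chromosome.
        merged = chunk.merge(gene, on="chrom", suffixes=("", "_gene"))
        inside = (merged["chromStart"] >= merged["chromStart_gene"]) & (
            merged["chromEnd"] <= merged["chromEnd_gene"]
        )
        # Keep the matching pairs and drop the region coordinates.
        parts.append(
            merged.loc[inside].drop(columns=["chromStart_gene", "chromEnd_gene"])
        )
    return pd.concat(parts, ignore_index=True)


snp_text = """chrom chromStart chromEnd name strand observed
chr1 259 260 rs72477211 + A/G
chr1 433 433 rs56289060 + -/C
chr1 491 492 rs55998931 + C/T
chr1 518 519 rs62636508 + C/G
chr1 582 583 rs58108140 + A/G
"""

gene_text = """chrom chromStart chromEnd tf_title tf_score
chr1 200 270 NFKB1 123
chr1 420 440 IRF4 234
chr1 488 550 BCL3 231
chr1 513 579 TCF12 12
chr1 582 583 BAD170 89
"""

gene = pd.read_csv(io.StringIO(gene_text), sep=r"\s+")
# A tiny chunksize is used here only to exercise the chunking on five rows.
result = annotate_in_chunks(io.StringIO(snp_text), gene, chunksize=2)
print(result.to_string(index=False))
```

Each chunk still expands to (SNPs in chunk) x (regions on the same chromosome) rows before filtering, so the chunk size has to be chosen with the number of regions per chromosome in mind. For real files, a path such as "snp.txt" can be passed in place of the `io.StringIO` object.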
