How to extract file contents with Python: extracting matching content between two files

I have two files.

The first, exemple_data.csv, contains three IDs, one per line:

ZINC04203483

ZINC26895155

ZINC03651026

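The second, exemple.sdf, contains ten molecules. Each molecule has an ID and its spatial structure data, and each record ends with a line of four dollar signs, "$$$$":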

ZINC04203483

7 6 0 0 0 0 0 0 0 0999 V2000

1.7848 -1.3593 -0.0709 C 0 0 0 0 0

1.2676 -3.5870 0.7267 C 0 0 0 0 0

1.0097 -2.1011 0.9436 C 0 0 0 0 0

1.6939 -0.0371 -0.0717 N 0 0 0 0 0

2.5202 -2.0619 -0.9208 N 0 0 0 0 0

2.4714 -3.9467 0.8577 O 0 0 0 0 0

0.2468 -4.2712 0.4339 O 0 0 0 0 0

1 4 1 0 0 0

2 6 1 0 0 0

3 1 1 0 0 0

3 2 1 0 0 0

1 5 2 0 0 0

2 7 2 0 0 0

M CHG 2 5 1 6 -1

M END

>

0.238019541

$$$$

ZINC02034713

7 6 0 0 0 0 0 0 0 0999 V2000

1.4359 -3.6052 0.4738 C 0 0 0 0 0

1.9307 -1.1052 0.7490 C 0 0 0 0 0

1.5337 -2.2272 -0.1964 C 0 0 0 0 0

1.5927 0.2012 0.1266 N 0 0 0 0 0

2.4694 -4.0171 1.0694 O 0 0 0 0 0

0.3107 -4.1689 0.3418 O 0 0 0 0 0

2.5239 -2.3360 -1.2177 O 0 0 0 0 0

1 5 1 0 0 0

2 3 1 0 0 0

2 4 1 0 0 0

3 1 1 0 0 0

3 7 1 0 0 0

1 6 2 0 0 0

M CHG 2 4 1 5 -1

M END

>

0.0787463188

$$$$

ZINC02034711

7 6 0 0 0 0 0 0 0 0999 V2000

1.6225 -3.6225 0.5829 C 0 0 0 0 0

1.0839 -1.1178 0.4821 C 0 0 0 0 0

2.0739 -2.2211 0.1469 C 0 0 0 0 0

1.6545 0.1920 0.0735 N 0 0 0 0 0

0.5089 -4.0191 0.1414 O 0 0 0 0 0

2.4376 -4.2168 1.3471 O 0 0 0 0 0

2.2421 -2.2653 -1.2693 O 0 0 0 0 0

1 5 1 0 0 0

2 3 1 0 0 0

2 4 1 0 0 0

3 1 1 0 0 0

3 7 1 0 0 0

1 6 2 0 0 0

M CHG 2 4 1 5 -1

M END

>

0.279566735

$$$$

ZINC26895155

8 7 0 0 0 0 0 0 0 0999 V2000

2.1705 -1.5475 -0.5415 C 0 0 0 0 0

1.3387 -3.5612 0.6628 C 0 0 0 0 0

1.3018 -2.0375 0.6037 C 0 0 0 0 0

2.2100 -0.2617 -0.7298 N 0 0 0 0 0

2.8130 -2.5199 -1.2719 N 0 0 0 0 0

2.4811 -4.0619 0.8624 O 0 0 0 0 0

0.2238 -4.1310 0.4963 O 0 0 0 0 0

1.4055 0.3868 0.2119 O 0 0 0 0 0

1 5 1 0 0 0

2 6 1 0 0 0

3 1 1 0 0 0

3 2 1 0 0 0

4 8 1 0 0 0

1 4 2 0 0 0

2 7 2 0 0 0

M CHG 1 6 -1

M END

>

0.274481624

$$$$

ZINC01695856

8 7 0 0 0 0 0 0 0 0999 V2000

1.4057 -3.6199 0.4828 C 0 0 0 0 0

0.6383 -0.9506 1.9111 C 0 0 0 0 0

1.4135 -2.2167 -0.1491 C 0 0 0 0 0

1.6928 -1.0605 0.8132 C 0 0 0 0 0

2.4525 -3.9696 1.0940 O 0 0 0 0 0

0.3286 -4.2614 0.3095 O 0 0 0 0 0

2.4250 -2.2353 -1.1545 O 0 0 0 0 0

1.6953 0.1565 0.0693 O 0 0 0 0 0

1 5 1 0 0 0

2 4 1 0 0 0

3 1 1 0 0 0

3 4 1 0 0 0

3 7 1 0 0 0

4 8 1 0 0 0

1 6 2 0 0 0

M CHG 1 5 -1

M END

>

0.0781114399

$$$$

ZINC01695854

8 7 0 0 0 0 0 0 0 0999 V2000

1.6021 -3.5832 0.5544 C 0 0 0 0 0

-0.1123 -1.0849 -0.8065 C 0 0 0 0 0

2.0136 -2.1983 0.0239 C 0 0 0 0 0

0.9936 -1.0796 0.2454 C 0 0 0 0 0

0.5225 -4.0604 0.1088 O 0 0 0 0 0

2.4141 -4.0828 1.3866 O 0 0 0 0 0

2.2393 -2.3565 -1.3754 O 0 0 0 0 0

1.6735 0.1723 0.1761 O 0 0 0 0 0

1 5 1 0 0 0

2 4 1 0 0 0

3 1 1 0 0 0

3 4 1 0 0 0

3 7 1 0 0 0

4 8 1 0 0 0

1 6 2 0 0 0

M CHG 1 5 -1

M END

>

0.284852803

$$$$

ZINC13352867

8 7 0 0 0 0 0 0 0 0999 V2000

1.3740 -3.6291 0.4754 C 0 0 0 0 0

0.5507 -0.9450 1.8830 C 0 0 0 0 0

1.3678 -2.2326 -0.1626 C 0 0 0 0 0

1.6446 -1.1066 0.8351 C 0 0 0 0 0

1.7289 0.1781 0.0725 N 0 0 0 0 0

2.4415 -3.9229 1.0855 O 0 0 0 0 0

0.3299 -4.3189 0.3058 O 0 0 0 0 0

2.4081 -2.2413 -1.1410 O 0 0 0 0 0

1 6 1 0 0 0

2 4 1 0 0 0

3 1 1 0 0 0

3 4 1 0 0 0

3 8 1 0 0 0

4 5 1 0 0 0

1 7 2 0 0 0

M CHG 2 5 1 6 -1

M END

>

0.0959857255

$$$$

ZINC01695855

8 7 0 0 0 0 0 0 0 0999 V2000

1.6218 -3.6149 0.5878 C 0 0 0 0 0

0.9014 -0.9485 2.0417 C 0 0 0 0 0

2.0724 -2.2038 0.1703 C 0 0 0 0 0

1.1102 -1.0715 0.5348 C 0 0 0 0 0

0.5070 -4.0057 0.1448 O 0 0 0 0 0

2.4420 -4.2214 1.3368 O 0 0 0 0 0

2.2392 -2.2394 -1.2457 O 0 0 0 0 0

1.6552 0.1562 0.0551 O 0 0 0 0 0

1 5 1 0 0 0

2 4 1 0 0 0

3 1 1 0 0 0

3 4 1 0 0 0

3 7 1 0 0 0

4 8 1 0 0 0

1 6 2 0 0 0

M CHG 1 5 -1

M END

>

0.280759811

$$$$

ZINC03651026

8 7 0 0 0 0 0 0 0 0999 V2000

1.4934 -3.7154 0.5054 C 0 0 0 0 0

2.4732 -1.3603 0.9745 C 0 0 0 0 0

2.6877 0.0369 0.4066 C 0 0 0 0 0

1.7876 -2.3003 -0.0110 C 0 0 0 0 0

2.5054 -4.3269 0.9548 O 0 0 0 0 0

0.2927 -4.0978 0.4363 O 0 0 0 0 0

1.4341 0.6134 0.0639 O 0 0 0 0 0

2.6547 -2.4571 -1.1350 O 0 0 0 0 0

1 5 1 0 0 0

2 3 1 0 0 0

2 4 1 0 0 0

3 7 1 0 0 0

4 1 1 0 0 0

4 8 1 0 0 0

1 6 2 0 0 0

M CHG 1 5 -1

M END

>

0.315417558

$$$$

ZINC13352859

8 7 0 0 0 0 0 0 0 0999 V2000

1.6269 -3.5849 0.5524 C 0 0 0 0 0

-0.0728 -1.1104 -0.9226 C 0 0 0 0 0

2.0361 -2.2127 -0.0019 C 0 0 0 0 0

0.9669 -1.1360 0.1908 C 0 0 0 0 0

1.6474 0.1967 0.2000 N 0 0 0 0 0

0.5319 -4.0279 0.1019 O 0 0 0 0 0

2.4154 -4.0983 1.3946 O 0 0 0 0 0

2.2503 -2.4010 -1.4013 O 0 0 0 0 0

1 6 1 0 0 0

2 4 1 0 0 0

3 1 1 0 0 0

3 4 1 0 0 0

3 8 1 0 0 0

4 5 1 0 0 0

1 7 2 0 0 0

M CHG 2 5 1 6 -1

M END

>

0.302429646

$$$$
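
The layout above (an ID line, the structure block, then a "$$$$" terminator) suggests reading the file one record at a time. A minimal sketch, assuming the ID is the first non-empty line of each record; the name iter_sdf_records is made up for illustration, and the sample records are abbreviated:

```python
import io

def iter_sdf_records(lines):
    """Yield (mol_id, record_lines) for each record that ends with a '$$$$' line."""
    record = []
    for line in lines:
        if not record and not line.strip():
            continue  # ignore blank lines between records
        record.append(line)
        if line.strip() == "$$$$":
            # Assumption: the first non-empty line of a record is its ID.
            yield record[0].strip(), record
            record = []
    # A trailing record without a '$$$$' terminator is silently dropped.

sample = "ZINC04203483\nM END\n$$$$\nZINC02034713\nM END\n$$$$\n"
for mol_id, rec in iter_sdf_records(io.StringIO(sample)):
    print(mol_id, len(rec))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so the whole file never has to be held in memory.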

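I want to use the three IDs in the first file to look up the corresponding molecules in the second file and write them to a new file, or alternatively to three new files, each containing the complete record of one molecule, including the closing "$$$$".

I wrote a program myself but cannot get it to work. Could someone help me fix it or write a new one?

My program is run as: python tire_database_sdf.py exemple_data.csv exemple.sdf result.csv

The problem is that result.csv ends up with the same ten molecules as exemple.sdf, whereas I want it to contain only the three molecules whose IDs are listed in exemple_data.csv.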
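
For the one-file-per-molecule variant mentioned above, one possible sketch follows. It assumes the records have already been split into (ID, text) pairs; the function name write_one_file_per_molecule and the "<ID>.sdf" naming scheme are illustrative choices, not part of the original script, and the sample records are abbreviated:

```python
import os
import tempfile

def write_one_file_per_molecule(records, wanted_ids, out_dir):
    """Write each wanted record to out_dir/<ID>.sdf and return the IDs written."""
    os.makedirs(out_dir, exist_ok=True)
    wanted = set(wanted_ids)  # set membership is an exact-match test
    written = []
    for mol_id, text in records:
        if mol_id in wanted:
            with open(os.path.join(out_dir, mol_id + ".sdf"), "w") as out:
                out.write(text)  # text includes the closing "$$$$" line
            written.append(mol_id)
    return written

records = [
    ("ZINC04203483", "ZINC04203483\nM END\n$$$$\n"),
    ("ZINC02034713", "ZINC02034713\nM END\n$$$$\n"),
]
with tempfile.TemporaryDirectory() as tmp:
    print(write_one_file_per_molecule(records, ["ZINC04203483"], tmp))
```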

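#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Usage: python tire_database_sdf.py exemple_data.csv exemple.sdf result.csv

import sys


def liste_id(filename):
    """Return the set of IDs listed one per line in filename."""
    with open(filename, "r") as f:
        # strip() removes "\n", "\r" and spaces; empty lines are skipped
        return {line.strip() for line in f if line.strip()}


def main():
    filename, inputfile, outfile = sys.argv[1:4]
    identifiant = liste_id(filename)

    with open(inputfile, "r") as filin, open(outfile, "w") as filout:
        newmol = False     # True while inside a wanted molecule
        first_line = True  # True when the next non-empty line starts a record
        for line in filin:
            text = line.strip()
            if first_line:
                if not text:
                    continue  # skip blank lines between records
                # Assumes the first non-empty line of a record is the ID.
                # Exact set membership replaces re.search, which would
                # also match an ID that merely appears inside a line.
                newmol = text in identifiant
                first_line = False
            if newmol:
                filout.write(line)
            # Lines keep their "\n", so compare the stripped text.
            # The check comes after the write so "$$$$" is kept in the output.
            if text == "$$$$":
                first_line = True


if __name__ == "__main__":
    main()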
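
Two details matter when matching lines read from a file: each line keeps its trailing newline, and re.search matches substrings rather than whole lines. A small illustration; the shortened ID "ZINC0169585" is made up for the example:

```python
import io
import re

lines = list(io.StringIO("ZINC01695854\nM END\n$$$$\n"))
last = lines[-1]

# The line read from the file is "$$$$\n", not "$$$$".
print(last == "$$$$")               # False
print(last.rstrip("\n") == "$$$$")  # True

# re.search succeeds on a partial ID; an exact comparison does not.
print(re.search("ZINC0169585", lines[0]) is not None)  # True
print(lines[0].strip() == "ZINC0169585")               # False
```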
