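A while ago I was doing molecular docking and, after searching around online, could not find a suitable tool for batch-downloading the PDB files of specified proteins from AlphaFold. My first idea was a crawler, but it could only download the first protein from AlphaFold; every request after that was rejected. So I switched to browser automation that simulates the click on the download link. The drawback is that it is slow. The code follows: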
import os
import shutil
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
# chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Drop Chrome's "enable-automation" switch (the option key is 'excludeSwitches')
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
# Download a matching chromedriver and wrap it in a Service
service = Service(executable_path=ChromeDriverManager().install())
# Set the working directory
path = r"D:\知乎\Alphafold蛋白下载器"
os.chdir(path)
# File listing the proteins to download
file = "test.txt"
with open(file, "r") as f:
    protein = f.readlines()
# Collects the IDs whose download failed
faild = ""
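for ID in protein:
    ID = ID.strip()
    if not ID:
        continue  # skip blank lines
    print(ID + " is downloading!")
    url = "https://alphafold.ebi.ac.uk/entry/" + ID
    driver = webdriver.Chrome(service=service, options=chrome_options)
    driver.set_page_load_timeout(1800)
    try:
        driver.get(url)
        # Click the PDB download link on the entry page
        driver.find_element(By.XPATH, "//*[@id=\"main-content-area\"]/app-entry/div[1]/div/app-summary-text/div/div[1]/div[2]/a[1]").click()
        # Give the browser time to finish the download
        time.sleep(10)
        # Move the downloaded file from the browser's default download folder to the target folder.
        # Check where the default download folder is on your own machine.
        shutil.move(r"C:\Users\Administrator\Downloads\AF-" + ID + "-F1-model_v4.pdb",
                    os.path.join(path, ID + ".pdb"))
        print(ID + " succeed!")
    except Exception:
        print(ID + " download failed!")
        faild = faild + ID + "\n"
    finally:
        # Always shut the browser down exactly once, whether or not the download worked
        driver.quit()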
# Write the IDs that failed to download to a file
with open("faild.txt", "w") as df:
    df.write(faild)
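The script waits a fixed 10 seconds after the click, which is either wasted time for small files or too short for large ones. One possible alternative is to poll the download folder until the file shows up. The sketch below is only an illustration: the helper name `wait_for_download`, its parameters, and the assumption that Chrome marks an unfinished download with a `.crdownload` suffix are mine, not part of the original script.

```python
import time
from pathlib import Path


def wait_for_download(download_dir, filename, timeout=60.0, poll=0.5):
    """Return the path of `filename` once it appears in `download_dir`, or None on timeout.

    Hypothetical helper: assumes the browser writes an in-progress download as
    `<filename>.crdownload` and renames it when the download completes.
    """
    target = Path(download_dir) / filename
    partial = Path(download_dir) / (filename + ".crdownload")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Finished only when the final file exists and no partial file is left
        if target.exists() and not partial.exists():
            return target
        time.sleep(poll)
    return None
```

In the loop, this could replace `time.sleep(10)`: call `wait_for_download(r"C:\Users\Administrator\Downloads", "AF-" + ID + "-F1-model_v4.pdb")` and treat a `None` result as a failed download.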
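You need to prepare the target protein file (test.txt in the code above) with one protein per line. Make sure there are no spaces after the protein ID. The IDs are exported from the UniProt website.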
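If the exported list cannot be trusted to be clean, the IDs can be tidied up before the download loop starts. The following is a minimal sketch, assuming one ID per line. The function name `read_protein_ids` is hypothetical and the accessions in the usage note are only sample values.

```python
def read_protein_ids(lines):
    """Return unique, whitespace-stripped IDs from an iterable of lines, keeping order."""
    seen = set()
    ids = []
    for line in lines:
        protein_id = line.strip()  # drops the newline and any stray spaces
        if not protein_id or protein_id in seen:
            continue  # skip blank lines and duplicates
        seen.add(protein_id)
        ids.append(protein_id)
    return ids
```

For example, `read_protein_ids(["P69905\n", " P68871 \n", "\n", "P69905\n"])` yields `["P69905", "P68871"]`. With a real file, the open file object can be passed directly: `with open("test.txt") as f: protein = read_protein_ids(f)`.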
Download results