Few people write about scraping PubMed, so I'll fill the gap. Straight to the code.
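import re

import requests


# Get the PMIDs of the articles on the first page of search results
def html_re(url):
    response = requests.get(url).text
    # data-chunk-ids holds a comma-separated list of PMIDs; stop at the
    # closing quote so the last PMID does not carry a trailing '"'
    chunks = re.findall('data-chunk-ids="(.*?)"', response, re.S)
    ids = chunks[0].split(',')
    return ids


# Get the title of each article
def html_title(urls):
    content = []
    for i in urls:
        html1 = requests.get(i).text
        # findall returns a list, so every entry in content is itself a list
        title = re.findall('<title>(.*?) - PubMed</title>', html1)
        content.append(title)
    return content


if __name__ == '__main__':
    keyword = input('Enter a keyword: ')
    url = 'https://pubmed.ncbi.nlm.nih.gov/?term=' + keyword + '&sort=date'
    ids = html_re(url)
    url1 = ['https://pubmed.ncbi.nlm.nih.gov/' + i + '/' for i in ids]
    title = html_title(url1)
    print(title)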
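The script above builds the search URL by string concatenation. Here is a minimal sketch of the same step using `urllib.parse.urlencode`, which escapes spaces and special characters in multi-word keywords. The helper name `build_search_url` is made up for this example, and the `page` parameter is an assumption about how PubMed selects later result pages; the original script never uses it.

```python
from urllib.parse import urlencode

BASE_URL = 'https://pubmed.ncbi.nlm.nih.gov/'


def build_search_url(keyword, page=1):
    """Return a PubMed search URL for keyword, sorted by date."""
    params = {'term': keyword, 'sort': 'date'}
    # Assumption: result pages after the first are selected with 'page'
    if page > 1:
        params['page'] = page
    return BASE_URL + '?' + urlencode(params)


print(build_search_url('KIT'))
# https://pubmed.ncbi.nlm.nih.gov/?term=KIT&sort=date
print(build_search_url('breast cancer', page=2))
# https://pubmed.ncbi.nlm.nih.gov/?term=breast+cancer&sort=date&page=2
```

The returned string can be passed straight to `requests.get`.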
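After running the script, type the keyword you want to search for and press Enter. PubMed is an English-language site, so the keyword must be in English, for example KIT. Only the titles of the articles on the first page of results are shown: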
Enter a keyword: melanin
[['MiR-139 protects against oxygen-glucose deprivation/reoxygenation (OGD/R)-induced nerve injury through targeting c-Jun to inhibit NLRP3 inflammasome activation'],
['Anti-cancer effect of Urginea maritima bulb extract invitro through cell cycle arrest and induction of apoptosis in human breast cancer cell lines'],
['Pilot Study of an Overdose First Aid Program in Juvenile Detention'],
['CD117 Is a Specific Marker of Intraductal Papillary Mucinous Neoplasms (IPMN) of the Pancreas, Oncocytic Subtype'],
['Exploring Seminal Plasma GSTM3 as a Quality and In Vivo Fertility Biomarker in Pigs-Relationship with Sperm Morphology'],
['Reproductive dysfunction linked to alteration of endocrine activities in zebrafish exposed to mono-(2-ethylhexyl) phthalate (MEHP)'],
['Increasing the Functional Group Diversity in Helical β-Peptoids: Achievement of Solvent- and pH-Dependent Folding'],
['Inhibition of B7-H4 promotes hepatocellular carcinoma cell apoptosis and autophagy through the PI3K signaling pathway'],
['Switch Control Inhibition of KIT and PDGFRA in Patients With Advanced Gastrointestinal Stromal Tumor: A Phase I Study of Ripretinib']]
...
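The script assumes that every page contains the text the regular expressions look for. Indexing the first match raises `IndexError` when nothing matches, and titles may still contain HTML entities such as `&amp;`. Below is a rough sketch of the two extraction steps as standalone functions that handle both cases and return plain strings instead of nested lists. The function names and the sample HTML fragments are invented for illustration and are not taken from real PubMed pages.

```python
import html
import re


def extract_pmids(search_html):
    """Return the PMIDs listed in data-chunk-ids, or [] if it is absent."""
    match = re.search(r'data-chunk-ids="([^"]*)"', search_html)
    if match is None:
        return []
    return [pmid for pmid in match.group(1).split(',') if pmid]


def extract_title(article_html):
    """Return the article title from the <title> tag, or None if missing."""
    match = re.search(r'<title>(.*?) - PubMed</title>', article_html, re.S)
    if match is None:
        return None
    # Decode HTML entities such as &amp; that can appear in titles
    return html.unescape(match.group(1).strip())


# Hypothetical HTML fragments, made up for illustration only
sample_search = '<div class="results" data-chunk-ids="111,222,333">'
sample_article = '<title>Cats &amp; dogs - PubMed</title>'

print(extract_pmids(sample_search))    # ['111', '222', '333']
print(extract_title(sample_article))   # Cats & dogs
print(extract_pmids('<html></html>'))  # []
```

Because these functions take HTML strings rather than URLs, they can be checked against saved pages without sending any requests.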