1. Download the articles and inspect their structure
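Most people working in biomedicine are probably familiar with the PubMed literature database. In this post we use PubMed's literature search to download articles in XML form, parse that XML with Python, and import the result into a MySQL database.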
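As a rough preview of the parsing step, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes the element names that appear in the sample record in this section (`PubmedArticle`, `MedlineCitation`, `PMID`, `Article`, `AuthorList`, and so on). The embedded `SAMPLE` string is an abridged copy of that record, and the function name `parse_article` and the dictionary keys are hypothetical names chosen for illustration. The MySQL import is not shown here.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample record (title shortened, most fields omitted).
SAMPLE = """<PubmedArticle>
  <MedlineCitation Status="MEDLINE" Owner="NLM">
    <PMID Version="1">25534978</PMID>
    <Article PubModel="Print">
      <Journal>
        <Title>Expert review of clinical immunology</Title>
      </Journal>
      <ArticleTitle>Autoimmune disease in the epigenetic era</ArticleTitle>
      <AuthorList CompleteYN="Y">
        <Author ValidYN="Y">
          <LastName>Jeffries</LastName>
          <ForeName>Matlock A</ForeName>
        </Author>
        <Author ValidYN="Y">
          <LastName>Sawalha</LastName>
          <ForeName>Amr H</ForeName>
        </Author>
      </AuthorList>
    </Article>
    <CommentsCorrectionsList>
      <CommentsCorrections RefType="Cites">
        <PMID Version="1">17429846</PMID>
      </CommentsCorrections>
    </CommentsCorrectionsList>
  </MedlineCitation>
</PubmedArticle>"""


def parse_article(article):
    """Flatten one <PubmedArticle> element into a plain dict (hypothetical layout)."""
    citation = article.find("MedlineCitation")
    authors = []
    for author in citation.findall("Article/AuthorList/Author"):
        fore = author.findtext("ForeName", default="")
        last = author.findtext("LastName", default="")
        # Skip empty parts so a missing ForeName does not leave a stray space.
        authors.append(" ".join(part for part in (fore, last) if part))
    return {
        # "PMID" as a relative path matches only the direct child of
        # MedlineCitation, not the PMIDs nested under CommentsCorrections.
        "pmid": citation.findtext("PMID"),
        "journal": citation.findtext("Article/Journal/Title"),
        "title": citation.findtext("Article/ArticleTitle"),
        "authors": authors,
    }


record = parse_article(ET.fromstring(SAMPLE))
print(record)
```

A flat dictionary like this maps naturally onto one row of an articles table. Repeated fields such as authors would probably need a table of their own or a joined string, depending on the schema you choose.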
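First, the PubMed literature database lives at https://www.ncbi.nlm.nih.gov/pubmed/ . I downloaded the articles for a few selected years; each article record is structured roughly as follows: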
<PubmedArticle>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">25534978</PMID>
<DateCompleted>
<Year>2015</Year>
<Month>08</Month>
<Day>21</Day>
</DateCompleted>
<DateRevised>
<Year>2016</Year>
<Month>12</Month>
<Day>15</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1744-8409</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>11</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2015</Year>
<Month>Jan</Month>
</PubDate>
</JournalIssue>
<Title>Expert review of clinical immunology</Title>
<ISOAbbreviation>Expert Rev Clin Immunol</ISOAbbreviation>
</Journal>
<ArticleTitle>Autoimmune disease in the epigenetic era: how has epigenetics changed our understanding of disease and how can we expect the field to evolve?</ArticleTitle>
<Pagination>
<MedlinePgn>45-58</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1586/1744666X.2015.994507</ELocationID>
<Abstract>
<AbstractText>Autoimmune diseases are complex and enigmatic, and have presented particular challenges to researchers seeking to define their etiology and explain progression. Previous studies have implicated epigenetic influences in the development of autoimmunity. Epigenetics describes changes in gene expression related to environmental influences without alterations in the underlying genomic sequence, generally classified into three main groups: cytosine genomic DNA methylation, modification of various sidechain positions of histone proteins and noncoding RNAs feedback. The purpose of this article is to review the most relevant literature describing alterations of epigenetic marks in the development and progression of four common autoimmune diseases: systemic lupus erythematosus, rheumatoid arthritis, systemic sclerosis and Sjögren's syndrome. The contribution of DNA methylation, histone modification and noncoding RNA for each of these disorders is discussed, including examples both of candidate gene studies and larger epigenomics surveys, and in various tissue types important for the pathogenesis of each. The future of the field is speculated briefly, as is the possibility of therapeutic interventions targeting the epigenome. </AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Jeffries</LastName>
<ForeName>Matlock A</ForeName>
<Initials>MA</Initials>
<AffiliationInfo>
<Affiliation>Department of Internal Medicine, Division of Rheumatology, Immunology and Allergy, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Sawalha</LastName>
<ForeName>Amr H</ForeName>
<Initials>AH</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>R01 AI097134</GrantID>
<Acronym>AI</Acronym>
<Agency>NIAID NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>R01AI097134</GrantID>
<Acronym>AI</Acronym>
<Agency>NIAID NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D016454">Review</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Expert Rev Clin Immunol</MedlineTA>
<NlmUniqueID>101271248</NlmUniqueID>
<ISSNLinking>1744-666X</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D006657">Histones</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D022661">RNA, Untranslated</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>Eur J Immunol. 2007 May;37(5):1407-13</RefSource>
<PMID Version="1">17429846</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Nature. 2007 May 24;447(7143):396-8</RefSource>
<PMID Version="1">17522671</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Arthritis Rheum. 2011 May;63(5):1376-86</RefSource>
<PMID Version="1">21538319</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Arthritis Rheum. 2011 May;63(5):1452-8</RefSource>
<PMID Version="1">21538322</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Epigenetics. 2011 May;6(5):593-601</RefSource>
<PMID Version="1">21436623</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Eur J Immunol. 2011 Jul;41(7):2029-39</RefSource>
<PMID Version="1">21469088</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Blood. 2011 Aug 11;118(6):1472-80</RefSource>
<PMID Version="1">21613261</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Cell Cycle. 2011 Aug 15;10(16):2662-8</RefSource>
<PMID Version="1">21811096</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Genes Immun. 2011 Dec;12(8):643-52</RefSource>
<PMID Version="1">21753787</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Transl Med. 2011;9:192</RefSource>
<PMID Version="1">22060015</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>PLoS One. 2011;6(11):e28104</RefSource>
<PMID Version="1">22140515</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Clin Immunol. 2012 Apr;143(1):39-44</RefSource>
<PMID Version="1">22306512</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Immunol. 2010 Mar 1;184(5):2718-28</RefSource>
<PMID Version="1">20100935</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Genes Immun. 2010 Mar;11(2):124-33</RefSource>
<PMID Version="1">19710693</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Arthritis Rheum. 2010 May;62(5):1438-47</RefSource>
<PMID Version="1">20131288</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Autoimmun. 2010 Aug;35(1):58-69</RefSource>
<PMID Version="1">20223637</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Immunol. 2010 Jun 15;184(12):6773-81</RefSource>
<PMID Version="1">20483747</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Arthritis Rheum. 2010 Jun;62(6):1733-43</RefSource>
<PMID Version="1">20201077</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Clin Rev Allergy Immunol. 2010 Aug;39(1):78-84</RefSource>
<PMID Version="1">19662539</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Biomed Biotechnol. 2010;2010:931018</RefSource>
<PMID Version="1">20589076</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Arthritis Res Ther. 2010;12(3):R81</RefSource>
<PMID Version="1">20459811</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Cancer. 2010 Sep 1;116(17):4043-53</RefSource>
<PMID Version="1">20564122</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Dermatol Sci. 2010 Sep;59(3):198-203</RefSource>
<PMID Version="1">20724115</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Biol Chem. 2013 Jul 26;288(30):21936-44</RefSource>
<PMID Version="1">23775084</PMID>