Parsing the XML article structure of the PubMed literature database

Step 1: Download a PubMed article

The PubMed literature database is available at:
https://www.ncbi.nlm.nih.gov/pubmed/
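
A record can also be downloaded as XML programmatically and then read field by field. The sketch below is one possible approach, not the only one. It assumes the NCBI E-utilities `efetch` endpoint, which is not mentioned above, and uses only the Python standard library. The function names (`build_efetch_url`, `fetch_article_xml`, `parse_article`) are hypothetical helpers introduced for illustration. The element paths assume a record laid out as a `PubmedArticle` with a `MedlineCitation` child.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the NCBI E-utilities efetch endpoint is used for the download.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmid):
    """Build the efetch URL that requests one PubMed record as XML."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return f"{EFETCH_URL}?{query}"


def fetch_article_xml(pmid, timeout=30):
    """Download the XML for one PMID (needs network access)."""
    with urllib.request.urlopen(build_efetch_url(pmid), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_article(xml_text):
    """Extract a few common fields from the first PubmedArticle element."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so this works whether or not
    # the record is wrapped in a PubmedArticleSet element.
    article = next(root.iter("PubmedArticle"), None)
    if article is None:
        raise ValueError("no PubmedArticle element found")

    citation = article.find("MedlineCitation")
    if citation is None:
        raise ValueError("no MedlineCitation element found")

    # The title may contain inline markup, so join all nested text.
    title_elem = citation.find("Article/ArticleTitle")
    title = "".join(title_elem.itertext()).strip() if title_elem is not None else None

    authors = []
    for author in citation.findall("Article/AuthorList/Author"):
        last = author.findtext("LastName", default="")
        fore = author.findtext("ForeName", default="")
        name = f"{fore} {last}".strip()
        if name:
            authors.append(name)

    mesh_terms = [
        d.text
        for d in citation.findall("MeshHeadingList/MeshHeading/DescriptorName")
        if d.text
    ]

    return {
        "pmid": citation.findtext("PMID"),
        "title": title,
        "journal": citation.findtext("Article/Journal/Title"),
        "authors": authors,
        "mesh_terms": mesh_terms,
    }
```

For example, `parse_article(fetch_article_xml("25534978"))` would return a dictionary with the keys `pmid`, `title`, `journal`, `authors` and `mesh_terms`. NCBI limits how often E-utilities may be called, so a script that loops over many PMIDs should pause between requests.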

The structure of an article record is as follows:

<PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
        <PMID Version="1">25534978</PMID>
        <DateCompleted>
            <Year>2015</Year>
            <Month>08</Month>
            <Day>21</Day>
        </DateCompleted>
        <DateRevised>
            <Year>2016</Year>
            <Month>12</Month>
            <Day>15</Day>
        </DateRevised>
        <Article PubModel="Print">
            <Journal>
                <ISSN IssnType="Electronic">1744-8409</ISSN>
                <JournalIssue CitedMedium="Internet">
                    <Volume>11</Volume>
                    <Issue>1</Issue>
                    <PubDate>
                        <Year>2015</Year>
                        <Month>Jan</Month>
                    </PubDate>
                </JournalIssue>
                <Title>Expert review of clinical immunology</Title>
                <ISOAbbreviation>Expert Rev Clin Immunol</ISOAbbreviation>
            </Journal>
            <ArticleTitle>Autoimmune disease in the epigenetic era: how has epigenetics changed our understanding of disease and how can we expect the field to evolve?</ArticleTitle>
            <Pagination>
                <MedlinePgn>45-58</MedlinePgn>
            </Pagination>
            <ELocationID EIdType="doi" ValidYN="Y">10.1586/1744666X.2015.994507</ELocationID>
            <Abstract>
                <AbstractText>Autoimmune diseases are complex and enigmatic, and have presented particular challenges to researchers seeking to define their etiology and explain progression. Previous studies have implicated epigenetic influences in the development of autoimmunity. Epigenetics describes changes in gene expression related to environmental influences without alterations in the underlying genomic sequence, generally classified into three main groups: cytosine genomic DNA methylation, modification of various sidechain positions of histone proteins and noncoding RNAs feedback. The purpose of this article is to review the most relevant literature describing alterations of epigenetic marks in the development and progression of four common autoimmune diseases: systemic lupus erythematosus, rheumatoid arthritis, systemic sclerosis and Sjögren's syndrome. The contribution of DNA methylation, histone modification and noncoding RNA for each of these disorders is discussed, including examples both of candidate gene studies and larger epigenomics surveys, and in various tissue types important for the pathogenesis of each. The future of the field is speculated briefly, as is the possibility of therapeutic interventions targeting the epigenome. </AbstractText>
            </Abstract>
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Jeffries</LastName>
                    <ForeName>Matlock A</ForeName>
                    <Initials>MA</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Internal Medicine, Division of Rheumatology, Immunology and Allergy, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Sawalha</LastName>
                    <ForeName>Amr H</ForeName>
                    <Initials>AH</Initials>
                </Author>
            </AuthorList>
            <Language>eng</Language>
            <GrantList CompleteYN="Y">
                <Grant>
                    <GrantID>R01 AI097134</GrantID>
                    <Acronym>AI</Acronym>
                    <Agency>NIAID NIH HHS</Agency>
                    <Country>United States</Country>
                </Grant>
                <Grant>
                    <GrantID>R01AI097134</GrantID>
                    <Acronym>AI</Acronym>
                    <Agency>NIAID NIH HHS</Agency>
                    <Country>United States</Country>
                </Grant>
            </GrantList>
            <PublicationTypeList>
                <PublicationType UI="D016428">Journal Article</PublicationType>
                <PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
                <PublicationType UI="D016454">Review</PublicationType>
            </PublicationTypeList>
        </Article>
        <MedlineJournalInfo>
            <Country>England</Country>
            <MedlineTA>Expert Rev Clin Immunol</MedlineTA>
            <NlmUniqueID>101271248</NlmUniqueID>
            <ISSNLinking>1744-666X</ISSNLinking>
        </MedlineJournalInfo>
        <ChemicalList>
            <Chemical>
                <RegistryNumber>0</RegistryNumber>
                <NameOfSubstance UI="D006657">Histones</NameOfSubstance>
            </Chemical>
            <Chemical>
                <RegistryNumber>0</RegistryNumber>
                <NameOfSubstance UI="D022661">RNA, Untranslated</NameOfSubstance>
            </Chemical>
        </ChemicalList>
        <CitationSubset>IM</CitationSubset>
        <CommentsCorrectionsList>
            <CommentsCorrections RefType="Cites">
                <RefSource>Eur J Immunol. 2007 May;37(5):1407-13</RefSource>
                <PMID Version="1">17429846</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nature. 2007 May 24;447(7143):396-8</RefSource>
                <PMID Version="1">17522671</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2011 May;63(5):1376-86</RefSource>
                <PMID Version="1">21538319</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2011 May;63(5):1452-8</RefSource>
                <PMID Version="1">21538322</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Epigenetics. 2011 May;6(5):593-601</RefSource>
                <PMID Version="1">21436623</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Eur J Immunol. 2011 Jul;41(7):2029-39</RefSource>
                <PMID Version="1">21469088</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Blood. 2011 Aug 11;118(6):1472-80</RefSource>
                <PMID Version="1">21613261</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Cell Cycle. 2011 Aug 15;10(16):2662-8</RefSource>
                <PMID Version="1">21811096</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Genes Immun. 2011 Dec;12(8):643-52</RefSource>
                <PMID Version="1">21753787</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Transl Med. 2011;9:192</RefSource>
                <PMID Version="1">22060015</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>PLoS One. 2011;6(11):e28104</RefSource>
                <PMID Version="1">22140515</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Clin Immunol. 2012 Apr;143(1):39-44</RefSource>
                <PMID Version="1">22306512</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2010 Mar 1;184(5):2718-28</RefSource>
                <PMID Version="1">20100935</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Genes Immun. 2010 Mar;11(2):124-33</RefSource>
                <PMID Version="1">19710693</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2010 May;62(5):1438-47</RefSource>
                <PMID Version="1">20131288</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Autoimmun. 2010 Aug;35(1):58-69</RefSource>
                <PMID Version="1">20223637</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2010 Jun 15;184(12):6773-81</RefSource>
                <PMID Version="1">20483747</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2010 Jun;62(6):1733-43</RefSource>
                <PMID Version="1">20201077</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Clin Rev Allergy Immunol. 2010 Aug;39(1):78-84</RefSource>
                <PMID Version="1">19662539</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Biomed Biotechnol. 2010;2010:931018</RefSource>
                <PMID Version="1">20589076</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Res Ther. 2010;12(3):R81</RefSource>
                <PMID Version="1">20459811</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Cancer. 2010 Sep 1;116(17):4043-53</RefSource>
                <PMID Version="1">20564122</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Dermatol Sci. 2010 Sep;59(3):198-203</RefSource>
                <PMID Version="1">20724115</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Biol Chem. 2013 Jul 26;288(30):21936-44</RefSource>
                <PMID Version="1">23775084</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>PLoS Genet. 2013;9(8):e1003678</RefSource>
                <PMID Version="1">23950730</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Clin Immunol. 2013 Oct;149(1):46-54</RefSource>
                <PMID Version="1">23891737</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2012 Apr 15;188(8):3567-71</RefSource>
                <PMID Version="1">22422882</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Ann Rheum Dis. 2000 Jun;59(6):455-61</RefSource>
                <PMID Version="1">10834863</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2000 Dec;43(12):2634-47</RefSource>
                <PMID Version="1">11145021</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2000 Dec;43(12):2807-17</RefSource>
                <PMID Version="1">11145040</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2002 May;46(5):1282-91</RefSource>
                <PMID Version="1">12115234</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2012 Jul;64(7):2338-45</RefSource>
                <PMID Version="1">22231486</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Genes Immun. 2012 Jul;13(5):388-98</RefSource>
                <PMID Version="1">22495533</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Rheumatology (Oxford). 2012 Sep;51(9):1550-6</RefSource>
                <PMID Version="1">22661558</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2012 Sep;64(9):2964-74</RefSource>
                <PMID Version="1">22549474</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Clin Immunol. 2012 Oct;145(1):13-8</RefSource>
                <PMID Version="1">22889643</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Ann Rheum Dis. 2013 Jan;72(1):110-7</RefSource>
                <PMID Version="1">22736089</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2013 Feb 1;190(3):1297-303</RefSource>
                <PMID Version="1">23277489</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2013 Feb;65(2):481-91</RefSource>
                <PMID Version="1">23045159</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nat Genet. 2013 Feb;45(2):124-30</RefSource>
                <PMID Version="1">23263488</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nat Biotechnol. 2013 Feb;31(2):142-7</RefSource>
                <PMID Version="1">23334450</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Ann Rheum Dis. 2013 Apr;72(4):614-20</RefSource>
                <PMID Version="1">22915621</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Autoimmun. 2013 Mar;41:6-16</RefSource>
                <PMID Version="1">23306098</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Autoimmun. 2013 Mar;41:168-74</RefSource>
                <PMID Version="1">23428850</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Autoimmun. 2013 Mar;41:175-81</RefSource>
                <PMID Version="1">23478041</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Autoimmun. 2013 Jun;43:78-84</RefSource>
                <PMID Version="1">23623029</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Clin Immunol. 2013 Aug;33(6):1100-9</RefSource>
                <PMID Version="1">23657402</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Clin Immunol. 2013 Aug;148(2):254-7</RefSource>
                <PMID Version="1">23773924</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>FEBS Lett. 2014 Nov 17;588(22):4244-9</RefSource>
                <PMID Version="1">24873878</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Curr Opin Immunol. 2014 Dec;31:16-23</RefSource>
                <PMID Version="1">25214301</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Ann Rheum Dis. 2015 Jun;74(6):1265-74</RefSource>
                <PMID Version="1">24562503</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Ann Rheum Dis. 2015 Aug;74(8):1612-20</RefSource>
                <PMID Version="1">24812288</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Annu Rev Immunol. 2005;23:307-36</RefSource>
                <PMID Version="1">15771573</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Biol Chem. 2005 Dec 9;280(49):40749-56</RefSource>
                <PMID Version="1">16230360</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Proteome Res. 2005 Nov-Dec;4(6):2032-42</RefSource>
                <PMID Version="1">16335948</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Curr Dir Autoimmun. 2006;9:173-87</RefSource>
                <PMID Version="1">16394661</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Res Ther. 2010;12(4):R133</RefSource>
                <PMID Version="1">20609223</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Mol Biotechnol. 2010 Nov;46(3):243-9</RefSource>
                <PMID Version="1">20563671</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Mod Rheumatol. 2010 Oct;20(5):458-65</RefSource>
                <PMID Version="1">20490598</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Genes Immun. 2010 Oct;11(7):554-60</RefSource>
                <PMID Version="1">20463746</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Rheumatol Int. 2010 Nov;30(12):1627-33</RefSource>
                <PMID Version="1">20049450</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2010 Nov 15;185(10):6355-63</RefSource>
                <PMID Version="1">20952683</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2012 Jun;64(6):1809-17</RefSource>
                <PMID Version="1">22170508</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Mol Ther. 2012 Jun;20(6):1251-60</RefSource>
                <PMID Version="1">22395530</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Biol Chem. 2003 Feb 14;278(7):4806-12</RefSource>
                <PMID Version="1">12473678</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Eur J Immunol. 2003 Oct;33(10):2792-800</RefSource>
                <PMID Version="1">14515263</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2004 Mar 15;172(6):3652-61</RefSource>
                <PMID Version="1">15004168</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Autoimmunity. 2004 Feb;37(1):57-65</RefSource>
                <PMID Version="1">15115313</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2004 Jun;50(6):1850-60</RefSource>
                <PMID Version="1">15188362</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2004 Oct;50(10):3365-76</RefSource>
                <PMID Version="1">15476220</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Proc Natl Acad Sci U S A. 1967 May;57(5):1394-400</RefSource>
                <PMID Version="1">5231746</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Proc Natl Acad Sci U S A. 1985 Dec;82(24):8629-33</RefSource>
                <PMID Version="1">2417226</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Hum Immunol. 1986 Dec;17(4):456-70</RefSource>
                <PMID Version="1">2432050</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 1990 Nov;33(11):1665-73</RefSource>
                <PMID Version="1">2242063</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Rheumatol. 1991 Apr;18(4):530-4</RefSource>
                <PMID Version="1">2066944</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 1991 Sep 1;147(5):1477-83</RefSource>
                <PMID Version="1">1715359</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Chromatogr. 1991 May 31;566(2):481-91</RefSource>
                <PMID Version="1">1939459</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Clin Invest. 1993 Jul;92(1):38-53</RefSource>
                <PMID Version="1">7686923</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 1995 Feb 1;154(3):1470-80</RefSource>
                <PMID Version="1">7529804</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 1995 Mar 15;154(6):3025-35</RefSource>
                <PMID Version="1">7533191</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Lupus. 1997;6(3):326-7</RefSource>
                <PMID Version="1">9296780</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nat Genet. 1998 Jun;19(2):187-91</RefSource>
                <PMID Version="1">9620779</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2005 Jan;52(1):201-11</RefSource>
                <PMID Version="1">15641052</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Ann Rheum Dis. 2005 Mar;64(3):481-3</RefSource>
                <PMID Version="1">15708899</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Epigenetics. 2013 Jul;8(7):679-84</RefSource>
                <PMID Version="1">23803967</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Autoimmun Rev. 2013 Oct;12(12):1160-5</RefSource>
                <PMID Version="1">23860189</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nat Rev Rheumatol. 2013 Nov;9(11):674-86</RefSource>
                <PMID Version="1">24100461</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nat Biotechnol. 2013 Dec;31(12):1137-42</RefSource>
                <PMID Version="1">24108092</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheumatol. 2014 Mar;66(3):549-59</RefSource>
                <PMID Version="1">24574214</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Ann Rheum Dis. 2014 Jun;73(6):1232-9</RefSource>
                <PMID Version="1">23698475</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Acta Histochem. 2014 Jun;116(5):891-7</RefSource>
                <PMID Version="1">24657071</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Clin Exp Immunol. 2014 Sep;177(3):641-51</RefSource>
                <PMID Version="1">24816316</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Genome Biol. 2013;14(3):R21</RefSource>
                <PMID Version="1">23497655</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Genome Biol. 2014;15(2):R31</RefSource>
                <PMID Version="1">24495553</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheumatol. 2014 Oct;66(10):2804-15</RefSource>
                <PMID Version="1">24980887</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nat Commun. 2012;3:735</RefSource>
                <PMID Version="1">22415826</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2012 Apr 1;188(7):3323-31</RefSource>
                <PMID Version="1">22379029</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2007 Jun;56(6):1921-33</RefSource>
                <PMID Version="1">17530637</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2007 Aug;56(8):2755-64</RefSource>
                <PMID Version="1">17665426</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2007 Oct 15;179(8):5553-63</RefSource>
                <PMID Version="1">17911642</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2007 Nov 1;179(9):6352-8</RefSource>
                <PMID Version="1">17947713</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Rheumatology (Oxford). 2007 Dec;46(12):1796-803</RefSource>
                <PMID Version="1">18032537</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Ann Rheum Dis. 2008 Jun;67(6):867-72</RefSource>
                <PMID Version="1">17823201</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nat Genet. 2008 Jun;40(6):741-50</RefSource>
                <PMID Version="1">18488029</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Genes Immun. 2008 Jun;9(4):368-78</RefSource>
                <PMID Version="1">18523434</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2008 Aug;58(8):2511-7</RefSource>
                <PMID Version="1">18668569</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2008 Nov;58(11):3562-73</RefSource>
                <PMID Version="1">18975310</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2009 May;60(5):1519-29</RefSource>
                <PMID Version="1">19404935</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Biol Chem. 2009 Jul 3;284(27):17897-901</RefSource>
                <PMID Version="1">19342379</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Clin Immunol. 2009 Sep;132(3):362-70</RefSource>
                <PMID Version="1">19520616</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Rheumatol. 2009 Aug;36(8):1580-9</RefSource>
                <PMID Version="1">19531758</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Immunol. 2009 Sep 1;183(5):3109-17</RefSource>
                <PMID Version="1">19648272</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>PLoS One. 2009;4(8):e6718</RefSource>
                <PMID Version="1">19701459</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>ISME J. 2011 Jan;5(1):82-91</RefSource>
                <PMID Version="1">20613793</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Immunol Lett. 2011 Mar 30;135(1-2):96-9</RefSource>
                <PMID Version="1">20937307</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Dermatol Sci. 2009 Oct;56(1):33-6</RefSource>
                <PMID Version="1">19651491</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Lupus. 2009 Oct;18(12):1037-44</RefSource>
                <PMID Version="1">19762376</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nat Genet. 2009 Nov;41(11):1228-33</RefSource>
                <PMID Version="1">19838195</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Nature. 2009 Nov 19;462(7271):315-22</RefSource>
                <PMID Version="1">19829295</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Mol Immunol. 2009 Dec;47(2-3):511-6</RefSource>
                <PMID Version="1">19747733</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2009 Dec;60(12):3613-22</RefSource>
                <PMID Version="1">19950268</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Genome Res. 2010 Feb;20(2):170-9</RefSource>
                <PMID Version="1">20028698</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Scand J Rheumatol. 2009;38(5):369-74</RefSource>
                <PMID Version="1">19444718</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2006 Mar;54(3):779-87</RefSource>
                <PMID Version="1">16508942</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>J Autoimmun. 2006 May;26(3):165-71</RefSource>
                <PMID Version="1">16621447</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Arthritis Rheum. 2006 Jul;54(7):2271-9</RefSource>
                <PMID Version="1">16802366</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Br J Pharmacol. 2007 Apr;150(7):862-72</RefSource>
                <PMID Version="1">17325656</PMID>
            </CommentsCorrections>
            <CommentsCorrections RefType="Cites">
                <RefSource>Clin Rheumatol. 2007 May;26(5):723-8</RefSource>
                <PMID Version="1">17103120</PMID>
            </CommentsCorrections>
        </CommentsCorrectionsList>
        <MeshHeadingList>
            <MeshHeading>
                <DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName UI="D001327" MajorTopicYN="Y">Autoimmune Diseases</DescriptorName>
                <QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
                <QualifierName UI="Q000276" MajorTopicYN="N">immunology</QualifierName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName UI="D019175" MajorTopicYN="N">DNA Methylation</DescriptorName>
                <QualifierName UI="Q000276" MajorTopicYN="Y">immunology</QualifierName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName UI="D044127" MajorTopicYN="N">Epigenesis, Genetic</DescriptorName>
                <QualifierName UI="Q000276" MajorTopicYN="Y">immunology</QualifierName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName UI="D059647" MajorTopicYN="Y">Gene-Environment Interaction</DescriptorName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName UI="D006657" MajorTopicYN="Y">Histones</DescriptorName>
                <QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
                <QualifierName UI="Q000276" MajorTopicYN="N">immunology</QualifierName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName UI="D011499" MajorTopicYN="N">Protein Processing, Post-Translational</DescriptorName>
                <QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
                <QualifierName UI="Q000276" MajorTopicYN="Y">immunology</QualifierName>
            </MeshHeading>
            <MeshHeading>
                <DescriptorName UI="D022661" MajorTopicYN="Y">RNA, Untranslated</DescriptorName>
                <QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
                <QualifierName UI="Q000276" MajorTopicYN="N">immunology</QualifierName>
            </MeshHeading>
        </MeshHeadingList>
        <KeywordList Owner="NOTNLM">
            <Keyword MajorTopicYN="N">Sjögren’s syndrome</Keyword>
            <Keyword MajorTopicYN="N">autoimmune disease</Keyword>
            <Keyword MajorTopicYN="N">epigenetics</Keyword>
            <Keyword MajorTopicYN="N">histone modification</Keyword>
            <Keyword MajorTopicYN="N">methylation</Keyword>
            <Keyword MajorTopicYN="N">miRNA</Keyword>
            <Keyword MajorTopicYN="N">rheumatoid arthritis</Keyword>
            <Keyword MajorTopicYN="N">systemic lupus erythematosus</Keyword>
            <Keyword MajorTopicYN="N">systemic sclerosis</Keyword>
        </KeywordList>
    </MedlineCitation>
    <PubmedData>
        <History>
            <PubMedPubDate PubStatus="entrez">
                <Year>2014</Year>
                <Month>12</Month>
                <Day>24</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="pubmed">
                <Year>2014</Year>
                <Month>12</Month>
                <Day>24</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="medline">
                <Year>2015</Year>
                <Month>8</Month>
                <Day>22</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
        </History>
        <PublicationStatus>ppublish</PublicationStatus>
        <ArticleIdList>
            <ArticleId IdType="pubmed">25534978</ArticleId>
            <ArticleId IdType="doi">10.1586/1744666X.2015.994507</ArticleId>
            <ArticleId IdType="pmc">PMC4636192</ArticleId>
            <ArticleId IdType="mid">NIHMS732942</ArticleId>
        </ArticleIdList>
    </PubmedData>
</PubmedArticle>

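Step 2: Parse the articles

1. Python's standard library ships an XML parser, xml.etree.ElementTree. Its Element type is a container object that holds structured data in memory. My supervisor's project needs a full-text search database built on the PubMed article structure, so the first step is to extract the fields below. The code is as follows: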
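As a small orientation before the full script, the sketch below shows the three ElementTree operations the script leans on: find() for a single child, .text for an element's content, and .attrib / .get() for attributes. The sample string is a trimmed-down record modeled on the structure shown above; the PubmedArticleSet wrapper element is an assumption about how the export is rooted, not something taken from this post.

```python
import xml.etree.ElementTree as ET

# Trimmed-down record modeled on the PubmedArticle structure shown above.
# The <PubmedArticleSet> root is an assumption about the export file.
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">25534978</PMID>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D001327" MajorTopicYN="Y">Autoimmune Diseases</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""

root = ET.fromstring(SAMPLE)

# find() returns the first matching direct child, or None if there is none
article = root.find("PubmedArticle")
citation = article.find("MedlineCitation")

# .text is the character data directly inside an element
pmid = citation.find("PMID").text

# .attrib is a dict of the element's attributes
status = citation.attrib["Status"]

# iter() walks all descendants with the given tag; get() reads one attribute
major_terms = [
    d.text for d in citation.iter("DescriptorName")
    if d.get("MajorTopicYN") == "Y"
]

print(pmid, status, major_terms)
```

The full script uses the same calls, only chained more deeply and guarded with checks for elements that may be missing.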

import xml.etree.ElementTree as ET
import pandas as pd
import numpy as np


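# Country information
# Known issues: country names in author affiliations are inconsistent, e.g. the United States (USA, UNITED STATES),
#               and some affiliations give only a state, with no country
# Current workaround: inspect the XML records whose country is missing or unrecognized, then add words that identify the country to the vocabulary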
import pycountry
country_name = [
    str.strip(str.split(i.name.upper(), ",")[0])
    for i in list(pycountry.countries)
]

# The US and the UK are written in several different ways
# Add the country, state and city names found so far
country_name.extend([
    "USA", "UK.", "UK ", "LONDON", "São Paulo", "IRAN", "México", "Birmingham",
    "Chicago", "Deutschland", "Tokyo", "Nagoya ", "España", "serbia", "paris",
    "pennsylvania", "birmingham", "chicago", "nagoya", "España",
    "Belo Horizonte", "CHINESE","San Pietro Vernotico"
])
country_name = [i.upper() for i in country_name]

# Merge the different words that identify the same country
# e.g. merge USA and UNITED STATES into USA
def CombineCountry(CountryInfo):
    # Canonical country name -> vocabulary words that map to it
    # ("NAGOYA " with a trailing space is listed because country_name contains it)
    aliases = {
        "USA": ["USA", "UNITED STATES", "CHICAGO", "BIRMINGHAM", "PENNSYLVANIA"],
        "UK": ["UK.", "UK ", "UNITED KINGDOM", "LONDON"],
        "GERMANY": ["GERMANY", "DEUTSCHLAND"],
        "MEXICO": ["MÉXICO", "MEXICO"],
        "JAPAN": ["JAPAN", "TOKYO", "NAGOYA", "NAGOYA "],
        "BRAZIL": ["BRAZIL", "SÃO PAULO", "BELO HORIZONTE"],
        "FRANCE": ["FRANCE", "PARIS"],
        "SPAIN": ["SPAIN", "ESPAÑA"],
        "CHINA": ["CHINA", "HONG KONG", "MACAO", "CHINESE", "TAIWAN"],
        "ITALY": ["ITALY", "SAN PIETRO VERNOTICO"],
    }
    CountryInfo_arr = []
    for country in CountryInfo:
        for canonical, words in aliases.items():
            if country in words:
                CountryInfo_arr.append(canonical)
                break
        else:
            # No alias matched: keep the name unchanged
            CountryInfo_arr.append(country)
    return (CountryInfo_arr)

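# Identify the author's institution
def IdentifyInstitute(authorAff):
    # Keywords that mark an institution; extend this list when an affiliation is not recognized
    # "INSTITUTE" also matches "INSTITUTET", because matching is by substring
    org=["UNIVERSITY","COMPANY","INSTITUTE","COLLEGE","ACADEMY"]
    # Split the affiliation on "," and return the first segment that contains a keyword
    string_list=[str.strip(i.upper()) for i in str.split(authorAff,",")]
    for segment in string_list:
        for keyword in org:
            if keyword in segment:
                return(segment)
    # No keyword found
    return("")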


# Identify the first author's country and institution
def FirstAuthorCountry(Affiliation,country_name):
    firstAuthorCountry = []
    firstAuthorInstitute=[]
    # i holds one paper's affiliations: one list per author
    for i in Affiliation:
        firstAuthorCountry_temp = ""
        firstAuthorInstitute_temp=""
        if i != [] and i[0] != []:
            # First affiliation of the first author
            firstAuthorAff = i[0][0].upper()
            firstAuthorInstitute_temp=IdentifyInstitute(firstAuthorAff)
            # Match the country: the first vocabulary word found in the affiliation wins
            for j in country_name:
                if j in firstAuthorAff:
                    firstAuthorCountry_temp = j
                    break
        firstAuthorCountry.append(firstAuthorCountry_temp)
        firstAuthorInstitute.append(firstAuthorInstitute_temp)
    # Normalize the country names once, after the loop, instead of on every iteration
    firstAuthorCountry=CombineCountry(firstAuthorCountry)
    return ([firstAuthorCountry,firstAuthorInstitute])


# Identify the corresponding author from email addresses
# If several authors' affiliations contain an email address, return the last of them;
# if no affiliation contains an email address, also fall back to the last author
import re
def IdentifyContactIndex(Affiliation,country_name):
    # Regular expression that matches an email address
    pattern = re.compile(r'\S+@\S+')
    # Index of the corresponding author, one entry per paper
    contect_index_arr = []
    for i in Affiliation:
        # i holds the affiliations of all authors of one paper
        contect_index_temp = []
        # j holds the affiliations of one author; an author may list several
        for author_index, j in enumerate(i):
            if any(pattern.search(k) for k in j):
                contect_index_temp.append(author_index)
        if contect_index_temp == []:
            # No affiliation contains an email: use the last author
            # (this is -1 when the paper has no authors at all)
            contect_index_arr.append(len(i) - 1)
        else:
            # Several authors have an email address: take the last of them
            contect_index_arr.append(contect_index_temp[-1])
    return (contect_index_arr)


# Country and institution of the corresponding author
def ContectAuthorCountry(Affiliation,country_name):
    contectAuthorIndex = IdentifyContactIndex(Affiliation,country_name)
    contectAuthorCountry = []
    contectAuthorInstitute=[]
    # Extract the corresponding author's affiliation for each paper
    for (i, j) in zip(Affiliation, contectAuthorIndex):
        contectAuthorCountry_temp = ""
        contectAuthorInstitute_temp=""
        if i != [] and i[j] != []:
            contectAuthorAff = i[j][0].upper()
            contectAuthorInstitute_temp=IdentifyInstitute(contectAuthorAff)
            # Match the country: the first vocabulary word found in the affiliation wins
            for k in country_name:
                if k in contectAuthorAff:
                    contectAuthorCountry_temp = k
                    break
        contectAuthorCountry.append(contectAuthorCountry_temp)
        contectAuthorInstitute.append(contectAuthorInstitute_temp)
    # Normalize the country names once, after the loop, instead of on every iteration
    contectAuthorCountry=CombineCountry(contectAuthorCountry)
    return ([contectAuthorCountry,contectAuthorInstitute])


# Identify every author's country, used to build the collaboration graph
def EachAuthorCountry(Affiliation,country_name):
    authorCountry_arr = []
    for i in Affiliation:
        authorCountry = []
        # Walk through each author's affiliations
        for j in i:
            if j==[]:
                continue
            authorAff = j[0].upper()
            # Match the country
            authorCountry_temp = ""
            for k in country_name:
                if k in authorAff:
                    authorCountry_temp = k
                    break
            if authorCountry_temp!="":
                authorCountry.append(authorCountry_temp)
        authorCountry_arr.append(authorCountry)
    # Normalize the country names
    authorCountry_std = []
    for country in authorCountry_arr:
        authorCountry_std.append(CombineCountry(country))
    return (authorCountry_std)


# Collaboration links between authors' countries
from itertools import combinations
# def CountryLink(Affiliation,country_name):
#     authorCountry=EachAuthorCountry(Affiliation,country_name)
#     countryLink = []
#     for i in authorCountry:
#         if len(np.unique(i)) > 1:
#             countryLink.extend(list(combinations(np.unique(i), r=2)))
#     countryLink=[[i[0],i[1]] for i in countryLink]
#     countryLink=pd.DataFrame(countryLink)
#     # countryLink.to_csv("countryLink.csv")
#     return (countryLink)

def CountryLink(EachAuthorCountryInfo):
    countryLink = []
    for i in EachAuthorCountryInfo:
        if len(np.unique(i)) > 1:
            countryLink.extend(list(combinations(np.unique(i), r=2)))
    countryLink=[[i[0],i[1]] for i in countryLink]
    countryLink=pd.DataFrame(countryLink)
    countryLink.to_csv("CountryLink.csv")
    return (countryLink)


# Parse a PubMed XML export and extract the selected fields from every article
def read_xml(path):
    tree = ET.parse(path)
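    # findall("./") returns every direct child of the root element
    all_items = tree.findall("./")
    book = tree.findall("PubmedBookArticle")
    art = tree.findall("PubmedArticle")
    print("the number of articles: ", len(art), "\n")
    print("the number of books: ", len(book), "\n")
    print("the number of all items: ", len(all_items), "\n")
    # 2018 journal impact factors (column 0: journal name, column 1: impact factor, as used below)
    if2018=pd.read_csv("IF2018.csv").values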
    pmid_arr = []
    articleTitle_arr = []
    articleAbstract_arr = []
    pubData_arr = []
    MESH_majorTerms_arr = []
    MESH_allTerms_arr = []
    jornalName_arr = []
    jornalNameAbbr_arr = []
    authorNameList_arr = []
    authorAff_arr = []
    citedList_arr = []
    grantInfoList_arr = []
    if2018_arr=[]
    count = 0
    for paper in art:
        pmid = "None"
        pubData = "None"
        MESH_majorTerms = []
        MESH_allTerms = []
        articleTitle = "None"
        articleAbstract = "None"
        jornalName = "None"
        jornalNameAbbr = "None"
        authorNameList = []
        citedList = []
        authorAff = []
        temp = paper.find("MedlineCitation")
        pmid = temp.find("PMID").text
        print(pmid)
        # RetractionOf is set to 1 when a CommentsCorrections entry has RefType="RetractionOf"; such records are skipped
        RetractionOf = 0
        # pubData keeps only the publication year; in PubMed XML a PubDate may hold
        # a year only, a year and month, or a year, month and day
        pubDate = temp.find("Article").find("Journal").find(
            "JournalIssue").find("PubDate")
        if pubDate is not None:
            if pubDate.find("Year") is not None:
                pubData = pubDate.find("Year").text
            elif pubDate.find("MedlineDate") is not None:
                # The date of publication of the article will be found in <MedlineDate> when parsing for the separate fields is not possible.
                # i.e.,<MedlineDate>1998 Dec-1999 Jan</MedlineDate>, <MedlineDate>2000 Spring</MedlineDate>
                # from url: https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html#pubdate
                pubData = str.split(pubDate.find("MedlineDate").text, " ")[0]
        jornalName = temp.find("Article").find("Journal").find("Title").text
        # article title
        # The title may contain inline markup, e.g. pmid: 26623013
        # <ArticleTitle><i>Tripterygium</i> glycosides inhibit inflammatory mediators in the rat synovial RSC-364 cell line stimulated with interleukin-1β.</ArticleTitle>
        # itertext() joins the element's own text with the text and tails of all nested tags,
        # so nothing before, inside or after <i> is lost
        title_elem = temp.find("Article").find("ArticleTitle")
        if title_elem is not None:
            title_text = "".join(title_elem.itertext())
            if title_text != "":
                articleTitle = title_text
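        # article abstract
        # pmid: 26623013
        # <AbstractText><i>Tripterygium</i> glycosides ***** </AbstractText>
        # An abstract may have several AbstractText sections, each possibly with inline markup
        if temp.find("Article").find("Abstract") is not None:
            abstract_parts = []
            for i in temp.find("Article").find("Abstract").findall(
                    "AbstractText"):
                # itertext() also keeps the text that precedes the first nested tag
                section_text = str.strip("".join(i.itertext()))
                if section_text != "":
                    abstract_parts.append(section_text)
            # Overwrite the "None" default only when some text was found,
            # so the abstract no longer starts with the literal string "None"
            if abstract_parts != []:
                articleAbstract = " ".join(abstract_parts)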
        # Journal abbreviation; fall back to Title when ISOAbbreviation is missing
        if temp.find("Article").find("Journal").find(
                "ISOAbbreviation") != None:
            jornalNameAbbr = temp.find("Article").find("Journal").find(
                "ISOAbbreviation").text
        else:
            jornalNameAbbr = jornalName
        # author name, affiliationInfo
        if temp.find("Article").find("AuthorList") != None:
            authorList = temp.find("Article").find("AuthorList").findall(
                "Author")
            for i in authorList:
                name_temp = []
                if i.find("LastName") != None:
                    name_temp.append(i.find("LastName").text)
                if i.find("ForeName") != None:
                    name_temp.append(i.find("ForeName").text)
                if name_temp==[]:
                    authorNameList.append(["None"])
                else:
                    authorNameList.append(name_temp)
            authorAff = [[
                j.find("Affiliation").text
                for j in i.findall("AffiliationInfo")
            ] for i in authorList]
        # MESH terms
        if temp.find("MeshHeadingList") != None:
            for i in temp.find("MeshHeadingList").findall("MeshHeading"):
                # save MESH terms
                MESH_allTerms.append(i.find("DescriptorName").text)
                # save MESH major terms
                if i.find("DescriptorName").attrib['MajorTopicYN'] == "Y":
                    MESH_majorTerms.append(i.find("DescriptorName").text)
                # If any QualifierName has MajorTopicYN=Y, add the DescriptorName to the major terms
                elif i.find("QualifierName") != None:
                    for j in i.findall("QualifierName"):
                        if j.attrib['MajorTopicYN'] == "Y":
                            MESH_majorTerms.append(
                                i.find("DescriptorName").text)
                            break                   
        # grant
        grantInfoList = []
        if temp.find("Article").find("GrantList") != None:
            grantList = temp.find("Article").find("GrantList")
            count2 = 0
            for i in grantList.findall("Grant"):
                count2 += 1
                # print(count2, "\n")
                # GrantID, Agency, Country
                GrantID = ""
                Agency = ""
                Country = ""
                if i.find("GrantID") != None:
                    GrantID = i.find("GrantID").text
                if i.find("Agency") != None:
                    Agency = i.find("Agency").text
                if i.find("Country") != None:
                    Country = i.find("Country").text
                grantInfoList.append([GrantID, Agency, Country])
        # cites
        # A CommentsCorrections entry with RefType="RetractionOf" marks this record as a
        # retraction notice for another article; such records are skipped.
        # (In the NLM element descriptions, the retracted article itself carries RefType="RetractionIn".)
        if temp.find("CommentsCorrectionsList") != None:
            commentList = temp.find("CommentsCorrectionsList").findall(
                "CommentsCorrections")
            for i in commentList:
                refType = i.get("RefType")
                if refType == "RetractionOf":
                    RetractionOf = 1
                    break
                # Cites lists items in the bibliography or list of references at the end of an article.
                if refType != "Cites":
                    continue
                citedList.append([
                    # Journal abbreviation: the part of RefSource before the first "."
                    i.find("RefSource").text.split(".")[0],
                    # findtext() returns None instead of raising when a reference has no PMID
                    i.findtext("PMID")
                ])
        # Skip the record flagged above
        if RetractionOf == 1:
            continue
        else:
            count += 1
            print(count, "\n")
        pmid_arr.append(pmid)
        pubData_arr.append(pubData)
        articleTitle_arr.append(articleTitle)
        articleAbstract_arr.append(articleAbstract)
        jornalName_arr.append(jornalName)
        jornalNameAbbr_arr.append(jornalNameAbbr)
        authorNameList_arr.append(authorNameList)
        authorAff_arr.append(authorAff)
        MESH_allTerms_arr.append(MESH_allTerms)
        MESH_majorTerms_arr.append(MESH_majorTerms)
        grantInfoList_arr.append(grantInfoList)
        citedList_arr.append(citedList)
    # Add each journal's 2018 impact factor (exact name match first, then substring match)
    if2018_jornalName_upper=[i.upper() for i in if2018[:,0]]
    for i in jornalName_arr:
        flag=0
        for j in np.arange(len(if2018_jornalName_upper)):
            if if2018_jornalName_upper[j] == i.upper():
                if2018_arr.append(if2018[j,1])
                flag=1
                break
            elif if2018_jornalName_upper[j] in i.upper():
                if2018_arr.append(if2018[j,1])
                flag=1
                break
        if flag==0:
            if2018_arr.append("None")
    # Identify the countries of the first author, the corresponding author and all authors
    firstAuthorCountryInstitute = FirstAuthorCountry(authorAff_arr,
                                                     country_name)
    contectAuthorCountryInstitute = ContectAuthorCountry(
        authorAff_arr, country_name)
    eachAuthorCountry = EachAuthorCountry(authorAff_arr, country_name)
    # Join each paper's author countries into one comma-separated string
    eachAuthorCountry=[",".join(i) for i in eachAuthorCountry]
    return ([
        pmid_arr, jornalName_arr, pubData_arr, jornalNameAbbr_arr,
        articleTitle_arr, articleAbstract_arr, authorNameList_arr,
        authorAff_arr, MESH_allTerms_arr, MESH_majorTerms_arr,
        grantInfoList_arr, citedList_arr, firstAuthorCountryInstitute[0],
        firstAuthorCountryInstitute[1], contectAuthorCountryInstitute[0],
        contectAuthorCountryInstitute[1], eachAuthorCountry,if2018_arr
    ])
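
The ArticleTitle and AbstractText handling in read_xml has to cope with inline tags such as <i>, and with abstracts split into several AbstractText sections. One compact way to collect such text is Element.itertext(), which yields an element's own text plus the text and tails of everything nested inside it. The sketch below is only an illustration under stated assumptions: the sample record is made up, and element_text is a hypothetical helper, not part of the script above.

```python
import xml.etree.ElementTree as ET


def element_text(elem):
    """Return all text inside elem, nested tags included (hypothetical helper)."""
    if elem is None:
        # Missing element: return an empty string instead of raising
        return ""
    return "".join(elem.itertext()).strip()


# Made-up record with inline markup and a two-section abstract
SAMPLE = (
    "<Article>"
    "<ArticleTitle><i>Tripterygium</i> glycosides in <b>rat</b> cells.</ArticleTitle>"
    "<Abstract>"
    '<AbstractText Label="BACKGROUND">First <i>part</i>.</AbstractText>'
    '<AbstractText Label="METHODS">Second part.</AbstractText>'
    "</Abstract>"
    "</Article>"
)

article = ET.fromstring(SAMPLE)

# Title: text inside and after every inline tag is kept
title = element_text(article.find("ArticleTitle"))

# Abstract: one string per AbstractText section, joined with a space
abstract = " ".join(
    element_text(section)
    for section in article.findall("Abstract/AbstractText")
)

# An element that does not exist yields an empty string
missing = element_text(article.find("VernacularTitle"))

print(title)
print(abstract)
```

Joining the sections with a single space keeps one abstract string per article, which matches how read_xml stores articleAbstract.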