Downloading literature with Entrez

All_Will_Be_Fine噻

Published 2021-12-22 11:35:52


Article link: https://blog.csdn.net/jiangshandaiyou/article/details/122081978


# Use ESpell to check the spelling of a search term and suggest a correction
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"
record = Entrez.read(Entrez.espell(term="biopytho00n"))

record["Query"]

'biopytho00n'

record["CorrectedQuery"]

'biopython'

Next, I plan to turn on automatic correction and use the phrase suggested by ESpell as the search term.
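A minimal sketch of that plan, assuming the parsed ESpell result behaves like the dictionary shown above (keys `Query` and `CorrectedQuery`). The helper name `pick_search_term` is hypothetical, and falling back to the original query when `CorrectedQuery` is empty is an assumption about how ESpell reports "no correction". It runs offline on the values shown above.

```python
def pick_search_term(espell_record):
    """Choose the term to search with from a parsed ESpell result.

    Assumes a dict-like object with "Query" and "CorrectedQuery" keys,
    as returned by Entrez.read(Entrez.espell(term=...)).
    """
    corrected = espell_record.get("CorrectedQuery", "")
    # Assumption: an empty correction means ESpell had nothing to suggest,
    # so fall back to the user's original query
    return corrected if corrected else espell_record["Query"]


# Offline usage with the values shown above (no network call needed)
print(pick_search_term({"Query": "biopytho00n", "CorrectedQuery": "biopython"}))  # biopython
print(pick_search_term({"Query": "orchid", "CorrectedQuery": ""}))  # orchid
```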

Now let's test how EGQuery, ESearch and EFetch can be used together.

# In this example, we will query PubMed for all articles having to do with orchids
from Bio import Entrez

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
handle = Entrez.egquery(term="orchid") #check how many of such articles there are
record = Entrez.read(handle)

for row in record["eGQueryResult"]:
    if row["DbName"]=="pubmed":
        print(row["Count"])

# Now we use Bio.Entrez.esearch to get the PubMed IDs of these 463 articles
handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463)
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"] #This returns a Python list containing all of the PubMed IDs of articles related to orchids

# get the corresponding Medline records and extract the information from them
from Bio import Medline

handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline",retmode="text")

records = Medline.parse(handle)

records = list(records)
handle.close()
# print the first five records
for record in records[:5]:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))
    print("")

title: Unreduced Male Gamete Formation in Cymbidium and Its Use for Developing Sexual Polyploid Cultivars.
authors: ['Zeng RZ', 'Zhu J', 'Xu SY', 'Du GH', 'Guo HR', 'Chen J', 'Zhang ZS', 'Xie L']
source: Front Plant Sci. 2020 May 15;11:558. doi: 10.3389/fpls.2020.00558. eCollection 2020.


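Downloading via the search history (the History server) is recommended, and the official documentation recommends it too; otherwise the requests put extra load on NCBI's servers.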
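A sketch of the idea, assuming the parsed ESearch result exposes `WebEnv` and `QueryKey` as in the output below: with the History server, each EFetch request only needs `webenv`, `query_key` and a `retstart`/`retmax` window instead of the full ID list. `history_fetch_params` is a hypothetical helper; the resulting dict would be passed on as `Entrez.efetch(**params)`, using the same keyword names as the calls in this post.

```python
def history_fetch_params(search_record, retstart, retmax, db="pubmed"):
    """Build EFetch keyword arguments that read one window of a stored History-server result."""
    return {
        "db": db,
        "rettype": "medline",
        "retmode": "text",
        "retstart": retstart,  # index of the first record in the stored set
        "retmax": retmax,      # how many records to return in this request
        "webenv": search_record["WebEnv"],
        "query_key": search_record["QueryKey"],
        # note: no "id" key, the stored result set replaces the ID list
    }


# Offline usage with WebEnv/QueryKey values like the ones printed below
fake_record = {"WebEnv": "MCID_61c18a283797170eda43b3b0", "QueryKey": "1"}
params = history_fetch_params(fake_record, retstart=0, retmax=10)
print(params["query_key"], params["retmax"])  # 1 10
# With network access: handle = Entrez.efetch(**params)
```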

# In this example, we will query PubMed for all articles having to do with orchids
from Bio import Entrez

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
handle = Entrez.egquery(term="orchid") #check how many of such articles there are
record = Entrez.read(handle)

for row in record["eGQueryResult"]:
    if row["DbName"]=="pubmed":
        print(row["Count"])

# search for the term and store the results on the History server
handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463,usehistory="y")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"] #This returns a Python list containing all of the PubMed IDs of articles related to orchids

len(idlist)

webenv = record["WebEnv"]
webenv

'MCID_61c18a283797170eda43b3b0'

query_key = record["QueryKey"]
query_key

'1'

# Now use Bio.Entrez.efetch to download the MEDLINE records of these articles and extract information from them.
# With the History server, WebEnv and query_key identify the stored result set, so the ID list does not have to be sent again.
from Bio import Medline

handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text", retmax=len(idlist),
                       webenv=record["WebEnv"], query_key=record["QueryKey"])

records = Medline.parse(handle)

records = list(records)
count = 0
for record in records:
    if count < 3:
        print("title:", record.get("TI", "?"))
        print("authors:", record.get("AU", "?"))
        print("source:", record.get("SO", "?"))
        print("")
        count += 1
    else:
        break

title: Comparative analysis of Phytophthora genomes data.
authors: ['Gao RF', 'Wang JY', 'Liu KW', 'Wang ZW', 'Zhang D', 'Zhao X', 'Zhong WY', 'Tsai WC', 'Liu ZJ', 'Zhang GM']
source: Data Brief. 2021 Dec 2;39:107663. doi: 10.1016/j.dib.2021.107663. eCollection 2021 Dec.

title: Floral organ-specific proteome profiling of the floral ornamental orchid (Cymbidium goeringii) reveals candidate proteins related to floral organ development.
authors: ['Chen Y', 'Xu Z', 'Shen Q', 'Sun C']
source: Bot Stud. 2021 Dec 18;62(1):23. doi: 10.1186/s40529-021-00330-9.

title: Genomic landscape of a relict fir-associated fungus reveals rapid convergent adaptation towards endophytism.
authors: ['Yuan Z', 'Wu Q', 'Xu L', 'Druzhinina IS', 'Stukenbrock EH', 'Nieuwenhuis BPS', 'Zhong Z', 'Liu ZJ', 'Wang X', 'Cai F', 'Kubicek CP', 'Shan X', 'Wang J', 'Shi G', 'Peng L', 'Martin FM']
source: ISME J. 2021 Dec 16. pii: 10.1038/s41396-021-01176-6. doi: 10.1038/s41396-021-01176-6.
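Records fetched with `rettype="medline"` are plain text: each line has the form `TAG - value`, with the tag padded to four characters, a long value continues on lines indented by six spaces, and records are separated by blank lines. To illustrate what `Medline.parse` is reading, here is a simplified, hypothetical stdlib-only parser (`parse_medline`). It is not Biopython's implementation: it keeps every field as a list and skips the edge cases Biopython handles. The sample record is made up.

```python
def parse_medline(text):
    """Split MEDLINE-format text into a list of {tag: [values]} dicts (simplified)."""
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("      ") and last_tag:
            # Continuation line: extend the most recent value of the previous tag
            current[last_tag][-1] += " " + line.strip()
        else:
            # "TI  - value": tag in columns 1-4, "- " in columns 5-6, value after that
            tag = line[:4].rstrip()
            current.setdefault(tag, []).append(line[6:].strip())
            last_tag = tag
    if current:
        records.append(current)
    return records


# A made-up two-record sample, for illustration only
sample = (
    "PMID- 00000001\n"
    "TI  - A made-up title that is long enough\n"
    "      to wrap onto a second line.\n"
    "AU  - Doe J\n"
    "AU  - Roe R\n"
    "\n"
    "PMID- 00000002\n"
    "TI  - Another made-up record.\n"
)
recs = parse_medline(sample)
print(len(recs))         # 2
print(recs[0]["TI"][0])  # A made-up title that is long enough to wrap onto a second line.
print(recs[0]["AU"])     # ['Doe J', 'Roe R']
```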

Download the MEDLINE-format records of the articles above in batches and save them to a local file.

from Bio import Entrez

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
# search for the term and store the results on the History server
handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463, usehistory="y")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # a Python list of the PubMed IDs of articles related to orchids

count = len(idlist)  # number of articles found (capped at retmax=463)

# Download the MEDLINE records in batches from the History server (WebEnv + query_key);
# retstart/retmax select each batch from the stored result set, so no ID list is sent
batch_size = 10
with open("recent_orchid_papers.txt", "w") as out_handle:
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        print("Going to download record %i to %i" % (start + 1, end))

        # retmax=end - start keeps the last batch from reading past record `count`
        handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text",
                               retstart=start, retmax=end - start,
                               webenv=record["WebEnv"], query_key=record["QueryKey"])
        data = handle.read()
        handle.close()
        out_handle.write(data)

Going to download record 1 to 10
Going to download record 11 to 20
Going to download record 21 to 30
Going to download record 31 to 40
Going to download record 41 to 50
Going to download record 51 to 60
Going to download record 61 to 70
Going to download record 71 to 80
Going to download record 81 to 90
Going to download record 91 to 100
Going to download record 101 to 110
Going to download record 111 to 120
Going to download record 121 to 130
Going to download record 131 to 140
Going to download record 141 to 150
Going to download record 151 to 160
Going to download record 161 to 170
Going to download record 171 to 180
Going to download record 181 to 190
Going to download record 191 to 200
Going to download record 201 to 210
Going to download record 211 to 220
Going to download record 221 to 230
Going to download record 231 to 240
Going to download record 241 to 250
Going to download record 251 to 260
Going to download record 261 to 270
Going to download record 271 to 280
Going to download record 281 to 290
Going to download record 291 to 300
Going to download record 301 to 310
Going to download record 311 to 320
Going to download record 321 to 330
Going to download record 331 to 340
Going to download record 341 to 350
Going to download record 351 to 360
Going to download record 361 to 370
Going to download record 371 to 380
Going to download record 381 to 390
Going to download record 391 to 400
Going to download record 401 to 410
Going to download record 411 to 420
Going to download record 421 to 430
Going to download record 431 to 440
Going to download record 441 to 450
Going to download record 451 to 460
Going to download record 461 to 463
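The batching arithmetic on its own, as a small stdlib sketch with a hypothetical helper `batch_windows`: for 463 IDs and a batch size of 10 it yields 47 `(retstart, retmax)` windows, the last one covering records 461 to 463, which matches the progress lines above. Using `end - start` rather than a fixed `batch_size` as `retmax` matters because, with `usehistory="y"`, the History server stores the whole retrieved set, not just the first `retmax` IDs.

```python
def batch_windows(count, batch_size):
    """Return (retstart, retmax) pairs that cover records 0..count-1 in batches."""
    windows = []
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        windows.append((start, end - start))  # the last window may be shorter
    return windows


windows = batch_windows(463, 10)
print(len(windows))  # 47
print(windows[0])    # (0, 10)
print(windows[-1])   # (460, 3)
```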

from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are


def correct_query(input_query):
    """Return ESpell's corrected query, or the input itself if no correction is offered."""
    record = Entrez.read(Entrez.espell(term=input_query))
    return record["CorrectedQuery"] or input_query


def count_query(term):
    """Print how many PubMed articles match the term (via EGQuery)."""
    handle = Entrez.egquery(term=term)  # check how many such articles there are
    record = Entrez.read(handle)
    handle.close()
    for row in record["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            print(row["Count"])


def search_query(term, retmax=463):
    """Search PubMed and store the results on the History server."""
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax, usehistory="y")
    record = Entrez.read(handle)
    handle.close()
    idlist = record["IdList"]  # PubMed IDs of the matching articles
    count = len(idlist)  # number of articles found (capped at retmax)
    return idlist, count, record


def batch_fetch(idlist, count, record, filename="recent_orchid_papers.txt", batch_size=10):
    """Download MEDLINE records in batches from the History server and save them to a file."""
    with open(filename, "w") as out_handle:
        for start in range(0, count, batch_size):
            end = min(count, start + batch_size)
            print("Going to download record %i to %i" % (start + 1, end))
            handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text",
                                   retstart=start, retmax=end - start,
                                   webenv=record["WebEnv"], query_key=record["QueryKey"])
            data = handle.read()
            handle.close()
            out_handle.write(data)


if __name__ == "__main__":
    try:
        input_query = input("Enter query: ")
        term = correct_query(input_query)
        count_query(term)
        idlist, count, record = search_query(term)
        batch_fetch(idlist, count, record)
    except Exception as err:  # e.g. network or HTTP errors from NCBI
        print("error:", err)

Controlling how ESearch or EGQuery searches a term

ESearch parameters

baseURL

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
db

Database to search. Value must be a valid Entrez database name (default = pubmed).
term

Entrez text query. All special characters must be URL encoded. Spaces may be replaced by ‘+’ signs. For very long queries (more than several hundred characters long), consider using an HTTP POST call.
usehistory

When usehistory is set to ‘y’, ESearch will post the UIDs resulting from the search operation onto the History server so that they can be used directly in a subsequent E-utility call. Also, usehistory must be set to ‘y’ for ESearch to interpret query key values included in term or to accept a WebEnv as input.
WebEnv

Web environment string returned from a previous ESearch, EPost or ELink call. When provided, ESearch will post the results of the search operation to this pre-existing WebEnv, thereby appending the results to the existing environment. In addition, providing WebEnv allows query keys to be used in term so that previous search sets can be combined or limited. As described above, if WebEnv is used, usehistory must be set to ‘y’.
query_key

Integer query key returned by a previous ESearch, EPost or ELink call. When provided, ESearch will find the intersection of the set specified by query_key and the set retrieved by the query in term. For query_key to function, WebEnv must be assigned an existing WebEnv string and usehistory must be set to ‘y’.
retstart

Sequential index of the first UID in the retrieved set to be shown in the XML output (default=0, corresponding to the first record of the entire set). This parameter can be used in conjunction with retmax to download an arbitrary subset of UIDs retrieved from a search.
retmax

Total number of UIDs from the retrieved set to be shown in the XML output (default=20). By default, ESearch only includes the first 20 UIDs retrieved in the XML output. If usehistory is set to ‘y’, the remainder of the retrieved set will be stored on the History server; otherwise these UIDs are lost. Increasing retmax allows more of the retrieved UIDs to be included in the XML output, up to a maximum of 100,000 records. To retrieve more than 100,000 UIDs, submit multiple esearch requests while incrementing the value of retstart (see Application 3).
rettype

Retrieval type. There are two allowed values for ESearch: ‘uilist’ (default), which displays the standard XML output, and ‘count’, which displays only the <Count> tag.
retmode

Retrieval type. Determines the format of the returned output. The default value is ‘xml’ for ESearch XML, but ‘json’ is also supported to return output in JSON format.
sort

Specifies the method used to sort UIDs in the ESearch output. The available values vary by database (db) and may be found in the Display Settings menu on an Entrez search results page. If usehistory is set to ‘y’, the UIDs are loaded onto the History Server in the specified sort order and will be retrieved in that order by ESummary or EFetch. Example values are ‘relevance’ and ‘name’ for Gene and ‘first+author’ and ‘pub+date’ for PubMed. Users should be aware that the default value of sort varies from one database to another, and that the default value used by ESearch for a given database may differ from that used on NCBI web search pages.
field

Search field. If used, the entire search term will be limited to the specified Entrez field. The following two URLs are equivalent:

esearch.fcgi?db=pubmed&term=asthma&field=title

esearch.fcgi?db=pubmed&term=asthma[title]
idtype

Specifies the type of identifier to return for sequence databases (nuccore, nucest, nucgss, popset, protein). By default, ESearch returns GI numbers in its output. If idtype is set to ‘acc’, ESearch will return accession.version identifiers rather than GI numbers.
datetype

Type of date used to limit a search. The allowed values vary between Entrez databases, but common values are ‘mdat’ (modification date), ‘pdat’ (publication date) and ‘edat’ (Entrez date). Generally an Entrez database will have only two allowed values for datetype.
reldate

When reldate is set to an integer n, the search returns only those items that have a date specified by datetype within the last n days.
mindate, maxdate

Date range used to limit a search result by the date specified by datetype. These two parameters (mindate, maxdate) must be used together to specify an arbitrary date range. The general date format is YYYY/MM/DD, and these variants are also allowed: YYYY, YYYY/MM.
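To see how these parameters map onto a request, here is a stdlib-only sketch that assembles an ESearch URL from the base URL above with `urllib.parse.urlencode`. `build_esearch_url` is a hypothetical helper for illustration only; `Entrez.esearch` builds and sends the request for you. It also shows the `field=title` form next to the equivalent inline `[title]` tag, and `rettype=count`.

```python
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", **params):
    """Assemble an ESearch URL; extra keyword arguments become query parameters."""
    query = {"db": db, "term": term}
    query.update(params)
    # urlencode URL-encodes special characters such as [ ] and turns spaces into '+'
    return BASE_URL + "?" + urlencode(query)


print(build_esearch_url("asthma", field="title"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=asthma&field=title
print(build_esearch_url("asthma[title]"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=asthma%5Btitle%5D
print(build_esearch_url("PTB AND PSF", rettype="count"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PTB+AND+PSF&rettype=count
```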

from Bio import Entrez

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
# search for the term and store the results on the History server
handle = Entrez.esearch(db="pubmed", term="PTB AND PSF", retmax=463, usehistory="y")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # a Python list of the PubMed IDs of articles matching "PTB AND PSF"

len(idlist)

from Bio import Medline

handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text", retmax=len(idlist),
                       webenv=record["WebEnv"], query_key=record["QueryKey"])

records = Medline.parse(handle)

records = list(records)
count = 0
for record in records:
    if count < 3:
        print("title:", record.get("TI", "?"))
        print("authors:", record.get("AU", "?"))
        print("source:", record.get("SO", "?"))
        print("abstract",record.get("AB","?"))
        print("")
        count += 1
    else:
        break

title: Annexin A2 binds the internal ribosomal entry site of c-myc mRNA and regulates its translation.
authors: ['Strand E', 'Hollas H', 'Sakya SA', 'Romanyuk S', 'Saraste MEV', 'Grindheim AK', 'Patil SS', 'Vedeler A']
source: RNA Biol. 2021 Oct 15;18(sup1):337-354. doi: 10.1080/15476286.2021.1947648. Epub 2021 Aug 4.
abstract The expression and localization of the oncoprotein c-Myc is highly regulated at the level of transcription, mRNA transport, translation, as well as stability of the protein. We previously showed that Annexin A2 (AnxA2) binds to a specific localization element in the 3'untranslated region (UTR) of c-myc mRNA and is involved in its localization to the perinuclear region. In the present study, we demonstrate that AnxA2 binds in a Ca(2+)-dependent manner to the internal ribosomal entry site (IRES) containing two pseudo-knots in the 5 UTR of the c-myc mRNA. Here, we employ an in vitro rabbit reticulocyte lysate system with chimeric c-myc reporter mRNAs to demonstrate that binding of AnxA2 to the c-myc IRES modulates the expression of c-Myc. Notably, we show that low levels of AnxA2 appear to increase, while high levels of AnxA2 inhibits translation of the chimeric mRNA. However, when both the AnxA2-binding site and the ribosomal docking site in the c-myc IRES are deleted, AnxA2 has no effect on the translation of the reporter mRNA. Forskolin-treatment of PC12 cells results in upregulation of Ser25 phosphorylated AnxA2 expression while c-Myc expression is down-regulated. The effect of forskolin on c-Myc expression and the level of Ser25 phosphorylated AnxA2 was abolished in the presence of EGTA. These findings indicate that AnxA2 regulates both the transport and subsequent translation of the c-myc mRNA, possibly by silencing the mRNA during its transport. 
They also suggest that AnxA2 act as a switch to turn off the c-myc IRES activity in the presence of calcium.Abbreviations: AnxA2, Annexin A2; beta2--microglob, beta2-microglobulin; cpm, counts per minute; hnRNP, heterogenous nuclear ribonucleoprotein; IRES, internal ribosomal entry site; ITAF, IRES trans-acting factor; MM, multiple myeloma; PABP, poly(A)-binding protein; PCBP, poly(rC) binding protein; PSF, PTB-associated splicing factor; PTB, polypyrimidine tract binding protein; RRL, rabbit reticulocyte lysate; UTR, untranslated region; YB, Y-box binding protein.

title: Targeting Epigenetic and Posttranscriptional Gene Regulation by PSF Impairs Hormone Therapy-Refractory Cancer Growth.
authors: ['Takayama KI', 'Honma T', 'Suzuki T', 'Kondoh Y', 'Osada H', 'Suzuki Y', 'Yoshida M', 'Inoue S']
source: Cancer Res. 2021 Jul 1;81(13):3495-3508. doi: 10.1158/0008-5472.CAN-20-3819. Epub 2021 May 11.
abstract RNA-binding protein PSF functions as an epigenetic modifier by interacting with long noncoding RNAs and the corepressor complex. PSF also promotes RNA splicing events to enhance oncogenic signals. In this study, we conducted an in vitro chemical array screen and identified multiple small molecules that interact with PSF. Several molecules inhibited RNA binding by PSF and decreased prostate cancer cell viability. Among these molecules and its derivatives was a promising molecule, No. 10-3 [7,8-dihydroxy-4-(4-methoxyphenyl)chromen-2-one], that was the most effective at blocking PSF RNA-binding ability and suppressing treatment-resistant prostate and breast cancer cell proliferation. Exposure to No. 10-3 inhibited PSF target gene expression at the mRNA level. Treatment with No. 10-3 reversed epigenetically repressed PSF downstream targets, such as cell-cycle inhibitors, at the transcriptional level. Chromatin immunoprecipitation sequencing in prostate cancer cells revealed that No. 10-3 enhances histone acetylation to induce expression of apoptosis as well as cell-cycle inhibitors. Furthermore, No. 10-3 exhibited antitumor efficacy in a hormone therapy-resistant prostate cancer xenograft mouse model, suppressing treatment-resistant tumor growth. Taken together, this study highlights the feasibility of targeting PSF-mediated epigenetic and RNA-splicing activities for the treatment of aggressive cancers. SIGNIFICANCE: This study identifies small molecules that target PSF-RNA interactions and suppress hormone therapy-refractory cancer growth, suggesting the potential of targeting PSF-mediated gene regulation for cancer treatment.

title: PSF Promotes ER-Positive Breast Cancer Progression via Posttranscriptional Regulation of ESR1 and SCFD2.
authors: ['Mitobe Y', 'Iino K', 'Takayama KI', 'Ikeda K', 'Suzuki T', 'Aogi K', 'Kawabata H', 'Suzuki Y', 'Horie-Inoue K', 'Inoue S']
source: Cancer Res. 2020 Jun 1;80(11):2230-2242. doi: 10.1158/0008-5472.CAN-19-3095. Epub 2020 Mar 25.
abstract Endocrine therapy is standard treatment for estrogen receptor (ER)-positive breast cancer, yet long-term treatment often causes acquired resistance, which results in recurrence and metastasis. Recent studies have revealed that RNA-binding proteins (RBP) are involved in tumorigenesis. Here, we demonstrate that PSF/SFPQ is an RBP that potentially predicts poor prognosis of patients with ER-positive breast cancer by posttranscriptionally regulating ERalpha (ESR1) mRNA expression. Strong PSF immunoreactivity correlated with shorter overall survival in patients with ER-positive breast cancer. PSF was predominantly expressed in a model of tamoxifen-resistant breast cancer cells, and depletion of PSF attenuated proliferation of cultured cells and xenografted tumors. PSF expression was significantly associated with estrogen signaling. PSF siRNA downregulated ESR1 mRNA by inhibiting nuclear export of the RNA. Integrative analyses of microarray and RNA immunoprecipitation sequencing also identified SCFD2, TRA2B, and ASPM as targets of PSF. Among the PSF targets, SCFD2 was a poor prognostic indicator of breast cancer and SCFD2 knockdown significantly suppressed breast cancer cell proliferation. Collectively, this study shows that PSF plays a pathophysiologic role in ER-positive breast cancer by posttranscriptionally regulating expression of its target genes such as ESR1 and SCFD2. Overall, PSF and SCFD2 could be potential diagnostic and therapeutic targets for primary and hormone-refractory breast cancers. SIGNIFICANCE: This study defines oncogenic roles of RNA-binding protein PSF, which exhibits posttranscriptional regulation in ER-positive breast cancer.
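The same title/authors/source/abstract printing loop recurs in every example, so a small helper can keep it in one place. A sketch, assuming each record is dict-like (the `Medline.parse` records above support `.get`); `summarize_records` is a hypothetical name and the sample dict is made up. `itertools.islice` takes only the first `n` records, so a generator from `Medline.parse` does not need to be turned into a list first.

```python
from itertools import islice


def summarize_records(records, n=3, fields=(("title", "TI"), ("authors", "AU"), ("source", "SO"))):
    """Return printable summary lines for the first n dict-like MEDLINE records."""
    lines = []
    for record in islice(records, n):
        for label, tag in fields:
            lines.append("%s: %s" % (label, record.get(tag, "?")))
        lines.append("")  # blank line between records
    return lines


# Made-up sample record; add ("abstract", "AB") to fields to include abstracts
sample = [{"TI": "A made-up title.", "AU": ["Doe J"]}]
print("\n".join(summarize_records(sample)))
```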

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
# search for the term and store the results on the History server
handle = Entrez.esearch(db="pubmed", term="U2AF AND PSF", retmax=463, usehistory="y")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # a Python list of the PubMed IDs of articles matching "U2AF AND PSF"

len(idlist)

handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text", retmax=len(idlist),
                       webenv=record["WebEnv"], query_key=record["QueryKey"])

records = Medline.parse(handle)

records = list(records)
count = 0
for record in records:
    if count < 3:
        print("title:", record.get("TI", "?"))
        print("authors:", record.get("AU", "?"))
        print("source:", record.get("SO", "?"))
        print("abstract",record.get("AB","?"))
        print("")
        count += 1
    else:
        break

title: Triplex DNA-binding proteins are associated with clinical outcomes revealed by proteomic measurements in patients with colorectal cancer.
authors: ['Nelson LD', 'Bender C', 'Mannsperger H', 'Buergy D', 'Kambakamba P', 'Mudduluru G', 'Korf U', 'Hughes D', 'Van Dyke MW', 'Allgayer H']
source: Mol Cancer. 2012 Jun 8;11:38. doi: 10.1186/1476-4598-11-38.
abstract BACKGROUND: Tri- and tetra-nucleotide repeats in mammalian genomes can induce formation of alternative non-B DNA structures such as triplexes and guanine (G)-quadruplexes. These structures can induce mutagenesis, chromosomal translocations and genomic instability. We wanted to determine if proteins that bind triplex DNA structures are quantitatively or qualitatively different between colorectal tumor and adjacent normal tissue and if this binding activity correlates with patient clinical characteristics. METHODS: Extracts from 63 human colorectal tumor and adjacent normal tissues were examined by gel shifts (EMSA) for triplex DNA-binding proteins, which were correlated with clinicopathological tumor characteristics using the Mann-Whitney U, Spearman's rho, Kaplan-Meier and Mantel-Cox log-rank tests. Biotinylated triplex DNA and streptavidin agarose affinity binding were used to purify triplex-binding proteins in RKO cells. Western blotting and reverse-phase protein array were used to measure protein expression in tissue extracts. RESULTS: Increased triplex DNA-binding activity in tumor extracts correlated significantly with lymphatic disease, metastasis, and reduced overall survival. We identified three multifunctional splicing factors with biotinylated triplex DNA affinity: U2AF65 in cytoplasmic extracts, and PSF and p54nrb in nuclear extracts. Super-shift EMSA with anti-U2AF65 antibodies produced a shifted band of the major EMSA H3 complex, identifying U2AF65 as the protein present in the major EMSA band. U2AF65 expression correlated significantly with EMSA H3 values in all extracts and was higher in extracts from Stage III/IV vs. Stage I/II colon tumors (p=0.024). EMSA H3 values and U2AF65 expression also correlated significantly with GSK3 beta, beta-catenin, and NF- B p65 expression, whereas p54nrb and PSF expression correlated with c-Myc, cyclin D1, and CDK4. 
EMSA values and expression of all three splicing factors correlated with ErbB1, mTOR, PTEN, and Stat5. Western blots confirmed that full-length and truncated beta-catenin expression correlated with U2AF65 expression in tumor extracts. CONCLUSIONS: Increased triplex DNA-binding activity in vitro correlates with lymph node disease, metastasis, and reduced overall survival in colorectal cancer, and increased U2AF65 expression is associated with total and truncated beta-catenin expression in high-stage colorectal tumors.

title: Consensus PP1 binding motifs regulate transcriptional corepression and alternative RNA splicing activities of the steroid receptor coregulators, p54nrb and PSF.
authors: ['Liu L', 'Xie N', 'Rennie P', 'Challis JR', 'Gleave M', 'Lye SJ', 'Dong X']
source: Mol Endocrinol. 2011 Jul;25(7):1197-210. doi: 10.1210/me.2010-0517. Epub 2011 May 12.
abstract Originally identified as essential pre-mRNA splicing factors, non-POU-domain-containing, octamer binding protein (p54nrb) and PTB-associated RNA splicing factor (PSF) are also steroid receptor corepressors. The mechanisms by which p54nrb and PSF regulate gene transcription remain unclear. Both p54nrb and PSF contain protein phosphatase 1 (PP1) consensus binding RVxF motifs, suggesting that PP1 may regulate phosphorylation status of p54nrb and PSF and thus their function in gene transcription. In this report, we demonstrated that PP1 forms a protein complex with both p54nrb and PSF. PP1 interacts directly with the RVxF motif only in p54nrb, but not in PSF. Association with PP1 results in dephosphorylation of both p54nrb and PSF in vivo and the loss of their transcriptional corepressor activities. Using the CD44 minigene as a reporter, we showed that PP1 regulates p54nrb and PSF alternative splicing activities that determine exon skipping vs. inclusion in the final mature RNA for translation. In addition, changes in transcriptional corepression and RNA splicing activities of p54nrb and PSF are correlated with alterations in protein interactions of p54nrb and PSF with transcriptional corepressors such as Sin3A and histone deacetylase 1, and RNA splicing factors such as U1A and U2AF. Furthermore, we demonstrated a novel function of the RVxF motif within PSF that enhances its corepression and RNA splicing activities independent of PP1. We conclude that the RVxF motifs play an important role in controlling the multifunctional properties of p54nrb and PSF in the regulation of gene transcription.

title: Reorganization of nuclear factors during myeloid differentiation.
authors: ['Shav-Tal Y', 'Lee BC', 'Bar-Haim S', 'Schori H', 'Zipori D']
source: J Cell Biochem. 2001;81(3):379-92. doi: 10.1002/1097-4644(20010601)81:3<379::aid-jcb1052>3.0.co;2-8.
abstract Differentiation in several stem cell systems is associated with major morphological changes in global nuclear shape. We studied the fate of inner-nuclear structures, splicing factor-rich foci and Cajal (coiled) bodies in differentiating hemopoietic, testis and skin tissues. Using antibodies to the splicing factors PSF, U2AF(65) and snRNPs we find that these proteins localize in foci throughout the nuclei of immature bone marrow cells. Yet, when granulocytic cells differentiate and their nuclei condense and become segmented, the staining localizes in a unique compact and thread-like structure. The splicing factor-rich foci concentrate in the interior of these nuclei while the nuclear periphery and areas of highly compact chromatin remain devoid of these molecules. Differentiated myeloid cells do not stain for p80 coilin, the marker for Cajal bodies. Immature myeloid cells contain Cajal bodies although these usually do not coloclaize with PSF-rich foci. Following complete inhibition of transcription in myeloid cells, the threaded PSF pattern becomes localized in several foci in the different lobes of mature granulocytes while in human HL-60 immature myeloid leukemia cells PSF is found in the perinucleolar compartment. Studies of other differentiating stem cell systems show that PSF staining disappears completely in differentiated, transcriptionally inactive sperm cells, is scarce as cells migrate from the inner skin layers outward and is lost as cells of the hair follicle mature. We conclude that the formation and distribution of splicing factor-rich foci in the nucleus during differentiation of various cell lineages is dependent on the levels of chromatin condensation and the differentiation status of the cell.

Restricting the search with field

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
# search for the term in titles only and store the results on the History server
handle = Entrez.esearch(db="pubmed", term="PTB AND PSF", retmax=463, usehistory="y", field="title")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # PubMed IDs of articles whose title matches "PTB AND PSF"

len(idlist)

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
# search for the term in abstracts only and store the results on the History server
handle = Entrez.esearch(db="pubmed", term="PTB AND PSF", retmax=463, usehistory="y", field="abstract")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # PubMed IDs of articles whose abstract matches "PTB AND PSF"

len(idlist)

Restricting the search by date

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
# search abstracts for the term, limited to articles published in the last 180 days
handle = Entrez.esearch(db="pubmed", term="PTB AND PSF", retmax=463, usehistory="y",
                        field="abstract", datetype="pdat", reldate="180")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # PubMed IDs of the matching articles

len(idlist)

handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text", retmax=len(idlist),
                       webenv=record["WebEnv"], query_key=record["QueryKey"])

records = Medline.parse(handle)

records = list(records)
count = 0
for record in records:
    if count < 3:
        print("title:", record.get("TI", "?"))
        print("authors:", record.get("AU", "?"))
        print("source:", record.get("SO", "?"))
        print("abstract",record.get("AB","?"))
        print("")
        count += 1
    else:
        break

title: Annexin A2 binds the internal ribosomal entry site of c-myc mRNA and regulates its translation.
authors: ['Strand E', 'Hollas H', 'Sakya SA', 'Romanyuk S', 'Saraste MEV', 'Grindheim AK', 'Patil SS', 'Vedeler A']
source: RNA Biol. 2021 Oct 15;18(sup1):337-354. doi: 10.1080/15476286.2021.1947648. Epub 2021 Aug 4.
abstract The expression and localization of the oncoprotein c-Myc is highly regulated at the level of transcription, mRNA transport, translation, as well as stability of the protein. We previously showed that Annexin A2 (AnxA2) binds to a specific localization element in the 3'untranslated region (UTR) of c-myc mRNA and is involved in its localization to the perinuclear region. In the present study, we demonstrate that AnxA2 binds in a Ca(2+)-dependent manner to the internal ribosomal entry site (IRES) containing two pseudo-knots in the 5 UTR of the c-myc mRNA. Here, we employ an in vitro rabbit reticulocyte lysate system with chimeric c-myc reporter mRNAs to demonstrate that binding of AnxA2 to the c-myc IRES modulates the expression of c-Myc. Notably, we show that low levels of AnxA2 appear to increase, while high levels of AnxA2 inhibits translation of the chimeric mRNA. However, when both the AnxA2-binding site and the ribosomal docking site in the c-myc IRES are deleted, AnxA2 has no effect on the translation of the reporter mRNA. Forskolin-treatment of PC12 cells results in upregulation of Ser25 phosphorylated AnxA2 expression while c-Myc expression is down-regulated. The effect of forskolin on c-Myc expression and the level of Ser25 phosphorylated AnxA2 was abolished in the presence of EGTA. These findings indicate that AnxA2 regulates both the transport and subsequent translation of the c-myc mRNA, possibly by silencing the mRNA during its transport. 
They also suggest that AnxA2 act as a switch to turn off the c-myc IRES activity in the presence of calcium.Abbreviations: AnxA2, Annexin A2; beta2--microglob, beta2-microglobulin; cpm, counts per minute; hnRNP, heterogenous nuclear ribonucleoprotein; IRES, internal ribosomal entry site; ITAF, IRES trans-acting factor; MM, multiple myeloma; PABP, poly(A)-binding protein; PCBP, poly(rC) binding protein; PSF, PTB-associated splicing factor; PTB, polypyrimidine tract binding protein; RRL, rabbit reticulocyte lysate; UTR, untranslated region; YB, Y-box binding protein.

title: Targeting Epigenetic and Posttranscriptional Gene Regulation by PSF Impairs Hormone Therapy-Refractory Cancer Growth.
authors: ['Takayama KI', 'Honma T', 'Suzuki T', 'Kondoh Y', 'Osada H', 'Suzuki Y', 'Yoshida M', 'Inoue S']
source: Cancer Res. 2021 Jul 1;81(13):3495-3508. doi: 10.1158/0008-5472.CAN-20-3819. Epub 2021 May 11.
abstract RNA-binding protein PSF functions as an epigenetic modifier by interacting with long noncoding RNAs and the corepressor complex. PSF also promotes RNA splicing events to enhance oncogenic signals. In this study, we conducted an in vitro chemical array screen and identified multiple small molecules that interact with PSF. Several molecules inhibited RNA binding by PSF and decreased prostate cancer cell viability. Among these molecules and its derivatives was a promising molecule, No. 10-3 [7,8-dihydroxy-4-(4-methoxyphenyl)chromen-2-one], that was the most effective at blocking PSF RNA-binding ability and suppressing treatment-resistant prostate and breast cancer cell proliferation. Exposure to No. 10-3 inhibited PSF target gene expression at the mRNA level. Treatment with No. 10-3 reversed epigenetically repressed PSF downstream targets, such as cell-cycle inhibitors, at the transcriptional level. Chromatin immunoprecipitation sequencing in prostate cancer cells revealed that No. 10-3 enhances histone acetylation to induce expression of apoptosis as well as cell-cycle inhibitors. Furthermore, No. 10-3 exhibited antitumor efficacy in a hormone therapy-resistant prostate cancer xenograft mouse model, suppressing treatment-resistant tumor growth. Taken together, this study highlights the feasibility of targeting PSF-mediated epigenetic and RNA-splicing activities for the treatment of aggressive cancers. SIGNIFICANCE: This study identifies small molecules that target PSF-RNA interactions and suppress hormone therapy-refractory cancer growth, suggesting the potential of targeting PSF-mediated gene regulation for cancer treatment.
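The manual `count` / `break` bookkeeping in the loop above can also be written with `itertools.islice`, which stops after the first N records by itself. It also works directly on the generator returned by `Medline.parse`, so there is no need to call `list(records)` first.

This is a minimal sketch. It assumes each record behaves like a dict keyed by Medline tags such as TI, AU, SO and AB, which holds because `Bio.Medline.Record` subclasses `dict`. The helper name `summarize` and the sample records are hypothetical and only for illustration.

```python
from itertools import islice


def summarize(records, limit=5):
    """Return printable lines for the first `limit` records.

    Hypothetical helper: `records` may be any iterable of dict-like Medline
    records, e.g. the generator returned by Bio.Medline.parse(handle).
    """
    lines = []
    # islice stops after `limit` items, so no manual counter or break is needed
    for record in islice(records, limit):
        lines.append(f"title: {record.get('TI', '?')}")
        lines.append(f"authors: {record.get('AU', '?')}")
        lines.append(f"source: {record.get('SO', '?')}")
        lines.append(f"abstract: {record.get('AB', '?')}")
        lines.append("")  # blank line between records
    return lines


if __name__ == "__main__":
    # Made-up records standing in for parsed Medline entries
    fake_records = [
        {"TI": "Title A", "AU": ["Doe J"], "SO": "Journal A"},
        {"TI": "Title B", "AU": ["Roe R"], "SO": "Journal B", "AB": "Text B"},
    ]
    for line in summarize(fake_records, limit=1):
        print(line)
```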

Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
# Search for the term using the Entrez history server (usehistory="y");
# datetype="pdat" restricts hits to publication dates from mindate to maxdate
handle = Entrez.esearch(db="pubmed", term="PTB AND PSF", retmax=463, usehistory="y", field="abstract",
                        datetype="pdat", mindate="2000/01/01", maxdate="2021/12/22")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # A Python list of the matching PubMed IDs (at most retmax=463 of them)

len(idlist)
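The search above used `usehistory="y"`, so the parsed result also carries `WebEnv` and `QueryKey`. `Entrez.efetch` can use these in place of an explicit ID list. `record["Count"]` gives the total number of hits, which may be larger than the `retmax=463` IDs in `IdList`. Downloading from the history server in batches covers all of them.

This is a minimal sketch of that batch pattern, in the style of the Biopython tutorial. The following are assumptions for illustration, not part of the original post:

- the helper names `batch_ranges` and `fetch_medline_batches`
- the batch size of 100
- passing `efetch` as a parameter, so the logic can be checked without network access

```python
def batch_ranges(count, batch_size=100):
    """Yield (retstart, retmax) pairs that together cover `count` results."""
    for start in range(0, count, batch_size):
        yield start, min(batch_size, count - start)


def fetch_medline_batches(search_record, batch_size=100, efetch=None):
    """Download all hits of a usehistory="y" esearch as MEDLINE text chunks.

    Hypothetical helper. `search_record` is the dict returned by
    Entrez.read(Entrez.esearch(..., usehistory="y")). `efetch` defaults to
    Bio.Entrez.efetch and is a parameter only so it can be swapped out in tests.
    """
    if efetch is None:
        # Requires Biopython; set Entrez.email before calling
        from Bio import Entrez
        efetch = Entrez.efetch

    count = int(search_record["Count"])  # total hits, may exceed retmax
    chunks = []
    for start, size in batch_ranges(count, batch_size):
        handle = efetch(
            db="pubmed",
            rettype="medline",
            retmode="text",
            retstart=start,
            retmax=size,
            webenv=search_record["WebEnv"],       # session saved by usehistory="y"
            query_key=search_record["QueryKey"],  # which query in that session
        )
        try:
            chunks.append(handle.read())
        finally:
            handle.close()
    return chunks
```

For example, `chunks = fetch_medline_batches(record)` returns one MEDLINE text string per batch. Each chunk can then be parsed on its own with `Medline.parse(io.StringIO(chunk))`.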