Processing PDF files with R

pdftools

pdftools is an R package built specifically for working with PDF files.

pdf_text()

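pdf_text() # returns the text of a PDF as a character vector, one element per page
> # an example
> library(pdftools)
> a <- pdf_text("41375_2012_BFleu2012127_MOESM29_ESM.pdf")
> # check the number of pages
> length(a)
[1] 23
> # look at the first page; formulas do not seem to be extracted correctly
> a[1]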

The PDF file
The file as read into R
After that, just extract whatever information you need from the PDF.
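Since pdf_text() gives back one string per page, most extraction work comes down to ordinary string handling. The following is a minimal sketch in base R, assuming a made-up two-page vector named pages in place of a real pdf_text() result; the page contents and variable names are illustrative only.

```r
# Hypothetical stand-in for the result of pdf_text(): one string per page.
pages <- c(
  "  SUPPLEMENTAL METHODS\n\nPatients in training dataset\nSome text.\n",
  "  SUPPLEMENTAL TABLES\n\nTable S1. Example table\n"
)

# Number of pages is simply the length of the vector.
n_pages <- length(pages)

# Split every page into its lines and trim surrounding whitespace.
page_lines <- lapply(strsplit(pages, "\n", fixed = TRUE), trimws)

# Indices of the pages that mention a given keyword.
hits <- grep("Table S1", pages, fixed = TRUE)

print(n_pages)
print(page_lines[[2]])
print(hits)
```

Searching for a keyword first is a convenient way to locate the page that holds a table before looking at it in detail.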

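# Another example: I want to extract the gene symbols on page 10
> b <- pdf_text("41375_2012_BFleu2012127_MOESM29_ESM.pdf")
> # check the number of pages
> length(b)
[1] 23
> # look at the first page; formulas do not seem to be extracted correctly
> b[1]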
[1] "                                      SUPPLEMENTAL METHODS\n\nPatients in training dataset\nThe HOVON-65/GMMG-HD4 randomized clinical trial (ISRCTN64455289) consists of newly diagnosed,\ntransplant-eligible patients with multiple myeloma. Patients were randomly assigned to either bortezomib based\ntreatment or vincristine based treatment. Vincristine based treatment: three cycles of induction treatment with\nvincristine 0.4 mg intravenously on days 1-4, doxorubicin 9 mg/m² intravenously on days 1-4, and dexamethasone\n40 mg orally on days 1-4, 9-12, and 17-20; bortezomib based treatment: bortezomib 1.3 mg/m² intravenously on\ndays 1, 4, 8, and 11, doxorubicin 9 mg/m² intravenously on days 1-4, and dexamethasone 40 mg orally on days 1-4,\n9-12, and 17-20. Stem-cells were mobilized by use of cyclophosphamide 1000 mg/m² intravenously on day 1,\ndoxorubicin 15 mg/m² intravenously on days 1-4, dexamethasone 40 mg orally on days 1-4, and granulocyte colony-\nstimulating factor (filgrastim) 10 μg/kg per day subcutaneously, divided in two doses per day, from day 5 until last\nstem cell collection. A minimum of 2.5 × 106 CD34+ cells per transplantation procedure was required. After\ninduction therapy, patients received one (HOVON-65) or two (GMMG-HD4) cycles of high-dose melphalan (200\nmg/m² intravenously) with autologous stem-cell rescue followed by maintenance treatment with thalidomide (50 mg\nper day orally; group assigned to vincristine-based induction treatment) or bortezomib (1.3 mg/m² intravenously\nonce every 2 weeks; group assigned to bortezomib-based induction treatment) for 2 years. Treatment was not\nmasked for physicians and patients (see Figure S1).\nInformed consent to treatment protocols and sample procurement was obtained for all cases included in this study, in\naccordance with the Declaration of Helsinki. 
Use of diagnostic tumour material was approved by the institutional\nreview board of the Erasmus Medical Centre.\n\nPatients in validation datasets\nUAMS-TT2 is a randomized trial in which patients received thalidomide during all treatment phases (UAMS-TT2;\nn=351; GSE2658; NCT00573391).1 UAMS-TT3 is a similar regimen with the addition of bortezomib to the\nthalidomide arm (UAMS-TT3; n=142; E-TABM-1138; NCT00081939).2 The MRC-IX trial (n=247; GSE15695;\nISRCTN68454111) included both transplant-eligible and non-transplant-eligible newly diagnosed patients. For\ntransplant-eligible patients treatment consisted of induction high-dose therapy while non-transplant-eligible patients\nwere treated initially with either thalidomide or melphalan. Maintenance for both age classes was a comparison of\nthalidomide vs. no thalidomide.3, 4 The trial and dataset denoted here as APEX consisted of the three trials APEX,\nSUMMIT and CREST (n=264; GSE9782; registered under M34100-024, M34100-025 and\nNCT00049478/NCT00048230).5-8 The APEX trial included patients with relapsed myeloma who received either\nbortezomib or high-dose dexamethasone, with the possibility to cross-over to receive bortezomib after disease\nprogression.5 In the SUMMIT trial patients received bortezomib. In patients with a suboptimal response, oral\ndexamethasone was added to the regimen.8 The CREST trial included relapsed or refractory patients who received\nbortezomib. Dexamethasone was permitted in patients with progressive or stable disease. 
7\n\nSurvival signature\nThe MAS5 normalized, log2 transformed and mean-variance scaled HOVON-65/GMMG-HD4 dataset was used as a\ntraining set for building a GEP based survival classifier.9, 10 The model was built using a Supervised Principal\nComponent Analysis (SPCA) framework.11 This technique is widely used in biological settings.12-19 The underlying\nassumption is the existence of a high-risk group which can be separated from a standard-risk group on the basis of\nprogression free survival. A Principal Component Analysis (PCA) is a rotation of a n  m centered feature space\nX in such a way that the largest variance in the data is projected on the top principal components. 20 This rotation\ncan be described by a m  m rotation matrix R pca\n\n                                                    X rot  XR pca\n\nX rot is rotated in such a way that the first principal component (PC) is the axis that points in the direction\nexhibiting the largest variance. Every subsequent PC is perpendicular to all previous while capturing as much as\npossible of the remaining variance. SPCA is a PCA whereby the feature space has undergone a selection X sel . In\nthis study the initial selection is based on selecting the top  probe sets that were ranked by a univariate Cox\nproportional hazard regression. This will result in high variance due to survival so it is likely survival is projected\nonto the top  PC's on which a Cox proportional hazard regression is applied. This yields regression coefficients β.\nThe resulting model can be summarized as:\n\n\n                                                          1\n"
> # extract page 10
> b[10] # a jumbled mess!
[1] "                                      SUPPLEMENTAL TABLES\n\nTable S1. EMC-92 gene signature. Probe sets are ordered by decreasing magnitude of weighting coefficient\n(beta)\n                              Weighting         Symbol\nRank   Probes                                                   GO-term/description1\n                           coefficient (beta)\n  1    202728_s_at             -0.1105          LTBP1           negative regulation of TGFbeta receptor signaling\n  2    239054_at               -0.1088          SFMBT1          regulation of transcription\n  3    208942_s_at             -0.0997          SEC62           cotranslational protein targeting to membrane\n  4    208747_s_at             -0.0874          C1S             proteolysis\n  5    202542_s_at              0.0870          AIMP1           negative regulation of endothelial cell proliferation\n  6    214482_at                0.0861          ZBTB25          transcription\n  7    228416_at               -0.0778          ACVR2A          transmembrane receptor protein serine/threonine kinase signaling\n  8    217728_at                0.0773          S100A6          signal transduction\n  9    215177_s_at             -0.0768          ITGA6           cell-substrate junction assembly\n  10   225601_at                0.0750          HMGB3           multicellular organismal development\n  11   207618_s_at              0.0746          BCS1L           mitochondrion organization\n  12   231989_s_at              0.0730          LOC100271836    ---\n  13   202884_s_at              0.0714          PPP2R1B         control of cell growth and division\n  14   231738_at                0.0686          PCDHB7          calcium-dependent cell-cell adhesion\n  15   238116_at                0.0661          DYNLRB2         microtubule-based movement\n  16   226218_at               -0.0644          IL7R            regulation of DNA recombination\n  17   202842_s_at             -0.0626          DNAJB9          
protein folding\n  18   208732_at               -0.0618          RAB2A           ER to Golgi vesicle-mediated transport\n  19   204379_s_at              0.0594          FGFR3           MAPKKK cascade\n  20   242180_at               -0.0585          TSPAN16         cellular activation and adhesion\n  21   216473_x_at             -0.0576          DUX4            regulation of transcription, DNA-dependent\n  22   209683_at               -0.0561          FAM49A          ---\n  23   219550_at                0.0559          ROBO3           axon guidance\n  24   223811_s_at              0.0556          SUN1 / GET4     cytoskeletal anchoring at nuclear membrane\n  25   202813_at                0.0548          TARBP1          regulation of transcription from RNA polymerase II promoter\n  26   212282_at                0.0530          TMEM97          cholesterol homeostasis\n  27   238780_s_at             -0.0529          EST/ BX647543   ---\n  28   M97935_MA_at2            0.0525          STAT1           transcription from RNA polymerase II promoter\n  29   221041_s_at             -0.0520          SLC17A5         anion transport\n  30   224009_x_at             -0.0520          DHRS9           androgen metabolic process\n  31   214612_x_at              0.0496          MAGEA6          ---\n  32   208232_x_at             -0.0493          ---             ---\n  33   238662_at                0.0490          ATPBD4          ---\n  34   206204_at                0.0477          GRB14           signal transduction\n  35   233437_at                0.0446          GABRA4          transport\n  36   200875_s_at              0.0437          NOP56           rRNA processing\n  37   38158_at                 0.0423          ESPL1           apoptosis\n  38   217548_at               -0.0423          C15orf38        ---\n  39   220351_at                0.0420          CCRL1           chemotaxis\n  40   213002_at               -0.0418          MARCKS          actin filament crosslinking\n  41   
243018_at                0.0407          EST/BE568408    ---\n  42   221755_at                0.0396          EHBP1L1         ---\n  43   208667_s_at             -0.0390          ST13            protein folding\n  44   212055_at                0.0384          C18orf10        cytoskeleton\n  45   201292_at               -0.0372          TOP2A           DNA ligation\n  46   201102_s_at              0.0349          PFKL            fructose 6-phosphate metabolic process\n  47   214150_x_at             -0.0349          ATP6V0E1        proton transport\n  48   226742_at               -0.0345          SAR1B           transport\n  49   215181_at               -0.0342          CDH22           cell adhesion\n  50   208904_s_at             -0.0334          RPS28           rRNA processing\n\n                                                         10\n"
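> # split the page on the line separator "\n"
> library(stringr)  # str_split()
> library(magrittr) # the %>% pipe
> b[10] %>% str_split("\n")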
[[1]]
 [1] "                                      SUPPLEMENTAL TABLES"                                                                       
 [2] ""                                                                                                                                
 [3] "Table S1. EMC-92 gene signature. Probe sets are ordered by decreasing magnitude of weighting coefficient"                        
 [4] "(beta)"                                                                                                                          
 [5] "                              Weighting         Symbol"                                                                          
 [6] "Rank   Probes                                                   GO-term/description1"                                            
 [7] "                           coefficient (beta)"                                                                                   
 [8] "  1    202728_s_at             -0.1105          LTBP1           negative regulation of TGFbeta receptor signaling"               
 [9] "  2    239054_at               -0.1088          SFMBT1          regulation of transcription"                                     
[10] "  3    208942_s_at             -0.0997          SEC62           cotranslational protein targeting to membrane"                   
[11] "  4    208747_s_at             -0.0874          C1S             proteolysis"                                                     
[12] "  5    202542_s_at              0.0870          AIMP1           negative regulation of endothelial cell proliferation"           
[13] "  6    214482_at                0.0861          ZBTB25          transcription"                                                   
[14] "  7    228416_at               -0.0778          ACVR2A          transmembrane receptor protein serine/threonine kinase signaling"
[15] "  8    217728_at                0.0773          S100A6          signal transduction"                                             
[16] "  9    215177_s_at             -0.0768          ITGA6           cell-substrate junction assembly"                                
[17] "  10   225601_at                0.0750          HMGB3           multicellular organismal development"                            
[18] "  11   207618_s_at              0.0746          BCS1L           mitochondrion organization"                                      
[19] "  12   231989_s_at              0.0730          LOC100271836    ---"                                                             
[20] "  13   202884_s_at              0.0714          PPP2R1B         control of cell growth and division"                             
[21] "  14   231738_at                0.0686          PCDHB7          calcium-dependent cell-cell adhesion"                            
[22] "  15   238116_at                0.0661          DYNLRB2         microtubule-based movement"                                      
[23] "  16   226218_at               -0.0644          IL7R            regulation of DNA recombination"                                 
[24] "  17   202842_s_at             -0.0626          DNAJB9          protein folding"                                                 
[25] "  18   208732_at               -0.0618          RAB2A           ER to Golgi vesicle-mediated transport"                          
[26] "  19   204379_s_at              0.0594          FGFR3           MAPKKK cascade"                                                  
[27] "  20   242180_at               -0.0585          TSPAN16         cellular activation and adhesion"                                
[28] "  21   216473_x_at             -0.0576          DUX4            regulation of transcription, DNA-dependent"                      
[29] "  22   209683_at               -0.0561          FAM49A          ---"                                                             
[30] "  23   219550_at                0.0559          ROBO3           axon guidance"                                                   
[31] "  24   223811_s_at              0.0556          SUN1 / GET4     cytoskeletal anchoring at nuclear membrane"                      
[32] "  25   202813_at                0.0548          TARBP1          regulation of transcription from RNA polymerase II promoter"     
[33] "  26   212282_at                0.0530          TMEM97          cholesterol homeostasis"                                         
[34] "  27   238780_s_at             -0.0529          EST/ BX647543   ---"                                                             
[35] "  28   M97935_MA_at2            0.0525          STAT1           transcription from RNA polymerase II promoter"                   
[36] "  29   221041_s_at             -0.0520          SLC17A5         anion transport"                                                 
[37] "  30   224009_x_at             -0.0520          DHRS9           androgen metabolic process"                                      
[38] "  31   214612_x_at              0.0496          MAGEA6          ---"                                                             
[39] "  32   208232_x_at             -0.0493          ---             ---"                                                             
[40] "  33   238662_at                0.0490          ATPBD4          ---"                                                             
[41] "  34   206204_at                0.0477          GRB14           signal transduction"                                             
[42] "  35   233437_at                0.0446          GABRA4          transport"                                                       
[43] "  36   200875_s_at              0.0437          NOP56           rRNA processing"                                                 
[44] "  37   38158_at                 0.0423          ESPL1           apoptosis"                                                       
[45] "  38   217548_at               -0.0423          C15orf38        ---"                                                             
[46] "  39   220351_at                0.0420          CCRL1           chemotaxis"                                                      
[47] "  40   213002_at               -0.0418          MARCKS          actin filament crosslinking"                                     
[48] "  41   243018_at                0.0407          EST/BE568408    ---"                                                             
[49] "  42   221755_at                0.0396          EHBP1L1         ---"                                                             
[50] "  43   208667_s_at             -0.0390          ST13            protein folding"                                                 
[51] "  44   212055_at                0.0384          C18orf10        cytoskeleton"                                                    
[52] "  45   201292_at               -0.0372          TOP2A           DNA ligation"                                                    
[53] "  46   201102_s_at              0.0349          PFKL            fructose 6-phosphate metabolic process"                          
[54] "  47   214150_x_at             -0.0349          ATP6V0E1        proton transport"                                                
[55] "  48   226742_at               -0.0345          SAR1B           transport"                                                       
[56] "  49   215181_at               -0.0342          CDH22           cell adhesion"                                                   
[57] "  50   208904_s_at             -0.0334          RPS28           rRNA processing"                                                 
[58] ""                                                                                                                                
[59] "                                                         10"                                                                     
[60] ""                                                                                                                                
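
Once the page is split into lines, the table rows can also be picked out by pattern rather than by position. The sketch below uses base R only and a few lines copied from the output above; it assumes that columns are separated by at least two spaces and that no cell contains a double space, which holds for these rows but may not hold for other PDFs. The names lines, rows, fields and tab are illustrative.

```r
# A few lines copied from the split page (header, data rows, blank, page number).
lines <- c(
  "Rank   Probes                                                   GO-term/description1",
  "  1    202728_s_at             -0.1105          LTBP1           negative regulation of TGFbeta receptor signaling",
  "  24   223811_s_at              0.0556          SUN1 / GET4     cytoskeletal anchoring at nuclear membrane",
  "  32   208232_x_at             -0.0493          ---             ---",
  "",
  "                                                         10"
)

# Keep only lines that start with a rank number followed by more text.
# The bare page number has nothing after it, so it is dropped as well.
rows <- grep("^\\s*[0-9]+\\s+\\S", lines, value = TRUE)

# Assumption: columns are separated by runs of two or more spaces.
fields <- strsplit(trimws(rows), "\\s{2,}")

# Small helper: take the i-th field of every row.
col <- function(i) vapply(fields, `[`, character(1), i)

tab <- data.frame(
  rank   = as.integer(col(1)),
  probe  = col(2),
  beta   = as.numeric(col(3)),
  symbol = col(4),
  stringsAsFactors = FALSE
)

print(tab)
```

Splitting on runs of spaces keeps multi-word cells such as "SUN1 / GET4" in one piece, which a split on every single space would break apart.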

> # tidier now; next drop the 7 header lines and the 3 trailing lines (blank lines and the page number)
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)]
 [1] "  1    202728_s_at             -0.1105          LTBP1           negative regulation of TGFbeta receptor signaling"               
 [2] "  2    239054_at               -0.1088          SFMBT1          regulation of transcription"                                     
 [3] "  3    208942_s_at             -0.0997          SEC62           cotranslational protein targeting to membrane"                   
 [4] "  4    208747_s_at             -0.0874          C1S             proteolysis"                                                     
 [5] "  5    202542_s_at              0.0870          AIMP1           negative regulation of endothelial cell proliferation"           
 [6] "  6    214482_at                0.0861          ZBTB25          transcription"                                                   
 [7] "  7    228416_at               -0.0778          ACVR2A          transmembrane receptor protein serine/threonine kinase signaling"
 [8] "  8    217728_at                0.0773          S100A6          signal transduction"                                             
 [9] "  9    215177_s_at             -0.0768          ITGA6           cell-substrate junction assembly"                                
[10] "  10   225601_at                0.0750          HMGB3           multicellular organismal development"                            
[11] "  11   207618_s_at              0.0746          BCS1L           mitochondrion organization"                                      
[12] "  12   231989_s_at              0.0730          LOC100271836    ---"                                                             
[13] "  13   202884_s_at              0.0714          PPP2R1B         control of cell growth and division"                             
[14] "  14   231738_at                0.0686          PCDHB7          calcium-dependent cell-cell adhesion"                            
[15] "  15   238116_at                0.0661          DYNLRB2         microtubule-based movement"                                      
[16] "  16   226218_at               -0.0644          IL7R            regulation of DNA recombination"                                 
[17] "  17   202842_s_at             -0.0626          DNAJB9          protein folding"                                                 
[18] "  18   208732_at               -0.0618          RAB2A           ER to Golgi vesicle-mediated transport"                          
[19] "  19   204379_s_at              0.0594          FGFR3           MAPKKK cascade"                                                  
[20] "  20   242180_at               -0.0585          TSPAN16         cellular activation and adhesion"                                
[21] "  21   216473_x_at             -0.0576          DUX4            regulation of transcription, DNA-dependent"                      
[22] "  22   209683_at               -0.0561          FAM49A          ---"                                                             
[23] "  23   219550_at                0.0559          ROBO3           axon guidance"                                                   
[24] "  24   223811_s_at              0.0556          SUN1 / GET4     cytoskeletal anchoring at nuclear membrane"                      
[25] "  25   202813_at                0.0548          TARBP1          regulation of transcription from RNA polymerase II promoter"     
[26] "  26   212282_at                0.0530          TMEM97          cholesterol homeostasis"                                         
[27] "  27   238780_s_at             -0.0529          EST/ BX647543   ---"                                                             
[28] "  28   M97935_MA_at2            0.0525          STAT1           transcription from RNA polymerase II promoter"                   
[29] "  29   221041_s_at             -0.0520          SLC17A5         anion transport"                                                 
[30] "  30   224009_x_at             -0.0520          DHRS9           androgen metabolic process"                                      
[31] "  31   214612_x_at              0.0496          MAGEA6          ---"                                                             
[32] "  32   208232_x_at             -0.0493          ---             ---"                                                             
[33] "  33   238662_at                0.0490          ATPBD4          ---"                                                             
[34] "  34   206204_at                0.0477          GRB14           signal transduction"                                             
[35] "  35   233437_at                0.0446          GABRA4          transport"                                                       
[36] "  36   200875_s_at              0.0437          NOP56           rRNA processing"                                                 
[37] "  37   38158_at                 0.0423          ESPL1           apoptosis"                                                       
[38] "  38   217548_at               -0.0423          C15orf38        ---"                                                             
[39] "  39   220351_at                0.0420          CCRL1           chemotaxis"                                                      
[40] "  40   213002_at               -0.0418          MARCKS          actin filament crosslinking"                                     
[41] "  41   243018_at                0.0407          EST/BE568408    ---"                                                             
[42] "  42   221755_at                0.0396          EHBP1L1         ---"                                                             
[43] "  43   208667_s_at             -0.0390          ST13            protein folding"                                                 
[44] "  44   212055_at                0.0384          C18orf10        cytoskeleton"                                                    
[45] "  45   201292_at               -0.0372          TOP2A           DNA ligation"                                                    
[46] "  46   201102_s_at              0.0349          PFKL            fructose 6-phosphate metabolic process"                          
[47] "  47   214150_x_at             -0.0349          ATP6V0E1        proton transport"                                                
[48] "  48   226742_at               -0.0345          SAR1B           transport"                                                       
[49] "  49   215181_at               -0.0342          CDH22           cell adhesion"                                                   
[50] "  50   208904_s_at             -0.0334          RPS28           rRNA processing"                                                 
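> # cleaner still; now split each row on spaces (every single space is a split point, so runs of spaces yield empty strings)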
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)] %>% str_split(" ")                                                              
[[1]]
 [1] ""            ""            "1"           ""            ""            ""           
 [7] "202728_s_at" ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] ""            "-0.1105"     ""            ""            ""            ""           
[25] ""            ""            ""            ""            ""            "LTBP1"      
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            ""            "negative"    "regulation" 
[43] "of"          "TGFbeta"     "receptor"    "signaling"  

[[2]]
 [1] ""              ""              "2"             ""              ""             
 [6] ""              "239054_at"     ""              ""              ""             
[11] ""              ""              ""              ""              ""             
[16] ""              ""              ""              ""              ""             
[21] ""              "-0.1088"       ""              ""              ""             
[26] ""              ""              ""              ""              ""             
[31] ""              "SFMBT1"        ""              ""              ""             
[36] ""              ""              ""              ""              ""             
[41] ""              "regulation"    "of"            "transcription"

[[3]]
 [1] ""                ""                "3"               ""               
 [5] ""                ""                "208942_s_at"     ""               
 [9] ""                ""                ""                ""               
[13] ""                ""                ""                ""               
[17] ""                ""                ""                "-0.0997"        
[21] ""                ""                ""                ""               
[25] ""                ""                ""                ""               
[29] ""                "SEC62"           ""                ""               
[33] ""                ""                ""                ""               
[37] ""                ""                ""                ""               
[41] "cotranslational" "protein"         "targeting"       "to"             
[45] "membrane"       

[[4]]
 [1] ""            ""            "4"           ""            ""            ""           
 [7] "208747_s_at" ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] ""            "-0.0874"     ""            ""            ""            ""           
[25] ""            ""            ""            ""            ""            "C1S"        
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            ""            ""            ""           
[43] "proteolysis"

[[5]]
 [1] ""              ""              "5"             ""              ""             
 [6] ""              "202542_s_at"   ""              ""              ""             
[11] ""              ""              ""              ""              ""             
[16] ""              ""              ""              ""              ""             
[21] "0.0870"        ""              ""              ""              ""             
[26] ""              ""              ""              ""              ""             
[31] "AIMP1"         ""              ""              ""              ""             
[36] ""              ""              ""              ""              ""             
[41] ""              "negative"      "regulation"    "of"            "endothelial"  
[46] "cell"          "proliferation"

[[6]]
 [1] ""              ""              "6"             ""              ""             
 [6] ""              "214482_at"     ""              ""              ""             
[11] ""              ""              ""              ""              ""             
[16] ""              ""              ""              ""              ""             
[21] ""              ""              "0.0861"        ""              ""             
[26] ""              ""              ""              ""              ""             
[31] ""              ""              "ZBTB25"        ""              ""             
[36] ""              ""              ""              ""              ""             
[41] ""              ""              "transcription"

[[7]]
 [1] ""                 ""                 "7"                ""                
 [5] ""                 ""                 "228416_at"        ""                
 [9] ""                 ""                 ""                 ""                
[13] ""                 ""                 ""                 ""                
[17] ""                 ""                 ""                 ""                
[21] ""                 "-0.0778"          ""                 ""                
[25] ""                 ""                 ""                 ""                
[29] ""                 ""                 ""                 "ACVR2A"          
[33] ""                 ""                 ""                 ""                
[37] ""                 ""                 ""                 ""                
[41] ""                 "transmembrane"    "receptor"         "protein"         
[45] "serine/threonine" "kinase"           "signaling"       

[[8]]
 [1] ""             ""             "8"            ""             ""            
 [6] ""             "217728_at"    ""             ""             ""            
[11] ""             ""             ""             ""             ""            
[16] ""             ""             ""             ""             ""            
[21] ""             ""             "0.0773"       ""             ""            
[26] ""             ""             ""             ""             ""            
[31] ""             ""             "S100A6"       ""             ""            
[36] ""             ""             ""             ""             ""            
[41] ""             ""             "signal"       "transduction"

[[9]]
 [1] ""               ""               "9"              ""               ""              
 [6] ""               "215177_s_at"    ""               ""               ""              
[11] ""               ""               ""               ""               ""              
[16] ""               ""               ""               ""               "-0.0768"       
[21] ""               ""               ""               ""               ""              
[26] ""               ""               ""               ""               "ITGA6"         
[31] ""               ""               ""               ""               ""              
[36] ""               ""               ""               ""               ""              
[41] "cell-substrate" "junction"       "assembly"      

[[10]]
 [1] ""              ""              "10"            ""              ""             
 [6] "225601_at"     ""              ""              ""              ""             
[11] ""              ""              ""              ""              ""             
[16] ""              ""              ""              ""              ""             
[21] ""              "0.0750"        ""              ""              ""             
[26] ""              ""              ""              ""              ""             
[31] ""              "HMGB3"         ""              ""              ""             
[36] ""              ""              ""              ""              ""             
[41] ""              ""              "multicellular" "organismal"    "development"  

[[11]]
 [1] ""              ""              "11"            ""              ""             
 [6] "207618_s_at"   ""              ""              ""              ""             
[11] ""              ""              ""              ""              ""             
[16] ""              ""              ""              ""              "0.0746"       
[21] ""              ""              ""              ""              ""             
[26] ""              ""              ""              ""              "BCS1L"        
[31] ""              ""              ""              ""              ""             
[36] ""              ""              ""              ""              ""             
[41] "mitochondrion" "organization" 

[[12]]
 [1] ""             ""             "12"           ""             ""            
 [6] "231989_s_at"  ""             ""             ""             ""            
[11] ""             ""             ""             ""             ""            
[16] ""             ""             ""             ""             "0.0730"      
[21] ""             ""             ""             ""             ""            
[26] ""             ""             ""             ""             "LOC100271836"
[31] ""             ""             ""             "---"         

[[13]]
 [1] ""            ""            "13"          ""            ""            "202884_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] ""            "0.0714"      ""            ""            ""            ""           
[25] ""            ""            ""            ""            ""            "PPP2R1B"    
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            "control"     "of"          "cell"        "growth"     
[43] "and"         "division"   

[[14]]
 [1] ""                  ""                  "14"                ""                 
 [5] ""                  "231738_at"         ""                  ""                 
 [9] ""                  ""                  ""                  ""                 
[13] ""                  ""                  ""                  ""                 
[17] ""                  ""                  ""                  ""                 
[21] ""                  "0.0686"            ""                  ""                 
[25] ""                  ""                  ""                  ""                 
[29] ""                  ""                  ""                  "PCDHB7"           
[33] ""                  ""                  ""                  ""                 
[37] ""                  ""                  ""                  ""                 
[41] ""                  "calcium-dependent" "cell-cell"         "adhesion"         

[[15]]
 [1] ""                  ""                  "15"                ""                 
 [5] ""                  "238116_at"         ""                  ""                 
 [9] ""                  ""                  ""                  ""                 
[13] ""                  ""                  ""                  ""                 
[17] ""                  ""                  ""                  ""                 
[21] ""                  "0.0661"            ""                  ""                 
[25] ""                  ""                  ""                  ""                 
[29] ""                  ""                  ""                  "DYNLRB2"          
[33] ""                  ""                  ""                  ""                 
[37] ""                  ""                  ""                  ""                 
[41] "microtubule-based" "movement"         

[[16]]
 [1] ""              ""              "16"            ""              ""             
 [6] "226218_at"     ""              ""              ""              ""             
[11] ""              ""              ""              ""              ""             
[16] ""              ""              ""              ""              ""             
[21] "-0.0644"       ""              ""              ""              ""             
[26] ""              ""              ""              ""              ""             
[31] "IL7R"          ""              ""              ""              ""             
[36] ""              ""              ""              ""              ""             
[41] ""              ""              "regulation"    "of"            "DNA"          
[46] "recombination"

[[17]]
 [1] ""            ""            "17"          ""            ""            "202842_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] "-0.0626"     ""            ""            ""            ""            ""           
[25] ""            ""            ""            ""            "DNAJB9"      ""           
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            "protein"     "folding"    

[[18]]
 [1] ""                 ""                 "18"               ""                
 [5] ""                 "208732_at"        ""                 ""                
 [9] ""                 ""                 ""                 ""                
[13] ""                 ""                 ""                 ""                
[17] ""                 ""                 ""                 ""                
[21] "-0.0618"          ""                 ""                 ""                
[25] ""                 ""                 ""                 ""                
[29] ""                 ""                 "RAB2A"            ""                
[33] ""                 ""                 ""                 ""                
[37] ""                 ""                 ""                 ""                
[41] ""                 "ER"               "to"               "Golgi"           
[45] "vesicle-mediated" "transport"       

[[19]]
 [1] ""            ""            "19"          ""            ""            "204379_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] ""            "0.0594"      ""            ""            ""            ""           
[25] ""            ""            ""            ""            ""            "FGFR3"      
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            ""            "MAPKKK"      "cascade"    

[[20]]
 [1] ""           ""           "20"         ""           ""           "242180_at" 
 [7] ""           ""           ""           ""           ""           ""          
[13] ""           ""           ""           ""           ""           ""          
[19] ""           ""           "-0.0585"    ""           ""           ""          
[25] ""           ""           ""           ""           ""           ""          
[31] "TSPAN16"    ""           ""           ""           ""           ""          
[37] ""           ""           ""           "cellular"   "activation" "and"       
[43] "adhesion"  

[[21]]
 [1] ""               ""               "21"             ""               ""              
 [6] "216473_x_at"    ""               ""               ""               ""              
[11] ""               ""               ""               ""               ""              
[16] ""               ""               ""               "-0.0576"        ""              
[21] ""               ""               ""               ""               ""              
[26] ""               ""               ""               "DUX4"           ""              
[31] ""               ""               ""               ""               ""              
[36] ""               ""               ""               ""               ""              
[41] "regulation"     "of"             "transcription," "DNA-dependent" 

[[22]]
 [1] ""          ""          "22"        ""          ""          "209683_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          "-0.0561"  
[22] ""          ""          ""          ""          ""          ""          ""         
[29] ""          ""          "FAM49A"    ""          ""          ""          ""         
[36] ""          ""          ""          ""          ""          "---"      

[[23]]
 [1] ""          ""          "23"        ""          ""          "219550_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          ""         
[22] "0.0559"    ""          ""          ""          ""          ""          ""         
[29] ""          ""          ""          "ROBO3"     ""          ""          ""         
[36] ""          ""          ""          ""          ""          ""          ""         
[43] "axon"      "guidance" 

[[24]]
 [1] ""             ""             "24"           ""             ""            
 [6] "223811_s_at"  ""             ""             ""             ""            
[11] ""             ""             ""             ""             ""            
[16] ""             ""             ""             ""             "0.0556"      
[21] ""             ""             ""             ""             ""            
[26] ""             ""             ""             ""             "SUN1"        
[31] "/"            "GET4"         ""             ""             ""            
[36] ""             "cytoskeletal" "anchoring"    "at"           "nuclear"     
[41] "membrane"    

[[25]]
 [1] ""              ""              "25"            ""              ""             
 [6] "202813_at"     ""              ""              ""              ""             
[11] ""              ""              ""              ""              ""             
[16] ""              ""              ""              ""              ""             
[21] ""              "0.0548"        ""              ""              ""             
[26] ""              ""              ""              ""              ""             
[31] ""              "TARBP1"        ""              ""              ""             
[36] ""              ""              ""              ""              ""             
[41] ""              "regulation"    "of"            "transcription" "from"         
[46] "RNA"           "polymerase"    "II"            "promoter"     

[[26]]
 [1] ""            ""            "26"          ""            ""            "212282_at"  
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] ""            ""            ""            "0.0530"      ""            ""           
[25] ""            ""            ""            ""            ""            ""           
[31] ""            "TMEM97"      ""            ""            ""            ""           
[37] ""            ""            ""            ""            ""            "cholesterol"
[43] "homeostasis"

[[27]]
 [1] ""            ""            "27"          ""            ""            "238780_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] "-0.0529"     ""            ""            ""            ""            ""           
[25] ""            ""            ""            ""            "EST/"        "BX647543"   
[31] ""            ""            "---"        

[[28]]
 [1] ""              ""              "28"            ""              ""             
 [6] "M97935_MA_at2" ""              ""              ""              ""             
[11] ""              ""              ""              ""              ""             
[16] ""              ""              "0.0525"        ""              ""             
[21] ""              ""              ""              ""              ""             
[26] ""              ""              "STAT1"         ""              ""             
[31] ""              ""              ""              ""              ""             
[36] ""              ""              ""              "transcription" "from"         
[41] "RNA"           "polymerase"    "II"            "promoter"     

[[29]]
 [1] ""            ""            "29"          ""            ""            "221041_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] "-0.0520"     ""            ""            ""            ""            ""           
[25] ""            ""            ""            ""            "SLC17A5"     ""           
[31] ""            ""            ""            ""            ""            ""           
[37] ""            "anion"       "transport"  

[[30]]
 [1] ""            ""            "30"          ""            ""            "224009_x_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] "-0.0520"     ""            ""            ""            ""            ""           
[25] ""            ""            ""            ""            "DHRS9"       ""           
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            "androgen"    "metabolic"   "process"    

[[31]]
 [1] ""            ""            "31"          ""            ""            "214612_x_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] ""            "0.0496"      ""            ""            ""            ""           
[25] ""            ""            ""            ""            ""            "MAGEA6"     
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            "---"        

[[32]]
 [1] ""            ""            "32"          ""            ""            "208232_x_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] "-0.0493"     ""            ""            ""            ""            ""           
[25] ""            ""            ""            ""            "---"         ""           
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            ""            ""            "---"        

[[33]]
 [1] ""          ""          "33"        ""          ""          "238662_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          ""         
[22] "0.0490"    ""          ""          ""          ""          ""          ""         
[29] ""          ""          ""          "ATPBD4"    ""          ""          ""         
[36] ""          ""          ""          ""          ""          ""          "---"      

[[34]]
 [1] ""             ""             "34"           ""             ""            
 [6] "206204_at"    ""             ""             ""             ""            
[11] ""             ""             ""             ""             ""            
[16] ""             ""             ""             ""             ""            
[21] ""             "0.0477"       ""             ""             ""            
[26] ""             ""             ""             ""             ""            
[31] ""             "GRB14"        ""             ""             ""            
[36] ""             ""             ""             ""             ""            
[41] ""             ""             "signal"       "transduction"

[[35]]
 [1] ""          ""          "35"        ""          ""          "233437_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          ""         
[22] "0.0446"    ""          ""          ""          ""          ""          ""         
[29] ""          ""          ""          "GABRA4"    ""          ""          ""         
[36] ""          ""          ""          ""          ""          ""          "transport"

[[36]]
 [1] ""            ""            "36"          ""            ""            "200875_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] ""            "0.0437"      ""            ""            ""            ""           
[25] ""            ""            ""            ""            ""            "NOP56"      
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            ""            "rRNA"        "processing" 

[[37]]
 [1] ""          ""          "37"        ""          ""          "38158_at"  ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          ""         
[22] ""          "0.0423"    ""          ""          ""          ""          ""         
[29] ""          ""          ""          ""          "ESPL1"     ""          ""         
[36] ""          ""          ""          ""          ""          ""          ""         
[43] ""          "apoptosis"

[[38]]
 [1] ""          ""          "38"        ""          ""          "217548_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          "-0.0423"  
[22] ""          ""          ""          ""          ""          ""          ""         
[29] ""          ""          "C15orf38"  ""          ""          ""          ""         
[36] ""          ""          ""          "---"      

[[39]]
 [1] ""           ""           "39"         ""           ""           "220351_at" 
 [7] ""           ""           ""           ""           ""           ""          
[13] ""           ""           ""           ""           ""           ""          
[19] ""           ""           ""           "0.0420"     ""           ""          
[25] ""           ""           ""           ""           ""           ""          
[31] ""           "CCRL1"      ""           ""           ""           ""          
[37] ""           ""           ""           ""           ""           ""          
[43] "chemotaxis"

[[40]]
 [1] ""             ""             "40"           ""             ""            
 [6] "213002_at"    ""             ""             ""             ""            
[11] ""             ""             ""             ""             ""            
[16] ""             ""             ""             ""             ""            
[21] "-0.0418"      ""             ""             ""             ""            
[26] ""             ""             ""             ""             ""            
[31] "MARCKS"       ""             ""             ""             ""            
[36] ""             ""             ""             ""             ""            
[41] "actin"        "filament"     "crosslinking"

[[41]]
 [1] ""             ""             "41"           ""             ""            
 [6] "243018_at"    ""             ""             ""             ""            
[11] ""             ""             ""             ""             ""            
[16] ""             ""             ""             ""             ""            
[21] ""             "0.0407"       ""             ""             ""            
[26] ""             ""             ""             ""             ""            
[31] ""             "EST/BE568408" ""             ""             ""            
[36] "---"         

[[42]]
 [1] ""          ""          "42"        ""          ""          "221755_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          ""         
[22] "0.0396"    ""          ""          ""          ""          ""          ""         
[29] ""          ""          ""          "EHBP1L1"   ""          ""          ""         
[36] ""          ""          ""          ""          ""          "---"      

[[43]]
 [1] ""            ""            "43"          ""            ""            "208667_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] "-0.0390"     ""            ""            ""            ""            ""           
[25] ""            ""            ""            ""            "ST13"        ""           
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            ""            "protein"     "folding"    

[[44]]
 [1] ""             ""             "44"           ""             ""            
 [6] "212055_at"    ""             ""             ""             ""            
[11] ""             ""             ""             ""             ""            
[16] ""             ""             ""             ""             ""            
[21] ""             "0.0384"       ""             ""             ""            
[26] ""             ""             ""             ""             ""            
[31] ""             "C18orf10"     ""             ""             ""            
[36] ""             ""             ""             ""             "cytoskeleton"

[[45]]
 [1] ""          ""          "45"        ""          ""          "201292_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          "-0.0372"  
[22] ""          ""          ""          ""          ""          ""          ""         
[29] ""          ""          "TOP2A"     ""          ""          ""          ""         
[36] ""          ""          ""          ""          ""          ""          "DNA"      
[43] "ligation" 

[[46]]
 [1] ""            ""            "46"          ""            ""            "201102_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] ""            "0.0349"      ""            ""            ""            ""           
[25] ""            ""            ""            ""            ""            "PFKL"       
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            ""            ""            "fructose"   
[43] "6-phosphate" "metabolic"   "process"    

[[47]]
 [1] ""            ""            "47"          ""            ""            "214150_x_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] "-0.0349"     ""            ""            ""            ""            ""           
[25] ""            ""            ""            ""            "ATP6V0E1"    ""           
[31] ""            ""            ""            ""            ""            ""           
[37] "proton"      "transport"  

[[48]]
 [1] ""          ""          "48"        ""          ""          "226742_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          "-0.0345"  
[22] ""          ""          ""          ""          ""          ""          ""         
[29] ""          ""          "SAR1B"     ""          ""          ""          ""         
[36] ""          ""          ""          ""          ""          ""          "transport"

[[49]]
 [1] ""          ""          "49"        ""          ""          "215181_at" ""         
 [8] ""          ""          ""          ""          ""          ""          ""         
[15] ""          ""          ""          ""          ""          ""          "-0.0342"  
[22] ""          ""          ""          ""          ""          ""          ""         
[29] ""          ""          "CDH22"     ""          ""          ""          ""         
[36] ""          ""          ""          ""          ""          ""          "cell"     
[43] "adhesion" 

[[50]]
 [1] ""            ""            "50"          ""            ""            "208904_s_at"
 [7] ""            ""            ""            ""            ""            ""           
[13] ""            ""            ""            ""            ""            ""           
[19] "-0.0334"     ""            ""            ""            ""            ""           
[25] ""            ""            ""            ""            "RPS28"       ""           
[31] ""            ""            ""            ""            ""            ""           
[37] ""            ""            ""            "rRNA"        "processing" 

> # Just drop the "" entries and pick out the gene symbol; a for loop or lapply() both work, the idea is the same
> # gene <- character()
> # for (i in 1:50){
> #   gene_name <- b2 %>% .[[i]] %>% .[.!=""] %>% .[4] # the gene symbol is the fourth non-empty field; adjust this for other data
> #   gene <- c(gene, gene_name)
> # }
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)] %>% str_split(" ") %>% lapply(\(x){x[x!=""]%>%.[4]})
[[1]]
[1] "LTBP1"

[[2]]
[1] "SFMBT1"

[[3]]
[1] "SEC62"

[[4]]
[1] "C1S"

[[5]]
[1] "AIMP1"

[[6]]
[1] "ZBTB25"

[[7]]
[1] "ACVR2A"

[[8]]
[1] "S100A6"

[[9]]
[1] "ITGA6"

[[10]]
[1] "HMGB3"

[[11]]
[1] "BCS1L"

[[12]]
[1] "LOC100271836"

[[13]]
[1] "PPP2R1B"

[[14]]
[1] "PCDHB7"

[[15]]
[1] "DYNLRB2"

[[16]]
[1] "IL7R"

[[17]]
[1] "DNAJB9"

[[18]]
[1] "RAB2A"

[[19]]
[1] "FGFR3"

[[20]]
[1] "TSPAN16"

[[21]]
[1] "DUX4"

[[22]]
[1] "FAM49A"

[[23]]
[1] "ROBO3"

[[24]]
[1] "SUN1"

[[25]]
[1] "TARBP1"

[[26]]
[1] "TMEM97"

[[27]]
[1] "EST/"

[[28]]
[1] "STAT1"

[[29]]
[1] "SLC17A5"

[[30]]
[1] "DHRS9"

[[31]]
[1] "MAGEA6"

[[32]]
[1] "---"

[[33]]
[1] "ATPBD4"

[[34]]
[1] "GRB14"

[[35]]
[1] "GABRA4"

[[36]]
[1] "NOP56"

[[37]]
[1] "ESPL1"

[[38]]
[1] "C15orf38"

[[39]]
[1] "CCRL1"

[[40]]
[1] "MARCKS"

[[41]]
[1] "EST/BE568408"

[[42]]
[1] "EHBP1L1"

[[43]]
[1] "ST13"

[[44]]
[1] "C18orf10"

[[45]]
[1] "TOP2A"

[[46]]
[1] "PFKL"

[[47]]
[1] "ATP6V0E1"

[[48]]
[1] "SAR1B"

[[49]]
[1] "CDH22"

[[50]]
[1] "RPS28"


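Like this. Note that taking the fourth field truncates symbols that contain a space in the PDF: row 24 ("SUN1 / GET4") comes back as "SUN1" and row 27 ("EST/ BX647543") as "EST/".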
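The same "split the line, drop the blanks, take the fourth field" idea can also be sketched in base R, splitting on runs of whitespace so that no empty strings appear in the first place. This is only an illustration: `page_lines` is a made-up stand-in for three table lines of `b[10]` (the spacing is invented), and `fourth_token()` is a hypothetical helper name, not something from pdftools or stringr.

```r
# Hypothetical stand-in for a few table lines of b[10]; the spacing is invented.
page_lines <- c(
  "  7     228416_at          -0.0778        ACVR2A        transmembrane receptor protein",
  "  8     217728_at           0.0773        S100A6        signal transduction",
  " 24     223811_s_at         0.0556        SUN1 / GET4   cytoskeletal anchoring"
)

# Split on runs of whitespace, then take the fourth token
# (row number, probe ID, numeric value, gene symbol).
fourth_token <- function(line) {
  tokens <- strsplit(trimws(line), "\\s+")[[1]]
  if (length(tokens) >= 4) tokens[4] else NA_character_
}

# vapply() returns a plain character vector instead of a list.
genes <- vapply(page_lines, fourth_token, character(1), USE.NAMES = FALSE)
print(genes)
```

Returning a character vector rather than a list makes the result easier to pass on to other functions. Symbols that contain a space are still cut at the first space, so those rows need to be checked by hand.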

pdf_data()

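pdf_data() returns each page of the PDF as a data frame, with one row per word (text box) on the page, including its position.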
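Because pdf_data() keeps word positions, table rows can in principle be rebuilt by grouping words that share the same `y` coordinate. The sketch below does not read a real file: `words` is a hypothetical data frame that imitates only the `x`, `y` and `text` columns of one page's result, with invented coordinates. With the real file, it would be replaced by something like `pdftools::pdf_data("41375_2012_BFleu2012127_MOESM29_ESM.pdf")[[10]]`.

```r
# Hypothetical stand-in for one page of pdf_data() output (coordinates invented).
words <- data.frame(
  x = c(300, 50, 90, 200, 50, 90, 200, 300),
  y = c(100, 100, 100, 100, 112, 112, 112, 112),
  text = c("ACVR2A", "7", "228416_at", "-0.0778",
           "8", "217728_at", "0.0773", "S100A6"),
  stringsAsFactors = FALSE
)

# Sort top to bottom, then left to right, so each row reads in page order.
words <- words[order(words$y, words$x), ]

# Group the words by their vertical position: one group per table row.
rows <- split(words$text, words$y)

# The gene symbol is assumed to be the fourth word in each row.
genes <- vapply(rows, function(r) r[4], character(1), USE.NAMES = FALSE)
print(genes)
```

In a real PDF, words on the same visual line may differ slightly in `y`, so rounding or binning `y` before grouping may be needed.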

pdf_render_page()

pdf_render_page() renders a page into a raw bitmap array for further processing in R.
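A minimal sketch of how this might be used, assuming the pdftools and png packages are installed. `render_page_png()` is a hypothetical wrapper name, and the default `dpi` and output file name are arbitrary choices rather than anything from this post.

```r
# Hypothetical helper: render one page to a bitmap and save it as a PNG file.
# Assumes the pdftools and png packages are installed.
render_page_png <- function(pdf_path, page = 1, dpi = 150,
                            out_path = sprintf("page_%d.png", page)) {
  # pdf_render_page() returns the page as a raw bitmap array.
  bitmap <- pdftools::pdf_render_page(pdf_path, page = page, dpi = dpi)
  # writePNG() writes that bitmap to disk.
  png::writePNG(bitmap, out_path)
  invisible(out_path)
}

# Example call (not run here because it needs the PDF on disk):
# render_page_png("41375_2012_BFleu2012127_MOESM29_ESM.pdf", page = 10)
```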

pdf_convert()

pdf_convert() does high-quality conversion of PDF page(s) to png, jpeg or tiff format.
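For writing pages straight to image files, a call might look like the sketch below. `convert_pages()` is a hypothetical wrapper; the `format`, `pages` and `dpi` arguments are passed on to `pdftools::pdf_convert()`, and the default values here are arbitrary.

```r
# Hypothetical helper: convert selected pages of a PDF to PNG files
# in the working directory. Assumes the pdftools package is installed.
convert_pages <- function(pdf_path, pages = 1, dpi = 150) {
  pdftools::pdf_convert(pdf_path, format = "png", pages = pages, dpi = dpi)
}

# Example call (not run here because it needs the PDF on disk):
# convert_pages("41375_2012_BFleu2012127_MOESM29_ESM.pdf", pages = 10)
```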

I haven't used these functions yet; I'll come back and write them up once I have 😃
