Paper Data Statistics: Task 1

Dataset

Link: dataset
Runtime environment: AI Studio

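Code implementation

Import the required packages
# Import the required packages
import seaborn as sns  # plotting
from bs4 import BeautifulSoup  # parsing the HTML scraped from arXiv
import re  # regular expressions, for matching string patterns
import requests  # sending HTTP requests and fetching pages by URL
import json  # reading the data, which is stored as JSON
import pandas as pd  # data processing and analysis
import matplotlib.pyplot as plt  # plotting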
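Read the data and check its size
# Read the data

data = []  # start with an empty list
# Advantage of the with statement: the file handle is closed automatically, even if an exception is raised while reading
with open("/home/aistudio/data/data67990/arxiv-metadata-oai-2019.json", 'r') as f:
    for line in f:
        data.append(json.loads(line))  # each line holds one JSON record

data = pd.DataFrame(data)  # convert the list to a DataFrame so it can be analyzed with pandas
data.shape  # show the size of the data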
(170618, 14)
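The loop above treats the file as JSON Lines, that is, one JSON object per line. If memory is a concern, one option is to keep only the fields needed later while reading. The following is a minimal sketch of that idea, not the notebook's own code: the helper name `read_json_lines`, its `keep` parameter, and the two sample records are hypothetical.

```python
import io
import json

def read_json_lines(handle, keep=None):
    """Parse one JSON object per line; optionally keep only the keys in `keep`."""
    records = []
    for line in handle:
        line = line.strip()
        if not line:              # skip blank lines
            continue
        record = json.loads(line)
        if keep is not None:
            record = {key: record.get(key) for key in keep}
        records.append(record)
    return records

# Two made-up records standing in for the real file (hypothetical data).
sample = io.StringIO(
    '{"id": "0000.0001", "categories": "cs.CV cs.LG", "title": "A"}\n'
    '{"id": "0000.0002", "categories": "math.AT", "title": "B"}\n'
)
rows = read_json_lines(sample, keep=["id", "categories"])
print(rows)
```

The resulting list of dictionaries can be passed to `pd.DataFrame` in the same way as `data` above.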
Show the first five rows of the data
data.head()  # show the first five rows
| | abstract | authors | authors_parsed | categories | comments | doi | id | journal-ref | license | report-no | submitter | title | update_date | versions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | We systematically explore the evolution of t... | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | astro-ph | 15 pages, 15 figures, 3 tables, submitted to M... | 10.1111/j.1365-2966.2007.12161.x | 0704.0297 | None | None | None | Sung-Chul Yoon | Remnant evolution after a carbon-oxygen white ... | 2019-08-19 | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... |
| 1 | Cofibrations are defined in the category of ... | B. Dugmore and PP. Ntumba | [[Dugmore, B., ], [Ntumba, PP., ]] | math.AT | 27 pages | None | 0704.0342 | None | None | None | Patrice Ntumba Pungu | Cofibrations in the Category of Frolicher Spac... | 2019-08-19 | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... |
| 2 | We explore the effect of an inhomogeneous ma... | T.V. Zaqarashvili and K Murawski | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | astro-ph | 6 pages, 3 figures, accepted in A&A | 10.1051/0004-6361:20077246 | 0704.0360 | None | None | None | Zaqarashvili | Torsional oscillations of longitudinally inhom... | 2019-08-19 | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... |
| 3 | This paper has been removed by arXiv adminis... | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | gr-qc | This submission has been withdrawn by arXiv ad... | 10.1088/0256-307X/24/2/015 | 0704.0525 | Chin.Phys.Lett.24:355-358,2007 | None | None | Sezgin Ayg\"un | On the Energy-Momentum Problem in Static Einst... | 2019-10-21 | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... |
| 4 | The most massive elliptical galaxies show a ... | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | astro-ph | 32 pages (referee format), 9 figures, ApJ acce... | 10.1086/519546 | 0704.0535 | Astrophys.J.665:295-305,2007 | None | None | Antonio Pipino | The Formation of Globular Cluster Systems in M... | 2019-08-19 | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... |
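Data preprocessing

Rough statistics on the paper categories

'''
count: number of elements in the column;
unique: number of distinct values in the column;
top: the most frequent value in the column;
freq: number of times the most frequent value occurs;
'''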

data["categories"].describe()
count     170618
unique     15592
top        cs.CV
freq        5559
Name: categories, dtype: object

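The result shows 170618 records and 15592 distinct values of the categories field. This is only a rough count, because a paper can list several categories and every distinct combination is counted as its own value: a paper tagged cs.AI cs.MM and a paper tagged cs.AI cs.OS count as two different values. The most frequent value is cs.CV, which appears 5559 times.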
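To make the four fields concrete, the same summary can be reproduced by hand on a toy list. This is an illustrative sketch using the standard library rather than pandas; the sample strings are made up, and each one stands for the full categories field of one paper.

```python
from collections import Counter

# Hypothetical category strings, one per paper.
categories = ["cs.CV", "cs.AI cs.MM", "cs.CV", "cs.AI cs.OS", "cs.CV"]

counts = Counter(categories)
top, freq = counts.most_common(1)[0]

summary = {
    "count": len(categories),   # number of entries
    "unique": len(counts),      # number of distinct strings
    "top": top,                 # most frequent string
    "freq": freq,               # how often that string occurs
}
print(summary)
```

Here "cs.AI cs.MM" and "cs.AI cs.OS" are two separate values even though both contain cs.AI, which is why the unique count overstates the number of individual categories.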

List all paper categories
# All individual (distinct) categories
unique_categories = set([i for l in [x.split(' ') for x in data["categories"]] for i in l])
print(len(unique_categories))
unique_categories

The output shows that the dataset contains 172 distinct paper categories:

172
{'acc-phys',
 'adap-org',
 'alg-geom',
 'astro-ph',
 'astro-ph.CO',
 'astro-ph.EP',
 'astro-ph.GA',
 'astro-ph.HE',
 'astro-ph.IM',
 'astro-ph.SR',
 'chao-dyn',
 'chem-ph',
 'cmp-lg',
 'comp-gas',
 'cond-mat',
 'cond-mat.dis-nn',
 'cond-mat.mes-hall',
 'cond-mat.mtrl-sci',
 'cond-mat.other',
 'cond-mat.quant-gas',
 'cond-mat.soft',
 'cond-mat.stat-mech',
 'cond-mat.str-el',
 'cond-mat.supr-con',
 'cs.AI',
 'cs.AR',
 'cs.CC',
 'cs.CE',
 'cs.CG',
 'cs.CL',
 'cs.CR',
 'cs.CV',
 'cs.CY',
 'cs.DB',
 'cs.DC',
 'cs.DL',
 'cs.DM',
 'cs.DS',
 'cs.ET',
 'cs.FL',
 'cs.GL',
 'cs.GR',
 'cs.GT',
 'cs.HC',
 'cs.IR',
 'cs.IT',
 'cs.LG',
 'cs.LO',
 'cs.MA',
 'cs.MM',
 'cs.MS',
 'cs.NA',
 'cs.NE',
 'cs.NI',
 'cs.OH',
 'cs.OS',
 'cs.PF',
 'cs.PL',
 'cs.RO',
 'cs.SC',
 'cs.SD',
 'cs.SE',
 'cs.SI',
 'cs.SY',
 'dg-ga',
 'econ.EM',
 'econ.GN',
 'econ.TH',
 'eess.AS',
 'eess.IV',
 'eess.SP',
 'eess.SY',
 'funct-an',
 'gr-qc',
 'hep-ex',
 'hep-lat',
 'hep-ph',
 'hep-th',
 'math-ph',
 'math.AC',
 'math.AG',
 'math.AP',
 'math.AT',
 'math.CA',
 'math.CO',
 'math.CT',
 'math.CV',
 'math.DG',
 'math.DS',
 'math.FA',
 'math.GM',
 'math.GN',
 'math.GR',
 'math.GT',
 'math.HO',
 'math.IT',
 'math.KT',
 'math.LO',
 'math.MG',
 'math.MP',
 'math.NA',
 'math.NT',
 'math.OA',
 'math.OC',
 'math.PR',
 'math.QA',
 'math.RA',
 'math.RT',
 'math.SG',
 'math.SP',
 'math.ST',
 'mtrl-th',
 'nlin.AO',
 'nlin.CD',
 'nlin.CG',
 'nlin.PS',
 'nlin.SI',
 'nucl-ex',
 'nucl-th',
 'patt-sol',
 'physics.acc-ph',
 'physics.ao-ph',
 'physics.app-ph',
 'physics.atm-clus',
 'physics.atom-ph',
 'physics.bio-ph',
 'physics.chem-ph',
 'physics.class-ph',
 'physics.comp-ph',
 'physics.data-an',
 'physics.ed-ph',
 'physics.flu-dyn',
 'physics.gen-ph',
 'physics.geo-ph',
 'physics.hist-ph',
 'physics.ins-det',
 'physics.med-ph',
 'physics.optics',
 'physics.plasm-ph',
 'physics.pop-ph',
 'physics.soc-ph',
 'physics.space-ph',
 'q-alg',
 'q-bio',
 'q-bio.BM',
 'q-bio.CB',
 'q-bio.GN',
 'q-bio.MN',
 'q-bio.NC',
 'q-bio.OT',
 'q-bio.PE',
 'q-bio.QM',
 'q-bio.SC',
 'q-bio.TO',
 'q-fin.CP',
 'q-fin.EC',
 'q-fin.GN',
 'q-fin.MF',
 'q-fin.PM',
 'q-fin.PR',
 'q-fin.RM',
 'q-fin.ST',
 'q-fin.TR',
 'quant-ph',
 'solv-int',
 'stat.AP',
 'stat.CO',
 'stat.ME',
 'stat.ML',
 'stat.OT',
 'stat.TH',
 'supr-con'}
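Beyond listing the distinct codes, the same splitting step can count how many papers carry each individual category. The sketch below illustrates this on a few made-up strings; it uses `str.split()` without arguments, which also tolerates repeated whitespace, whereas the notebook splits on a single space.

```python
from collections import Counter

# Hypothetical "categories" values, space-separated as in the dataset.
categories = ["cs.CV cs.LG", "cs.LG stat.ML", "math.AT", "cs.CV"]

# Split every string and count each code once for every paper that lists it.
code_counts = Counter(code for entry in categories for code in entry.split())

unique_codes = sorted(code_counts)        # the distinct category codes
print(len(unique_codes), unique_codes)
print(code_counts.most_common(2))         # the two most frequent codes
```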
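Feature processing

The task asks for an analysis of papers from 2019 onward. The year is taken from update_date, the date of the most recent update, so papers first submitted much earlier (for example in 2007) still appear if they were updated in 2019. The time feature is therefore preprocessed first to obtain the papers of all categories from 2019 onward:

data["year"] = pd.to_datetime(data["update_date"]).dt.year  # parse update_date (a str such as 2019-02-20) as datetime and extract the year
del data["update_date"]  # drop update_date, which is no longer needed
data = data[data["year"] >= 2019]  # keep only the rows from 2019 onward
# data.groupby(['categories','year'])  # would group the rows by categories and then by year (groupby groups, it does not sort)
data.reset_index(drop=True, inplace=True)  # renumber the rows
data  # show the result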
| | abstract | authors | authors_parsed | categories | comments | doi | id | journal-ref | license | report-no | submitter | title | versions | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | We systematically explore the evolution of t... | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | astro-ph | 15 pages, 15 figures, 3 tables, submitted to M... | 10.1111/j.1365-2966.2007.12161.x | 0704.0297 | None | None | None | Sung-Chul Yoon | Remnant evolution after a carbon-oxygen white ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019 |
| 1 | Cofibrations are defined in the category of ... | B. Dugmore and PP. Ntumba | [[Dugmore, B., ], [Ntumba, PP., ]] | math.AT | 27 pages | None | 0704.0342 | None | None | None | Patrice Ntumba Pungu | Cofibrations in the Category of Frolicher Spac... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019 |
| 2 | We explore the effect of an inhomogeneous ma... | T.V. Zaqarashvili and K Murawski | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | astro-ph | 6 pages, 3 figures, accepted in A&A | 10.1051/0004-6361:20077246 | 0704.0360 | None | None | None | Zaqarashvili | Torsional oscillations of longitudinally inhom... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019 |
| 3 | This paper has been removed by arXiv adminis... | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | gr-qc | This submission has been withdrawn by arXiv ad... | 10.1088/0256-307X/24/2/015 | 0704.0525 | Chin.Phys.Lett.24:355-358,2007 | None | None | Sezgin Ayg\"un | On the Energy-Momentum Problem in Static Einst... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019 |
| 4 | The most massive elliptical galaxies show a ... | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | astro-ph | 32 pages (referee format), 9 figures, ApJ acce... | 10.1086/519546 | 0704.0535 | Astrophys.J.665:295-305,2007 | None | None | Antonio Pipino | The Formation of Globular Cluster Systems in M... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019 |
| 5 | Differential and total cross-sections for ph... | J. Junkersfeld (for the CB-ELSA collaboration) | [[Junkersfeld, J., , for the CB-ELSA collabora... | nucl-ex | 8 pages, 13 figures | 10.1140/epja/i2006-10302-7 | 0704.0710 | Eur.Phys.J.A31:365-372,2007 | None | None | Joerg Junkersfeld | Photoproduction of pi0 omega off protons for E... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | 2019 |
| 6 | In a ring of s-wave superconducting material... | Walter A. Simmons and Sandip S. Pakvasa | [[Simmons, Walter A., ], [Pakvasa, Sandip S., ]] | quant-ph | 5 pages, pdf format | None | 0704.0803 | None | None | None | Josephine Nanao | Geometric Phase and Superconducting Flux Quant... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | 2019 |
| 7 | We study the Dirichlet problem associated to... | Xuan Hien Nguyen | [[Nguyen, Xuan Hien, ]] | math.DG | 30 pages | None | 0704.0981 | Adv. Differential Equations 15 (2010), no. 5-6... | None | None | Xuan Hien Nguyen | Construction of Complete Embedded Self-Similar... | [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... | 2019 |
| 8 | We report a measurement of D0-D0bar mixing i... | L.M. Zhang, et al (for the Belle Collaboration) | [[Zhang, L. M., ]] | hep-ex | 6 pages, 4 figures, Submitted to Physical Revi... | 10.1103/PhysRevLett.99.131803 | 0704.1000 | Phys.Rev.Lett.99:131803,2007 | None | BELLE-CONF-0702 | Liming Zhang | Measurement of D0-D0bar mixing in D0->Ks pi+ p... | [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... | 2019 |
| 9 | We present single pointing observations of S... | P.D. Klaassen and C.D. Wilson | [[Klaassen, P. D., ], [Wilson, C. D., ]] | astro-ph | 34 pages, 9 figures, accepted for publication ... | 10.1086/518760 | 0704.1245 | Astrophys.J.663:1092-1102,2007 | None | None | Pamela Klaassen | Outflow and Infall in a Sample of Massive Star... | [{'version': 'v1', 'created': 'Tue, 10 Apr 200... | 2019 |
| 10 | The proton spin structure is not understood ... | K. Aoki (for the PHENIX Collaboration) | [[Aoki, K., , for the PHENIX Collaboration]] | hep-ex | 4 pages, 3 figures, to be published in the Pro... | 10.1063/1.2750791 | 0704.1369 | AIPConf.Proc.915:339-342,2007 | None | None | Kazuya Aoki | Double Helicity Asymmetry of Inclusive pi0 Pro... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
| 11 | Abridged... Blue stragglers (BSS) are though... | Y. Momany, E.V. Held, I. Saviane, S. Zaggia, L... | [[Momany, Y., ], [Held, E. V., ], [Saviane, I.... | astro-ph | Accepted for publication in Astronomy & Astrop... | 10.1051/0004-6361:20067024 | 0704.1430 | None | None | None | Simone Zaggia R. | The blue plume population in dwarf spheroidal ... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
| 12 | The spatial Fourier spectrum of the electron... | Yasha Gindikin and Vladimir A. Sablikov | [[Gindikin, Yasha, ], [Sablikov, Vladimir A., ]] | cond-mat.str-el cond-mat.mes-hall | 10 pages, 11 figures. Misprints fixed | 10.1103/PhysRevB.76.045122 | 0704.1445 | Phys. Rev. B 76, 045122 (2007) | http://arxiv.org/licenses/nonexclusive-distrib... | None | Yasha Gindikin | Deformed Wigner crystal in a one-dimensional q... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
| 13 | The Gemini Planet (GPI) imager is an "extrem... | James R. Graham (1), Bruce Macintosh (2), Rene... | [[Graham, James R., ], [Macintosh, Bruce, ], [... | astro-ph | White paper submitted to the NSF-NASA-DOE Astr... | None | 0704.1454 | None | None | None | James R. Graham | Ground-Based Direct Detection of Exoplanets wi... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
| 14 | We present ACS/HST coronagraphic observation... | D.R. Ardila, D.A. Golimowski, J.E. Krist, M. C... | [[Ardila, D. R., ], [Golimowski, D. A., ], [Kr... | astro-ph | Accepted to ApJ | None | 0704.1507 | None | None | None | David Ardila | HST/ACS Coronagraphic Observations of the Dust... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | 2019 |
| 15 | We have selected a sample of 88 nearby (z<0.... | J. A. L. Aguerri, R. Sanchez-Janssen and C. Mu... | [[Aguerri, J. A. L., ], [Sanchez-Janssen, R., ... | astro-ph | 19 pages, 11 figures, accepted for publication... | 10.1051/0004-6361:20066478 | 0704.1579 | None | None | None | Jose Alfonso Lopez Aguerri | A Study of Catalogued Nearby Galaxy Clusters i... | [{'version': 'v1', 'created': 'Thu, 12 Apr 200... | 2019 |
| 16 | Photoproduction of pi0 mesons was studied wi... | H. van Pee, O. Bartholomy, V. Crede (for the C... | [[van Pee, H., , for the CB-ELSA Collaboration... | nucl-ex | 17 pages, 17 figures | 10.1140/epja/i2006-10160-3 | 0704.1776 | Eur.Phys.J.A31:61-77,2007 | None | None | Joerg Junkersfeld | Photoproduction of pi0-mesons off protons from... | [{'version': 'v1', 'created': 'Fri, 13 Apr 200... | 2019 |
| 17 | We investigate the dissipation of magnetic f... | Hideki Maki and Hajime Susa | [[Maki, Hideki, ], [Susa, Hajime, ]] | astro-ph | 12 pages, 7 figures, PASJ accepted | 10.1093/pasj/59.4.787 | 0704.1853 | None | None | None | Hajime Susa | Dissipation of Magnetic Flux in Primordial Sta... | [{'version': 'v1', 'created': 'Sat, 14 Apr 200... | 2019 |
| 18 | A long duration photon beam can induce macro... | G. Barbiellini (1,2), A. Galli (2,3), L. Amati... | [[Barbiellini, G., ], [Galli, A., ], [Amati, L... | astro-ph | 3 pages, no figure, to be published in "The Pr... | 10.1063/1.2757318 | 0704.2135 | AIPConf.Proc.921:265-267,2007 | None | None | Alessandra Galli | Relativistic interaction of a high intensity p... | [{'version': 'v1', 'created': 'Tue, 17 Apr 200... | 2019 |
| 19 | The environment of high-redshift galaxies is... | A.P.M. Fangano, A. Ferrara and P. Richter | [[Fangano, A. P. M., ], [Ferrara, A., ], [Rich... | astro-ph | 27 pages, 27 figures. Submitted to MNRAS. Full... | 10.1111/j.1365-2966.2007.12220.x | 0704.2143 | None | None | None | Alessio Fangano | Absorption features of high redshift galactic ... | [{'version': 'v1', 'created': 'Tue, 17 Apr 200... | 2019 |
| 20 | We describe the methodology and compute the ... | Keigo Fukumura and Demosthenes Kazanas | [[Fukumura, Keigo, ], [Kazanas, Demosthenes, ]] | astro-ph | 26 pages, 21 b/w figures, accepted for publica... | 10.1086/518883 | 0704.2159 | Astrophys.J.664:14-25,2007 | None | None | Keigo Fukumura | Accretion Disk Illumination in Schwarzschild a... | [{'version': 'v1', 'created': 'Tue, 17 Apr 200... | 2019 |
| 21 | Feedback from black hole activity is widely ... | J.-M. Wang, Y.-M. Chen, C.-S. Yan, C. Hu and W... | [[Wang, J. -M., ], [Chen, Y. -M., ], [Yan, C. ... | astro-ph | 1 color figure and 1 table. ApJ Letters in press | 10.1086/518807 | 0704.2288 | None | None | None | Jian-Min Wang | Suppressed star formation in circumnuclear reg... | [{'version': 'v1', 'created': 'Wed, 18 Apr 200... | 2019 |
| 22 | Let K be a compact subset of ${\mathbb R}^n$... | Athanasios Batakis (MAPMO), Pierre Levitz (PMC... | [[Batakis, Athanasios, , MAPMO], [Levitz, Pier... | math.CA | None | None | 0704.2362 | Pure & Applied Mathematics Quarterly (2011) Vo... | None | None | Athanasios Batakis | On Brownian flights | [{'version': 'v1', 'created': 'Wed, 18 Apr 200... | 2019 |
| 23 | We calculate relations on characteristic cla... | Benjamin McKay (University College Cork) | [[McKay, Benjamin, , University College Cork]] | math.DG math.AG | 29 pages (on A4 paper). I split off the result... | None | 0704.2555 | Adv. Geom. 11 (2011), no. 1, 139-168 | http://arxiv.org/licenses/nonexclusive-distrib... | None | Benjamin McKay | Characteristic forms of complex Cartan geometries | [{'version': 'v1', 'created': 'Thu, 19 Apr 200... | 2019 |
| 24 | A complete model of helium-like line and con... | R. L. Porter and G. J. Ferland | [[Porter, R. L., ], [Ferland, G. J., ]] | astro-ph | 28 pages, 7 figures, accepted to ApJ | 10.1086/518882 | 0704.2642 | Astrophys.J.664:586-595,2007 | None | None | Ryan Porter | Revisiting He-like X-ray Emission Line Plasma ... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
| 25 | We report the first observation of the decay... | T. Medvedeva, R. Chistov, et al (for the Belle... | [[Medvedeva, T., ], [Chistov, R., ]] | hep-ex | 5 pages, 2 PostScript figures, 1 table | 10.1103/PhysRevD.76.051102 | 0704.2652 | Phys.Rev.D76:051102,2007 | None | None | Tatiana Medvedeva | Observation of the Decay \bar{B0}-> Ds+ Lambda... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
| 26 | We study the charmless baryonic three-body d... | M.-Z. Wang, Y.-J. Lee, et al (for the Belle Co... | [[Wang, M. -Z., ], [Lee, Y. -J., ]] | hep-ex | 12 pages, 5 figures (11 figure files), PRD pub... | 10.1103/PhysRevD.76.052004 | 0704.2672 | Phys.Rev.D76:052004,2007 | None | Belle Preprint 2007-19, KEK Preprint 2007-6 | Minzu Wang | Study of B+ to p Lambdabar gamma, p Lambdabar ... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
| 27 | Massive stars, supernovae (SNe), and long-du... | Jorick S. Vink and Rubina Kotak | [[Vink, Jorick S., ], [Kotak, Rubina, ]] | astro-ph | 6 pages, 5 figs, To appear in: "Circumstellar ... | None | 0704.2689 | None | None | None | Jorick S. Vink | Mass loss from Luminous Blue Variables and Qua... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
| 28 | Using data collected with the CLEO III detec... | R.A. Briere, et al. (CLEO Collaboration) | [[Briere, R. A., ]] | hep-ex | 21 pages postscript,also available through\n ... | 10.1103/PhysRevD.76.012005 | 0704.2766 | Phys.Rev.D76:012005,2007 | None | CLNS 06/1984, CLEO 06-24 | Pamela Morehouse | Comparison of Particle Production in Quark and... | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
| 29 | Motivated by a proposal to create an optical... | Pavel Exner and Martin Fraas | [[Exner, Pavel, ], [Fraas, Martin, ]] | quant-ph cond-mat.mes-hall math-ph math.MP | LaTeX, 12 pages | 10.1016/j.physleta.2007.05.013 | 0704.2770 | Phys. Lett. A369 (2007), 393-399 | None | None | Pavel Exner | A remark on helical waveguides | [{'version': 'v1', 'created': 'Fri, 20 Apr 200... | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 170588 | We present a scheme for generating polarizat... | Zachary D. Walton, Alexander V. Sergienko, Bah... | [[Walton, Zachary D., ], [Sergienko, Alexander... | quant-ph | 6 pages, 3 figures | 10.1103/PhysRevA.70.052317 | quant-ph/0405021 | Phys. Rev. A 70, 052317 (2004) | None | None | Zac Walton | Generating Polarization-Entangled Photon Pairs... | [{'version': 'v1', 'created': 'Tue, 4 May 2004... | 2019 |
| 170589 | In quant-ph/0406139, we have introduced in a... | Elena R. Loubenets | [[Loubenets, Elena R., ]] | quant-ph math-ph math.MP | 6 pages | None | quant-ph/0407097 | None | None | None | Elena R. Loubenets | On validity of the original Bell inequality fo... | [{'version': 'v1', 'created': 'Wed, 14 Jul 200... | 2019 |
| 170590 | Both the set of quantum states and the set o... | O.V.Man'ko, and V.I.Man'ko | [[Man'ko, O. V., ], [Man'ko, V. I., ]] | quant-ph | 14 pages, to appear in Journal of Russian Lase... | 10.1023/B:JORR.0000043735.34372.8f | quant-ph/0407183 | Journal of Russian Laser Research (2004) 25: 477 | None | None | Olga Manko Vladimirovna | Classical mechanics is not h=0 limit of quantu... | [{'version': 'v1', 'created': 'Fri, 23 Jul 200... | 2019 |
| 170591 | Classical Floyd-Warshall algorithm is used t... | A. S. Gupta, A. Pathak | [[Gupta, A. S., ], [Pathak, A., ]] | quant-ph | There was a logical flaw in the reported algor... | None | quant-ph/0502144 | None | None | None | Anirban Pathak | Quantum Floyd-Warshall Alorithm | [{'version': 'v1', 'created': 'Wed, 23 Feb 200... | 2019 |
| 170592 | An experiment performed in 2002 by Sciarrino... | Sofia Wechsler | [[Wechsler, Sofia, ]] | quant-ph | The author of this article re-considered Sciar... | None | quant-ph/0503232 | None | None | None | Sofia Wechsler | Nonlocality of single fermions - branches that... | [{'version': 'v1', 'created': 'Wed, 30 Mar 200... | 2019 |
| 170593 | For emitters embedded in media of various re... | Chang-Kui Duan, Michael F. Reid | [[Duan, Chang-Kui, ], [Reid, Michael F., ]] | quant-ph | 9pages, 1 figures, presented on AMN-2 and to a... | 10.1016/j.cap.2005.11.016 | quant-ph/0505182 | Current Applied Physics 6, 348-350 (2006) | None | None | Chang-Kui Duan | Local field effects on the radiative lifetimes... | [{'version': 'v1', 'created': 'Tue, 24 May 200... | 2019 |
| 170594 | The dynamics of a two-mode Bose-Einstein con... | B. R. da Cunha and M. C. de Oliveira | [[da Cunha, B. R., ], [de Oliveira, M. C., ]] | quant-ph cond-mat.other physics.atom-ph | 9 pages, 5 figures | 10.1103/PhysRevA.75.063615 | quant-ph/0602054 | None | None | None | Marcos C. de Oliveira | Optimal Conditions for Atomic Homodyne Detecti... | [{'version': 'v1', 'created': 'Sat, 4 Feb 2006... | 2019 |
| 170595 | Within a well-known decay model describing a... | Pavel Exner and Martin Fraas | [[Exner, Pavel, ], [Fraas, Martin, ]] | quant-ph | 4 pages, 3 eps figures | 10.1088/1751-8113/40/6/010 | quant-ph/0603067 | J. Phys. A: Math. Theor. 40 (2007), 1333-1340 | None | None | Pavel Exner | The decay law can have an irregular character | [{'version': 'v1', 'created': 'Wed, 8 Mar 2006... | 2019 |
| 170596 | We study the thermal entanglement in a two-s... | S.Y. Mirafzali, M. Sarbishaei | [[Mirafzali, S. Y., ], [Sarbishaei, M., ]] | quant-ph | 5 pages, 3 figures | None | quant-ph/0608169 | None | None | None | Seyyad Yahya Mirafzali | The effect of anisotropy and external magnetic... | [{'version': 'v1', 'created': 'Tue, 22 Aug 200... | 2019 |
| 170597 | A simple quantum mechanical model consisting... | E. Kogan | [[Kogan, E., ]] | quant-ph cond-mat.mes-hall | 6 pages, 6 eps figures, revtex | None | quant-ph/0609011 | None | http://arxiv.org/licenses/nonexclusive-distrib... | None | Eugene Kogan | Decay of discrete state resonantly coupled to ... | [{'version': 'v1', 'created': 'Sun, 3 Sep 2006... | 2019 |
| 170598 | The local hidden variable assumption was rep... | Sofia Wechsler | [[Wechsler, Sofia, ]] | quant-ph | This article is based on very old information.... | None | quant-ph/0610159 | None | None | None | Sofia Wechsler | Are superluminal "signals" an acceptable hypot... | [{'version': 'v1', 'created': 'Thu, 19 Oct 200... | 2019 |
| 170599 | We study analytic structure of the Green's f... | E. Kogan | [[Kogan, E., ]] | quant-ph cond-mat.mes-hall | 4 pages, 6 eps figures, latex. arXiv admin not... | None | quant-ph/0611043 | None | None | None | Eugene Kogan | On the analytic structure of Green's function ... | [{'version': 'v1', 'created': 'Fri, 3 Nov 2006... | 2019 |
| 170600 | We introduce Bell-type inequalities allowing... | Perola Milman (PPM, CERMICS), Arne Keller (PPM... | [[Milman, Perola, , PPM, CERMICS], [Keller, Ar... | quant-ph | 4 pages | 10.1103/PhysRevLett.99.130405 | quant-ph/0612044 | Phys. Rev. Lett. 99, 130405 (2007) | None | None | Arne Keller | Bell-type inequalities for cold heteronuclear ... | [{'version': 'v1', 'created': 'Wed, 6 Dec 2006... | 2019 |
| 170601 | We provide a computational definition of the... | Pablo Arrighi, Gilles Dowek | [[Arrighi, Pablo, ], [Dowek, Gilles, ]] | quant-ph cs.LO cs.PL | The complementary note "On the critical pairs ... | 10.23638/LMCS-13(1:8)2017 | quant-ph/0612199 | Logical Methods in Computer Science, Volume 13... | http://arxiv.org/licenses/nonexclusive-distrib... | None | J\"urgen Koslowski | Lineal: A linear-algebraic Lambda-calculus | [{'version': 'v1', 'created': 'Fri, 22 Dec 200... | 2019 |
| 170602 | Recently, Farhi, Goldstone, and Gutmann gave... | Andrew M. Childs, Richard Cleve, Stephen P. Jo... | [[Childs, Andrew M., ], [Cleve, Richard, ], [J... | quant-ph | 2 pages. v2: updated name of one author | 10.4086/toc.2009.v005a005 | quant-ph/0702160 | Theory of Computing, Vol. 5 (2009) 119-123 | http://arxiv.org/licenses/nonexclusive-distrib... | None | Andrew M. Childs | Discrete-query quantum algorithm for NAND trees | [{'version': 'v1', 'created': 'Fri, 16 Feb 200... | 2019 |
| 170603 | The neutral B-meson pair produced at the Ups... | A. Go, A. Bay, et al. (for the Belle Collabora... | [[Go, A., ], [Bay, A., ]] | quant-ph hep-ex | 8 pages, 2 figures, submitted to Phys. Rev. Lett | 10.1103/PhysRevLett.99.131802 | quant-ph/0702267 | Phys.Rev.Lett.99:131802,2007 | None | Belle Preprint 2006-40, KEK Preprint 2006-61 | Apollo Go | Measurement of EPR-type flavour entanglement i... | [{'version': 'v1', 'created': 'Wed, 28 Feb 200... | 2019 |
| 170604 | .We expound an alternative to the Copenhagen... | Arthur Jabs | [[Jabs, Arthur, ]] | quant-ph | Latex, 88 pages, 6 figures. The present versio... | None | quant-ph/9606017 | None | http://arxiv.org/licenses/nonexclusive-distrib... | None | Arthur Jabs | Quantum Mechanics in Terms of Realism | [{'version': 'v1', 'created': 'Mon, 17 Jun 199... | 2019 |
| 170605 | It is shown, that for quantum systems the ve... | V. I. Man'ko, G. Marmo, E. C. G. Sudarshan, an... | [[Man'ko, V. I., ], [Marmo, G., ], [Sudarshan,... | quant-ph | Latex,14 pages,accepted by Int. Jour.Mod.Phys | 10.1142/S0217979297000666 | quant-ph/9612007 | Int.J.Mod.Phys. B11 (1997) 1281-1296 | None | None | None | Wigner's Problem and Alternative Commutation R... | [{'version': 'v1', 'created': 'Sat, 30 Nov 199... | 2019 |
| 170606 | The q-deformation of harmonic oscillators is... | V.I. Man'ko, G.Marmo, F.Zaccaria | [[Man'ko, V. I., ], [Marmo, G., ], [Zaccaria, ... | quant-ph | 23 pages,LATEX, to be published in Rend.Sem.Ma... | None | quant-ph/9703020 | Rend.Sem.Mat.Univ.Politec.Torino 54 (1996) 337... | None | None | None | Deformations and Nonlinear Systems | [{'version': 'v1', 'created': 'Wed, 12 Mar 199... | 2019 |
| 170607 | The microscopic approach quantum dissipation... | C.P.Sun H.B.Gao, H.F.Dong, S.R.Zhao | [[Gao, C. P. Sun H. B., ], [Dong, H. F., ], [Z... | quant-ph | 9 pages,Latex, E-mail address available after ... | 10.1103/PhysRevE.57.3900 | quant-ph/9706047 | Phys.Rev. E57 (1998) 3900-3904 | None | ITP.AC.97-6-19 | Chang-Pu Sun | Partial Factorization of Wave Function for A Q... | [{'version': 'v1', 'created': 'Fri, 20 Jun 199... | 2019 |
| 170608 | We consider the possibility of encoding m cl... | Andris Ambainis, Ashwin Nayak, Amnon Ta-Shma, ... | [[Ambainis, Andris, ], [Nayak, Ashwin, ], [Ta-... | quant-ph cs.CC | 12 pages, 3 figures. Defines random access cod... | None | quant-ph/9804043 | None | None | None | Ashwin Nayak | Dense Quantum Coding and a Lower Bound for 1-w... | [{'version': 'v1', 'created': 'Sat, 18 Apr 199... | 2019 |
| 170609 | This paper has been superseded by quant-ph/9... | Yu Shi | [[Shi, Yu, ]] | quant-ph | This paper has been withdrawn | None | quant-ph/9805083 | None | None | None | Yu Shi | Remarks on Universal Quantum Computer | [{'version': 'v1', 'created': 'Thu, 28 May 199... | 2019 |
| 170610 | The properties of the time-of-arrival operat... | J. G. Muga, C. R. Leavens and J. P. Palao | [[Muga, J. G., ], [Leavens, C. R., ], [Palao, ... | quant-ph | REVTEX, 12 pages, 4 postscript figures | 10.1103/PhysRevA.58.4336 | quant-ph/9807066 | Phys.Rev. A58 (1998) 1 | None | ULL-FIS-980701 | None | Space-time properties of free motion time-of-a... | [{'version': 'v1', 'created': 'Thu, 23 Jul 199... | 2019 |
| 170611 | Without imposing the locality condition,it i... | H.Razmi, M.Golshani | [[Razmi, H., ], [Golshani, M., ]] | quant-ph | 5 pages LaTeX | None | quant-ph/9812029 | None | None | TMU-98-03 | None | Locality Is An Unnecessary Assumption of Bell'... | [{'version': 'v1', 'created': 'Mon, 14 Dec 199... | 2019 |
| 170612 | A quantum computer is proposed in which info... | Mark S. Sherwin, Atac Imamoglu, Thomas Montroy... | [[Sherwin, Mark S., , University of\n Califor... | quant-ph | Revtex 6 pages, 3 postscript figures, minor ty... | 10.1103/PhysRevA.60.3508 | quant-ph/9903065 | None | None | None | Tom Montroy | Quantum Computation with Quantum Dots and Tera... | [{'version': 'v1', 'created': 'Thu, 18 Mar 199... | 2019 |
| 170613 | We utilize the generation of large atomic co... | V. A. Sautenkov, M. D. Lukin, C. J. Bednar, G.... | [[Sautenkov, V. A., ], [Lukin, M. D., ], [Bedn... | quant-ph | None | 10.1103/PhysRevA.62.023810 | quant-ph/9904032 | None | None | None | Mikhail Lukin | Enhancement of Magneto-Optic Effects via Large... | [{'version': 'v1', 'created': 'Thu, 8 Apr 1999... | 2019 |
| 170614 | Some explicit traveling wave solutions to a ... | Wen-Xiu Ma, Benno Fuchssteiner | [[Ma, Wen-Xiu, ], [Fuchssteiner, Benno, ]] | solv-int nlin.SI | 14pages, Latex, to appear in Intern. J. Nonlin... | 10.1016/0020-7462(95)00064-X | solv-int/9511005 | None | None | None | Wen-Xiu Ma | Explicit and Exact Solutions to a Kolmogorov-P... | [{'version': 'v1', 'created': 'Tue, 14 Nov 199... | 2019 |
| 170615 | We consider a hierarchy of many-particle sys... | J C Eilbeck, V Z Enol'skii, V B Kuznetsov, D V... | [[Eilbeck, J C, ], [Enol'skii, V Z, ], [Kuznet... | solv-int nlin.SI | plain LaTeX, 28 pages | None | solv-int/9809008 | None | None | None | Victor Enolskii | Linear r-Matrix Algebra for a Hierarchy of One... | [{'version': 'v1', 'created': 'Wed, 2 Sep 1998... | 2019 |
| 170616 | Consider the evolution $$ \frac{\pl m_\iy}{\... | M. Adler, T. Shiota and P. van Moerbeke | [[Adler, M., ], [Shiota, T., ], [van Moerbeke,... | solv-int adap-org hep-th nlin.AO nlin.SI | 42 pages | None | solv-int/9909010 | None | None | None | Pierre van Moerbeke | Pfaff tau-functions | [{'version': 'v1', 'created': 'Wed, 15 Sep 199... | 2019 |
| 170617 | A general solution to the Complex Monge-Amp\... | D.B. Fairlie and A.N. Leznov | [[Fairlie, D. B., ], [Leznov, A. N., ]] | solv-int nlin.SI | 13 pages, latex, no figures | 10.1088/0305-4470/33/25/307 | solv-int/9909014 | None | None | None | David Fairlie | The General Solution of the Complex Monge-Amp\... | [{'version': 'v1', 'created': 'Thu, 16 Sep 199... | 2019 |

170618 rows × 14 columns
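Because every record in this file was updated in 2019 or later, the filter keeps all 170618 rows, so its effect is not visible in the output above. The sketch below shows the same three steps (parse, extract the year, filter and renumber) on a small made-up frame where one row is actually dropped; the `id` values and dates are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the update_date column matters here.
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "update_date": ["2018-12-31", "2019-08-19", "2020-01-02"],
})

df["year"] = pd.to_datetime(df["update_date"]).dt.year   # parse the strings, extract the year
recent = df[df["year"] >= 2019].reset_index(drop=True)   # keep 2019 and later, renumber the rows
print(recent)
```

Assigning the result of `reset_index` to a new variable, rather than using `inplace=True` on a filtered frame, is one way to keep the original frame untouched.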

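Filtering the data

At this point we have all papers from 2019 onward. Next we pick out the articles in computer science; to do so, we first scrape the arXiv category taxonomy:

# Scrape all categories
website_url = requests.get('https://arxiv.org/category_taxonomy').text  # fetch the HTML text of the page
soup = BeautifulSoup(website_url,'html.parser')  # parse the HTML with the built-in html.parser (the lxml parser is a faster alternative)
root = soup.find('div',{'id':'category_taxonomy_list'})  # locate the container element of the taxonomy list
tags = root.find_all(["h2","h3","h4","p"], recursive=True)  # collect the heading and paragraph tags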

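# Initialize the str and list variables
level_1_name = ""
level_2_name = ""
level_2_code = ""
level_3_name = ""  # defined up front so a <p> seen before any <h4> cannot raise NameError
level_3_code = ""
level_1_names = []
level_2_codes = []
level_2_names = []
level_3_codes = []
level_3_names = []
level_3_notes = []

# Walk through the tags in page order and fill the lists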
for t in tags:
    if t.name == "h2":
        level_1_name = t.text    
        level_2_code = t.text
        level_2_name = t.text
    elif t.name == "h3":
        raw = t.text
        level_2_code = re.sub(r"(.*)\((.*)\)",r"\2",raw) # regex: pattern (.*)\((.*)\); replacement "\2" (the text inside the parentheses); input string: raw
        level_2_name = re.sub(r"(.*)\((.*)\)",r"\1",raw)
    elif t.name == "h4":
        raw = t.text
        level_3_code = re.sub(r"(.*) \((.*)\)",r"\1",raw)
        level_3_name = re.sub(r"(.*) \((.*)\)",r"\2",raw)
    elif t.name == "p":
        notes = t.text
        level_1_names.append(level_1_name)
        level_2_names.append(level_2_name)
        level_2_codes.append(level_2_code)
        level_3_names.append(level_3_name)
        level_3_codes.append(level_3_code)
        level_3_notes.append(notes)

# Build a DataFrame from the information collected above
df_taxonomy = pd.DataFrame({
    'group_name' : level_1_names,
    'archive_name' : level_2_names,
    'archive_id' : level_2_codes,
    'category_name' : level_3_names,
    'categories' : level_3_codes,
    'category_description': level_3_notes
    
})

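# Group by "group_name" and "archive_name". Note that groupby only builds a grouped view and does not sort;
# its result is discarded here, so df_taxonomy itself is unchanged
df_taxonomy.groupby(["group_name","archive_name"])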
df_taxonomy
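The loop above relies on `re.sub` with backreferences: the pattern matches the whole heading, and replacing it with `\1` or `\2` keeps only the chosen group. The sketch below illustrates this on two sample strings. The heading shapes are assumptions chosen for illustration and are not copied from the live page; the `.strip()` call is an addition that the notebook code does not use.

```python
import re

# Assumed heading shapes, chosen for illustration (not copied from the live page).
archive_heading = "Astrophysics (astro-ph)"             # h3 style: name (code)
category_heading = "cs.AI (Artificial Intelligence)"    # h4 style: code (name)

pattern_h3 = r"(.*)\((.*)\)"    # group 1: text before "(", group 2: text inside "(...)"
pattern_h4 = r"(.*) \((.*)\)"   # same, but the space before "(" is left out of group 1

# re.sub replaces the whole match with the chosen group, which extracts that group.
archive_name = re.sub(pattern_h3, r"\1", archive_heading).strip()  # drop the trailing space
archive_code = re.sub(pattern_h3, r"\2", archive_heading)
category_code = re.sub(pattern_h4, r"\1", category_heading)
category_name = re.sub(pattern_h4, r"\2", category_heading)

print(archive_name, archive_code, category_code, category_name, sep=" | ")
```

Without `.strip()`, group 1 of the h3 pattern keeps the space that precedes the parenthesis, which is worth knowing if the name is later used as a key.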

The resulting df_taxonomy is shown below:

| | group_name | archive_name | archive_id | category_name | categories | category_description |
|---|---|---|---|---|---|---|
| 0 | Computer Science | Computer Science | Computer Science | Artificial Intelligence | cs.AI | Covers all areas of AI except Vision, Robotics... |
| 1 | Computer Science | Computer Science | Computer Science | Hardware Architecture | cs.AR | Covers systems organization and hardware archi... |
| 2 | Computer Science | Computer Science | Computer Science | Computational Complexity | cs.CC | Covers models of computation, complexity class... |
| 3 | Computer Science | Computer Science | Computer Science | Computational Engineering, Finance, and Science | cs.CE | Covers applications of computer science to the... |
| 4 | Computer Science | Computer Science | Computer Science | Computational Geometry | cs.CG | Roughly includes material in ACM Subject Class... |
| 5 | Computer Science | Computer Science | Computer Science | Computation and Language | cs.CL | Covers natural language processing. Roughly in... |
| 6 | Computer Science | Computer Science | Computer Science | Cryptography and Security | cs.CR | Covers all areas of cryptography and security ... |
| 7 | Computer Science | Computer Science | Computer Science | Computer Vision and Pattern Recognition | cs.CV | Covers image processing, computer vision, patt... |
| 8 | Computer Science | Computer Science | Computer Science | Computers and Society | cs.CY | Covers impact of computers on society, compute... |
| 9 | Computer Science | Computer Science | Computer Science | Databases | cs.DB | Covers database management, datamining, and da... |
| 10 | Computer Science | Computer Science | Computer Science | Distributed, Parallel, and Cluster Computing | cs.DC | Covers fault-tolerance, distributed algorithms... |
| 11 | Computer Science | Computer Science | Computer Science | Digital Libraries | cs.DL | Covers all aspects of the digital library desi... |
| 12 | Computer Science | Computer Science | Computer Science | Discrete Mathematics | cs.DM | Covers combinatorics, graph theory, applicatio... |
| 13 | Computer Science | Computer Science | Computer Science | Data Structures and Algorithms | cs.DS | Covers data structures and analysis of algorit... |
| 14 | Computer Science | Computer Science | Computer Science | Emerging Technologies | cs.ET | Covers approaches to information processing (c... |
| 15 | Computer Science | Computer Science | Computer Science | Formal Languages and Automata Theory | cs.FL | Covers automata theory, formal language theory... |
| 16 | Computer Science | Computer Science | Computer Science | General Literature | cs.GL | Covers introductory material, survey material,... |
| 17 | Computer Science | Computer Science | Computer Science | Graphics | cs.GR | Covers all aspects of computer graphics. Rough... |
| 18 | Computer Science | Computer Science | Computer Science | Computer Science and Game Theory | cs.GT | Covers all theoretical and applied aspects at ... |
| 19 | Computer Science | Computer Science | Computer Science | Human-Computer Interaction | cs.HC | Covers human factors, user interfaces, and col... |
| 20 | Computer Science | Computer Science | Computer Science | Information Retrieval | cs.IR | Covers indexing, dictionaries, retrieval, cont... |
| 21 | Computer Science | Computer Science | Computer Science | Information Theory | cs.IT | Covers theoretical and experimental aspects of... |
| 22 | Computer Science | Computer Science | Computer Science | Machine Learning | cs.LG | Papers on all aspects of machine learning rese... |
| 23 | Computer Science | Computer Science | Computer Science | Logic in Computer Science | cs.LO | Covers all aspects of logic in computer scienc... |
| 24 | Computer Science | Computer Science | Computer Science | Multiagent Systems | cs.MA | Covers multiagent systems, distributed artific... |
| 25 | Computer Science | Computer Science | Computer Science | Multimedia | cs.MM | Roughly includes material in ACM Subject Class... |
| 26 | Computer Science | Computer Science | Computer Science | Mathematical Software | cs.MS | Roughly includes material in ACM Subject Class... |
| 27 | Computer Science | Computer Science | Computer Science | Numerical Analysis | cs.NA | cs.NA is an alias for math.NA. Roughly include... |
| 28 | Computer Science | Computer Science | Computer Science | Neural and Evolutionary Computing | cs.NE | Covers neural networks, connectionism, genetic... |
| 29 | Computer Science | Computer Science | Computer Science | Networking and Internet Architecture | cs.NI | Covers all aspects of computer communication n... |
| ... | ... | ... | ... | ... | ... | ... |
| 125 | Physics | Physics | physics | Plasma Physics | physics.plasm-ph | Description coming soon |
| 126 | Physics | Physics | physics | Popular Physics | physics.pop-ph | Description coming soon |
| 127 | Physics | Physics | physics | Physics and Society | physics.soc-ph | Description coming soon |
| 128 | Physics | Physics | physics | Space Physics | physics.space-ph | Description coming soon |
| 129 | Physics | Quantum Physics | quant-ph | Quantum Physics | quant-ph | Description coming soon |
| 130 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Biomolecules | q-bio.BM | DNA, RNA, proteins, lipids, etc.; molecular st... |
| 131 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Cell Behavior | q-bio.CB | Cell-cell signaling and interaction; morphogen... |
| 132 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Genomics | q-bio.GN | DNA sequencing and assembly; gene and motif fi... |
| 133 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Molecular Networks | q-bio.MN | Gene regulation, signal transduction, proteomi... |
| 134 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Neurons and Cognition | q-bio.NC | Synapse, cortex, neuronal dynamics, neural net... |
| 135 | Quantitative Biology | Quantitative Biology | Quantitative Biology | Other Quantitative Biology | q-bio.OT | Work in quantitative biology that does not fit... |
136Quantitative BiologyQuantitative BiologyQuantitative BiologyPopulations and Evolutionq-bio.PEPopulation dynamics, spatio-temporal and epide...
137Quantitative BiologyQuantitative BiologyQuantitative BiologyQuantitative Methodsq-bio.QMAll experimental, numerical, statistical and m...
138Quantitative BiologyQuantitative BiologyQuantitative BiologySubcellular Processesq-bio.SCAssembly and control of subcellular structures...
139Quantitative BiologyQuantitative BiologyQuantitative BiologyTissues and Organsq-bio.TOBlood flow in vessels, biomechanics of bones, ...
140Quantitative FinanceQuantitative FinanceQuantitative FinanceComputational Financeq-fin.CPComputational methods, including Monte Carlo, ...
141Quantitative FinanceQuantitative FinanceQuantitative FinanceEconomicsq-fin.ECq-fin.EC is an alias for econ.GN. Economics, i...
142Quantitative FinanceQuantitative FinanceQuantitative FinanceGeneral Financeq-fin.GNDevelopment of general quantitative methodolog...
143Quantitative FinanceQuantitative FinanceQuantitative FinanceMathematical Financeq-fin.MFMathematical and analytical methods of finance...
144Quantitative FinanceQuantitative FinanceQuantitative FinancePortfolio Managementq-fin.PMSecurity selection and optimization, capital a...
145Quantitative FinanceQuantitative FinanceQuantitative FinancePricing of Securitiesq-fin.PRValuation and hedging of financial securities,...
146Quantitative FinanceQuantitative FinanceQuantitative FinanceRisk Managementq-fin.RMMeasurement and management of financial risks ...
147Quantitative FinanceQuantitative FinanceQuantitative FinanceStatistical Financeq-fin.STStatistical, econometric and econophysics anal...
148Quantitative FinanceQuantitative FinanceQuantitative FinanceTrading and Market Microstructureq-fin.TRMarket microstructure, liquidity, exchange and...
149StatisticsStatisticsStatisticsApplicationsstat.APBiology, Education, Epidemiology, Engineering,...
150StatisticsStatisticsStatisticsComputationstat.COAlgorithms, Simulation, Visualization
151StatisticsStatisticsStatisticsMethodologystat.MEDesign, Surveys, Model Selection, Multiple Tes...
152StatisticsStatisticsStatisticsMachine Learningstat.MLCovers machine learning papers (supervised, un...
153StatisticsStatisticsStatisticsOther Statisticsstat.OTWork in statistics that does not fit into the ...
154StatisticsStatisticsStatisticsStatistics Theorystat.THstat.TH is an alias for math.ST. Asymptotics, ...

155 rows × 6 columns

Data analysis and visualization

First, look at how the papers are distributed across the top-level groups:

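_df = (
    data.merge(df_taxonomy, on="categories", how="left")  # attach taxonomy info to each paper
    .drop_duplicates(["id", "group_name"])                # keep one row per (paper, group)
    .groupby("group_name")
    .agg({"id": "count"})                                 # number of papers per group
    .sort_values(by="id", ascending=False)
    .reset_index()
)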

_df

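We use merge to join the two DataFrames on their shared column "categories", drop duplicate (id, group_name) pairs, then group by "group_name" and count the papers in each group. The count ends up in the "id" column, and the rows are sorted by it in descending order.

The result is as follows: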

   group_name                                     id
0  Physics                                     38379
1  Mathematics                                 24495
2  Computer Science                            18087
3  Statistics                                   1802
4  Electrical Engineering and Systems Science   1371
5  Quantitative Biology                          886
6  Quantitative Finance                          352
7  Economics                                     173
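
The merge-and-count chain above can be tried on a small scale. The following is a minimal sketch on made-up data: the four papers and the three-row taxonomy are hypothetical, and only the column names (id, categories, group_name) are taken from the code above.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the real dataset.
papers = pd.DataFrame({
    "id": ["p1", "p2", "p3", "p4"],
    "categories": ["cs.DB", "cs.DL", "stat.CO", "unknown.XX"],
})
taxonomy = pd.DataFrame({
    "categories": ["cs.DB", "cs.DL", "stat.CO"],
    "group_name": ["Computer Science", "Computer Science", "Statistics"],
})

counts = (
    papers.merge(taxonomy, on="categories", how="left")  # keep every paper
    .drop_duplicates(["id", "group_name"])                # one row per (paper, group)
    .groupby("group_name")                                # rows with a NaN group are dropped here
    .agg({"id": "count"})
    .sort_values(by="id", ascending=False)
    .reset_index()
)
print(counts)
```

Paper p4 has a categories value that is not in the toy taxonomy. It survives the left merge with a missing group_name and is then left out by groupby, so it does not appear in any group's count.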

Next, we visualize the table above as a pie chart:

fig = plt.figure(figsize=(15, 12))
# One offset per row of _df; the five smallest groups are pulled away from the centre
explode = (0, 0, 0, 0.2, 0.3, 0.3, 0.2, 0.1)
plt.pie(_df["id"], labels=_df["group_name"], autopct='%1.2f%%', startangle=160, explode=explode)
plt.tight_layout()
plt.show()

Figure: number of papers per top-level group
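
The explode tuple in the pie chart code is written by hand and has to contain exactly one entry per row of _df. As a rough sketch, it could instead be derived from each group's share. The 5% threshold and the 0.2 offset below are assumptions chosen for illustration, not values from the original post; the counts are copied from the result table.

```python
# Counts copied from the group-level result table above.
counts = [38379, 24495, 18087, 1802, 1371, 886, 352, 173]

total = sum(counts)
shares = [100 * n / total for n in counts]  # the percentages that autopct prints

# Assumption: offset any slice whose share is below the threshold.
THRESHOLD = 5.0  # hypothetical cut-off in percent
OFFSET = 0.2     # hypothetical offset, as a fraction of the pie radius
explode = [OFFSET if share < THRESHOLD else 0 for share in shares]

print(explode)
```

The resulting list can be passed to plt.pie as explode=explode, and it keeps the right length if the number of groups changes.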

Next, we count the papers from 2019 onward in each Computer Science subfield:

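group_name = "Computer Science"
# Inner merge, then keep only the rows whose group equals the Python variable (@group_name)
cats = data.merge(df_taxonomy, on="categories").query("group_name == @group_name")
# Count papers per (year, subcategory) and pivot the years into columns
cats.groupby(["year", "category_name"]).count().reset_index().pivot(index="category_name", columns="year", values="id")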

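We again use merge to join the two DataFrames on their shared "categories" column, then use query to keep only the Computer Science group. Grouping by year and category name, counting, and pivoting gives the table below. Its rows are ordered by category name rather than by count, because groupby sorts its keys by default: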

year                                             2019
category_name
Artificial Intelligence                           558
Computation and Language                         2153
Computational Complexity                          131
Computational Engineering, Finance, and Science   108
Computational Geometry                            199
Computer Science and Game Theory                  281
Computer Vision and Pattern Recognition          5559
Computers and Society                             346
Cryptography and Security                        1067
Data Structures and Algorithms                    711
Databases                                         282
Digital Libraries                                 125
Discrete Mathematics                               84
Distributed, Parallel, and Cluster Computing      715
Emerging Technologies                             101
Formal Languages and Automata Theory              152
General Literature                                  5
Graphics                                          116
Hardware Architecture                              95
Human-Computer Interaction                        420
Information Retrieval                             245
Logic in Computer Science                         470
Machine Learning                                  177
Mathematical Software                              27
Multiagent Systems                                 85
Multimedia                                         76
Networking and Internet Architecture              864
Neural and Evolutionary Computing                 235
Numerical Analysis                                 40
Operating Systems                                  36
Other Computer Science                             67
Performance                                        45
Programming Languages                             268
Robotics                                          917
Social and Information Networks                   202
Software Engineering                              659
Sound                                               7
Symbolic Computation                               44
Systems and Control                               415
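
The query, groupby, and pivot steps can also be followed on a tiny example. This is a minimal sketch with made-up rows; the frame named merged stands in for the output of data.merge(df_taxonomy, on="categories"), and only the column names come from the code above.

```python
import pandas as pd

# Hypothetical rows standing in for the merged frame.
merged = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "year": [2019, 2019, 2019, 2019],
    "group_name": ["Computer Science", "Computer Science",
                   "Computer Science", "Statistics"],
    "category_name": ["Databases", "Databases", "Graphics", "Computation"],
})

group_name = "Computer Science"
# Inside query, @group_name refers to the Python variable, not the column.
cs = merged.query("group_name == @group_name")

table = (
    cs.groupby(["year", "category_name"])
    .count()        # counts non-null values in each remaining column
    .reset_index()
    .pivot(index="category_name", columns="year", values="id")
)
print(table)
```

Each year becomes one column and each subcategory one row, which is why the real table has a single 2019 column.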

The result shows that Computer Vision and Pattern Recognition is the CS subcategory with the most papers (5559). Computation and Language (2153) and Cryptography and Security (1067) are the only other subcategories with more than 1000 papers in 2019, which matches our expectations.
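
Because the table is ordered by name, the largest subcategories are easy to miss. A small sketch of ranking by count is shown below; it uses only a subset of the counts copied from the table, and the 1000 cut-off is simply the figure mentioned in the text.

```python
# Counts copied from the 2019 table above (a subset, for brevity).
counts_2019 = {
    "Artificial Intelligence": 558,
    "Computation and Language": 2153,
    "Computer Vision and Pattern Recognition": 5559,
    "Cryptography and Security": 1067,
    "Networking and Internet Architecture": 864,
    "Robotics": 917,
}

# Sort by count, largest first, then keep the subcategories above the cut-off.
ranked = sorted(counts_2019.items(), key=lambda item: item[1], reverse=True)
above_1000 = [name for name, n in ranked if n > 1000]

for name, n in ranked[:3]:
    print(f"{name}: {n}")
```

With the full pivot table in pandas, sorting its 2019 column with sort_values(ascending=False) would give the same ranking.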

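Takeaways

In this task I learned the basics of pandas, data preprocessing, data filtering, and data visualization. I got a lot out of it and will keep working at it!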
