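Academic Frontier Trend Analysis

Task 1: Paper Data Statistics

Task Description

  • Task topic: count the number of papers in each area of computer science for the whole of 2019
  • Task content: understand the problem, then read the data with Pandas and compute statistics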

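Dataset Introduction

  • Data source (dataset page): https://www.kaggle.com/Cornell-University/arxiv
wget https://cdn.coggle.club/arxiv-metadata-oai-2019.json.zip

  • The dataset has the following fields:
Field | Description
id | arXiv ID, which can be used to access the paper
submitter | The person who submitted the paper
authors | The paper's authors
title | The paper's title
comments | Page count, figures, and other information about the paper
journal-ref | Information about the journal in which the paper was published
doi | Digital Object Identifier, https://www.doi.org
report-no | Report number
categories | The categories or tags the paper belongs to in the arXiv system
license | The license of the article
abstract | The paper's abstract
versions | The paper's versions
authors_parsed | Parsed author information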
  • A sample record
{
  "id": "0704.0001",
  "submitter": "Pavel Nadolsky",
  "authors": "C. Bal\\'azs, E. L. Berger, P. M. Nadolsky, C.-P. Yuan",
  "title": "Calculation of prompt diphoton production cross sections at Tevatron and LHC energies",
  "comments": "37 pages, 15 figures; published version",
  "journal-ref": "Phys.Rev.D76:013009,2007",
  "doi": "10.1103/PhysRevD.76.013009",
  "report-no": "ANL-HEP-PR-07-12",
  "categories": "hep-ph",
  "license": null,
  "abstract": "  A fully differential calculation in perturbative quantum chromodynamics is presented for the production of massive photon pairs at hadron colliders. All next-to-leading order perturbative contributions from quark-antiquark, gluon-(anti)quark, and gluon-gluon subprocesses are included, as well as all-orders resummation of initial-state gluon radiation valid at next-to-next-to leading logarithmic accuracy. The region of phase space is specified in which the calculation is most reliable. Good agreement is demonstrated with data from the Fermilab Tevatron, and predictions are made for more detailed tests with CDF and DO data. Predictions are shown for distributions of diphoton pairs produced at the energy of the Large Hadron Collider (LHC). Distributions of the diphoton pairs from the decay of a Higgs boson are contrasted with those produced from QCD processes at the LHC, showing that enhanced sensitivity to the signal can be obtained with judicious selection of events.",
  "versions": [
    {"version": "v1", "created": "Mon, 2 Apr 2007 19:18:42 GMT"},
    {"version": "v2", "created": "Tue, 24 Jul 2007 20:10:27 GMT"}
  ],
  "update_date": "2008-11-26",
  "authors_parsed": [
    ["Balázs", "C.", ""],
    ["Berger", "E. L.", ""],
    ["Nadolsky", "P. M.", ""],
    ["Yuan", "C. -P.", ""]
  ]
}
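To make the nested fields of a record concrete, here is a minimal sketch that parses a trimmed copy of the sample record and reads `versions` and `authors_parsed`. The record literal is shortened for illustration, and the helper name `first_created` is hypothetical, not part of the dataset or of any library.

```python
import json

# A trimmed copy of the sample record (only a few fields kept).
raw = '''{"id": "0704.0001", "categories": "hep-ph",
 "versions": [{"version": "v1", "created": "Mon, 2 Apr 2007 19:18:42 GMT"},
              {"version": "v2", "created": "Tue, 24 Jul 2007 20:10:27 GMT"}],
 "authors_parsed": [["Balázs", "C.", ""], ["Berger", "E. L.", ""]]}'''

record = json.loads(raw)


def first_created(rec):
    """Return the 'created' string of the first listed version (hypothetical helper)."""
    return rec["versions"][0]["created"]


# In the sample, each authors_parsed entry is a three-item list starting with the last name.
last_names = [author[0] for author in record["authors_parsed"]]

print(first_created(record))
print(last_names)
```

`versions` is a list of dicts and `authors_parsed` is a list of lists, so both need indexing or a loop rather than plain string handling.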

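arXiv Paper Categories

The category names and their descriptions come from the arXiv website.
See the Subject Classifications part of section 5.3 at https://arxiv.org/help/api/user-manual, or https://arxiv.org/category_taxonomy. Some of the 153 paper categories are listed below: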

'astro-ph': 'Astrophysics',
'astro-ph.CO': 'Cosmology and Nongalactic Astrophysics',
'astro-ph.EP': 'Earth and Planetary Astrophysics',
'astro-ph.GA': 'Astrophysics of Galaxies',
'cs.AI': 'Artificial Intelligence',
'cs.AR': 'Hardware Architecture',
'cs.CC': 'Computational Complexity',
'cs.CE': 'Computational Engineering, Finance, and Science',
'cs.CV': 'Computer Vision and Pattern Recognition',
'cs.CY': 'Computers and Society',
'cs.DB': 'Databases',
'cs.DC': 'Distributed, Parallel, and Cluster Computing',
'cs.DL': 'Digital Libraries',
'cs.NA': 'Numerical Analysis',
'cs.NE': 'Neural and Evolutionary Computing',
'cs.NI': 'Networking and Internet Architecture',
'cs.OH': 'Other Computer Science',
'cs.OS': 'Operating Systems',

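Code Implementation and Official Walkthrough

Importing the required packages

# Import the required packages
import seaborn as sns  # plotting
from bs4 import BeautifulSoup  # used to scrape arXiv data
import re  # regular expressions, for matching string patterns
import requests  # network access: sends HTTP requests and fetches the content at a URL
import json  # reading the data, which is in JSON format
import pandas as pd  # data analysis
import matplotlib.pyplot as plt  # plotting

# Read the data
data = []  # initialize
# Advantages of the with statement: 1. the file handle is closed automatically; 2. the file is closed even if an exception is raised while reading
with open("arxiv-metadata-oai-2019.json", 'r', encoding="utf-8") as f:
    for line in f:
        data.append(json.loads(line))  # each line of the file is one JSON record

data = pd.DataFrame(data)  # convert the list to a DataFrame for analysis with pandas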
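Loading every field of every record can use a lot of memory. As a sketch of one possible variation (not the approach used in this post), the loop can keep only the fields needed later. The function name `iter_records` and the two sample lines are made up for illustration; an in-memory `io.StringIO` stands in for the real file.

```python
import io
import json


def iter_records(handle, keep_fields):
    """Yield one dict per non-empty line, keeping only the requested fields."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield {field: record.get(field) for field in keep_fields}


# Two made-up lines standing in for arxiv-metadata-oai-2019.json.
fake_file = io.StringIO(
    '{"id": "0001", "categories": "cs.CV", "abstract": "long text"}\n'
    '{"id": "0002", "categories": "math.AT cs.LG", "abstract": "long text"}\n'
)

records = list(iter_records(fake_file, ["id", "categories"]))
print(records)
```

The resulting list can be passed to `pd.DataFrame(records)` exactly as in the code above.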

JSON functions

Function | Description
json.dumps | Encodes a Python object as a JSON string
json.dump | Encodes a Python object as JSON and writes it to a file object
json.loads | Decodes a JSON string into a Python object
json.load | Reads JSON from a file object and decodes it into a Python object

Python 3 signatures:

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
# dumps and loads convert in memory. dump and load add one more step: dump writes the serialized string to a file,
# and load reads it back from a file.

# Example of dump and load
import json
d1 = {'name': 'foot'}
# This step serializes d1 and writes the resulting string to the file db
with open('db', 'w') as f:
    json.dump(d1, f)
with open('db', 'r') as f:
    d1 = json.load(f)
print(d1, type(d1))

# {'name': 'foot'} <class 'dict'>
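The difference between the in-memory pair (`dumps`/`loads`) and the file pair (`dump`/`load`) can be sketched side by side. Here an `io.StringIO` buffer is assumed as a stand-in for a real file opened with `open(...)`, so the example leaves nothing on disk.

```python
import io
import json

d1 = {"name": "foot"}

# In-memory conversion: dict -> str -> dict
text = json.dumps(d1)
back = json.loads(text)

# File-style conversion: the buffer plays the role of an open file
buffer = io.StringIO()
json.dump(d1, buffer)   # writes the serialized string into the buffer
buffer.seek(0)          # rewind before reading
from_file = json.load(buffer)

print(text, back, from_file)
```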

JSON to Python type conversion table

JSON | Python
object | dict
array | list
string | str
number (int) | int
number (real) | float
true | True
false | False
null | None

Python to JSON type conversion table

Python | JSON
dict | object
list, tuple | array
str | string
int, float | number
True | true
False | false
None | null
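A quick round trip shows these mappings in practice; the dictionary contents below are arbitrary illustration values. Note that the conversion is not always reversible: a tuple comes back as a list, and non-string dictionary keys come back as strings.

```python
import json

original = {"ids": (1, 2), "score": 0.5, "ok": True, "missing": None}

encoded = json.dumps(original)   # tuple -> array, True -> true, None -> null
decoded = json.loads(encoded)    # array -> list, true -> True, null -> None

# Integer keys are written as JSON strings and stay strings after decoding.
keys_round_trip = json.loads(json.dumps({1: "a"}))

print(encoded)
print(decoded)
print(keys_round_trip)
```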
data.shape  # show the size of the data
(170618, 14)
data.head()
  | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | update_date | authors_parsed
0 | 0704.0297 | Sung-Chul Yoon | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | Remnant evolution after a carbon-oxygen white ... | 15 pages, 15 figures, 3 tables, submitted to M... | None | 10.1111/j.1365-2966.2007.12161.x | None | astro-ph | None | We systematically explore the evolution of t... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,...
1 | 0704.0342 | Patrice Ntumba Pungu | B. Dugmore and PP. Ntumba | Cofibrations in the Category of Frolicher Spac... | 27 pages | None | None | None | math.AT | None | Cofibrations are defined in the category of ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Dugmore, B., ], [Ntumba, PP., ]]
2 | 0704.0360 | Zaqarashvili | T.V. Zaqarashvili and K Murawski | Torsional oscillations of longitudinally inhom... | 6 pages, 3 figures, accepted in A&A | None | 10.1051/0004-6361:20077246 | None | astro-ph | None | We explore the effect of an inhomogeneous ma... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Zaqarashvili, T. V., ], [Murawski, K, ]]
3 | 0704.0525 | Sezgin Ayg\"un | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | On the Energy-Momentum Problem in Static Einst... | This submission has been withdrawn by arXiv ad... | Chin.Phys.Lett.24:355-358,2007 | 10.1088/0256-307X/24/2/015 | None | gr-qc | None | This paper has been removed by arXiv adminis... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019-10-21 | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa...
4 | 0704.0535 | Antonio Pipino | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | The Formation of Globular Cluster Systems in M... | 32 pages (referee format), 9 figures, ApJ acce... | Astrophys.J.665:295-305,2007 | 10.1086/519546 | None | astro-ph | None | The most massive elliptical galaxies show a ... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019-08-19 | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M...

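Data Preprocessing

First, get a rough summary of the category information:

count: the number of elements in the column
unique: the number of distinct values in the column
top: the most frequent value in the column
freq: how many times the most frequent value occurs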
data['categories'].describe()
count     170618
unique     15592
top        cs.CV
freq        5559
Name: categories, dtype: object

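The result shows 170,618 records and 15,592 distinct values of the categories field (one value may combine several categories). The most common value is cs.CV, which occurs 5,559 times.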
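The four statistics are easier to read on a tiny example. The sketch below runs `describe()` on a made-up Series of four category strings; the values are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Four made-up category strings; "cs.CV cs.LG" counts as its own distinct value.
toy_categories = pd.Series(["cs.CV", "cs.CV", "math.AT", "cs.CV cs.LG"])

summary = toy_categories.describe()
print(summary)
```

Here `count` is 4, `unique` is 3, `top` is `cs.CV`, and `freq` is 2. The multi-category string is treated as a separate value, which is why `unique` on the real data is far larger than the number of arXiv categories.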

Some papers belong to more than one category, so we need to determine how many distinct individual categories appear in this dataset.

unique_categories = set([i for l in [x.split(' ') for x in data['categories']] for i in l])
print(len(unique_categories))
unique_categories
172





{'acc-phys',
 'adap-org',
 'alg-geom',
 'astro-ph',
 'astro-ph.CO',
 'astro-ph.EP',
 'astro-ph.GA',
 'astro-ph.HE',
 'astro-ph.IM',
 'astro-ph.SR',
 'chao-dyn',
 'chem-ph',
 'cmp-lg',
 'comp-gas',
 'cond-mat',
 'cond-mat.dis-nn',
 'cond-mat.mes-hall',
 'cond-mat.mtrl-sci',
 'cond-mat.other',
 'cond-mat.quant-gas',
 'cond-mat.soft',
 'cond-mat.stat-mech',
 'cond-mat.str-el',
 'cond-mat.supr-con',
 'cs.AI',
 'cs.AR',
 'cs.CC',
 'cs.CE',
 'cs.CG',
 'cs.CL',
 'cs.CR',
 'cs.CV',
 'cs.CY',
 'cs.DB',
 'cs.DC',
 'cs.DL',
 'cs.DM',
 'cs.DS',
 'cs.ET',
 'cs.FL',
 'cs.GL',
 'cs.GR',
 'cs.GT',
 'cs.HC',
 'cs.IR',
 'cs.IT',
 'cs.LG',
 'cs.LO',
 'cs.MA',
 'cs.MM',
 'cs.MS',
 'cs.NA',
 'cs.NE',
 'cs.NI',
 'cs.OH',
 'cs.OS',
 'cs.PF',
 'cs.PL',
 'cs.RO',
 'cs.SC',
 'cs.SD',
 'cs.SE',
 'cs.SI',
 'cs.SY',
 'dg-ga',
 'econ.EM',
 'econ.GN',
 'econ.TH',
 'eess.AS',
 'eess.IV',
 'eess.SP',
 'eess.SY',
 'funct-an',
 'gr-qc',
 'hep-ex',
 'hep-lat',
 'hep-ph',
 'hep-th',
 'math-ph',
 'math.AC',
 'math.AG',
 'math.AP',
 'math.AT',
 'math.CA',
 'math.CO',
 'math.CT',
 'math.CV',
 'math.DG',
 'math.DS',
 'math.FA',
 'math.GM',
 'math.GN',
 'math.GR',
 'math.GT',
 'math.HO',
 'math.IT',
 'math.KT',
 'math.LO',
 'math.MG',
 'math.MP',
 'math.NA',
 'math.NT',
 'math.OA',
 'math.OC',
 'math.PR',
 'math.QA',
 'math.RA',
 'math.RT',
 'math.SG',
 'math.SP',
 'math.ST',
 'mtrl-th',
 'nlin.AO',
 'nlin.CD',
 'nlin.CG',
 'nlin.PS',
 'nlin.SI',
 'nucl-ex',
 'nucl-th',
 'patt-sol',
 'physics.acc-ph',
 'physics.ao-ph',
 'physics.app-ph',
 'physics.atm-clus',
 'physics.atom-ph',
 'physics.bio-ph',
 'physics.chem-ph',
 'physics.class-ph',
 'physics.comp-ph',
 'physics.data-an',
 'physics.ed-ph',
 'physics.flu-dyn',
 'physics.gen-ph',
 'physics.geo-ph',
 'physics.hist-ph',
 'physics.ins-det',
 'physics.med-ph',
 'physics.optics',
 'physics.plasm-ph',
 'physics.pop-ph',
 'physics.soc-ph',
 'physics.space-ph',
 'q-alg',
 'q-bio',
 'q-bio.BM',
 'q-bio.CB',
 'q-bio.GN',
 'q-bio.MN',
 'q-bio.NC',
 'q-bio.OT',
 'q-bio.PE',
 'q-bio.QM',
 'q-bio.SC',
 'q-bio.TO',
 'q-fin.CP',
 'q-fin.EC',
 'q-fin.GN',
 'q-fin.MF',
 'q-fin.PM',
 'q-fin.PR',
 'q-fin.RM',
 'q-fin.ST',
 'q-fin.TR',
 'quant-ph',
 'solv-int',
 'stat.AP',
 'stat.CO',
 'stat.ME',
 'stat.ML',
 'stat.OT',
 'stat.TH',
 'supr-con'}

There are 172 paper categories in total.

[i for l in [x.split(' ') for x in data['categories']] for i in l]

The list comprehension above deserves attention: the for clauses in a comprehension run in order, in the sequence in which they are written.
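The ordering rule can be checked by writing the same logic as explicit nested loops. In this sketch the three input strings are made up, and the comprehension is a slightly simplified form of the one above (it splits inside the second `for` clause instead of building an intermediate list first).

```python
rows = ["cs.CV cs.LG", "math.AT", "cs.CV"]

# Comprehension: the first `for` is the outer loop, the second is the inner loop.
flat = [code for row in rows for code in row.split(" ")]

# Equivalent nested loops, written in the same order as the `for` clauses.
flat_loop = []
for row in rows:
    for code in row.split(" "):
        flat_loop.append(code)

unique_codes = set(flat)  # duplicates collapse, as in unique_categories

print(flat)
print(sorted(unique_codes))
```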

The task is to analyze papers from 2019 onward, so the time feature is preprocessed first to obtain the papers of all categories from 2019 onward.

data.columns
Index(['id', 'submitter', 'authors', 'title', 'comments', 'journal-ref', 'doi',
       'report-no', 'categories', 'license', 'abstract', 'versions',
       'update_date', 'authors_parsed'],
      dtype='object')

The update_date column in data looks like a date, so we process it:

data['year'] = pd.to_datetime(data['update_date']).dt.year
# Convert update_date to datetime and extract the year into a new year column
data_2019 = data[data['year']>=2019].reset_index()
data_2019.head()
  | index | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | update_date | authors_parsed | year
0 | 0 | 0704.0297 | Sung-Chul Yoon | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | Remnant evolution after a carbon-oxygen white ... | 15 pages, 15 figures, 3 tables, submitted to M... | None | 10.1111/j.1365-2966.2007.12161.x | None | astro-ph | None | We systematically explore the evolution of t... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | 2019
1 | 1 | 0704.0342 | Patrice Ntumba Pungu | B. Dugmore and PP. Ntumba | Cofibrations in the Category of Frolicher Spac... | 27 pages | None | None | None | math.AT | None | Cofibrations are defined in the category of ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Dugmore, B., ], [Ntumba, PP., ]] | 2019
2 | 2 | 0704.0360 | Zaqarashvili | T.V. Zaqarashvili and K Murawski | Torsional oscillations of longitudinally inhom... | 6 pages, 3 figures, accepted in A&A | None | 10.1051/0004-6361:20077246 | None | astro-ph | None | We explore the effect of an inhomogeneous ma... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | 2019
3 | 3 | 0704.0525 | Sezgin Ayg\"un | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | On the Energy-Momentum Problem in Static Einst... | This submission has been withdrawn by arXiv ad... | Chin.Phys.Lett.24:355-358,2007 | 10.1088/0256-307X/24/2/015 | None | gr-qc | None | This paper has been removed by arXiv adminis... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019-10-21 | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | 2019
4 | 4 | 0704.0535 | Antonio Pipino | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | The Formation of Globular Cluster Systems in M... | 32 pages (referee format), 9 figures, ApJ acce... | Astrophys.J.665:295-305,2007 | 10.1086/519546 | None | astro-ph | None | The most massive elliptical galaxies show a ... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019-08-19 | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | 2019
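Because update_date reflects the latest update, a paper first submitted years earlier can still carry a recent year. As a sketch of an alternative that this post does not use, the submission year could instead be read from the `created` field of the first entry in `versions`. The helper names `created_year` and `updated_year` are hypothetical, and the record reuses values from the sample record shown earlier.

```python
from email.utils import parsedate_to_datetime

# Values copied from the sample record 0704.0001.
record = {
    "id": "0704.0001",
    "update_date": "2008-11-26",
    "versions": [
        {"version": "v1", "created": "Mon, 2 Apr 2007 19:18:42 GMT"},
        {"version": "v2", "created": "Tue, 24 Jul 2007 20:10:27 GMT"},
    ],
}


def created_year(rec):
    """Year of the first version, parsed from its RFC 2822 style timestamp."""
    return parsedate_to_datetime(rec["versions"][0]["created"]).year


def updated_year(rec):
    """Year taken from the ISO formatted update_date string."""
    return int(rec["update_date"][:4])


print(created_year(record), updated_year(record))
```

Which of the two years is the right filter depends on whether "papers from 2019" means papers first submitted or papers last touched in that year.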

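We now have all papers updated in 2019 or later (update_date records the latest update, so older submissions such as 0704.0297 are included). The next step is to select all the articles in computer science:

website_url = requests.get('https://arxiv.org/category_taxonomy').text
# Get the text of the web page
soup = BeautifulSoup(website_url,'lxml')  # parse the page with the lxml parser
root = soup.find('div',{'id':'category_taxonomy_list'})
# Find the entry tag that contains the taxonomy list
tags = root.find_all(['h2','h3','h4','p'],recursive=True)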


# Initialize the str and list variables
level_1_name = ""
level_2_name = ""
level_2_code = ""
level_3_name = ""  # initialized so a <p> seen before any <h4> cannot raise NameError
level_3_code = ""
level_1_names = []
level_2_codes = []
level_2_names = []
level_3_codes = []
level_3_names = []
level_3_notes = []
for t in tags:
    if t.name == "h2":  # t.name is the tag name, e.g. 'h2' or 'h3'
        # An h2 tag looks like <h2 class="accordion-head">Mathematics</h2>; we only need the text "Mathematics"
        level_1_name = t.text  # t.text is the text content with the tags removed
        level_2_code = t.text
        level_2_name = t.text
    elif t.name == "h3":
        raw = t.text  # <h3>Quantum Physics<br/><span>(quant-ph)</span></h3> gives t.text 'Quantum Physics(quant-ph)'
        level_2_code = re.sub(r"(.*)\((.*)\)",r"\2",raw)  # pattern: (.*)\((.*)\); replacement: "\2"; input string: raw
        # "(.*)\((.*)\)" captures the text before the parentheses and the text inside them; r"\2" keeps the second (.*) group
        level_2_name = re.sub(r"(.*)\((.*)\)",r"\1",raw)
    elif t.name == "h4":
        raw = t.text  # <h4>stat.TH <span>(Statistics Theory)</span></h4> gives t.text 'stat.TH (Statistics Theory)'
        level_3_code = re.sub(r"(.*) \((.*)\)",r"\1",raw)
        level_3_name = re.sub(r"(.*) \((.*)\)",r"\2",raw)
    elif t.name == "p":
        notes = t.text
        # e.g. <p>stat.TH is an alias for math.ST. Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.</p>
        level_1_names.append(level_1_name)  # already assigned in the h2, h3, and h4 branches above
        level_2_names.append(level_2_name)
        level_2_codes.append(level_2_code)
        level_3_names.append(level_3_name)
        level_3_codes.append(level_3_code)
        level_3_notes.append(notes)

# Build a DataFrame from the information collected above
df_taxonomy = pd.DataFrame({
    'group_name' : level_1_names,
    'archive_name' : level_2_names,
    'archive_id' : level_2_codes,
    'category_name' : level_3_names,
    'categories' : level_3_codes,
    'category_description': level_3_notes
    
})

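# Group by "group_name" and "archive_name". groupby returns a new GroupBy object and the result is not assigned, so df_taxonomy itself is left unchanged and unsorted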
df_taxonomy.groupby(["group_name","archive_name"])
df_taxonomy
  | group_name | archive_name | archive_id | category_name | categories | category_description
0 | Computer Science | Computer Science | Computer Science | Artificial Intelligence | cs.AI | Covers all areas of AI except Vision, Robotics...
1 | Computer Science | Computer Science | Computer Science | Hardware Architecture | cs.AR | Covers systems organization and hardware archi...
2 | Computer Science | Computer Science | Computer Science | Computational Complexity | cs.CC | Covers models of computation, complexity class...
3 | Computer Science | Computer Science | Computer Science | Computational Engineering, Finance, and Science | cs.CE | Covers applications of computer science to the...
4 | Computer Science | Computer Science | Computer Science | Computational Geometry | cs.CG | Roughly includes material in ACM Subject Class...
... | ... | ... | ... | ... | ... | ...
150 | Statistics | Statistics | Statistics | Computation | stat.CO | Algorithms, Simulation, Visualization
151 | Statistics | Statistics | Statistics | Methodology | stat.ME | Design, Surveys, Model Selection, Multiple Tes...
152 | Statistics | Statistics | Statistics | Machine Learning | stat.ML | Covers machine learning papers (supervised, un...
153 | Statistics | Statistics | Statistics | Other Statistics | stat.OT | Work in statistics that does not fit into the ...
154 | Statistics | Statistics | Statistics | Statistics Theory | stat.TH | stat.TH is an alias for math.ST. Asymptotics, ...

155 rows × 6 columns
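Calling `groupby` without using its result does not reorder a DataFrame. If a table ordered by group and archive is wanted, `sort_values` is one way to get it. The sketch below demonstrates the difference on a made-up three-row frame; the frame name `toy` and its contents are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    "group_name": ["Statistics", "Computer Science", "Physics"],
    "archive_name": ["Statistics", "Computer Science", "Astrophysics"],
})

# groupby builds a separate GroupBy object; `toy` keeps its original row order.
grouped = toy.groupby(["group_name", "archive_name"])

# sort_values returns a new, ordered DataFrame.
sorted_toy = toy.sort_values(["group_name", "archive_name"]).reset_index(drop=True)

print(list(toy["group_name"]))
print(list(sorted_toy["group_name"]))
```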

Here we mainly explain the regular expressions used in the code.

Signature: re.sub(pattern, repl, string, count=0, flags=0)
Docstring:
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.  repl can be either a string or a callable;
if a string, backslash escapes in it are processed.  If it is
a callable, it's passed the Match object and must return
a replacement string to be used.

pattern: the regular expression pattern string.
repl: the replacement string; it can also be a function.
string: the original string to search and replace in.
count: the maximum number of replacements after matching; the default 0 replaces all matches.
flags: the matching mode used at compile time, given as a number.
pattern, repl, and string are required arguments.
import re
phone = "2004-959-559 # a phone number"
# Remove the comment
num = re.sub(r"#.*$","",phone)
print("Phone number:",num)
# Remove everything that is not a digit
num = re.sub(r'\D','',phone)
print("Phone number:",num)
Phone number: 2004-959-559 
Phone number: 2004959559
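The taxonomy code calls `re.sub` twice per heading, once for each capture group. As a sketch of an equivalent formulation, a single `re.fullmatch` can return both groups at once. The function names `split_h3` and `split_h4` are hypothetical; the patterns and sample strings are the ones quoted in the scraping code's comments.

```python
import re


def split_h3(raw):
    """Split 'Name(code)' into (name, code)."""
    match = re.fullmatch(r"(.*)\((.*)\)", raw)
    return match.group(1), match.group(2)


def split_h4(raw):
    """Split 'code (Name)' into (code, name)."""
    match = re.fullmatch(r"(.*) \((.*)\)", raw)
    return match.group(1), match.group(2)


print(split_h3("Quantum Physics(quant-ph)"))
print(split_h4("stat.TH (Statistics Theory)"))

# The same code extracted the way the scraping loop does it, via re.sub.
code_via_sub = re.sub(r"(.*)\((.*)\)", r"\2", "Quantum Physics(quant-ph)")
```

Both helpers assume the heading really contains a parenthesized part; `re.fullmatch` returns `None` otherwise, so real input would need a check before calling `.group`.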

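Data Analysis and Visualization

Next, let's look at how the papers are distributed across the top-level groups. The merge joins on the full categories string, so only papers whose categories field holds a single category are matched to the taxonomy; papers with several categories get no group_name and are left out of the counts.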

_df = data_2019.merge(df_taxonomy,on='categories',how='left').drop_duplicates(['id','group_name']).groupby('group_name').agg({"id":"count"}).sort_values(by="id",ascending=False).reset_index()
# groupby('group_name').agg({"id":"count"}) is equivalent to .groupby('group_name').count()[['id']]
_df
  | group_name | id
0 | Physics | 38379
1 | Mathematics | 24495
2 | Computer Science | 18087
3 | Statistics | 1802
4 | Electrical Engineering and Systems Science | 1371
5 | Quantitative Biology | 886
6 | Quantitative Finance | 352
7 | Economics | 173
fig = plt.figure(figsize=(15,12))
explode = (0, 0, 0, 0.2, 0.3, 0.3, 0.2, 0.1) 
plt.pie(_df["id"],  labels=_df["group_name"], autopct='%1.2f%%', startangle=160, explode=explode)
plt.tight_layout()

plt.savefig("./各类论文分布图.png")
plt.show()
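Since the merge joins on the whole categories string, a paper tagged `cs.CV cs.LG` matches no taxonomy row. One possible way to count such papers, sketched here on made-up data and not part of the original analysis, is to split the string and use `explode` so that each category gets its own row before merging. The frames `papers` and `taxonomy` are illustrative stand-ins for `data_2019` and `df_taxonomy`.

```python
import pandas as pd

papers = pd.DataFrame({
    "id": ["a", "b", "c"],
    "categories": ["cs.CV", "cs.CV cs.LG", "math.AT stat.ML"],
})
taxonomy = pd.DataFrame({
    "categories": ["cs.CV", "cs.LG", "math.AT", "stat.ML"],
    "group_name": ["Computer Science", "Computer Science", "Mathematics", "Statistics"],
})

# Direct merge: only the single-category paper "a" finds a match.
direct = papers.merge(taxonomy, on="categories", how="left")
direct_matches = int(direct["group_name"].notna().sum())

# Split each string into a list, then give every category its own row.
exploded = papers.assign(categories=papers["categories"].str.split(" ")).explode("categories")
merged = exploded.merge(taxonomy, on="categories", how="left")

# Count each paper once per group, even if it has two categories in that group.
counts = (merged.drop_duplicates(["id", "group_name"])
                .groupby("group_name")["id"].count()
                .sort_values(ascending=False))

print(direct_matches)
print(counts)
```

With this variant a cross-listed paper is counted once in every group it touches, so the group totals can add up to more than the number of papers.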


Next, count the papers from 2019 onward in each subfield of computer science:

group_name="Computer Science"
cats = data_2019.merge(df_taxonomy, on="categories").query("group_name == @group_name")  # roughly the SQL: select d1.*, d2.* from data_2019 d1 join df_taxonomy d2 on d1.categories = d2.categories where group_name = 'Computer Science'
cats.groupby(["year","category_name"]).count().reset_index().pivot(index="category_name", columns="year",values="id")
category_name | 2019
Artificial Intelligence | 558
Computation and Language | 2153
Computational Complexity | 131
Computational Engineering, Finance, and Science | 108
Computational Geometry | 199
Computer Science and Game Theory | 281
Computer Vision and Pattern Recognition | 5559
Computers and Society | 346
Cryptography and Security | 1067
Data Structures and Algorithms | 711
Databases | 282
Digital Libraries | 125
Discrete Mathematics | 84
Distributed, Parallel, and Cluster Computing | 715
Emerging Technologies | 101
Formal Languages and Automata Theory | 152
General Literature | 5
Graphics | 116
Hardware Architecture | 95
Human-Computer Interaction | 420
Information Retrieval | 245
Logic in Computer Science | 470
Machine Learning | 177
Mathematical Software | 27
Multiagent Systems | 85
Multimedia | 76
Networking and Internet Architecture | 864
Neural and Evolutionary Computing | 235
Numerical Analysis | 40
Operating Systems | 36
Other Computer Science | 67
Performance | 45
Programming Languages | 268
Robotics | 917
Social and Information Networks | 202
Software Engineering | 659
Sound | 7
Symbolic Computation | 44
Systems and Control | 415
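The groupby, count, and pivot chain is easier to follow on a few rows. This sketch uses a made-up frame with two years, so the pivot produces two columns; the names and numbers are illustrative and unrelated to the real counts.

```python
import pandas as pd

toy_cats = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "year": [2019, 2019, 2019, 2020],
    "category_name": ["Robotics", "Robotics", "Databases", "Robotics"],
})

table = (toy_cats.groupby(["year", "category_name"])["id"].count()  # papers per (year, category)
                 .reset_index()                                      # back to ordinary columns
                 .pivot(index="category_name", columns="year", values="id"))

print(table)
```

Each category becomes a row and each year a column; a combination with no papers (Databases in 2020 here) shows up as NaN. The real table has a single 2019 column because every record in `data_2019` has that year.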

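The results show that Computer Vision and Pattern Recognition is the largest CS subcategory by paper count, far ahead of the others, and its paper count is said to be growing year over year (the table above covers only 2019, so it cannot confirm that trend on its own). In addition, Computation and Language, Cryptography and Security, and Robotics each have 2019 paper counts above or close to 1,000, which matches our expectations.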
