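Academic Frontier Trend Analysis
Task 1: Paper Data Statistics
Task description
- Task topic: count the number of papers in each computer science subfield for the full year 2019
- Task content: understand the competition task, then read the data with Pandas and compute statistics
Dataset overview
- Data source: [dataset page](https://www.kaggle.com/Cornell-University/arxiv)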
wget https://cdn.coggle.club/arxiv-metadata-oai-2019.json.zip
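- The dataset has the following fields:
Field | Description |
---|---|
id | arXiv ID, which can be used to access the paper |
submitter | Person who submitted the paper |
authors | Paper authors |
title | Paper title |
comments | Page count, figures, and other information about the paper |
journal-ref | Information about the journal the paper was published in |
doi | Digital Object Identifier, https://www.doi.org |
report-no | Report number |
categories | Categories or tags the paper belongs to in the arXiv system |
license | License of the article |
abstract | Paper abstract |
versions | Paper versions |
update_date | Date the record was last updated |
authors_parsed | Parsed author information |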
- Sample record:
{
  "id": "0704.0001",
  "submitter": "Pavel Nadolsky",
  "authors": "C. Bal\\'azs, E. L. Berger, P. M. Nadolsky, C.-P. Yuan",
  "title": "Calculation of prompt diphoton production cross sections at Tevatron and LHC energies",
  "comments": "37 pages, 15 figures; published version",
  "journal-ref": "Phys.Rev.D76:013009,2007",
  "doi": "10.1103/PhysRevD.76.013009",
  "report-no": "ANL-HEP-PR-07-12",
  "categories": "hep-ph",
  "license": null,
  "abstract": " A fully differential calculation in perturbative quantum chromodynamics is presented for the production of massive photon pairs at hadron colliders. All next-to-leading order perturbative contributions from quark-antiquark, gluon-(anti)quark, and gluon-gluon subprocesses are included, as well as all-orders resummation of initial-state gluon radiation valid at next-to-next-to leading logarithmic accuracy. The region of phase space is specified in which the calculation is most reliable. Good agreement is demonstrated with data from the Fermilab Tevatron, and predictions are made for more detailed tests with CDF and DO data. Predictions are shown for distributions of diphoton pairs produced at the energy of the Large Hadron Collider (LHC). Distributions of the diphoton pairs from the decay of a Higgs boson are contrasted with those produced from QCD processes at the LHC, showing that enhanced sensitivity to the signal can be obtained with judicious selection of events.",
  "versions": [
    {"version": "v1", "created": "Mon, 2 Apr 2007 19:18:42 GMT"},
    {"version": "v2", "created": "Tue, 24 Jul 2007 20:10:27 GMT"}
  ],
  "update_date": "2008-11-26",
  "authors_parsed": [
    ["Balázs", "C.", ""],
    ["Berger", "E. L.", ""],
    ["Nadolsky", "P. M.", ""],
    ["Yuan", "C. -P.", ""]
  ]
}
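arXiv paper categories
We looked up the category names and their descriptions on the official arXiv site.
Links: the Subject Classifications part of section 5.3 of https://arxiv.org/help/api/user-manual, or https://arxiv.org/category_taxonomy. Some of the 153 paper categories are listed below: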
'astro-ph': 'Astrophysics',
'astro-ph.CO': 'Cosmology and Nongalactic Astrophysics',
'astro-ph.EP': 'Earth and Planetary Astrophysics',
'astro-ph.GA': 'Astrophysics of Galaxies',
'cs.AI': 'Artificial Intelligence',
'cs.AR': 'Hardware Architecture',
'cs.CC': 'Computational Complexity',
'cs.CE': 'Computational Engineering, Finance, and Science',
'cs.CV': 'Computer Vision and Pattern Recognition',
'cs.CY': 'Computers and Society',
'cs.DB': 'Databases',
'cs.DC': 'Distributed, Parallel, and Cluster Computing',
'cs.DL': 'Digital Libraries',
'cs.NA': 'Numerical Analysis',
'cs.NE': 'Neural and Evolutionary Computing',
'cs.NI': 'Networking and Internet Architecture',
'cs.OH': 'Other Computer Science',
'cs.OS': 'Operating Systems',
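Code implementation and walkthrough
Import the required packages
# Import the required packages
import seaborn as sns  # plotting
from bs4 import BeautifulSoup  # used to scrape data from arXiv
import re  # regular expressions, for matching string patterns
import requests  # network access: sends HTTP requests to fetch a page by URL
import json  # reading the data, which is in JSON format
import pandas as pd  # data analysis
import matplotlib.pyplot as plt  # plotting
# Read the data
data = []  # initialize
# Advantage of the with statement: the file handle is closed automatically, even if an exception is raised while reading
with open("arxiv-metadata-oai-2019.json", 'r') as f:
    for line in f:
        data.append(json.loads(line))
data = pd.DataFrame(data)  # convert the list to a DataFrame for analysis with pandas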
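The loading step above reads a file with one JSON object per line. A minimal sketch of the same idea using only the standard library is shown below. The helper name `read_json_lines`, its `keep` argument, and the two sample records are assumptions made for illustration, not part of the original notebook. Keeping only the needed fields can reduce memory use before the records are passed to `pd.DataFrame`.

```python
import json
import os
import tempfile


def read_json_lines(path, keep=None):
    """Yield one dict per non-empty line of a JSON Lines file.

    keep: optional list of field names to retain (hypothetical helper argument).
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if keep is not None:
                record = {k: record.get(k) for k in keep}
            yield record


# Tiny stand-in file with two made-up records (not real arXiv data)
sample = [
    {"id": "0001", "title": "A", "categories": "cs.CV", "abstract": "..."},
    {"id": "0002", "title": "B", "categories": "cs.CL cs.LG", "abstract": "..."},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    for rec in sample:
        tmp.write(json.dumps(rec) + "\n")
    tmp_path = tmp.name

records = list(read_json_lines(tmp_path, keep=["id", "categories"]))
os.remove(tmp_path)
print(records)
```

The resulting list of dicts can be handed to `pd.DataFrame(records)` in the same way as the `data` list above.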
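JSON functions
Function | Description |
---|---|
json.dumps | Encode a Python object as a JSON string |
json.dump | Encode a Python object as JSON and write it to a file object |
json.loads | Decode a JSON string into a Python object |
json.load | Read JSON from a file object and decode it into a Python object |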
json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
# dumps and loads convert in memory; dump and load add one step: dump writes the serialized string to a file, and
# load reads it back from a file
# Example with dump and load
# import json
# d1 = {'name':'foot'}
# This step serializes d1 and writes the resulting string to the file named db
# json.dump(d1,open('db','w'))
# d1 = json.load(open('db','r'))
# print(d1,type(d1))
# {'name': 'foot'} <class 'dict'>
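The difference between the in-memory pair (`dumps`/`loads`) and the file-based pair (`dump`/`load`) can be sketched as a runnable round trip. The temporary directory and the file name `db.json` are assumptions chosen for the example. The `with` blocks ensure that each file handle is closed.

```python
import json
import os
import tempfile

d1 = {"name": "foot"}

# In-memory conversion: dict -> JSON string -> dict
text = json.dumps(d1)
restored_from_text = json.loads(text)

# File-based conversion: dict -> file -> dict
path = os.path.join(tempfile.mkdtemp(), "db.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(d1, f)
with open(path, "r", encoding="utf-8") as f:
    restored_from_file = json.load(f)

print(text)
print(restored_from_file, type(restored_from_file))
```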
JSON to Python type conversion table
JSON | Python |
---|---|
object | dict |
array | list |
string | str |
number (int) | int |
number(real) | float |
true | True |
false | False |
null | None |
Python to JSON type conversion table
Python | JSON |
---|---|
dict | object |
list, tuple | array |
str | string |
int, float | number |
True | true |
False | false |
None | null |
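The two conversion tables can be checked directly in Python 3. The short sketch below decodes one value of each JSON type and records the resulting Python type. It also shows that a tuple becomes a JSON array and therefore comes back as a list. The key names are made up for the example.

```python
import json

decoded = json.loads(
    '{"obj": {"k": 1}, "arr": [1, 2], "s": "x", "i": 3, '
    '"r": 1.5, "t": true, "f": false, "n": null}'
)
# Map each key to the name of the Python type it was decoded into
types = {key: type(value).__name__ for key, value in decoded.items()}
print(types)

# A tuple is encoded as a JSON array, so it is decoded as a list
round_trip = json.loads(json.dumps({"pair": (1, 2)}))
print(round_trip)
```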
data.shape  # show the dimensions of the data
(170618, 14)
data.head()
| | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | update_date | authors_parsed |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0704.0297 | Sung-Chul Yoon | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | Remnant evolution after a carbon-oxygen white ... | 15 pages, 15 figures, 3 tables, submitted to M... | None | 10.1111/j.1365-2966.2007.12161.x | None | astro-ph | None | We systematically explore the evolution of t... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... |
1 | 0704.0342 | Patrice Ntumba Pungu | B. Dugmore and PP. Ntumba | Cofibrations in the Category of Frolicher Spac... | 27 pages | None | None | None | math.AT | None | Cofibrations are defined in the category of ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Dugmore, B., ], [Ntumba, PP., ]] |
2 | 0704.0360 | Zaqarashvili | T.V. Zaqarashvili and K Murawski | Torsional oscillations of longitudinally inhom... | 6 pages, 3 figures, accepted in A&A | None | 10.1051/0004-6361:20077246 | None | astro-ph | None | We explore the effect of an inhomogeneous ma... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Zaqarashvili, T. V., ], [Murawski, K, ]] |
3 | 0704.0525 | Sezgin Ayg\"un | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | On the Energy-Momentum Problem in Static Einst... | This submission has been withdrawn by arXiv ad... | Chin.Phys.Lett.24:355-358,2007 | 10.1088/0256-307X/24/2/015 | None | gr-qc | None | This paper has been removed by arXiv adminis... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019-10-21 | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... |
4 | 0704.0535 | Antonio Pipino | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | The Formation of Globular Cluster Systems in M... | 32 pages (referee format), 9 figures, ApJ acce... | Astrophys.J.665:295-305,2007 | 10.1086/519546 | None | astro-ph | None | The most massive elliptical galaxies show a ... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019-08-19 | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... |
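Data preprocessing
First, get rough statistics on the paper categories. The fields reported by describe() are:
count: number of elements in the column
unique: number of distinct values in the column
top: the most frequent value in the column
freq: number of times the most frequent value occurs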
data['categories'].describe()
count 170618
unique 15592
top cs.CV
freq 5559
Name: categories, dtype: object
The result shows 170618 records with 15592 distinct values of the categories field; the most frequent value is cs.CV, which occurs 5559 times.
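What `describe()` reports for a text column can be sketched in plain Python with `collections.Counter`. The five category strings below are made up for illustration. Note that `unique` counts whole strings, so a combined value such as `"cs.CV cs.LG"` counts as its own entry.

```python
from collections import Counter

# Made-up stand-in for the categories column
categories = ["cs.CV", "cs.CV cs.LG", "hep-ph", "cs.CV", "math.AT"]

counts = Counter(categories)
top, freq = counts.most_common(1)[0]
summary = {
    "count": len(categories),  # number of elements
    "unique": len(counts),     # number of distinct strings
    "top": top,                # most frequent string
    "freq": freq,              # how often it occurs
}
print(summary)
```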
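Some papers have more than one category, listed in a single space-separated string, so the next step is to count how many distinct individual categories appear in this dataset.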
unique_categories = set([i for l in [x.split(' ') for x in data['categories']] for i in l])
print(len(unique_categories))
unique_categories
172
{'acc-phys',
'adap-org',
'alg-geom',
'astro-ph',
'astro-ph.CO',
'astro-ph.EP',
'astro-ph.GA',
'astro-ph.HE',
'astro-ph.IM',
'astro-ph.SR',
'chao-dyn',
'chem-ph',
'cmp-lg',
'comp-gas',
'cond-mat',
'cond-mat.dis-nn',
'cond-mat.mes-hall',
'cond-mat.mtrl-sci',
'cond-mat.other',
'cond-mat.quant-gas',
'cond-mat.soft',
'cond-mat.stat-mech',
'cond-mat.str-el',
'cond-mat.supr-con',
'cs.AI',
'cs.AR',
'cs.CC',
'cs.CE',
'cs.CG',
'cs.CL',
'cs.CR',
'cs.CV',
'cs.CY',
'cs.DB',
'cs.DC',
'cs.DL',
'cs.DM',
'cs.DS',
'cs.ET',
'cs.FL',
'cs.GL',
'cs.GR',
'cs.GT',
'cs.HC',
'cs.IR',
'cs.IT',
'cs.LG',
'cs.LO',
'cs.MA',
'cs.MM',
'cs.MS',
'cs.NA',
'cs.NE',
'cs.NI',
'cs.OH',
'cs.OS',
'cs.PF',
'cs.PL',
'cs.RO',
'cs.SC',
'cs.SD',
'cs.SE',
'cs.SI',
'cs.SY',
'dg-ga',
'econ.EM',
'econ.GN',
'econ.TH',
'eess.AS',
'eess.IV',
'eess.SP',
'eess.SY',
'funct-an',
'gr-qc',
'hep-ex',
'hep-lat',
'hep-ph',
'hep-th',
'math-ph',
'math.AC',
'math.AG',
'math.AP',
'math.AT',
'math.CA',
'math.CO',
'math.CT',
'math.CV',
'math.DG',
'math.DS',
'math.FA',
'math.GM',
'math.GN',
'math.GR',
'math.GT',
'math.HO',
'math.IT',
'math.KT',
'math.LO',
'math.MG',
'math.MP',
'math.NA',
'math.NT',
'math.OA',
'math.OC',
'math.PR',
'math.QA',
'math.RA',
'math.RT',
'math.SG',
'math.SP',
'math.ST',
'mtrl-th',
'nlin.AO',
'nlin.CD',
'nlin.CG',
'nlin.PS',
'nlin.SI',
'nucl-ex',
'nucl-th',
'patt-sol',
'physics.acc-ph',
'physics.ao-ph',
'physics.app-ph',
'physics.atm-clus',
'physics.atom-ph',
'physics.bio-ph',
'physics.chem-ph',
'physics.class-ph',
'physics.comp-ph',
'physics.data-an',
'physics.ed-ph',
'physics.flu-dyn',
'physics.gen-ph',
'physics.geo-ph',
'physics.hist-ph',
'physics.ins-det',
'physics.med-ph',
'physics.optics',
'physics.plasm-ph',
'physics.pop-ph',
'physics.soc-ph',
'physics.space-ph',
'q-alg',
'q-bio',
'q-bio.BM',
'q-bio.CB',
'q-bio.GN',
'q-bio.MN',
'q-bio.NC',
'q-bio.OT',
'q-bio.PE',
'q-bio.QM',
'q-bio.SC',
'q-bio.TO',
'q-fin.CP',
'q-fin.EC',
'q-fin.GN',
'q-fin.MF',
'q-fin.PM',
'q-fin.PR',
'q-fin.RM',
'q-fin.ST',
'q-fin.TR',
'quant-ph',
'solv-int',
'stat.AP',
'stat.CO',
'stat.ME',
'stat.ML',
'stat.OT',
'stat.TH',
'supr-con'}
There are 172 distinct paper categories in total.
[i for l in [x.split(' ') for x in data['categories']] for i in l]
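The list comprehension above deserves attention: its for clauses run in the order in which they are written, so the leftmost for is the outermost loop.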
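The evaluation order can be made explicit by writing the same comprehension as nested loops. The three-element `rows` list is a made-up stand-in for `data['categories']`. The set comprehension at the end is an equivalent, slightly shorter way to collect the distinct categories.

```python
# Made-up stand-in for data['categories']
rows = ["cs.CV cs.LG", "hep-ph", "cs.CV"]

# Same shape as the comprehension in the text
flat = [i for l in [x.split(' ') for x in rows] for i in l]

# Equivalent nested loops: the leftmost for is the outer loop
flat_loop = []
for x in rows:
    for i in x.split(' '):
        flat_loop.append(i)

# Distinct categories without building the intermediate lists
unique = {i for x in rows for i in x.split(' ')}

print(flat)
print(sorted(unique))
```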
The task is to analyze papers from 2019 onward, so the time field is preprocessed first to select papers of all categories from 2019 onward.
data.columns
Index(['id', 'submitter', 'authors', 'title', 'comments', 'journal-ref', 'doi',
'report-no', 'categories', 'license', 'abstract', 'versions',
'update_date', 'authors_parsed'],
dtype='object')
The update_date column in data looks like a date, so we process it
data['year'] = pd.to_datetime(data['update_date']).dt.year
# Convert update_date to datetime and extract the year into a new year column
data_2019 = data[data['year']>=2019].reset_index()
data_2019.head()
| | index | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | update_date | authors_parsed | year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0704.0297 | Sung-Chul Yoon | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | Remnant evolution after a carbon-oxygen white ... | 15 pages, 15 figures, 3 tables, submitted to M... | None | 10.1111/j.1365-2966.2007.12161.x | None | astro-ph | None | We systematically explore the evolution of t... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | 2019 |
1 | 1 | 0704.0342 | Patrice Ntumba Pungu | B. Dugmore and PP. Ntumba | Cofibrations in the Category of Frolicher Spac... | 27 pages | None | None | None | math.AT | None | Cofibrations are defined in the category of ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Dugmore, B., ], [Ntumba, PP., ]] | 2019 |
2 | 2 | 0704.0360 | Zaqarashvili | T.V. Zaqarashvili and K Murawski | Torsional oscillations of longitudinally inhom... | 6 pages, 3 figures, accepted in A&A | None | 10.1051/0004-6361:20077246 | None | astro-ph | None | We explore the effect of an inhomogeneous ma... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | 2019-08-19 | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | 2019 |
3 | 3 | 0704.0525 | Sezgin Ayg\"un | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | On the Energy-Momentum Problem in Static Einst... | This submission has been withdrawn by arXiv ad... | Chin.Phys.Lett.24:355-358,2007 | 10.1088/0256-307X/24/2/015 | None | gr-qc | None | This paper has been removed by arXiv adminis... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019-10-21 | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | 2019 |
4 | 4 | 0704.0535 | Antonio Pipino | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | The Formation of Globular Cluster Systems in M... | 32 pages (referee format), 9 figures, ApJ acce... | Astrophys.J.665:295-305,2007 | 10.1086/519546 | None | astro-ph | None | The most massive elliptical galaxies show a ... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | 2019-08-19 | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | 2019 |
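The rows above have arXiv IDs from 2007 but an `update_date` in 2019, so the `year` column reflects when a record was last updated, not when the paper was first submitted. If the first submission year is wanted, one option is to parse the `created` timestamp of the first entry in `versions`. The sketch below assumes that `versions[0]` is the earliest version, as in the sample record shown earlier. The helper name `first_version_year` is hypothetical.

```python
from email.utils import parsedate_to_datetime


def first_version_year(versions):
    """Return the year of the first listed version (assumed to be the earliest)."""
    created = versions[0]["created"]  # e.g. "Mon, 2 Apr 2007 19:18:42 GMT"
    return parsedate_to_datetime(created).year


# Fields copied from the sample record in the dataset overview
record = {
    "id": "0704.0001",
    "versions": [
        {"version": "v1", "created": "Mon, 2 Apr 2007 19:18:42 GMT"},
        {"version": "v2", "created": "Tue, 24 Jul 2007 20:10:27 GMT"},
    ],
    "update_date": "2008-11-26",
}

submitted_year = first_version_year(record["versions"])
updated_year = int(record["update_date"][:4])
print(submitted_year, updated_year)
```

With a DataFrame, the same helper could be applied to the `versions` column, for example with `data['versions'].apply(first_version_year)`.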
This gives all papers whose update_date falls in 2019 or later. The next step is to pick out the articles in computer science:
website_url = requests.get('https://arxiv.org/category_taxonomy').text
# Get the text of the web page
soup = BeautifulSoup(website_url,'lxml')  # parse the page with the lxml parser
root = soup.find('div',{'id':'category_taxonomy_list'})
# Find the tag that serves as the entry point for the taxonomy list
tags = root.find_all(['h2','h3','h4','p'],recursive=True)
[Figure: the scraping analysis process]
# Initialize the str and list variables
level_1_name = ""
level_2_name = ""
level_2_code = ""
level_1_names = []
level_2_codes = []
level_2_names = []
level_3_codes = []
level_3_names = []
level_3_notes = []
for t in tags:
    if t.name == "h2":  # t.name is the tag name, e.g. 'h2' or 'h3'
        # An h2 tag looks like <h2 class="accordion-head">Mathematics</h2>; only the text "Mathematics" is needed
        level_1_name = t.text  # t.text is the text content with the tags stripped
        level_2_code = t.text
        level_2_name = t.text
    elif t.name == "h3":
        raw = t.text  # <h3>Quantum Physics<br/><span>(quant-ph)</span></h3> gives t.text 'Quantum Physics(quant-ph)'
        # re.sub arguments: pattern (.*)\((.*)\), replacement \2, input string raw
        # The pattern captures the text before the opening parenthesis and the text inside the parentheses; r"\2" keeps the second group
        level_2_code = re.sub(r"(.*)\((.*)\)",r"\2",raw)
        level_2_name = re.sub(r"(.*)\((.*)\)",r"\1",raw)
    elif t.name == "h4":
        raw = t.text  # <h4>stat.TH <span>(Statistics Theory)</span></h4> gives t.text 'stat.TH (Statistics Theory)'
        level_3_code = re.sub(r"(.*) \((.*)\)",r"\1",raw)
        level_3_name = re.sub(r"(.*) \((.*)\)",r"\2",raw)
    elif t.name == "p":
        notes = t.text
        # e.g. <p>stat.TH is an alias for math.ST. Asymptotics, Bayesian Inference, Decision Theory, Estimation, Foundations, Inference, Testing.</p>
        level_1_names.append(level_1_name)  # set earlier in the h2, h3, and h4 branches
        level_2_names.append(level_2_name)
        level_2_codes.append(level_2_code)
        level_3_names.append(level_3_name)
        level_3_codes.append(level_3_code)
        level_3_notes.append(notes)
# Build a DataFrame from the lists above
df_taxonomy = pd.DataFrame({
    'group_name' : level_1_names,
    'archive_name' : level_2_names,
    'archive_id' : level_2_codes,
    'category_name' : level_3_names,
    'categories' : level_3_codes,
    'category_description': level_3_notes
})
# groupby returns a GroupBy object and leaves df_taxonomy unchanged; the result is not assigned, so the table displayed next keeps its original order
df_taxonomy.groupby(["group_name","archive_name"])
df_taxonomy
| | group_name | archive_name | archive_id | category_name | categories | category_description |
---|---|---|---|---|---|---|
0 | Computer Science | Computer Science | Computer Science | Artificial Intelligence | cs.AI | Covers all areas of AI except Vision, Robotics... |
1 | Computer Science | Computer Science | Computer Science | Hardware Architecture | cs.AR | Covers systems organization and hardware archi... |
2 | Computer Science | Computer Science | Computer Science | Computational Complexity | cs.CC | Covers models of computation, complexity class... |
3 | Computer Science | Computer Science | Computer Science | Computational Engineering, Finance, and Science | cs.CE | Covers applications of computer science to the... |
4 | Computer Science | Computer Science | Computer Science | Computational Geometry | cs.CG | Roughly includes material in ACM Subject Class... |
... | ... | ... | ... | ... | ... | ... |
150 | Statistics | Statistics | Statistics | Computation | stat.CO | Algorithms, Simulation, Visualization |
151 | Statistics | Statistics | Statistics | Methodology | stat.ME | Design, Surveys, Model Selection, Multiple Tes... |
152 | Statistics | Statistics | Statistics | Machine Learning | stat.ML | Covers machine learning papers (supervised, un... |
153 | Statistics | Statistics | Statistics | Other Statistics | stat.OT | Work in statistics that does not fit into the ... |
154 | Statistics | Statistics | Statistics | Statistics Theory | stat.TH | stat.TH is an alias for math.ST. Asymptotics, ... |
155 rows × 6 columns
The regular expressions used in the code are explained here
Signature: re.sub(pattern, repl, string, count=0, flags=0)
Docstring:
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the Match object and must return
a replacement string to be used.
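pattern : the regular expression pattern string.
repl : the replacement string; it may also be a function.
string : the original string to search and replace in.
count : maximum number of replacements after pattern matching; the default 0 replaces all matches.
flags : matching-mode flags used when compiling the pattern, given as numeric values.
pattern, repl, and string are required arguments.
import re
phone = "2004-959-559 # a phone number"
# Remove the comment
num = re.sub(r"#.*$","",phone)
print("Phone number:",num)
# Remove everything that is not a digit
num = re.sub(r'\D','',phone)
print("Phone number:",num)
Phone number: 2004-959-559
Phone number: 2004959559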
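To connect this back to the scraping loop, the sketch below applies the same two patterns to the example strings quoted in the code comments (`'Quantum Physics(quant-ph)'` for an h3 tag and `'stat.TH (Statistics Theory)'` for an h4 tag). The variable names are chosen for the example. The last lines show `re.match` with `groups()` as an alternative that extracts both parts in one call.

```python
import re

h3_text = "Quantum Physics(quant-ph)"
h4_text = "stat.TH (Statistics Theory)"

# h3: name first, code in parentheses with no space before them
archive_id = re.sub(r"(.*)\((.*)\)", r"\2", h3_text)
archive_name = re.sub(r"(.*)\((.*)\)", r"\1", h3_text)

# h4: code first, then a space and the name in parentheses
category_code = re.sub(r"(.*) \((.*)\)", r"\1", h4_text)
category_name = re.sub(r"(.*) \((.*)\)", r"\2", h4_text)

# Alternative: capture both groups with a single match
match = re.match(r"(.*) \((.*)\)", h4_text)
code_and_name = match.groups()

print(archive_id, "|", archive_name)
print(category_code, "|", category_name)
```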
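Data analysis and visualization
Next, look at how the papers are distributed across the top-level groups
# The merge matches on the full categories string, so a paper that lists several categories finds no taxonomy row and is not counted in any group
_df = data_2019.merge(df_taxonomy,on='categories',how='left').drop_duplicates(['id','group_name']).groupby('group_name').agg({"id":"count"}).sort_values(by="id",ascending=False).reset_index()
# groupby('group_name').agg({"id":"count"}) is equivalent to .groupby('group_name').count()[['id']]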
_df
| | group_name | id |
---|---|---|
0 | Physics | 38379 |
1 | Mathematics | 24495 |
2 | Computer Science | 18087 |
3 | Statistics | 1802 |
4 | Electrical Engineering and Systems Science | 1371 |
5 | Quantitative Biology | 886 |
6 | Quantitative Finance | 352 |
7 | Economics | 173 |
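The counts above sum to far fewer than the 170618 records because the merge joins on the complete `categories` string, so a paper listing several categories does not match any single taxonomy code. The plain-Python sketch below illustrates the effect on three made-up papers and a made-up three-entry taxonomy. It also shows one possible alternative, counting each paper under its first listed category, which is assumed here to be the primary one.

```python
from collections import Counter

# Made-up miniature taxonomy: category code -> group name
taxonomy = {
    "cs.CV": "Computer Science",
    "cs.LG": "Computer Science",
    "hep-ph": "Physics",
}
# Made-up papers; p2 lists two categories in one string
papers = [
    {"id": "p1", "categories": "cs.CV"},
    {"id": "p2", "categories": "cs.CV cs.LG"},
    {"id": "p3", "categories": "hep-ph"},
]

# Exact-string matching, as in the merge: p2 is not counted
exact = Counter(
    taxonomy[p["categories"]] for p in papers if p["categories"] in taxonomy
)

# Alternative: use the first listed category (assumed primary)
first_codes = [p["categories"].split(" ")[0] for p in papers]
by_first = Counter(taxonomy[c] for c in first_codes if c in taxonomy)

print(dict(exact))
print(dict(by_first))
```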
fig = plt.figure(figsize=(15,12))
explode = (0, 0, 0, 0.2, 0.3, 0.3, 0.2, 0.1)
plt.pie(_df["id"], labels=_df["group_name"], autopct='%1.2f%%', startangle=160, explode=explode)
plt.tight_layout()
plt.savefig("./各类论文分布图.png")
plt.show()
Next, count the papers from 2019 onward in each computer science subfield:
group_name="Computer Science"
# Roughly equivalent SQL: select d1.*, d2.* from data_2019 d1 join df_taxonomy d2 on d1.categories = d2.categories where group_name = 'Computer Science'
cats = data_2019.merge(df_taxonomy, on="categories").query("group_name == @group_name")
cats.groupby(["year","category_name"]).count().reset_index().pivot(index="category_name", columns="year",values="id")
category_name | 2019 |
---|---|
Artificial Intelligence | 558 |
Computation and Language | 2153 |
Computational Complexity | 131 |
Computational Engineering, Finance, and Science | 108 |
Computational Geometry | 199 |
Computer Science and Game Theory | 281 |
Computer Vision and Pattern Recognition | 5559 |
Computers and Society | 346 |
Cryptography and Security | 1067 |
Data Structures and Algorithms | 711 |
Databases | 282 |
Digital Libraries | 125 |
Discrete Mathematics | 84 |
Distributed, Parallel, and Cluster Computing | 715 |
Emerging Technologies | 101 |
Formal Languages and Automata Theory | 152 |
General Literature | 5 |
Graphics | 116 |
Hardware Architecture | 95 |
Human-Computer Interaction | 420 |
Information Retrieval | 245 |
Logic in Computer Science | 470 |
Machine Learning | 177 |
Mathematical Software | 27 |
Multiagent Systems | 85 |
Multimedia | 76 |
Networking and Internet Architecture | 864 |
Neural and Evolutionary Computing | 235 |
Numerical Analysis | 40 |
Operating Systems | 36 |
Other Computer Science | 67 |
Performance | 45 |
Programming Languages | 268 |
Robotics | 917 |
Social and Information Networks | 202 |
Software Engineering | 659 |
Sound | 7 |
Symbolic Computation | 44 |
Systems and Control | 415 |
The results show that Computer Vision and Pattern Recognition is the CS subcategory with the most papers (5559), far ahead of every other CS subcategory. Computation and Language (2153), Cryptography and Security (1067), and Robotics (917) also exceed or approach 1000 papers for 2019, which matches our expectations.