AMiner Author-Paper-Citation (APC) Network

This is the largest and most complete academic dataset released by Arnetminer.org as of May 30th, 2014.

Overview
This dataset is designed for research purposes only.
The data includes paper information, paper citations, author information and author collaborations. 2,092,356 papers and 8,024,869 citations between them are saved in the file AMiner-Paper.zip; 1,712,433 authors are saved in the file AMiner-Author.zip; and 4,258,615 collaboration relationships are saved in the file AMiner-Coauthor.zip.
FileName              Node            Number      Size
AMiner-Paper.zip      Paper           2,092,356   595 MB
                      Citation        8,024,869
AMiner-Author.zip     Author          1,712,433   167 MB
AMiner-Coauthor.zip   Collaboration   4,258,615   31.5 MB
Data Description
This dataset consists of three files:
AMiner-Paper.zip saves the paper information and the citation network. The format is as follows:
#index ---- index id of this paper
#* ---- paper title
#@ ---- authors (separated by semicolons)
#t ---- year
#c ---- publication venue
#% ---- the id of references of this paper (there are multiple lines, with each indicating a reference)
#! ---- abstract
The following is an example:
#index 1083734
#* ArnetMiner: extraction and mining of academic social networks
#@ Jie Tang;Jing Zhang;Limin Yao;Juanzi Li;Li Zhang;Zhong Su
#t 2008
#c Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
#% 197394
#% 220708
#% 280819
#% 387427
#% 464434
#% 643007
#% 722904
#% 760866
#% 766409
#% 769881
#% 769906
#% 788094
#% 805885
#% 809459
#% 817555
#% 874510
#% 879570
#% 879587
#% 939393
#% 956501
#% 989621
#% 1117023
#% 1250184
#! This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.
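The paper format above can be read with a small line-oriented parser. The following is a minimal sketch, not an official loader: the function name `parse_papers` and the dictionary keys are hypothetical, and it assumes that each tag is separated from its value by a space, as in the example, and that every record starts with an `#index` line. The sample is an abbreviated copy of the example above.

```python
from typing import Any, Dict, Iterable, Iterator

def parse_papers(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per paper record (hypothetical helper, not part of the dataset)."""
    record = None
    for raw in lines:
        # Split "#tag value" at the first space; blank lines give an empty tag.
        tag, _, value = raw.strip().partition(" ")
        value = value.strip()
        if tag == "#index":
            # A new "#index" line closes the previous record.
            if record is not None:
                yield record
            record = {"index": int(value), "title": "", "authors": [],
                      "year": None, "venue": "", "references": [], "abstract": ""}
        elif record is None:
            continue  # ignore anything before the first record
        elif tag == "#*":
            record["title"] = value
        elif tag == "#@":
            record["authors"] = [a.strip() for a in value.split(";") if a.strip()]
        elif tag == "#t":
            record["year"] = int(value) if value.isdigit() else None
        elif tag == "#c":
            record["venue"] = value
        elif tag == "#%":
            # One reference id per "#%" line; empty lines are skipped.
            if value.isdigit():
                record["references"].append(int(value))
        elif tag == "#!":
            record["abstract"] = value
    if record is not None:
        yield record

# Abbreviated version of the example record shown above.
SAMPLE_PAPER = """#index 1083734
#* ArnetMiner: extraction and mining of academic social networks
#@ Jie Tang;Jing Zhang;Limin Yao;Juanzi Li;Li Zhang;Zhong Su
#t 2008
#c Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
#% 197394
#% 220708
#! This paper addresses several key issues in the ArnetMiner system.
"""

papers = list(parse_papers(SAMPLE_PAPER.splitlines()))
print(papers[0]["index"], papers[0]["year"], len(papers[0]["authors"]))
```

For the full file, the same generator can be fed an open file handle instead of `SAMPLE_PAPER.splitlines()`, so the records are streamed rather than loaded into memory at once.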
AMiner-Author.zip saves the author information. The format is as follows:
#index ---- index id of this author
#n ---- name  (separated by semicolons)
#a ---- affiliations  (separated by semicolons)
#pc ---- the count of published papers of this author
#cn ---- the total number of citations of this author
#hi ---- the H-index of this author
#pi ---- the P-index with equal A-index of this author
#upi ---- the P-index with unequal A-index of this author
#t ---- extracted keyterms of this author  (separated by semicolons)
The following is an example:
#index 1488277
#n Juanzi Li
#a Tsinghua University;Department of Computer Science & Technology, Tsinghua, University, Beijing, China 100084
#pc 70
#cn 370
#pi 76.3254
#upi 73.7573
#t semantic web;social network;Semantic Annotation;ontology caching;semantic information;knowledge base
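Author records can be parsed the same way. This is a sketch under stated assumptions: the name `parse_authors` and the output keys are hypothetical, a space is assumed between each tag and its value, and a tag that is absent from a record (the example above has no `#hi` line) is simply left out of the resulting dictionary.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical mapping from file tags to output keys, grouped by value type.
LIST_FIELDS = {"#n": "names", "#a": "affiliations", "#t": "keyterms"}
INT_FIELDS = {"#pc": "paper_count", "#cn": "citation_count", "#hi": "h_index"}
FLOAT_FIELDS = {"#pi": "p_index_equal", "#upi": "p_index_unequal"}

def parse_authors(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per author record (hypothetical helper)."""
    record = None
    for raw in lines:
        tag, _, value = raw.strip().partition(" ")
        value = value.strip()
        if tag == "#index":
            if record is not None:
                yield record
            record = {"index": int(value)}
        elif record is None or not value:
            continue  # skip stray lines and empty fields
        elif tag in LIST_FIELDS:
            # Names, affiliations and keyterms are separated by semicolons.
            record[LIST_FIELDS[tag]] = [v.strip() for v in value.split(";") if v.strip()]
        elif tag in INT_FIELDS:
            record[INT_FIELDS[tag]] = int(value)
        elif tag in FLOAT_FIELDS:
            record[FLOAT_FIELDS[tag]] = float(value)
    if record is not None:
        yield record

# The example record shown above.
SAMPLE_AUTHOR = """#index 1488277
#n Juanzi Li
#a Tsinghua University;Department of Computer Science & Technology, Tsinghua, University, Beijing, China 100084
#pc 70
#cn 370
#pi 76.3254
#upi 73.7573
#t semantic web;social network;Semantic Annotation;ontology caching;semantic information;knowledge base
"""

authors = list(parse_authors(SAMPLE_AUTHOR.splitlines()))
print(authors[0]["names"], authors[0]["paper_count"], authors[0].get("h_index"))
```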
AMiner-Coauthor.zip saves the collaboration network among the authors in AMiner-Author.zip. The format is as follows:
#00 11 22 ---- 00 is the index id of one author, 11 is the index id of another author, and 22 is the number of collaborations between them
The following is an example:
#693708 1658058 2
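Each collaboration line can be turned into a weighted, undirected edge. The sketch below is illustrative only: the names `parse_coauthor_line` and `build_coauthor_graph` are hypothetical, and it assumes that every line starts with `#` followed by three whitespace-separated integers, as in the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def parse_coauthor_line(line: str) -> Tuple[int, int, int]:
    """Return (author_id, other_author_id, collaboration_count) for one line."""
    first, second, count = line.strip().lstrip("#").split()
    return int(first), int(second), int(count)

def build_coauthor_graph(lines: Iterable[str]) -> Dict[int, Dict[int, int]]:
    """Build a symmetric adjacency map: graph[a][b] == number of collaborations."""
    graph: Dict[int, Dict[int, int]] = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        first, second, count = parse_coauthor_line(line)
        # Collaboration is undirected, so store the edge in both directions.
        graph[first][second] = count
        graph[second][first] = count
    return graph

# The example line shown above.
graph = build_coauthor_graph(["#693708 1658058 2"])
print(graph[693708][1658058])
```

The author ids in this file are the `#index` values of the author file, so the adjacency map can be joined with parsed author records by id.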
References
If you use this dataset for research, please cite the following paper:

Please also consider citing the following papers:

@inproceedings{Tang:08KDD,
    author = {Jie Tang and Jing Zhang and Limin Yao and Juanzi Li and Li Zhang and Zhong Su},
    title = {ArnetMiner: Extraction and Mining of Academic Social Networks},
    booktitle = {KDD'08},
    year = {2008},
    pages = {990--998}
}

@article{Tang:10TKDD,
    author = {Jie Tang and Limin Yao and Duo Zhang and Jing Zhang},
    title = {A Combination Approach to Web User Profiling},
    journal = {ACM TKDD},
    year = {2010},
    volume = {5},
    number = {1},
    pages = {1--44}
}

@article{Tang:11ML,
    author = {Jie Tang and Jing Zhang and Ruoming Jin and Zi Yang and Keke Cai and Li Zhang and Zhong Su},
    title = {Topic Level Expertise Search over Heterogeneous Networks},
    journal = {Machine Learning Journal},
    year = {2011},
    volume = {82},
    number = {2},
    pages = {211--237}
}

@article{Tang:12TKDE,
    author = {Jie Tang and Alvis C.M. Fong and Bo Wang and Jing Zhang},
    title = {A Unified Probabilistic Framework for Name Disambiguation in Digital Library},
    journal = {IEEE Transactions on Knowledge and Data Engineering},
    year = {2012},
    volume = {24},
    number = {6},
    pages = {975--987}
}

@inproceedings{Tang:07ICDM,
    author = {Jie Tang and Duo Zhang and Limin Yao},
    title = {Social Network Extraction of Academic Researchers},
    booktitle = {ICDM'07},
    year = {2007},
    pages = {292--301}
}

Created by Huaiyu Wan, Qian Zhang, and Jie Tang. Last updated on May 30, 2014.
