Weekly Big Data Paper (2): Data Mining with Big Data

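Standing disclaimer: all papers here come from Google Scholar or other paid foreign paper sites. I only read, translate, and share them; if anything infringes copyright, contact me and I will remove it. I also hope to learn together with everyone: if you have a good paper, recommend it to me and I will translate and post it. You are also welcome to follow my paper-reading column: https://blog.csdn.net/column/details/23027.html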

Data Mining with Big Data
Authors: Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding
This survey paper is 26 pages long and is divided into 6 sections.
They are:
1. Introduction
2. Big Data Characteristics: HACE Theorem
3. Data Mining Challenges with Big Data (in my opinion, the section most worth reading closely)
4. Research Initiatives and Projects
5. Related Work
6. Conclusion
Abstract: Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including physical, biological and bio-medical sciences. This article presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
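Translation: Abstract: Big Data involves large-volume, complex, and continually growing data sets with multiple autonomous sources. With the rapid development of networking, data storage, and data collection capacity, Big Data is expanding quickly in all science and engineering domains, including the physical, biological, and biomedical sciences. The paper introduces a HACE theorem that characterizes the features of the Big Data revolution and proposes a Big Data processing model from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. The authors analyze the challenging issues in the data-driven model and in the Big Data revolution.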
1. Introduction
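I won't paste this section. Like the introduction of most surveys, it broadly covers the following points.
1. It uses the example of Mo Yan to show how much attention Big Data is currently getting internationally.
2. It uses examples to show that the volume of Big Data is already large and keeps growing.
3. Drawing on those examples, it argues that as data volumes keep growing, extracting the useful data becomes crucial, which leads into the concept of data mining.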
2. Big Data Characteristics: HACE Theorem
HACE Theorem: Big Data starts with large-volume, heterogeneous, autonomous sources with distributed and decentralized control, and seeks to explore complex and evolving relationships among data.
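HACE Theorem (translation): Big Data starts with large-volume, heterogeneous, autonomous sources under distributed and decentralized control, and seeks to explore complex and evolving relationships among data.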
Exploring the Big Data in this scenario is equivalent to aggregating heterogeneous information from different sources (blind men) to help draw a best possible picture to reveal the genuine gesture of the elephant in a real-time fashion. Indeed, this task is not as simple as asking each blind man to describe his feelings about the elephant and then getting an expert to draw one single picture with a combined view, concerning that each individual may speak a different language (heterogeneous and diverse information sources) and they may even have privacy concerns about the messages they deliberate in the information exchange process.
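(The retelling of the blind men and the elephant story is omitted here.)
Translation: Exploring Big Data in this scenario amounts to aggregating heterogeneous information from different sources (the blind men) to draw the most accurate picture possible of what the elephant looks like, in real time. In fact, the task is not as simple as asking each blind man to describe his impression of the elephant and then having an expert draw a single picture from the combined view. Each person may speak a different language (giving heterogeneous and diverse information sources), and they may even have privacy concerns about the information they share during the exchange.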
2.1 Huge Data with Heterogeneous and Diverse Dimensionality
(Huge volumes of heterogeneous, multi-dimensional data)
One of the fundamental characteristics of the Big Data is the huge volume of data represented by heterogeneous and diverse dimensionalities. This is because different information collectors use their own schemata for data recording, and the nature of different applications also results in diverse representations of the data.
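Translation: One of the fundamental characteristics of Big Data is a huge volume of data represented with heterogeneous and diverse dimensionalities. This is because different information collectors use their own schemata to record data, and the nature of different applications also leads to diverse data representations.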
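To make the point about per-collector schemata concrete, here is a minimal Python sketch of one common way to cope with it: a small adapter per source that maps its records onto a shared target schema. This is not from the paper. The sample records, the field names, the target schema, and the helper names (`from_a`, `from_b`, `unify`) are all hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Adapter = Callable[[Record], Record]

# Hypothetical records: two collectors describe the same kind of entity
# with different field names, encodings, and units.
SOURCE_A: List[Record] = [{"patient_id": "A-1", "sex": "F", "height_cm": 162.0}]
SOURCE_B: List[Record] = [{"id": 7, "gender": "female", "height_in": 70.0}]


def from_a(r: Record) -> Record:
    """Map collector A's schema onto the shared (assumed) target schema."""
    return {"source": "A", "id": r["patient_id"],
            "sex": r["sex"].lower(), "height_cm": r["height_cm"]}


def from_b(r: Record) -> Record:
    """Map collector B's schema; converts inches to centimetres."""
    return {"source": "B", "id": str(r["id"]),
            "sex": r["gender"][0].lower(),
            "height_cm": round(r["height_in"] * 2.54, 1)}


def unify(batches: List[Tuple[List[Record], Adapter]]) -> List[Record]:
    """Apply each source's adapter and concatenate the results."""
    unified: List[Record] = []
    for records, adapter in batches:
        unified.extend(adapter(r) for r in records)
    return unified


UNIFIED = unify([(SOURCE_A, from_a), (SOURCE_B, from_b)])
for row in UNIFIED:
    print(row)
```

Keeping one adapter per source means a new collector only needs a new mapping function; the mining code downstream sees a single schema. Real sources rarely align this cleanly, so treat the sketch as an illustration of the problem rather than a solution to it.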
For a DNA or genomic related test, microarray expression images and sequences are used to represent the genetic code information because this is the way that our current techniques acquire the data. Under such circumstances, the heterogeneous features refer to the diff
