Basic capability and performance test of big data platform
JIANG Chunyu1,2, WEI Kai1,2
1. Department of Mobile Internet and Big Data, China Academy of Information and Communications Technology, Beijing 100191, China
2. Council for the Promotion of Big Data Development, Data Center Alliance, Beijing 100045, China
Abstract: Big data technology as a whole is still at a stage that is led by open source and in which many competing technologies coexist. Open source technologies have given rise to a large number of commercial distributions of big data platform software, competition in the enterprise big data market has intensified, and how to test and evaluate these platforms has become a new research topic. The background of big data technology development and the need for big data technology standards were briefly introduced. The state of international standardization and benchmarking of big data platforms was reviewed. The practices of the Data Center Alliance in the technical standardization and testing of big data platforms were then described in detail. Finally, the problems in the current work were summarized, and future directions for big data technology and its evaluation were discussed.
Key words: big data; big data technology standardization; big data product evaluation; data; workload
CLC number: TP311 Document code: A
doi:10.11959/j.issn.2096-0271.2017040
Citation format: JIANG C Y, WEI K. Basic capability and performance test of big data platform[J]. Big Data Research, 2017, 3(4): 37-45.
1 Introduction
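Big data applications and technologies originated on the Internet. The explosive growth of websites and web pages came first, and search engine companies were the earliest to feel the technical challenges posed by massive data. The waves of social networks, video websites, and the mobile Internet that followed intensified these challenges. Internet companies found that the growth in volume, the variety, and the processing-timeliness requirements of the new data were beyond what the scale-up architectures of traditional databases and business intelligence could handle. Against this background, Google took the lead in 2004 in proposing a technology stack for distributed data processing, namely a distributed file system, the Google file system (GFS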