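- What Is Big Data
Definition: Big data refers to large data sets for which the relevant information must be captured, stored, searched, shared, transferred, analyzed, and visualized within an acceptable amount of time.
Big data analytics is the process of examining large amounts of data to gain insight.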
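- Advantages of Big Data
1. Systematic research: understanding target customers better, cutting spending in the healthcare industry, raising operating margins in retail, and saving on the order of a billion US dollars through improved operational efficiency.
2. Business process transformation: improving athletic performance by analyzing and tracking performance and behavior, improving scientific research, improving law enforcement through better monitoring, and improving financial trading through better-informed decisions.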
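- Mining Various Kinds of Big Data
Big data evolved from the idea of "large amounts of data" and additionally involves the concepts of data type and data variety.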
- The Evolution of Big Data
The evolution of big data technologies helps businesses to:
Enhance and streamline existing databases
Add insight to existing opportunities
Explore and exploit new opportunities
Provide faster access to information
Store large volumes of information
Crunch data faster for better insights
- Structure of Big Data
Structured Data:
Refers to data organized in a specific, predefined format
Resides in fixed fields within a record or file
Has a format onto which entities and their attributes are mapped
Allows querying and reporting against predetermined data types
Some sources for structured data are:
Relational databases
Flat files in record format
Multidimensional databases
Legacy databases
Customer ID | Name | Product ID | City | State |
12365 | Smith | 241 | Graz | Styria |
23658 | Jack | 365 | Wolfsberg | Carinthia |
32456 | Kady | 421 | Enns | Upper Austria |
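As a minimal sketch of what "fixed fields" and "querying against predetermined data types" can look like in practice, the sample rows above can be loaded into an in-memory relational table. The table name `customers`, the column names, and the helper functions are assumptions chosen for illustration; the notes do not prescribe any particular database.

```python
import sqlite3

# Rows copied from the sample table above.
ROWS = [
    (12365, "Smith", 241, "Graz", "Styria"),
    (23658, "Jack", 365, "Wolfsberg", "Carinthia"),
    (32456, "Kady", 421, "Enns", "Upper Austria"),
]


def build_table():
    """Create an in-memory table with fixed, typed fields and load the rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one typed column per field of the sample table.
    conn.execute(
        "CREATE TABLE customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, "
        "product_id INTEGER, city TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def names_in_state(conn, state):
    """Query a predetermined field: names of the customers in one state."""
    cur = conn.execute(
        "SELECT name FROM customers WHERE state = ? ORDER BY name", (state,)
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = build_table()
    print(names_in_state(conn, "Styria"))  # prints ['Smith']
```

Because every record has the same fields with the same types, the query can be written once and applied to any row without inspecting the data first.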
Unstructured Data:
Usually contains a lot of metadata (data about data)
Is not consistent across different records
Can come from e-mails, text, audio, video, or image files
Some sources for unstructured data are:
Text Internal to an Organization: Documents, logs, survey results, and e-mails
Data from Social Media: Data collected from social media platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr
Mobile Data: Data such as text messages and location information
Semi-structured data is a form of structured data that does not follow the formal data model of a relational database. Instead, it generally uses markup tags to separate semantic elements.
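To make the idea of markup tags concrete, here is a small sketch that parses a tagged document in which the records do not all carry the same fields. The XML layout, the tag names, and the e-mail address are hypothetical, made up only to show the contrast with a fixed relational schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second record has an extra <email> tag
# and no <state> tag, which a fixed table schema would not allow.
DOC = """
<customers>
  <customer id="12365">
    <name>Smith</name><city>Graz</city><state>Styria</state>
  </customer>
  <customer id="23658">
    <name>Jack</name><city>Wolfsberg</city><email>jack@example.com</email>
  </customer>
</customers>
"""


def parse_customers(xml_text):
    """Turn each <customer> element into a dict keyed by its child tag names."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for node in root.findall("customer"):
        record = {"id": node.get("id")}
        for child in node:
            # The tag itself tells us what the value means.
            record[child.tag] = child.text
        records.append(record)
    return records


if __name__ == "__main__":
    for record in parse_customers(DOC):
        print(sorted(record))
```

The tags label each value, so a reader can interpret every record on its own, but code that consumes the data has to tolerate missing or additional fields.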
- Elements of Big Data
Volume refers to the massive amount of data generated by organizations or individuals.
The volume of data is now approaching the exabyte range.
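For a sense of scale, the following sketch converts between storage units using decimal (SI) prefixes, where each step is a factor of 1000. The helper name `to_bytes` is an assumption for illustration, and binary prefixes (factors of 1024) would give slightly different figures.

```python
# Decimal (SI) storage units, each 1000 times larger than the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(value, unit):
    """Convert a value expressed in one of UNITS to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


if __name__ == "__main__":
    one_exabyte = to_bytes(1, "EB")
    print(one_exabyte)                       # prints 1000000000000000000
    # How many 1 TB drives would hold one exabyte?
    print(one_exabyte // to_bytes(1, "TB"))  # prints 1000000
```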
Velocity describes the speed at which data is generated, shared, and processed.
Enterprises can take advantage of such data if it is captured and shared in real time.
Variety refers to the different kinds of data contributed by different sources, such as social media, PCs, and mobile devices.
These sources continue to add new types of data to traditional transactional data.
- Big Data Applications in the Business Environment
- Career Opportunities in the Big Data Industry