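Data Science and Big Data Analytics Study Notes 1: Theory Practice Questions

Big Data Analytics Theory Practice Questions


//The questions and answers are based on the course slides and online sources, compiled by me for study reference only.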

1. What are the four (or five) characteristics of Big Data?
4V: Volume, Variety, Velocity and Veracity
5V: Volume, Variety, Velocity, Value and Veracity
The 5V model adds one more characteristic to the 4V model: Value, i.e. the value of the data.
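To make the four measurable Vs more concrete, here is a toy profiler in Python. It is only a sketch: the `Record` type, the `profile` and `approx_size` functions, and every metric in them (bytes for Volume, payload types for Variety, records per second for Velocity, share of non-empty records for Veracity) are hypothetical choices for illustration, not standard definitions. Value is left out because it depends on the business question rather than on the data alone.

```python
import json
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: float  # seconds since collection started
    payload: object   # dict (structured), str (text) or bytes (binary)


def approx_size(payload: object) -> int:
    """Hypothetical size measure: bytes of the payload (dicts measured as JSON)."""
    if isinstance(payload, bytes):
        return len(payload)
    if isinstance(payload, str):
        return len(payload.encode("utf-8"))
    return len(json.dumps(payload).encode("utf-8"))


def profile(records: list[Record]) -> dict:
    """Attach a rough, illustrative metric to Volume, Variety, Velocity and Veracity.

    Assumes a non-empty list of records.
    """
    # Volume: how many records there are and roughly how many bytes they take.
    size = sum(approx_size(r.payload) for r in records)
    # Variety: how many different kinds of payload appear.
    kinds = sorted({type(r.payload).__name__ for r in records})
    # Velocity: records per second over the observed time window.
    timestamps = [r.timestamp for r in records]
    span = max(timestamps) - min(timestamps)
    rate = len(records) / span if span > 0 else float("inf")
    # Veracity: share of records passing a (hypothetical) "not empty" validity check.
    valid = sum(1 for r in records if r.payload not in (None, "", b"", {}))
    return {
        "volume_records": len(records),
        "volume_bytes": size,
        "variety_kinds": kinds,
        "velocity_per_sec": rate,
        "veracity_valid_share": valid / len(records),
    }


sample = [
    Record(0.0, {"user": 1, "amount": 9.5}),  # structured
    Record(0.5, "great product!"),           # free text
    Record(1.0, b"\x89PNG"),                 # binary (start of an image file)
    Record(2.0, ""),                         # empty, fails the validity check
]
print(profile(sample))
# {'volume_records': 4, 'volume_bytes': 44, 'variety_kinds': ['bytes', 'dict', 'str'], 'velocity_per_sec': 2.0, 'veracity_valid_share': 0.75}
```

Real systems measure these properties very differently (storage size, schema catalogs, ingestion rates, data-quality rules); the point here is only that each V describes a different property of the same data.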

2. What is an analytic sandbox, and why is it important?

An analytic sandbox is a separate environment that is part of the overall data lake architecture. It is a centralized environment meant to be used by multiple users, and it is maintained with the support of IT.

Traditional data architectures have several additional implications for data scientists:
– Predictive analytics and data mining activities are last in line for data (i.e., low priority)
– They are limited to in-memory analytics, which restricts the size of the datasets they can use
– Projects remain isolated and ad hoc rather than centrally managed, and exist as nonstandard initiatives
One solution: analytic sandboxes

In the analytic sandbox:
The environment is controlled by the analyst
-It allows them to install and use the data tools of their choice
-It allows them to manage the scheduling and processing of the data assets
It enables analysts to explore and experiment with internal and external data
It can hold and efficiently process large amounts of data from many different sources: big data (unstructured), transactional data (structured), web data, social media data, documents, etc.
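The idea of a sandbox (a separate copy of the data that the analyst fully controls, fed from several sources) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumptions: two in-memory SQLite databases stand in for the production warehouse and the sandbox, and the `orders` and `web_events` tables, their columns, and the `build_sandbox` helper are all hypothetical.

```python
import json
import sqlite3


def build_sandbox(production: sqlite3.Connection, web_events_json: str) -> sqlite3.Connection:
    """Copy data from several (hypothetical) sources into a separate, analyst-controlled database."""
    sandbox = sqlite3.connect(":memory:")  # isolated from the production database

    # Structured, transactional data: production is only read, never modified.
    rows = production.execute("SELECT id, customer, amount FROM orders").fetchall()
    sandbox.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    sandbox.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

    # Semi-structured web data: keep the raw JSON next to the field used for joins.
    sandbox.execute("CREATE TABLE web_events (customer TEXT, raw TEXT)")
    for event in json.loads(web_events_json):
        sandbox.execute("INSERT INTO web_events VALUES (?, ?)",
                        (event.get("customer"), json.dumps(event)))
    sandbox.commit()
    return sandbox


# A stand-in "production" database with a few orders.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
prod.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "ann", 20.0), (2, "bob", 5.0), (3, "ann", 7.5)])
prod.commit()

events = json.dumps([{"customer": "ann", "page": "/shoes"},
                     {"customer": "bob", "page": "/hats"},
                     {"customer": "ann", "page": "/bags"}])
sb = build_sandbox(prod, events)

# Inside the sandbox the analyst can create tables and run experiments freely.
sb.execute("CREATE TABLE spend AS SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer")
result = sb.execute(
    "SELECT s.customer, s.total, COUNT(w.raw) FROM spend s "
    "LEFT JOIN web_events w ON w.customer = s.customer "
    "GROUP BY s.customer, s.total ORDER BY s.customer"
).fetchall()
print(result)  # [('ann', 27.5, 2), ('bob', 5.0, 1)]
```

Because the derived `spend` table is created only in the sandbox, the production database still contains just its `orders` table, which is the point of keeping exploration out of the shared warehouse.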

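3. What is the difference between BI and Data Science?

Data science deals with predictive and prescriptive analysis, while BI deals with descriptive analysis. In other words, BI mainly answers "what happened?", whereas data science asks "what is likely to happen?" and "what should we do about it?". Other differentiating factors include scope, data integration, and skill set.

The two figures below show the difference between the two very clearly: BI leans toward description, while data science is predictive.
[Figure: image from the web]
[Figure: image from the web]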
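The descriptive-versus-predictive split can be shown on a tiny, made-up sales series. In this Python sketch, `describe` summarizes the past the way a BI report would, and `forecast_next` fits an ordinary least-squares straight line and extrapolates one month ahead. The data and function names are hypothetical, and a linear trend is simply the most basic predictive model, not a method prescribed by the course.

```python
from statistics import mean


def describe(history: list[float]) -> dict:
    """Descriptive (BI-style): summarize what already happened."""
    best = max(range(len(history)), key=history.__getitem__)
    return {"total": sum(history), "average": mean(history), "best_month": best + 1}


def forecast_next(history: list[float]) -> float:
    """Predictive (data-science-style): least-squares line y = a + b*x, evaluated at the next x.

    Needs at least two points; otherwise the slope's denominator is zero.
    """
    xs = range(len(history))
    x_bar, y_bar = mean(xs), mean(history)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(history)


monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical, oldest month first
print(describe(monthly_sales))       # {'total': 460.0, 'average': 115.0, 'best_month': 4}
print(forecast_next(monthly_sales))  # 140.0
```

Both functions read the same data; only the question differs. A real predictive project would also need far more history, a validated model, and an estimate of forecast error.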

4. Describe the challenges of the current analytical architecture for data scientists.

Traditional data architectures inhibit data exploration and more sophisticated analysis.

Clearly, the architect must be concerned with the data and how it can be handled. The toughest part for data scientists is analyzing the data carefully to find its flaws and identify issues, and then putting the data into a format that users can read more easily. Nowadays, machine learning and deep learning help avoid manual errors. Big data lets scientists reach a much larger amount of data, and data warehousing helps meet data challenges that cut across applications. Virtual data handling could be considered one of the best options for data scientists. Further exploration of the data and appropriate model selection help data scientists with the analysis, and explaining data science in business language is important.

5. What are the key skills and characteristics of a data scientist?
What do data scientists do?
– Reframe business challenges as analytical challenges
– Design, implement, and deploy statistical models and data mining techniques on Big Data
• This is mainly what people think of them
– Develop insights that lead to actionable recommendations to derive new business value

The main characteristics of data scientists are patience and curiosity. They have the skills to interpret data, extract the maximum information from it, and deal with vast amounts of raw data. They use various technologies and algorithms that help them perform the task more accurately and determine exact results. Analytical results help optimize costs and save labor. Data science is not a new field, but new discoveries are made every year, because great data scientists are always looking for alternative ways to solve problems. This includes searching for new and better ways to acquire and merge data, preprocess data and engineer features, or develop models and improve their performance to obtain accurate information. Although technical skills are important for a data scientist's success, many of these characteristics are inherent, and not everything can be taught in a class. They can be acquired, but it takes time and practice and requires an internal desire to learn new things and apply what has been learned.

6. How much data is involved in Big Data?

Big Data does not necessarily imply that terabytes (TB) of data need to be processed at any given time.
There is no size requirement and no threshold value for the amount of data.

Big data refers to the processing of a data stream that may never end. ----- as my teacher put it
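If the data is a stream that never ends, it can never be loaded all at once; it has to be processed incrementally with bounded memory. A minimal Python sketch of that idea uses Welford's online algorithm to keep a running mean and variance. The `sensor_readings` source is hypothetical (a seeded random generator standing in for, e.g., a sensor or click stream), and the example consumes only a finite prefix of it.

```python
import itertools
import random
from collections import deque
from collections.abc import Iterable, Iterator


def sensor_readings(seed: int = 0) -> Iterator[float]:
    """Hypothetical endless source, e.g. a temperature sensor; it never stops yielding."""
    rng = random.Random(seed)
    while True:
        yield rng.gauss(20.0, 2.0)


def running_stats(stream: Iterable[float]) -> Iterator[tuple[int, float, float]]:
    """Welford's online algorithm: O(1) memory, one update per incoming value.

    Yields (count, mean, sample variance) after every item, so results are
    available at any moment even if the stream never ends.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean, as Welford's method requires
        yield n, mean, (m2 / (n - 1) if n > 1 else 0.0)


# Take the statistics after the first 10,000 readings of the infinite stream;
# deque(maxlen=1) keeps only the latest result instead of storing all of them.
count, avg, var = deque(itertools.islice(running_stats(sensor_readings()), 10_000), maxlen=1)[0]
print(count, round(avg, 2), round(var, 2))
```

Memory use stays constant no matter how long the stream runs, which is what makes this style of computation suitable for data that "may never end".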
