Paper reading (24): Science and data science

盲人骑瞎马5555

Category: Paper Reading. Tags: data science

Article link: https://blog.csdn.net/wxw060709/article/details/101827993

Paper title: Science and data science

Scholar citations: 18

Pages: 4

Published: 2017.08

Venue: Proceedings of the National Academy of Sciences (PNAS)

Authors: David M. Blei and Padhraic Smyth

Keywords: data science, statistics, machine learning

Abstract:

Data science has attracted a lot of attention, promising to turn vast amounts of data into useful predictions
and insights. In this article, we ask why scientists should care about data science. To answer, we discuss
data science from three perspectives: statistical, computational, and human. Although each of the three is
a critical component of data science, we argue that the effective combination of all three components is
the essence of what data science is about.

Conclusion:

To solve real-world problems, a data scientist will need to undertake tasks that are beyond their traditional training.
Holistic data science requires that we understand the context of data, appreciate the responsibilities involved in using private and public data, and clearly communicate what a dataset can and cannot tell us about the world.

Introduction:

Data science is the child of statistics and computer science.
Genetic data can potentially aid researchers in studying the human genome, helping them understand how it evolves and how it governs observed traits.
Connecting genes and traits at large scale is a problem that is beyond the limits of classical genome analysis, both computationally and statistically.
Applying modern statistical and computational tools to modern scientific questions requires significant human judgment and deep disciplinary knowledge.

Paper outline:

1. Introduction

2. Statistical Perspective

3. Computational Perspective

4. Human Perspective

5. Summary

Excerpts from the main text:

2. Statistical Perspective

All datasets involve uncertainty.
Statistics relates to data science through multiple statistical subfields: complex and structured data, high dimensionality, and causality.
To handle high-dimensional data, statisticians and computer scientists have developed powerful methods involving robustness, regularization, and stability.
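
As a rough illustration of the regularization idea mentioned above (not code from the paper), the sketch below fits ridge regression on a toy problem with more features than observations. The function names `ridge_fit` and `l2_norm`, the data, and the step size are all assumptions made for this example. Without the penalty this problem has infinitely many least-squares solutions; the penalty `lam` picks one and shrinks it toward zero.

```python
import math

def ridge_fit(X, y, lam, lr=0.05, steps=2000):
    """Minimize ||X b - y||^2 + lam * ||b||^2 by plain gradient descent.

    Hypothetical helper for illustration; lr and steps are tuned to this toy data.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        # Residuals of the current fit, one per observation.
        residuals = [sum(b * x for b, x in zip(beta, row)) - yi
                     for row, yi in zip(X, y)]
        # Gradient of the penalized objective: 2 X^T r + 2 lam b.
        grad = [2.0 * sum(r * row[j] for r, row in zip(residuals, X))
                + 2.0 * lam * beta[j]
                for j in range(p)]
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta

def l2_norm(v):
    """Euclidean length of a coefficient vector."""
    return math.sqrt(sum(x * x for x in v))

# Toy data (made up): 2 observations, 3 features, so p > n.
X = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
y = [1.0, 2.0]

weak = ridge_fit(X, y, lam=0.1)     # light penalty
strong = ridge_fit(X, y, lam=10.0)  # heavy penalty
print(l2_norm(weak), l2_norm(strong))
```

The heavier penalty gives a coefficient vector with a smaller norm, which is the shrinkage that trades a little bias for stability when the data alone cannot pin down the coefficients.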

3. Computational Perspective

Computational thinking provides a way to understand and compare the computational footprints of different data analysis methods.
One well-known example of computational thinking revolves around optimization.
Another example of computational thinking is sampling methods. Sampling methods help compute approximate solutions of data analysis problems where the exact solutions are too complex for direct mathematical analysis. Examples include the bootstrap and, in Bayesian data analysis, Markov chain Monte Carlo (MCMC).
A final example of computational thinking is in scaling data analysis with distributed computing.
While statistical thinking offers a suite of methods for understanding data, computational thinking provides the crucial considerations of how to balance statistical accuracy with limited computational resources.
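
To make the sampling idea above concrete, here is a minimal sketch of a percentile bootstrap for the mean, using only the Python standard library. It is an illustration written for these notes, not code from the paper; the function name `bootstrap_ci`, the sample values, and the number of resamples are assumptions. The number of resamples is also the knob that trades statistical accuracy against compute time.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration: resample the data with replacement,
    recompute the statistic each time, and read off empirical quantiles.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]

# Made-up measurements, purely for demonstration.
sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.0, 3.0]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample):.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

No formula for the sampling distribution of the mean is needed here; the interval comes entirely from repeated computation, which is the point of the excerpt on sampling methods.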

4. Human Perspective

Data science also involves human work: understanding a problem domain, deciding which data to acquire and how to process it, exploring and visualizing the data, selecting appropriate statistical models and computational methods, and communicating the results of the analyses.
The human perspective reveals how aspects of the data analysis process, such as metadata, data provenance, data analysis workflows, and scientific reproducibility, are critical to modern scientific research.
