Paper reading (24): Science and data science

Paper title: Science and data science

Scholar citations: 18

Pages: 4

Published: 2017.08

Venue: Proceedings of the National Academy of Sciences (PNAS)

Authors: David M. Blei and Padhraic Smyth

Keywords: data science, statistics, machine learning

Abstract:

Data science has attracted a lot of attention, promising to turn vast amounts of data into useful predictions
and insights. In this article, we ask why scientists should care about data science. To answer, we discuss
data science from three perspectives: statistical, computational, and human. Although each of the three is
a critical component of data science, we argue that the effective combination of all three components is
the essence of what data science is about.

Conclusions:

  • To solve real-world problems, a data scientist will need to undertake tasks that are beyond their traditional training.
  • Holistic data science requires that we understand the context of data, appreciate the responsibilities involved in using private and public data, and clearly communicate what a dataset can and cannot tell us about the world.

Introduction:

  • Data science is the child of statistics and computer science.
  • Genetic data can potentially aid researchers in studying the human genome, helping them understand how it evolves and how it governs observed traits.
  • Connecting genes and traits at large scale is a problem that is beyond the limits of classical genome analysis, both computationally and statistically.
  • Applying modern statistical and computational tools to modern scientific questions requires significant human judgment and deep disciplinary knowledge.

Paper outline:

1. Introduction

2. Statistical Perspective

3. Computational Perspective

4. Human Perspective

5. Summary

Excerpts from the main text:

2. Statistical Perspective

  • All datasets involve uncertainty.
  • Statistics relates to data science through multiple statistical subfields: complex and structured data, high dimensionality, and causality.
  • To handle high-dimensional data, statisticians and computer scientists have developed powerful methods involving robustness, regularization, and stability.
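
To make the regularization point concrete, here is a minimal sketch, not taken from the paper, of ridge (L2-penalized) regression fitted by plain gradient descent on synthetic data with more features than samples. The function names, the penalty strength `lam`, the learning rate, and the data-generating setup are all assumptions chosen for illustration.

```python
import math
import random


def ridge_fit(X, y, lam, lr=0.01, steps=500):
    """Gradient descent on mean squared error + lam * ||w||^2 (illustrative only)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        # Residuals of the current linear fit: r_i = x_i . w - y_i
        residuals = [
            sum(wj * xj for wj, xj in zip(w, row)) - yi
            for row, yi in zip(X, y)
        ]
        for j in range(p):
            # Gradient of the squared-error term plus gradient of the L2 penalty
            grad = (2.0 / n) * sum(r * row[j] for r, row in zip(residuals, X))
            grad += 2.0 * lam * w[j]
            w[j] -= lr * grad
    return w


def l2_norm(w):
    return math.sqrt(sum(v * v for v in w))


# Synthetic high-dimensional data (assumed setup): 20 samples, 30 features,
# and only the first feature actually influences the response.
rng = random.Random(0)
n_samples, n_features = 20, 30
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] + rng.gauss(0, 0.5) for row in X]

w_plain = ridge_fit(X, y, lam=0.0)  # no penalty
w_ridge = ridge_fit(X, y, lam=1.0)  # L2 penalty shrinks the weights

print("norm without penalty:", l2_norm(w_plain))
print("norm with penalty:   ", l2_norm(w_ridge))
```

The penalty pulls every weight toward zero, which trades a little bias for lower variance when the number of features exceeds the number of samples.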

3. Computational Perspective

  • Computational thinking provides a way to understand and compare the computational footprints of different data analysis methods.
  • One well-known example of computational thinking revolves around optimization.
  • Another example of computational thinking is sampling methods. Sampling methods help compute approximate solutions to data analysis problems whose exact solutions are too complex for direct mathematical analysis. Examples include the bootstrap and, in Bayesian data analysis, Markov chain Monte Carlo (MCMC).
  • A final example of computational thinking is in scaling data analysis with distributed computing.
  • While statistical thinking offers a suite of methods for understanding data, computational thinking provides the crucial considerations of how to balance statistical accuracy with limited computational resources.
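
As an illustration of the sampling idea above, the following is a minimal sketch of a percentile bootstrap confidence interval. It is not code from the paper; the function name `bootstrap_ci`, its parameters, and the synthetic data are assumptions. The `n_resamples` parameter is where the accuracy-versus-compute trade-off shows up: more resamples give a more stable interval at a proportionally higher cost.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data) (illustrative only)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read the interval endpoints off the sorted bootstrap estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Synthetic measurements (assumed setup): 200 noisy draws around 10.
data_rng = random.Random(1)
data = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

sample_mean = statistics.mean(data)
ci_low, ci_high = bootstrap_ci(data)

print("sample mean:", sample_mean)
print("95% bootstrap interval:", (ci_low, ci_high))
```

No closed-form formula for the sampling distribution is needed here: the interval comes entirely from repeated resampling, which is why the method generalizes to statistics that are hard to analyze mathematically.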

4. Human Perspective

  • The human perspective covers the whole data analysis process: understanding a problem domain, deciding which data to acquire and how to process it, exploring and visualizing the data, selecting appropriate statistical models and computational methods, and communicating the results of the analyses.
  • The human perspective reveals how aspects of the data analysis process, such as metadata, data provenance, data analysis workflows, and scientific reproducibility, are critical to modern scientific research.