Paper reading (29): Big Data Biology: Between Eliminative Inferences and Exploratory Experiments

Paper title: Big Data Biology: Between Eliminative Inferences and Exploratory Experiments

Scholar citations: 17



Published in: Philosophy of Science

Author: Emanuele Ratti


Recently, biologists have argued that data-driven biology fosters a new scientific methodology, namely, one that is irreducible to the traditional methodologies of molecular biology, defined as the discovery strategies elucidated by mechanistic philosophy. Here I show how data-driven studies can be included in the traditional mechanistic approach in two respects. On the one hand, some studies provide eliminative inferential procedures to prioritize and develop mechanistic hypotheses. On the other, different studies play an exploratory role in providing useful generalizations to complement the procedure of prioritization. Overall, this article aims to shed light on the structure of contemporary research in molecular biology.


  • In this paper I have tried to make sense of the recent debate on the nature of big data studies in molecular biology.
  • I considered a recent proposal claiming that data-driven studies in biology are hybridized with traditional methodologies of mechanistic discovery.
  • Difference from traditional methodologies: the hybrid provides quasi-algorithmic procedures for following strategies of decomposition and localization.
  • I also find a place in contemporary biology for what I call "mining studies."
  • Molecular biology is undergoing a deep change, especially after the Human Genome Project.
  • It seems that the way biological systems are analyzed has not changed as dramatically as other aspects related to biological research.


  • ‘Data-driven’ research is understood as a ‘hypothesis-free’ methodology, while ‘traditional’ molecular biology methodologies are taken to be ‘hypothesis-driven’. The focal point of the debate is whether ‘data-driven’ research will replace ‘traditional’ molecular biology methodologies.
  • Question posed: is data-driven molecular biology a new approach to discovery in biology that mechanistic philosophy cannot account for? Answer: no.
  • Data-driven and hypothesis-driven research are not an either/or choice; the two need to be hybridized. Reason: data-driven strategies aim to generate hypotheses; hypotheses are developed and tested with procedures that are consistent with the epistemic program of ‘mechanistic philosophy’.
  • DDHD: the hybridization of data-driven and hypothesis-driven research.
  • DDHD faces two problems: the proposal is mostly informal; the proposal includes only a part of the studies that are labeled as data-driven.
  • The aim of this article: I show how DDHD is compatible with the broader approach of mechanistic philosophy; I argue that mining studies play an important role in DDHD.


1. Introduction

2. Data-Driven Research and Hypothesis-Driven Research

2.1 Establishing the Initial Universe of Hypotheses (conjectures, guesses made without evidence)

2.2 Eliminating Hypotheses

2.3 The Phase of Hypothesis (a supposition with some factual basis, not yet confirmed) Testing and Final Remarks on the Data-Driven/Hypothesis-Driven Opposition

3. The Role of Mining Studies in Contemporary Biology

3.1 The Structure of Mining Studies

3.2 What Mining Studies Are Not

        3.2.1 Mining Studies Are Not Driven by Eliminative Induction, and Background Assumptions Are Weaker than in DDHD

        3.2.2 Mining Studies Are Not Explanatory

3.3 Predictions Established by Mining Studies Are Generalizations Forming Additional Background Assumptions for DDHD

3.4 Mining Studies as Instances of Exploratory Experimentations

4. Conclusion


2. Data-Driven Research and Hypothesis-Driven Research

  • DDHD is constituted by three phases:
  1. formulation of an initial set of competing hypotheses;
  2. elimination of false (or less probable) hypotheses;
  3. test (validation) of hypotheses not eliminated in phase 2.
  • I think it is better to frame eliminative inferences as a process of prioritization of theories and hypotheses rather than theory choice.
  • one might frame DDHD phases as follows:
  1. the generation of a preliminary set of hypotheses from an established set of premises;
  2. the prioritization of some hypotheses and discarding of others by means of other premises and new evidence;
  3. the search for more stringent evidence for prioritized hypotheses.
  • in the initial universe of hypotheses there will be more than one true hypothesis.
  • it is worth emphasizing that the structure I have identified does not capture the structure of all big data studies.
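
The three phases above can be read as a small pipeline. The following is only an illustrative sketch, not the paper's own procedure: the `Hypothesis` fields, the gene names, the 0.05 threshold, and the testing budget are all hypothetical. It shows prioritization rather than theory choice, since several hypotheses survive and are ranked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A candidate claim such as 'gene X is involved in the phenotype'."""
    gene: str
    p_value: float             # hypothetical association statistic
    in_relevant_pathway: bool  # hypothetical biological annotation


def generate(candidates):
    """Phase 1: the initial universe of hypotheses, bounded by background assumptions."""
    return list(candidates)


def prioritize(hypotheses, alpha=0.05):
    """Phase 2: discard by a statistical criterion, then rank by a biological one."""
    survivors = [h for h in hypotheses if h.p_value < alpha]
    # Hypotheses with biological support come first; ties are broken by p-value.
    return sorted(survivors, key=lambda h: (not h.in_relevant_pathway, h.p_value))


def select_for_testing(ranked, budget=2):
    """Phase 3: pick the top-ranked hypotheses for more stringent tests."""
    return ranked[:budget]


universe = generate([
    Hypothesis("GENE_A", 0.001, True),
    Hypothesis("GENE_B", 0.20, True),
    Hypothesis("GENE_C", 0.01, False),
    Hypothesis("GENE_D", 0.03, True),
])
ranked = prioritize(universe)
to_test = select_for_testing(ranked)
print([h.gene for h in ranked])   # ['GENE_A', 'GENE_D', 'GENE_C']
print([h.gene for h in to_test])  # ['GENE_A', 'GENE_D']
```

Note that `GENE_C` is not declared false; it is only given lower priority, which matches the idea that the initial universe may contain more than one true hypothesis.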

2.1 Establishing the Initial Universe of Hypotheses (conjectures, guesses made without evidence)

  • The first point to note is that phase 1(formulation of an initial set of competing hypotheses) is not a hypothesis-free step.
  • data-driven research must make use of hypotheses of various kinds. I call these hypotheses “background assumptions.”
  • There are two kinds of background assumptions:
  1. The first set of assumptions concerns the particular scientific problem that motivates the research.
  2. Next, there is a second loose guideline, that is, a tentative solution to the problem.
  • Background assumptions of DDHD not only suggest relevant data but also draw the boundaries of an initial set of abstract hypotheses.
  • a scientific question and tentative answers prescribe facts of interest.

2.2 Eliminating Hypotheses

  • There are two types of criteria for eliminating hypotheses:
  1. The first is based on a statistical analysis.
  2. The second criterion is employed not only to eliminate other hypotheses but also to develop more probable guesses. This criterion is biologically driven.

2.3 The Phase of Hypothesis (a supposition with some factual basis, not yet confirmed) Testing and Final Remarks on the Data-Driven/Hypothesis-Driven Opposition

  • DDHD employs a discovery procedure that is compatible with the ones depicted by mechanistic philosophy.
  • In molecular biology, elucidating mechanisms provides both explanation and prediction. In most cases, explanation and description are equated.
  • knowing how a phenomenon is produced might provide clues on (1) what we should expect if we modify one of the components of the mechanisms or (2) under which conditions we should expect that the mechanisms would be in place
  • it should be uncontroversial that DDHD is compatible with the epistemic perspective embraced by mechanistic philosophers.
  • The interesting part is that DDHD provides a set of mechanical procedures to go through decomposition and localization.
  • there is not an opposition between data-driven and hypothesis-driven approaches.
  • the claim that contemporary biology uses a hybrid of the two (DDHD) is corroborated by actual scientific practices.

3. The Role of Mining Studies in Contemporary Biology

  • Terms such as ‘comparative analyses’, ‘system-level characterizations’, and ‘emerging landscapes’ have become keywords.
  • Despite the increasing number of these studies, their purpose is not clear.

3.1 The Structure of Mining Studies

  1. Scientists look for associations between different metadata labels in order to uncover macroregularities.
  2. Macroregularities are, strictly speaking, predictions in the sense that they provide an expectation of what is likely to be observed in similar contexts.
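
As a rough illustration of what looking for associations between metadata labels can mean in practice, the sketch below counts label co-occurrences across annotated samples and keeps the pairs that co-occur more often than chance would suggest. The samples, the labels, and the use of "lift" as the association measure are assumptions made for this example; the paper does not prescribe a particular measure. Associations can only appear among labels that the taxonomy already defines.

```python
from collections import Counter
from itertools import combinations

# Hypothetical samples, each annotated with labels from a fixed taxonomy.
samples = [
    {"tissue:lung", "mutation:TP53", "stage:late"},
    {"tissue:lung", "mutation:TP53", "stage:early"},
    {"tissue:colon", "mutation:APC", "stage:late"},
    {"tissue:lung", "mutation:TP53", "stage:late"},
    {"tissue:colon", "mutation:APC", "stage:early"},
    {"tissue:colon", "mutation:TP53", "stage:late"},
]


def label_associations(samples, min_lift=1.5, min_count=2):
    """Return label pairs whose co-occurrence exceeds what independence predicts.

    lift = P(a and b) / (P(a) * P(b)); a value above 1 means the two labels
    appear together more often than expected if they were unrelated.
    """
    n = len(samples)
    single = Counter(label for s in samples for label in s)
    pair = Counter(
        frozenset(p) for s in samples for p in combinations(sorted(s), 2)
    )
    results = {}
    for labels, count in pair.items():
        if count < min_count:
            continue  # too few observations to count as a regularity
        a, b = sorted(labels)
        lift = count * n / (single[a] * single[b])
        if lift >= min_lift:
            results[(a, b)] = lift
    return results


associations = label_associations(samples)
for pair_labels, lift in sorted(associations.items()):
    print(pair_labels, lift)
```

Each surviving pair works as a prediction in the sense used above: in a similar sample carrying one label, the other label is more likely to be observed. Nothing in the output says why the labels go together.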

3.2 What Mining Studies Are Not

        3.2.1 Mining Studies Are Not Driven by Eliminative Induction, and Background Assumptions Are Weaker than in DDHD

  • Background assumptions of mining studies include:
  1. the theoretical basis of the computational tools used to identify associations;
  2. the fact that pattern discovery is metadata-laden, that is, it is possible to find associations only within the categories of a preexisting taxonomy system x (e.g., gene ontology).
  • Background assumptions in mining studies provide weaker guidelines than DDHD.
  • Moreover, the tentative answer to this broad problem is not sufficiently specific.

        3.2.2 Mining Studies Are Not Explanatory

  • In mining studies, the ideal of discovering mechanisms plays no role. What mining studies provide are predictions or generalizations.

3.3 Predictions Established by Mining Studies Are Generalizations Forming Additional Background Assumptions for DDHD

  • Predictions may also be seen as generalizations.
  • generalizations inferred from mining studies might provide some of the eliminative principles used to narrow the universe of hypotheses elaborated in phases 1 and 2 of DDHD.
  • generalizations drawn through mining studies provide new eliminative principles or complement existing ones.

3.4 Mining Studies as Instances of Exploratory Experimentations

  • the goal of mining studies is to obtain patterns of data, extract generalizations, and elaborate new classificatory frameworks.
  • so-called big data biology is still in its infancy, and only recently have scientists started to uncover preliminary generalizations.






