A brief history of bioinformatics: notes on the Briefings in Bioinformatics review

cling5899

Last modified 2022-04-02 13:08:33


First published 2022-03-22 10:01:29

Article link: https://blog.csdn.net/dunghill_cock/article/details/123652801


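This article reviews the history of bioinformatics from the 1950s to the present, including the shift from early protein analysis to DNA sequence research. It covers the contributions of Margaret Dayhoff, the first bioinformatician, and major advances in DNA sequence analysis, gene cloning, PCR, genomics and structural bioinformatics. As computer science developed, bioinformatics tools and methods evolved from the early COMPROTEIN software to modern high-throughput sequencing and biological big data processing.

(Summary generated automatically by CSDN.)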

Original paper: A brief history of bioinformatics

DOI: 10.1093/bib/bby063

Reference link

A brief history of bioinformatics (生信自学网)

A brief history of bioinformatics
1950–1970: The origins
Protein analysis was the starting point
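The application of bioinformatics to DNA lagged behind protein research by nearly 20 years, because in the early 1950s proteins were widely believed to be the carriers of genetic information.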
Dayhoff: the first bioinformatician
Margaret Dayhoff (1925-1983) was an American physical chemist who pioneered the application of computational methods to biochemistry.
COMPROTEIN: the first bioinformatics software
COMPROTEIN was 'a complete computer program for the IBM 7090' designed to determine protein primary structure using Edman peptide sequencing data.
It was written in FORTRAN on punch cards and ran entirely on the IBM 7090 mainframe.
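The three-letter amino acid abbreviations were later replaced by a one-letter code, which led to the 1965 Atlas of Protein Sequence and Structure, the first biological sequence database ever.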
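The move from three-letter to one-letter amino acid codes can be illustrated with a small lookup table. This is only a sketch: the mapping is the standard one for the 20 common amino acids, while the function name `to_one_letter` and the hyphen-separated input format are assumptions made for the example.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Ala-Gly' to 'MAG' (hypothetical helper)."""
    # capitalize() normalizes 'MET' or 'met' to 'Met' before the lookup.
    return "".join(THREE_TO_ONE[res.capitalize()] for res in peptide.split(sep))


if __name__ == "__main__":
    print(to_one_letter("Met-Ala-Gly-Trp"))
```

The one-letter form is a third of the length, which mattered when sequences were stored on punch cards.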
The computer-assisted genealogy of life
In 1970, Needleman and Wunsch developed the first dynamic programming algorithm for pairwise protein sequence alignment.
Early attempts to generalize it to multiple sequence alignment (MSA) were of little practical value.
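A pairwise global alignment by dynamic programming can be sketched as follows. This uses the now-common textbook formulation with a linear gap penalty, which is an assumption and not necessarily the exact scoring of the 1970 paper; the scoring values (`match`, `mismatch`, `gap`) are likewise illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths; generalizing the same table to N sequences needs N dimensions, which is why that route to MSA scales so poorly.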
A mathematical framework for amino acid substitutions
In 1978, Dayhoff, Schwartz and Orcutt [34] contributed to another bioinformatics milestone by developing the first probabilistic model of amino acid substitutions.
1970–1980: Paradigm shift from protein to DNA analysis
Deciphering of the DNA language: the genetic code
By 1968, all 64 codons had been deciphered. Once DNA had become readable information, the need arose to obtain DNA sequences quickly.
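The 64 codons can be enumerated and used to translate a DNA sequence in a few lines. This is a minimal sketch using the standard genetic code; the function name `translate` and the choice to ignore trailing bases that do not fill a codon are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest: TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(len(CODON_TABLE), translate("ATGGCCTAA"))
```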
Cost-efficient reading of DNA
The Maxam–Gilbert sequencing method (1976) was the first DNA sequencing method, but its use of radioactivity and hazardous chemicals largely led to its abandonment.
1977: Frederick Sanger's 'plus and minus' DNA sequencing method.
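The first software dedicated to analyzing Sanger sequencing reads was published by Roger Staden in 1979. It could:
search for overlaps between Sanger gel readings;
verify, edit and join sequence reads into contigs;
annotate and manipulate sequence files.
It was also the first sequence analysis software to include additional characters (which Staden called 'uncertainty codes') to record uncertain base calls in sequence reads.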
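The overlap search and joining of reads into contigs listed above can be sketched with a simple suffix-prefix comparison. This is an illustration of the general idea only, not Staden's actual method; the function names and the `min_overlap` threshold are hypothetical, and exact matching is assumed (no sequencing errors or uncertainty codes).

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and stop at the first hit.
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3):
    """Join two reads into one contig, or return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    return left + right[k:] if k else None


if __name__ == "__main__":
    print(merge_reads("ATGCGTAC", "GTACCTGA"))
```

Requiring a minimum overlap guards against joining reads that share only a base or two by chance.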
Using DNA sequences in phylogenetic inference
Felsenstein (1973, 1979) was the first to develop a maximum likelihood (ML) method to infer phylogenetic trees from DNA sequences.
1980–1990: Parallel advances in biology and computer science
Molecular methods to target and amplify specific genes
Initially, unlike proteins and RNA, genes could not be biochemically fractionated and then sequenced individually.
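Jackson, Symons and Berg (1972) used restriction endonucleases and DNA ligase to cut and insert the circular SV40 viral DNA into lambda DNA.
Cloning in this way yields millions of copies of a single DNA insert.
The second milestone in manipulating DNA was the polymerase chain reaction (PCR), which allows DNA to be amplified without cloning procedures.
PCR was first described by Kjell Kleppe et al. in 1971; Kary Mullis is credited with inventing it, using Taq polymerase.
Gene cloning and PCR are now routinely used in DNA library preparation, which is essential for obtaining sequence data.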
Access to computers and specialized software
Smaller computers entered the market.
In 1984, the University of Wisconsin Genetics Computer Group published the eponymous 'GCG' software suite. The GCG package was a collection of 33 command-line tools to manipulate DNA, RNA or protein sequences. It was the first software collection developed for sequence analysis.
Bioinformatics and the free software movement
In 1985, Richard Stallman published the GNU Manifesto, which outlined his motivation for creating a free Unix-like operating system called GNU (GNU's Not Unix).
A journal specialized in bioinformatics, Computer Applications in the Biosciences (CABIOS), was established in 1985; it is now named Bioinformatics.
Desktop computers and new programming languages
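Larry Wall created Perl (Practical Extraction and Report Language) in 1987. It is a high-level, multi-paradigm, interpreted scripting language. Development of BioPerl followed in 1996.
Perl's flexibility, coupled with its heavily punctuated syntax, can easily result in low code readability, which makes Perl code difficult to maintain.
Guido van Rossum created Python in 1989.
Python's simpler syntax makes code easier to read and maintain.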
1990–2000: Genomics, structural bioinformatics and the information superhighway
Dawn of the genomics era
In 1995, the first complete genome of a free-living organism (Haemophilus influenzae) was sequenced by The Institute for Genomic Research (TIGR), led by geneticist J. Craig Venter.
The Human Genome Project was initiated by the US National Institutes of Health (NIH) in 1991. In 1998, the private company Celera Genomics started its own effort to sequence the human genome.
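At that time, sequencing reads still had to be produced on Sanger capillary sequencers. The maximum throughput was only 96 reads of 800 bp per run, several orders of magnitude lower than second-generation sequencers. Sequencing the human genome (3.0 Gbp) required about 40 000 runs to reach 1x coverage.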
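The figure of roughly 40 000 runs follows directly from the numbers in the text: 96 reads of 800 bp give 76 800 bp per run, and 3.0 Gbp divided by that is about 39 000. A small sketch of the arithmetic, with a hypothetical helper name `runs_needed`:

```python
import math


def runs_needed(genome_bp: int, reads_per_run: int, read_length_bp: int,
                coverage: float = 1.0) -> int:
    """Number of sequencing runs needed to reach the given average coverage."""
    bases_per_run = reads_per_run * read_length_bp
    # Round up: a partial run still has to be performed in full.
    return math.ceil(genome_bp * coverage / bases_per_run)


if __name__ == "__main__":
    # Sanger capillary sequencing of a 3.0 Gbp genome at 1x coverage.
    print(runs_needed(3_000_000_000, 96, 800))
```

This assumes every base of every read is usable and that reads never overlap, so it is a lower bound rather than a realistic project estimate.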
In the mid-to-late 1990s, several pioneering Perl-based programs were developed to assemble whole-genome sequencing reads.
Bioinformatics online
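In 1993, the world's first nucleotide sequence database, the EMBL Nucleotide Sequence Data Library, was made available on the web.
In 1992, the GenBank database also became one of NCBI's main responsibilities.
NCBI began offering online services in 1994 and then established several major databases that are still in use today: Genomes (1995), PubMed (1997) and Human Genome (1999).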
Beyond sequence analysis: structural bioinformatics
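The first three-dimensional structure of a protein, that of myoglobin, was determined in 1958 by X-ray diffraction. However, the first real milestone in protein structure prediction came in 1951, when Pauling and Corey published two articles predicting the α-helix and the β-sheet. Today, computers can be used to predict the secondary and tertiary structures of proteins.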
2000–2010: High-throughput bioinformatics
Second-generation sequencing
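DNA sequencing was democratized by the advent of second-generation sequencing (also called next-generation sequencing, or NGS). It began with 454 pyrosequencing, which allowed thousands to millions of DNA molecules to be sequenced on a single machine. Newbler, the gold-standard tool for handling 454 data, was maintained by Roche until the 454 platform was phased out in 2016.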
Biological Big Data
High-performance bioinformatics and collaborative computing
The importance of high-performance computing has also led companies such as Amazon and Microsoft to offer bioinformatics services.
2010–Today: Present and future perspectives
Clearly defining the bioinformatician profession
Towards modeling life as a whole: systems biology
Computational modeling of whole organisms and their environments, taking all classes of molecules into account at once.