A Comparison of Oracle Berkeley DB and
Relational Database Management Systems
Berkeley DB 和关系型数据库的对比
INTRODUCTION
介绍
Oracle is well known for its industry-leading database engine, Oracle Database.
Oracle Database is an extremely reliable, highly scalable, client-server, relational
database management system (RDBMS). It serves as the information repository for
a wide range of applications in a large variety of industries around the world.
Oracle 以其领先的工业级数据库引擎而著称,Oracle 数据库是一种非常可靠的,高速发展的关系数据库管理系统,在全世界很多工业化应用中,Oracle 为广大应用程序提供了信息仓储服务。
In addition to Oracle Database, the company offers a number of other database
products for systems and applications that have special performance, resource or
run-time requirements. Oracle’s TimesTen database is a high-performance inmemory
engine that provides relational database services with outstanding
responsiveness and throughput. The Oracle Lite database serves the mobile and
device markets, providing easy-to-use storage with synchronization services that
allow occasionally-connected devices to share information with a centralized
Oracle Database server. Oracle Berkeley DB is the company’s only non-relational
storage engine, and is aimed at applications and devices that need fast local storage
without using SQL for data access.
除了Oracle数据库之外,公司也提供了大量的数据库产品,针对存在特殊的平台, 资源,时间性能的需求。Oracle’s TimesTen 数据库拥有杰出的响应速度和系统吞吐量。Oracle Lite 数据库服务于移动和驱动市场, 提供了简单易用的存储,并且支持允许发生偶然连接用于共享 集中Oracle数据库服务器的同步数据服务。Oracle Berkeley DB是公司仅有的非关系数存储引擎, 旨在解决本地数据在没有使用SQL进行数据访问的情况下满足快速的存储需求。
Because Oracle Berkeley DB is non-relational, it is very different from Oracle’s
other database products. Users familiar with Oracle Database are often surprised
by Berkeley DB, and wonder how and where it can best be used. This paper
explores both the similarities and the differences between RDBMS products and
Berkeley DB, in order to help customers decide when Berkeley DB is, and is not, a
good choice for the system that they are building.
由于 Oracle Berkeley DB是非关系型的, 它很土同于Oracle公司的其他数据库产品,熟悉Oracle数据的用户经常对Berkeley DB感到惊奇,疑惑如何,怎样使用它,这篇文章探索了RDBMS产品与Berkeley DB之间的相似和不同之处,从而帮助客户决定何时能够对Berkeley DB作出选择,何时不能选择,在我们创建的系统中。
WHAT IS BERKELEY DB?
Berkeley DB 是什么?
Berkeley DB is a high performance, scalable and reliable non-relational storage
system. An outgrowth of research performed initially at the University of
California at Berkeley , and later continued at Harvard University , Berkeley DB was
designed from the very beginning to be an embeddable database engine, so no
separate server is required, and no runtime human administration is needed.
Berkeley DB 是一种具有高效,快速升级,高可靠性等众多优点的非关系型存储系统。 最开始的研究成功来之 加州大学伯克利分校。后来在哈佛大学得到了发展,一开始 Berkeley DB就被设计成一种嵌入式的数据库引擎, 因此不需要单独的服务器,在线管理员也不需要。
Philosophically, Berkeley DB is a tool for software developers, not for IT
professionals or DBAs. It is intended to provide fast, reliable data storage for
applications. The only way to use Berkeley DB is to write code. There is no
standalone server and no SQL query tool for working with Berkeley DB databases.
Developers often find it easiest to think of Berkeley DB not as a database system,
but rather as a library with a good B-tree, hash table and persistent queue for
storing records in an application. That is a reasonable simplification. Unlike simple
B-trees or hash tables, though, Berkeley DB offers enterprise-grade, concurrent,
transactional storage services, so that multiple threads or processes can operate on
the same collection of B-trees and hash tables at the same time without risk of data
corruption or loss.
从哲学上说,Berkeley DB是针对软件开发人员的一种工具,而不是为IT专业人员或数据库管理员,它更倾向于为应用程序提供快速,可靠的数据存储。 使用Berkeley DB的唯一方法就是编写代码,没有长久的服务器和SQL 查询工具当使用Berkeley DB 存出数据记录的时候。 开发者经常发现把Berkeley DB 想象成一种具有优秀B-Tree,hash table 和persistent queue的库 比 当成一个数据库系统更加简单、那是很合理的简化,不像简单的B-Trees或hash tables, 然而Berkeley DB提供了企业级,支持并发,事务型存储服务,因此在B-Trees 和hash tables上的多线程或多个进程同时操作不会引起数据丢失或冲突。
The Oracle Berkeley DB product line consists of three different embeddable
offerings.
Oracle Berleley DB产品线包含了三个可嵌入的组件。
In the lower left corner, the product named Berkeley DB is an embeddable library
written in the C programming language. It supports different on-disk storage
structures, provides full transactional storage services and offers replication for
fault tolerance and high availability. The product supports C, C++ and Java
language access, as well as a wide variety of popular scripting languages.
在下脚左边,名为 Berkeley DB 的产品是一个可嵌入的库,该库使用C语言完成,它支持不同的硬盘存储结构,提供了完全的事务存储服务和对容错的复制和可用性, 这个产品支持C,C++和JAVA 访问,也支持许多流行脚本语言。
Developers building XML-based applications can choose the Berkeley DB XML
product, layered on top of the C engine. Berkeley DB XML understands XML
schemas, is able to parse and index XML documents, and uses XPATH and
XQuery for data retrieval. It supports C++, Java and a variety of popular scripting
languages.
开发者创建基于XML的应用程序能选择Berkeley DB XML 这个产品,处于C 引擎的顶端,Berkeley DB XML支持XML模式,能够解析和索引XML文档,也可以使用XPATH和XQUERY进行数据查询, 它支持C++ ,Java 和大量流行脚本语言。
Java developers that want the same embeddable storage services as C programmers
get, but who need to deploy a 100% pure Java system, can choose Berkeley DB
Java Edition. This embeddable engine runs as a single JAR in a Java Virtual
Machine. It provides the same transactional storage and retrieval services as the C
product, but is implemented entirely in Java.
那些想要和C程序员具有相同的嵌入式存储服务的Java开发人员,当他们想要开发一个100%的纯Java系统时, 他们可以选择Berkeley DB Java Edition, 他和个嵌入式引擎以一个JAR文件在Java虚拟机上运行,它提供相同的事务存储和检索服务 就像C产品一样, 但完全在Java环境下执行。
CHOOSING THE RIGHT TOOL FOR THE JOB
为你的工作选择正确工具
Sometimes, it is easy to decide how to store data in an application.
有时候,在应用程序中决定怎样存储数据是很容易的。
For example, large accounting or customer support systems often come already
integrated with a relational database product. These systems use the relational
engine to store and retrieve the data they manage, and allow administrators and
others to query and manipulate that data separately, using SQL and SQL-based
reporting tools.
例如,大账户或客户支持系统经常和关系型数据库产品相集成,那些系统使用关系型数据库引擎存储或检索他们所管理的数据, 并且允许管理员和其他人单独查询和操作数据,使用SQL,或基于SQL的报告工具。
In other cases, a simple file system is all that is needed. Word processing
applications, spreadsheets and software development tools like editors and
compilers read from, and write to, the file system. Users of these applications work
with files by remembering what folders they appear in, and what they are named.
No more elaborate query services are required.
然而,有时应用程序不需要所有的关系型数据库引擎,但一些低级的文件系统又达不到要求, 在这种情况下,RDBMS是冗余的,它提供了太多功能,并且太大并且复杂,对比起来,文件系统又显得力不从心,它的功能太少,并缺乏数据库系统所拥有的完整,并发和性能优势。、
Berkeley DB is aimed squarely at that middle ground.
Berkeley DB 正好处于中间状态。
Important Services
重要的服务
When choosing a repository for a new application or service, an architect must
think about the data storage and retrieval services that the application requires.
当为一个应用程序或服务选择知识库的时候,架构师必须考虑所需要的数据存储和检索服务。
Storage
存储
File systems offer fairly rudimentary storage services, as a rule. They allow
applications to read or write any kind of data, but make no guarantees about how
simultaneous updates to the same region of a single file are handled. Applications
that need to make several interdependent changes to different files – for example,
adding a password file entry for a new user, creating a home directory and
allocating a mailbox – get no help from the file system, and must handle that
interdependency themselves. If the system loses power or suffers a software crash
in the middle of a change to a file, most file systems make no promises about what
data will survive after restarting.
文件系统提供很基本的存储服务,他们允许应用程序读写任何数据,但当该数据被访问的时候,不能保证同步更新,需要对几个不同文件创建互相依赖的变化,例如为一个用户添加密码文件, 创建主目录和分配一个邮箱- 文件系统将不能奏效,不需访问互相依赖的文件自身,如果系统对现存的操作失去控制或软件出现异常,大部分文件系统不能保证重启之后那些数据仍然存在。
Most relational databases, by contrast, offer much richer storage services.
Competing updates to the same information in a table are handled according to
simple rules that guarantee the correct outcome. If an application needs to make
several related changes to different tables, it simply starts a transaction, makes the
changes, and commits the transaction. The database system guarantees that none
of the changes will be permanent until the transaction commits, but once it
commits, the changes are all guaranteed to be present, even after a system crash
and restart. Most relational systems do impose strict rules about the kinds of
information they will store – applications must turn their data into records in
tables, and must translate from between the application’s internal representation,
and the database representation, at runtime.
相对照,大部分关系型数据库提供了更丰富的存储服务,完成对一张表中的相同信息进行更新,通过简单的规则就可以保证正确的结果。如果应用程序需要更新多个不同的表,仅仅需要创建一个事务,更新数据库,提交事务,数据库系统就可以保证直到事务结束后才接收其他数据更新,然而,一旦更新提交了,新结果保证可以被呈现,甚至系统出现瘫痪掉或重启后,大部分关系型系统对它们存储的信息强加一些严格的规则-应用程序必须把数据转换到表中, 在运行时,应用程序必须在程序内部展现和数据库展现中做出翻译/转换。
Queries
查询
File systems organize information by filename. Files are stored in folders. In order
to use a file, the application must know the name of the file, and of the (likely
nested) folder in which it is stored. Of course applications exist that can search a
file system for particular file contents, owners and so on, the only built-in query
facility in most file systems is name-based lookup.
文件系统以文件名组织信息,文件存储在文件夹中,为了使用文件,应用程序必须知道文件的名称和文件路径,当然应用程序可以用文件名方式来查询文件内容。
Relational systems allow applications and users to search for information by typing
queries in SQL. These queries can examine the contents of individual records in
individual tables, and can combine information from different tables easily. Table
names are important, but otherwise, a query in a relational system describes the data
it wants instead of naming it.
关系型数据库系统允许应用程序和用户使用SQL进行数据查询,查询能够在每个表中的每个字段上进行,也可以在不同的表中合并数据记录,表名是不同的,除此之外,在关系型数据库中的查询描述他想要的数据而不是为它命名。
Relational systems, by virtue of SQL, provide an important kind of flexibility: They
allow arbitrary queries in the future. In a file system, a file name is fixed and is the
only way to look up a file. In a relational system, any combination of columns in
any combination of tables can be used to search for, and to combine, data. Rather
than prescribing search criteria in advance, relational systems support ad hoc
queries.
关系型系统,在SQL的帮助下,提供了一种重要的特性:他们支持任意可预见的查询,在文件系统中,文件名是固定的,文件名也是查询一个文件的仅有方式,在关系型系统中,不管是不是来自同一个表,任意列都可以用来查询,也可以组合成数据,比起预先指定查询标准,关系型系统支持多表联合查询(ad boc)
Berkeley DB
Berkeley DB offers many of the storage services – transactions, concurrency, and
recovery – of relational database engines. It is closer to a file system in that it stores
any kind of data in any format, but provides no easy tools for ad hoc queries.
Some applications need transactions and recovery, but not the ability to support
arbitrary end-user searches against the data they store. For those applications,
Berkeley DB is often a better choice than either the file system or a relational
database engine. Berkeley DB combines the simplicity of file system-style data
storage and retrieval with the enterprise-grade scalability, reliability and
transactional guarantees of a high-end relational system like Oracle Database.
Berkeley DB 提供许多存储服务- 事务,并发,恢复,这些关系型数据库引擎,更贴切的说它和文件系统很相近,因为可以任一种格式存储数据,然而没有提供很简单的工具在ad boc查询的时候。
开发人员必须编写代码在Berkeley DB中进行那个数据查询。
一些应用程序需要事务和数据恢复,但不支持任意的最终用户在数据库再次查询数据, 对于这些应用来说,Berkeley DB是一种不错的选择。Berkeley DB 组合了文件系统的简单特性和像Oracle等高端关系库的复杂,高级特性。
KEY DIFFERENCES
主要区别
Berkeley DB is different from relational systems in several important ways.
Berkeley DB不同于关系型系统在以下几点。
Berkeley DB Is a Library
Berkeley DB是一个库
Most enterprise-grade relational database engines are heavyweight servers. They are
often installed on a special purpose machine somewhere in the data center.
Applications that want to store data must connect, usually over a network, to the
server, and must read and write data remotely.
大多数企业级关系型数据库都是重量级的服务器。他们被安装在数据中心的特殊目标机器上, 想去存储数据的应用程序必须连接,通过网络访问服务器,并且只能远程读写数据。
Berkeley DB, by contrast, is a library. Each of the three Berkeley DB products
links into the same address space as the application that uses it. If multiple
applications want to share a single database, they can do so. Each links the
Berkeley DB code into its address space, and Berkeley DB uses shared memory
and operating system primitives for mutual exclusion to be sure that each thread of
control cooperates with the others.
对比起来,Berkeley DB 只是一个库,三个Berkeley DB产品中的每个都连接到应用程序使用的地址空间,如果多个应用程序想要分享单独的数据库,他们可以这样做,共享内存和操作系统而不会产生冲突。