One Size Does Not Fit All

Last week AWS announced the Amazon Relational Database Service (Amazon RDS) and I blogged that it was a big step forward for the cloud storage world: Amazon RDS, More Memory, and Lower Prices. This really is an important step forward in that a huge percentage of commercial applications are written to depend upon relational databases. But I was a bit surprised to get a couple of notes asking about the status of SimpleDB and whether the new service was a replacement. These questions were perhaps best characterized by the forum thread The End is Nigh for SimpleDB. I can understand why some might conclude that just having a relational database would be sufficient, but the world of structured storage extends far beyond relational systems. In essence, one size does not fit all and both SimpleDB and RDS are important components in addressing the needs of the broader database market.

Relational databases have become so ubiquitous that the term “database” is often treated as synonymous with relational databases like Oracle, SQL Server, MySQL, or DB2. However, the term preceded the invention and implementation of the relational model, and non-relational data stores remain important today.

 

Relational databases are incredibly rich and able to support a very broad class of applications, but with incredible breadth comes significant complexity. Many applications don’t need the rich programming model of relational systems and some applications are better served by lighter-weight, easier-to-administer, and easier-to-scale solutions. Both relational and non-relational structured storage systems are important and no single solution is appropriate for all applications. I’ll refer to this broader, beyond-relational database market as “structured storage” to differentiate it from file stores and blob stores.

 

There are a near infinite number of different taxonomies for the structured storage market, but one I find useful is a simple one based upon customer intent: 1) feature-first, 2) scale-first, 3) simple structured storage, and 4) purpose-optimized stores. In the discussion that follows, I assume that no database would ever be considered viable that wasn’t secure and didn’t maintain data integrity. These are base requirements of any reasonable solution.

 

Feature-First

The feature-first segment is perhaps the simplest to talk about in that there is near universal agreement. After 35 to 40 years, depending upon how you count, Relational Database Management Systems (RDBMSs) are the structured storage system of choice when a feature-rich solution is needed. Common Feature-First workloads are enterprise financial systems, human resources systems, and customer relationship management systems. In even very large enterprises, a single database instance can often support the entire workload and nearly all of these workloads are hosted on non-sharded relational database management systems.

 

Examples of products that meet this objective well include Oracle, SQL Server, DB2, MySQL, and PostgreSQL, amongst others. And the Amazon Relational Database Service announced last week is a good example of a cloud-based solution. Generally, the feature-first segment uses RDBMSs.

 

Scale-First

The Scale-first segment is considerably less clear and the source of much more debate. Scale-first applications are those that absolutely must scale without bound, and being able to do this without restriction is much more important than more features. These applications are exemplified by very high scale web sites such as Facebook, MySpace, Gmail, Yahoo, and Amazon.com. Some of these sites actually do make use of relational databases but many do not. The common theme across all of these services is that scale is more important than features and none of them could possibly run on a single RDBMS. As soon as a single RDBMS instance won’t handle the workload, there are two broad possibilities: 1) shard the application data over a large number of RDBMS systems, or 2) use a highly scalable key-value store.

 

Looking first at sharding over multiple RDBMS instances, this model requires that the programming model be significantly constrained to not expect cross-database instance joins, aggregations, globally unique secondary indexes, global stored procedures, and all the other relational database features that are incredibly hard to scale. Effectively, in this first usage mode, an RDBMS is being used as the implementation but the full relational model is not being exposed to the developer since the full model is incredibly difficult to scale. In this approach, the data is sharded over 10s or even 100s of independent database instances. The Windows Live Messenger group store is an excellent example of the Sharded RDBMS model of Scale-First.
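To make the sharded RDBMS model concrete, here is a minimal sketch in Python. It is an illustration only, not how any of the sites mentioned actually implement sharding. The ShardedStore class, the contacts table, and the use of in-memory SQLite databases as stand-ins for independent RDBMS instances are all assumptions made for the example. It shows single-key operations being routed to exactly one instance, while anything spanning instances (here, a global count) has to be assembled by the application.

```python
import hashlib
import sqlite3


class ShardedStore:
    """Hypothetical example: routes each owner's rows to one of several
    independent database instances."""

    def __init__(self, shard_count):
        # Assumption: in-memory SQLite databases stand in for separate
        # RDBMS instances that share nothing with each other.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE contacts (owner TEXT, contact TEXT)")

    def shard_for(self, owner):
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha1(owner.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def add_contact(self, owner, contact):
        db = self.shards[self.shard_for(owner)]
        db.execute("INSERT INTO contacts VALUES (?, ?)", (owner, contact))

    def contacts_of(self, owner):
        # Single-key lookups touch exactly one instance.
        db = self.shards[self.shard_for(owner)]
        rows = db.execute(
            "SELECT contact FROM contacts WHERE owner = ? ORDER BY contact",
            (owner,),
        )
        return [row[0] for row in rows]

    def total_contacts(self):
        # No instance can answer this alone: the application queries every
        # shard and combines the partial results itself.
        return sum(
            db.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
            for db in self.shards
        )
```

Note that the simple modulo placement used here would move most keys if the shard count changed. A real deployment would need some form of directory or consistent hashing, which this sketch leaves out.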

There may be some that will jump in and say that DB2 Parallel Edition (DB2 PE, now part of the DB2 Enterprise Edition) and Oracle Real Application Clusters (Oracle RAC) actually do scale the full relational model. I was lucky enough to work closely with the DB2 PE team when I was Lead Architect on DB2 so I know it well. There is no question that both DB2 and RAC are great products but, as good as they are, very high scale sites still typically choose to either 1) shard over multiple instances or 2) use a high-scale, key-value store.

 

This first option, that of using an RDBMS as an implementation component and sharding data over many instances, is a perfectly reasonable and rational approach and one that is frequently used. The second option is to use a scalable key-value store. Some key-value store product examples include Project Voldemort, Ringo, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, CouchDB, Cassandra, HBase, and Hypertable (see Key Value Stores). Amazon SimpleDB is a good example of a cloud-based offering.

 

Simple Structured Storage

There are many applications that have a structured storage requirement but they really don’t need the features, cost, or complexity of an RDBMS. Nor are they focused on the scale required by the scale-first structured storage segment. They just need a simple key-value store. A file system or BLOB-store is not sufficiently rich in that simple query and index access is needed, but nothing even close to the full set of RDBMS features is needed. Simple, cheap, fast, and low operational burden are the most important requirements of this segment of the market.
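As a rough illustration of what “simple query and index access” over a key-value store can mean, here is a small in-memory sketch in Python. The SimpleStore class and its put, get, delete, and find methods are hypothetical names invented for this example. They are not the SimpleDB API or the interface of any product named here. The sketch keeps items as attribute dictionaries keyed by name and maintains one equality index over (attribute, value) pairs, which is about as far from a full RDBMS feature set as a store can be while still supporting lookup by something other than the key.

```python
from collections import defaultdict


class SimpleStore:
    """Hypothetical example: a key-value store with equality lookup
    on attributes."""

    def __init__(self):
        self.items = {}                # key -> dict of attributes
        self.index = defaultdict(set)  # (attribute, value) -> set of keys

    def put(self, key, attributes):
        # Replace any previous version so the index never goes stale.
        self.delete(key)
        self.items[key] = dict(attributes)
        for pair in attributes.items():
            self.index[pair].add(key)

    def get(self, key):
        return self.items.get(key)

    def delete(self, key):
        old = self.items.pop(key, None)
        if old:
            for pair in old.items():
                self.index[pair].discard(key)

    def find(self, attribute, value):
        # Index access: no scan over all items, and no joins or aggregates.
        return sorted(self.index.get((attribute, value), ()))
```

Attribute values must be hashable for the index to work, and there is no persistence, durability, or concurrency control here. Those are exactly the operational concerns a hosted service would take on.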

 

Uses of Simple Structured Storage are unremarkable and, as a consequence, there are fewer visible examples at the low-end of the scale spectrum to reference. Towards the high-end, we have email inbox search at Facebook (using Cassandra), Last.fm, which reports it will be using Project Voldemort, and the Amazon retail shopping cart (using Dynamo). Perhaps the most widely used example of this class of storage system is Berkeley DB. On the cloud-side, SimpleDB again is a good example (AdaptiveBlue, Livemocha, and Alexa).

 

Purpose-Optimized Stores

Recently Mike Stonebraker wrote an influential paper titled One Size Fits All: An Idea Whose Time Has Come and Gone. In this paper, Mike argued that the existing commercial RDBMS offerings do not meet the needs of many important market segments. In a presentation with the same title, Stonebraker argues that StreamBase, a special purpose stream processing system, beat the RDBMS solutions in benchmarks by 27x, that Vertica, a special purpose data warehousing product, beat the RDBMS incumbents by never less than 30x, and that H-Store (now VoltDB), a special purpose transaction processing system, beat the standard RDBMS offerings by a full 82x.

 

Many other Purpose-Optimized stores have emerged (for example, Aster Data, Netezza, and Greenplum) and this category continues to grow quickly. Clearly there is space and customer need for more than a single solution.

 

Where do SimpleDB and RDS Fit in?

The Amazon RDS service is aimed squarely at the first category above, Feature-First. This is a segment that needs features and mostly uses RDBMS databases. And RDS is amongst the easiest ways to bring up one or more databases quickly and efficiently without needing to hire a database administrator.

 

Amazon SimpleDB is a good solution for the third category, Simple Structured Storage. SimpleDB is there when you need it, is incredibly easy to use, and is inexpensive.  The SimpleDB team will continue to focus on 1) very high availability, 2) supporting scale without bound, 3) simplicity and ease of use, and 4) lowest possible cost and this service will continue to evolve.

 

The second category, scale-first, is served by both SimpleDB and RDS. Solutions based upon RDS will shard the data over multiple, independent RDS database instances. Solutions based upon SimpleDB will either use the service directly or shard the data over multiple SimpleDB Domains. Of the two approaches, SimpleDB is the easier to use and more directly targets this usage segment.

 

The SimpleDB team is incredibly busy right now getting ready for several big announcements over the next 6 to 9 months. Expect to see SimpleDB continue to get easier to use while approaching the goal of scaling without bound. The team is working hard and I’m looking forward to the new features being released.

 

The AWS solution for the final important category, purpose-optimized storage, is based upon the Elastic Compute Cloud (EC2) and the Elastic Block Store (EBS). EC2 provides the capability to host specialized data engines and EBS provides virtualized storage for the data engine hosted in EC2. This combination is sufficiently rich to support Purpose-Optimized Stores such as Aster Data, Vertica, or Greenplum, or any of the commonly used RDBMS offerings such as Oracle, SQL Server, DB2, MySQL, and PostgreSQL.

 

The Amazon Web Services plan is to continue to invest deeply in both SimpleDB and RDS as direct structured storage solutions and to continue to rapidly enhance EC2 and EBS to ensure that broadly-used database solutions as well as purpose-built stores run extremely well in the cloud. This year has been a busy one in AWS storage and I’m looking forward to the same pace next year.

                                                                --jrh

 

James Hamilton

 


