Redisql: the lightning fast data polyglot [Translation]

Linvo

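Foreign developers really know how to tinker: this one went and made Redis fully support SQL... A fierce life needs no explanation!

This post is a translation of his blog post. There are still places I don't fully understand, so corrections are welcome!

English original: http://jaksprats.wordpress.com/2010/09/28/introducing-redisql-the-lightning-fast-polyglot/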

------------------------------ divider ------------------------------

For about a year, I have been using the NOSQL datastore redis in various web-serving environments as a very fast backend to store and retrieve key-value data and data that best fits in lists, sets, and hash-tables. In addition to redis, my backend also employed MySQL, because some data fits much better in a relational table. Getting certain types of data to fit into redis data objects would have added to the complexity of the system, and in some cases it's simply not doable. BUT, I hated having 2 data-stores, especially when one (MySQL) is fundamentally slower; this created an imbalance in how my code was architected. The MySQL calls can take orders of magnitude longer to execute, which is exacerbated when traffic surges. So I wrote Redisql, which is an extension of redis that also supports a large subset of SQL. The idea was to have a single roof to house both relational data and redis data, with both types of data exhibiting similar lookup/insert latencies under similar concurrency levels, i.e. a balanced backend.

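I have stuck with the NOSQL datastore redis for about a year now. In all sorts of web-serving environments it has been a super fast backend for storing and retrieving key-value data and data that suits lists, sets, and hash-tables. Besides redis, my backend also used MySQL, because some data simply works better in a relational table. Forcing certain kinds of data into redis data objects would have made the system more complex, and in some cases it just isn't feasible. But I hated running two data stores, especially when one of them (MySQL) is the slower one; it made the architecture of my code feel out of balance. When traffic surges, those MySQL calls take far longer to execute. So I wrote Redisql, an extension of redis that supports a large subset of SQL. The idea was to put relational data and redis data under one roof, with both kinds of data showing similar lookup and insert latencies at similar concurrency levels: in other words, a balanced backend.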

Redisql supports all redis data types and functionality (as it's an extension of redis) and it also supports SQL SELECT/INSERT/UPDATE/DELETE (including joins, range-queries, multiple indices, etc.). That is lots of SQL, short of stuff like nested joins and Datawarehousing functionality (e.g. FOREIGN KEY CONSTRAINTS). So using a Redisql library (in your environment's native language), you can either call redis operations on redis data objects or SQL operations on relational tables; it's all in one server accessed from one library. Redisql morph commands convert relational tables (including range query and join results) into sets of redis data objects. They can also convert the results of redis commands on redis data objects into relational tables. Denormalization from relational tables to sets of redis hash-tables is possible, as is normalization from sets of redis hash-tables (or sets of redis keys) into relational tables. Data can be reordered and shuffled into the data structure (relational table, list, set, hash-table, OR ordered-set) that best fits your use cases, and the archiving of redis data objects into relational tables is made possible.

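Redisql supports every redis data type and feature (since it is an extension of redis), and it also supports the SQL statements SELECT/INSERT/UPDATE/DELETE (including joins, range queries, multiple indices, and so on). That covers a lot of SQL, stopping short of things like nested joins and data-warehousing functionality (for example, foreign key constraints). So with a Redisql library (in your own language environment), you can call redis operations on redis data objects or SQL operations on relational tables, all in a single server reached through a single library. Redisql's morph commands convert relational tables (including range-query and join results) into sets of redis data objects. They can also convert the results of redis commands on redis data objects into relational tables. Denormalizing relational tables into sets of redis hash-tables works, and so does normalizing sets of redis hash-tables (or sets of redis keys) into relational tables. Data can be reordered and stuffed into whichever data structure (relational table, list, set, hash-table, or ordered set) suits your use case best, and archiving redis data objects into relational tables becomes possible too.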
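To make the morphing idea concrete, here is a minimal sketch in plain Python. It does not use Redisql's actual commands or any client library, since their syntax is not shown in this post. The `users`-style rows, the `user:<id>` key scheme, and both helper functions are hypothetical, and ordinary dicts stand in for redis hash-tables.

```python
# Conceptual sketch only: plain dicts stand in for redis hash-tables,
# and a list of tuples stands in for a relational table.
# Nothing here is Redisql's real API; all names are illustrative.

COLUMNS = ("id", "name", "city")  # hypothetical table schema


def denormalize(rows, columns=COLUMNS, prefix="user"):
    """Turn relational rows into one hash-table per row, keyed by primary key."""
    hashes = {}
    for row in rows:
        record = dict(zip(columns, row))
        key = f"{prefix}:{record[columns[0]]}"  # e.g. "user:1"
        # Keep every non-key column as a field of the hash.
        hashes[key] = {col: record[col] for col in columns[1:]}
    return hashes


def normalize(hashes, columns=COLUMNS):
    """Turn a set of hash-tables back into relational rows, sorted by primary key."""
    rows = []
    for key, fields in hashes.items():
        pk = int(key.split(":", 1)[1])
        rows.append((pk,) + tuple(fields[col] for col in columns[1:]))
    return sorted(rows)


table = [(1, "ann", "Oslo"), (2, "bob", "Lima")]
as_hashes = denormalize(table)
round_trip = normalize(as_hashes)

print(as_hashes["user:1"])   # {'name': 'ann', 'city': 'Oslo'}
print(round_trip == table)   # True
```

The round trip is lossless here only because every row has the same columns; hash-tables with differing fields would need a rule for missing values before they could be normalized into a table.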

Not only is all the data under a single data roof in Redisql, but the lookup/insert speeds are uniform: you can predict the speed of a SET, an INSERT, an LPOP, a SELECT range query … so application code runs without kinks (no unexpected bizarro waits due to MySQL table locks -> that lock up an apache thread -> that decrease the performance of a single machine -> which creates an imbalance in the cluster).

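Redisql does not just put all kinds of data into one container; lookup and insert speeds are uniform as well. You can estimate the speed of a SET, an INSERT, an LPOP, or a SELECT range query … so application code runs the way you expect (no surprise waits on MySQL table locks -> that lock up an apache thread -> that drag down a single machine -> that leave the cluster unbalanced).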

Uniform data access patterns between front-end and back-end can fundamentally change how application code behaves. On a 3.0GHz CPU core, Redis SET/GET run at 110K/s and Redisql INSERT/SELECT run at 95K/s, both with sub-millisecond mean latencies, so all of a sudden the application server can fetch data from the datastore with truly minimal delay. The oh-so-common bottleneck, "I/O between app-server and datastore", is cut to a bare minimum, which can even push the bottleneck back into the app-servers, and that's great news as app-servers are dead simple to scale horizontally (e.g. add server). Redisql is an event-driven non-blocking asynchronous-I/O in-memory database, which I have dubbed an Evented Relational Database, for brevity's sake.

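Uniform data access patterns between the front end and the back end can fundamentally change how application code behaves. On a 3.0GHz CPU core, redis reaches 110,000 SET/GET operations per second and Redisql reaches 95,000 INSERT/SELECT operations per second. Both have sub-millisecond mean latencies, which means the application server can fetch data from the datastore with truly minimal delay. The common bottleneck, "I/O between the app server and the datastore", is cut to the minimum and can even be pushed back into the app servers, which is great because those can be scaled out with simple measures (for example, adding servers). Redisql is an event-driven, non-blocking, asynchronous-I/O in-memory database; for brevity I call it an Evented Relational Database.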
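The two throughput figures can be turned into a rough per-operation time budget. The sketch below uses only the numbers quoted above and assumes, for illustration, that the single core handles operations one after another, so `1 / throughput` approximates the average server-side time per operation. It says nothing about client-observed latency, which also includes the network round trip.

```python
# Back-of-the-envelope arithmetic using only the throughput figures quoted above.
# Assumption: requests on the single core are handled one after another, so
# 1 / throughput approximates the average server-side time spent per operation.

def micros_per_op(ops_per_second):
    """Average time budget per operation, in microseconds."""
    return 1_000_000 / ops_per_second


redis_us = micros_per_op(110_000)    # Redis SET/GET
redisql_us = micros_per_op(95_000)   # Redisql INSERT/SELECT
ratio = 95_000 / 110_000

print(f"redis:   {redis_us:.1f} us per op")    # redis:   9.1 us per op
print(f"redisql: {redisql_us:.1f} us per op")  # redisql: 10.5 us per op
print(f"redisql reaches {ratio:.0%} of redis throughput")  # 86%
```

Under that assumption the SQL path costs roughly 1.4 microseconds more per operation than a plain key-value call, which is the sense in which the two kinds of access are "uniform".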

During the development of Redisql, it became evident that the number of bytes a row occupies was an incredibly important metric to optimize, as Redisql is an in-memory database (with disk persistence snapshotting). Unlike redis, Redisql can function if you go into swap space, but this should be done with extreme care. Redisql has lots of memory optimisations; it has been written from the ground up to allow you to put as much data as possible into your machine's RAM. Relational table per-row overhead is minimal and TEXT columns are stored in compressed form when possible (using algorithms with negligible performance hits). Analogous to providing predictable request latencies at high concurrency levels, Redisql gives predictable memory usage overhead for data storage and provides detailed per-table, per-index memory usage via the SQL DESC command, as well as per-row memory usage via the "INSERT … RETURN SIZE" command. The predictability of Redisql, which translates into tweakability for the seasoned programmer, changes the traditional programming landscape where the datastore is slower than the app-server.

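As Redisql was being developed, it became obvious that, for an in-memory database (persisted by snapshotting to disk), the number of bytes each row occupies is a very important thing to optimize. Unlike redis, Redisql can keep working if you spill into swap space, but you have to be very careful about it. Redisql uses a lot of memory optimizations and was written from the ground up to let you fit as much data as possible into the machine's RAM. Per-row overhead for relational tables is minimal, and TEXT columns are stored compressed when possible. Just as it offers predictable request latencies under high concurrency, Redisql offers predictable memory overhead for stored data: the SQL DESC command reports detailed memory usage per table and per index, and the "INSERT … RETURN SIZE" command reports memory usage per row. For the seasoned programmer this predictability means tweakability, and it changes the traditional landscape in which the datastore is slower than the app server.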
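The "compressed when possible" rule for TEXT columns can be sketched as "compress, and keep the result only if it is actually smaller". The example below is a conceptual illustration, not Redisql's implementation: `zlib` is a stand-in because the post does not name the algorithm Redisql uses, and the one-byte tag and both function names are invented here.

```python
import zlib

# Conceptual sketch of "store TEXT compressed when possible".
# zlib is only a stand-in: the post does not say which algorithm Redisql uses,
# and the one-byte tag format below is invented for illustration.

RAW, COMPRESSED = b"\x00", b"\x01"


def pack_text(value: str) -> bytes:
    """Return a tagged byte string, compressed only if that saves space."""
    raw = value.encode("utf-8")
    packed = zlib.compress(raw, 1)  # level 1: favour speed over ratio
    if len(packed) < len(raw):
        return COMPRESSED + packed
    return RAW + raw


def unpack_text(stored: bytes) -> str:
    """Reverse pack_text by inspecting the leading tag byte."""
    tag, body = stored[:1], stored[1:]
    if tag == COMPRESSED:
        body = zlib.decompress(body)
    return body.decode("utf-8")


short_text = "hi"
long_text = "lorem ipsum dolor sit amet " * 40

for text in (short_text, long_text):
    stored = pack_text(text)
    assert unpack_text(stored) == text  # the round trip must be lossless
    print(len(text.encode("utf-8")), "->", len(stored), "bytes")
```

Very short values stay uncompressed because the compression header would make them larger, while repetitive long values shrink considerably; that is one plausible reading of "when possible".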

Redisql is architected to handle the c10K problem, so it is world class in terms of networking speed. All of Redisql's data is in RAM, so there are no hard disk seeks to engineer around, and you get all your data in a predictably FAST manner. You can pack a lot of data into RAM, as Redisql aggressively minimizes memory usage. And Redisql combines SQL and NOSQL under one roof, unifying them with commands to morph data betwixt them. The sum of these parts, when integrated correctly with a fast app-server architecture, is unbeatable as a dynamic web page serving platform with low latency at high concurrency.

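Redisql is designed to handle the c10K problem, so its networking speed is first rate. All of its data lives in RAM, so there are no hard disk seeks to work around and you get your data back as quickly as you predicted. You can also fit a lot of data in, because Redisql keeps memory usage very low. On top of that, Redisql puts SQL and NOSQL in one container and ties them together with commands that morph data between the two. Put these parts together correctly with a fast app-server architecture and you get an unbeatable platform for serving dynamic web pages with low latency at high concurrency.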

The goal of Redisql is to be the complete datastore solution for applications that require the fastest data lookups/inserts possible. Pairing Redisql with an event-driven language or framework like Node.js, Ruby Eventmachine, or Twisted Python should yield a dynamic web page serving platform capable of unheard-of low latency at high concurrency, which, when paired with intelligent client-side programming, could process user events in the browser quickly enough to finally realize the browser as an applications platform.

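Redisql's goal is to be the complete datastore solution for applications that need the fastest possible lookups and inserts. Pair Redisql with an event-driven language or framework such as Node.js, Ruby Eventmachine, or Twisted Python, and you should get a dynamic web serving platform with low latency at high concurrency that is "unmatched for 500 years before or after" (a Sister Feng catchphrase; translator's note). Combined with intelligent client-side programming, it could process user events in the browser quickly enough to finally make the browser an application platform.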

Redisql: the polyglot that speaks SQL and redis, was written to be the Evented Relational Database, the missing piece in the 100% event driven architecture spanning from browser to app-server to database-server and back.

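Redisql, the polyglot that speaks both SQL and redis, was written to be the Evented Relational Database: the missing piece of a 100% event-driven architecture that runs from the browser to the app server to the database server and back. (I was not sure what this sentence meant; translator's note.)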