Columnar Database

A column store database is a database which stores table data as sections of columns of data rather than as rows of data. That means all the values for one particular column will be stored together and these sets of data might be distributed for different columns. Different column values for same row are linked by pointers within. When any query is run against a row store database, it fetches data from all the columns for the rows that match the query conditions. In contrast, when the same query is run against a column store database, only those columns are read which are required by the query. So if you consider a table with 30 columns and your query needs( this is often less than 15% of the columns in a typical fact table, according to a Microsoft white paper) only five of those fields the column store database will at least give you 6x performance over its row store counterpart (provided the data in each column is similar). Considering the fact that disk access is still the slowest operation for our computers, this becomes significant because of much lesser pages to be read from the disk. Also, the columns which are not used in the query are not loaded at all in the memory which becomes essential to avoid frequent PLE drops when dealing with large amount of data.

Also, most of the column store databases use high values of compression for the data they store. The extremely high values of compression are results of the redundancy we usually see in columns. Again, the values in the same column are invariably of the same type, unlike the data you will see in a row, thus facilitating even more compression. Most of the column store database systems use more than one type of compression schemes based on the type of values stored in the column. For instance, Vertica uses Run Length Encoding schemes to store values in a column that are highly repetitive. These encoding schemes allow the database system to compress each column separately based on the underlying data values which can’t be done in a row store database because of different types of values stored in each row for each column.

The defining concept of a column-store is that the values of a table are stored contiguously by column. Thus the classic supplier table from CJ Date's supplier and parts database:

SNO  STATUS CITY    SNAME

---  ------ ----    -----

S1       20 London  Smith

S2       10 Paris   Jones

S3       30 Paris   Blake

S4       20 London  Clark

S5       30 Athens  Adams

would be stored on disk or in memory something like:

S1S2S3S4S5;2010302030;LondonParisParisLondonAthens;SmithJonesBlakeClarkAdams

This is in contrast to a traditional row store which would store the data more like this:

S120LondonSmith;S210ParisJones;S330ParisBlake;S420LondonClark;S530AthensAdams

From this simple concept flows all of the fundamental differences in performance, for better or worse, between a column-store and a row-store. For example, a column store will excel at doing aggregations like totals and averages, but inserting a single row can be expensive, while the inverse holds true for row-stores. This should be apparent from the above diagram.

How do they differ from relational databases? A relation database is a logical concept. A columnar database, or column-store, is a physical concept. Thus the two terms are not comparable in any meaningful way. Column- oriented DMBSs may be relational or not, just as row-oriented DBMS's may adhere more or less to relational principles.
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值