Columnar Database

最新推荐文章于 2023-07-01 04:03:55 发布

miller_lover

最新推荐文章于 2023-07-01 04:03:55 发布

阅读量670

点赞数

分类专栏： big data 文章标签： columnar database

big data 专栏收录该内容

101 篇文章 0 订阅

订阅专栏

A column store database is a database which stores table data as sections of columns of data rather than as rows of data. That means all the values for one particular column will be stored together and these sets of data might be distributed for different columns. Different column values for same row are linked by pointers within. When any query is run against a row store database, it fetches data from all the columns for the rows that match the query conditions. In contrast, when the same query is run against a column store database, only those columns are read which are required by the query. So if you consider a table with 30 columns and your query needs( this is often less than 15% of the columns in a typical fact table, according to a Microsoft white paper) only five of those fields the column store database will at least give you 6x performance over its row store counterpart (provided the data in each column is similar). Considering the fact that disk access is still the slowest operation for our computers, this becomes significant because of much lesser pages to be read from the disk. Also, the columns which are not used in the query are not loaded at all in the memory which becomes essential to avoid frequent PLE drops when dealing with large amount of data.

Also, most of the column store databases use high values of compression for the data they store. The extremely high values of compression are results of the redundancy we usually see in columns. Again, the values in the same column are invariably of the same type, unlike the data you will see in a row, thus facilitating even more compression. Most of the column store database systems use more than one type of compression schemes based on the type of values stored in the column. For instance, Vertica uses Run Length Encoding schemes to store values in a column that are highly repetitive. These encoding schemes allow the database system to compress each column separately based on the underlying data values which can’t be done in a row store database because of different types of values stored in each row for each column.

The defining concept of a column-store is that the values of a table are stored contiguously by column. Thus the classic supplier table from CJ Date's supplier and parts database:

SNO STATUS CITY SNAME

--- ------ ---- -----

S1 20 London Smith

S2 10 Paris Jones

S3 30 Paris Blake

S4 20 London Clark

S5 30 Athens Adams

would be stored on disk or in memory something like:

S1S2S3S4S5;2010302030;LondonParisParisLondonAthens;SmithJonesBlakeClarkAdams

This is in contrast to a traditional row store which would store the data more like this:

S120LondonSmith;S210ParisJones;S330ParisBlake;S420LondonClark;S530AthensAdams

From this simple concept flows all of the fundamental differences in performance, for better or worse, between a column-store and a row-store. For example, a column store will excel at doing aggregations like totals and averages, but inserting a single row can be expensive, while the inverse holds true for row-stores. This should be apparent from the above diagram.

How do they differ from relational databases? A relation database is a logical concept. A columnar database, or column-store, is a physical concept. Thus the two terms are not comparable in any meaningful way. Column- oriented DMBSs may be relational or not, just as row-oriented DBMS's may adhere more or less to relational principles.

miller_lover

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Columnar Database

A column store database is a database which stores table data as sections of columns of data rather than as rows of data. That means all the values for one particular column will be stored together an
复制链接

扫一扫

专栏目录