In-Memory Column Store
内存中列的存储:
Starting in Oracle Database 12c Release 1 (12.1.0.2), the
In-Memory Column Store
(IM column store) is an optional, static SGA pool that stores copies of tables and partitions in a special columnar format optimized for rapid scans. The IM column store does not replace the buffer cache, but acts as a supplement so that both memory areas can store the same data in different formats. By default, only objects specified as
INMEMORY
using DDL are candidates to be populated in the IM column store.
从oracle 12c R1开始,内存中的列存储是一个可选的(原来一致是以row进行存储,列进行查询)。
静态的SGA池,它存储着表和分区的副本在一个特殊列式的格式下,可以实现优秀的快速扫描。
IM column存储不能代替缓冲区缓存,
但作为一个补充,使两个内存区域可以将相同的数据存储在不同的格式。默认地,只有指定INMEMORY 的DDL 才会被填充到IM column.
Note:
It is not necessary for objects populated in the IM column store to be loaded into the database buffer cache.
The columnar format exists only in memory. 下图
shows three tables from the
sh
schema stored in the IM column store:
customers
,
products
, and
sales
. The IM column store stores the data by column rather than row. The database keeps the columnar data transactionally consistent with the buffer cache.
注意:
在IM列存储,
不需要对对象填冲被加载
到数据库缓冲区高速缓存,这个列式格式只存在于内存中,下图展示了来自sh 模式的3个表存储在IM column中:
customers
,
products
, 和
sales.
IM column 存储数据使用的是列而不是行。数据库保存列式数据的事务与buffer cache一致
1.1Benefits of the IM Column Store
The IM column store enables the database to perform scans, joins, and aggregates much faster than when it uses the on-disk format exclusively. In particular, the IM column store is useful for:
1、Scanning many rows and applying filters that use operators such as =, <, >, and IN
2、Querying a subset of columns in a table, for example, selecting 5 of 100 columns
3、Accelerating joins by converting predicates on small dimension tables into filters on a large fact table
IM 列存储 的好处
使用IM列存储,数据库可以执行扫描、连接和聚合,速度远远超过当它完全使用磁盘上的格式。在IM列存储用于以下情况:
1、
扫描行数和应用过滤器 ,使用运算符,例如=、<,>、in
2、针对一张表列的子查询
,例如,选择100列中的5列
3、
加速连接通过转换
谓词在
小维度表上到
过滤器在大型事实表
1.2Dual Memory Formats: Columnar and Row
When fetching data, Oracle Database can read either the IM column store or the database buffer cache, or both within the same query. The database transparently sends OLTP queries (such as primary key lookups) to the buffer cache, and analytic and reporting queries to the IM column store. Thus, dual memory formats provide the best of both worlds.
Figure 下图 shows a sample IM column store. The sh.sales table is stored on disk in traditional row format. The SGA stores the data in columnar format in the IM column store, and in row format in the database buffer cache.
双格式存储:列式和行式
当读取数据时,
oracle 可以读取IM列存储或者读取database buffer cache,或者在两者中同时查询。数据库透明地发送OLTP查询(主键查找)到缓冲区缓存,然后分析、报告查询出IM列存储。因此双格式存储提供最好的世界。
看下图表现的一个IM存储的样本,sh.sales 表是一个存储在磁盘上的传统行格式,SGA存放数据(block):
在IM列存储中用列格式,在database buffer cache中使用行格式
1.3Population of the In-Memory Column Store
You can specify that the database populates data in the IM column store either at database instance startup or when the data is accessed. The population algorithm also varies depending on whether you use single-instance or Oracle RAC.
IM列存储的群体
你可以指定数据库填充数据在IM列存储中当数据库启动时或者当数据被访问时(两者之一),群体算法也取决于你使用的是单实例还是一个RAC.
1.4In-Memory Columnar Compression
The IM column store uses special compression formats optimized for access speed rather than storage reduction. The database increases speed in the following ways:
a\The compression formats enable the database to decrease the amount of memory processed for each column. SQL executes directly on the compressed columns.
b\The database uses SIMD vector (array) instructions to process an array of column values in a single CPU clock cycle. The database can store many values in a vector, which maximizes the performance benefits with SIMD vector processing.
IM列存储使用特殊的压缩格式优化访问速度,而不是减少存储大小。增加了数据库访问速度:
a\压缩格式,使数据库能够降低每列的内存处理量。直接在压缩列上执行SQL。
b\数据库使用SIMD向量(数组)指令来处理大量的列值在一个CPU时钟周期,数据库可以存储许多向量值,使用SIMD向量来增加性能
1.5Scan Optimization for the IM Column Store
The columnar format enables queries to scan only necessary columns. For example, suppose a user executes the following adhoc query:
IM列存储扫描优化
列格式使查询只扫描必要的列。例如,假设用户执行下面的特别的查询:
SELECT cust_id, time_id, channel_id
FROM sales
WHERE prod_id > 14
AND prod_id < 29
When using the buffer cache, the database would typically scan an index to find the product IDs, use the rowids to fetch the rows from disk into the buffer cache, and then discard the unwanted column values. Using the IM column store, however, the database can scan only the requested sales columns, avoiding disk altogether.
The IM column store uses an internal storage index to filter data, further reducing unnecessary scans. The storage index stores the minimum and maximum for each in-memory compression unit. Thus, the database can scan only the in-memory compression units that satisfy the query predicate, as shown in the WHERE prod_id > 14 AND prod_id < 29 example in Figure 14-10.
当使用buffer cache,数据库将扫描一个producat IDs索引,使用rowid从磁盘取回(行的数据)到buffer cache,接着放弃不需要的列。使用IM列存储,然后,数据库只扫描cust_id, time_id, channel_id这3个列,避免磁盘完全扫描。
IM列存储存储使用内部的存储索引过滤索引,进一步减少不需要的扫描。这个内部存储的index存储着每个
压缩内存单元
最小值和最大的值,这样数据库可以直接扫描内存压缩单元适合的查询谓语,也就是where prod_id>14和prod_id<29,如下图: