Repopulation of the IM Column Store

数据的population表示把新数据放入IM Column Store, repopulation则表示对已存在数据进行更改,因为IMCU及IMEU均为只读结构所以所谓更新即为重建

population与repopulation的过程其实是有区别的,前者会从磁盘及buffer cache读数据进行行列转化;后者对于不同的数据更新方式repolulation方式其是不同的(下面有详细说明),但一般更新基于已存在的IMCU及事务日志生成的

Population is a streaming mechanism, converting row data into columnar format, and then compressing it. IMCO automatically initiates population of INMEMORY objects with any priority other than NONE. When objects with priority NONE are accessed, IMCO populates them using Space Management Worker Process (Wnnn) processes.

During repopulation, the Wnnn processes create new versions of the IMCUs based on the existing IMCUs and transactions journals.

IM Column Store repopulation主要指定IMCU的repolulation,文尾有关于IMEU的repolulation

IM Column Store Repopulation方式

如果In-memory数据被更改,查询会通过日志到buff cache取数据以保证取数是最新的, 在满足一定条件后会自动对ICUM进行repopulation

An In-Memory Compression Unit (IMCU) is a read-only structure that does not modify the data in place when DML occurs on the table.

Instead, the Snapshot Metadata Unit (SMU) associated with each IMCU tracks row modifications in a transaction journal. If a query accesses the data, and discovers modified rows, then it can obtain the corresponding rowids from the transaction journal, and then retrieve the modified rows from the buffer cache.

As the number of modifications increase, so do the size of SMUs, and the amount of data that must be fetched from the transaction journal or database buffer cache. To avoid degrading query performance through journal access, background processes repopulate modified objects.

不同的数据更改方式会使用不同的机制进行repopulation

The IM column store uses different mechanisms depending on the type of data loading: conventional DML, direct path loads, and partition exchange loads.

  1. How Conventional DML Works with the IM Column Store

Conventional DML processes one row or array of rows at a time, and inserts rows below the high water mark. Regardless of whether the IM column store is enabled, the database processes DML using the buffer cache.

IMCUs are read-only. When a statement modifies a row in an IMCU, the IM column store records the rowid in the associated SMU.

Column Compression Unit (CU) entry becomes stale when its value differs from the value in its corresponding journal entry. For example, a transaction may change an employee’s weekly salary from 1000 to 1200, but the actual value in the IMCU is still 1000. The transaction journal records the rowid of the stale row and its SCN.

Note: The transaction journal does not record the new value. Rather, it indicates the corresponding row as stale as of a specific SCN.

1.1 Staleness Threshold 旧数据阀值

As the number of stale entries in an IMCU increases, the speed of the IMCU scan decreases. Performance decreases because the database must fetch the modified rows from the buffer cache or disk, rather than from the IM column store.

For this reason, Oracle Database repopulates an IMCU when the number of stale entries in an IMCU reaches an internal staleness threshold.

The database determines the threshold using heuristics that consider the frequency of IMCU access and the number of stale rows. Repopulation is more frequent for IMCUs that are accessed frequently or have a higher percentage of stale rows.

1.2 Double Buffering 双缓存机制

When the database begins either threshold-based repopulation or trickle repopulation, the IM column store uses double buffering.

During repopulation, the Wnnn processes create new versions of the IMCUs based on the existing IMCUs and transactions journals, while temporarily retaining the old versions. This mechanism, which is called double buffering, ensures that queries do not decrease performance because an IMCU is unavailable during repopulation. After repopulation completes, the IM column store retains the non-current IMCU for a short time (2 minutes by default) to optimize queries with older SCNs. Typically, the default behavior is sufficient. However, you can force deallocation of non-current IMCUs by using the SEGMENT_DEALLOCATE_VERSIONS procedure.

As shown in the following figure, the IM column store maintains two versions of an IMCU simultaneously, with the original stale IMCU remaining accessible to queries.

Figure 8-1 Double Buffering


The basic steps of double buffering are:

  1. In the original SMU, the database marks the existing IMCU as the original version as of a specific SCN.
  2. Background processes create a new version of the IMCU by combining the original rows with the latest versions of the modified rows.
  3. In the journal of the new SMU, the database tracks DML operations that occur during IMCU creation.

In this way, the original IMCU stays online. The database keeps both old and new IMCUs versions for as long as they are useful, or until the IM column store is under space pressure.

  1. How Direct Path Loads Work with the IM Column Store

A direct path load is an INSERT /*+APPEND*/ statement or a SQL*Loader operation in which DIRECT=true .

In a direct path load, the database writes formatted data blocks directly to the data files, bypassing the database buffer cache. The database appends the data above the high water mark, which is the boundary between used and unused space in a segment. Direct path loads operate are “all or nothing” operations: the operation either inserts all data or no data.

Figure 8-2 Direct Path Loads and the High Water Mark


When the segment is populated in the IM column store, a direct path load works as follows:

  1. You load data using a CREATE TABLE AS SELECT or INSERT /*+APPEND*/ statement. Only the current session is aware of the DML.
  2. You commit the statement.
  3. The high water mark moves to encompass the new data, which alerts the IMCU that data is missing. V$IM_SEGMENTS.BYTES_NOT_POPULATED now indicates the size of the newly inserted data.
  4. The IM column store manages repopulation based on the following algorithm:
  1. If the affected object has a PRIORITY set to a value other than NONE, then the database repopulates the data.
  2. If the affected object has a PRIORITY set to NONE, then the database repopulates at the next full scan of the object.

  1. How a Partition Exchange Load Works with the IM Column Store

A partition exchange load is a technique that exchanges a table for a partition. An exchange load is almost instantaneous because it modifies metadata instead of data.

To perform an exchange load, follow these steps:

  1. Create a nonpartitoned table, called a source table.
  2. Load rows into the source table.
  3. Exchange an existing table partition, called the target partition, with the table.

通过exchange将非分区表加入到分区表中,原非分区表称为source table,exchange后新分区为target partition

For the target partition to be populated in the IM column store after the exchange, the source table must be populated in the IM column store before the exchange. The following scenarios are possible, depending on the whether the target partition is populated:

  1. Before the exchange, the target partition is not populated in the IM column store. For example, the partition is empty.

After the exchange, the source table is no longer populated in the IM column store. The source IMCUs are now associated with the target partition.

  1. Before the exchange, the target partition is populated in the IM column store.

After the exchange, the source table remains populated in the IM column store.

Example 8-1 INMEMORY Partition Exchange Load

In this example, the sales table, which is partitioned, has the INMEMORY attribute set at the table level. All non-empty partitions in this table are currently populated. The sales_p042616 partition is currently empty. Your goal is to populate the empty partition sales_p042616 with data contained in text files. The following figure illustrates the before and after scenarios.

Figure 8-3 Partition Exchange

To perform the exchange, do the following:

  1. Create an external table sales_tmp_ext using the CREATE TABLE ... ORGANIZATION EXTERNAL statement.

The external table does not reside in the database, and can be in any format for which an access driver is provided. The table is read-only.

  1. Create a nonpartitioned table named sales_tmp_ld using CREATE TABLE ... AS SELECT * FROM sales_tmp_ext.

The sales_tmp_ld table is not external, which means it stores rows in the data files.

  1. Set the INMEMORY attribute in sales_tmp_ld using an ALTER TABLE statement.

The sales_tmp_ld table is now marked as INMEMORY, but it is not yet populated into the IM column store.

  1. Populate sales_tmp_ld into the IM column store by forcing a full table scan.

For example, the following query forces a full scan:

SELECT /*+ FULL(s) NO_PARALLEL(s) */ COUNT(*) FROM sales_tmp_ld s;

  1. Exchange the sales_p042616 partition with the sales_tmp_ld table.

For example, alter the sales table as follows:

ALTER TABLE sales EXCHANGE PARTITION sales_p042616 WITH TABLE sales_tmp_ld;

After the exchange completes, the sales_p042616 partition is populated in the IM column store, and the sales_tmp_ld is no longer populated.

IM Column Store Automatic Repopulates触发

The database repopulates the IM column store automatically according to an internal algorithm. You can manually disable repopulation, and influence its aggressiveness.

  1. Threshold-Based and Trickle Repopulation

自动repopulation有两种形式:threshold-based repopulationtrickle repopulation

Automatic repopulation takes two forms: threshold-based repopulation and trickle repopulation.

Automatic repopulation always checks stale journal entries and uses double buffering. However, repopulation has different triggers:

  1. Threshold-based repopulation

The database repopulates IMCUs when the number of changes recorded in the transaction journal reaches an internal staleness threshold.

  1. Trickle repopulation(默认每2min一次对所有变动的IMCU进行repopulation,不论它是否达到阀值)

The IMCO (In-Memory Coordinator) background process periodically checks whether stale rows exist, and then adds IMCUs to a repopulation queue. This mechanism does not depend on meeting the staleness threshold. 

Trickle repopulation类似java的垃圾回收

Trickle repopulation is analogous to Java garbage collection. The mechanism works as follows:

  1. IMCO wakes up.
  2. IMCO determines whether any population tasks need to be performed, including whether any stale entries exist in the transaction journal associated with an IMCU.
  3. If IMCO finds stale entries, then it triggers a Space Management Worker Process (Wnnn) to create a new version of the IMCU.

During IMCU creation, the database records the rowids of modified rows in the transaction journal.

  1. IMCO sleeps for two minutes, and then returns to Step 1.

Figure 8-4 Trickle Repopulation

  1. Factors Affecting Repopulation

The algorithm that triggers repopulation is internal, and depends on several factors.

The principal factors affecting repopulation are as follows:

  1. Rate of DML changes

As the number of modified rows increases, the percentage of stale columnar data increases. The transaction journal grows, increasing the need to use the buffer cache to satisfy queries.

  1. Type of DML operations

Typically, inserts have less performance overhead than deletes and updates because inserts often go into a new data block.

  1. Location of modified rows within a data block

Changes grouped within the same database block or table partition have less effect then changes distributed across an entire table. Versioning every IMCU has a greater impact than versioning a small number of IMCUs.

  1. Compression level applied to INMEMORY objects

Because of double buffering, tables with higher compression levels incur more query and DML overhead during repopulation. For example, MEMCOMPRESS FOR CAPACITY HIGH incurs more overhead than MEMCOMPRESS FOR DML.

  1. Number of active worker processes

As the number of worker processes increases, more work occurs in parallel. Consequently, the rate of repopulation increases.

IM Column Store Repopulation管理

相关Initialization Parameters

  1. INMEMORY_MAX_POPULATE_SERVERS

This parameter limits the maximum number of Wnnn processes available for population and repopulation (threshold-based and trickle). The default value is half the CPU_COUNT. This parameter acts as a throttle, preventing these server processes from overloading the rest of the database. Setting this parameter to 0 disables both population and repopulation.

Caution: Be careful not to set the value of this parameter too high. If it is set close to the number of cores or higher, then no CPU may be available for the rest of the system to run.

  1. INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT

用于设置trickle repopulation使用的进程数

The value for this parameter is a percentage of the INMEMORY_MAX_POPULATE_SERVERS value. For example, if INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT is 5 percent, and if INMEMORY_MAX_POPULATE_SERVERS is 20, then the IM column store uses an average of 1 core (.05 * 20) for trickle repopulation.

To increase throughput at the expense of increased background CPU, set this parameter to higher values such as 5 or 10. A value greater than 50 is not allowed, so that at least half of the INMEMORY_MAX_POPULATE_SERVERS processes are available for other tasks.

Setting this parameter to 0 disables trickle population.

  1. DBMS_INMEMORY.REPOPULATE Procedure

单列在dbms部分了

(四)IMEU的Population/Repopulation

在IMCU populate/repopulate同时会对IMEU数据populate/repopulate,但同时IMCU也可以手动进行populate及repopulate,它可能会对IMCU产生影响

以下过程并不会立即对IMEU操作,而是在对应IMCU repopulation时生效

DBMS_INMEMORY_ADMIN.IME_CAPTURE_EXPRESSIONS

DBMS_INMEMORY.IME_DROP_EXPRESSIONS

DBMS_INMEMORY_ADMIN.IME_DROP_ALL_EXPRESSIONS

注:DROP相关操作会drop表相应的SYS_IME列,同时它还会清除这些表使用的IMEU与EMCU,是否repopulation看对象IM PRORITY设置

下面过程是立即对IMEU更新:

DBMS_INMEMORY_ADMIN.IME_POPULATE_EXPRESSIONS

IMEU population过程:

When creating IMEUs, the worker processes perform the following tasks:

  1. Identify virtual columns for population
  2. Create virtual column values
  3. Compute values for each row, transform the data into columnar format, and compress it
  4. Register the objects with the space layer
  5. Associate the IMEUs with their corresponding IMCUs

The database can quickly move IM expressions in and out of the IM column store. For example, if an IMCU was created without an IMEU, then the database can add an IMEU later without forcing the IMCU to undergo the full repopulation mechanism.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值