分区表的总结(一)

     此总结主要便于记忆,或者给需要的朋友提供一点帮助。

    Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions. SQL queries and DML statements do not need to be modified in order to access partitioned tables. However, after partitions are defined, DDL statements can access and manipulate individuals partitions rather than entire tables or indexes. This is how partitioning can simplify the manageability of large database objects. Also, partitioning is entirely transparent to applications.

随着表数据的持续增长,需要将历史数据和当前的数据分开单独处理的时候我们就会考虑使用分区表。

Each partition of a table or index must have the same logical attributes, such as column names, datatypes, and constraints, but each partition can have separate physical attributes such as pctfree, pctused, and tablespaces.

每个分区的表或索引必须具有相同的逻辑属性,如列名称,数据类型,和约束,但每个分区可以有单独的物理属性,如pctfree,pctused和表空间。

Partitioning is useful for many different types of applications, particularly applications that manage large volumes of data. OLTP systems often benefit from improvements in manageability and availability, while data warehousing systems benefit from performance and manageability.


Partitioning offers these advantages:

  • Partitioning enables data management operations such data loads, index creation and rebuilding, and backup/recovery at the partition level, rather than on the entire table. This results in significantly reduced times for these operations.

  • Partitioning improves query performance. In many cases, the results of a query can be achieved by accessing a subset of partitions, rather than the entire table. For some queries, this technique (called partition pruning) can provide order-of-magnitude gains in performance.

  • Partitioning can significantly reduce the impact of scheduled downtime for maintenance operations.

    Partition independence for partition maintenance operations lets you perform concurrent maintenance operations on different partitions of the same table or index. You can also run concurrent SELECT and DML operations against partitions that are unaffected by maintenance operations.

  • Partitioning increases the availability of mission-critical databases if critical tables and indexes are divided into partitions to reduce the maintenance windows, recovery times, and impact of failures.

  • Partitioning can be implemented without requiring any modifications to your applications. For example, you could convert a nonpartitioned table to a partitioned table without needing to modify any of the SELECT statements or DML statements which access that table. You do not need to rewrite your application code to take advantage of partitioning.

(对于分区的优点

    1、提高查询性能:只需要搜索特定分区,而非整张表,提高查询速度

    2、节约维护时间:单个分区的数据装载,索引重建,备份,维护等将远小于整张表的维护时间。

    3、节约维护成本:可以单独备份和恢复每个分区

    4、均衡I/O:将不同的分区映射到不同的磁盘以平衡I/O,提高并发。)

ORACLE 11g中支持的分区:

Tables can be partitioned into up to 1024K-1 separate partitions. Any table can be partitioned except those tables containing columns with LONG or LONG RAW datatypes. You can, however, use tables containing columns with CLOB or BLOB datatypes.

注意:

To reduce disk use and memory use (specifically, the buffer cache), you can store tables and partitioned tables in a compressed format inside the database. This often leads to a better scaleup for read-only operations. Table compression can also speed up query execution. There is, however, a slight cost in CPU overhead.

减少buffer cache,可以将表和分区表创建成压缩的形式,从而可以降低CPU的开销成本

You can partition index-organized tables by range, list, or hash. Partitioned index-organized tables are very useful for providing improved manageability, availability, and performance for index-organized tables. In addition, data cartridges that use index-organized tables can take advantage of the ability to partition their stored data.

For partitioning an index-organized table:

  • Partition columns must be a subset of primary key columns

  • Secondary indexes can be partitioned—locally and globally

  • OVERFLOW data segments are always equipartitioned with the table partitions

分区的类型:

范围分区、散列分区、列表分区、组合分区

    可以对索引和表分区,全局索引只能按范围分区,可以将其定义在任何类型的分区或非分区表上。

    通常全局索引比局部索引需要更多的维护

    局部索引与基础表是等同分区的,用于反映其基础表的结构

Partitioning Methods

Oracle provides the following partitioning methods:

  • Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition.

  • Interval partitioning is an extension of range partitioning which instructs the database to automatically create partitions of a specified interval when data inserted into the table exceeds all of the range partitions.

  • Hash partitioning maps data to partitions based on a hashing algorithm that evenly distributes rows among partitions, giving partitions approximately the same size.

  • List partitioning enables you to explicitly control how rows map to partitions by specifying a list of discrete values in the description for each partition.

  • Reference partitioning enables you to partition a table based on the partitioning scheme of the table referenced in its referential constraint.

  • Composite partitioning is a combination of two partitioning methods to further divide the data into subpartitions:

    • Composite range-range partitioning partitions data using the range method, and within each partition, subpartitions it using the range method.

    • Composite range-hash partitioning partitions data using the range method, and within each partition, subpartitions it using the hash method.

    • Composite range-list partitioning partitions data using the range method, and within each partition, subpartitions it using the list method.

    • Composite list-range partitioning partitions data using the list method, and within each partition, subpartitions it using the range method.

    • Composite list-hash partitioning partitions data using the list method, and within each partition, subpartitions it using the hash method.

    • Composite list-list partitioning partitions data using the list method, and within each partition, subpartitions it using the list method.

  • System partitioning enables application-controlled partitioning for arbitrary tables.

Range分区是以列的值的范围来做为分区的划分条件,将记录存放到列值所在的range分区中。

 每个分区都包含VALUES LESS THAN,定义了分区的上层边界。任何等于和大于分区键值的二进制值都被添加到下一个高层分区中。MAXVALUE可以用来定义最高层的分区。MAXVALUE表示了虚拟的无限值


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
Hive中的分区表和分桶表是用来提高查询性能和管理数据的两种技术。 分区表是将数据按照某个列的值进行分区存储的表。通过将数据分散存储在不同的分区中,可以减少查询时需要扫描的数据量,提高查询效率。创建分区表的语法是使用ALTER TABLE语句,并指定分区的列和值。删除分区可以使用ALTER TABLE语句的DROP PARTITION子句。可以使用DESC FORMATTED命令查看分区表的结构,使用SELECT语句查询分区表的数据。\[1\] 分桶表是将数据按照哈希函数的结果进行分桶存储的表。分桶表可以提高数据的读取和查询效率,特别是在进行连接操作时。创建分桶表的语法是使用CLUSTERED BY子句指定分桶的列,并使用INTO子句指定分桶的数量。可以使用INSERT INTO TABLE语句将数据导入分桶表。\[2\] 在Hive 0.x和1.x版本中,需要设置hive.enforce.bucketing=true来强制分桶,以便程序可以根据表结构自动选择正确数量的Reducer和cluster by column来进行分桶。可以使用CTAS语句将数据从一个表复制到分桶表中。\[3\] 总结起来,分区表和分桶表都是用来提高查询性能和管理数据的技术,分区表是按照某个列的值进行分区存储,而分桶表是按照哈希函数的结果进行分桶存储。 #### 引用[.reference_title] - *1* *2* [大数据—Hive(七)_ 分区表和分桶表](https://blog.csdn.net/m0_52968216/article/details/128667517)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v91^insert_down28v1,239^v3^insert_chatgpt"}} ] [.reference_item] - *3* [Hive分区表和分桶表](https://blog.csdn.net/mxk4869/article/details/125635202)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v91^insert_down28v1,239^v3^insert_chatgpt"}} ] [.reference_item] [ .reference_list ]

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值