Wednesday, March 28, 2012

Partitioning

Introduction
Oracle Partitioning, first introduced in Oracle 8.0 in 1997, is one of the most important and successful functionalities of the Oracle DB, improving the performance, manageability, and availability for tens of thousands of applications. 
11gR2 introduces the 9th generation of partitioning, enabling customers to model even more business scenarios and making partitioning easier to use, enabling “Partitioning for the masses”.

Partitioning allows a table, index, or index-organized table to be subdivided into smaller pieces called partitions using single or set of columns called partitioning key. For ex, orders table can be partitioned based on order_date as partitioning key. Each piece of such a DB object is called a partition. Each partition has its own name, and may optionally have its own storage characteristics. 

Basics of Partitioning
  • Partitioning addresses key issues in supporting very large tables and indexes by decomposing them into smaller and more manageable pieces called partitions, which are entirely transparent to an application.
  • SQL queries and DML statements do not need to be modified to access partitioned tables. However, after partitions are defined, DDL statements can access and manipulate individual partitions rather than entire tables or indexes. This is how partitioning can simplify the manageability of large database objects.
  • Each partition of a TABLE / INDEX must have the same logical attributes, such as column names, data types, and constraints, but each partition can have separate physical attributes, such as compression enabled or disabled, physical storage settings, and tablespaces.
  • Database objects - tables, indexes, and index-organized tables - are partitioned using a 'partitioning key', a set of columns that determine in which partition a given row will reside.
  • However, the DBA can manage and store each monthly partition individually, potentially using different storage tiers, applying table compression to the older data, or store complete ranges of older data in read only tablespaces.
  • From the perspective of a DBA, a partitioned object has multiple pieces that can be managed either collectively / individually. This gives the DBA considerable flexibility in managing a partitioned object. 
  • From the perspective of the Application, a partitioned table is identical to a non-partitioned table; no modifications are necessary when accessing a partitioned table using SQL DML commands.
  • Note: All partitions of a partitioned object must reside in tablespaces of a single block size.

Partitioning Key: Each row in a partitioned table is unambiguously assigned to a single partition. Partitioning key consists of one or more columns that determine the partition where each row is stored. Oracle automatically directs insert, update, and delete operations to the appropriate partition with the partitioning key.
Partitioned Tables: Any table can be partitioned into a million separate partitions except those tables containing columns with LONG / LONG RAW data types. You can, however, use tables containing columns with CLOB / BLOB data types.
Partitioned Index-Organized Tables: They are very useful for providing improved performance, manageability, and availability for index-organized tables. For partitioning an index-organized table:
  • Partition columns must be a subset of the primary key columns.
  • Secondary indexes can be partitioned (both locally and globally).
  • OVERFLOW data segments are always equi-partitioned with the table partitions.


Exactly, when to use partitioning is a rather subjective decision. Some general guidelines that Oracle and I suggest are listed below.
When to Partition a Table
  • Tables greater than 2 GB should always be considered as candidates for partitioning.
  • Tables containing historical data, in which new data is added into the newest partition. A typical ex. is a historical table where only the current month's data is updatable and the other 11 months are read only.
  • When the contents of a table need to be distributed across different types of storage devices.
When to Partition an Index
  • Avoid rebuilding the entire index when data is removed.
  • Perform maintenance on parts of the data without invalidating the entire index.
  • Reduce the impact of index skew caused by an index on a column with a monotonically increasing value

Benefits of Partitioning
  • It enables the DB objects to be managed and accessed at a finer level of granularity. 
  • It improves the performance of certain queries or maintenance operations by an order of magnitude. 
  • It can greatly reduce the total cost of data ownership, using a “tiered archiving” approach of keeping older relevant information still online on low cost storage devices.
  • It enables an efficient and simple, yet very powerful approach when considering Information Lifecycle Management (ILM) for large environments.
  • It also enables DB designers and DBA's to tackle some of the toughest problems posed by cutting-edge applications. 
  • It is a key tool for building multi-terabyte systems or systems with extremely high availability requirements.
  • It can provide tremendous benefit to a wide variety of applications by improving Performance, Manageability, and Availability. 

Partitioning for 
  • Availability: Partitioned DB objects provide partition independence. This characteristic of partition independence can be an important part of a high-availability strategy.
  • Manageability: It allows tables and indexes to be partitioned into smaller, more manageable units, providing DBA with the ability to pursue a "divide and conquer" approach to data management.
  • Performance: By limiting the amount of data to be examined / operated on, partitioning provides a number of performance benefits. These features include: Partitioning Pruning and Partition-wise Joins.
Oracle additionally provides a comprehensive set of SQL commands for managing partitioning tables. These include commands for adding new partitions, dropping, splitting, moving, merging, truncating, and optionally compressing partitions.

Partitioning Pruning / Partition Elimination
  • It is the simplest and also the most substantial means to improve performance using partitioning. Partition pruning can often improve query performance by several orders of magnitude.
  • In Partition Pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list. As a result, Oracle DB performs operations only on those partitions that are relevant to the SQL statement. Partition pruning dramatically reduces the amount of data retrieved from disk and shortens processing time, thus improving query performance and optimizing resource utilization.
  • Partition pruning works with all of Oracle's other performance features. Oracle will utilize partition pruning in conjunction with any indexing technique, join technique, or parallel access method. 
 Partition-wise Joins
  • It breaks a large join into smaller joins that occur between each of the partitions, completing the overall join in less time. It can be applied when two tables are being joined together, and at least one of these tables is partitioned on the join key or when a reference partitioned table is joined with its parent table.
  • It reduces query response time by minimizing the amount of data exchanged among parallel execution servers when joins execute in parallel. This significantly reduces response time and improves the use of both CPU and memory resources. Partition-wise joins can be Full or Partial. Oracle DB decides which type of join to use.
  • It break a large join into smaller joins of “identical” data sets for the joined tables. “Identical” here is defined as covering exactly the same set of partitioning key values on both sides of the join, thus ensuring that only a join of these 'identical' data sets will produce a result and that other data sets do not have to be considered.
  • Oracle is using either the fact of already (physical) equi-partitioned tables for the join or is transparently redistributing (= “repartitioning”) one table at runtime to create equi-partitioned data sets matching the partitioning of the other table, completing the overall join in less time. This offers significant performance benefits both for serial and parallel execution.


Evolution of Partitioning in Oracle
Oracle Version #
Core functionality
Performance
Manageability
8
Range partitioning
Global range indexes
“Static” partition pruning
Basic maintenance operations:
add, drop, exchange
8i
Hash partitioning
Composite range-hash partitioning
Partition-wise joins
“Dynamic” pruning
Merge operation
9i
List partitioning

Global index maintenance
9i R2
Composite range-list partitioning
Fast partition split

10g
Global hash indexes

Local index maintenance
10g R2
1M partitions per table
“Multi-dimensional” pruning
Fast drop table
11g
More composite choices
REF partitioning
Virtual column partitioning

Interval partitioning
Partition Advisor

Partitioning Strategies in 11g R2
Each partitioning strategy has different advantages and design considerations. Thus, each strategy is more appropriate for a particular situation.
Basic Partitioning Strategies
Partitioning Extensions
Single-level partitioning
  • Range
  • Hash
  • List
Composite partitioning
  • Range-Range
  • Range-Hash
  • Range-List
  • Hash-Range
  • Hash-Hash
  • Hash-List
  • List-Range
  • List-Hash
  • List-List
Manageability Extensions
  • Interval Partitioning

  1. Interval
  2. Interval-Range
  3. Interval-List
  4. Interval-Hash
  • Partition Advisor
Partitioning Key Extensions
  • REF Partitioning / Partition by Reference
  • Virtual column-based Partitioning



Partitioning Strategies
Data Distribution
Sample Business Case
Single-level partitioning

Based on consecutive ranges of values
Based on unordered lists of values
Based on a hash algorithm

Orders table range partitioned by order_date
Orders table list partitioned by country
Orders table range partitioned by customer_id
Composite partitioning
Based on a combination of two of the above mentioned basic techniques of Range, List, Hash, and Interval
Orders table is range partitioned by order_date and sub_partitioned by
  • hash on customer_id
  • range on shipment_date
Interval partitioning
An extension to Range Partition.
Defined by an interval, providing equi-width ranges. With exception of the first partition all partitions are automatically created on-demand when matching data arrives.
Orders table partitioned by order_date with a predefined daily interval, starting with '01-Jan-2012'
REF Partitioning or
Partition by Reference
Partitioning for a child table is inherited from the parent table through a primary key - foreign key relationship. The partitioning keys are not stored in actual columns in the child table.
(parent) Orders table range partitioned by order_date and inherits the partitioning technique to (child) order lines table. Column order_date is only present in the parent orders table
Virtual column-based
partitioning
Defined by one of the above mentioned partition techniques and the partitioning key is based on a virtual column. Virtual columns are not stored on disk and only exist as metadata.
Orders table has a virtual column that derives the sales region based on the first three digits of the customer account number. The orders table is then list partitioned by sales region.

Single-Level Partitioning: A table is defined by specifying one of the following data distribution methodologies, using one or more columns as the partitioning key:
  1. Range: It maps data to partitions based on ranges of values of the partitioning key that you establish for each partition. It is the most common type of partitioning and is often used with dates.
  2. List: It enables you to explicitly control how rows map to partitions by specifying a list of discrete values for the partitioning key in the description for each partition. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way.
  3. Hash: It maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify. Hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size.

Composite Partitioning: It is a combination of the basic data distribution methods; a table is partitioned by one data distribution method and then each partition is further subdivided into subpartitions using a second data distribution method. It supports historical operations, such as adding new range partitions, but also provides higher degrees of potential partition pruning and finer granularity of data placement through subpartitioning.
  1. Range-Range: It enables logical range partitioning along two dimensions; for example, partition by order_date and range subpartition by shipping_date.
  2. Range-Hash: It partitions data using the range method, and within each partition, subpartitions it using the hash method. Composite range-hash partitioning provides the improved manageability of range partitioning and the data placement, striping, and parallelism advantages of hash partitioning.
  3. Range-List: It partitions data using the range method, and within each partition, subpartitions it using the list method. Composite range-list partitioning provides the manageability of range partitioning and the explicit control of list partitioning for the subpartitions.
  4. List-Range: It enables logical range subpartitioning within a given list partitioning strategy; for example, list partition by country_id and range subpartition by order_date.
  5. List-Hash: It enables hash subpartitioning of a list-partitioned object; for example, to enable partition-wise joins.
  6. List-List: It enables logical list partitioning along two dimensions; for example, list partition by country_id and list subpartition by sales_channel.

Manageability Extensions: The following extensions significantly enhance the manageability of partitioned tables
Interval Partitioning: It is an extension of range partitioning which instructs the DB to automatically create partitions of a specified interval when data inserted into the table exceeds all of the existing range partitions. Range partitioning key value determines the high value of the range partitions, which is called the transition point, and DB creates interval partitions for data with values that are beyond that transition point. 
   When using interval partitioning, consider the following restrictions:
  • You can only specify one partitioning key column, and it must be of NUMBER or DATE type.
  • Interval partitioning is not supported for index-organized tables.
  • You cannot create a domain index on an interval-partitioned table.
   You can create single-level interval partitioned tables and the following composite partitioned tables:
  • Interval-Range
  • Interval-Hash
  • Interval-List
Partition Advisor: It is part of the SQL Access Advisor. Beginning with 11gR2, the SQL Access Advisor has been enhanced to generate partitioning recommendations, in addition to the ones it already provides for indexes, materialized views, and materialized view logs. 
  • It can recommend a partitioning strategy for a table based on a supplied workload of SQL statements which can be supplied by the SQL Cache, a SQL Tuning set, or be defined by the user.
  • Recommendations generated by the SQL Access Advisor - either for Partitioning only or holistically - will show the anticipated performance gains that will result if they are implemented. The generated script can either be implemented manually or submitted onto a queue within Oracle Enterprise Manager.
  • With the extension of partitioning advice, customers not only can get recommendation specifically for partitioning but also a more comprehensive holistic recommendation of SQL Access Advisor, improving the collective performance of SQL statements overall. 
  • The Partition Advisor, integrated into the SQL Access Advisor, is part of Oracle's Tuning Pack, an extra licensable option. It can be used from within Enterprise Manager or via a command line interface.

Partitioning Key Extensions: The following extensions extend the flexibility in defining partitioning keys
Reference partitioning: It enables the partitioning of two tables that are related to one another by referential constraints. The partitioning key is resolved through an existing parent-child relationship, enforced by enabled and active primary key and foreign key constraints.
  • The benefit of this extension is that tables with a parent-child relationship can be logically equi-partitioned by inheriting the partitioning key from the parent table without duplicating the key columns. The logical dependency also automatically cascades partition maintenance operations, thus making application development easier and less error-prone.
  • All basic partitioning strategies are available for reference partitioning. Interval partitioning cannot be used with reference partitioning.
Virtual Column-Based partitioning: In previous releases of Oracle DB, a table could only be partitioned if the partitioning key physically existed in the table.
  • Virtual columns remove that restriction and enable the partitioning key to be defined by an expression, using one or more existing columns of a table. The expression is stored as metadata only.
  • Virtual column-based partitioning is supported with all basic partitioning strategies, including reference partitioning, and interval and interval-* composite partitioning.

Partitioned Indexes
Just like partitioned tables, partitioned indexes improve manageability, availability, performance, and scalability. They can either be partitioned independently (global indexes) or automatically linked to a table's partitioning method (local indexes). In general, you should use global indexes for OLTP applications and local indexes for DW or DSS applications. Also, whenever possible, try to use local indexes because they are easier to manage.
Types of Partitioned Indexes
  • Local Indexes: A local index is an index on a partitioned table that is coupled with the underlying partitioned table, 'inheriting' the partitioning strategy from the table. Consequently, each partition of a local index corresponds to one - and only one - partition of the underlying table. The coupling enables optimized partition maintenance; for example, when a table partition is dropped, Oracle simply has to drop the corresponding index partition as well. No costly index maintenance is required. Local indexes are most common in DW environments.
  • Global Partitioned Indexes: A global partitioned index is an index on a partitioned or non-partitioned table that is partitioned using a different partitioning-key / partitioning strategy than the table. Global-partitioned indexes can be partitioned using range or hash partitioning and are uncoupled from the underlying table. For ex, a table could be range-partitioned by month and have 12 partitions, while an index on that table could be hash-partitioned using a different partitioning key and have a different number of partitions. Global partitioned indexes are more common for OLTP than for DW environments.
  • Global Non-Partitioned Indexes: A global non-partitioned index is essentially identical to an index on a non-partitioned table. The index structure is not partitioned and uncoupled from the underlying table. In DW environments, the most common usage of global non-partitioned indexes is to enforce PK constraints. OLTP environments on the other hand mostly rely on global non-partitioned indexes.
Deciding on the Type of Partitioned Index to use: Consider the following guidelines as shown below
  • If the table partitioning column is a subset of the index keys, then use a local index. If this is the case, then you are finished. If this is not the case, then continue to guideline 2.
  • If the index is unique and does not include the partitioning key columns, then use a global index. If this is the case, then you are finished. Otherwise, continue to guideline 3.
  • If your priority is manageability, then use a local index. If this is the case, then you are finished. If this is not the case, continue to guideline 4.
  • If the application is an OLTP type and users need quick response times, and then use a global index. If the application is a DSS type and users are more interested in throughput, and then use a local index.
Miscellaneous Information about Creating Indexes on Partitioned Tables
  • You can create bitmap indexes on partitioned tables, with the restriction that the bitmap indexes must be local to the partitioned table. They cannot be global indexes.
  • Global indexes can be unique. Local indexes can only be unique if the partitioning key is a part of the index key.
Partitioned Indexes on Composite Partitions
Here are a few points to remember when using partitioned indexes on composite partitions:
  • Subpartitioned indexes are always local and stored with the table subpartition by default.
  • Tablespaces can be specified at either index or index subpartition levels.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.