4
The Optimizer

This chapter discusses SQL processing, optimization methods, and how the optimizer chooses to execute SQL statements.

This chapter contains the following sections:

SQL Processing Architecture

The SQL processing architecture is comprised of the following main components:

Figure 4-1 illustrates the SQL processing architecture:

Figure 4-1 SQL Processing Architecture

The parser, the optimizer, and the row source generator form the SQL Compiler. This compiles the SQL statements into a shared cursor. Associated with the shared cursor is the execution plan.

Parser

The parser performs two functions:

Syntax analysis: This checks SQL statements for correct syntax.
Semantic analysis: This checks, for example, that the current database objects and object attributes referenced are correct.

Optimizer

The optimizer is the heart of the SQL processing engine. The Oracle server provides two methods of optimization: rule-based optimizer (RBO) and cost-based optimizer (CBO).

Row Source Generator

The row source generator receives the optimal plan from the optimizer. It outputs the execution plan for the SQL statement. The execution plan is a collection of row sources structured in the form of a tree. A row source is an iterative control structure. It processes a set of rows, one row at a time, in an iterated manner. A row source produces a row set.

SQL Execution

SQL execution is the component that operates on the execution plan associated with a SQL statement. It then produces the results of the query.

EXPLAIN PLAN

You can examine the execution plan chosen by the optimizer for a SQL statement by using the EXPLAIN PLAN statement. This causes the optimizer to choose the execution plan, and then insert data describing the plan into a database table.

Simply issue the EXPLAIN PLAN statement and then query the output table. The following output table describes the statement examined in the previous section:

ID    OPERATION           OPTIONS        OBJECT_NAME 
------------------------------------------------------------ 
0     SELECT STATEMENT 
1       FILTER 
2         NESTED LOOPS 
3           TABLE ACCESS  FULL           EMP 
4           TABLE ACCESS  BY ROWID       DEPT 
5             INDEX       UNIQUE SCAN    PK_DEPTNO 
6           TABLE ACCESS  FULL           SALGRADE

Each box in Figure 4-2 and each row in the output table corresponds to a single step in the execution plan. For each row in the listing, the value in the ID column is the value shown in the corresponding box in Figure 4-2.

See Also:
For detailed information on how to use EXPLAIN PLAN and how to produce and interpret its output, see Chapter 5, "Using EXPLAIN PLAN".

What Is The Optimizer?

The optimizer determines the most efficient way to execute a SQL statement. This is an important step in the processing of any data manipulation language (DML) statement: SELECT, INSERT, UPDATE, or DELETE. There are often many different ways to execute a SQL statement; for example, by varying the order in which tables or indexes are accessed. The procedure Oracle uses to execute a statement can greatly affect how quickly the statement executes.

The optimizer considers many factors among alternative access paths. It can use either a a cost-based or a rule-based approach (see "Cost-Based Optimizer (CBO)" and "Rule-Based Optimizer (RBO)").

Note:
The optimizer may not make the same decisions from one version of Oracle to the next. In recent versions, the optimizer may make different decisions based on better information available to it.

You can influence the optimizer's choices by setting the optimizer approach and goal, and by gathering statistics for the CBO. Sometimes, the application designer, who has more information about a particular application's data than is available to the optimizer, can choose a more effective way to execute a SQL statement. The application designer can use hints in SQL statements to specify how the statement should be executed.

See Also:

For more information on optimization goals, see "Choosing an Optimizer Approach and Goal".

For more information on using statistics, see Chapter 8, "Gathering Statistics".

For more information about using hints in SQL statements, see Chapter 7, "Using Optimizer Hints".

Execution Plan

To execute a DML statement, Oracle may need to perform many steps. Each of these steps either retrieves rows of data physically from the database or prepares them in some way for the user issuing the statement. The combination of the steps Oracle uses to execute a statement is called an execution plan. An execution plan includes an access method for each table that the statement accesses and an ordering of the tables (the join order).

See Also:
For descriptions of the various access methods, including indexes, hash clusters, and table scans, see "Access Paths for the RBO" and "Access Paths for the CBO".

The following SQL statement selects the name, job, salary, and department name for all employees whose salaries do not fall into a recommended salary range:

SELECT ename, job, sal, dname 
FROM emp, dept 
WHERE emp.deptno = dept.deptno 
AND NOT EXISTS 

(SELECT * 
FROM salgrade 
WHERE emp.sal BETWEEN losal AND hisal);

Figure 4-2 shows a graphical representation of the execution plan for this SQL statement.

Figure 4-2 An Execution Plan

Steps of Execution Plan

Each step of the execution plan returns a set of rows that either are used by the next step or, in the last step, are returned to the user or application issuing the SQL statement. A set of rows returned by a step is called a row source.

Figure 4-2 is a hierarchical diagram showing the flow of row sources from one step to another. The numbering of the steps reflects the order in which they are displayed in response to the EXPLAIN PLAN statement. Generally, this is not the order in which the steps are executed.

See Also:
EXPLAIN PLAN is described in "EXPLAIN PLAN" section below. The order in which the steps are executed is described in "Execution Order".

Each step of the execution plan either retrieves rows from the database or accepts rows from one or more row sources as input:

Steps indicated by the shaded boxes physically retrieve data from an object in the database. Such steps are called access paths:
- Steps 3 and 6 read all the rows of the emp and salgrade tables, respectively.
- Step 5 looks up each deptno value in the pk_deptno index returned by step 3. There it finds the rowids of the associated rows in the dept table.
- Step 4 retrieves the rows whose rowids were returned by step 5 from the dept table.

Steps indicated by the clear boxes operate on row sources:

Step 2 performs a nested loops operation, accepting row sources from steps 3 and 4, joining each row from step 3 source to its corresponding row in step 4, and returning the resulting rows to step 1.

Step 1 performs a filter operation. It accepts row sources from steps 2 and 6, eliminates rows from step 2 that have a corresponding row in step 6, and returns the remaining rows from step 2 to the user or application issuing the statement.

See Also:
For more information on access paths, see "Access Paths for the RBO" and "Access Paths for the CBO". For more information on the methods by which Oracle joins row sources, see "Optimizing Joins".

Execution Order

The steps of the execution plan are not performed in the order in which they are numbered. Rather, Oracle first performs the steps that appear as leaf nodes in the tree-structured graphical representation of the execution plan (steps 3, 5, and 6 in Figure 4-2). The rows returned by each step become the row sources of its parent step. Then, Oracle performs the parent steps.

For example, Oracle performs the following steps to execute the statement in Figure 4-2:

Oracle performs step 3 and returns the resulting rows, one by one, to step 2.
For each row returned by step 3, Oracle performs the following steps:
- Oracle performs step 5 and returns the resulting rowid to step 4.
- Oracle performs step 4 and returns the resulting row to step 2.
- Oracle performs step 2, joining the single row from step 3 with a single row from step 4, and returns a single row to step 1.
- Oracle performs step 6 and returns the resulting row, if any, to step 1.
- Oracle performs step 1. If a row is not returned from step 6, then Oracle returns the row from step 2 to the user issuing the SQL statement.

Note that Oracle performs steps 5, 4, 2, 6, and 1 once for each row returned by step 3. If a parent step requires only a single row from its child step before it can be executed, then Oracle performs the parent step (and possibly the rest of the execution plan) as soon as a single row has been returned from the child step. If the parent of that parent step also can be activated by the return of a single row, then it is executed as well.

Thus, the execution can cascade up the tree, possibly to encompass the rest of the execution plan. Oracle performs the parent step and all cascaded steps once for each row in turn retrieved by the child step. The parent steps that are triggered for each row returned by a child step include table accesses, index accesses, nested loops joins, and filters.

If a parent step requires all rows from its child step before it can be executed, then Oracle cannot perform the parent step until all rows have been returned from the child step. Such parent steps include sorts, sort-merge joins, and aggregate functions.

Choosing an Optimizer Approach and Goal

By default, the goal of the CBO is the best throughput; i.e., using the least amount of resources necessary to process all rows accessed by the statement.

Oracle can also optimize a statement with the goal of best response time; i.e., using the least amount of resources necessary to process the first row accessed by a SQL statement.

For parallel execution of a SQL statement, the optimizer can choose to minimize elapsed time at the expense of resource consumption. The initialization parameter OPTIMIZER_PERCENT_PARALLEL specifies how much the optimizer attempts to parallelize execution.

The execution plan produced by the optimizer can vary depending on the optimizer's goal. Optimizing for best throughput is more likely to result in a full table scan rather than an index scan, or a sort-merge join rather than a nested loops join. Optimizing for best response time, however, more likely results in an index scan or a nested loops join.

For example, suppose you have a join statement that is executable with either a nested loops operation or a sort-merge operation. The sort-merge operation may return the entire query result faster, while the nested loops operation may return the first row faster. If your goal is to improve throughput, then the optimizer is more likely to choose a sort-merge join. If your goal is to improve response time, then the optimizer is more likely to choose a nested loops join.

Choose a goal for the optimizer based on the needs of your application:

For applications performed in batch, such as Oracle Reports applications, optimize for best throughput. Throughput is usually more important in batch applications, because the user initiating the application is only concerned with the time necessary for the application to complete. Response time is less important, because the user does not examine the results of individual statements while the application is running.
For interactive applications, such as Oracle Forms applications or SQL*Plus queries, optimize for best response time. Response time is usually important in interactive applications, because the interactive user is waiting to see the first row accessed by the statement.
For queries that use ROWNUM to limit the number of rows, optimize for best response time. Because of the semantics of ROWNUM queries, optimizing for response time provides the best results.

The optimizer's behavior when choosing an optimization approach and goal for a SQL statement is affected by the following factors:

OPTIMIZER_MODE Initialization Parameter

The OPTIMIZER_MODE initialization parameter establishes the default behavior for choosing an optimization approach for the instance. It can have the following values:

CHOOSE

The optimizer chooses between a cost-based approach and a rule-based approach based on whether statistics are available for the CBO. If the data dictionary contains statistics for at least one of the accessed tables, then the optimizer uses a cost-based approach and optimizes with a goal of best throughput. If the data dictionary contains no statistics for any of the accessed tables, then the optimizer uses a rule-based approach. This is the default value for the parameter.

ALL_ROWS

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best throughput (minimum resource use to complete the entire statement).

FIRST_ROWS

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best response time (minimum resource use to return the first row of the result set).

RULE

The optimizer chooses a rule-based approach for all SQL statements regardless of the presence of statistics.

If the optimizer uses the cost-based approach for a SQL statement, and if some tables accessed by the statement have no statistics, then the optimizer uses internal information (such as the number of data blocks allocated to these tables) to estimate other statistics for these tables.

Statistics in the Data Dictionary

Oracle stores statistics about columns, tables, clusters, indexes, and partitions in the data dictionary for the CBO. You can collect exact or estimated statistics about physical storage characteristics and data distribution in these schema objects by using the DBMS_STATS package, the ANALYZE statement, or the COMPUTE STATISTICS clause of the CREATE or ALTER INDEX statement.

To provide the optimizer with up-to-date statistics, you should collect new statistics after modifying the data or structure of schema objects in ways that could affect their statistics.

See Also:
For more information about statistics, see Chapter 8, "Gathering Statistics".

OPTIMIZER_GOAL Parameter of the ALTER SESSION Statement

The OPTIMIZER_GOAL parameter of the ALTER SESSION statement can override the optimizer approach and goal established by the OPTIMIZER_MODE initialization parameter for an individual session.

The value of this parameter affects the optimization of SQL statements issued by stored procedures and functions called during the session, but it does not affect the optimization of recursive SQL statements that Oracle issues during the session.

The OPTIMIZER_GOAL parameter can have these values:

CHOOSE

The optimizer chooses between a cost-based approach and a rule-based approach based on whether statistics are available for the cost-based approach. If the data dictionary contains statistics for at least one of the accessed tables, then the optimizer uses a cost-based approach and optimizes with a goal of best throughput. If the data dictionary contains no statistics for any of the accessed tables, then the optimizer uses a rule-based approach.

ALL_ROWS

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best throughput (minimum resource use to complete the entire statement).

FIRST_ROWS

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best response time (minimum resource use to return the first row of the result set).

RULE

The optimizer chooses a rule-based approach for all SQL statements issued to the Oracle instance regardless of the presence of statistics.

Changing the Goal with Hints

A FIRST_ROWS, ALL_ROWS, CHOOSE, or RULE hint in an individual SQL statement can override the effects of both the OPTIMIZER_MODE initialization parameter and the OPTIMIZER_GOAL parameter of the ALTER SESSION statement.

By default, the cost-based approach optimizes for best throughput. You can change the goal of the CBO in the following ways:

To change the goal of the CBO for all SQL statements in your session, issue an ALTER SESSION SET OPTIMIZER_MODE statement with the ALL_ROWS or FIRST_ROWS clause.
To specify the goal of the CBO for an individual SQL statement, use the ALL_ROWS or FIRST_ROWS hint.

Example

The following statement changes the goal of the CBO for your session to best response time:

ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS;

See Also:
For information on how to use hints, see Chapter 7, "Using Optimizer Hints".

Cost-Based Optimizer (CBO)

In general, you should always use the cost-based approach. The rule-based approach is available for the benefit of existing applications.

The CBO determines which execution plan is most efficient by considering available access paths and by factoring in information based on statistics for the schema objects (tables or indexes) accessed by the SQL statement. The CBO also considers hints, which are optimization suggestions placed in a comment in the statement.

See Also:
For more information on hints, see Chapter 7, "Using Optimizer Hints".

The CBO consists of the following steps:

The optimizer generates a set of potential plans for the SQL statement based on its available access paths and hints.
The optimizer estimates the cost of each plan based on statistics in the data dictionary for the data distribution and storage characteristics of the tables, indexes, and partitions accessed by the statement.

The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan. The optimizer calculates the cost of each possible access method and join order based on the estimated computer resources, including (but not limited to) I/O and memory, that are required to execute the statement using the plan.

Serial plans with greater costs take more time to execute than those with smaller costs. When using a parallel plan, however, resource use is not directly related to elapsed time.
The optimizer compares the costs of the plans and chooses the one with the smallest cost.

To maintain the effectiveness of the CBO, you must gather statistics and keep them current. Gather statistics on your objects using either of the following:

For releases prior to Oracle8i, use the ANALYZE statement.
For Oracle8i releases, use the DBMS_STATS package.

For table columns which contain skewed data (i.e., values with large variations in number of duplicates), you must collect histograms.

The resulting statistics provide the CBO with information about data uniqueness and distribution. Using this information, the CBO is able to compute plan costs with a high degree of accuracy. This enables the CBO to choose the best execution plan based on the least cost.

See Also:
For detailed information on gathering statistics, see Chapter 8, "Gathering Statistics".

Architecture of the CBO

The CBO consists of the following three main components:

The CBO architecture is illustrated in Figure 4-3.

Figure 4-3 Cost-Based Optimizer Architecture

Query Transformer

The input to query transformer is a parsed query, which is represented by a set of query blocks. The query blocks are nested or interrelated to each other. The form of the query determines how the query blocks are interrelated to each other. The main objective of the query transformer is to determine if it is advantageous to change the form of the query, so that it enables generation of a better query plan. Three different query transformation techniques are employed by the query transformer: view merging, subquery unnesting, and query rewrite using materialized views. Any combination of these transformations may be applied to a given query.

View Merging

Each view referenced in a query is expanded by the parser into a separate query block. The query block essentially represents the view definition, and therefore the result of a view. One option for the optimizer is to optimize the view query block separately, and generate a subplan. Then, optimize the rest of the query by using the view subplan in the generation of overall query plan. Doing so usually leads to a sub-optimal query plan, because the view is optimized separately from rest of the query.

The query transformer removes the potential sub-optimality by merging the view query block into the query block that contains the view. Most of the views are merged, with an exception of few types of views. When a view is merged, the query block representing the view is merged into the containing query block. Now, there is no need to generate a subplan, because view query block is eliminated.

For those views that are not merged, the query transformer pushes the relevant predicates from the containing query block into the view query block. Doing so improves the subplan of the non-merged view, because the pushed in predicates act either as index drivers or as filters.

Subquery Unnesting

Like a view, a subquery is also represented by a separate query block. Because a subquery is nested within the main query or another subquery, this constrains the plan generator in trying out different possible plans before it finds a plan with the lowest cost. For this reason, the query plan produced may not be the optimal one. The restrictions due to the nesting of subqueries can be removed by unnesting the subqueries and converting them into joins. Most of the subqueries are unnested. For those subqueries that remain as nested subqueries, separate subplans are generated. To improve the execution speed of the overall query plan, the subplans are ordered in an efficient manner.

See Also:
For more information on subquery unnesting, see "Use Care When Unnesting Subqueries" in Chapter 9, "Optimizing SQL Statements".

Query Rewrite with Materialized Views

A materialized view is like a query whose result is materialized and stored in a table. When a user query is found compatible with the query associated with a materialized view, the user query can be rewritten in terms of the materialized view. Doing so improves the execution of the user query, because most of the query result has already been precomputed. The query transformer looks for any materialized views that are compatible with the user query, and selects one or more materialized views to rewrite the user query. The use of materialized views to rewrite a query is cost-based. That is, the query is not rewritten if the plan generated without the materialized views has lower cost than the plan generated with the materialized views.

See Also:
For more information on query rewrite, see Oracle8i Data Warehousing Guide.

Estimator

The estimator is the heart of the CBO. Its estimates three different types of measures: selectivity, cardinality, and cost. These measures are related to each other, and one is derived from another. The end goal of the estimator is to estimate the overall cost of a given plan. If statistics are available, then the estimator uses them to compute the measures. The statistics improve the degree of accuracy of the measures.

Selectivity

The first type of measure is the selectivity, which represents a fraction of rows from a row set. The row set can be a base table, a view, or the result of a join or a GROUP BY operator. The selectivity is tied to a query predicate, such as last_name = 'Smith', or a combination of predicates, such as last_name = 'Smith' AND job_type = 'Clerk'. A predicate acts as a filter that filters certain number of rows from a row set. Therefore, the selectivity of a predicate indicates how many rows from a row set will pass the predicate test. The selectivity lies in the value range 0.0 to 1.0. A selectivity of 0.0 means that no rows will be selected from a row set, and a selectivity of 1.0 means that all rows will be selected.

The estimator uses an internal default value for the selectivity if no statistics are available. Different internal defaults are used depending on the predicate type. For example, the internal default for an equality predicate (last_name = 'Smith') is lower than the internal default for a range predicate (last_name > 'Smith'). This is because an equality predicate is expected to usually return a smaller fraction of rows than a range predicate.

When statistics are available, the estimator estimates selectivity based on statistics. For example, for an equality predicate (last_name = 'Smith') the selectivity is set to the reciprocal of the number of distinct values of last_name, because the query selects rows that all contain one out of N distinct values. If a histogram is available on the last_name column, then the estimator uses it instead of the number of distinct values statistic. The histogram captures the distribution of different values in a column, so its use yields better selectivity estimate. Therefore, having histograms on columns that contain skewed data (i.e., values with large variations in number of duplicates) greatly helps the CBO to generate good plans.

Cardinality

Cardinality represents the number of rows in a row set. Here, the row set can be a base table, a view, or the result from a join or GROUP BY operator. The base cardinality is the number of rows in a base table. The base cardinality can be captured by analyzing the table. If table statistics are not available, then the estimator uses the number of extents occupied by the table to estimate the base cardinality.

The effective cardinality is the number of rows that will be selected from a base table. The effective cardinality is dependent on the predicates specified on different columns of a base table. This is because each predicate acts as a successive filter on the rows of a base table. The effective cardinality is computed as the product of base cardinality and combined selectivity of all predicates specified on a table. When there is no predicate on a table, its effective cardinality equals its base cardinality.

The join cardinality is the number of rows produced when two row sets are joined together. A join is a Cartesian product of two row sets with the join predicate applied as a filter to the result. Therefore, the join cardinality is the product of the cardinalities of two row sets, multiplied by the selectivity of the join predicate.

A distinct cardinality is the number of distinct values in a column of a row set. The distinct cardinality of a row set is based on the data in the column. For example, in a row set of 100 rows, if distinct column values are found in 20 rows, then the distinct cardinality is 20.

The group cardinality is the number of rows produced from a row set after the GROUP BY operator is applied. The effect of the GROUP BY operator is to decrease the number of rows in a row set. The group cardinality depends on the distinct cardinality of each of the grouping columns. For example, if a row set of 100 rows is grouped by colx, whose distinct cardinality is 30, then the group cardinality is 30. If the row set of 100 rows is grouped by colx and coly, and distinct cardinalities of colx and coly are 30 and 60 respectively, then the group cardinality lies between max(30,60) and 100.

Cost

The cost represents units of work or resource used. The CBO uses disk I/O as a unit of work. The other possible work units are cpu and network usage. So, the cost used by the CBO represents an estimate of the number of disk I/Os incurred in performing an operation. The operation can be scanning a table, accessing rows from a table using an index, joining two tables together, or sorting a row set. The cost of a query plan is the number of disk I/Os that are expected to be incurred when the query is executed and its result produced.

The access cost represents the number of units of work done in accessing data from a base table. The access path can be a table scan, a fast full index scan, or an index scan. During table scan or fast full index scan, multiple blocks are read from the disk in a single I/O operation. Therefore, the cost of a table scan or a fast full index scan depends on the number of blocks to scan and the multiblock read count value. The cost for an index scan depends on the levels in the B-tree, the number of index leaf blocks to scan, and the number of rows to fetch using the rowid in the index keys. The cost to fetch rows using rowids depends on the index clustering factor. The higher the clustering factor, the more randomly scattered the individual rows are on the disk. So, a higher clustering factor means it costs more to fetch rows by rowid.

The join cost represents the combination of the individual access costs of the two row sets being joined. In a join, one row set is called inner, and the other is called outer. In a nested loops join, for every row in the outer row set, the inner row set is accessed to find all matching rows to join. Therefore, in a nested loops join, the inner row set is accessed as many times as the number of rows in the outer row set. The cost of nested loops join = outer access cost + (inner access cost * outer cardinality).

In sort merge join, the two row sets being joined are sorted by the join keys, if they are not already in key order. The cost of sort merge join = outer access cost + inner access cost + sort costs (if sort used).

In hash join, the inner row set is hashed into memory, and a hash table is built using the join key. Then, each row from the outer row set is hashed, and the hash table is probed to join to all matching rows. If the inner row set is very large, then only a portion of it is hashed into memory. This is called a hash partition.

Each row from the outer row set is hashed to probe matching rows in the hash partition. After this, the next portion of the inner row set is hashed into memory, followed by a probe from the outer row set. This process is repeated until all partitions of the inner row set are exhausted. The cost of hash join = (outer access cost * # of hash partitions) + inner access cost.

See Also:
For more information on joins, see "Optimizing Joins".

Plan Generator

The main function of the plan generator is to try out different possible plans for a given query and pick the one that has the lowest cost. Many different plans are possible because of the various combination of different access paths, join methods and join orders that can be used to access and process data in different ways and produce the same result.

A join order is the order in which different join items (such as tables) are accessed and joined together. For example, in a join order of t1, t2, and t3, table t1 is accessed first. This is followed by access of t2, whose data is joined to t1 data to produce a join of t1 and t2. Finally, t3 is accessed, and its data is joined to the result of join between t1 and t2.

The plan for a query is established by first generating subplans for each of the unnested subqueries and non-merged views. Each unnested subquery or non-merged view is represented by a separate query block. The query blocks are optimized separately in a bottom-up order. That is, the innermost query block is optimized first, and a subplan is generated for it. The outermost query block, which represents the entire query, is optimized last.

The plan generator explores different plans for a query block by trying out different access paths, join methods, and join orders. The number of possible plans for a query block is proportional to the number of join items in the FROM clause. This number rises exponentially with the number of join items.

Because of this reason, the plan generator uses an internal cutoff to reduce the number of plans it tries to find the one with the lowest cost. The cutoff is based on the cost of the current best plan. If current best cost is large, then the plan generator tries harder (i.e., explores more alternate plans) to find a better plan with lower cost. If current best cost is small, then the plan generator ends the search swiftly, because further cost improvement will not be significant.

The cutoff works very well if the plan generator starts with an initial join order that produces a plan with cost close to optimal. Finding a good initial join order is a difficult problem. The plan generator uses a simple heuristic for the initial join order. It orders the join items by their effective cardinalities. The join item with the smallest effective cardinality goes first, and the join item with the largest effective cardinality goes last.

Features that Require the CBO

The use of any of the following features requires the use of the CBO:

Partitioned tables
Index-organized tables
Reverse key indexes
Function-based indexes
SAMPLE clauses in a SELECT statement
Parallel execution and parallel DML
Star transformations
Star joins
Extensible optimizer
Query rewrite (materialized views)
Progress meter
Hash joins
Bitmap indexes
Partition views (release 7.3)

Note:
Even if the parameter OPTIMIZER_MODE is set to RULE, the use of these features enables the CBO.

Using the CBO

To use the CBO for a statement, collect statistics for the tables accessed by the statement, and enable the CBO using one of the following methods:

Make sure that the OPTIMIZER_MODE initialization parameter is set to its default value of CHOOSE.
To enable the CBO for your session only, issue an ALTER SESSION SET OPTIMIZER_MODE statement with the ALL_ROWS or FIRST_ROWS clause.
To enable the CBO for an individual SQL statement, use any hint other than RULE.

The plans generated by the CBO depend upon the sizes of the tables, and potentially on the data distributions as well, if histograms are being used. When using the CBO with a small amount of data to test an application prototype, do not assume that the plan chosen for the full-size database will be the same as that chosen for the prototype.

See Also:
For information on enabling the CBO, see "CBO Parameters".

Access Paths for the CBO

One of the most important choices the optimizer makes when formulating an execution plan is how to retrieve data from the database. For any row in any table accessed by a SQL statement, there may be many access paths by which that row can be located and retrieved. The optimizer chooses one of them.

This section describes the basic methods by which Oracle can access data.

See Also:
For the a list of the access paths that are available for the RBO, as well as their ranking, see "Access Paths for the RBO".

Full Table Scans

A full table scan retrieves rows from a table. To perform a full table scan, Oracle reads all rows in the table, examining each row to determine whether it satisfies the statement's WHERE clause. Oracle reads every data block allocated to the table sequentially, so a full table scan can be performed very efficiently using multiblock reads. Oracle reads each data block only once.

Sample Table Scans

A sample table scan retrieves a random sample of data from a table. This access method is used when the statement's FROM clause includes the SAMPLE clause or the SAMPLE BLOCK clause. To perform a sample table scan when sampling by rows (the SAMPLE clause), Oracle reads a specified percentage of rows in the table and examines each of these rows to determine whether it satisfies the statement's WHERE clause. To perform a sample table scan when sampling by blocks (the SAMPLE BLOCK clause), Oracle reads a specified percentage of the table's blocks and examines each row in the sampled blocks to determine whether it satisfies the statement's WHERE clause.

Oracle does not support sample table scans when the query involves a join or a remote table. However, you can perform an equivalent operation by using a CREATE TABLE AS SELECT query to materialize a sample of an underlying table and then rewrite the original query to refer to the newly created table sample. Additional queries can be written to materialize samples for other tables. Sample table scans require the CBO.

Example:

The following statement uses a sample table scan to access 1% of the emp table, sampling by blocks:

SELECT * 
    FROM emp SAMPLE BLOCK (1);

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            SAMPLE         EMP

Table Access by Rowid

A table access by rowid also retrieves rows from a table. The rowid of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by its rowid is the fastest way for Oracle to find a single row.

To access a table by rowid, Oracle first obtains the rowids of the selected rows, either from the statement's WHERE clause or through an index scan of one or more of the table's indexes. Oracle then locates each selected row in the table based on its rowid.

Cluster Scans

From a table stored in an indexed cluster, a cluster scan retrieves rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data blocks. To perform a cluster scan, Oracle first obtains the rowid of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this rowid.

Hash Scans

Oracle can use a hash scan to locate rows in a hash cluster based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data blocks. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.

Index Scans

An index scan retrieves data from an index based on the value of one or more columns of the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table.

The index contains not only the indexed value, but also the rowids of rows in the table having that value. Therefore, if the statement accesses other columns in addition to the indexed columns, then Oracle can find the rows in the table with a table access by rowid or a cluster scan.

An index scan can be one of the following types:

Unique scan

This returns only a single rowid. Oracle performs a unique scan only in cases in which a single rowid is required, rather than many rowids. For example, Oracle performs a unique scan if there is a UNIQUE or a PRIMARY KEY constraint that guarantees that the statement accesses only a single row.

Range scan

This can return zero or more rowids, depending on how many rows the statement accesses.

Full scan

This is available if a predicate references one of the columns in the index. The predicate does not need to be an index driver. Full scan is also available when there is no predicate, if all of the columns in the table referenced in the query are included in the index and at least one of the index columns is not null. Full scan can be used to eliminate a sort operation. It reads the blocks singly.

Fast full scan

This is an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT NULL constraint. Fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation. It reads the entire index using multiblock reads (unlike a full index scan) and can be parallelized.
Fast full scan is available only with the CBO. You can specify it with the initialization parameter OPTIMIZER_FEATURES_ENABLE or the INDEX_FFS hint. Fast full index scans cannot be performed against bitmap indexes.

Index join

This is a hash join of several indexes that together contain all the columns from the table that are referenced in the query. If an index join is used, then no table access is needed, because all the relevant column values can be retrieved from the indexes. An index join cannot be used to eliminate a sort operation.
Index join is available only with the CBO. You can specify it with the initialization parameter OPTIMIZER_FEATURES_ENABLE or the INDEX_JOIN hint.
Example: The following statement uses an index join to access the empno and sal columns, both of which are indexed, in the emp table:
SELECT empno, sal FROM emp WHERE sal > 2000;

The EXPLAIN PLAN output for this statement might look like this:
OPERATION OPTIONS OBJECT_NAME --------------------------------------------------------- SELECT STATEMENT VIEW index$_join$_001 HASH JOIN INDEX RANGE SCAN EMP_SAL INDEX FAST FULL SCAN EMP_EMPNO

Bitmap

This uses a bitmap for key values and a mapping function that converts each bit position to a rowid. Bitmaps can efficiently merge indexes that correspond to several conditions in a WHERE clause, using Boolean operations to resolve AND and OR conditions.
Bitmap access is available only with the CBO.

Attention:
Bitmap indexes are available only if you have purchased the Oracle8i Enterprise Edition. For more information on purchasing options, see Getting to Know Oracle8i.

How the CBO Chooses an Access Path

The CBO chooses an access path based on the following factors:

The available access paths for the statement.
The estimated cost of executing the statement using each access path or combination of paths.

To choose an access path, the optimizer first determines which access paths are available by examining the conditions in the statement's WHERE clause (and its FROM clause for the SAMPLE or SAMPLE BLOCK clause). The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan using the statistics for the index, columns, and tables accessible to the statement. Finally, optimizer chooses the execution plan with the lowest estimated cost.

The optimizer's choice among available access paths can be overridden with hints, except when the statement's FROM clause contains SAMPLE or SAMPLE BLOCK.

See Also:
For information about hints in SQL statements, see Chapter 7, "Using Optimizer Hints".

To choose among available access paths, the optimizer considers the following factors:

Selectivity: The selectivity is the percentage of rows in the table that the query selects. A query that selects a small percentage of a table's rows has good selectivity, while a query that selects a large percentage of rows has poor selectivity.

The optimizer is more likely to choose an index scan over a full table scan for a query with good selectivity than for one with poor selectivity. Index scans are usually more efficient than full table scans for queries that access only a small percentage of a table's rows, while full table scans are usually faster for queries that access a large percentage.

To determine the selectivity of a query, the optimizer considers these sources of information:
- The operators used in the WHERE clause.
- Unique and primary key columns used in the WHERE clause.
- Statistics for the table.
The examples below illustrate how the optimizer uses selectivity.
DB_FILE_MULTIBLOCK_READ_COUNT: Full table scans use multiblock reads, so the cost of a full table scan depends on the number of multiblock reads required to read the entire table. This depends on the number of blocks read by a single multiblock read, which is specified by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT. For this reason, the optimizer may be more likely to choose a full table scan when the value of this parameter is high.

Example 1:

The following query uses an equality condition in its WHERE clause to select all employees named Jackson:

SELECT * 
  FROM emp 
  WHERE ename = 'JACKSON';

If the ename column is a unique or primary key, then the optimizer determines that there is only one employee named Jackson, and the query returns only one row. In this case, the query is very selective, and the optimizer is most likely to access the table using a unique scan on the index that enforces the unique or primary key.

Example 2:

Consider again the query in the previous example. If the ename column is not a unique or primary key, then the optimizer can use these statistics to estimate the query's selectivity:

USER_TAB_COLUMNS.NUM_DISTINCT is the number of values for each column in the table.
USER_TABLES.NUM_ROWS is the number of rows in each table.

By dividing the number of rows in the emp table by the number of distinct values in the ename column, the optimizer estimates what percentage of employees have the same name. By assuming that the ename values are uniformly distributed, the optimizer uses this percentage as the estimated selectivity of the query.

Example 3:

The following query selects all employees with employee ID numbers less than 7500:

SELECT * 
  FROM emp 
  WHERE empno < 7500;

To estimate the selectivity of the query, the optimizer uses the boundary value of 7500 in the WHERE clause condition and the values of the HIGH_VALUE and LOW_VALUE statistics for the empno column, if available. These statistics can be found in the USER_TAB_COL_STATISTICS view (or the USER_TAB_COLUMNS view). The optimizer assumes that empno values are evenly distributed in the range between the lowest value and highest value. The optimizer then determines what percentage of this range is less than the value 7500 and uses this value as the estimated selectivity of the query.

Example 4:

The following query uses a bind variable rather than a literal value for the boundary value in the WHERE clause condition:

SELECT * 
  FROM emp 
  WHERE empno < :e1;

The optimizer does not know the value of the bind variable e1. Indeed, the value of e1 may be different for each execution of the query. For this reason, the optimizer cannot use the means described in the previous example to determine selectivity of this query. In this case, the optimizer heuristically guesses a small value for the selectivity. This is an internal default value. The optimizer makes this assumption whenever a bind variable is used as a boundary value in a condition with one of the operators <, >, <=, or >=.

The optimizer's treatment of bind variables can cause it to choose different execution plans for SQL statements that differ only in the use of bind variables rather than constants. In one case in which this difference may be especially apparent, the optimizer may choose different execution plans for an embedded SQL statement with a bind variable in an Oracle precompiler program and the same SQL statement with a constant in SQL*Plus.

Example 5:

The following query uses two bind variables as boundary values in the condition with the BETWEEN operator:

SELECT * 
  FROM emp 
  WHERE empno BETWEEN :low_e AND :high_e;

The optimizer decomposes the BETWEEN condition into these two conditions:

empno >= :low_e 
empno <= :high_e

The optimizer heuristically estimates a small selectivity (an internal default value) for indexed columns in order to favor the use of the index.

Example 6:

The following query uses the BETWEEN operator to select all employees with employee ID numbers between 7500 and 7800:

SELECT * 
  FROM emp 
  WHERE empno BETWEEN 7500 AND 7800;

To determine the selectivity of this query, the optimizer decomposes the WHERE clause condition into these two conditions:

empno >= 7500 
empno <= 7800

The optimizer estimates the individual selectivity of each condition using the means described in a previous example. The optimizer then uses these selectivities (S1 and S2) and the absolute value function (ABS) in this formula to estimate the selectivity (S) of the BETWEEN condition:

S = ABS( S1 + S2 - 1 )

CBO Parameters

This section contains some, but not all, of the parameters specific to the optimizer. The following sections may be especially useful when tuning Oracle Applications.

Parameters Affecting CBO Plans

The following parameters affect cost-based optimizer plans:

OPTIMIZER_FEATURES_ENABLED

Enables several optimizer features, depending on the user-specified value. For example, if OPTIMIZER_FEATURES_ENABLED=8.1.6, then ALL_ROWS or FIRST_ROWS is also used for the recursive SQL generated by PL/SQL procedures. Prior to release 8.1.6, only CHOOSE or RULE was used for such recursive SQL.

OPTIMIZER_MODE

This initialization parameter sets the mode of the optimizer at instance startup: RULE (use RBO), ALL_ROWS (use CBO for throughput), FIRST_ROWS (use CBO for response time), or CHOOSE (an optimizer choice based on the presence of statistics.
Set the OPTIMIZER_MODE parameter of the ALTER SESSION statement to change the value dynamically during a session.

OPTIMIZER_PERCENT_PARALLEL

Defines the amount of parallelism that the optimizer uses in its cost functions.

HASH_AREA_SIZE

Larger values can lower hash join costs, permitting Oracle to perform more hash joins.

SORT_AREA_SIZE

Larger values can lower sort costs, permitting Oracle to perform more sort merge joins.

DB_FILE_MULTIBLOCK_READ_COUNT

Larger values can lower table scan costs and make Oracle favor table scans over indexes.

In data warehousing applications, you often need to set the following parameters:

ALWAYS_ANTI_JOIN

Sets the type of antijoin that Oracle uses: NESTED_LOOPS, MERGE, or HASH.

HASH_JOIN_ENABLED

Enables or disables the hash join feature. This should always be set to true for data warehousing applications.

You rarely need to change the following parameters:

HASH_MULTIBLOCK_IO_COUNT

Larger value can lower hash join costs, permitting Oracle to perform more hash joins.

BITMAP_MERGE_AREA_SIZE

The size of the area used to merge the different bitmaps that match a range predicate. Larger size favors use of bitmap indexes for range predicates.

Parameters Affecting How the Optimizer Uses Indexes

The following two parameters address the optimizer's use of indexes for a wide range of statements, particularly nested-loop join statements in both OLTP and DSS applications.

OPTIMIZER_INDEX_COST_ADJ

Encourages the use of all indexes, regardless of their selectivity. It also applies to index use in general, rather than to just modeling index caching for nested loops join probes.

OPTIMIZER_INDEX_CACHING

Use this if the following two conditions exist:

Indexes Oracle could use for nested loops join probes are frequently cached in your environment.

The optimizer is not using nested loops joins aggressively enough.

In such an environment, this parameter has two advantages over OPTIMIZER_INDEX_COST_ADJ:
First, this parameter favors using selective indexes. If you use a relatively low value for this parameter, then the optimizer effectively models the caches of all non-leaf index blocks. In this case, the optimizer bases the cost of using this index primarily on the basis of its selectivity. Thus, by setting this to a low value, you achieve the desired modeling of the index caching without over-using possibly undesirable indexes that have poor selectivity.
Second, the effects of using this parameter are restricted to modeling the use of cached indexes for nested loops join probes. Thus, its use has fewer side effects.

Setting Initialization Parameters

To enable the CBO for Oracle Applications, you must set the following parameters:

OPTIMIZER_MODE=CHOOSE, FIRST_ROWS, or ALL_ROWS
OPTIMIZER_FEATURES_ENABLE=8.1.6
COMPATIBLE=8.1.6

You can set the following parameters to enable additional CBO-related features:

QUERY_REWRITE_ENABLED=TRUE
_COMPLEX_VIEW_MERGING=TRUE
_PUSH_JOIN_PREDICATE=TRUE

Verifying Initialization Parameters

To verify that the initialization parameters have been set correctly, execute the following statement against the dictionary's PARAMETER view:

SQL> SELECT NAME, VALUE
     FROM V$PARAMETER
     WHERE NAME LIKE 'optimizer%';

This returns the following typical data:

NAME                              VALUE
------------------------------    ---------------------
optimizer_features_enable         8.1.6
optimizer_mode                    CHOOSE
optimizer_max_permutations        80000
optimizer_index_cost_adj          100
optimizer_index_caching           0
optimizer_percent_parallel        0
optimizer_search_limit            5

Extensible Optimizer

The extensible optimizer is part of the CBO. It allows the authors of user-defined functions and domain indexes to control the three main components that the CBO uses to select an execution plan: statistics, selectivity, and cost evaluation.

The extensible optimizer lets you:

Associate cost function and default costs with domain indexes, indextypes, packages, and stand-alone functions.
Associate selectivity function and default selectivity with methods of object types, package functions, and stand-alone functions.
Associate statistics collection functions with domain indexes and columns of tables.
Order predicates with functions based on cost.
Select a user-defined access method (domain index) for a table based on access cost.
Use the ANALYZE statement to invoke user-defined statistics collection and deletion functions.
Use new data dictionary views to include information about the statistics collection, cost, or selectivity functions associated with columns, domain indexes, indextypes, or functions.
Add a hint to preserve the order of evaluation for function predicates.

See Also:
For details about the extensible optimizer, see Oracle8i Data Cartridge Developer's Guide.

User-Defined Statistics

You can define statistics collection functions for domain indexes, individual columns of a table, and user-defined datatypes.

Whenever a domain index is analyzed to gather statistics, Oracle calls the associated statistics collection function. Whenever a column of a table is analyzed, Oracle collects the standard statistics for that column and calls any associated statistics collection function. If a statistics collection function exists for a datatype, then Oracle calls it for each column that has that datatype in the table being analyzed.

User-Defined Selectivity

The selectivity of a predicate in a SQL statement is used to estimate the cost of a particular access method; it is also used to determine the optimal join order. The optimizer cannot compute an accurate selectivity for predicates that contain user-defined operators, because it does not have any information about these operators.

You can define selectivity functions for predicates containing user-defined operators, stand-alone functions, package functions, or type methods. The optimizer calls the user-defined selectivity function whenever it encounters a predicate that contains the operator, function, or method in one of the following relations with a constant: <, <=, =, >=, >, or LIKE.

User-Defined Costs

The optimizer cannot compute an accurate estimate of the cost of a domain index because it does not know the internal storage structure of the index. Also, the optimizer may underestimate the cost of a user-defined function that invokes PL/SQL, uses recursive SQL, accesses a BFILE, or is CPU-intensive.

You can define costs for domain indexes and user-defined stand-alone functions, 
package functions, and type methods. These user-defined costs can be in the form 
of default costs that the optimizer simply looks up or they can be full-fledged cost 
functions that the optimizer calls to compute the cost.

Rule-Based Optimizer (RBO)

Although Oracle supports the rule-based optimizer, you should design new applications to use the cost-based optimizer. You should also use the CBO for data warehousing applications, because the CBO supports enhanced features for DSS. Many new performance features, such as partitioned tables, improved star query processing, and materialized views, are only available with the CBO.

Note:
If you have developed OLTP applications using Oracle version 6, and if you have tuned your SQL statements carefully based on the rules of the optimizer, then you may want to continue using the RBO when you upgrade these applications to a new Oracle release.
If you are using applications provided by third-party vendors, then check with the vendors to determine which type of optimizer is best suited to that application.

If OPTIMIZER_MODE=CHOOSE, if statistics do not exist, and if you do not add hints to your SQL statements, then your statements use the RBO. You can use the RBO to access both relational data and object types. If OPTIMIZER_MODE=FIRST_ROWS or ALL_ROWS and no statistics exist, then the CBO uses default statistics. You should migrate your existing applications to use the cost-based approach.

You can enable the CBO on a trial basis simply by collecting statistics. You can then return to the RBO by deleting the statistics or by setting either the value of the OPTIMIZER_MODE initialization parameter or the OPTIMIZER_MODE clause of the ALTER SESSION statement to RULE. You can also use this value if you want to collect and examine statistics for your data without using the cost-based approach.

See Also:
For an explanation of how to gather statistics, see Chapter 8, "Gathering Statistics".

Access Paths for the RBO

Using the RBO, the optimizer chooses an execution plan based on the access paths available and the ranks of these access paths. Oracle's ranking of the access paths is heuristic. If there is more than one way to execute a SQL statement, then the RBO always uses the operation with the lower rank. Usually, operations of lower rank execute faster than those associated with constructs of higher rank.

The access paths and their ranking are listed below:

Path 1: Single Row by Rowid

Path 2: Single Row by Cluster Join

Path 3: Single Row by Hash Cluster Key with Unique or Primary Key

Path 4: Single Row by Unique or Primary Key

Path 5: Clustered Join

Path 6: Hash Cluster Key

Path 7: Indexed Cluster Key

Path 8: Composite Index

Path 9: Single-Column Indexes

Path 10: Bounded Range Search on Indexed Columns

Path 11: Unbounded Range Search on Indexed Columns

Path 12: Sort-Merge Join

Path 13: MAX or MIN of Indexed Column

Path 14: ORDER BY on Indexed Column

Path 15: Full Table Scan

Each of the following sections describes an access path, discusses when it is available, and shows the output generated for it by the EXPLAIN PLAN statement.

Path 1: Single Row by Rowid

This access path is available only if the statement's WHERE clause identifies the selected rows by rowid or with the CURRENT OF CURSOR embedded SQL syntax supported by the Oracle precompilers. To execute the statement, Oracle accesses the table by rowid.

Example:

SELECT * FROM emp WHERE ROWID = 'AAAA7bAA5AAAA1UAAA';

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
    TABLE ACCESS          BY ROWID       EMP

Path 2: Single Row by Cluster Join

This access path is available for statements that join tables stored in the same cluster if both of the following conditions are true:

The statement's WHERE clause contains conditions that equate each column of the cluster key in one table with the corresponding column in the other table.
The statement's WHERE clause also contains a condition that guarantees that the join returns only one row. Such a condition is likely to be an equality condition on the column(s) of a unique or primary key.

These conditions must be combined with AND operators. To execute the statement, Oracle performs a nested loops operation.

See Also:
For information on the nested loops operation, see "Nested Loops (NL) Join".

Example:

In the following statement, the emp and dept tables are clustered on the deptno column, and the empno column is the primary key of the emp table:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno 
    AND emp.empno = 7900;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
    NESTED LOOPS 
       TABLE ACCESS       BY ROWID       EMP 
       INDEX              UNIQUE SCAN    PK_EMP    
       TABLE ACCESS       CLUSTER        DEPT

Pk_emp is the name of an index that enforces the primary key.

Path 3: Single Row by Hash Cluster Key with Unique or Primary Key

This access path is available if both of the following conditions are true:

The statement's WHERE clause uses all columns of a hash cluster key in equality conditions. For composite cluster keys, the equality conditions must be combined with AND operators.
The statement is guaranteed to return only one row, because the columns that make up the hash cluster key also make up a unique or primary key.

To execute the statement, Oracle applies the cluster's hash function to the hash cluster key value specified in the statement to obtain a hash value. Oracle then uses the hash value to perform a hash scan on the table.

Example:

In the following statement, the orders and line_items tables are stored in a hash cluster, and the orderno column is both the cluster key and the primary key of the orders table:

SELECT * 
    FROM orders 
    WHERE orderno = 65118968;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
    TABLE ACCESS          HASH           ORDERS

Path 4: Single Row by Unique or Primary Key

This access path is available if the statement's WHERE clause uses all columns of a unique or primary key in equality conditions. For composite keys, the equality conditions must be combined with AND operators. To execute the statement, Oracle performs a unique scan on the index on the unique or primary key to retrieve a single rowid, and then accesses the table by that rowid.

Example:

In the following statement, the empno column is the primary key of the emp table:

SELECT * 
    FROM emp 
    WHERE empno = 7900;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 UNIQUE SCAN    PK_EMP

Pk_emp is the name of the index that enforces the primary key.

Path 5: Clustered Join

This access path is available for statements that join tables stored in the same cluster if the statement's WHERE clause contains conditions that equate each column of the cluster key in one table with the corresponding column in the other table. For a composite cluster key, the equality conditions must be combined with AND operators. To execute the statement, Oracle performs a nested loops operation.

See Also:
For information on nested loops operations, see "Nested Loops (NL) Join".

Example:

In the following statement, the emp and dept tables are clustered on the deptno column:

SELECT * 
    FROM emp, dept 
    WHERE emp.deptno = dept.deptno;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  NESTED LOOPS 
    TABLE ACCESS          FULL           DEPT 
    TABLE ACCESS          CLUSTER        EMP

Path 6: Hash Cluster Key

This access path is available if the statement's WHERE clause uses all the columns of a hash cluster key in equality conditions. For a composite cluster key, the equality conditions must be combined with AND operators. To execute the statement, Oracle applies the cluster's hash function to the hash cluster key value specified in the statement to obtain a hash value. Oracle then uses this hash value to perform a hash scan on the table.

Example:

In the following statement, the orders and line_items tables are stored in a hash cluster, and the orderno column is the cluster key:

SELECT * 
    FROM line_items 
    WHERE orderno = 65118968;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
    TABLE ACCESS          HASH           LINE_ITEMS

Path 7: Indexed Cluster Key

This access path is available if the statement's WHERE clause uses all the columns of an indexed cluster key in equality conditions. For a composite cluster key, the equality conditions must be combined with AND operators.

To execute the statement, Oracle performs a unique scan on the cluster index to retrieve the rowid of one row with the specified cluster key value. Oracle then uses that rowid to access the table with a cluster scan. Because all rows with the same cluster key value are stored together, the cluster scan requires only a single rowid to find them all.

Example:

In the following statement, the emp table is stored in an indexed cluster, and the deptno column is the cluster key:

SELECT *  FROM emp 
  WHERE deptno = 10;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            CLUSTER        EMP 
    INDEX                 UNIQUE SCAN    PERS_INDEX

Pers_index is the name of the cluster index.

Path 8: Composite Index

This access path is available if the statement's WHERE clause uses all columns of a composite index in equality conditions combined with AND operators. To execute the statement, Oracle performs a range scan on the index to retrieve rowids of the selected rows, and then accesses the table by those rowids.

Example:

In the following statement, there is a composite index on the job and deptno columns:

SELECT * 
    FROM emp 
    WHERE job = 'CLERK' 
      AND deptno = 30;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     JOB_DEPTNO_INDEX

Job_deptno_index is the name of the composite index on the job and deptno columns.

Path 9: Single-Column Indexes

This access path is available if the statement's WHERE clause uses the columns of one or more single-column indexes in equality conditions. For multiple single-column indexes, the conditions must be combined with AND operators.

If the WHERE clause uses the column of only one index, then Oracle executes the statement by performing a range scan on the index to retrieve the rowids of the selected rows, and then accesses the table by these rowids.

Example 1:

In the following statement, there is an index on the job column of the emp table:

SELECT * 
    FROM emp 
    WHERE job = 'ANALYST';

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     JOB_INDEX

Job_index is the index on emp.job.

If the WHERE clauses uses columns of many single-column indexes, then Oracle executes the statement by performing a range scan on each index to retrieve the rowids of the rows that satisfy each condition. Oracle then merges the sets of rowids to obtain a set of rowids of rows that satisfy all conditions. Oracle then accesses the table using these rowids.

Oracle can merge up to five indexes. If the WHERE clause uses columns of more than five single-column indexes, then Oracle merges five of them, accesses the table by rowid, and then tests the resulting rows to determine whether they satisfy the remaining conditions before returning them.

Example 2:

In the following statement, there are indexes on both the job and deptno columns of the emp table:

SELECT * 
    FROM emp 
    WHERE job = 'ANALYST' 
      AND deptno = 20;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    AND-EQUAL 
      INDEX               RANGE SCAN     JOB_INDEX 
      INDEX               RANGE SCAN     DEPTNO_INDEX

The AND-EQUAL operation merges the rowids obtained by the scans of the job_index and the deptno_index, resulting in a set of rowids of rows that satisfy the query.

Path 10: Bounded Range Search on Indexed Columns

This access path is available if the statement's WHERE clause contains a condition that uses either the column of a single-column index or one or more columns that make up a leading portion of a composite index:

column = expr 

column >[=] expr AND column <[=] expr 

column BETWEEN expr AND expr 

column LIKE 'c%'

Each of these conditions specifies a bounded range of indexed values that are accessed by the statement. The range is said to be bounded because the conditions specify both its least value and its greatest value. To execute such a statement, Oracle performs a range scan on the index, and then accesses the table by rowid.

This access path is not available if the expression expr references the indexed column.

Example 1:

In the following statement, there is an index on the sal column of the emp table:

SELECT * 
    FROM emp 
    WHERE sal BETWEEN 2000 AND 3000;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     SAL_INDEX

Sal_index is the name of the index on emp.sal.

Example 2:

In the following statement, there is an index on the ename column of the emp table:

SELECT * 
    FROM emp 
    WHERE ename LIKE 'S%';

Path 11: Unbounded Range Search on Indexed Columns

This access path is available if the statement's WHERE clause contains one of the following conditions that use either the column of a single-column index or one or more columns of a leading portion of a composite index:

WHERE column >[=] expr 

WHERE column <[=] expr

Each of these conditions specifies an unbounded range of index values accessed by the statement. The range is said to be unbounded, because the condition specifies either its least value or its greatest value, but not both. To execute such a statement, Oracle performs a range scan on the index, and then accesses the table by rowid.

Example 1:

In the following statement, there is an index on the sal column of the emp table:

SELECT * 
    FROM emp 
    WHERE sal > 2000;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     SAL_INDEX

Example 2:

In the following statement, there is a composite index on the order and line columns of the line_items table:

SELECT * 
    FROM line_items 
    WHERE order > 65118968;

The access path is available, because the WHERE clause uses the order column, a leading portion of the index.

Example 3:

This access path is not available in the following statement, in which there is an index on the order and line columns:

SELECT * 
    FROM line_items 
    WHERE line < 4;

The access path is not available because the WHERE clause only uses the line column, which is not a leading portion of the index.

Path 12: Sort-Merge Join

This access path is available for statements that join tables that are not stored together in a cluster if the statement's WHERE clause uses columns from each table in equality conditions. To execute such a statement, Oracle uses a sort-merge operation. Oracle can also use a nested loops operation to execute a join statement.

See Also:
For information on these operations, see "Optimizing Join Statements".

Example:

In the following statement, the emp and dept tables are not stored in the same cluster:

SELECT * 
    FROM emp, dept 
    WHERE emp.deptno = dept.deptno;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  MERGE JOIN 
    SORT                  JOIN 
      TABLE ACCESS        FULL           EMP 
    SORT                  JOIN 
  TABLE ACCESS            FULL           DEPT

Path 13: MAX or MIN of Indexed Column

This access path is available for a SELECT statement, and all of the following conditions are true:

The query uses the MAX or MIN function to select the maximum or minimum value of either the column of a single-column index or the leading column of a composite index. The index cannot be a cluster index. The argument to the MAX or MIN function can be any expression involving the column, a constant, or the addition operator (+), the concatenation operation (||), or the CONCAT function.
There are no other expressions in the select list.
The statement has no WHERE clause or GROUP BY clause.

To execute the query, Oracle performs a range scan of the index to find the maximum or minimum indexed value. Because only this value is selected, Oracle need not access the table after scanning the index.

Example:

In the following statement, there is an index on the sal column of the emp table:

SELECT MAX(sal) FROM emp;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  AGGREGATE               GROUP BY 
    INDEX                 RANGE SCAN     SAL_INDEX

Path 14: ORDER BY on Indexed Column

This access path is available for a SELECT statement, and all of the following conditions are true:

The query contains an ORDER BY clause that uses either the column of a single-column index or a leading portion of a composite index. The index cannot be a cluster index.
There is a PRIMARY KEY or NOT NULL integrity constraint that guarantees that at least one of the indexed columns listed in the ORDER BY clause contains no nulls.
The NLS_SORT parameter is set to BINARY.

To execute the query, Oracle performs a range scan of the index to retrieve the rowids of the selected rows in sorted order. Oracle then accesses the table by these rowids.

Example:

In the following statement, there is a primary key on the empno column of the emp table:

SELECT * 
    FROM emp 
    ORDER BY empno;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            BY ROWID       EMP 
    INDEX                 RANGE SCAN     PK_EMP

Pk_emp is the name of the index that enforces the primary key. The primary key ensures that the column does not contain nulls.

Path 15: Full Table Scan

This access path is available for any SQL statement, regardless of its WHERE clause conditions, except when its FROM clause contains SAMPLE or SAMPLE BLOCK.

Note that the full table scan is the lowest ranked access path on the list. This means that the RBO always chooses an access path that uses an index if one is available, even if a full table scan might execute faster.

The following conditions make index access paths unavailable:

column1 > column2
column1 < column2
column1 >= column2
column1 <= column2

where column1 and column2 are in the same table.

column IS NULL
column IS NOT NULL
column NOT IN
column != expr
column LIKE '%pattern'

regardless of whether column is indexed.

expr = expr2

where expr is an expression that operates on a column with an operator or function, regardless of whether the column is indexed.

NOT EXISTS subquery
ROWNUM pseudocolumn in a view
Any condition involving a column that is not indexed

Any SQL statement that contains only these constructs and no others that make index access paths available must use full table scans.

Example:

The following statement uses a full table scan to access the emp table:

SELECT * 
    FROM emp;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION                 OPTIONS        OBJECT_NAME 
----------------------------------------------------- 
SELECT STATEMENT 
  TABLE ACCESS            FULL           EMP

Overview of Optimizer Operations

This section describes the types of SQL statements that can be optimized and summarizes the operations performed by the optimizer.

Types of SQL Statements

Oracle optimizes the following types of SQL statements:

Simple statement

An INSERT, UPDATE, DELETE, or SELECT statement that involves only a single table.

Simple query

Another name for a SELECT statement.

Join

A query that selects data from more than one table. A join is characterized by multiple tables in the FROM clause. Oracle pairs the rows from these tables using the condition specified in the WHERE clause and returns the resulting rows. This condition is called the join condition and usually compares columns of all the joined tables.

Equijoin

A join condition containing an equality operator.

Non-equijoin

A join condition containing something other than an equality operator.

Outer join

A join condition using the outer join operator (+) with one or more columns of one of the tables. Oracle returns all rows that meet the join condition. Oracle also returns all rows from the table without the outer join operator for which there are no matching rows in the table with the outer join operator.

Cartesian product

A join with no join condition results in a Cartesian product, or a cross product. A Cartesian product is the set of all possible combinations of rows drawn one from each table. In other words, for a join of two tables, each row in one table is matched in turn with every row in the other. A Cartesian product for more than two tables is the result of pairing each row of one table with every row of the Cartesian product of the remaining tables.
All other kinds of joins are subsets of Cartesian products effectively created by deriving the Cartesian product and then excluding rows that fail the join condition.

Complex statement

An INSERT, UPDATE, DELETE, or SELECT statement that contains a subquery, which is a form of the SELECT statement within another statement that produces a set of values for further processing within the statement. The outer portion of the complex statement that contains a subquery is called the parent statement.

Compound query

A query that uses set operators (UNION, UNION ALL, INTERSECT, or MINUS) to combine two or more simple or complex statements. Each simple or complex statement in a compound query is called a component query.

Statement accessing views

Simple, join, complex, or compound statement that accesses one or more views as well as tables.

Distributed statement

A statement that accesses data on two or more distinct nodes of a distributed database. A remote statement accesses data on one remote node of a distributed database.

Optimizer Operations

For any SQL statement processed by Oracle, the optimizer does the following:

1

Evaluation of expressions and conditions

The optimizer first evaluates expressions and conditions containing constants as fully as possible. (See "Evaluation of Expressions and Conditions".)

2

Statement transformation

For complex statements involving, for example, correlated subqueries, the optimizer may transform the original statement into an equivalent join statement. (See "Transforming and Optimizing Statements".)

3

View merging

For SQL statements that access a view, the optimizer often merges the query in the statement with that in the view, and then optimizes the result. (See "Optimizing Statements That Access Views".)

4

Choice of optimizer approaches

The optimizer chooses either a cost-based or rule-based approach and determines the goal of optimization. (See "Optimizing Joins".)

5

Choice of access paths

For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain the table's data. (See "Access Paths for the CBO".)

6

Choice of join orders

For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, and so on.

7

Choice of join operations

For any join statement, the optimizer chooses an operation to use to perform the join.

Optimizing Joins

This section discusses how the Oracle optimizer executes SQL statements that contain joins, anti-joins, and semi-joins. It also describes how the optimizer can use bitmap indexes to execute star queries, which join a fact table to multiple dimension tables.

Optimizing Join Statements

To choose an execution plan for a join statement, the optimizer must make these interrelated decisions:

Access Paths

As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement. (see "Access Paths for the RBO" and "Access Paths for the CBO".)

Join Operations

To join each pair of row sources, Oracle must perform one of these operations:

Nested Loops (NL) Join

Sort-Merge Join

Hash Join (not available with the RBO)

Cluster Join

Join Order

To execute a statement that joins more than two tables, Oracle joins two of the tables, and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.

Join Operations

Nested Loops (NL) Join

To perform a nested loops join, Oracle performs the following steps:

The optimizer chooses one of the tables as the outer table, or the driving table. The other table is called the inner table.
For each row in the outer table, Oracle finds all rows in the inner table that satisfy the join condition.
Oracle combines the data in each pair of rows that satisfy the join condition and returns the resulting rows.

For example, consider table A and B. Each row of B is joined back to A.

For rows 1, 2, 3, .....n-1, n in B, each row in B is joined to each row in A

For rows 1, 2, 3, ..... n-1, n in A

Total selectivity = selectivity (A) * selectivity (B)

Figure 4-4 shows the execution plan for the following statement using a nested loops join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno;

Figure 4-4 Nested Loops Join

To execute this statement, Oracle performs the following steps:

Step 2 accesses the outer table (emp) with a full table scan.
For each row returned by step 2, step 4 uses the emp.deptno value to perform a unique scan on the pk_dept index.
Step 3 uses the rowid from step 4 to locate the matching row in the inner table (dept).
Oracle combines each row returned by step 2 with the matching row returned by step 4 and returns the result.

Sort-Merge Join

Oracle can only perform a sort-merge join for an equijoin. To perform a sort-merge join, Oracle performs the following steps:

Oracle sorts each row source to be joined if they have not been sorted already by a previous operation. The rows are sorted on the values of the columns used in the join condition.
Oracle merges the two sources so that each pair of rows, one from each source, that contain matching values for the columns used in the join condition are combined and returned as the resulting row source.

Figure 4-5 shows the execution plan for this statement using a sort-merge join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno;

Figure 4-5 Sort-Merge Join

To execute this statement, Oracle performs the following steps:

Steps 3 and 5 perform full table scans of the emp and dept tables.
Steps 2 and 4 sort each row source separately.
Step 1 merges the sources from steps 2 and 4 together, combining each row from step 2 with each matching row from step 4, and returns the resulting row source.

Example 2

All relevant table A rows are fetched, sorted, and placed in a sort area. The resulting data is:

Table A
1
5
8
11

All relevant table B rows are fetched, sorted, and placed in a sort area. The resulting data is:

Table B
2
4
5
7

A merge is then performed using a merge join algorithm to produce the resulting data:

Merged Data from A and B
1
2
4
5
7
8
11

Hash Join

Oracle can only perform a hash join for an equijoin. Hash join is not available with the RBO. You must enable hash join optimization, using the initialization parameter HASH_JOIN_ENABLED (which can be set with the ALTER SESSION statement) or the USE_HASH hint.

To perform a hash join, Oracle performs the following steps:

Oracle performs a full table scan on each of the tables and splits each into as many partitions as possible based on the available memory.
Oracle builds a hash table from one of the partitions (if possible, Oracle selects a partition that fits into available memory). Oracle then uses the corresponding partition in the other table to probe the hash table. All partition pairs that do not fit into memory are placed onto disk.
For each pair of partitions (one from each table), Oracle uses the smaller one to build a hash table and the larger one to probe the hash table.

Figure 4-6 shows the execution plan for this statement using a hash join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno;

Figure 4-6 Hash Join

To execute this statement, Oracle performs the following steps:

Steps 2 and 3 perform full table scans of the emp and dept tables.
Step 1 builds a hash table out of the rows coming from step 2 and probes it with each row coming from step 3.

The initialization parameter HASH_AREA_SIZE controls the amount of memory used for hash join operations and the initialization parameter HASH_MULTIBLOCK_IO_COUNT controls the number of blocks a hash join operation should read and write concurrently.

See Also:
For more information about the USE_HASH hint, see Chapter 7, "Using Optimizer Hints".

Example 2

Consider a hash join of table A and B, where table B is the inner table. If the column value of NUM_DISTINCT data from the DBA_TAB_COLUMN dictionary table is small, then this implies that most of the rows have the same column value.

For example, the table emp has a gender column with two distinct values: male and female. It is assumed that queries on the gender column have a selectivity of one divided by two, or 50%. This means that half of the table rows are fetched. In this particular case, a hash join is most efficient.

See Also:
For more information, see "Verifying Column Statistics" in Chapter 8, "Gathering Statistics".

Note:
The optimizer can use either a full compute (e.g., full table scan) or it can estimate by using a sample of the data. The problem with estimation based on data sampling is that the sample rows selected from data blocks could all be skewed. If there are 8 million rows in the table, then the optimizer may only consider a random subset and generate statistics based on that subset.

Cluster Join

Oracle can perform a cluster join only for an equijoin that equates the cluster key columns of two tables in the same cluster. In a cluster, rows from both tables with the same cluster key values are stored in the same blocks, so Oracle only accesses those blocks.

Figure 4-7 shows the execution plan for this statement in which the emp and dept tables are stored together in the same cluster:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno;

Figure 4-7 Cluster Join

To execute this statement, Oracle performs the following steps:

Step 2 accesses the outer table (dept) with a full table scan.
For each row returned by step 2, step 3 uses the dept.deptno value to find the matching rows in the inner table (emp) with a cluster scan.

A cluster join is nothing more than a nested loops join involving two tables that are stored together in a cluster. Because each row from the dept table is stored in the same data blocks as the matching rows in the emp table, Oracle can access matching rows most efficiently.

How the Optimizer Chooses the Join Method

The optimizer costs each join method and chooses the method with the least cost. If a join returns many rows, then the optimizer considers the following three factors:

A nested loops join (NL) is inefficient when a join returns a large number of rows [typically, more than 10,000 rows is considered large], and the optimizer may choose not to use it.

The cost of a nested loops join = access cost of A + (access cost of B * number of rows from A)
If you are using the RBO, then a merge join is the most efficient join when a join returns a large number or rows.

The cost of a merge join = access cost of A + access cost of B + (sort cost of A + sort cost of B)

An exception is when the data is pre-sorted. In the pre-sorted case, merge join costs = access cost of A + access cost of B where ( sort cost of A + sort cost of B) = 0.
If you are using the CBO, then a hash join is the most efficient join when a join returns a large number or rows.

Estimated costs to perform a hash join = (access cost of A * number of hash partitions of B) + access cost of B

Forcing the Join Order

The following example illustrates the use of the ORDERED hint, which specifies the join order that the optimizer should use when joining tables. The ORDERED hint causes the join order to proceed in the order that the tables are listed in the FROM clause. In this example, the optimizer will start with the table jl_br_journals first, followed by jl_br_balances, followed by gl_code_combinations, etc. When using the ORDERED hint, it is important that the tables in the FROM clause are listed in the correct order, so as to prevent Cartesian joins.

SELECT /*+ ORDERED */
      glcc.segment1||' '||glcc.segment2||' '||glcc.segment3||' '
        ||glcc.segment4||' ' ||glcc.segment5 account,
 glcc.code_combination_id ccid, REPLACE(SUBSTR(glf.description,1,40),'.',' '),   
 b.application_id, b.set_of_books_id, b.personnel_id, p.vendor_id 
FROM  jl_br_journals j,
      jl_br_balances b,
      gl_code_combinations glcc,
      fnd_flex_values_vl glf,
      gl_periods gp,
      gl_sets_of_books gsb,
      po_vendors p
WHERE  j.application_id = b.application_id(+) AND
       j.set_of_books_id = b.set_of_books_id(+) AND
j.code_combination_id = b.code_combination_id(+) AND
       j.personnel_id = b.personnel_id(+) AND j.period_name = b.period_name(+) AND
       j.code_combination_id= glcc.code_combination_id AND j.period_name = gp.period_name AND
       j.set_of_books_id = gsb.set_of_books_id AND gp.period_set_name = gsb.period_set_name AND
       glcc.segment1 || '' = '01' AND glf.flex_value_set_id||'' = :c_account_vs AND
       glcc.segment3 = glf.flex_value AND gp.start_date = add_months('01-SEP-98',-1) AND
       gp.period_set_name = gsb.period_set_name AND j.application_id = 200 AND
       j.set_of_books_id = 225 AND j.personnel_id = p.vendor_id
GROUP BY glcc.segment1||' '||glcc.segment2||' '||glcc.segment3||
         ' '||glcc.segment4||' '||glcc.segment5,
         glcc.code_combination_id, REPLACE(SUBSTR(glf.description,1,40),'.',' '),
         b.application_id, b.set_of_books_id, b.personnel_id, p.vendor_id

Cost=13 SELECT STATEMENT
Cost=13   SORT GROUP BY
Cost=11     NESTED LOOPS
Cost=10       NESTED LOOPS
Cost=9         NESTED LOOPS
Cost=7           NESTED LOOPS
Cost=6             NESTED LOOPS
Cost=3               NESTED LOOPS
Cost=2                 NESTED LOOPS OUTER
Cost=1                   TABLE ACCESS BY INDEX ROWID JL_BR_JOURNALS_ALL
Cost=2                     INDEX RANGE SCAN JL_BR_JOURNALS_U1:
Cost=1                   TABLE ACCESS FULL JL_BR_BALANCES_ALL
Cost=1                 TABLE ACCESS BY INDEX ROWID GL_CODE_COMBINATIONS
Cost=                   INDEX UNIQUE SCAN GL_CODE_COMBINATIONS_U1:
Cost=3               TABLE ACCESS BY INDEX ROWID FND_FLEX_VALUES
Cost=2                 INDEX RANGE SCAN FND_FLEX_VALUES_N1:
Cost=1             TABLE ACCESS BY INDEX ROWID FND_FLEX_VALUES_TL
Cost=               INDEX UNIQUE SCAN FND_FLEX_VALUES_TL_U1:
Cost=2           TABLE ACCESS BY INDEX ROWID GL_PERIODS
Cost=1             INDEX RANGE SCAN GL_PERIODS_N1:
Cost=1         TABLE ACCESS BY INDEX ROWID GL_SETS_OF_BOOKS
Cost=           INDEX UNIQUE SCAN GL_SETS_OF_BOOKS_U2:
Cost=1       TABLE ACCESS BY INDEX ROWID PO_VENDORS
Cost=         INDEX UNIQUE SCAN PO_VENDORS_U1:

Choosing Execution Plans for Join Statements

This section describes how the optimizer chooses an execution plan for a join statement:

The following considerations apply to both the cost-based and rule-based approaches:

The optimizer first determines whether joining two or more of the tables definitely results in a row source containing at most one row. The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY constraints on the tables. If such a situation exists, then the optimizer places these tables first in the join order. The optimizer then optimizes the join of the remaining set of tables.
For join statements with outer join conditions, the table with the outer join operator must come after the other table in the condition in the join order. The optimizer does not consider join orders that violate this rule.

Choosing Execution Plans for Joins with the CBO

With the CBO, the optimizer generates a set of execution plans based on the possible join orders, join operations, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in these ways:

The cost of a nested loops operation is based on the cost of reading each selected row of the outer table and each of its matching rows of the inner table into memory. The optimizer estimates these costs using the statistics in the data dictionary.
The cost of a sort-merge join is based largely on the cost of reading all the sources into memory and sorting them.
The optimizer also considers other factors when determining the cost of each operation. For example:
- A smaller sort area size is likely to increase the cost for a sort-merge join because sorting takes more CPU time and I/O in a smaller sort area. Sort area size is specified by the initialization parameter SORT_AREA_SIZE.
- A larger multiblock read count is likely to decrease the cost for a sort-merge join in relation to a nested loops join. If a large number of sequential blocks can be read from disk in a single I/O, then an index on the inner table for the nested loops join is less likely to improve performance over a full table scan. The multiblock read count is specified by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT.
- For join statements with outer join conditions, the table with the outer join operator must come after the other table in the condition in the join order. The optimizer does not consider join orders that violate this rule.

With the CBO, the optimizer's choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for outer join, then the optimizer ignores the hint and chooses the order. You can also override the optimizer's choice of join operations with hints.

See Also:
For more information on using hints, see Chapter 7, "Using Optimizer Hints".

Choosing Execution Plans for Joins with the RBO

With the rule-based approach, the optimizer performs the following steps to choose an execution plan for a statement that joins R tables:

The optimizer generates a set of R join orders, each with a different table as the first table. The optimizer generates each potential join order using this algorithm:
1. To fill each position in the join order, the optimizer chooses the table with the most highly ranked available access path according to the ranks for access paths described in Chapter 4, "The Optimizer". The optimizer repeats this step to fill each subsequent position in the join order.
2. For each table in the join order, the optimizer also chooses the operation with which to join the table to the previous table or row source in the order. The optimizer does this by "ranking" the sort-merge operation as access path 12 and applying these rules:
- If the access path for the chosen table is ranked 11 or better, then the optimizer chooses a nested loops operation using the previous table or row source in the join order as the outer table.
- If the access path for the table is ranked lower than 12, and if there is an equijoin condition between the chosen table and the previous table or row source in join order, then the optimizer chooses a sort-merge operation.
- If the access path for the chosen table is ranked lower than 12, and if there is not an equijoin condition, then the optimizer chooses a nested loops operation with the previous table or row source in the join order as the outer table.
The optimizer then chooses among the resulting set of execution plans. The goal of the optimizer's choice is to maximize the number of nested loops join operations in which the inner table is accessed using an index scan. Because a nested loops join involves accessing the inner table many times, an index on the inner table can greatly improve the performance of a nested loops join.

Usually, the optimizer does not consider the order in which tables appear in the FROM clause when choosing an execution plan. The optimizer makes this choice by applying the following rules in order:
1. The optimizer chooses the execution plan with the fewest nested-loops operations in which the inner table is accessed with a full table scan.
2. If there is a tie, then the optimizer chooses the execution plan with the fewest sort-merge operations.
3. If there is still a tie, then the optimizer chooses the execution plan for which the first table in the join order has the most highly ranked access path:
- If there is a tie among multiple plans whose first tables are accessed by the single-column indexes access path, then the optimizer chooses the plan whose first table is accessed with the most merged indexes.
- If there is a tie among multiple plans whose first tables are accessed by bounded range scans, then the optimizer chooses the plan whose first table is accessed with the greatest number of leading columns of the composite index.
- If there is still a tie, then the optimizer chooses the execution plan for which the first table appears later in the query's FROM clause.

Optimizing Anti-Joins and Semi-Joins

An anti-join returns rows from the left side of the predicate for which there is no corresponding row on the right side of the predicate. That is, it returns rows that fail to match (NOT IN) the subquery on the right side. For example, an anti-join can select a list of employees who are not in a particular set of departments:

SELECT * FROM emp 
  WHERE deptno NOT IN 
    (SELECT deptno FROM dept 
      WHERE loc = 'HEADQUARTERS');

The optimizer uses a nested loops algorithm for NOT IN subqueries by default, unless the initialization parameter ALWAYS_ANTI_JOIN is set to MERGE or HASH and various required conditions are met that allow the transformation of the NOT IN subquery into a sort-merge or hash anti-join. You can place a MERGE_AJ or HASH_AJ hint in the NOT IN subquery to specify which algorithm the optimizer should use.

A semi-join returns rows that match an EXISTS subquery, without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery. For example:

SELECT * FROM dept 
  WHERE EXISTS 
    (SELECT * FROM emp 
      WHERE dept.ename = emp.ename 
        AND emp.bonus > 5000);

In this query, only one row needs to be returned from dept even though many rows in emp might match the subquery. If there is no index on the bonus column in emp, then a semi-join can be used to improve query performance.

The optimizer uses a nested loops algorithm for EXISTS subqueries by default, unless the initialization parameter ALWAYS_SEMI_JOIN is set to MERGE or HASH and various required conditions are met. You can place a MERGE_SJ or HASH_SJ hint in the EXISTS subquery to specify which algorithm the optimizer should use.

See Also:
For information about optimizer hints, see Chapter 7, "Using Optimizer Hints".

Optimizing Star Queries

One type of data warehouse design centers around what is known as a star schema, which is characterized by one or more very large fact tables that contain the primary information in the data warehouse and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of lookup tables. Each lookup table is joined to the fact table using a primary-key to foreign-key join, but the lookup tables are not joined to each other.

The CBO recognizes star queries and generates efficient execution plans for them. (Star queries are not recognized by the RBO.)

A typical fact table contains keys and measures. For example, a simple fact table might contain the measure Sales, and keys Time, Product, and Market. In this case there would be corresponding dimension tables for Time, Product, and Market. The Product dimension table, for example, would typically contain information about each product number that appears in the fact table.

A star join is a primary-key to foreign-key join of the dimension tables to a fact table. The fact table normally has a concatenated index on the key columns to facilitate this type of join or a separate bitmap index on each key column.

See Also:
For more information about tuning star queries, see Oracle8i Data Warehousing Guide.

Optimizing Statements that Use Common Subexpressions

Common subexpression elimination is an optimization heuristic that identifies, removes, and collects common subexpression from disjunctive (i.e., OR) branches of a query. In most cases, it results in the reduction of the number of joins that would be performed.

Common subexpression elimination is enabled with initialization parameter OPTIMIZER_FEATURES_ENABLE or by setting the _ELIMINATE_COMMON_SUBEXPR parameter to TRUE.

A query is considered valid for common sub-expression elimination if its WHERE clause is in following form:

The top-level must be a disjunction; that is, a list of ORed logs.
Each disjunct must be either a simple predicate or a conjunction; that is, a list of ANDed logs.
Each conjunct must be either a simple predicate or a disjunction of simple predicates. (A predicate is considered simple if it does not contain AND or OR.)
An expression is considered common if it appears in all the disjunctive branches of the query.

Examples of Common Subexpression Elimination

The following query finds names of employees who work in a department located in L.A. and who make more than 40K or who are accountants.

SELECT emp.ename
FROM emp E, dept D
WHERE (D.deptno = E.deptno AND E.position = 'Accountant' AND D.location ='L.A.')

OR
E.deptno = D.deptno AND E.sal > 40000 AND D.location = 'L.A.');

The following query contains common subexpressions in its two disjunctive branches. The elimination of the common subexpressions transforms this query into the following query, thereby reducing the number of joins from two to one.

SELECT emp.ename FROM emp E, dept D
WHERE (D.deptno = E.deptno AND D.location = 'L.A.') 

AND (E.position = 'Accountant' OR E.sal > 40000);

The following query contains common subexpression in its three disjunctive branches:

SELECT SUM (l_extendedprice* (1 - l_discount))
FROM PARTS, LINEITEM
WHERE (p_partkey = l_partkey

AND p_brand = 'Brand#12'
AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
AND l_quantity >= 1 AND l_quantity <= 1 + 10
AND p_size >= 1 AND p_size <= 5
AND l_shipmode IN ('AIR', 'REG AIR')
AND l_shipinstruct = 'DELIVER IN PERSON')


OR (l_partkey = p_partkey)


AND p_brand = 'Brand#23'
AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
AND l_quantity >= 10 AND l_quantity <= 10 + 10
AND p_size >= 1 AND p_size <= 10 AND p_size BETWEEN 1 AND 10
AND l_shipmode IN ('AIR', 'REG AIR')
AND l_shipinstruct = 'DELIVER IN PERSON')


OR (p_partkey = l_partkey


AND p_brand = 'Brand#34'
AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
AND l_quantity >= 20 AND l_quantity <= 20 + 10
AND p_size >= 1 AND p_size <= 15
AND l_shipmode IN ('AIR', 'REG AIR')
AND l_shipinstruct = 'DELIVER IN PERSON');

The above query is transformed by common subexpression elimination as the following, thereby reducing the number joins from three down to one.

SELECT SUM (l_extendedprice* (1 - l_discount))
FROM PARTS, LINEITEM
WHERE (p_partkey = l_partkey  /* these are the four common subexpressions */

AND p_size >= 1
AND l_shipmode IN ('AIR', 'REG AIR')
AND l_shipinstruct = 'DELIVER IN PERSON')
AND


((p_brand = 'Brand#12'
AND p_container IN  ( 'SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
AND l_quantity >= 1 AND l_quantity <= 1 + 10
AND p_size <= 5)


OR (p_brand = 'Brand#23'


AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
AND l_quantity >= 10 AND l_quantity <= 10 + 10
AND p_size <= 10)


OR (p_brand = 'Brand#34'


AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
AND l_quantity >= 20 AND l_quantity <= 20 + 10
AND p_size <= 15));

Evaluation of Expressions and Conditions

The optimizer fully evaluates expressions whenever possible and translates certain syntactic constructs into equivalent constructs. The reason for this is either that Oracle can more quickly evaluate the resulting expression than the original expression, or that the original expression is merely a syntactic equivalent of the resulting expression. Different SQL constructs can sometimes operate identically (for example, = ANY (subquery) and IN (subquery)); Oracle maps these to a single construct.

This section discusses how the optimizer evaluates expressions and conditions that contain the following:

Constants

Computation of constants is performed only once, when the statement is optimized, rather than each time the statement is executed.

For example, the following conditions test for monthly salaries greater than 2000:

sal > 24000/12 

sal > 2000 

sal*12 > 24000

If a SQL statement contains the first condition, then the optimizer simplifies it into the second condition.

Note:
The optimizer does not simplify expressions across comparison operators: in the examples above, the optimizer does not simplify the third expression into the second. For this reason, application developers should write conditions that compare columns with constants whenever possible, rather than conditions with expressions involving columns.

LIKE Operator

The optimizer simplifies conditions that use the LIKE comparison operator to compare an expression with no wildcard characters into an equivalent condition that uses an equality operator instead. For example, the optimizer simplifies the first condition below into the second:

ename LIKE 'SMITH' 

ename = 'SMITH'

The optimizer can simplify these expressions only when the comparison involves variable-length datatypes. For example, if ename was of type CHAR(10), then the optimizer cannot transform the LIKE operation into an equality operation due to the equality operator following blank-padded semantics and LIKE not following blank-padded semantics.

IN Operator

The optimizer expands a condition that uses the IN comparison operator to an equivalent condition that uses equality comparison operators and OR logical operators. For example, the optimizer expands the first condition below into the second:

ename IN ('SMITH', 'KING', 'JONES') 

ename = 'SMITH' OR ename = 'KING' OR ename = 'JONES'

See Also:
For more information, see "Example 2: IN Subquery".

ANY or SOME Operator

The optimizer expands a condition that uses the ANY or SOME comparison operator followed by a parenthesized list of values into an equivalent condition that uses equality comparison operators and OR logical operators. For example, the optimizer expands the first condition below into the second:

sal > ANY (:first_sal, :second_sal) 

sal > :first_sal OR sal > :second_sal

The optimizer transforms a condition that uses the ANY or SOME operator followed by a subquery into a condition containing the EXISTS operator and a correlated subquery. For example, the optimizer transforms the first condition below into the second:

x > ANY (SELECT sal 
    FROM emp 
    WHERE job = 'ANALYST') 

EXISTS (SELECT sal 
    FROM emp 
    WHERE job = 'ANALYST'  
      AND x > sal)

ALL Operator

The optimizer expands a condition that uses the ALL comparison operator followed by a parenthesized list of values into an equivalent condition that uses equality comparison operators and AND logical operators. For example, the optimizer expands the first condition below into the second:

sal > ALL (:first_sal, :second_sal) 

sal > :first_sal AND sal > :second_sal

The optimizer transforms a condition that uses the ALL comparison operator followed by a subquery into an equivalent condition that uses the ANY comparison operator and a complementary comparison operator. For example, the optimizer transforms the first condition below into the second:

x > ALL (SELECT sal 
    FROM emp 
    WHERE deptno = 10) 

NOT (x <= ANY (SELECT sal 
      FROM emp 
      WHERE deptno = 10) )

The optimizer then transforms the second query into the following query using the rule for transforming conditions with the ANY comparison operator followed by a correlated subquery:

NOT EXISTS (SELECT sal 
      FROM emp 
      WHERE deptno = 10 
        AND x <= sal)

BETWEEN Operator

The optimizer always replaces a condition that uses the BETWEEN comparison operator with an equivalent condition that uses the >= and <= comparison operators. For example, the optimizer replaces the first condition below with the second:

sal BETWEEN 2000 AND 3000 

sal >= 2000 AND sal <= 3000

NOT Operator

The optimizer simplifies a condition to eliminate the NOT logical operator. The simplification involves removing the NOT logical operator and replacing a comparison operator with its opposite comparison operator. For example, the optimizer simplifies the first condition below into the second one:

NOT deptno = (SELECT deptno FROM emp WHERE ename = 'TAYLOR') 

deptno <> (SELECT deptno FROM emp WHERE ename = 'TAYLOR')

Often, a condition containing the NOT logical operator can be written many different ways. The optimizer attempts to transform such a condition so that the subconditions negated by NOTs are as simple as possible, even if the resulting condition contains more NOTs. For example, the optimizer simplifies the first condition below into the second, and then into the third.

NOT (sal < 1000 OR comm IS NULL) 
NOT sal < 1000 AND comm IS NOT NULL 
sal >= 1000 AND comm IS NOT NULL

Transitivity

If two conditions in the WHERE clause involve a common column, then the optimizer can sometimes infer a third condition using the transitivity principle. The optimizer can then use the inferred condition to optimize the statement. The inferred condition could potentially make available an index access path that was not made available by the original conditions.

Note:
Transitivity is used only by the CBO.

Imagine a WHERE clause containing two conditions of these forms:

WHERE column1 comp_oper constant 
  AND column1 = column2

In this case, the optimizer infers the condition:

column2 comp_oper constant

where:

comp_oper

Any of the comparison operators =, !=, ^=, <, <>, >, <=, or >=.

constant

Any constant expression involving operators, SQL functions, literals, bind variables, and correlation variables.

Example:

In the following query, the WHERE clause contains two conditions, each of which uses the emp.deptno column:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = 20 
    AND emp.deptno = dept.deptno;

Using transitivity, the optimizer infers this condition:

dept.deptno = 20

If an index exists on the dept.deptno column, then this condition makes available access paths using that index.

Note:
The optimizer only infers conditions that relate columns to constant expressions, rather than columns to other columns. Imagine a WHERE clause containing two conditions of these forms:
WHERE column1 comp_oper column3 AND column1 = column2

In this case, the optimizer does not infer this condition:
column2 comp_oper column3

DETERMINISTIC Functions

In some cases, the optimizer can use a previously calculated value, rather than executing a user-written function. This is only safe for functions that behave in a restricted manner. The function must always return the same output return value for any given set of input argument values.

The function's result must not differ because of differences in the content of package variables or the database, or session parameters such as the NLS parameters. Furthermore, if the function is redefined in the future, then its output return value must still be the same as that calculated with the prior definition for any given set of input argument values. Finally, there must be no meaningful side-effects such that using a precalculated value instead of executing the function again would alter the application.

The creator of a function can promise to the Oracle server that the function behaves according to these restrictions by using the keyword DETERMINISTIC when declaring the function with a CREATE FUNCTION statement or in a CREATE PACKAGE or CREATE TYPE statement. The server does not attempt to verify this declaration--even a function that obviously manipulates the database or package variables can be declared DETERMINISTIC. It is the programmer's responsibility to use this keyword only when appropriate.

Calls to a DETERMINISTIC function may be replaced by the use of an already calculated value when the function is called multiple times within the same query, or if there is a function-based index or a materialized view defined that includes a relevant call to the function.

See Also:

For more information on DETERMINISTIC functions, see Oracle8i Application Developer's Guide - Fundamentals.

For descriptions of CREATE FUNCTION, CREATE INDEX, and CREATE MATERIALIZED VIEW, see Oracle8i SQL Reference.

For a description of function-based indexes, see Oracle8i Concepts.

For detailed information about materialized views, see Oracle8i Data Warehousing Guide.

Transforming and Optimizing Statements

SQL is a very flexible query language; there are often many statements you could use to achieve the same goal. Sometimes, the optimizer transforms one such statement into another that achieves the same goal if the second statement can be executed more efficiently.

This section discusses the following topics:

Transforming ORs into Compound Queries

If a query contains a WHERE clause with multiple conditions combined with OR operators, then the optimizer transforms it into an equivalent compound query that uses the UNION ALL set operator if this makes it execute more efficiently:

If each condition individually makes an index access path available, then the optimizer can make the transformation. The optimizer then chooses an execution plan for the resulting statement that accesses the table multiple times using the different indexes, and then puts the results together.
If any condition requires a full table scan because it does not make an index available, then the optimizer does not transform the statement. The optimizer chooses a full table scan to execute the statement, and Oracle tests each row in the table to determine whether it satisfies any of the conditions.
For statements that use the CBO, the optimizer may use statistics to determine whether to make the transformation by estimating and then comparing the costs of executing the original statement versus the resulting statement.
The CBO does not use the OR transformation for IN-lists or ORs on the same column; instead, it uses the INLIST iterator operator.

See Also:
For information on access paths and how indexes make them available, see the "Access Paths for the RBO" section and "How the CBO Chooses an Access Path".

Example:

In the following query, the WHERE clause contains two conditions combined with an OR operator:

SELECT * 
  FROM emp 
  WHERE job = 'CLERK' 
    OR deptno = 10;

If there are indexes on both the job and deptno columns, then the optimizer may transform this query into the equivalent query below:

SELECT * 
  FROM emp 
  WHERE job = 'CLERK' 
UNION ALL 
SELECT * 
  FROM emp 
  WHERE deptno = 10 
    AND job <> 'CLERK';

When the CBO is deciding whether to make a transformation, the optimizer compares the cost of executing the original query using a full table scan with that of executing the resulting query.

With the RBO, the optimizer makes this UNION ALL transformation, because each component query of the resulting compound query can be executed using an index. The RBO assumes that executing the compound query using two index scans is faster than executing the original query using a full table scan.

The execution plan for the transformed statement might look like the illustration in Figure 4-8.

Figure 4-8 Execution Plan for a Transformed Query Containing OR

To execute the transformed query, Oracle performs the following steps:

Steps 3 and 5 scan the indexes on the job and deptno columns using the conditions of the component queries. These steps obtain rowids of the rows that satisfy the component queries.
Steps 2 and 4 use the rowids from steps 3 and 5 to locate the rows that satisfy each component query.
Step 1 puts together the row sources returned by steps 2 and 4.

If either of the job or deptno columns is not indexed, then the optimizer does not even consider the transformation, because the resulting compound query would require a full table scan to execute one of its component queries. Executing the compound query with a full table scan in addition to an index scan could not possibly be faster than executing the original query with a full table scan.

Example:

The following query assumes that there is an index on the ename column only:

SELECT * 
  FROM emp 
  WHERE ename = 'SMITH' 
    OR sal > comm;

Transforming the query above would result in the compound query below:

SELECT * 
  FROM emp 
  WHERE ename = 'SMITH' 
UNION ALL 
SELECT * 
  FROM emp 
  WHERE sal > comm;

Because the condition in the WHERE clause of the second component query (sal > comm) does not make an index available, the compound query requires a full table scan. For this reason, the optimizer does not make the transformation, and it chooses a full table scan to execute the original statement.

Transforming Complex Statements into Join Statements

To optimize a complex statement, the optimizer chooses one of the following:

Transform the complex statement into an equivalent join statement, and then optimize the join statement.
Optimize the complex statement as it is.

The optimizer transforms a complex statement into a join statement whenever the resulting join statement is guaranteed to return exactly the same rows as the complex statement. This transformation allows Oracle to execute the statement by taking advantage of join optimizer techniques described in "Optimizing Joins".

The following complex statement selects all rows from the accounts table whose owners appear in the customers table:

SELECT * 
  FROM accounts 
  WHERE custno IN 
    (SELECT custno FROM customers);

If the custno column of the customers table is a primary key or has a UNIQUE constraint, then the optimizer can transform the complex query into the following join statement that is guaranteed to return the same data:

SELECT accounts.* 
  FROM accounts, customers 
  WHERE accounts.custno = customers.custno;

The execution plan for this statement might look like Figure 4-9.

Figure 4-9 Execution Plan for a Nested Loops Join

To execute this statement, Oracle performs a nested-loops join operation.

See Also:
For information on nested loops joins, see "Optimizing Joins".

If the optimizer cannot transform a complex statement into a join statement, then the optimizer chooses execution plans for the parent statement and the subquery as though they were separate statements. Oracle then executes the subquery and uses the rows it returns to execute the parent query.

The following complex statement returns all rows from the accounts table that have balances greater than the average account balance:

SELECT * 
  FROM accounts 
  WHERE accounts.balance > 
    (SELECT AVG(balance) FROM accounts);

No join statement can perform the function of this statement, so the optimizer does not transform the statement.

Note:
Complex queries whose subqueries contain aggregate functions such as AVG cannot be transformed into join statements.

Optimizing Statements That Access Views

To optimize a statement that accesses a view, the optimizer chooses one of the following:

Transform the statement into an equivalent statement that accesses the view's base tables, then optimize the resulting statement. The optimizer can use one of the following techniques to transform the statement:
- Merge the view's query into the referencing query block in the accessing statement.
- Push the predicate of the referencing query block inside the view (for an non-mergeable view).
Issue the view's query, collecting all the returned rows, and then access this set of rows with the original statement as though it were a table. (See "Accessing the View's Rows with the Original Statement".)

Merging the View's Query into the Statement

To merge the view's query into a referencing query block in the accessing statement, the optimizer replaces the name of the view with the names of its base tables in the query block and adds the condition of the view's query's WHERE clause to the accessing query block's WHERE clause.

This optimization applies to select-project-join views, which are views that contain only selections, projections, and joins--that is, views that do not contain set operators, aggregate functions, DISTINCT, GROUP BY, CONNECT BY, and so on (as described in "Mergeable and Non-mergeable Views").

Example:

The following view is of all employees who work in department 10:

CREATE VIEW emp_10 
  AS SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
    FROM emp 
    WHERE deptno = 10;

The following query accesses the view. The query selects the IDs greater than 7800 of employees who work in department 10:

SELECT empno 
  FROM emp_10 
  WHERE empno > 7800;

The optimizer transforms the query into the following query that accesses the view's base table:

SELECT empno 
  FROM emp 
  WHERE deptno = 10 
    AND empno > 7800;

If there are indexes on the deptno or empno columns, then the resulting WHERE clause makes them available.

Mergeable and Non-mergeable Views

The optimizer can merge a view into a referencing query block when the view has one or more base tables, provided the view does not contain the following:

Set operators (UNION, UNION ALL, INTERSECT, MINUS)
A CONNECT BY clause
A ROWNUM pseudocolumn
Aggregate functions (AVG, COUNT, MAX, MIN, SUM) in the select list

When a view contains one of the following structures, it can be merged into a referencing query block only if complex view merging (described below) is enabled:

A GROUP BY clause
A DISTINCT operator in the select list

View merging is not possible for a view that has multiple base tables if it is on the right side of an outer join. However, if a view on the right side of an outer join has only one base table, then the optimizer can use complex view merging, even if an expression in the view can return a non-null value for a NULL.

See Also:
For more information, see "Optimizing Joins".

Complex View Merging

If a view's query contains a GROUP BY clause or DISTINCT operator in the select list, then the optimizer can merge the view's query into the accessing statement only if complex view merging is enabled. Complex merging can also be used to merge an IN subquery into the accessing statement if the subquery is uncorrelated (see "Example 2: IN Subquery").

Complex merging is not cost-based--it must be enabled with the initialization parameter OPTIMIZER_FEATURES_ENABLE, the MERGE hint, or the parameter _COMPLEX_VIEW_MERGING. Without this hint or parameter setting, the optimizer uses another approach (see "Pushing the Predicate into the View").

See Also:
For details about the MERGE and NO_MERGE hints, see Chapter 7, "Using Optimizer Hints".

Example 1: View with a GROUP BY Clause

The view avg_salary_view contains the average salaries for each department:

CREATE VIEW avg_salary_view AS 
  SELECT deptno, AVG(sal) AS avg_sal_dept, 
    FROM emp 
    GROUP BY deptno;

If complex view merging is enabled, then the optimizer can transform the following query, which finds the average salaries of departments in London:

SELECT dept.loc, avg_sal_dept 
  FROM dept, avg_salary_view 
  WHERE dept.deptno = avg_salary_view.deptno 
    AND dept.loc = 'London';

into the following query:

SELECT dept.loc, AVG(sal) 
  FROM dept, emp 
  WHERE dept.deptno = emp.deptno 
    AND dept.loc = 'London' 
  GROUP BY dept.rowid, dept.loc;

The transformed query accesses the view's base table, selecting only the rows of employees who work in London and grouping them by department.

Example 2: IN Subquery

Complex merging can be used for an IN clause with a non-correlated subquery, as well as for views. The view min_salary_view contains the minimum salaries for each department:

SELECT deptno, MIN(sal) 
  FROM emp 
  GROUP BY deptno;

If complex merging is enabled, then the optimizer can transform the following query, which finds all employees who earn the minimum salary for their department in London:

SELECT emp.ename, emp.sal 
  FROM emp, dept 
  WHERE (emp.deptno, emp.sal) IN min_salary_view 
    AND emp.deptno = dept.deptno 
    AND dept.loc = 'London';

into the following query (where e1 and e2 represent the emp table as it is referenced in the accessing query block and the view's query block, respectively):

SELECT e1.ename, e1.sal 
  FROM emp e1, dept, emp e2 
  WHERE e1.deptno = dept.deptno 
    AND dept.loc = 'London' 
    AND e1.deptno = e2.deptno 
  GROUP BY e1.rowid, dept.rowid, e1.ename, e1.sal 
  HAVING e1.sal = MIN(e2.sal);

Pushing the Predicate into the View

The optimizer can transform a query block that accesses a non-mergeable view by pushing the query block's predicates inside the view's query.

Example 1:

The two_emp_tables view is the union of two employee tables. The view is defined with a compound query that uses the UNION set operator:

CREATE VIEW two_emp_tables 
  (empno, ename, job, mgr, hiredate, sal, comm, deptno) AS 
  SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
    FROM emp1 
  UNION  
  SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
    FROM emp2;

The following query accesses the view. The query selects the IDs and names of all employees in either table who work in department 20:

SELECT empno, ename 
  FROM two_emp_tables 
  WHERE deptno = 20;

Because the view is defined as a compound query, the optimizer cannot merge the view's query into the accessing query block. Instead, the optimizer can transform the accessing statement by pushing its predicate, the WHERE clause condition (deptno = 20), into the view's compound query.

The resulting statement looks like the following:

SELECT empno, ename 
  FROM ( SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
           FROM emp1 
           WHERE deptno = 20 
         UNION 
         SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno 
           FROM emp2 
           WHERE deptno = 20 );

If there is an index on the deptno column, then the resulting WHERE clauses make it available.

Figure 4-10 shows the execution plan of the resulting statement.

Figure 4-10 Accessing a View Defined with the UNION Set Operator

To execute this statement, Oracle performs the following steps:

Steps 5 and 6 perform full scans of the emp1 and emp2 tables.
Step 4 performs a UNION-ALL operation returning all rows returned by either step 5 or step 6, including all copies of duplicates.
Step 3 sorts the result of step 4, eliminating duplicate rows.
Step 2 extracts the desired columns from the result of step 3.
Step 1 indicates that the view's query was not merged into the accessing query.

Example 2:

The view emp_group_by_deptno contains the department number, average salary, minimum salary, and maximum salary of all departments that have employees:

CREATE VIEW emp_group_by_deptno 
  AS SELECT deptno, 
         AVG(sal) avg_sal, 
         MIN(sal) min_sal, 
         MAX(sal) max_sal 
    FROM emp 
    GROUP BY deptno;

The following query selects the average, minimum, and maximum salaries of department 10 from the emp_group_by_deptno view:

SELECT * 
  FROM emp_group_by_deptno 
  WHERE deptno = 10;

The optimizer transforms the statement by pushing its predicate (the WHERE clause condition) into the view's query. The resulting statement looks like the following:

SELECT deptno, 
    AVG(sal) avg_sal, 
    MIN(sal) min_sal, 
    MAX(sal) max_sal, 
  FROM emp 
  WHERE deptno = 10 
  GROUP BY deptno;

If there is an index on the deptno column, then the resulting WHERE clause makes it available. Figure 4-11 shows the execution plan for the resulting statement. The execution plan uses an index on the deptno column.

Figure 4-11 Accessing a View Defined with a GROUP BY Clause

To execute this statement, Oracle performs the following operations:

Step 4 performs a range scan on the index emp_deptno_index (an index on the deptno column of the emp table) to retrieve the rowids of all rows in the emp table with a deptno value of 10.
Step 3 accesses the emp table using the rowids retrieved by step 4.
Step 2 sorts the rows returned by step 3 to calculate the average, minimum, and maximum sal values.
Step 1 indicates that the view's query was not merged into the accessing query.

Applying an Aggregate Function to the View

The optimizer can transform a query that contains an aggregate function (AVG, COUNT, MAX, MIN, SUM) by applying the function to the view's query.

Example:

The following query accesses the emp_group_by_deptno view defined in the previous example. This query derives the averages for the average department salary, the minimum department salary, and the maximum department salary from the employee table:

SELECT AVG(avg_sal), AVG(min_sal), AVG(max_sal) 
    FROM emp_group_by_deptno;

The optimizer transforms this statement by applying the AVG aggregate function to the select list of the view's query:

SELECT AVG(AVG(sal)), AVG(MIN(sal)), AVG(MAX(sal)) 
    FROM emp 
    GROUP BY deptno;

Figure 4-12 shows the execution plan of the resulting statement.

Figure 4-12 Applying Aggregate Functions to a View Defined with GROUP BY Clause

To execute this statement, Oracle performs these operations:

Step 4 performs a full scan of the emp table.
Step 3 sorts the rows returned by step 4 into groups based on their deptno values and calculates the average, minimum, and maximum sal value of each group.
Step 2 indicates that the view's query was not merged into the accessing query.
Step 1 calculates the averages of the values returned by step 2.

Views in Outer Joins

For a view that is on the right side of an outer join, the optimizer can use one of two methods, depending on how many base tables the view accesses:

If the view has only one base table, then the optimizer can use view merging.
If the view has multiple base tables, then the optimizer can push the join predicate into the view.

Merging a View That Has a Single Base Table

A view that has one base table and is on the right side of an outer join can be merged into the query block of an accessing statement. (See "Merging the View's Query into the Statement".) View merging is possible even if an expression in the view can return a non-null value for a NULL.

Example:

Consider the view name_view, which concatenates first and last names from the emp table:

CREATE VIEW name_view 
  AS SELECT emp.firstname || emp.lastname AS emp_fullname, emp.deptno 
       FROM emp;

and consider this outer join statement, which finds the names of all employees in London and their departments, as well as any departments that have no employees:

SELECT dept.deptno, name_view.emp_fullname 
  FROM emp_fullname, dept 
  WHERE dept.deptno = name_view.deptno(+) 
    AND dept.loc = 'London';

The optimizer merges the view's query into the outer join statement. The resulting statement looks like this:

SELECT dept.deptno, DECODE(emp.rowid, NULL, NULL, emp.firstname || emp.lastname) 
  FROM emp, dept 
  WHERE dept.deptno = emp.deptno(+) 
    AND dept.loc = 'London';

The transformed statement selects only the employees who work in London.

Pushing the Join Predicate into a View That Has Multiple Base Tables

For a view with multiple base tables on the right side of an outer join, the optimizer can push the join predicate into the view (see "Pushing the Predicate into the View") if the initialization parameter _PUSH_JOIN_PREDICATE is set to TRUE or the accessing query contains the PUSH_PRED hint.

Pushing a join predicate is a cost-based transformation that can enable more efficient access path and join methods, such as transforming hash joins into nested loops joins, and full table scans to index scans.

See Also:
For information about optimizer hints, see Chapter 7, "Using Optimizer Hints".

Example 1:

Consider the view london_emp, which selects the employees who work in London:

CREATE VIEW london_emp 
  AS SELECT emp.ename 
       FROM emp, dept 
       WHERE emp.deptno = dept.deptno 
         AND dept.loc = 'London';

and consider this outer join statement, which finds the engineers and accountants working in London who received bonuses:

SELECT bonus.job, london_emp.ename 
  FROM bonus, london_emp 
  WHERE bonus.job IN ('engineer', 'accountant') 
    AND bonus.ename = london_emp.ename(+);

The optimizer pushes the outer join predicate into the view. The resulting statement (which does not conform to standard SQL syntax) looks like this:

SELECT bonus.job, london_emp.ename 
  FROM bonus, (SELECT emp.ename FROM emp, dept 
                   WHERE bonus.ename = london_emp.ename(+) 
                     AND emp.deptno = dept.deptno 
                     AND dept.loc = 'London') 
  WHERE bonus.job IN ('engineer', 'accountant');

Example 2:

Consider the following example:

SELECT 'PAYMENT' c_tx_type, c.check_id c_tx_id, 1 c_je_header_id,   

c.status_lookup_code, c_tx_status, DECODE(:c_bank_curr_dsp,:c_gl_currency_
code, NVL(c.base_amount,NVL(c.amount,0)), NVL(c.amount,0)) c_tx_ba_amount, 
DECODE(SIGN(:c_julian_as_of_date -
TO_CHAR(c.check_date,'J')),-1, DECODE(:c_bank_curr_dsp,:c_gl_currency_code, 
NVL(c.base_amount,NVL(c.amount,0)), NVL(c.amount,0)),0) c_tx_ba_future_
amount, NULL c_tx_dr_cr, cs.future_pay_code_combination_id c_tx_clearing_
ccid,  NVL(c.exchange_rate, 0) c_tx_exchange_rate

FROM  ap_checks c,  
      ap_check_stocks cs
WHERE (c.check_stock_id(+) = cs.check_stock_id ) AND 
      (:c_sl_reference_type = 'PAYMENT') AND 
      (:c_sl_reference_id= c.check_id) AND (:c_sl_je_header_id = 1);

Without pushing the join predicate: 41 minutes, 1,492,141 buffer gets, 125,202 disk reads

Cost=20003 SELECT STATEMENT
Cost=   FILTER
Cost=     FILTER
Cost=       NESTED LOOPS OUTER
Cost=1         TABLE ACCESS FULL AP_CHECK_STOCKS_ALL
Cost=20002         TABLE ACCESS FULL AP_CHECKS_ALL

After pushing the join predicate: 0.01 seconds, 6 buffer gets, 5 disk reads

Cost=4 SELECT STATEMENT
Cost=   FILTER
Cost=4     NESTED LOOPS OUTER
Cost=3       TABLE ACCESS BY INDEX ROWID AP_CHECKS_ALL
Cost=2         INDEX UNIQUE SCAN AP_CHECKS_U1:
Cost=1       TABLE ACCESS BY INDEX ROWID AP_CHECK_STOCKS_ALL
Cost=         INDEX UNIQUE SCAN AP_CHECK_STOCKS_U1:

Accessing the View's Rows with the Original Statement

The optimizer cannot transform all statements that access views into equivalent statements that access base table(s). For example, if a query accesses a ROWNUM pseudocolumn in a view, then the view cannot be merged into the query, and the query's predicate cannot be pushed into the view.

To execute a statement that cannot be transformed into one that accesses base tables, Oracle issues the view's query, collects the resulting set of rows, and then accesses this set of rows with the original statement as though it were a table.

Example:

Consider the emp_group_by_deptno view defined in the previous section:

CREATE VIEW emp_group_by_deptno 
  AS SELECT deptno, 
        AVG(sal) avg_sal,
        MIN(sal) min_sal, 
        MAX(sal) max_sal 
    FROM emp 
    GROUP BY deptno;

The following query accesses the view. The query joins the average, minimum, and maximum salaries from each department represented in this view and to the name and location of the department in the dept table:

SELECT emp_group_by_deptno.deptno, avg_sal, min_sal,
  max_sal, dname, loc 
    FROM emp_group_by_deptno, dept 
    WHERE emp_group_by_deptno.deptno = dept.deptno;

Because there is no equivalent statement that accesses only base tables, the optimizer cannot transform this statement. Instead, the optimizer chooses an execution plan that issues the view's query and then uses the resulting set of rows as it would the rows resulting from a table access.

See Also:
For more information on how Oracle performs a nested loops join operation, see "Optimizing Joins".

Figure 4-13 shows the execution plan for this statement.

Figure 4-13 Joining a View Defined with a GROUP BY Clause to a Table

To execute this statement, Oracle performs the following operations:

Step 4 performs a full scan of the emp table.
Step 3 sorts the results of step 4 and calculates the average, minimum, and maximum sal values selected by the query for the emp_group_by_deptno view.
Step 2 used the data from the previous two steps for a view.
For each row returned by step 2, step 6 uses the deptno value to perform a unique scan of the pk_dept index.
Step 5 uses each rowid returned by step 6 to locate the row in the deptno table with the matching deptno value.
Oracle combines each row returned by step 2 with the matching row returned by step 5 and returns the result.

Optimizing Compound Queries

To choose the execution plan for a compound query, the optimizer chooses an execution plan for each of its component queries, and then combines the resulting row sources with the union, intersection, or minus operation, depending on the set operator used in the compound query.

Figure 4-14 shows the execution plan for the following statement, which uses the UNION ALL operator to select all occurrences of all parts in either the orders1 table or the orders2 table:

SELECT part FROM orders1 
UNION ALL 
SELECT part FROM orders2;

Figure 4-14 Compound Query with UNION ALL Set Operator

To execute this statement, Oracle performs the following steps:

Steps 2 and 3 perform full table scans on the orders1 and orders2 tables.
Step 1 performs a UNION-ALL operation returning all rows that are returned by either step 2 or step 3 including all copies of duplicates.

Figure 4-15 shows the execution plan for the following statement, which uses the UNION operator to select all parts that appear in either the orders1 or orders2 table:

SELECT part FROM orders1 
UNION 
SELECT part FROM orders2;

Figure 4-15 Compound Query with UNION Set Operator

This execution plan is identical to the one for the UNION-ALL operator shown in Figure 4-14 , except that in this case, Oracle uses the SORT operation to eliminate the duplicates returned by the UNION-ALL operation.

Figure 4-16 shows the execution plan for the following statement, which uses the INTERSECT operator to select only those parts that appear in both the orders1 and orders2 tables:

SELECT part FROM orders1 
INTERSECT 
SELECT part FROM orders2;

Figure 4-16 Compound Query with INTERSECT Set Operator

To execute this statement, Oracle performs the following steps:

Steps 3 and 5 perform full table scans of the orders1 and orders2 tables.
Steps 2 and 4 sort the results of steps 3 and 5, eliminating duplicates in each row source.
Step 1 performs an INTERSECTION operation that returns only rows that are returned by both steps 2 and 4.

Optimizing Distributed Statements

The optimizer chooses execution plans for SQL statements that access data on remote databases in much the same way that it chooses executions for statements that access only local data:

If all the tables accessed by a SQL statement are collocated on the same remote database, then Oracle sends the SQL statement to that remote database. The remote Oracle instance executes the statement and sends only the results back to the local database.
If a SQL statement accesses tables that are located on different databases, then Oracle decomposes the statement into individual fragments, each of which accesses tables on a single database. Oracle then sends each fragment to the database that it accesses. The remote Oracle instance for each of these databases executes its fragment and returns the results to the local database, where the local Oracle instance may perform any additional processing the statement requires.

When choosing a cost-based execution plan for a distributed statement, the optimizer considers the available indexes on remote databases just as it does indexes on the local database. The optimizer also considers statistics on remote databases for the CBO. Furthermore, the optimizer considers the location of data when estimating the cost of accessing it. For example, a full scan of a remote table has a greater estimated cost than a full scan of an identical local table.

For a rule-based execution plan, the optimizer does not consider indexes on remote tables.

See Also:
For more information on tuning distributed queries, see Chapter 9, "Optimizing SQL Statements".

CHOOSE	The optimizer chooses between a cost-based approach and a rule-based approach based on whether statistics are available for the CBO. If the data dictionary contains statistics for at least one of the accessed tables, then the optimizer uses a cost-based approach and optimizes with a goal of best throughput. If the data dictionary contains no statistics for any of the accessed tables, then the optimizer uses a rule-based approach. This is the default value for the parameter.
ALL_ROWS	The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best throughput (minimum resource use to complete the entire statement).
FIRST_ROWS	The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best response time (minimum resource use to return the first row of the result set).
RULE	The optimizer chooses a rule-based approach for all SQL statements regardless of the presence of statistics.

Unique scan	This returns only a single rowid. Oracle performs a unique scan only in cases in which a single rowid is required, rather than many rowids. For example, Oracle performs a unique scan if there is a `UNIQUE` or a `PRIMARY` `KEY` constraint that guarantees that the statement accesses only a single row.
Range scan	This can return zero or more rowids, depending on how many rows the statement accesses.
Full scan	This is available if a predicate references one of the columns in the index. The predicate does not need to be an index driver. Full scan is also available when there is no predicate, if all of the columns in the table referenced in the query are included in the index and at least one of the index columns is not null. Full scan can be used to eliminate a sort operation. It reads the blocks singly.
Fast full scan	This is an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the `NOT` `NULL` constraint. Fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation. It reads the entire index using multiblock reads (unlike a full index scan) and can be parallelized. Fast full scan is available only with the CBO. You can specify it with the initialization parameter `OPTIMIZER_FEATURES_ENABLE` or the `INDEX_FFS` hint. Fast full index scans cannot be performed against bitmap indexes.
Index join	This is a hash join of several indexes that together contain all the columns from the table that are referenced in the query. If an index join is used, then no table access is needed, because all the relevant column values can be retrieved from the indexes. An index join cannot be used to eliminate a sort operation. Index join is available only with the CBO. You can specify it with the initialization parameter `OPTIMIZER_FEATURES_ENABLE` or the `INDEX_JOIN` hint. Example: The following statement uses an index join to access the `empno` and `sal` columns, both of which are indexed, in the `emp` table: SELECT empno, sal FROM emp WHERE sal > 2000; The `EXPLAIN` `PLAN` output for this statement might look like this: OPERATION OPTIONS OBJECT_NAME --------------------------------------------------------- SELECT STATEMENT VIEW index$_join$_001 HASH JOIN INDEX RANGE SCAN EMP_SAL INDEX FAST FULL SCAN EMP_EMPNO
Bitmap	This uses a bitmap for key values and a mapping function that converts each bit position to a rowid. Bitmaps can efficiently merge indexes that correspond to several conditions in a `WHERE` clause, using Boolean operations to resolve `AND` and `OR` conditions. Bitmap access is available only with the CBO.

OPTIMIZER_FEATURES_ENABLED	Enables several optimizer features, depending on the user-specified value. For example, if `OPTIMIZER_FEATURES_ENABLED`=`8`.`1`.`6`, then `ALL_ROWS` or `FIRST_ROWS` is also used for the recursive SQL generated by PL/SQL procedures. Prior to release 8.1.6, only `CHOOSE` or `RULE` was used for such recursive SQL.
OPTIMIZER_MODE	This initialization parameter sets the mode of the optimizer at instance startup: `RULE` (use RBO), `ALL_ROWS` (use CBO for throughput), `FIRST_ROWS` (use CBO for response time), or `CHOOSE` (an optimizer choice based on the presence of statistics. Set the `OPTIMIZER_MODE` parameter of the `ALTER` `SESSION` statement to change the value dynamically during a session.
OPTIMIZER_PERCENT_PARALLEL	Defines the amount of parallelism that the optimizer uses in its cost functions.
HASH_AREA_SIZE	Larger values can lower hash join costs, permitting Oracle to perform more hash joins.
SORT_AREA_SIZE	Larger values can lower sort costs, permitting Oracle to perform more sort merge joins.
DB_FILE_MULTIBLOCK_READ_COUNT	Larger values can lower table scan costs and make Oracle favor table scans over indexes.

ALWAYS_ANTI_JOIN	Sets the type of antijoin that Oracle uses: `NESTED_LOOPS`, `MERGE`, or `HASH`.
HASH_JOIN_ENABLED	Enables or disables the hash join feature. This should always be set to `true` for data warehousing applications.

HASH_MULTIBLOCK_IO_COUNT	Larger value can lower hash join costs, permitting Oracle to perform more hash joins.
BITMAP_MERGE_AREA_SIZE	The size of the area used to merge the different bitmaps that match a range predicate. Larger size favors use of bitmap indexes for range predicates.

Simple statement	An `INSERT`, `UPDATE`, `DELETE`, or `SELECT` statement that involves only a single table.
Simple query	Another name for a `SELECT` statement.
Join	A query that selects data from more than one table. A join is characterized by multiple tables in the `FROM` clause. Oracle pairs the rows from these tables using the condition specified in the `WHERE` clause and returns the resulting rows. This condition is called the join condition and usually compares columns of all the joined tables.
Equijoin	A join condition containing an equality operator.
Non-equijoin	A join condition containing something other than an equality operator.
Outer join	A join condition using the outer join operator (+) with one or more columns of one of the tables. Oracle returns all rows that meet the join condition. Oracle also returns all rows from the table without the outer join operator for which there are no matching rows in the table with the outer join operator.
Cartesian product	A join with no join condition results in a Cartesian product, or a cross product. A Cartesian product is the set of all possible combinations of rows drawn one from each table. In other words, for a join of two tables, each row in one table is matched in turn with every row in the other. A Cartesian product for more than two tables is the result of pairing each row of one table with every row of the Cartesian product of the remaining tables. All other kinds of joins are subsets of Cartesian products effectively created by deriving the Cartesian product and then excluding rows that fail the join condition.
Complex statement	An `INSERT`, `UPDATE`, `DELETE`, or `SELECT` statement that contains a subquery, which is a form of the `SELECT` statement within another statement that produces a set of values for further processing within the statement. The outer portion of the complex statement that contains a subquery is called the parent statement.
Compound query	A query that uses set operators (`UNION`, `UNION` `ALL`, `INTERSECT`, or `MINUS`) to combine two or more simple or complex statements. Each simple or complex statement in a compound query is called a component query.
Statement accessing views	Simple, join, complex, or compound statement that accesses one or more views as well as tables.
Distributed statement	A statement that accesses data on two or more distinct nodes of a distributed database. A remote statement accesses data on one remote node of a distributed database.

1	Evaluation of expressions and conditions	The optimizer first evaluates expressions and conditions containing constants as fully as possible. (See "Evaluation of Expressions and Conditions".)
2	Statement transformation	For complex statements involving, for example, correlated subqueries, the optimizer may transform the original statement into an equivalent join statement. (See "Transforming and Optimizing Statements".)
3	View merging	For SQL statements that access a view, the optimizer often merges the query in the statement with that in the view, and then optimizes the result. (See "Optimizing Statements That Access Views".)
4	Choice of optimizer approaches	The optimizer chooses either a cost-based or rule-based approach and determines the goal of optimization. (See "Optimizing Joins".)
5	Choice of access paths	For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain the table's data. (See "Access Paths for the CBO".)
6	Choice of join orders	For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, and so on.
7	Choice of join operations	For any join statement, the optimizer chooses an operation to use to perform the join.

comp_oper	Any of the comparison operators =, !=, ^=, <, <>, >, <=, or >=.
constant	Any constant expression involving operators, SQL functions, literals, bind variables, and correlation variables.

4 The Optimizer

SQL Processing Architecture

Figure 4-1 SQL Processing Architecture

Parser

Optimizer

Row Source Generator

SQL Execution

EXPLAIN PLAN

What Is The Optimizer?

Execution Plan

Figure 4-2 An Execution Plan

Steps of Execution Plan

Execution Order

Choosing an Optimizer Approach and Goal

OPTIMIZER_MODE Initialization Parameter

Statistics in the Data Dictionary

OPTIMIZER_GOAL Parameter of the ALTER SESSION Statement

Changing the Goal with Hints

Example

Cost-Based Optimizer (CBO)

Architecture of the CBO

Figure 4-3 Cost-Based Optimizer Architecture

Query Transformer

View Merging

Subquery Unnesting

Query Rewrite with Materialized Views

Estimator

Selectivity

Cardinality

Cost

Plan Generator

Features that Require the CBO

Using the CBO

Access Paths for the CBO

Full Table Scans

Sample Table Scans

Example:

Table Access by Rowid

Cluster Scans

Hash Scans

Index Scans

How the CBO Chooses an Access Path

Example 1:

Example 2:

Example 3:

Example 4:

Example 5:

Example 6:

CBO Parameters

Parameters Affecting CBO Plans

Parameters Affecting How the Optimizer Uses Indexes

Setting Initialization Parameters

Verifying Initialization Parameters

Extensible Optimizer

User-Defined Statistics

User-Defined Selectivity

User-Defined Costs

Rule-Based Optimizer (RBO)

Access Paths for the RBO

Path 1: Single Row by Rowid

Example:

Path 2: Single Row by Cluster Join

Example:

Path 3: Single Row by Hash Cluster Key with Unique or Primary Key

Example:

Path 4: Single Row by Unique or Primary Key

Example:

Path 5: Clustered Join

Example:

Path 6: Hash Cluster Key

Example:

Path 7: Indexed Cluster Key

Example:

Path 8: Composite Index

Example:

Path 9: Single-Column Indexes

Example 1:

Example 2:

Path 10: Bounded Range Search on Indexed Columns

Example 1:

4
The Optimizer