Query Optimization, 3 of 7

Defining Statistics, Selectivity, and Cost Functions

You can compute and store user-defined statistics for domain indexes and columns. These statistics are in addition to the standard statistics that are already collected by ANALYZE. User-defined selectivity and cost functions for functions and domain indexes can use standard and user-defined statistics in their computation. The internal representation of these statistics need not be known to Oracle; we only require that you provide methods for their access. You are solely responsible for defining the representation of such statistics and for maintaining them. It is important to note that user-collected statistics are only used by user-defined selectivity and cost functions; the optimizer use s only its standard statistics.

User-defined statistics collection, selectivity, and cost functions must be defined in a user-defined type. This type must have a form similar to a system-defined interface called ODCIStats (Oracle Data Cartridge Interface Statistics) defined as follows:

CREATE INTERFACE ODCIStats AS (

   -- Function to get current interface
   FUNCTION ODCIGetInterfaces(ifclist OUT ODCIObjectList) return NUMBER,

   -- User-defined statistics functions
   FUNCTION ODCIStatsCollect(col ODCIColInfo, options ODCIStatsOptions,
      statistics OUT RAW) return NUMBER,
   FUNCTION ODCIStatsCollect(ia ODCIIndexInfo, options ODCIStatsOptions,
      statistics OUT RAW) return NUMBER,
   FUNCTION ODCIStatsDelete(col ODCIColInfo) return NUMBER,
   FUNCTION ODCIStatsDelete(ia ODCIIndexInfo) return NUMBER,
   
   -- User-defined selectivity function
   FUNCTION ODCIStatsSelectivity(pred ODCIPredInfo, sel OUT NUMBER, args
      ODCIArgDescList, start <function_return_type>,
      stop <function_return_type>,  
      <list of function arguments>) return NUMBER,

   -- User-defined cost function for functions and type methods
   FUNCTION ODCIStatsFunctionCost(func ODCIFuncInfo, cost OUT ODCICost,

      args ODCIArgDescList, <list of function arguments>) return NUMBER,

   -- User-defined cost function for domain indexes
   FUNCTION ODCIStatsIndexCost(ia ODCIIndexInfo, sel NUMBER,
      cost OUT ODCICost, qi ODCIQueryInfo, pred ODCIPredInfo,         
      args ODCIArgDescList, start <operator_return_type>,
      stop <operator_return_type>, <list of operator value arguments>)
      return NUMBER
)

You can define a new object type, referred to as a statistics type, with a subset of functions from ODCIStats. Since user-defined statistics collection, selectivity, and cost functions are all optional, a statistics type need not contain all the functions in ODCIStats.

The types of the parameters of statistics type methods are system-defined ODCI (Oracle Data Cartridge Interface) datatypes. Some of them are described in the reference to Extensible Indexing, and the rest are described in the reference chapter detailing the Extensible Optimizer.

The selectivity and cost functions must not change any database or package state. To that end, they must be defined with appropriate purity level pragmas, otherwise the optimizer will not call them.

Depending on the object that user-defined statistics are being associated with, not all the functions defined in a statistics type will be used. The table below lists the functions and default statistics that will be used by the optimizer.

Table 8-1 Statistics Type Methods and Default Statistics Used for Various Objects

ASSOCIATE STATISTICS WITH	Statistics Type Methods Used	Default Statistics Used
column	`ODCIStatsCollect, ODCIStatsDelete`
object type	`ODCIStatsCollect, ODCIStatsDelete, ODCIStatsFunctionCost, ODCIStatsSelectivity`	cost, selectivity
function	`ODCIStatsFunctionCost, ODCIStatsSelectivity`	cost, selectivity
package	`ODCIStatsFunctionCost, ODCIStatsSelectivity`	cost, selectivity
index	`ODCIStatsCollect, ODCIStatsDelete, ODCIIndexCost`	cost
indextype	`ODCIStatsCollect, ODCIStatsDelete, ODCIIndexCost`	cost

User-Defined Statistics Functions

There are two user-defined statistics collection functions, one for collecting statistics and the other for deleting them.

The first, ODCIStatsCollect, is used to collect user-defined statistics; its interface depends on whether a column or domain index is being analyzed. It is called when analyzing a column of a table or a domain index and takes two parameters:

col for the column being analyzed, or

ia for the domain index being analyzed;
options for options specified in the ANALYZE command (e.g., the sample size when ANALYZE ESTIMATE is used).

As mentioned, the database does not interpret statistics collected by ODCIStatsCollect. You can store output in a user-managed format or in a dictionary table (described in the Extensible Optimizer reference) provided for the purpose. The statistics collected by the ODCIStatsCollect functions are returned in the output parameter, statistics, as a RAW datatype.

When an ANALYZE DELETE command is issued, user-collected statistics are deleted by calling the ODCIStatsDelete function whose interface depends on whether the statistics for a column or domain index are being dropped. It takes a single parameter: col, for the column whose user-defined statistics need to be deleted, or ia, for the domain index whose statistics are to be deleted.

If a user-defined ODCIStatsCollect function is present in a statistics type, the corresponding ODCIStatsDelete function must also be present.

The return values of the ODCIStatsCollect and ODCIStatsDelete functions must be Success (indicating success), Error (indicating an error), or Warning (indicating a warning); these return values are defined in a system package ODCIConst (described in the Extensible Optimizer reference).

User-defined Selectivity Functions

You will recall that user-defined selectivity functions are used only for predicates of the following forms:

operator(...) relational_operator <constant>

<constant> relational_operator operator(...)

operator(...) LIKE <constant>

A user-defined selectivity function, ODCIStatsSelectivity, takes five sets of input parameters that describe the predicate:

pred describing the function operator and the relational operator relational_operator;
args describing the start and stop values (i.e., <constant>) of the function and the actual arguments to the function operator;
start whose datatype is the same as that of the function's return value, describing the start value of the function;
stop whose datatype is the same as that of the function's return value, describing the stop value of the function;
and a list of function arguments whose number, position, and type must match the arguments of the function operator.

The computed selectivity is returned in the output parameter sel, in whole numbers, as a percentage, between 0 and 100, inclusive. The optimizer ignores invalid values.

The return value of the ODCIStatsSelectivity function must be

Success indicating success, or
Error indicating an error, or
Warning indicating a warning.

As an example, consider a function myFunction defined as follows:

myFunction (a NUMBER, b VARCHAR2(10)) return NUMBER

A user-defined selectivity function for the function myFunction would be as follows:

ODCIStatsSelectivity(pred ODCIPredInfo, sel OUT NUMBER, args ODCIArgDescList,
   start NUMBER, stop NUMBER, a NUMBER, b VARCHAR2(10)) return NUMBER

If the function myFunction is called with literal arguments, e.g.,

myFunction (2, 'TEST') > 5

then the selectivity function is called as follows:

ODCIStatsSelectivity(<ODCIPredInfo constructor>, sel,
   <ODCIArgDescList constructor>, 5, NULL, 2, 'TEST')

If, on the other hand, the function myFunction is called with some non-literals e.g.,

myFunction(Test_tab.col_a, 'TEST')> 5

where col_a is a column in table Test_tab, then the selectivity function is called as follows:

ODCIStatsSelectivity(<ODCIPredInfo constructor>, sel,
   <ODCIArgDescList constructor>, 5, NULL, NULL, 'TEST')

In other words, the start, stop, and function argument values are passed to the selectivity function only if they are literals; otherwise they are NULL. The ODCIArgDescList descriptor describes all its following arguments.

User-defined Cost Functions for Functions

As already mentioned, user-defined cost functions are only used for predicates of the following forms:

operator(...) relational_operator <constant>

<constant> relational_operator operator(...)

operator(...) LIKE <constant>

You can define a function, ODCIStatsFunctionCost, for computing the cost of stand-alone functions, package functions, or type methods. This function takes three sets of input parameters describing the predicate:

func describing the function operator;
args describing the actual arguments to the function operator;
and a list of function arguments whose number, position, and type must match the arguments of the function operator.

The ODCIStatsFunctionCost function returns its computed cost in the cost parameter. As mentioned, the returned cost can have two components -- CPU and I/O -- which are combined by the optimizer to compute a composite cost. The costs returned by user-defined cost functions must be positive whole numbers. Invalid values are ignored by the optimizer.

The return value of the ODCIStatsFunctionCost function must be

Success indicating success, or
Error indicating an error, or
Warning indicating a warning.

Consider a function myFunction defined as follows:

myFunction (a NUMBER, b VARCHAR2(10)) return NUMBER

A user-defined cost function for the function myFunction would be coded as follows:

ODCIStatsFunctionCost(func ODCIFuncInfo, cost OUT ODCICost,
   args ODCIArgDescList, a NUMBER, b VARCHAR2(10)) return NUMBER

If the function myFunction is called with literal arguments, e.g.,

myFunction(2, 'TEST') > 5,

then the cost function is called as follows:

ODCIStatsFunctionCost(<ODCIFuncInfo constructor>, cost,
   <ODCIArgDescList constructor>, 2, 'TEST')

If, on the other hand, the function myFunction is called with some non-literals, e.g.,

myFunction(Test_tab.col_a, 'TEST') > 5

where col_a is a column in table Test_tab, then the cost function is called as follows:

ODCIStatsFunctionCost(<ODCIFuncInfo constructor>, cost,
   <ODCIArgDescList constructor>, NULL, 'TEST')

In other words, function argument values are passed to the cost function only if they are literals; otherwise they are NULL. The ODCIArgDescList descriptor describes all its following arguments.

User-defined Cost Functions for Domain Indexes

User-defined cost functions for domain indexes are used for the same type of predicates mentioned previously, except that operator must be a user-defined operator for which a valid domain index access path exists.

The ODCIStatsIndexCost function takes eight sets of parameters:

ia describing the domain index;
sel representing the user-computed selectivity of the predicate;
qi containing additional information about the query;
pred describing the predicate;
args describing the start and stop values (i.e., <constant>) of the operator and the actual arguments to the operator operator;
start, whose datatype is the same as that of the operator's return value, describing the start value of the operator;
stop whose datatype is the same as that of the operator's return value, describing the stop value of the operator; and
a list of operator value arguments whose number, position, and type must match the arguments of the operator operator. The value arguments of an operator are the arguments excluding the first argument.

The computed cost of the domain index is returned in the output parameter, cost.

ODCIStatsIndexCost returns

Success indicating success, or
Error indicating an error, or
Warning indicating a warning.

Consider an operator

Contains(a_string VARCHAR2(2000), b_string VARCHAR2(10))

that returns 1 or 0 depending on whether or not the string b_string is contained in the string a_string. Further, assume that the operator is implemented by a domain index. A user-defined index cost function for this domain index would be coded as follows:

ODCIStatsIndexCost(ia ODCIIndexInfo, sel NUMBER, cost OUT ODCICost,
   qi ODCIQueryInfo, pred ODCIPredInfo, args ODCIArgDescList,
   start NUMBER, stop NUMBER, b_string VARCHAR2(10)) return NUMBER

Note that the first argument, a_string, of Contains does not appear as a parameter of ODCIStatsIndexCost. This is because the first argument to an operator must be a column for the domain index to be used, and this column information is passed in via the ODCIIndexInfo parameter. Only the operator arguments after the first (i.e., the "value" arguments) must appear as parameters to the ODCIStatsIndexCost function.

If the operator is called, e.g.,

Contains(Test_tab.col_c,'TEST') <= 1

then the index cost function is called as follows:

ODCIStatsIndexCost(<ODCIIndexInfo constructor>, sel, cost,
   <ODCIQueryInfo constructor>, <ODCIPredInfo constructor>, 
   <ODCIArgDescList constructor>, NULL, 1, 'TEST')

In other words, the start, stop, and operator argument values are passed to the index cost function only if they are literals; otherwise they are NULL. The ODCIArgDescList descriptor describes all its following arguments.