Oracle8i interMedia Text Reference Release 2 (8.1.6) Part Number A77063-01 |
|
Indexing, 3 of 11
Use the datastore objects to specify how your text is stored. To create a data storage preference, you must use one of the following objects:
Use the DIRECT_DATASTORE object for text stored directly in the text column, one document per row. DIRECT_DATASTORE has no attributes.
The following example creates a table with a CLOB column to store text data. It then populates two rows with text data and indexes the table using the system-preference CTXSYS.DEFAULT_DATASTORE
create table mytable(id number primary key, docs clob); insert into mytable values(111555,'this text will be indexed'); insert into mytable values(111556,'this is a direct_datastore example'); commit; create index myindex on mytable(docs) indextype is ctxsys.context parameters ('DATASTORE CTXSYS.DEFAULT_DATASTORE');
Use the DETAIL_DATASTORE object for text stored directly in the database in detail tables, with the textkey column located in the master table.
DETAIL_DATASTORE has the following attributes:
This example illustrates how master and detail tables are related to each other.
Master tables define the documents in a master/detail relationship. You assign an identifying number to each document. The following table is an example master table, called my_master
:
Detail tables contain the text for a document, whose content is usually stored across a number of rows. The following detail table my_detail
is related to the master table my_master
with the article_id
column. This column identifies the master document to which each detail row (sub-document) belongs.
Column Name | Column Type | Description |
---|---|---|
article_id |
NUMBER |
Document ID that relates to master table. |
seq |
NUMBER |
Sequence of document in the master document defined by article_id. |
text |
CLOB |
Document text. |
In this example, the DETAIL_DATASTORE attributes have the following values:
Attribute | Attribute Value |
---|---|
binary |
TRUE |
detail_table |
my_detail |
detail_key |
article_id |
detail_lineno |
seq |
detail_text |
text |
You use CTX_DDL.CREATE_PREFERENCE to create a preference with DETAIL_DATASTORE. You use CTX_DDL.SET_ATTRIBUTE to set the attributes for this preference as described above. The following example shows how this is done:
begin ctx_ddl.create_preference('my_detail_pref', 'DETAIL_DATASTORE'); ctx_ddl.set_attribute('my_detail_pref', 'binary', 'true'); ctx_ddl.set_attribute('my_detail_pref', 'detail_table', 'my_detail'); ctx_ddl.set_attribute('my_detail_pref', 'detail_key', 'article_id'); ctx_ddl.set_attribute('my_detail_pref', 'detail_lineno', 'seq'); ctx_ddl.set_attribute('my_detail_pref', 'detail_text', 'text'); end;
To index the document defined in this master/detail relationship, you specify a column in the master table with CREATE INDEX. The column you specify must be one of the allowable types.
This example uses the body
column, whose function is to allow the creation of the master/detail index and to improve readability of the code. The my_detail_pref
preference is set to DETAIL_DATASTORE with the required attributes:
CREATE INDEX myindex on my_master(body) indextype is context parameters('datastore my_detail_pref');
In this example, you can also specify the title
or author
column to create the index. However, if you do so, changes to these columns will trigger a re-index operation.
The FILE_DATASTORE object is used for text stored in files accessed through the local file system.
FILE_DATASTORE has the following attribute(s):
Attribute | Attribute Values |
---|---|
path |
path1:path2:...:pathn |
Specify the location of text files that are stored externally in a file system.
You can specify multiple paths for path, with each path separated by a colon (:). File names are stored in the text column in the text table. If path is not used to specify a path for external files, Oracle requires the path to be included in the file names stored in the text column.
This example creates a file datastore preference called COMMON_DIR that has a path of /mydocs
:
begin ctx_ddl.create_preference('COMMON_DIR','FILE_DATASTORE'); ctx_ddl.set_attribute('COMMON_DIR','PATH','/mydocs'); end;
When you populate the table mytable, you need only insert filenames. The PATH attribute tells the system where to look during the indexing operation.
create table mytable(id number primary key, docs varchar2(2000)); insert into mytable values(111555,'first.txt'); insert into mytable values(111556,'second.txt'); commit;
Create the index as follows:
create index myindex on mytable(docs) indextype is ctxsys.context parameters ('datastore COMMON_DIR');
Use the URL_DATASTORE object for text stored:
You store each URL in a single text field.
The syntax of a URL you store in a text field is as follows (with brackets indicating optional parameters):
[URL:]<access_scheme>://<host_name>[:<port_number>]/[<url_path>]
The access_scheme string you specify can be either ftp, http, or file. For example:
http://mymachine.us.oracle.com/home.html
As this syntax is partially compliant with the RFC 1738 specification, the following restriction holds for the URL syntax:
URL_DATASTORE has the following attributes:
Specify the length of time, in seconds, that a network operation such as a connect or read waits before timing out and returning a timeout error to the application. The valid range for timeout is 15 to 3600 and the default is 30.
Specify the maximum number of threads that can be running at the same time. The valid range for maxthreads is 1 to 1024 and the default is 8.
Specify the maximum length, in bytes, that the URL data store supports for URLs stored in the database. If a URL is over the maximum length, an error is returned. The valid range for urlsize is 32 to 65535 and the default is 256.
Specify the maximum number of rows that the internal buffer can hold for HTML documents (rows) retrieved from the text table. The valid range for maxurls is 32 to 65535 and the default is 256.
Specify the maximum size, in bytes, that the URL data store supports for accessing HTML documents whose URLs are stored in the database. The valid range for maxdocsize is 1 to 2,147,483,647 (2 gigabytes) and the default is 2,000,000.
Specify the fully-qualified name of the host machine that serves as the HTTP proxy (gateway) for the machine on which interMedia Text is installed. You can optionally specify port number with a colon in the form hostname:port.
This attribute must be set if the machine is in an intranet that requires authentication through a proxy server to access Web files located outside the firewall.
Specify the fully-qualified name of the host machine that serves as the FTP proxy (gateway) for the machine on which interMedia Text is installed. You can optionally specify a port number with a colon in the form hostname:port.
This attribute must be set if the machine is in an intranet that requires authentication through a proxy server to access Web files located outside the firewall.
Specify a string of domains (up to sixteen, separate by commas) which are found in most, if not all, of the machines in your intranet. When one of the domains is encountered in a host name, no request is sent to the machine(s) specified for ftp_proxy and http_proxy. Instead, the request is processed directly by the host machine identified in the URL.
For example, if the string us.oracle.com, uk.oracle.com is entered for no_proxy, any URL requests to machines that contain either of these domains in their host names are not processed by your proxy server(s).
This example creates a URL_DATASTORE preference called URL_PREF to which the http_proxy, no_proxy, timeout attributes are set. The defaults are used for the attributes that are not set.
begin ctx_ddl.create_preference('URL_PREF','URL_DATASTORE'); ctx_ddl.set_attribute('URL_PREF','HTTP_PROXY','www-proxy.us.oracle.com'); ctx_ddl.set_attribute('URL_PREF','NO_PROXY','us.oracle.com'); ctx_ddl.set_attribute('URL_PREF','Timeout','300'); end;
Create the table and insert values into it:
create table urls(id number primary key, docs varchar2(2000)); insert into urls values(111555,'http://context.us.oracle.com'); insert into urls values(111556,'http://www.sun.com'); commit;
To create the index, specify URL_PREF as the datastore:
create index datastores_text on urls ( docs ) indextype is ctxsys.context parameters ( 'Datastore URL_PREF' );
Use USER_DATASTORE object to define stored procedures that synthesize documents during indexing. For example, a user procedure might synthesize author, date, and text columns into one document to have the author and date information be part of the indexed text.
The USER_DATASTORE has the following attribute:
Specify the name of the procedure that synthesizes the document to be indexed. This specification must be in the form PROCEDURENAME or PACKAGENAME.PROCEDURENAME. The schema owner name is constrained to CTXSYS, so specifying owner name is not necessary.
The procedure you specify must have the following parameters defined as follows:
procedure (r IN ROWID, c IN OUT NOCOPY <output_type>)
The first argument r must be of type ROWID. The second argument c must be of the type defined in output_type. NOCOPY is a compiler hint that instructs Oracle to pass parameter c by reference if possible.
Note:: The procedure name and its arguments can be named anything. The arguments r and c are used in this example for simplicity. |
The stored procedure is called once for each row indexed. Given the rowid of the current row, procedure must write the text of the document into its second argument, whose type you specify with output_type.
The following constraints apply to procedure:
If you change or edit the stored procedure, indexes based upon it will not be notified, so you must manually recreate such indexes. That is, if the stored procedure makes use of other columns, and those column values change, the row will not be re-indexed. The row is only re-indexed when the indexed column changes.
Specify the datatype of the second argument to procedure. You can use either CLOB, BLOB or VARCHAR2.
Consider a table in which the author, title, and text fields are separate as in the articles
table defined as follows:
create table articles( id number, author varchar2(80), title varchar2(120), text clob );
The author and title fields are to be part of the indexed document text. Assume user appowner
writes a stored procedure with the user datastore interface that synthesizes a document from the text, author, and title fields:
create procedure myproc(rid in rowid, tlob in out clob) is offset number := 1; begin for c1 in (select author, title, text from articles where rowid = rid) loop append_varchar_to_lob(tlob, c1.title, offset); append_varchar_to_lob(tlob, 'by '||c1.author, offset); dbms_lob.append(tlob, c1.text); end loop; end;
This procedure takes in a rowid and a temporary clob locator, and concatenates all the articles columns into the temporary clob. Assume that a helper procedure append_varchar_to_lob
does the concatenation.
Because only ctxsys-owned stored procedures are allowed for the user datastore, CTXSYS
must wrap the user procedure (owned by appowner) with a CTXSYS
owned procedure as follows:
create procedure s_myproc(rid in rowid, tlob in out clob) is begin appowner.myproc(rid, tlob); end;
The CTXSYS
user must make sure that the index owner can execute the stub procedure by granting execute privileges as follows:
grant execute on s_myproc to appowner
The user appowner
creates the preference, setting the procedure attribute to the name of the ctxsys stub procedure as follows:
ctx_ddl.create_preference('myud', 'user_datastore'); ctx_ddl.set_attribute('myud', 'procedure', 's_myproc');
When appowner
creates the index on articles(text)
using this preference, the indexing operation sees author and title in the document text.
Use the nested datastore to index documents stored as rows in a nested table.
When using the nested table datastore, you must index a dummy column, since the extensible indexing framework disallows indexing the nested table column. See the example.
DML on the nested table is not automatically propagated to the dummy column used for indexing. For DML on the nested table to be propagated to the dummy column, your application code or trigger must explicitly update the dummy column.
Filter defaults for the index are based on the type of the nested_text
column.
During validation, Oracle checks that the type exists and that the attributes you specify for lineno and text exist in the nested table type. Oracle does not check that the named nested table column exists in the indexed table.
The following code creates a nested table.
create type nt_rec as object ( lno number, -- line number ltxt varchar2(80) -- text of line ); create type nt_tab as table of nt_rec; create table mytab ( id number primary key, -- primary key dummy char(1), -- dummy column for indexing doc nt_tab -- nested table ) nested table doc store as myntab;
The following code creates a storage table myntab
for the nested table.
It then inserts values into the nested table for the parent row with id equal to 1.
insert into mytab values (1, null, nt_tab()); insert into table(select doc from mytab where id=1) values (1, 'the dog'); insert into table(select doc from mytab where id=1) values (2, 'sat on mat '); commit;
The following code sets the preferences and attributes for the NESTED_DATASTORE according to the definitions of the nested table type nt_tab
and the parent table mytab
:
begin -- create nested datastore pref ctx_ddl.create_preference('ntds','nested_datastore'); -- nest tab column in main table ctx_ddl.set_attribute('ntds','nested_column', 'doc'); -- nested table type ctx_ddl.set_attribute('mtds','nested_type', 'scott.nt_tab'); -- lineno column in nested table ctx_ddl.set_attribute('ntds','nested_lineno','lno'); --text column in nested table ctx_ddl.set_attribute('ntds','nested_text', 'ltxt'); end;
The following code creates the index using the nested table datastore:
create index myidx on mytab(dummy) -- index dummy column, not nest tab indextype is ctxsys.context parameters ('datastore ntds');
The following select statement queries the index built from a nested table:
select from mytab where contains(dummy, 'dog and mat')>0; -- should get back document 1, since it has dog in line 1 and mat in line 2.
|
![]() Copyright © 1996-2000, Oracle Corporation. All Rights Reserved. |
|