Using interMedia Text to Search and Retrieve Data from XML Documents, 22 of 22

Using interMedia Text to Search and Retrieve Data from XML Documents, 22 of 22

Frequently Asked Questions (FAQs): interMedia Text

Inserting XML data and Searching with interMedia Text

Question

Although InterMedia doesn't understand the hierarchical XML structure, can I do something like this...

<report>
<day>yesterday</day> there was a disaster <cause>hurricane</cause>
</report>

I would like to search the LOB's where cause was hurricane, is this possible?

Answer

You can perform that level of searching with the current release of interMedia. Currently to break a document up you would have to use our XML Parser with XSLT to create a stylesheet that transforms the XML into DDL. iFS gives you a higher level interface.

Another technique is to use a JDBC program to insert the text of the document or document fragment into a CLOB or LONG column, then do the searching using the CONTAINS() operator after setting up the indexes...

interMedia Text: Handling Attributes

Question

Currently interMedia Text has the option to create indexes based on the content of a section group. But most XML Elements are of the type of Element. So, the only option for searching would be attribute values. So, I am wondering if there is any way to build indexes on attribute values.

Answer

Releases from 8.1.6 and higher allow attribute indexing. See the following site: http://technet.oracle.com/products/intermedia/htdocs/text_training_816/Samples/imt_816_techover.html#SCN

CTXSYS/CTXSYS id and password

Question

We are installing the XSQL demos at http://technet.oracle.com//tech/xml/xsql_servlet/htdocs/relnotes.htm#ID3376.

At step 3, we are unable to access the database via user/password CTXSYS/CTXSYS. We cannot access this via running SQLPLUS from the web server's root directory or by accessing SQL PLUS normally. We can access either SCOTT/TIGER or SYSTEM/MANAGER via either method. Is there a step we missed somewhere where we need to add this user/permissions, etc.? If there is, can you tell us where to find the instructions on this? The error message we receive is:

ERROR:OCA-30017: error logging on to non-Oracle database[POL-5246] User does not 
exist.

We are using Oracle 8iLite, Win NT 4 SP 4.

Answer

Oracle8i Lite does not support Intermedia Text, so you can ignore this step. CTXSYS/CTXSYS is the default username/password for the Intermedia Text schema owner. The demos are designed to run against a regular Oracle8i database, so you may encounter other problems running them with Oracle8i Lite. The Servlet works fine for reading data out of Oracle8i Lite, however, it's just that the demos are not targeting the "lite" version.

Querying an XML Document

Question

I know that an intact XML documents are stored in a CLOB or BLOB with ORACLE XML solution.

XML documents stored in a CLOB/BLOB are able to be queried like table schema? For example:
```
[XML document stored in BLOB]...<name id="1111"><first>lee</first>            
<sencond>jumee</second></name>...
```
Is value(lee, jumee) able to be queried by elements, attributes and structure of XML document?
If some element or attribute is inserted/updated/deleted, all document must be updated? Or can insert/update/delete like table schema?
About locking, if someone manages an XML document stored in a BLOB/CLOB, nobody can access the same XML document? Is this true?

Answer

Using interMedia Text, you can find this document with a query like this:
```
lee within first or this:jumee within second or this:1111 within name@id
```
you can combine these like this:
```
lee within first and jumee within secondor this:(lee within first) within 
name.
```
For more information, please read the "interMedia Text Technical Overview" for 8.1.5 and 8.1.6 available on OTN.
interMedia Text indexes CLOB/BLOB, and this has no knowledge about XML specifically, so you cannot really change individual elements. You have to edit the document as a whole.
Just like any other CLOB, if someone is writing to the CLOB, they have it locked and nobody else can write to the CLOB. Other users can READ it, but not write. This is basic LOB behavior.

Another alternative is to decompose the XML document and store the information in relational fields. Then you could modify individual elements, have element-level simultaneous access, and so on. In this case, using something called the USER_DATASTORE, you can use PL/SQL to reconstitute the document to XML for text indexing. Then, you get text search as if it were XML, but data management as if it were relational data. Again, see interMedia Text Technical Overview for more information.

interMedia Text and Oracle8i

Question

Is interMedia Text included in Oracle8i? What is the name of this package? Does the package insert and search XML documents into the database?

Answer

Context Cartridge is now called interMedia Text and is part of the Oracle8i interMedia option. Details are at http://www.oracle.com/database/options/intermedia.html. interMedia Text will not help you insert XML documents into the database, only search them.

interMedia XML Indexing

Question

Is it possible for interMediaText to index XML such as:

2/7/1968

and then process a query such as:

Who has brown hair, that is, select name from person where hair.color = "BROWN"

Answer

Searches based on structural conditions are not yet available through interMedia Text. Attribute searches are supported from release 8.1.6. For reference you should not put data in attributes as that will not be compliant with XML Schema when it becomes a recommendation.

Searching CLOBs Using interMedia Text

Question

How would I define interMedia parameters so that I would be able to search my CLOB column for records that contained "aorta" and "damage". For example using the following XML (DTD implied):

WellKnownFileName.gif   echocardiogram   aorta

This is an image of the vessel damage. It would be nice to see a simple (or complicated) example of an XML interMedia implementation.

I assume there is no need to setup the ZONE or FIELDS.....Is this the case?

Answer

If you save an XML Document fragment in a CLOB, and enable an interMedia Text XML index on it, then you can do a SQL query which uses the CONTAINS() operator as the following query does:

Assume you have a document like an insurance claim...

77804
1999-01-01 00:00:00.0                     8895                           1044               
Paul               Astoria
123 Cherry Lane                  SF                  CA                  94132                                                                        
1999-01-05 00:00:00.0               7600               JCOX                                          
It was becase of Faulty Brakes

If you store the content as a document fragment in a CLOB, then you can do a query like the following (assuming everything else you store in relational tables):

REM Select the SUM of the amounts of
REM  all settlement payments approved by "JCOX" 
REM  for claims whose  relates to Brakes.
select sum(n.amount) as TotalApprovedAmount  
  from insurance_claim_view v, TABLE(v.settlements) n 
  where n.approver = 'JCOX'  
  and contains(damageReport,'Brakes within Cause') >

Managing Different XML Documents With Different DTDs: Storing and Searching XML in CLOBs -- interMedia Text

Question

It was suggested that I store XML in CLOBs and use the DOM or SAX to reparse the XML later as needed. I agree that this was the best solution for my problem (which was how to manage many different XML documents using many different DTDs in a document management system) The big problem was searching this document repository to locate relevant information.

This is where interMedia Text seems ideal. It would be nice to see an example of setting this up using intermedia in Oracle8i, demonstrating how to define the XML_SECTION_GROUP and where to use a ZONE as opposed to a FIELD etc.

For example:

How would I define Intermedia parameters so that I would be able to search my CLOB column for records that had the "aorta" and "damage" in the using the following XML (DTD implied) WellKnownFileName.gif echo cardiogram aorta This is an image of the vessel damage

Answer

You can't do XML structure-based searches with interMedia. You can search for text within a given element, but nothing more complicated than that. It also does not do attributes.You could load up each doc with the DOMParser and search that way, but that wouldn't scale very well.We are working on a project with a similar requirement. We are resorting to creating columns in the table for each bit of xml data we want to do serious searching on and loading it up from an initial XML parse. Of course that doesn't help if you need to do structured searches on arbitrary elements.

Question 2

Releases from 8.1.6 allow searching within attribute text. That's something like: dog within book@author. We are working on attribute value sensitive search, more like the following:

dog within book[@author = "Eric"]:

begin  ctx_ddl.create_section_group('mygrp','basic_section_group');
  ctx_ddl.add_field_section('mygrp','keyword','keyword');
  ctx_ddl.add_field_section('mygrp','caption','caption');
end;
create index myidx on mytab(mytxtcolumn)indextype is ctxsys.contextparameters   
('section group mygrp');
select * from mytab where contains(mytxtcolumn, 'aorta within keyword')>0;
options:

Use XML section group instead of basic section group if your tags have attributes or you need case-sensitive tag detection
Use zone sections instead of field sections if your sections overlap, or if you need to distinguish between instances. For instance, keywords. If keywords is a field section, then (aorta and echo cardiogram) within keywords finds the document. If it is a zone section, then it does not, because they are not in the SAME instance of keywords.

It is not so clear. It looks to me like his example is trying to find instances of elements containing "damage" that have a sibling element containing "aorta" within the same record. It's not clear what exactly he means by "record".

If each record equates to the in his example, and there can be multiple records in a single XML LOB, than I don't see how you could do this search with interMedia.

If there is only one per CLOB/row, than perhaps you could find it by ANDing two context element queries. But that would still be a sloppy sort of xml search relying on some expected limitations of the situation more so than the structural composition actually called for.

Answer 2

What I meant by record was the obvious thing. The whole XML example was stored in a CLOB column in a table, therefore the Record was the row in the table that contained the XML code.

interMedia Text Role (ORA-01919: role 'CTXSYS' does not exist)

Question

With reference to your documentation, Oracle8i interMedia Text Migration, Part No. A67845-01, section "Roles and Users", it says Oracle8i interMedia Text provides the two roles for system administrators and application developers as CTXSYS Role and CTXAPP Role.

But when I issue a GRANT command to grant the CTXSYS role to a user it says ORA-01919: role 'CTXSYS' does not exist. When I queried the sys.dba_roles view, it does not give CTXSYS role.

Could you please reply me and help me out to solve the problem. I have installed the Oracle8I from the free CD shipped be you.

Answer

You might not have the interMedia Text installed? Did you take the "starter" database, or create one from scratch? If the latter, did you select interMedia Text during the install?

Searching XML Documents and Returning a Zone

Question

I need to store a large XML file in Oracle8i, search it, and return a specific tagged area. I have not found a clear way to store a large XML file, index it, allow searching on it AND return Tagged sections from it based on a search.Using interMedia Text some of this is possible:

I can store an XML file in a CLOB field
I can index it with ctxsys.context
I can create <Zones> and <Fields> to represent the Tags in my XML fileEx. ctx_ddl.add_zone_section(xmlgroup,"dublincore", dc);
I can search for text within a Zone or fieldEx. Select title from mytable where CONTAINS(textField,"some words WITHIN dublincore")

What I need to know is how do I return a zone or a field based on a text search?

Answer

interMedia Text will only return the "hits". You will need to subsequently parse the CLOB to extract a section.

Storing an XML Document in CLOB: Using interMedia Text

Question

I need to store XML files(that are present on the file system as of now) into the database. I want to store the whole document. What I mean is that I do not want to break the document as per the tags and then store the info in separate tables/fields. Rather I want that I should have a universal table, that I can use to store different XML documents. I think internally it will be stored in a CLOB type of field in my case. My XML files will always contain ASCII data.

Can this be done using interMedia. Should we be using interMedia Text or interMedia Annotator for this? I downloaded Annotator from OTN, but I could not store XML document in the database.

I am trying to store XML document into CLOB column. Basically I have one table with the following definition(shown in red color below):

CREATE TABLE xml_store_testing
(
    xml_doc_id  NUMBER,
    xml_doc     CLOB  )

I want to store my XML document in xml_doc field.

I have written another PL/SQL procedure shown below, to read the contents of the XML Document. The XML document is available on the file system. XML document contains just ASCII data - no binary data.

CREATE OR REPLACE PROCEDURE FileExec
(
  p_Directory      IN VARCHAR2, 
  p_FileName       IN VARCHAR2)  
  AS   v_CLOBLocator  CLOB;
       v_FileLocator       BFILE;
BEGIN
    SELECT  xml_doc
    INTO      v_CLOBLocator
    FROM    xml_store_testing
    WHERE  xml_doc_id = 1
    FOR        UPDATE;
    v_FileLocator := BFILENAME(p_Directory, p_FileName);
    DBMS_LOB.FILEOPEN(v_FileLocator, DBMS_LOB.FILE_READONLY);
    dbms_output.put_line(to_char(DBMS_LOB.GETLENGTH(v_FileLocator)));
      DBMS_LOB.LOADFROMFILE(v_CLOBLocator, v_FileLocator,
      DBMS_LOB.GETLENGTH(v_FileLocator));
      DBMS_LOB.FILECLOSE(v_FileLocator);
END FileExec;

Answer

Put the XML documents into your CLOB column, then add an interMedia Text index on it using the XML section-group. See the documentation and overview material at http://technet.oracle.com/products/intermedia.

Question 2

When I execute this procedure, it executes successfully. But when I select from the table I see unknown characters in the table in CLOB field. Could this be because of the reason of the character set difference between operating system (where XML file resides) and database (where CLOB data resides).

Answer 2

Yes. If the character sets are different then you probably have to pass the data through UTL_RAW.CONVERT to do a character set conversion before writing to the CLOB.

Loading XML Documents into the Database and Searching with interMedia Text

Question

How do I insert XML documents into a database?

Specifically I need to insert the XML document "as is" in column of datatype CLOB into a table.

Answer

Oracle's XML-SQL Utility for Java offers a command-line utility that can be used for Loading XML data. More information can be found on the XML-SQL Utility at: http://technet.oracle.com/tech/xml

You can insert the XML documents as you would any text file. There is nothing special about an XML-formatted file from a CLOB perspective.

Question 2

I understand that Oracle interMedia Text can be used to index and search XML stored in CLOBs. Is this true? Any advice on how to get started with this?

Answer 2

Prior versions of interMedia Text only allowed tag-based searching. The current version, Release 3 (8.1.7) of Oracle8i, allows for XML structure and attribute based searching. There is documentation on how to have the index built and the SQL usage in the Oracle8i interMedia documentation.

See Also:
Oracle8i interMedia Text Reference.

Searching XML with WITHIN Operator

Question

I have this xml:

<person>
  <name>efrat</name> 
 <childrens> 
     <child> 
         <id>1</id> 
         <name>keren</name> 
     </child>
  </childrens>
</person>

How do I find the person who has a child name keren but not the person's name keren? Assuming I defined every tag with the add_zone_section that can be nested and can include themselves.

Answer

Use selectSingleNode or selectNodes with XPATH string as a parameter.eg. selectSingleNode("//child/name[.='keren'])Also, I recommend making id as an attribute instead of a tag.

interMedia Text and XML

Question

Where can I get good samples of using XML with interMedia.

Answer

See the following sites for more information:

http://technet.oracle.com/sample_code/products/intermedia/htdocs/text_samples/imt_815_techover.html and

http://technet.oracle.com/sample_code/products/intermedia/htdocs/text_samples/imt_816_techover.html

There's also some new interMedia utilities and add-ons that could help you at: http://technet.oracle.com/software/products/intermedia/software_index.htm

More XML samples ave been added to the 8.1.6 interMedia Text Doc, Oracle8i interMedia Text Reference.

interMedia Text and XML: Add_field_section

Question

Regarding XML with interMedia Text: Is there a way to feed an XML document into interMedia Text and have it recognize the tags, or do I have to use the add_field_section command for each tag in the XML document. My XML documents have hundreds of tags. Is there an easy way to do this?

Answer

Which version of the database are you using? I believe you need to do it for 8.1.5 but not 8.1.6.

You can use AUTO_SECTION_GROUP in 8.1.6

interMedia and XML Support

Question

Is there someone out there who can provide some real world examples of performing this simple task. I have an XML document that I want to feed into Oracle8i and search by content using tags. My XML document has over 100 tags, do I have to sit their and do an ADD_FIELD_SECTION for every tag....If not, where is this documented.

Answer

XSQL Servlet ships with a complete (albeit simple from the interMedia standpoint) example of a SQL script that creates a complex XML Datagram out of Object Types, and then creates an interMedia Text index on the XML Document Fragment stored in the "Insurance Claim" type.

If you download the XSQL Servlet, and look at the file ./xsql/demo/insclaim.sql you'll be able to see the interMedia stuff at the bottom of the file. One of the key new features in interMedia in 8.1.6 as outlined in one of the URL I posted in my previous reply is the AUTO Sectioner for XML. In 8.1.5, you do have to manually created your field sections.

Question 2

Is there a "Hello World" sample available anywhere? I am getting a javascript error on the XSQL servlet download page.

Answer 2

What follows is the content of the aforementioned demo file. It sets up the tables, types, and object views for the XSQL Insurance Claim Demo that you try live on our OTN demo site at http://technet.oracle.com/tech/xml/demo/demo1.htm

The interMedia Text-related part starts at the line that reads:

ctx_ddl.drop_preference()

In this example, an insurance claim has a "DamageReport" which is an XML Document fragment. The interMedia code at the end shows how to setup an XML searching index on the <CAUSE> and <MOTIVE> tags in this "DamageReport" document fragment.

set scan offset echo onset termout onREMREM $Author: smuench $REM $Date: 
1999/11/27 14:48:10 $REM $Source: C:\\cvsroot/xsql/src/demo/insclaim.sql,v $REM 
$Revision: 1.3 $REMdrop synonym claim;drop table settlement_payments;drop view 
insurance_claim_view;drop table insurance_claim;drop view policy_view;drop table 
policy;drop view policyholder_view;
drop table policyholder;drop type insurance_claim_t;
drop type settlements_t;
drop type payment;
drop type policy_t;
drop type policyholder_t;
drop type address_t;
create type address_t as object( Street varchar2(80), City Varchar2(80), State 
VARCHAR2(80),Zip NUMBER );
./create type policyholder_t as object( CustomerId number,
   FirstName varchar2(80),
   LastName varchar2(80),
   HomeAddress address_t);
./create type policy_t as object(
  policyID number,  primaryinsured policyholder_t);
./create type payment as object(
  PayDate DATE,  Amount NUMBER,  Approver VARCHAR2(8));
./create type settlements_t as table of payment;
./create type insurance_claim_t as object (
  claimid number,filed   date,  claimpolicy policy_t,
  settlements settlements_t,  damageReport varchar2(4000) /* XML */);
./create table policyholder( CustomerId number,
  FirstName varchar2(80),			   LastName varchar2(80),
  HomeAddress address_t,  constraint policyholder_pk primary key (customerid));
insert into policyholder values ( 1044, 'Paul','Astoria',
  ADDRESS_T('123 Cherry Lane','SF','CA','94132'));
insert into policyholder values ( 1045, 'Martina','Boyle',
  ADDRESS_T('55 Belden Place','SF','CA','94102'));
create or replace force view policyholder_view of policyholder_t
        with object OID 
...
...
create or replace force view insurance_claim_view of insurance_claim_t        
with object OID (claimid)
    as select c.claimid,c.filed,
       (SELECT value(pv)
        from policy_view pv 
        WHERE pv.policyid = c.claimpolicy),
        CAST(MULTISET(SELECT PAYMENT(sp.paydate,sp.amount,sp.approver) 
            as Payment from settlement_payments sp
            WHERE sp.claimid = c.claimid) AS settlements_t),c.damagereport
        from insurance_claim c;commit;
begin  ctx_ddl.drop_preference('Demo');
end;
/begin  ctx_ddl.create_preference('Demo', 'basic_lexer');  
ctx_ddl.set_attribute ('Demo', 'index_themes', '0');
ctx_ddl.set_attribute ('Demo', 'index_text', '1');  
ctx_ddl.create_section_group('demo_xml', 'xml_section_group');  
ctx_ddl.add_zone_section('demo_xml', 'CAUSE', 'CAUSE');  
ctx_ddl.add_zone_section('demo_xml', 'MOTIVE', 'MOTIVE');
end;
/create index 
ctx_xml_i on insurance_claim(damagereport)indextype is 
ctxsys.contextparameters('LEXER Demo SECTION GROUP demo_xml');
create synonym claim for insurance_claim_view;

Oracle8i Lite 4.0.0.2.0: interMedia Text is Not Supported

Question

I cannot initialize the database and run the SQL scripts for the demo programs:

I cannot connect as CTXSYS/CTXSYS. What is the source of the interMedia Text packages? Should user CTXSYS exist in a default installation?
I get a syntax error when attempting to execute GRANT QUERY REWRITE TO SCOTT.
I also get syntax errors when running some of the scripts, such as, airport.sql, although others, such as, index.sql, complete fine. I'm running a fresh installation of Oracle8i Lite 4.0.0.2.0.

Answer

interMedia Text is a feature of Oracle8i. If you're using Oracle8i Lite, then it is not available.
QUERY REWRITE is a privilege supported in Oracle8i, not Oracle8i Lite. So this failure is also explainable.
AIRPORT.SQL fails on the very last statement which is creating a FUNCTIONAL INDEX on UPPER(description). Functional indexes are an Oracle8i feature not available in 8i Lite. Also, if doing a fresh install, you will get some error messages during SQL script execution. If you look at the script, you'll find that it tries to delete a table before creating it. On a fresh install, these tables will not exist, so you'll get an error.

SQL in interMedia context

Question

I have an XML document that I have stored in CLOB. I have also created the indexes on the tags using section_group, and so on. One of the tags is <SALARY> </SALARY> I want to write an SQL statement so as to select all the records that have salary lets say > 5000.

How do I do this? I cannot use WITHIN operator. I want to interpret the value present in this tag as a number. This could be floating point number also since this is salary.

Answer

You can't do this in interMedia Text. Range search is not really a text operation. The best solution is to use the other Oracle XML parsing utilities to extract the salary into a NUMBER field -- then you can use interMedia Text for text searching, and normal SQL operators for the more structured fields, and achieve the same results.

XML and interMedia Text

Question

We are storing all our documents in XML format in a CLOB. Are there utilities available in Oracle perhaps interMedia to retrieve the contents a field at a time, that is given a field name, get the text between tags, as opposed to retrieving the whole document and traversing the structure?

Answer

interMedia does not do section extraction. See XM-SQL Utility for this.

Creating an Index on Three Columns?

Question

I have created a view based on 7-8 tables and it has columns like, custordnumber, product_dscr, qty, prdid,shipdate, ship_status, and so on. I need to create an interMedia index on the three columns:

custordnumber
product_dsc
ship_status

Is there a way to create a text index on these columns?

Answer

The short answer is yes. You have two options:

Use the USER_DATASTORE object to create a concatenated field on the fly during indexing;
Concatenate your fields and store them in an extra CLOB field in one of your tables. Then create the index on the CLOB field. If you're using Oracle 8.1.6, then you also have the option of placing XML tags around each field prior to concatenation. This gives you the capability of searching WITHIN each field.

Searching Structured and Unstructured Data

Question

We need to insert data in the Database from an XML file. Currently we only can insert structured data with the table already created. Is this true?

We are working in a law project where we need to store laws that have structured data and unstructured data, and then search the data using interMedia text.

Can we insert unstructured data too? Or do we need to develop a custom application to do it? Then if we have the data stored with some structured parts and some unstructured parts, can we use interMedia Text to search it?

If we stored the unstructured part in a CLOB, and the CLOB has some tags, how can we search only the data in an specific tag?

Answer

Consider using iFS which allows you to break up a document storing it across tables and in a LOB. Currently interMedia Text can perform data searches with tags but is not knowledgeable about the hierachical XML structure. From release 8.1.6, interMedia Text has this capability along with name/value pair attribute searches.

Question 2

So, if I understand your answer this document breaking is not possible in these moments if I don't create a custom development? Although interMedia does not understand hierachical XML structure, can I do something like this?

<report>
   <day>yesterday</day> there was a disaster <cause>hurricane</cause>
</report>

Indexing with interMedia I would like to search the LOBs where cause was hurricane, is this possible?

Answer 2

You can perform that level of searching with the current release of interMedia Text. Currently to break a document up you would have to use the XML Parser with XSL-T to create a stylesheet that transforms the XML into DDL. iFS gives you a higher level interface.