Oracle8i Application Developer's Guide - XML Release 3 (8.1.7) Part Number A86030-01 |
|
Using interMedia Text to Search and Retrieve Data from XML Documents, 22 of 22
Although InterMedia doesn't understand the hierarchical XML structure, can I do something like this...
<report> <day>yesterday</day> there was a disaster <cause>hurricane</cause> </report>
I would like to search the LOB's where cause was hurricane, is this possible?
You can perform that level of searching with the current release of interMedia. Currently to break a document up you would have to use our XML Parser with XSLT to create a stylesheet that transforms the XML into DDL. iFS gives you a higher level interface.
Another technique is to use a JDBC program to insert the text of the document or document fragment into a CLOB or LONG column, then do the searching using the CONTAINS() operator after setting up the indexes...
Currently interMedia Text has the option to create indexes based on the content of a section group. But most XML Elements are of the type of Element. So, the only option for searching would be attribute values. So, I am wondering if there is any way to build indexes on attribute values.
Releases from 8.1.6 and higher allow attribute indexing. See the following site: http://technet.oracle.com/products/intermedia/htdocs/text_training_816/Samples/imt_816_techover.html#SCN
We are installing the XSQL demos at http://technet.oracle.com//tech/xml/xsql_servlet/htdocs/relnotes.htm#ID3376.
At step 3, we are unable to access the database via user/password CTXSYS/CTXSYS. We cannot access this via running SQLPLUS from the web server's root directory or by accessing SQL PLUS normally. We can access either SCOTT/TIGER or SYSTEM/MANAGER via either method. Is there a step we missed somewhere where we need to add this user/permissions, etc.? If there is, can you tell us where to find the instructions on this? The error message we receive is:
ERROR:OCA-30017: error logging on to non-Oracle database[POL-5246] User does not exist.
We are using Oracle 8iLite, Win NT 4 SP 4.
Oracle8i Lite does not support Intermedia Text, so you can ignore this step. CTXSYS/CTXSYS is the default username/password for the Intermedia Text schema owner. The demos are designed to run against a regular Oracle8i database, so you may encounter other problems running them with Oracle8i Lite. The Servlet works fine for reading data out of Oracle8i Lite, however, it's just that the demos are not targeting the "lite" version.
I know that an intact XML documents are stored in a CLOB or BLOB with ORACLE XML solution.
[XML document stored in BLOB]...<name id="1111"><first>lee</first> <sencond>jumee</second></name>...
Is value(lee, jumee) able to be queried by elements, attributes and structure of XML document?
lee within first or this:jumee within second or this:1111 within name@id
you can combine these like this:
lee within first and jumee within secondor this:(lee within first) within name.
For more information, please read the "interMedia Text Technical Overview" for 8.1.5 and 8.1.6 available on OTN.
Another alternative is to decompose the XML document and store the information in relational fields. Then you could modify individual elements, have element-level simultaneous access, and so on. In this case, using something called the USER_DATASTORE, you can use PL/SQL to reconstitute the document to XML for text indexing. Then, you get text search as if it were XML, but data management as if it were relational data. Again, see interMedia Text Technical Overview for more information.
Is interMedia Text included in Oracle8i? What is the name of this package? Does the package insert and search XML documents into the database?
Context Cartridge is now called interMedia Text and is part of the Oracle8i interMedia option. Details are at http://www.oracle.com/database/options/intermedia.html. interMedia Text will not help you insert XML documents into the database, only search them.
Is it possible for interMediaText to index XML such as:
2/7/1968
and then process a query such as:
Who has brown hair, that is, select name from person where hair.color = "BROWN"
Searches based on structural conditions are not yet available through interMedia Text. Attribute searches are supported from release 8.1.6. For reference you should not put data in attributes as that will not be compliant with XML Schema when it becomes a recommendation.
How would I define interMedia parameters so that I would be able to search my CLOB column for records that contained "aorta" and "damage". For example using the following XML (DTD implied):
WellKnownFileName.gif echocardiogram aorta
This is an image of the vessel damage. It would be nice to see a simple (or complicated) example of an XML interMedia implementation.
I assume there is no need to setup the ZONE or FIELDS.....Is this the case?
If you save an XML Document fragment in a CLOB, and enable an interMedia Text XML index on it, then you can do a SQL query which uses the CONTAINS() operator as the following query does:
Assume you have a document like an insurance claim...
77804 1999-01-01 00:00:00.0 8895 1044 Paul Astoria 123 Cherry Lane SF CA 94132 1999-01-05 00:00:00.0 7600 JCOX It was becase of Faulty Brakes
If you store the content as a document fragment in a CLOB, then you can do a query like the following (assuming everything else you store in relational tables):
REM Select the SUM of the amounts of REM all settlement payments approved by "JCOX" REM for claims whose relates to Brakes. select sum(n.amount) as TotalApprovedAmount from insurance_claim_view v, TABLE(v.settlements) n where n.approver = 'JCOX' and contains(damageReport,'Brakes within Cause') >
It was suggested that I store XML in CLOBs and use the DOM or SAX to reparse the XML later as needed. I agree that this was the best solution for my problem (which was how to manage many different XML documents using many different DTDs in a document management system) The big problem was searching this document repository to locate relevant information.
This is where interMedia Text seems ideal. It would be nice to see an example of setting this up using intermedia in Oracle8i, demonstrating how to define the XML_SECTION_GROUP and where to use a ZONE as opposed to a FIELD etc.
For example:
How would I define Intermedia parameters so that I would be able to search my CLOB column for records that had the "aorta" and "damage" in the using the following XML (DTD implied) WellKnownFileName.gif echo cardiogram aorta This is an image of the vessel damage
You can't do XML structure-based searches with interMedia. You can search for text within a given element, but nothing more complicated than that. It also does not do attributes.You could load up each doc with the DOMParser and search that way, but that wouldn't scale very well.We are working on a project with a similar requirement. We are resorting to creating columns in the table for each bit of xml data we want to do serious searching on and loading it up from an initial XML parse. Of course that doesn't help if you need to do structured searches on arbitrary elements.
Releases from 8.1.6 allow searching within attribute text. That's something like: dog within book@author. We are working on attribute value sensitive search, more like the following:
dog within book[@author = "Eric"]:
begin ctx_ddl.create_section_group('mygrp','basic_section_group'); ctx_ddl.add_field_section('mygrp','keyword','keyword'); ctx_ddl.add_field_section('mygrp','caption','caption'); end; create index myidx on mytab(mytxtcolumn)indextype is ctxsys.contextparameters ('section group mygrp'); select * from mytab where contains(mytxtcolumn, 'aorta within keyword')>0; options:
It is not so clear. It looks to me like his example is trying to find instances of elements containing "damage" that have a sibling element containing "aorta" within the same record. It's not clear what exactly he means by "record".
If each record equates to the in his example, and there can be multiple records in a single XML LOB, than I don't see how you could do this search with interMedia.
If there is only one per CLOB/row, than perhaps you could find it by ANDing two context element queries. But that would still be a sloppy sort of xml search relying on some expected limitations of the situation more so than the structural composition actually called for.
What I meant by record was the obvious thing. The whole XML example was stored in a CLOB column in a table, therefore the Record was the row in the table that contained the XML code.
With reference to your documentation, Oracle8i interMedia Text Migration, Part No. A67845-01, section "Roles and Users", it says Oracle8i interMedia Text provides the two roles for system administrators and application developers as CTXSYS Role and CTXAPP Role.
But when I issue a GRANT command to grant the CTXSYS role to a user it says ORA-01919: role 'CTXSYS' does not exist. When I queried the sys.dba_roles view, it does not give CTXSYS role.
Could you please reply me and help me out to solve the problem. I have installed the Oracle8I from the free CD shipped be you.
You might not have the interMedia Text installed? Did you take the "starter" database, or create one from scratch? If the latter, did you select interMedia Text during the install?
I need to store a large XML file in Oracle8i, search it, and return a specific tagged area. I have not found a clear way to store a large XML file, index it, allow searching on it AND return Tagged sections from it based on a search.Using interMedia Text some of this is possible:
What I need to know is how do I return a zone or a field based on a text search?
interMedia Text will only return the "hits". You will need to subsequently parse the CLOB to extract a section.
I need to store XML files(that are present on the file system as of now) into the database. I want to store the whole document. What I mean is that I do not want to break the document as per the tags and then store the info in separate tables/fields. Rather I want that I should have a universal table, that I can use to store different XML documents. I think internally it will be stored in a CLOB type of field in my case. My XML files will always contain ASCII data.
Can this be done using interMedia. Should we be using interMedia Text or interMedia Annotator for this? I downloaded Annotator from OTN, but I could not store XML document in the database.
I am trying to store XML document into CLOB column. Basically I have one table with the following definition(shown in red color below):
CREATE TABLE xml_store_testing ( xml_doc_id NUMBER, xml_doc CLOB )
I want to store my XML document in xml_doc field.
I have written another PL/SQL procedure shown below, to read the contents of the XML Document. The XML document is available on the file system. XML document contains just ASCII data - no binary data.
CREATE OR REPLACE PROCEDURE FileExec ( p_Directory IN VARCHAR2, p_FileName IN VARCHAR2) AS v_CLOBLocator CLOB; v_FileLocator BFILE; BEGIN SELECT xml_doc INTO v_CLOBLocator FROM xml_store_testing WHERE xml_doc_id = 1 FOR UPDATE; v_FileLocator := BFILENAME(p_Directory, p_FileName); DBMS_LOB.FILEOPEN(v_FileLocator, DBMS_LOB.FILE_READONLY); dbms_output.put_line(to_char(DBMS_LOB.GETLENGTH(v_FileLocator))); DBMS_LOB.LOADFROMFILE(v_CLOBLocator, v_FileLocator, DBMS_LOB.GETLENGTH(v_FileLocator)); DBMS_LOB.FILECLOSE(v_FileLocator); END FileExec;
Put the XML documents into your CLOB column, then add an interMedia Text index on it using the XML section-group. See the documentation and overview material at http://technet.oracle.com/products/intermedia.
When I execute this procedure, it executes successfully. But when I select from the table I see unknown characters in the table in CLOB field. Could this be because of the reason of the character set difference between operating system (where XML file resides) and database (where CLOB data resides).
Yes. If the character sets are different then you probably have to pass the data through UTL_RAW.CONVERT to do a character set conversion before writing to the CLOB.
How do I insert XML documents into a database?
Specifically I need to insert the XML document "as is" in column of datatype CLOB into a table.
Oracle's XML-SQL Utility for Java offers a command-line utility that can be used for Loading XML data. More information can be found on the XML-SQL Utility at: http://technet.oracle.com/tech/xml
You can insert the XML documents as you would any text file. There is nothing special about an XML-formatted file from a CLOB perspective.
Question 2
I understand that Oracle interMedia Text can be used to index and search XML stored in CLOBs. Is this true? Any advice on how to get started with this?
Prior versions of interMedia Text only allowed tag-based searching. The current version, Release 3 (8.1.7) of Oracle8i, allows for XML structure and attribute based searching. There is documentation on how to have the index built and the SQL usage in the Oracle8i interMedia documentation.
I have this xml:
<person> <name>efrat</name> <childrens> <child> <id>1</id> <name>keren</name> </child> </childrens> </person>
How do I find the person who has a child name keren but not the person's name keren? Assuming I defined every tag with the add_zone_section that can be nested and can include themselves.
Use selectSingleNode or selectNodes with XPATH string as a parameter.eg. selectSingleNode("//child/name[.='keren'])Also, I recommend making id as an attribute instead of a tag.
Where can I get good samples of using XML with interMedia.
See the following sites for more information:
http://technet.oracle.com/sample_code/products/intermedia/htdocs/text_samples/imt_816_techover.html
There's also some new interMedia utilities and add-ons that could help you at: http://technet.oracle.com/software/products/intermedia/software_index.htm
More XML samples ave been added to the 8.1.6 interMedia Text Doc, Oracle8i interMedia Text Reference.
Regarding XML with interMedia Text: Is there a way to feed an XML document into interMedia Text and have it recognize the tags, or do I have to use the add_field_section command for each tag in the XML document. My XML documents have hundreds of tags. Is there an easy way to do this?
Which version of the database are you using? I believe you need to do it for 8.1.5 but not 8.1.6.
You can use AUTO_SECTION_GROUP in 8.1.6
Is there someone out there who can provide some real world examples of performing this simple task. I have an XML document that I want to feed into Oracle8i and search by content using tags. My XML document has over 100 tags, do I have to sit their and do an ADD_FIELD_SECTION for every tag....If not, where is this documented.
XSQL Servlet ships with a complete (albeit simple from the interMedia standpoint) example of a SQL script that creates a complex XML Datagram out of Object Types, and then creates an interMedia Text index on the XML Document Fragment stored in the "Insurance Claim" type.
If you download the XSQL Servlet, and look at the file ./xsql/demo/insclaim.sql you'll be able to see the interMedia stuff at the bottom of the file. One of the key new features in interMedia in 8.1.6 as outlined in one of the URL I posted in my previous reply is the AUTO Sectioner for XML. In 8.1.5, you do have to manually created your field sections.
Is there a "Hello World" sample available anywhere? I am getting a javascript error on the XSQL servlet download page.
What follows is the content of the aforementioned demo file. It sets up the tables, types, and object views for the XSQL Insurance Claim Demo that you try live on our OTN demo site at http://technet.oracle.com/tech/xml/demo/demo1.htm
The interMedia Text-related part starts at the line that reads:
ctx_ddl.drop_preference()
In this example, an insurance claim has a "DamageReport" which is an XML Document fragment. The interMedia code at the end shows how to setup an XML searching index on the <CAUSE> and <MOTIVE> tags in this "DamageReport" document fragment.
set scan offset echo onset termout onREMREM $Author: smuench $REM $Date: 1999/11/27 14:48:10 $REM $Source: C:\\cvsroot/xsql/src/demo/insclaim.sql,v $REM $Revision: 1.3 $REMdrop synonym claim;drop table settlement_payments;drop view insurance_claim_view;drop table insurance_claim;drop view policy_view;drop table policy;drop view policyholder_view; drop table policyholder;drop type insurance_claim_t; drop type settlements_t; drop type payment; drop type policy_t; drop type policyholder_t; drop type address_t; create type address_t as object( Street varchar2(80), City Varchar2(80), State VARCHAR2(80),Zip NUMBER ); ./create type policyholder_t as object( CustomerId number, FirstName varchar2(80), LastName varchar2(80), HomeAddress address_t); ./create type policy_t as object( policyID number, primaryinsured policyholder_t); ./create type payment as object( PayDate DATE, Amount NUMBER, Approver VARCHAR2(8)); ./create type settlements_t as table of payment; ./create type insurance_claim_t as object ( claimid number,filed date, claimpolicy policy_t, settlements settlements_t, damageReport varchar2(4000) /* XML */); ./create table policyholder( CustomerId number, FirstName varchar2(80), LastName varchar2(80), HomeAddress address_t, constraint policyholder_pk primary key (customerid)); insert into policyholder values ( 1044, 'Paul','Astoria', ADDRESS_T('123 Cherry Lane','SF','CA','94132')); insert into policyholder values ( 1045, 'Martina','Boyle', ADDRESS_T('55 Belden Place','SF','CA','94102')); create or replace force view policyholder_view of policyholder_t with object OID ... ... create or replace force view insurance_claim_view of insurance_claim_t with object OID (claimid) as select c.claimid,c.filed, (SELECT value(pv) from policy_view pv WHERE pv.policyid = c.claimpolicy), CAST(MULTISET(SELECT PAYMENT(sp.paydate,sp.amount,sp.approver) as Payment from settlement_payments sp WHERE sp.claimid = c.claimid) AS settlements_t),c.damagereport from insurance_claim c;commit; begin ctx_ddl.drop_preference('Demo'); end; /begin ctx_ddl.create_preference('Demo', 'basic_lexer'); ctx_ddl.set_attribute ('Demo', 'index_themes', '0'); ctx_ddl.set_attribute ('Demo', 'index_text', '1'); ctx_ddl.create_section_group('demo_xml', 'xml_section_group'); ctx_ddl.add_zone_section('demo_xml', 'CAUSE', 'CAUSE'); ctx_ddl.add_zone_section('demo_xml', 'MOTIVE', 'MOTIVE'); end; /create index ctx_xml_i on insurance_claim(damagereport)indextype is ctxsys.contextparameters('LEXER Demo SECTION GROUP demo_xml'); create synonym claim for insurance_claim_view;
I cannot initialize the database and run the SQL scripts for the demo programs:
I have an XML document that I have stored in CLOB. I have also created the indexes on the tags using section_group, and so on. One of the tags is <SALARY> </SALARY> I want to write an SQL statement so as to select all the records that have salary lets say > 5000.
How do I do this? I cannot use WITHIN operator. I want to interpret the value present in this tag as a number. This could be floating point number also since this is salary.
You can't do this in interMedia Text. Range search is not really a text operation. The best solution is to use the other Oracle XML parsing utilities to extract the salary into a NUMBER field -- then you can use interMedia Text for text searching, and normal SQL operators for the more structured fields, and achieve the same results.
We are storing all our documents in XML format in a CLOB. Are there utilities available in Oracle perhaps interMedia to retrieve the contents a field at a time, that is given a field name, get the text between tags, as opposed to retrieving the whole document and traversing the structure?
interMedia does not do section extraction. See XM-SQL Utility for this.
I have created a view based on 7-8 tables and it has columns like, custordnumber, product_dscr, qty, prdid,shipdate, ship_status, and so on. I need to create an interMedia index on the three columns:
Is there a way to create a text index on these columns?
The short answer is yes. You have two options:
We need to insert data in the Database from an XML file. Currently we only can insert structured data with the table already created. Is this true?
We are working in a law project where we need to store laws that have structured data and unstructured data, and then search the data using interMedia text.
Can we insert unstructured data too? Or do we need to develop a custom application to do it? Then if we have the data stored with some structured parts and some unstructured parts, can we use interMedia Text to search it?
If we stored the unstructured part in a CLOB, and the CLOB has some tags, how can we search only the data in an specific tag?
Consider using iFS which allows you to break up a document storing it across tables and in a LOB. Currently interMedia Text can perform data searches with tags but is not knowledgeable about the hierachical XML structure. From release 8.1.6, interMedia Text has this capability along with name/value pair attribute searches.
So, if I understand your answer this document breaking is not possible in these moments if I don't create a custom development? Although interMedia does not understand hierachical XML structure, can I do something like this?
<report> <day>yesterday</day> there was a disaster <cause>hurricane</cause> </report>
Indexing with interMedia I would like to search the LOBs where cause was hurricane, is this possible?
You can perform that level of searching with the current release of interMedia Text. Currently to break a document up you would have to use the XML Parser with XSL-T to create a stylesheet that transforms the XML into DDL. iFS gives you a higher level interface.
Another technique is to use a JDBC program to insert the text of the document or document fragment into a CLOB or LONG column, then do the searching using the CONTAINS() operator after setting up the indexes.
|
![]() Copyright © 1996-2000, Oracle Corporation. All Rights Reserved. |
|