NEAR (;)

Use the NEAR operator to return a score based on the proximity of two or more query terms. Oracle returns higher scores for terms closer together and lower scores for terms farther apart in a document.

Note:
The NEAR operator works only with word queries. You cannot use NEAR in ABOUT queries.

Syntax

NEAR((word1, word2,..., wordn) [, max_span [, order]])

word1-n

Specify the terms in the query separated by commas. The query terms can be single words or phrases.

max_span

Optionally specify the maximum size of a clump. The default is 100. Oracle returns an error if you specify a number greater than 100.

A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term.

For near queries with two terms, max_span is the maximum distance allowed between the two terms. For example, to query on dog and cat where dog is within 6 words of cat, issue the following query:

'near((dog, cat), 6)'
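
The clump definition and the max_span check can be modeled outside the database. The following Python sketch is an illustration only, not Oracle's implementation. The function names are hypothetical, terms are limited to single words, the document is split on whitespace, and the span is assumed to be the difference between the positions of the first and last query term in the clump:

```python
def smallest_clump_span(text, terms):
    """Return the span of the smallest clump containing every term, or None.

    Assumption: span = position of last term - position of first term.
    """
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    wanted = {t.lower() for t in terms}
    last_seen = {}  # term -> most recent word position
    best = None
    for pos, word in enumerate(words):
        if word in wanted:
            last_seen[word] = pos
            # Once every term has been seen, the smallest clump ending
            # here starts at the oldest of the most recent positions.
            if len(last_seen) == len(wanted):
                span = pos - min(last_seen.values())
                if best is None or span < best:
                    best = span
    return best


def matches_near(text, terms, max_span=100):
    """True if some clump fits within max_span (a model of near((...), max_span))."""
    span = smallest_clump_span(text, terms)
    return span is not None and span <= max_span


doc = "The dog slept on the porch while the cat watched"
print(smallest_clump_span(doc, ["dog", "cat"]))  # 7
print(matches_near(doc, ["dog", "cat"], 6))      # False
print(matches_near(doc, ["dog", "cat"], 7))      # True
```

Whether Oracle counts the terms themselves as part of the span is not stated here, so treat the boundary case as an assumption of the sketch.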

order

Specify TRUE for Oracle to search for terms in the order you specify. The default is FALSE.

For example, to search for the words monday, tuesday, and wednesday in that order with a maximum clump size of 20, issue the following query:

'near((monday, tuesday, wednesday), 20, TRUE)'

Note:
To specify order, you must always specify a number for the max_span parameter.
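
The parameter rules above (at least two terms, max_span no greater than 100, and order only with an explicit max_span) can be checked before a query string is sent to Oracle. The following Python helper is a hypothetical sketch; near_query is not part of any Oracle API, and it only assembles the query text:

```python
def near_query(terms, max_span=None, order=None):
    """Build a NEAR query string that follows the documented syntax.

    Hypothetical helper: it validates the arguments and returns text;
    it does not contact the database.
    """
    if len(terms) < 2:
        raise ValueError("NEAR needs at least two terms")
    if order is not None and max_span is None:
        raise ValueError("order requires an explicit max_span")
    if max_span is not None and max_span > 100:
        raise ValueError("max_span cannot exceed 100")
    query = "near((" + ", ".join(terms) + ")"
    if max_span is not None:
        query += f", {max_span}"
    if order is not None:
        query += ", TRUE" if order else ", FALSE"
    return query + ")"


print(near_query(["dog", "cat"], 6))
# near((dog, cat), 6)
print(near_query(["monday", "tuesday", "wednesday"], 20, True))
# near((monday, tuesday, wednesday), 20, TRUE)
```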

Oracle might return different scores for the same document when two query expressions differ only in the order flag. For example, the following queries might return different scores for the same document:

'near((dog, cat), 50, FALSE)'
'near((dog, cat), 50, TRUE)'
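
One way to picture the effect of the order flag is that an ordered query accepts fewer arrangements of the same words. The following Python sketch models only that idea and is not Oracle's implementation. The name ordered_match is hypothetical, terms are single words, and the span is assumed to be the distance from the first term to the last term:

```python
def ordered_match(words, terms, max_span):
    """True if the terms occur in the given order within max_span words.

    Assumption: span = position of last term - position of first term.
    """
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        pos = start
        found = True
        for term in terms[1:]:
            try:
                # Earliest occurrence of the next term after the previous one.
                pos = words.index(term, pos + 1)
            except ValueError:
                found = False
                break
        if found and pos - start <= max_span:
            return True
    return False


words = "the cat chased the dog".split()
print(ordered_match(words, ["cat", "dog"], 50))  # True
print(ordered_match(words, ["dog", "cat"], 50))  # False
```

In this model the text contains both words within 50 positions, so an unordered query on dog and cat would match it, while the ordered query on (dog, cat) does not.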

NEAR Scoring

The scoring for the NEAR operator combines frequency of the terms with proximity of terms. For each document that satisfies the query, Oracle returns a score between 1 and 100 that is proportional to the number of clumps in the document and inversely proportional to the average size of the clumps. This means many small clumps in a document result in higher scores, since small clumps imply closeness of terms.

The number of terms in a query also affects score. Queries with many terms, such as seven, generally need fewer clumps in a document to score 100 than do queries with few terms, such as two.

A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term. You can define clump size with the max_span parameter as described in this section.
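
The direction of the scoring relationship can be illustrated with a toy calculation. The following Python sketch is not Oracle's scoring formula and does not produce values on the 1 to 100 scale. The name relative_near_weight is hypothetical, and the function only mirrors the stated proportionality: more clumps raise the value, and larger average clump size lowers it:

```python
def relative_near_weight(clump_sizes):
    """Unnormalized weight: number of clumps divided by average clump size.

    Not Oracle's formula; it only mirrors the stated proportionality.
    Clump sizes are assumed to be positive numbers.
    """
    if not clump_sizes:
        return 0.0
    average = sum(clump_sizes) / len(clump_sizes)
    return len(clump_sizes) / average


many_small = relative_near_weight([2, 3, 2, 3])  # 4 clumps, average size 2.5
few_large = relative_near_weight([40, 60])       # 2 clumps, average size 50
print(many_small > few_large)  # True
```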

NEAR with Other Operators

You can combine the NEAR operator with other operators such as AND and OR. The score of the combined expression is calculated according to the usual rules for those operators.

For example, to find all documents that contain the terms tiger, lion, and cheetah where the terms lion and tiger are within 10 words of each other, issue the following query:

'near((lion, tiger), 10) AND cheetah'

The score returned for each document is the lower of two scores: the score of the NEAR expression and the score of the term cheetah.

You can also use the equivalence operator to substitute a single term in a near query:

'near((stock crash, Japan=Korea), 20)'

This query asks for all documents that contain the phrase stock crash within twenty words of Japan or Korea.

Backward Compatibility NEAR Syntax

You can write near queries using the syntax of previous ConText releases. For example, to find all documents where lion occurs near tiger, you can write:

'lion near tiger'

or with the semicolon, as follows:

'lion;tiger'

This query is equivalent to the following query:

'near((lion, tiger), 100, FALSE)'
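
The equivalence above can be expressed as a small rewrite step, for example when migrating stored queries to the newer syntax. The following Python sketch is a hypothetical helper, not an Oracle utility. It handles only the simple two-term forms shown here and does not parse nested expressions or other operators:

```python
import re


def legacy_to_near(query):
    """Rewrite 'a near b' or 'a;b' as near((a, b), 100, FALSE).

    Hypothetical helper for the two-term forms only.
    """
    parts = re.split(r"\s*;\s*|\s+near\s+", query.strip(), flags=re.IGNORECASE)
    terms = [p for p in parts if p]
    if len(terms) != 2:
        raise ValueError("expected exactly two terms")
    return "near((" + ", ".join(terms) + "), 100, FALSE)"


print(legacy_to_near("lion near tiger"))  # near((lion, tiger), 100, FALSE)
print(legacy_to_near("lion;tiger"))       # near((lion, tiger), 100, FALSE)
```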

Note:
Only the syntax of the NEAR operator is backward compatible. In the example above, the score returned is calculated using the clump method as described in this section.

Highlighting with the NEAR Operator

When you use highlighting and your query contains the near operator, all occurrences of all terms in the query that satisfy the proximity requirements are highlighted. Highlighted terms can be single words or phrases.

For example, assume a document contains the following text:

Chocolate and vanilla are my favorite ice cream flavors.  I like chocolate 
served in a waffle cone, and vanilla served in a cup with carmel syrup.

If the query is near((chocolate, vanilla), 100, FALSE), the following is highlighted:

<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors.  I like
<<chocolate>> served in a waffle cone, and <<vanilla>> served in a cup
with carmel syrup.

However, if the query is near((chocolate, vanilla), 4, FALSE), only the following is highlighted:

<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors.  I like
chocolate served in a waffle cone, and vanilla served in a cup with carmel
syrup.
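
The highlighting behavior in these two cases can be reproduced with a simple model. The following Python sketch is an illustration only, not the CTX_DOC implementation. The name highlight_near is hypothetical, terms are limited to single words, text is split on whitespace, and a window is assumed to extend max_span words from one query term:

```python
def highlight_near(text, terms, max_span=100):
    """Wrap query terms in << >> when they fall inside a qualifying window.

    Assumptions: single-word terms, whitespace tokenization, and a window
    of max_span words measured from a query term.
    """
    tokens = text.split()
    keys = [t.strip(".,;:!?").lower() for t in tokens]
    wanted = {t.lower() for t in terms}
    hits = [i for i, k in enumerate(keys) if k in wanted]
    marked = set()
    for start in hits:
        # All query-term occurrences within max_span words of this one.
        window = [i for i in hits if start <= i <= start + max_span]
        if {keys[i] for i in window} == wanted:
            marked.update(window)
    out = []
    for i, tok in enumerate(tokens):
        if i in marked:
            core = tok.strip(".,;:!?")
            tok = tok.replace(core, "<<" + core + ">>", 1)
        out.append(tok)
    return " ".join(out)


sample = ("Chocolate and vanilla are my favorite ice cream flavors. "
          "I like chocolate served in a waffle cone, and vanilla served in a cup.")
print(highlight_near(sample, ["chocolate", "vanilla"], 100))
print(highlight_near(sample, ["chocolate", "vanilla"], 4))
```

With a span of 100 the sketch marks all four occurrences. With a span of 4 it marks only the first Chocolate and vanilla, because the later occurrences are more than four words from any occurrence of the other term.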

See Also:
For more information about the procedures you can use for highlighting, see Chapter 8, "CTX_DOC Package".

Section Searching and NEAR

You can use the NEAR operator with the WITHIN operator for section searching as follows:

'near((dog, cat), 10) WITHIN Headings'

When evaluating expressions such as these, Oracle looks for clumps that lie entirely within the given section.

In the example above, only those clumps that contain dog and cat that lie entirely within the section Headings are counted. That is, if the term dog lies within Headings and the term cat lies five words from dog, but outside of Headings, this pair of words does not satisfy the expression and is not counted.
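
The containment rule can be stated as a check on word positions. The following Python sketch is a minimal model, not Oracle's implementation. The name clump_counts_in_section is hypothetical, positions are word offsets in the document, and the section bounds are assumed to be inclusive:

```python
def clump_counts_in_section(clump_positions, section_start, section_end):
    """True if every term position of the clump lies inside the section.

    Positions are word offsets; the section bounds are inclusive.
    """
    return all(section_start <= p <= section_end for p in clump_positions)


# Hypothetical document in which the Headings section covers words 0 to 9.
print(clump_counts_in_section([3, 8], 0, 9))   # True: dog and cat are both inside
print(clump_counts_in_section([7, 12], 0, 9))  # False: cat is five words past dog, outside
```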