Oracle8i JDBC Developer's Guide and Reference Release 3 (8.1.7), Part Number A83724-01
After a brief overview, this section describes how the Oracle JDBC drivers perform character set conversions, how NLS is handled by the OCI, Thin, and server-side internal drivers, the NLS class files shipped with the drivers, and the bind size restrictions that the Thin driver imposes on CHAR and VARCHAR2 data.
Oracle's JDBC drivers support NLS (National Language Support). NLS lets you retrieve data from, or insert data into, a database in any character set that Oracle supports. If the client and the server use different character sets, then the driver provides the support to perform the conversions between the database character set and the client character set.
For more information on NLS, NLS environment variables, and the character sets that Oracle supports, see the Oracle8i National Language Support Guide. See the Oracle8i Reference for more information on the database character set and how it is created.
Here are a few examples of commonly used Java methods for JDBC that rely heavily on NLS character set conversion:
- The java.sql.ResultSet methods getString() and getUnicodeStream() return values from the database as Java strings and as a stream of Unicode characters, respectively (see the sketch following this list).
- The oracle.sql.CLOB method getCharacterStream() returns the contents of a CLOB as a Unicode stream.
- The oracle.sql.CHAR methods getString(), toString(), and getStringWithReplacement() convert the following data to strings:
  - getString(): Converts the sequence of characters represented by the CHAR object to a string and returns a Java String object.
  - toString(): Identical to getString(), but if the character set is not recognized, then toString() returns a hexadecimal representation of the CHAR data.
  - getStringWithReplacement(): Identical to getString(), except that characters which have no Unicode representation in the character set of this CHAR object are replaced by a default replacement character.
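For example, the following minimal sketch (the connect string, user, password, table, and column names are assumed for illustration only) retrieves character data with getString(), relying on the driver to convert from the database character set to UCS-2:

```java
// Minimal sketch; the connect string, user, password, table, and column are assumed.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NlsGetStringExample {
    public static void main(String[] args) throws Exception {
        // Register the Oracle JDBC driver (8.1.7-era class name).
        Class.forName("oracle.jdbc.driver.OracleDriver");

        Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@myhost:1521:orcl", "scott", "tiger");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT ename FROM emp");

        while (rs.next()) {
            // getString() returns a Java String in UCS-2; the driver performs
            // the conversion from the database character set.
            System.out.println(rs.getString(1));
        }

        rs.close();
        stmt.close();
        conn.close();
    }
}
```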
The techniques that the Oracle JDBC drivers use to perform character set conversion for Java applications depend on the character set the database uses. The simplest case is where the database uses the US7ASCII or WE8ISO8859P1 character set. In this case, the driver converts the data directly from the database character set to UCS-2, which is used in Java applications, and vice versa.

If you are working with databases that employ a non-US7ASCII or non-WE8ISO8859P1 character set (for example, Japanese or Korean), then the driver converts the data first to UTF-8 (this step does not apply to the server-side internal driver), then to UCS-2. For example, the driver always converts CHAR and VARCHAR2 data in a non-US7ASCII, non-WE8ISO8859P1 character set. It does not convert RAW data.
If you are using the JDBC OCI driver, then NLS is handled as in any other Oracle client situation. The client character set, language, and territory settings are in the NLS_LANG environment variable, which is set at client-installation time.

Note that there are also server-side settings for these parameters, determined during database creation. So, when performing character set conversion, the JDBC OCI driver has to take three factors into consideration:

- The database character set and NLS settings on the server
- The client character set, specified in the NLS_LANG environment variable
- UCS-2, the character set used by Java
The JDBC OCI driver transfers the data from the server to the client in the character set of the database. Depending on the value of the NLS_LANG environment variable, the driver handles character set conversions in one of two ways:

- If NLS_LANG is not specified, or specifies the US7ASCII or WE8ISO8859P1 character set, then the JDBC OCI driver uses Java to convert the character set from US7ASCII or WE8ISO8859P1 directly to UCS-2, or the reverse.
- If NLS_LANG specifies a non-US7ASCII or non-WE8ISO8859P1 character set, then the driver changes the value of the NLS_LANG parameter on the client to UTF-8. This happens automatically and does not require any user intervention. OCI uses the NLS_LANG setting to convert the data from the database character set to UTF-8; the JDBC driver then converts the UTF-8 data to UCS-2.
If you are using the JDBC Thin driver, then there is presumably no Oracle client installation, so NLS conversions must be handled differently.

The Thin driver obtains the language and territory settings (NLS_LANGUAGE and NLS_TERRITORY) from the Java locale in the JVM user.language property. The date format (NLS_DATE_FORMAT) is set according to the territory setting.
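As a rough sketch (the property values shown are only examples, set at JVM startup with options such as -Duser.language=fr -Duser.region=FR), the following code inspects the locale settings that the Thin driver works from:

```java
// Illustrative sketch only; property values depend on how the JVM was started,
// for example: java -Duser.language=fr -Duser.region=FR MyApp
import java.util.Locale;

public class ThinDriverLocaleCheck {
    public static void main(String[] args) {
        // The Thin driver derives NLS_LANGUAGE and NLS_TERRITORY from these settings.
        System.out.println("user.language  = " + System.getProperty("user.language"));
        System.out.println("user.region    = " + System.getProperty("user.region"));
        System.out.println("default locale = " + Locale.getDefault());
    }
}
```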
If the database character set is US7ASCII or WE8ISO8859P1, then the data is transferred to the client without any conversion. The driver then converts the character set to UCS-2 in Java.

If the database character set is something other than US7ASCII or WE8ISO8859P1, then the server first translates the data to UTF-8 before transferring it to the client. On the client, the JDBC Thin driver converts the data to UCS-2 in Java.
If your JDBC code running in the server accesses the database, then the JDBC server-side internal driver performs a character set conversion based on the database character set. The target character set of all Java programs is UCS-2.
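For instance, the following sketch (the table and column names are assumed for illustration) shows Java code running inside the server that reads character data through the server-side internal driver; the data it reads is converted from the database character set to UCS-2:

```java
// Illustrative sketch; the table and column names are assumed.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ServerSideNlsExample {
    public static String firstEname() throws Exception {
        // The server-side internal driver connects to the session in which
        // the Java code is running; no host or port is involved.
        Connection conn = DriverManager.getConnection("jdbc:default:connection:");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT ename FROM emp");
        String name = rs.next() ? rs.getString(1) : null;  // converted to UCS-2
        rs.close();
        stmt.close();
        return name;
    }
}
```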
The Oracle JDBC class files, classes12.zip and classes111.zip, provide NLS support for the Thin and OCI drivers. The files contain all the necessary classes to provide complete NLS support for all Oracle character sets for CHAR, VARCHAR, LONGVARCHAR, and CLOB type data not retrieved or inserted as part of an Oracle object or collection type.

However, in the case of the CHAR and VARCHAR data portion of Oracle objects and collections, the JDBC class files provide support for only a small set of commonly used character sets.
To provide support for all NLS character sets, the Oracle8i JDBC driver installation includes two additional files: nls_charset12.zip for JDK 1.2.x and nls_charset11.zip for JDK 1.1.x. The OCI and Thin drivers require these files to support all Oracle character sets for CHAR and VARCHAR data in Oracle object types and collections. To obtain this support, you must add the appropriate nls_charset*.zip file to your CLASSPATH.
It is important to note that the nls_charset*.zip files are very large, because they must support a large number of character sets. To save space, you might want to keep only the classes you need from the nls_charset*.zip file. If you want to do this, follow these steps:

1. Unzip the nls_charset*.zip file.
2. Keep only the character set conversion classes that you need and discard the rest.
3. Add the remaining class files to your CLASSPATH.
The character set extension class files are named in the following format:

CharacterConverter<OracleCharacterSetId>.class

where <OracleCharacterSetId> is the hexadecimal representation of the Oracle character set ID that corresponds to a character set name.
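As a small illustration of the naming scheme (this helper is not part of the Oracle API, and the character set ID used is arbitrary), a class file name can be derived from a character set ID as follows:

```java
// Not part of the Oracle API; a sketch of how the extension class file names are formed.
public class ConverterClassName {
    static String converterClassFor(int oracleCharacterSetId) {
        // CharacterConverter<hexadecimal character set ID>.class
        return "CharacterConverter"
                + Integer.toHexString(oracleCharacterSetId).toUpperCase()
                + ".class";
    }

    public static void main(String[] args) {
        // 0x36A is an arbitrary illustrative ID, not a specific Oracle character set.
        System.out.println(converterClassFor(0x36A));  // CharacterConverter36A.class
    }
}
```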
If the database character set is neither ASCII (US7ASCII) nor ISO-LATIN-1 (WE8ISO8859P1), then the Thin driver must impose size restrictions for CHAR and VARCHAR2 bind parameters that are more restrictive than normal database size limitations. This is necessary to allow for data expansion during conversion.
The Thin driver checks CHAR or VARCHAR2 bind sizes when the setXXX() method is called. If the data size exceeds the size restriction, then the driver throws a SQL exception (ORA-17070 "Data size bigger than max size for this type") from the setXXX() call, as the sketch after this list illustrates. This limitation is necessary to avoid the chance of data corruption whenever an NLS conversion occurs and increases the length of the data. This limitation is enforced when you are doing all the following:

- Using the JDBC Thin driver
- Binding CHAR or VARCHAR2 datatypes
- Using a database whose character set is neither ASCII (US7ASCII) nor ISO-Latin-1 (WE8ISO8859P1)
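The following sketch (the table, column, and data sizes are assumed, and the actual limit depends on the database character set) shows a bind that can exceed the Thin driver restriction and the resulting exception:

```java
// Illustrative sketch; the table, column, and sizes are assumed.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BindSizeExample {
    public static void main(String[] args) throws Exception {
        Class.forName("oracle.jdbc.driver.OracleDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@myhost:1521:orcl", "scott", "tiger");
        PreparedStatement ps =
            conn.prepareStatement("INSERT INTO notes (body) VALUES (?)");

        // Build a value larger than the restricted bind size; for example, more
        // than 1333 UTF-8 bytes when the database character set has an NLS ratio of 3.
        StringBuffer big = new StringBuffer();
        for (int i = 0; i < 2000; i++) {
            big.append('A');
        }

        try {
            ps.setString(1, big.toString());   // the size check happens here
            ps.executeUpdate();
        } catch (SQLException e) {
            // ORA-17070: Data size bigger than max size for this type
            System.out.println("Bind rejected: " + e.getMessage());
        } finally {
            ps.close();
            conn.close();
        }
    }
}
```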
As previously discussed, when the database character set is neither US7ASCII nor WE8ISO8859P1, the Thin driver converts Java UCS-2 characters to UTF-8 encoding bytes for CHAR or VARCHAR2 binds. The UTF-8 encoding bytes are then transferred to the database, and the database converts the UTF-8 encoding bytes to the database character set encoding.

This conversion to the database character set encoding might result in a size increase. The NLS ratio for a database character set indicates the maximum possible expansion in converting from UTF-8 to the character set:
NLS ratio = (maximum possible value of) [(size in database character set) / (size in UTF-8)]
Table 18-1 shows the database size limitations for CHAR and VARCHAR2 data, and the Thin driver size restriction formulas for CHAR and VARCHAR2 binds. Database limits are in bytes. Formulas determine the maximum size of the UTF-8 encoding, in bytes.

The formulas guarantee that after the data is converted from UTF-8 to the database character set, the size will not exceed the database maximum size.
The number of UCS-2 characters that can be supported is determined by the number of bytes per character in the data. All ASCII characters are one byte long in UTF-8 encoding. Other character types can be two or three bytes long.
Table 18-2 lists the NLS ratios of some common server character sets, then shows the Thin driver maximum bind sizes for CHAR and VARCHAR2 data for each character set, as determined by using the NLS ratio in the appropriate formula. Again, maximum bind sizes are for UTF-8 encoding, in bytes.
Server Character Set | NLS Ratio | Thin Driver Max VARCHAR2 Bind Size (UTF-8 bytes) | Thin Driver Max CHAR Bind Size (UTF-8 bytes)
---|---|---|---
 | 1 | 4000 | 2000
 | 2 | 2000 | 2000
 | 3 | 1333 | 1333
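As a worked illustration (using the 4000-byte VARCHAR2 database limit): for a server character set with an NLS ratio of 3, the Thin driver limits a VARCHAR2 bind to 4000 / 3 ≈ 1333 bytes of UTF-8 data, so that even if every byte expands by the full ratio during conversion to the database character set, the converted value still fits within the 4000-byte VARCHAR2 maximum. This matches the 1333-byte VARCHAR2 entry in the last row of Table 18-2.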
Copyright © 1996-2000, Oracle Corporation. All Rights Reserved.