3
New Character Set Scanner Utility

This chapter introduces the Character Set Scanner Utility, a National Language Support utility in release 8.1.7 for checking data before migrating character sets. The topics in this chapter include:

Overview of Choosing and Migrating Character Sets

Choosing the appropriate database character set for your database is an important decision and requires taking into account many factors. Some of these factors are:

The type of data you need to store
The number of languages the database character set can represent
The different sizing requirements of each character set and their performance implications

A related topic is choosing a new character set for an existing database, which is called migrating character sets. Migrating from one database character set to another involves additional considerations beyond those of simply choosing a character set. In particular, it is a complex planning process with the goal of minimizing the possibility of losing data because of data truncation and character set conversions during the migration.

See Also:
Oracle8i National Language Support Guide for further details regarding choosing character sets

Data Truncation

The sizes of character data types CHAR and VARCHAR2 are specified in bytes, not characters. Hence, the specification CHAR(20) in a table definition allows 20 bytes for storing character data. This is acceptable when the database character set uses a single-byte character encoding scheme because the number of characters will be equivalent to the number of bytes. If the database character set uses a multi-byte character encoding scheme, however, there is no such correspondence. That is, the number of bytes no longer equals the number of characters because a character can consist of one or more bytes. This situation can cause problems.

During migration to a new character set, it is important to verify the column widths of existing CHAR and VARCHAR columns because they might need to be extended to support encoding that requires multi-byte storage. If the character set width differs during the import process, truncation of data can occur if conversion causes expansion of data. Figure 3-1 shows a typical case of data expansion with single-byte characters becoming multibyte. In it, ä (a with an umlaut) is a single-byte character in WE8MSWIN1252, but it becomes a double-byte character in UTF8. Also, the Euro symbol goes from one byte to three bytes in this conversion.

Figure 3-1 Single-byte Character Sets Becoming Multibyte

Text Description of nls81023.gif follows

Text description of the illustration nls81023.gif

The maximums for CHAR and VARCHAR2 data types are 2000 and 4000 bytes respectively. If the data columns in the new destination character set require more than 2000 and 4000 bytes, you will need to change your schema.

Restrictions

Here are some known restrictions caused by data truncation.

Within the database data dictionary, schema object names cannot exceed 30 bytes in length. Schema objects are tables, clusters, views, indexes, synonyms, tablespaces, and usernames. Renaming schema objects is required if they exceed 30 bytes in the new database character set. For example, one Thai character in the Thai national character set requires 1 byte, but, in UTF8, it requires 3 bytes. So, if you have defined a table with 11 Thai characters, then this table name must be shortened to 10 or fewer Thai characters when changing the database character set to UTF8.
If your existing Oracle usernames or passwords are created based on characters that will change in size on the target character set, these users will experience login difficulties due to authentication failures after the migration to a new character set. This is because the encrypted usernames and passwords stored in the data dictionary are not updated during migration to a new character set. For example, assuming the current database character set is WE8MSWIN1252 and the target database character set is UTF8, the username scött (o with an umlaut) will change from 5 bytes in WE8MSWIN1252 to 6 bytes. In UTF8, scött will no longer be able to log in because of the difference in the length of the username. Oracle recommends that usernames and passwords be based on ASCII characters. If they are not, you will need to reset the affected usernames and passwords after migrating to a new character set.
When CHAR data contains characters that will be expanded after migration to a new character set, space padding will not be removed during database export by default. This means that these rows will be rejected upon import into the database with the new character set. The workaround is to set the initialization parameter BLANK_TRIMMING to TRUE prior to the import.

See Also:
Oracle8i Reference for further information about BLANK_TRIMMING

Character Set Conversions

When migrating to a new database character set, the Export and Import utilities can handle character set conversions from the original database character set to the new database character set. However, character set conversions can sometimes cause data loss or data corruption. For example, if you are migrating from character set A to character set B, the destination character set B should be a superset of character set A. Characters that are not available in character set B will be converted to replacement characters, which are usually specified as '?' or '¿' or other linguistically-related characters. For example, ä (a with an umlaut) will be converted to 'a'. Replacement characters are defined by the target character set. Figure 3-2 shows a sample conversion where the copyright and Euro symbols are converted to '?' and ä to 'a'.

Figure 3-2 Replacement Characters in Character Set Conversion

Text Description of nls81020.gif follows

Text description of the illustration nls81020.gif

To reduce the risk of losing data, consider selecting a destination character set with similar character repertoires if possible. Migrating to Unicode can be an attractive option because UTF8 contains characters from most legacy character sets.

Another scenario that can cause the loss of data is migrating a database containing data of a different character set from that of the database character set. Users can insert data into the database from another character set if the client NLS_LANG character set setting is the same as the database character set. When these settings are the same, Oracle assumes that the data being sent or received is of the same character set, so no validations or conversions are performed.

This can lead to two possible data inconsistency problems. One is when a database contains data from another character set but the same codepoints exist in both character sets. For example, if the database character set is WE8ISO8859P1 and the end user Chinese Windows NT client's NLS_LANG setting is SIMPLIFIED CHINESE_CHINA.WE8ISO8859P1, then all multi-byte Chinese data (from the ZHS16GBK character set) is stored as multiples of single-byte WE8ISO8859P1 data. This means that Oracle will treat these characters as single-byte WE8ISO8859P1 characters, hence all SQL string manipulation functions such as SUBSTR or LENGTH will be based on bytes rather than characters. All bytes constituting ZHS16GBK data are legal WE8ISO8859P1 codes. If such a database is migrated to another character set, for example, UTF8, character codes will be converted as if they were in WE8ISO8859P1. This way, each of the two bytes of a ZHS16GBK character will be converted separately, yielding garbage values in UTF8. Figure 3-3 shows an example of this incorrect character set replacement.

Figure 3-3 Incorrect Character Set Replacement

Text Description of nls81021.gif follows

Text description of the illustration nls81021.gif

The second possibility is having data from mixed character sets inside the database. For example, if the data character set is WE8MSWIN1252, and two separate Windows clients using German and Greek are both using the NLS_LANG character set setting as WE8MSWIN1252, then the database will contain a mixture of German and Greek characters. Figure 3-4 shows how different clients can use different character sets in the same database.

Figure 3-4 Mixed Character Sets

Text Description of nls81022.gif follows

Text description of the illustration nls81022.gif

For database character set migration to be successful, both of these cases require manual intervention because Oracle cannot determine the character sets of the data being stored.

Database Character Set Migration

Database character set migration has two distinct stages:

Data Scanning

Before you actually migrate your character set, you need to identify areas of possible database character set conversions and truncation of data. This step is called data scanning.

Data scanning identifies the amount of effort required to migrate data into the new character encoding scheme prior to the change of the database character set. Some examples of what are found during a data scan are the number of schema objects where the column widths need to be expanded and the extent of the data that does not exist in the target repertoire. This information will assist in determining the best approach for the conversion of the database character set.

Conversion of Data

There are generally three approaches in migrating data from one database character set to another, provided the database does not contain any of the inconsistencies described in "Character Set Conversions". A description of methods to migrate databases with such inconsistencies is out of the scope of this document. For more information, contact Oracle Consulting Services for assistance.

In most cases, a full export or import is recommended to properly convert all data to a new character set. It is important to be aware of data truncation issues because character data type columns might need to be extended prior to import to handle the increase in size required. Existing PL/SQL code should be reviewed to ensure all byte-based SQL functions such as LENGTHB, SUBSTRB, and INSTRB, and PL/SQL CHAR and VARCHAR2 declarations are still valid. However, if, and only if, the new character set is a strict superset of the current character set, you can use the ALTER DATABASE CHARACTER SET statement to expedite migration to a new database character set. The target character set is a strict superset if, and only if, each and every codepoint in the source character set is available in the target character set with the same corresponding codepoint value. For instance, because US7ASCII is a strict subset of UTF8, then an ALTER DATABASE CHARACTER SET statement can be used to upgrade the database character set from US7ASCII to UTF8. Refer to "Subsets and Supersets" for a listing of all superset character sets in release 8.1.7.

The syntax is:

ALTER DATABASE [<db_name>] CHARACTER SET <new_character_set>;
ALTER DATABASE [<db_name>] NATIONAL CHARACTER SET <new_NCHAR_character_set>;

The database name is optional. The character set name should be specified without quotes. For example:

ALTER DATABASE CHARACTER SET UTF8;

To change the database character set, perform the following step:

SHUTDOWN IMMEDIATE;   -- or NORMAL
<do a full backup> 
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET IAQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET <new_character_set_name>;
SHUTDOWN IMMEDIATE;   -- or NORMAL
STARTUP;

To change the national character set, replace the ALTER DATABASE CHARACTER SET statement with the ALTER DATABASE NATIONAL CHARACTER SET statement. You can issue both statements together if desired.

See Also:
Oracle8i National Language Support Guide and Oracle8i SQL Reference for the syntax of the ALTER DATABASE [NATIONAL] CHARACTER SET statement

When using Oracle Parallel Server (OPS), make sure no other Oracle background processes are running, with the exception of the one session through which a user is connected before attempting to issue the ALTER DATABASE CHARACTER SET statement. Use the following SQL statement to verify your environment:

SELECT SID, SERIAL#, PROGRAM FROM V$SESSION;

Setting the initialization parameter PARALLEL_SERVER to FALSE allows the character set change to go through. This is required in an OPS environment, as an exclusive startup is not sufficient.

Note:
It is essential to do a full backup of the database before using the ALTER DATABASE [NATIONAL] CHARACTER SET statement because the command cannot be rolled back.

The last approach is to perform an ALTER DATABASE CHARACTER SET statement followed by selective imports. This method is best suited for when the distributions of convertible data are known and they are stored within a small number of tables. A full export and import will be too expensive in this scenario. For example, a 100GB database with over 300 tables but only 3 tables requires character set conversions and the rest of the data is of the same encoding as the destination character set. The 3 tables can be exported and imported back to the new database after issuing the ALTER DATABASE CHARACTER SET statement.

Incorrect data conversion can lead to the corruption of your data, so a full backup of your database must be performed before attempting to migrate your data to a new character set.

What is the Character Set Scanner Utility?

The Character Set Scanner provides an assessment of the feasibility and potential issues in migrating an Oracle database to a new database character set. The Scanner checks all character data in the database and tests for the effects and problems of changing the character set encoding. At the end of the scan, it generates a summary report of the database scan. This report provides estimates of the amount of work required to convert the database to a new character set.

Based on the information in the summary report, you will be able to decide on the most appropriate method to migrate the database's character set. The methods are:

Export and Import Utilities
ALTER DATABASE CHARACTER SET statement

ALTER DATABASE CHARACTER SET with selective Export and Import

Note:
If there are conversion exceptions reported by the Scanner, these problems must be fixed first before using any of the above methods to do the conversions. This may involve modifying the problem data to eliminate those exceptions. In extreme cases, both database and application might need to be modified. Oracle recommends you contact Oracle Consulting Services for services on database character set migration.

Conversion Tests on Character Data

The Scanner reads the character data and tests for the following conditions on each data cell:

Do character codes of the data cells change when converted to the new character set?
Can the data cells be successfully converted to the new character set?
Will the post-conversion data fit into the current column size?

The Scanner reads and tests for data in CHAR, VARCHAR2, LONG, CLOB, NCHAR, NVARCHAR2, and NCLOB columns only. The Scanner does not perform post-conversion column size testing for LONG, CLOB, and NCLOB columns.

Access Privileges

To use the Scanner, you must have DBA privileges on the Oracle database.

Restrictions

All the character-based data in CHAR, VARCHAR2, LONG, and CLOB columns is stored in the same character set, which is the database character set specified with the CREATE DATABASE statement when the database was first created. However, in some configurations, it is possible to store data in a different character set from the database character set either intentionally or unintentionally. This happens most often when the NLS_LANG character set is the same as the database character set, because in such cases Oracle sends and receives data as is, without any conversion or validation. But it can also happen if one of the two character sets is a superset of the other, in which case many of the codes appear as if they were not converted. For example, if NLS_LANG is set to WE8ISO8859P1 and the database character set is WE8MSWIN1252, all codes except the range 128-159 are preserved through the client/server conversion.

Although a database that contains data not in its database character set cannot be converted to another character set by the three methods described in "What is the Character Set Scanner Utility?", you can still use the Scanner in the way described below to test the effect of the conversion that would take place if the data was in the database character set.

Database Containing Data from Two or More Character Sets

If a database contains data from more than one character set, the Scanner cannot accurately test the effects of changing the database character set on the database because it cannot differentiate character sets properly. If the data can be divided into two separate tables, one for each language, then the Scanner can perform two single table scans to verify the validity of the data.

For each scan, a different value of the FROMCHAR parameter can be used to tell the Scanner to treat all CHAR, VARCHAR2, LONG, and CLOB columns in the table as if they were in the specified character set.

Database Containing Data not from the Database Character Set

If a database contains data not in the database character set, but still in only one character set, the Scanner can perform a full database scan. Use the FROMCHAR parameter to tell the Scanner what character set the data is in.

Scan Modes in the Scanner

The Character Set Scanner provides three modes of database scan:

Full Database Scan

The Scanner reads and verifies the character data of all tables belonging to all users in the database including the data dictionary (SYS user), and it reports on the effects of the simulated migration to the new database character set. It scans all schema objects including stored packages, procedures and functions, and object names.

To understand the feasibility of migration to a new database character set, you need to perform a full database scan.

User Tables Scan

The Scanner reads and verifies character data of all tables belonging to the specified user and reports on the effects of changing the character set on them.

The Scanner does not test for table definitions such as table names and column names. To see the effects on the schema definitions, you need to perform a full database scan.

Single Table Scan

The Scanner reads and verifies the character data of the specified table, and reports the effects on changing the character set of them.

The Scanner does not test for table definitions such as table name and column name. To see the effects on the schema definitions, you need to perform a full database scan.

Using The Scanner

This section describes how to use the Scanner, including the steps you need to perform before scanning and the procedures on how to invoke the Scanner. The topics discussed are:

Before Using the Scanner

To use the Scanner, you must run the script CSMINST.SQL on the database that you plan to scan. CSMINST.SQL needs to be run only once, so it is not necessary to run it each time you scan the database. The script performs the following tasks to prepare the database for scanning:

Creates a user named CSMIG
Assigns the necessary privileges to CSMIG
Assigns the default tablespace to CSMIG
Connects as CSMIG
Creates the Scanner system tables under CSMIG

The SYSTEM tablespace is assigned to CSMIG by default, so you need to ensure there is sufficient storage space available in the SYSTEM tablespace before scanning the database. The amount of space required depends on the type of scan and the nature of the data in the database. For information on storage considerations, refer to "Storage and Performance Considerations in the Scanner".

You can modify the default tablespace for CSMIG by editing the script CSMINST.SQL. Modify the following statement in CSMINST.SQL to assign your preferred tablespace to CSMIG as follows:

SQL> ALTER USER CSMIG default tablespace PREFERRED_TABLESPACE;

Then run CSMINST.SQL using this command string:

% cd $ORACLE_HOME/rdbms/admin 
% sqlplus
SQL> CONNECT system/manager 
SQL> START csminst.sql

Compatibility

The Scanner is certified with Oracle databases on any platforms running under the same release except you cannot mix ASCII- and EBCDIC-based platforms. For example, the release 8.1.7 versions of the Scanner on any ASCII-based client platforms are certified to run with any release 8.1.7 Oracle databases on any ASCII-based platforms, while EBCDIC-based clients are certified to run with any release 8.1.7 Oracle database on EBCDIC platforms.

Oracle recommends you run the Scanner tool in the same Oracle Home as the database when possible.

Invoking the Scanner

You can invoke the Scanner by one of three methods:

Using the parameter file

csscan system/manager PARFILE=filename

PARFILE is a file containing the Scanner parameters you typically use.

Using the command line

csscan system/manager full=y tochar=utf8 array=10240 process=3

Using an interactive session

csscan system/manager

In an interactive session, the Scanner prompts you for only the following parameters:

USERID
FULL
USER
TABLE
TOCHAR
ARRAY
PROCESS

If you want to specify parameters that are not listed above, you need to invoke the Scanner using either the parameter file or the command line.

Getting Online Help

The Scanner provides online help. Enter csscan help=y on the command line to invoke the help screen.

Character Set Scanner: Release 8.1.7.0.0 - Production 

(c) Copyright 2000 Oracle Corporation. All rights reserved.


You can let Scanner prompt you for parameters by entering the CSSCAN command 
followed by your username/password:
Example: CSSCAN SYSTEM/MANAGER

Or, you can control how Scanner runs by entering the CSSCAN command followed by 
various parameters. To specify parameters, you use keywords:
Example: CSSCAN SYSTEM/MANAGER FULL=y TOCHAR=utf8 ARRAY=102400 PROCESS=3

Keyword    Default Prompt Description
---------- ------- ------ -------------------------------------------------
USERID             yes    username/password
FULL       N       yes    scan entire database
USER               yes    user name of the table to scan
TABLE              yes    table name to scan
TOCHAR             yes    new database character set name
FROMCHAR                  current database character set name
TONCHAR                   new NCHAR character set name
FROMNCHAR                 current NCHAR character set name
ARRAY      10240   yes    size of array fetch buffer
PROCESS    1       yes    number of scan process
MAXBLOCKS                 split table if larger than MAXBLOCKS
CAPTURE     N             capture convertible data
SUPPRESS                  suppress error log by N per table
FEEDBACK                  feedback progress every N rows
BOUNDARIES                list of column size boundaries for summary report
LASTRPT    N              generate report of the last database scan
LOG        scan           base name of log files
PARFILE                   parameter file name
HELP       N              show help screen (this screen)
---------- ------- ------ -------------------------------------------------
Scanner terminated successfully.

The Parameter File

The parameter file allows you to specify Scanner parameters in a file where they can be easily modified or reused. Create a parameter file using any flat file text editor. The command line option PARFILE=filename tells the Scanner to read the parameters from a specified file rather than from the command line. For example:

csscan parfile=filename

csscan username/password parfile=filename

The syntax for parameter file specifications is one of the following:

KEYWORD=value
KEYWORD=(value1, value2, ...)

The following is an example of a parameter file:

USERID=system/manager
USER=SCOTT    # scan SCOTT's tables
TOCHAR=utf8
ARRAY=40960
PROCESS=2  # use two concurrent scan processes
FEEDBACK=1000

You can add comments to the parameter file by preceding them with the pound (#) sign. All characters to the right of the pound sign are ignored.

Scanner Parameters

This section describes each of the Scanner parameters.

ARRAY

Default value:
10240

Minimum value:
4096

Maximum value:
unlimited

Purpose:

Specifies the size in bytes of the array buffer used to fetch data. The size of the array buffer determines the number of rows fetched by the Scanner at any one time.

The formula below gives an approximation of number of rows fetched at a time:

(rows in array) = (ARRAY buffer size) / (sum of the CHAR and VARCHAR2 

column sizes of a given table)

If the summation of CHAR and VARCHAR2 column sizes exceeds the array buffer size, the Scanner fetches only one row at a time. Tables with LONG, CLOB, or NCLOB columns are fetched only one row at a time.

This parameter affects the duration of a database scan. In general, the larger the size of the array buffer, the shorter the duration time. Each scan process will allocate the specified size of array buffer.

BOUNDARIES

Default value:
none

Purpose:

Specifies the list of column boundary sizes that are used for an application data conversion summary report. This parameter is used to locate the distribution of the application data for the datatypes CHAR, VARCHAR2, NCHAR, and NVARCHAR2.

For example, if you specify a BOUNDARIES value of (10, 100, 1000), the application data conversion summary report will produce a breakdown of the CHAR data into the following groups by their column length, CHAR(1..10), CHAR(11..100) and CHAR(101..1000), likewise for the VARCHAR2, NCHAR, and NVARCHAR2 datatypes.

CAPTURE

Default value:
N

Range of values:
Y or N

Purpose:
Indicates whether to capture the information on the individual convertible rows as well as the default of storing the exception rows. The convertible rows information is written to the table CSM$ERRORS if the parameter CAPTURE is set to Y. This information can be used to deduce which records need to be converted to the target character set by selective export and import.

FEEDBACK

Default value:
none

Minimum value:
100

Maximum value:
100000

Purpose:
Specifies that the Scanner should display a progress meter in the form of a dot for every N number of rows scanned.

For example, if you specify FEEDBACK=1000, the Scanner displays a dot for every 1000 rows scanned. The FEEDBACK value applies to all tables being scanned, so it cannot be set on a per-table basis.

FROMCHAR

Default value:
none

Purpose:
Specifies the actual character set name for CHAR, VARCHAR2, LONG, and CLOB data types in the database. By default, the Scanner assumes the character set for the above data types to be the database character set.

Use this parameter to override the default database character set definition for CHAR, VARCHAR2, LONG, and CLOB data in the database.

FROMNCHAR

Default value:
none

Purpose:

Specifies the actual national database character set name for NCHAR, NVARCHAR2, and NCLOB data types in the database. By default, the Scanner assumes the character set for the above data types to be the database national character set.

Use this parameter to override the default database character set definition for NCHAR, NVARCHAR2, and NCLOB data in the database.

FULL

Default value:
N

Range of values:
Y or N

Purpose:

Indicates whether to perform the full database scan (that is, to scan the entire database including the data dictionary). Specify FULL=Y to scan in full database mode.

For more information on full database scans, refer to "Scan Modes in the Scanner".

HELP

Default value:
N

Range of values:
Y or N

Purpose:
Displays a help message with descriptions of the Scanner parameters.

For more information, see "Getting Online Help".

LASTRPT

Default value:
N

Range of values:
Y or N

Purpose:
Indicates whether to regenerate the Scanner reports based on statistics gathered from the last database scan.

If LASTRPT=Y is specified, the Scanner does not scan the database, but creates the report files using the information left by the previous database scan session instead.

If LASTRPT=Y is specified, only the USERID, BOUNDARIES, and LOG parameters take effect.

LOG

Default value:
scan

Purpose:

Specifies a base file name for the following Scanner report files:
Database Scan Summary Report file whose extension is .txt
Individual Exception Report file whose extension is .err
Screen log file whose extension is .out

By default, the Scanner generates the three text files, scan.txt, scan.err, and scan.out in the current directory.

MAXBLOCKS

Default value:
none

Minimum value:
1000

Maximum value:
unlimited

Purpose:

Specifies the maximum block size per table, so that large tables can be split into smaller chunks for the Scanner to process.

For example, if the MAXBLOCKS parameter is set to 1000, then any tables that are greater than 1000 blocks in size will be divided into n chunks, where
n=CEIL(table block size/1000).

Dividing large tables into smaller pieces will be beneficial only when the number of processes set with PROCESS is greater than 1. If the MAXBLOCKS parameter is not set, the Scanner attempts to split up large tables based on the its own optimization rules.

PARFILE

Default value:
none

Purpose:
Specifies a filename for a file that contains a list of Scanner parameters.

For more information on using a parameter file, see "The Parameter File".

PROCESS

Default value:
1

Minimum value:
1

Maximum value:
32

Purpose:

Specifies the number of concurrent scan processes to utilize for the database scan.

SUPPRESS

Default value:
unlimited

Minimum value:
0

Maximum value:
unlimited

Purpose:
Specifies the maximum number of data exceptions being logged per table.

The Scanner inserts individual exceptional record information into the CSM$ERRORS table when an exception is found in a data cell. The table grows depending on the number of exceptions reported.

This parameter is used to suppress the logging of individual exception information after a specified number of exceptions are inserted per table. For example, if SUPPRESS is set to 100, then the Scanner records a maximum of 100 exception records per table.

For more information, see "Storage Considerations".

TABLE

Default value:
none

Purpose:
Specifies the name of the table to scan.

When specified, Scanner scans the specified table only. For example, the command below scans an emp table that belongs to the user scott:

csscan system/manager USER=SCOTT TABLE=EMP ...

TOCHAR

Default value:
none

Purpose:
Specifies a target database character set name for the CHAR, VARCHAR2, LONG, and CLOB data.

TONCHAR

Default value:
none

Purpose:
Specifies a target database character set name for the NCHAR, NVARCHAR2, and NCLOB data.

If you do not specify a value for TONCHAR, the Scanner does not scan NCHAR, NVARCHAR2, and NCLOB data.

USER

Default value:
none

Purpose:
Specifies the owner of the tables to be scanned.

If the parameter USER is specified, the Scanner scans all tables belonging to the user. If TABLE is specified, the Scanner scans only the table specified by TABLE that belongs to the user. For example, the following statement scans all tables belonging to the user scott:

csscan system/manager USER=scott ...

USERID

Default value:
none

Purpose:

Specifies the username/password (and optional connect string) of the user who scans the database. If you omit the password, the Scanner prompts you for it.

The following examples are all valid:

username/password
username/password@connect_string
username
username@connect_string

Sample Scanner Sessions

The following examples show you how to use the command line and parameter file methods to use Full Database, User Tables, and Single Table scan modes.

Sample Session of Full Database Scan

The following example shows how to scan the full database to see the effects on migrating it to UTF8. This example assumes the current database character set is WE8ISO8859P1 (or anything other than UTF8).

Parameter File Method

% csscan system/manager parfile=param.txt

The param.txt file contains the following information:

full=y
tochar=utf8
array=40960
process=4

Command Line Method

% csscan system/manager full=y tochar=utf8 array=40960 process=4

Scanner Messages
Database Scanner: Release 8.1.7.0.0 - Production

(c) Copyright 2000 Oracle Corporation. All rights reserved.
Connected to:
Oracle8 Enterprise Edition Release 8.1.7.0.0 - Production
With the Objects option
PL/SQL Release 8.1.7.0.0 - Production

Enumerating tables to scan...

. process 1 scanning SYSTEM.REPCAT$_RESOLUTION
. process 1 scanning SYS.AQ$_MESSAGE_TYPES
. process 1 scanning SYS.ARGUMENT$
. process 2 scanning SYS.AUD$
. process 3 scanning SYS.ATTRIBUTE$
. process 4 scanning SYS.ATTRCOL$
. process 2 scanning SYS.AUDIT_ACTIONS
. process 2 scanning SYS.BOOTSTRAP$
. process 2 scanning SYS.CCOL$
. process 2 scanning SYS.CDEF$
 :
 :
. process 3 scanning SYSTEM.REPCAT$_REPOBJECT
. process 1 scanning SYSTEM.REPCAT$_REPPROP
. process 2 scanning SYSTEM.REPCAT$_REPSCHEMA
. process 3 scanning MDSYS.MD$DIM
. process 1 scanning MDSYS.MD$DICTVER
. process 2 scanning MDSYS.MD$EXC
. process 3 scanning MDSYS.MD$LER
. process 1 scanning MDSYS.MD$PTAB
. process 2 scanning MDSYS.MD$PTS
. process 3 scanning MDSYS.MD$TAB

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

Sample Session of User Tables Scan

The following example shows how to scan the user tables to see the effects on migrating them to UTF8. This example assumes the current database character set is US7ASCII, but the actual data stored is in Western European WE8MSWIN1252 encoding.

Parameter File Method

% csscan system/manager parfile=param.txt

The param.txt file contains the following information:

user=scott
fromchar=we8mswin1252
tochar=utf8
array=40960
process=1

Command Line Method

% csscan system/manager user=scott fromchar=we8mswin1252 tochar=utf8 array=40960 
process=1

Scanner Messages

Database Scanner: Release 8.1.7.0.0 - Production

(c) Copyright 2000 Oracle Corporation. All rights reserved.

Connected to:
Oracle8 Enterprise Edition Release 8.1.7.0.0 - Production
With the Objects option
PL/SQL Release 8.1.7.0.0 - Production

Enumerating tables to scan...

. process 1 scanning SCOTT.BONUS
. process 1 scanning SCOTT.DEPT
. process 1 scanning SCOTT.EMP

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

Sample Session of Single Table Scan

The following example shows how to scan a single table to see the effects on migrating it to WE8MSWIN1252. This example assumes the current database character set is in US7ASCII.

Parameter File Method

% csscan system/manager parfile=param.txt

The param.txt file contains the following information:

user=scott
table=emp
tochar=we8mswin1252
array=40960
process=1
supress=100

Command Line Method

% csscan system/manager user=scott table=emp tochar=we8mswin1252 array=40960 
process=1 supress=100

Scanner Messages
Database Scanner: Release 8.1.7.0.0 - Production

(c) Copyright 1999 Oracle Corporation. All rights reserved.

Connected to:
Oracle8 Enterprise Edition Release 8.1.7.0.0 - Production
With the Objects option
PL/SQL Release 8.1.7.0.0 - Production

. process 1 scanning SCOTT.EMP

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.

Scanner Reports

The Scanner generates two reports per scan:

Database Scan Summary Report

A Database Scan Summary Report consists of the following sections. The information available for each section depends on the type of scans and the parameters you select.

Database Scanner Parameters

This section describes the parameters selected and the type of scan you chose. The following is an example:

Parameter                                Value                         
---------------------------------------- ------------------------------
Scan type                                Full database                 
Scan CHAR data?                          YES                           
Current database character set           WE8ISO8859P1                  
New database character set               UTF8                          
Scan NCHAR data?                         NO                            
Array fetch buffer size                  102400                        
Number of processes                      4                               
---------------------------------------- ------------------------------

Database Size

This section describes the current database size. The following is an example:

TABLESPACE                          Total(MB)        Used(MB)        Free(MB)
----------------------------- --------------- --------------- ---------------
APPS_DATA                           1,340.000       1,331.070           8.926
CTX_DATA                               30.000           3.145          26.852
INDEX_DATA                            140.000         132.559           7.438
RBS_DATA                              310.000         300.434           9.563
SYSTEM_DATA                           150.000         144.969           5.027
TEMP_DATA                             160.000                         159.996
TOOLS_DATA                             35.000          22.148          12.848
USERS_DATA                            220.000         142.195          77.801
----------------------------- --------------- --------------- ---------------
Total                               2,385.000       2,073.742         311.227

Scan Summary

This indicates the feasibility of the database character set migration. There are two basic criteria that determine the feasibility of the character set migration of the database. One is the condition of the data dictionary and the other is the condition of the application data.

The Scan Summary section consists of two status lines. Depending on the scan mode and the result returned, the following statuses are printed:

For the data dictionary

All character-type data in the data dictionary remains the same in the new character set
All character-type data in the data dictionary is convertible to the new character set
Some character-type data in the data dictionary is not convertible to the new character set

For application data

All character-type application data remains the same in the new character set
All character-type application data is convertible to the new character set
Some character-type application data is not convertible to the new character set

When all data remains the same in the new character set, it means that the data encoding of the original character set is identical to the target character set. In this case, the character set can be migrated using the ALTER DATABASE CHARACTER SET statement.

If all the data is convertible to the new character set, it means that the data can be safely migrated using the Export and Import utilities. However, the migrated data may or may not have the same encoding as the original character set.

See Also:
"Individual Exception Report" for more information on non-convertible data

The following is sample output:

All character type data in the data dictionary remains the same in the new 
character set
All character type application data remains the same in the new character set

Data Dictionary Conversion Summary

This section contains the statistics on the conversion summary of the data dictionary. The granularity of this report is per datatype. The following statuses are available:

Table 3-1 Data Conversion Summary for Data Dictionary

Status	Description
`Changeless`	Number of data cells that remain the same in the new character set
`Convertible`	Number of data cells that will be successfully converted to the new character set
`Exceptional`	Number of data cells that cannot be converted. If you choose to convert anyway, some characters will be lost or data will be truncated

If the numbers in both the Convertible and Exceptional columns are zero, it means that all the data in the data dictionary will remain the same in the new character set.

If the numbers in the Exceptional column are zero and some numbers in the Convertible columns are non-zero, it means all data in the data dictionary is convertible to the new character set. During import, the relevant data will be converted.

If the numbers in the Exceptional column are non-zero, it means there is data in the data dictionary that is not convertible. Therefore, it is not feasible to migrate the current database to the new character because the export and import process cannot convert the data into the new character set. For example, you might have a table name with invalid characters or a PL/SQL procedure where a comment line includes data that can not be mapped to the new character set. These changes to schema objects must be corrected manually prior to migration to a new character set.

This information is available only when a full database scan is performed. The following is an example:

Datatype        Changeless      Convertible      Exceptional            Total
--------- ---------------- ---------------- ---------------- ----------------
VARCHAR2           971,300                1                0          971,301
CHAR                     7                0                0                7
LONG                60,325                0                0           60,325
CLOB                     
--------- ---------------- ---------------- ---------------- ----------------
Total            1,031,632                1                0        1,031,633

Application Data Conversion Summary

This section contains the statistics on conversion summary of the application data. The granularity of this report is per datatype. The following statuses are available:

Table 3-2 Data Conversion Summary for Application Data

Status	Description
`Changeless`	Number of data cells that remain the same in the new character set
`Convertible`	Number of data cells that will be successfully converted to the new character set
`Exceptional`	Number of data cells that cannot be converted. If you choose to convert anyway, some characters will be lost or data will be truncated

The following is sample output:


Datatype         Changeless      Convertible      Exceptional            Total
---------- ---------------- ---------------- ---------------- ----------------
VARCHAR2         23,213,745            1,324                0       23,215,069
CHAR                423,430                0                0          423,430
LONG                  8,624               33                0            8,657
CLOB                 58,839           11,114               28           69,981
---------- ---------------- ---------------- ---------------- ----------------
Total            23,704,638           12,471               28       23,717,137

Application Data Conversion Summary per Column Size Boundary

This section contains the conversion summary of the CHAR and VARCHAR2 application data. The granularity of this report is per column size boundaries specified by the BOUNDARIES parameter. The following status is available for each datatype and each boundary:

The granularity of this report is per datatype. The following statuses are available:

Table 3-3 Data Conversion Summary for Columns in Application Data

Status	Description
`Changeless`	Number of data cells that remain the same in the new character set
`Convertible`	Number of data cells that will be successfully converted to the new character set
`Exceptional`	Number of data cells that cannot be converted. If you choose to convert, some characters will be lost or data will be truncated

This information is available only when the BOUNDARIES parameter is specified.

The following is sample output:

Datatype                 Changeless   Convertible     Exceptional            Total
-------------------   ------------- ------------- --------------- ----------------
VARCHAR2(1..10)           1,474,825             0               0        1,474,825
VARCHAR2(11..100)         9,691,520            71               0        9,691,591
VARCHAR2(101..4000)      12,047,400         1,253               0       12,048,653
-------------------   ------------- ------------- --------------- ----------------
CHAR(1..10)                 423,413             0               0          423,413
CHAR(11..100)                    17             0               0               17
CHAR(101..4000)          
-------------------   ------------- ------------- --------------- ----------------
Total                    23,637,175         1,324               0       23,638,499

Distribution of Convertible Data per Table

This example show how Convertible and Exceptional data is distributed within the database. The granularity of this report is per table. If the list contains only a few rows, it means the Convertible data is localized. If the list contains many rows, it means the Convertible data is spread out in the database.

The following is sample output:

USER.TABLE                                            Convertible      Exceptional
------------------------------------------------ ---------------- ----------------
SMG.SOURCE                                                      1                0
SMG.HELP                                                       12                0
SMG.CLOSE_LIST                                                 16                0
SMG.ATTENDEES                                                   8                0
SGT.DR_010_I1T1                                                 7                0
SGT.DR_011_I1T1                                                 7                0
SGT.MRK_SRV_PROFILE                                             2                0
SGT.MRK_SRV_PROFILE_TEMP                                        2                0
SGT.MRK_SRV_QUESTION                                            3                0
------------------------------------------------ ---------------- ----------------

Distribution of Convertible Data per Column

This example shows how Convertible and Exceptional data is distributed within the database. The granularity of this report is per column. The following is an example:

USER.TABLE|COLUMN                                    Convertible      Exceptional
----------------------------------------------- ---------------- ----------------
SMG.SOURCE|SOURCE                                              1                0
SMG.HELP|INFO                                                 12                0
SMG.CLOSE_LIST|FNAME                                           1                0
SMG.CLOSE_LIST|LNAME                                           1                0
SMG.CLOSE_LIST|COMPANY                                         1                0
SMG.CLOSE_LIST|STREET                                          8                0
SMG.CLOSE_LIST|CITY                                            4                0
SMG.CLOSE_LIST|STATE                                           1                0
SMG.ATTENDEES|ATTENDEE_NAME                                    1                0
SMG.ATTENDEES|ADDRESS1                                         3                0
SMG.ATTENDEES|ADDRESS2                                         2                0
SMG.ATTENDEES|ADDRESS3                                         2                0
SGT.DR_010_I1T1|WORD_TEXT                                      7                0
SGT.DR_011_I1T1|WORD_TEXT                                      7                0
SGT.MRK_SRV_PROFILE|FNAME                                      1                0
SGT.MRK_SRV_PROFILE|LNAME                                      1                0
SGT.MRK_SRV_PROFILE_TEMP|FNAME                                 1                0
SGT.MRK_SRV_PROFILE_TEMP|LNAME                                 1                0
SGT.MRK_SRV_QUESTION|ANSWER                                    3                0
------------------------------------------------ ---------------- ----------------

Indexes To Be Rebuilt

This generates a list of all the indexes that are affected by the database character set migration. These can be rebuilt upon the import of the data. The following is an example:

USER.INDEX on USER.TABLE(COLUMN)
----------------------------------------------------------------------------
CD2000.COMPANY_IX_PID_BID_NNAME on CD2000.COMPANY(CO_NLS_NAME)
CD2000.I_MASHINE_MAINT_CONT on CD2000.MACHINE(MA_MAINT_CONT#)
CD2000.PERSON_NEWS_SABUN_CONT_CONT on
CD2000.PERSON_NEWS_SABUN_CONT(CONT_BID)
CD2000.PENEWSABUN3_PEID_CONT on CD2000.PE_NEWS_SABUN_3(CONT_BID)
PMS2000.CALLS_IX_STATUS_SUPPMGR on PMS2000.CALLS(SUPPMGR)
PMS2000.MAILQUEUE_CHK_SUB_TOM on PMS2000.MAIL_QUEUE(TO_MAIL)
PMS2000.MAILQUEUE_CHK_SUB_TOM on PMS2000.MAIL_QUEUE(SUBJECT)
PMS2000.TMP_IX_COMP on PMS2000.TMP_CHK_COMP(COMP_NAME)
----------------------------------------------------------------------------

Individual Exception Report

An Individual Exception Report consists of the following summaries:

Database Scan Parameters

This section describes the parameters and the type of scan chosen. The following is an example:

Parameter                                Value                         
---------------------------------------- ------------------------------
Scan type                                Full database                 
Scan CHAR data?                          YES                           
Current database character set           we8mswin1252                  
New database character set               utf8                          
Scan NCHAR data?                         NO                            
Array fetch buffer size                  102400                        
Number of rows to heap up for insert     10                            
Number of processes                      1                             
---------------------------------------- ------------------------------

Application Data Individual Exceptions

This report identifies the data that has exceptions so that this data can then be modified if necessary.

There are two types of exceptions:

Exceed Column Size

The column size should be extended if the maximum column width has been surpassed. If not, data truncation occurs.
Lossy Conversion

The data must be corrected before migrating to the new character set, or else the invalid characters will be converted to a replacement character. Replacement characters are usually specified as '?' or '¿' or a similar linguistically-related character.

The following is an example of an individual exception report that illustrates some possible problems when changing the database character set from WE8ISO8859P1 to UTF8:

User:   SCOTT
Table:  PRODUCT
Column: NAME
Type:   VARCHAR2(10)
Number of Exceptions: 2  
Max Post Conversion Data Size: 11

ROWID              Exception Type      Size Cell Data(first 30 bytes)     
------------------ ------------------ ----- ------------------------------
AAAA2fAAFAABJwQAAg exceed column size    11 Ährenfeldt
AAAA2fAAFAABJwQAAu lossy conversion         órâclë8^TM
AAAA2fAAFAABJwQAAu exceed column size    11 órâclë8^TM
------------------ ------------------ ----- ------------------------------

The values Ährenfeldt and órâclë8^TM exceed the column size (10 bytes) because each of the characters Ä, ó, â, and ë occupies one byte in WE8ISO8859P1 but two bytes in UTF8. The value órâclë8^TM has lossy conversion to UTF8 because the trademark sign ^TM (code 153) is not a valid WE8ISO8859P1 character. It is a WE8MSWIN1252 character, which is a superset of WE8ISO8859P1.

You can view the data that has an exception by issuing a SELECT statement:

SELECT name FROM scott.product 
WHERE ROWID='AAAA2fAAFAABJwQAAu';

You can modify the data that has the exception by issuing an UPDATE statement:

UPDATE scott.emp SET ename = 'Oracle8 TM'
WHERE ROWID='AAAA2fAAFAABJwQAAu';

Storage and Performance Considerations in the Scanner

Storage Considerations

This section describes the sizing and the growth of the Scanner's system tables, and explains the approach to maintain them. There are three system tables that can increase rapidly depending on the nature of the data stored in the database.

CSM$TABLES

The Scanner enumerates all tables that need to be scanned into the table CSM$TABLES. Therefore, CSM$TABLES contains as many rows as the number of tables in the database.

You might want to assign a large tablespace to the user CSMIG by amending the CSMINST.SQL script. By default, the SYSTEM tablespace is assigned to the user CSMIG.

You can look up the number of tables in the database by issuing the following SQL statement:

SELECT COUNT(*) FROM DBA_TABLES;

CSM$COLUMNS

The Scanner stores statistical information for each column scanned into the table CSM$COLUMNS. Therefore, CSM$COLUMNS contains as many rows as the number of character-type columns in the database.

You might want to assign a large tablespace to the user CSMIG by amending the CSMINST.SQL script. By default, the SYSTEM tablespace is assigned to CSMIG user.

You can look up the number of character type columns in the database by issuing the following SQL statement:

SELECT COUNT(*) FROM DBA_TAB_COLUMNS
WHERE DATA_TYPE IN ('CHAR', 'VARCHAR2', 'LONG', 'CLOB');

CSM$ERRORS

When exceptions are detected with cell data, the Scanner inserts individual exception information into the table CSM$ERRORS. This information then appears in the Individual Exception Report and facilitates identifying records to be modified if necessary.

If your database contains a lot of data that is signaled as Exceptional or Convertible (when the parameter CAPTURE=Y is set), the table CSM$ERRORS can grow too large. You can prevent the CSM$ERRORS table from growing unnecessarily large by using the SUPPRESS parameter.

The SUPPRESS parameter applies to each table. The Scanner suppresses inserting individual Exceptional information after the specified number of exceptions is inserted. Limiting the number of exceptions to be recorded may not be useful if the exceptions are spread over different tables.

You might want to assign a large tablespace to the user CSMIG by amending the CSMINST.SQL script.

Performance Consideration

This section describes ways to increase performance when scanning the database.

Utilizing Multiple Scan Processes

If you plan to scan a relatively large database, for example, over 50GB, you might want to consider using multiple scan processes. This shortens the duration time of database scans by utilizing hardware resources such as CPU and memory available on the machine.

Array Fetch Buffer Size

The Scanner fetches multiple rows at a time when an array fetch is allowed. Generally, you will improve performance by letting the Scanner use a bigger array fetch buffer.

Suppressing Exception and Convertible Log

The Scanner inserts individual Exceptional and Convertible (when CAPTURE=Y) information into the table CSM$ERRORS. In general, insertion into the CSM$ERRORS table is more costly than data fetching. If your database has a lot of data that is signaled as Exceptional or Convertible, the Scanner issues many insert statements, causing performance degradation. Oracle recommends setting a limit on the number of exception rows to be recorded using the SUPRESS parameter.

Reference Material

This section contains the following reference material:

Scanner Tables

The Scanner uses the following tables.

CSM$PARAMETERS

This table contains Scanner parameters specified by the user.

Column Datatype NULL Description

NAME

VARCHAR2(30)

NOT NULL

Parameter name

VALUE

VARCHAR2(30)

NOT NULL

Parameter value

CSM$TABLES

This table contains information about database tables to be scanned. The Scanner enumerates all tables to be scanned into this table.

Column Datatype NULL Description

USR#

NUMBER

NOT NULL

Userid of the table owner

OBJ#

NUMBER

NOT NULL

Object id of the table

MINROWID

ROWID

Minimum rowid of the split range of the table

MAXROWID

ROWID

Maximum rowid of the split range of the table

PROPERTY

NUMBER

Table property

WHO

NUMBER

Internal id of the scan process that scanned the table

SCNSTART

DATE

Time table scan started

SCNEND

DATE

Time table scan completed

SCNCOLS

NUMBER

Number of columns (to be) scanned

SCNROWS

NUMBER

Number of rows scanned

CSM$COLUMNS

This table contains statistical information of columns that were scanned.

CSM$ERRORS

This table contains individual exception information of cell data and object definitions.

Column Datatype NULL Description

ERR#

NUMBER

NOT NULL

Scanner internal exception code

USR#

NUMBER

NOT NULL

Userid of the object owner or data owner

OBJ#

NUMBER

NOT NULL

Object id

COL#

NUMBER

Column id

INTCOL#

NUMBER

Internal column id (for ADT)

TYP#

NUMBER

Column datatype or object type

FRM#

NUMBER

Character set form

CNVSIZE

DATE

Post-conversion data size

ID$

DATE

The rowid of the data or name that identifies the object

Scanner Messages

The Scanner has the following error messages:

CSC-00100 failed to allocate memory size of number: Cause: An attempt was made to allocate memory with size 0 or bigger than the maximum size.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00101 failed to release memory: Cause: An attempt was made to release memory with invalid pointer.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00102 failed to release memory, null pointer given: Cause: An attempt was made to release memory with null pointer.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00105 failed to parse BOUNDARIES parameter: Cause: BOUNDARIES parameter was specified in an invalid format.; Action: Refer to the manual for the correct syntax.
CSC-00106 failed to parse SPLIT parameter: Cause: SPLIT parameter was specified in an invalid format.; Action: Refer to the manual for the correct syntax.
CSC-00107 CSM$* tables not found: Cause: CSM$VERSION table not found in the database.; Action: Run CSMINST.SQL on the database.
CSC-00108 incompatible CSM$* tables: Cause: Incompatible CSM$* tables found in the database.; Action: Run CSMINST.SQL on the database.
CSC-00110 failed to parse userid: Cause: USERID parameter was specified in an invalid format.; Action: Refer to the manual for the correct syntax.
CSC-00111 failed to get RDBMS version: Cause: Failed to retrieve the value of the Version of the database.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00112 database version not supported: Cause: The database version is older than release 8.0.5.0.0.; Action: Upgrade the database to release 8.0.5.0.0 or later, then try again.
CSC-00113 user %s is not allowed to access data dictionary: Cause: The specified user cannot access the data dictionary.; Action: Set O7_DICTIONARY_ACCESSIBILITY parameter to TRUE, or use SYS user.
CSC-00114 failed to get database character set name: Cause: Failed to retrieve value of NLS_CHARACTERSET or NLS_NCHAR_CHARACTERSET parameter from NLS_DATABASE_PARAMETERS view.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00115 invalid character set name %s: Cause: The specified character set is not a valid Oracle character set.; Action: Refer to the National Language Support Guide for the correct character set name.
CSC-00116 failed to reset NLS_LANG/NLS_NCHAR parameter: Cause: Failed to force NLS_LANG character set to be same as database character set.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00117 failed to clear previous scan log: Cause: Failed to delete all rows from CSM$* tables.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00118 failed to save command parameters: Cause: Failed to insert rows into CSM$PARAMETERS table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00119 failed to save scan start time: Cause: Failed to insert a row into CSM$PARAMETERS table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00120 failed to enumerate tables to scan: Cause: Failed to enumerate tables to scan into CSM$TABLES table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00121 failed to save scan complete time: Cause: Failed to insert a row into CSM$PARAMETERS table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00122 failed to create scan report: Cause: Failed to create database scan report.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00123 failed to check if user %s exist: Cause: Select statement that checks if the specified user exists in the database failed.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00124 user %s not found: Cause: The specified user does not exist in the database.; Action: Check the user name.
CSC-00125 failed to check if table %s.%s exist: Cause: Select statement that checks if the specified table exists in the database failed.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00126 table %s.%s not found: Cause: The specified table does not exist in the database.; Action: Check the user name and table name.
CSC-00127 user %s does not have DBA privilege: Cause: The specified user does not have DBA privileges, which are required to scan the database.; Action: Choose a user with DBA privileges.
CSC-00128 failed to get server version string: Cause: Failed to retrieve the version string of the database.; Action: None.
CSC-00130 failed to initialize semaphore: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00131 failed to spawn scan process %d: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00132 failed to destroy semaphore: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00133 failed to wait semaphore: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00134 failed to post semaphore: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00140 failed to scan table (tid=%d, oid=%d): Cause: Data scan on this particular table failed.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00141 failed to save table scan start time: Cause: Failed to update a row in the CSM$TABLES table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00142 failed to get table information: Cause: Failed to retrieve various information from user id and object id of the table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00143 failed to get column attributes: Cause: Failed to retrieve column attributes of the table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00144 failed to scan table %s.%s: Cause: Data scan on this particular table was not successful.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00145 failed to save scan result for columns: Cause: Failed to insert rows into CSM$COLUMNS table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00146 failed to save scan result for table: Cause: Failed to update a row of CSM$TABLES table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00147 unexpected data truncation: Cause: Scanner allocates the exactly same size of memory as the column byte size for fetch buffer, resulting in unexpected data truncation.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00150 failed to retrieve table information: Cause: Failed to retrieve the specified table information.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00151 failed to enumerate user tables: Cause: Failed to enumerate all tables that belong to the specified user.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00152 failed to enumerate all tables: Cause: Failed to enumerate all tables in the database.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00153 failed to enumerate character type columns: Cause: Failed to enumerate all CHAR, VARCHAR2, LONG, and CLOB columns of tables to scan.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00154 failed to create list of tables to scan: Cause: Failed to enumerate the tables into CSM$TABLES table.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00155 failed to split tables for scan: Cause: Failed to split the specified tables.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00156 failed to get total number of tables to scan: Cause: Select statement that retrieves the number of tables to scan failed.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00157 failed to retrieve list of tables to scan: Cause: Failed to read all table ids into the scanner memory.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00158 failed to retrieve index defined on column: Cause: Select statement that retrieves index defined on the column fails.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00160 failed to open summary report file: Cause: File open function returned error.; Action: Check if you have create/write privilege on the disk and check if the file name specified for the LOG parameter is valid.
CSC-00161 failed to report scan elapsed time: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00162 failed to report database size information: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00163 failed to report scan parameters: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00164 failed to report Scan summary: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00165 failed to report conversion summary: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00166 failed to report convertible data distribution: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00167 failed to open exception report file: Cause: File open function returned error.; Action: Check if you have create/write privilege on the disk and check if the file name specified for LOG parameter is valid.
CSC-00168 failed to report individual exceptions: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00170 failed to retrieve used size of tablespace %s: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00171 failed to retrieve free size of tablespace %s: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00172 failed to retrieve total size of tablespace %s: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00173 failed to retrieve used size of the database: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00174 failed to retrieve free size of the database: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.
CSC-00175 failed to retrieve total size of the database: Cause: Unknown.; Action: This is an internal error. Contact Oracle Customer Support.

Subsets and Supersets

Table 3-4 lists common subset/superset relationships.

Table 3-4 Subset-Superset Pairs

Subset	Superset
AR8ADOS710	AR8ADOS710T
AR8ADOS720	AR8ADOS720T
AR8ADOS720T	AR8ADOS720
AR8APTEC715	AR8APTEC715T
AR8ARABICMACT	AR8ARABICMAC
AR8ISO8859P6	AR8ASMO708PLUS
AR8ISO8859P6	AR8ASMO8X
AR8MUSSAD768	AR8MUSSAD768T
AR8MUSSAD768T	AR8MUSSAD768
AR8NAFITHA711	AR8NAFITHA711T
AR8NAFITHA721	AR8NAFITHA721T
AR8SAKHR707	AR8SAKHR707T
AR8SAKHR707T	AR8SAKHR707
D7DEC	D7SIEMENS9780X
D7SIEMENS9780X	D7DEC
DK7SIEMENS9780X	N7SIEMENS9780X
I7DEC	I7SIEMENS9780X
I7SIEMENS9780X	IW8EBCDIC424
IW8EBCDIC424	IW8EBCDIC1086
KO16KSC5601	KO16MSWIN949
LV8PC8LR	LV8RST104090
N7SIEMENS9780X	DK7SIEMENS9780X
US7ASCII	See Table 3-5 for a complete list
WE16DECTST	WE16DECTST2
WE16DECTST2	WE16DECTST
WE8DEC	TR8DEC
WE8DEC	WE8NCR4970
WE8ISO8859P1	WE8MSWIN1252
WE8NCR4970	TR8DEC
WE8NCR4970	WE8DEC
WE8PC850	WE8PC858

US7ASCII is a special case because so many other character sets are supersets of it. Table 3-5 lists supersets for US7ASCII.

Table 3-5 US7ASCII Supersets

Supersets	Supersets	Supersets
AL24UTFFSS	EE8PC852	RU8BESTA
AR8ADOS710	EL8DEC	RU8PC855
AR8ADOS710T	EL8ISO8859P7	RU8PC866
AR8ADOS720	EL8MACGREEKS	SE8ISO8859P3
AR8ADOS720T	EL8MSWIN1253	TH8MACTHAIS
AR8APTEC715	EL8PC437S	TH8TISASCII
AR8APTEC715T	EL8PC851	TR8DEC
AR8ARABICMACS	EL8PC869	TR8MACTURKISHS
AR8ASMO708PLUS	ET8MSWIN923	TR8MSWIN1254
AR8ASMO8X	HU8ABMOD	TR8PC857
AR8HPARABIC8T	HU8CWI2	US8PC437
AR8ISO8859P6	IN8ISCII	UTF8
AR8MSAWIN	IS8PC861	VN8MSWIN1258
AR8MUSSAD768	IW8ISO8859P8	VN8VN3
AR8MUSSAD768T	IW8MACHEBREWS	WE8DEC
AR8NAFITHA711	IW8MSWIN1255	WE8DG
AR8NAFITHA711T	IW8PC1507	WE8ISO8859P1
AR8NAFITHA721	JA16EUC	WE8ISO8859P15
AR8NAFITHA721T	JA16SJIS	WE8ISO8859P9
AR8SAKHR706	JA16TSTSET	WE8MACROMAN8S
AR8SAKHR707	JA16TSTSET2	WE8MSWIN1252
AR8SAKHR707T	JA16VMS	WE8NCR4970
BG8MSWIN	KO16KSC5601	WE8NEXTSTEP
BG8PC437S	KO16KSCCS	WE8PC850
BLT8CP921	KO16MSWIN949	WE8PC858
BLT8MSWIN1257	KO16TSTSET	WE8PC860
BLT8PC775	LA8ISO6937	WE8ROMAN8
BN8BSCII	LA8PASSPORT	ZHS16CGB231280
CDN8PC863	LT8MSWIN921	ZHS16GBK
CL8ISO8859P5	LT8PC772	ZHT16BIG5
CL8KOI8R	LT8PC774	ZHT16CCDC
CL8MACCYRILLICS	LV8PC1117	ZHT16DBT
CL8MSWIN1251	LV8PC8LR	ZHT16HKSCS
EE8ISO8859P2	LV8RST104090	ZHT16MSWIN950
EE8MACCES	N8PC865	ZHT32EUC
EE8MACCROATIANS	NE8ISO8859P10	ZHT32SOPS
EE8MSWIN1250	NEE8ISO8859P4	ZHT32TRIS

Default value:	10240
Minimum value:	4096
Maximum value:	unlimited
Purpose:	Specifies the size in bytes of the array buffer used to fetch data. The size of the array buffer determines the number of rows fetched by the Scanner at any one time.

Default value:	none
Purpose:	Specifies the list of column boundary sizes that are used for an application data conversion summary report. This parameter is used to locate the distribution of the application data for the datatypes `CHAR`, `VARCHAR2`, `NCHAR,` and `NVARCHAR2`.

Default value:	N
Range of values:	Y or N
Purpose:	Indicates whether to capture the information on the individual convertible rows as well as the default of storing the exception rows. The convertible rows information is written to the table `CSM$ERRORS` if the parameter `CAPTURE` is set to Y. This information can be used to deduce which records need to be converted to the target character set by selective export and import.

Column	Datatype	NULL	Description
`NAME`	`VARCHAR2(30)`	`NOT NULL`	Parameter name
`VALUE`	`VARCHAR2(30)`	`NOT NULL`	Parameter value

Column	Datatype	NULL	Description
`USR#`	`NUMBER`	`NOT NULL`	Userid of the table owner
`OBJ#`	`NUMBER`	`NOT NULL`	Object id of the table
`MINROWID`	`ROWID`		Minimum rowid of the split range of the table
`MAXROWID`	`ROWID`		Maximum rowid of the split range of the table
`PROPERTY`	`NUMBER`		Table property
`WHO`	`NUMBER`		Internal id of the scan process that scanned the table
`SCNSTART`	`DATE`		Time table scan started
`SCNEND`	`DATE`		Time table scan completed
`SCNCOLS`	`NUMBER`		Number of columns (to be) scanned
`SCNROWS`	`NUMBER`		Number of rows scanned

Column	Datatype	NULL	Description
`ERR#`	`NUMBER`	`NOT NULL`	Scanner internal exception code
`USR#`	`NUMBER`	`NOT NULL`	Userid of the object owner or data owner
`OBJ#`	`NUMBER`	`NOT NULL`	Object id
`COL#`	`NUMBER`		Column id
`INTCOL#`	`NUMBER`		Internal column id (for ADT)
`TYP#`	`NUMBER`		Column datatype or object type
`FRM#`	`NUMBER`		Character set form
`CNVSIZE`	`DATE`		Post-conversion data size
`ID$`	`DATE`		The rowid of the data or name that identifies the object

3 New Character Set Scanner Utility

Overview of Choosing and Migrating Character Sets

Data Truncation

Figure 3-1 Single-byte Character Sets Becoming Multibyte

Restrictions

Character Set Conversions

Figure 3-2 Replacement Characters in Character Set Conversion

Figure 3-3 Incorrect Character Set Replacement

Figure 3-4 Mixed Character Sets

Database Character Set Migration

Data Scanning

Conversion of Data

What is the Character Set Scanner Utility?

Conversion Tests on Character Data

Access Privileges

Restrictions

Database Containing Data from Two or More Character Sets

Database Containing Data not from the Database Character Set

Scan Modes in the Scanner

Full Database Scan

User Tables Scan

Single Table Scan

Using The Scanner

Before Using the Scanner

Compatibility

Invoking the Scanner

Getting Online Help

The Parameter File

Scanner Parameters

ARRAY

BOUNDARIES

CAPTURE

FEEDBACK

FROMCHAR

FROMNCHAR

FULL

HELP

LASTRPT

LOG

MAXBLOCKS

PARFILE

PROCESS

SUPPRESS

TABLE

TOCHAR

TONCHAR

USER

USERID

Sample Scanner Sessions

Sample Session of Full Database Scan

Parameter File Method

Command Line Method

Sample Session of User Tables Scan

Parameter File Method

Command Line Method

Scanner Messages

Sample Session of Single Table Scan

Parameter File Method

Command Line Method

Scanner Reports

Database Scan Summary Report

Database Scanner Parameters

Database Size

Scan Summary

Data Dictionary Conversion Summary

Table 3-1 Data Conversion Summary for Data Dictionary

Application Data Conversion Summary

Table 3-2 Data Conversion Summary for Application Data

Application Data Conversion Summary per Column Size Boundary

Table 3-3 Data Conversion Summary for Columns in Application Data

Distribution of Convertible Data per Table

Distribution of Convertible Data per Column

Indexes To Be Rebuilt

Individual Exception Report

Database Scan Parameters

Application Data Individual Exceptions

Storage and Performance Considerations in the Scanner

Storage Considerations

CSM$TABLES

CSM$COLUMNS

3
New Character Set Scanner Utility