Changing the Database Character Set ( NLS_CHARACTERSET ) [ID 225912.1] | |||||
| |||||
Modified 13-MAY-2011 Type BULLETIN Status PUBLISHED |
In this Document
Purpose
Scope and Application
Changing the Database Character Set ( NLS_CHARACTERSET )
A) The database character set (NLS_CHARACTERSET)
B) Choosing a new database character set
C) Changing the database character set.
C1) Using the "ALTER DATABASE CHARACTER SET" command in 8i or 9i and CSALTER in 10g and up.
C2) Using Export/Import (or Datapump in 10g and up).
C3) Using a combination of ALTER DATABASE CHARACTER SET (8i, 9i) / CSALTER (10g and up) and export/import
D) Further reading
References
Applies to:
Oracle Server - Enterprise Edition - Version: 8.1.5.0 and later [Release: 8.1.5 and later ]
Information in this document applies to any platform.
Purpose
This article gives an overview of methods to change the database character set or to check before exporting/importing between databases with a different NLS_CHARACTERSET..
The current NLS_CHARACTERSET is seen in NLS_DATABASE_PARAMETERS.
select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';
Use this note to get a basic understanding of the methods, then use the in-depth notes at the end of this note for specific conversions.
For all questions regarding the NLS_NCHAR_CHARACTERSET please see: Note 276914.1 The National Character Set in Oracle 9i and 10g
To change the NLS_CHARACTERSET for an Oracle Applications database: please see Note 124721.1 Migrating an Applications Installation to a New Character Set.
This is the only way supported by Oracle applications. If you have any doubt log an Oracle Applications SR for assistance.
Scope and Application
Anyone trying to change the NLS_CHARACTERSET.
Changing the Database Character Set ( NLS_CHARACTERSET )
A) The database character set (NLS_CHARACTERSET)
The NLS_CHARACTERSET of an Oracle database defines what characters can be stored in the database using the CHAR, VARCHAR2, LONG and CLOB datatypes. A Characterset does not define languages, it defines a certain range of characters. Any language that uses only the characters known by that characterset can then be stored.
If you change character sets there is a possibility that characters that you currently use are not defined in the new character set or that the current setup is not correctly used and therefore you could lose data when changing the NLS_CHARACTERSET.
If binary data is stored/handled as a CHAR, VARCHAR2, LONG or CLOB datatype then data loss is expected, especially when using an AL32UTF8 database (even without using exp/imp). Or errors like ORA-29275 or ORA-600 [kole_t2u], [34] may appear.
See Note 1297507.1 Problems with (Importing) Encrypted Data After Character Set Change Using Other NLS_CHARACTERSET Database or Upgrading the (client) Oracle Version
You should always check this by using the Character Set Scanner (Csscan) before making any changes to your character set.
Even when using Exp/imp or EXPDP/IMPDP.
Note 458122.1 Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
Note 745809.1 Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)
Note 444701.1 Csscan output explained
B) Choosing a new database character set
For the majority of customers an Unicode character set (AL32UTF8) is the best choice See Note 333489.1 Choosing a database character set means choosing Unicode.
For non-Unicode charactersets the best choice are xx8MSWIN125x charactersets, even if the database itself runs on an Unix platform. The reason is simply that the majority of the clients are windows based systems, hence the best non-Unicode characterset for a database is a characterset that can store all the characters known by those clients, which means an xx8MSWIN125x characterset.
Detailed discussion is found in Note 264294.1 Choosing from WE8ISO8859P1, WE8ISO8859P15 or WE8MSWIN1252 as db character set.
From 11g onwards the DBCA offers by default a list of "recommended" charactersets to use as NLS_CHARACTERSET which , for the same reason as above, does not included charactersets like WE8ISO8859P1 , note that ALL charactersets are still SUPPORTED and "non recommended" charactersets can be used if needed.
See Note 306411.1 Character Set Consolidation for Oracle Database 11g for more information.
If you want to know what Languages can be stored in most common charactersets then please see Note 62421.1 Which Character Set Supports Which Language
To know what characters are known in a certain characterset then please see
Note 282336.1 Charts of most current mono-byte Character sets
Or use Locale builder to open the Oracle characterset definition
Note 223706.1 Using Locale Builder to view the definition of character sets
An excellent external resource is http://www.eki.ee/letter/ . The website allows you to choose a language and then gives a overview of all charactersets that contain all the letters needed for this language.
Please note that Oracle does not warrant that the information on this website is accurate.
C) Changing the database character set.
Please see the following note for an Oracle Applications database: Note 124721.1 Migrating an Applications Installation to a New Character Set.
This is the only way supported by Oracle applications. If you have any doubt log an Oracle Applications SR for assistance.
There are 2 basic ways of changing the character set and a third 'combined' way:
C1) Using the "ALTER DATABASE CHARACTER SET" command in 8i or 9i and CSALTER in 10g and up.
This is not always possible because seen ALTER DATABASE CHARACTER SET /CSALTER does not (!) change the actual code points of the stored data.
So this method can only be used if the data that is currently stored in the database is a Binary Sub set of the new character set (=all codes (!) of the old characterset are valid and mean the same character in the new characterset, the new characterset is then a strict superset of the old) .
For 8i / 9i this is documented in Note 66320.1 8i/9i only: Changing the Database Character Set or the Database National Character Set in 8i/9i
This can be used for these combinations: Note 119164.1 Changing Database Character Set - Valid Superset Definitions
In 10g and up the "ALTER DATABASE CHARACTER SET" command is NOT to be used anymore but Csscan/Csalter is the new way to change a database characterset.
In 10g you need first to run csscan and then check the csscan results if you can run Csalter. Csalter is not depending on the Superset definitions, it is depending on the csscan output.
More information about Csscan and Csalter is in
Note 745809.1 Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)
Note 444701.1 Csscan output explained
C2) Using Export/Import (or Datapump in 10g and up).
You simply export the current database, then create a new database with the new character set and import the data into that database.
Of course the characters that you are storing will still have to be defined in the new character set AND need to be properly define din the source database!
This is OFTEN not the case , this can be detected by using csscan upfront on the source database, so when using exp/imp we advice to use always Csscan upfront to detect any possible problems.
If you are exporting/importing certain users or table(s) between existing databases and one database is an UTF8 or AL32UTF8 database then please see:
Note 1297961.1 ORA-01401 / ORA-12899 While Importing Data In An AL32UTF8 / UTF8 (Unicode) Or Other Multibyte NLS_CHARACTERSET Database.
Exp/imp can be used for any change where all characters from the old characterset are know in the new (but they may use different codes for the same character).
from a 8 bit characterset to UTF8 / AL32UTF8 or an other multibyte characterset on ALL 10g versions (including 10.1.0.5 and 10.2.0.3) and 11.1.0.6. Impdp may provoke data corruption unless you applied Patch 5874989.
The "old" exp/imp work fine. This problem is fixed in the 10.2.0.4 and 11.1.0.7 patchset.
All existing patches for this bug are found here: https://updates.oracle.com/download/5874989.html
For windows the fix is included in
10.1.0.5.0 Patch 20 (10.1.0.5.20P) or later ( Note 276548.1 )
10.2.0.3.0 Patch 11 (10.2.0.3.11P) or later ( Note 342443.1 )
The patch is technically only needed on the impdp side, but if you use expdp/impdp between different character sets we suggest to patch all your systems.
C3) Using a combination of ALTER DATABASE CHARACTER SET (8i, 9i) / CSALTER (10g and up) and export/import
In some cases method 1 does not work seen Csscan tells you that some data needs to be converted to the new character set (= "Convertible" data), and method 2 will simply take too much time. In those cases it is usually possible to use a combination of the 2 methods:
a) Export only the "Convertible" data from the tables that are listed by Csscan (= characters where the CODE changes from between the current and the new characterset)
b) Truncate or drop those tables.
c) Run csscan again to confirm that all data is now ready to be moved to the new character set directly and if that is the case change the character set of the database using the ALTER DATABASE CHARACTERSET (8i,9i) / CSALTER (10g and up) command (method C1).
d) Now that the character set has changed we can simply import the data exported in step (a). The import will convert that data so that it gets stored using the correct character codes for this character set.
These specific notes can guide you through some often used conversions, they show how to use the the above mentioned "combined method" in practice and document extra checks:
Changing from US7ASCII to WE8MSWIN1252 or other xxIOS8859Pxx to xx8MSWIN12xx charactersets
Note 555823.1 Changing from US7ASCII or WE8ISO8859P1 to WE8MSWIN1252
Note 1213683.1 Changing from US7ASCII or WE8ISO8859P1 or IW8ISO8859P8 to IW8MSWIN1255
Note 260022.1 Changing From US7ASCII or WE8ISO8859P1 or AR8ISO8859P6 to AR8MSWIN1256
Note 261871.1 Changing EL8ISO8859P7 to EL8MSWIN1253
Note 263119.1 Changing EE8ISO8859P2 to EE8MSWIN1250
Note 266309.1 Changing WE8ISO8859P9 to WE8ISO8859P1/WE8MSWIN1252
Note 246008.1 Changing WE8ISO8859P15 to WE8MSWIN1252
Other combinations:
Note 1104893.1 Changing WE8DEC to WE8ISO8859P1, WE8ISO8859P15 or WE8MSWIN1252
Note 257722.1 Changing WE8ISO8859P1 to WE8ISO8859P15
Note 261639.1 Changing WE8MSWIN1252 to WE8ISO8859P15
Note 273281.1 Changing WE8ISO8859P15 TO WE8ISO8859P1
Changing the NLS_CHARACTERSET to Unicode (= AL32UTF8 or UTF8)
Note 788156.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications
Note 260192.1 Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode)
We strongly suggest however to use AL32UTF8 as NLS_CHARACTERSET, there is no added value in using one of the other varying width charactersets as NLS_CHARACTERSET. Basically AL32UTF8 is "the way forward", AL32UTF8 supports ALL characters defined in any of the other charactersets .
Changing the NLS_CHARACTERSET from AL32UTF8 to UTF8 (or from UTF8 to AL32UTF8) is also simply following Note 260192.1 Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode) .
This is only needed for Oracle RDBMS Version 7 systems: Note 234381.1 Changing NLS_CHARACTERSET from AL24UTFFSS to UTF8 - AL32UTF8
If you have an AL32UTF8 or UTF8 database but it turns out your application is not ready for AL32UTF8 or UTF8 then you can revert this to an other NLS_CHARACTERSET by following.
note 1283764.1 Changing the NLS_CHARACTERSET From AL32UTF8 / UTF8 (Unicode) to another NLS_CHARACTERSET
D) Further reading
There are some additional considerations when you change the character set of an Oracle Applications database, please see the following note for a complete overview of those: Note 124721.1 Migrating an Applications Installation to a New Character Set
Note that display problems are most likely *NOT* resolved by start changing the database characterset.
Instead please check first of all if you can store/retrieve the data using SQLdeveloper, this is a "know good client" that needs no NLS configuration.
You can download it from http://www.oracle.com/technology/products/database/sql_developer/
If the data is displayed correctly in SQLdeveloper then you are sure it's correct in the database and that the current NLS_CHARACTERSET supports the character.
If this is the case then see the following notes to correctly configure your other client(s), you can then use the data entered trough SqlDeveloper as "reference":
Note 158577.1 NLS_LANG Explained (How does Client-Server Character Conversion Work?)
Note 179133.1 The correct NLS_LANG in a Windows Environment
Note 264157.1 The correct NLS_LANG on Unix Environments
Note 229786.1 NLS_LANG and webservers explained.
A more detailed debugging guide is Note 788931.1 Troubleshooting RDBMS (client and server) NLS Problems (Charactersets, sorts, dates, ..)
References
NOTE:124721.1 - Migrating an Applications Installation to a New Character Set
NOTE:282336.1 - Charts of most current mono-byte / 8 bit Character sets
NOTE:444701.1 - Csscan output explained
NOTE:458122.1 - Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
NOTE:60134.1 - Globalization (NLS) - Frequently Asked Questions
NOTE:745809.1 - Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)
Related Products
|