I am having some data reject on an insert of Japanese characters to a
Varchar column in my DB2 UTF-8 database. I am trying to understand how
big to make the Varchar column for the inesert to work successfully for
all of my data. I would also like ot understand if Vargraphic is a
better approach and what ramifications that might have.
This data is sourced from UTF-8 Oracle (using byte semantics) which
defines a columns as
Varchar2(255). I had defined my DB2 columns Varchar(255) but the
insert rejects some rows.
I notice that for Oracle : if I apply the length function to a specific
column , I get a value of 8 for a specified key. However, for this
same key that sucecssfully loaded on the DB2 side, I get a length of
33. Sometimes, the length on the DB2 side is more than 5 times the
Oracle side. I do not understand how this can be.
Also, Can someone clarify how much storage DB2 allocates in bytes, for
the following (assume non-nullable) ? My assumptions are below -
CHAR(1) - assume 1 byte
CHAR(100) - assume 100 bytes
VARCHAR(1) - assume length + 1 byte = 2 bytes
VARCHAR(100) - assume lenght + 100 bytes = 101 bytes at most
VARGRAPHIC(100) - ?
Thank you in advance for your assistance.
解决方案"mike_dba" wrote in message
news:11**********************@i39g2000cwa.googlegr oups.com...I am having some data reject on an insert of Japanese characters to a
Varchar column in my DB2 UTF-8 database. I am trying to understand how
big to make the Varchar column for the inesert to work successfully for
all of my data. I would also like ot understand if Vargraphic is a
better approach and what ramifications that might have.
This data is sourced from UTF-8 Oracle (using byte semantics) which
defines a columns as
Varchar2(255). I had defined my DB2 columns Varchar(255) but the
insert rejects some rows.
I notice that for Oracle : if I apply the length function to a specific
column , I get a value of 8 for a specified key. However, for this
same key that sucecssfully loaded on the DB2 side, I get a length of
33. Sometimes, the length on the DB2 side is more than 5 times the
Oracle side. I do not understand how this can be.
Also, Can someone clarify how much storage DB2 allocates in bytes, for
the following (assume non-nullable) ? My assumptions are below -
CHAR(1) - assume 1 byte
CHAR(100) - assume 100 bytes
VARCHAR(1) - assume length + 1 byte = 2 bytes
VARCHAR(100) - assume lenght + 100 bytes = 101 bytes at most
VARGRAPHIC(100) - ?
Thank you in advance for your assistance.
I would triple the size of the VARCHAR in DB2. You can go up to about 32K
(not quite) with VARCHAR in DB2.
DB2 uses 2 extra bytes for the length, but that does not reduce the amount
of space for the data. If it is nullable, it uses one additional byte.
Thank you for your response. The triple size will work in most cases.
However, we have noticed the following scenario -
I have a varchar(255) in Oracle. For a particular data item on Oracle,
I apply the Oracle function for length and get 86 character back. I
also apply the Oracle function for byte size and get back 238.
Inserting this same data into DB2 varchar(255) rejects. Inserting
into DB2 varchar(2000) works. DB2 length function returns 466 (this
should be bytes, correct ?). How could this be ?
I don''t know how Oracle measures the length of a UTF-8 varchar field or
how it physically stores it but UDB stores UTF-8 data as 1-4 bytes for
each character. Characters in pictographic languages, especially
Japanese, Chinese, and other languages used in that part of the world,
are more often the longer three and four byte ones.
A varchar definition specifies the number of bytes, not the number of
characters to be stored. The data capacity of the column can''t be
precisely determined because each character may occupy a different
number of bytes.
A single UTF-8 character occupies a maximum of four bytes. Specify a
varchar length four times the number of characters you expect to store
and you''ll always have enough space. Unfortunately, this will NOT
prevent storing more characters than you want if each character is
shorter than the 4/character worst case.
Philip Sherman
mike_dba wrote:Thank you for your response. The triple size will work in most cases.
However, we have noticed the following scenario -
I have a varchar(255) in Oracle. For a particular data item on Oracle,
I apply the Oracle function for length and get 86 character back. I
also apply the Oracle function for byte size and get back 238.
Inserting this same data into DB2 varchar(255) rejects. Inserting
into DB2 varchar(2000) works. DB2 length function returns 466 (this
should be bytes, correct ?). How could this be ?