lob类型的字段由数据库自动管理,一个字段对应一个LOBSEGMENT和一个LOBINDEX, 在dba_segment中看到的表大小不包括lob字段的大小,在dba_lobs里可以看到表合 LOBSEGMENT的对应关系。
今天我主要是关注了以下LOBSEGMENT占用的物理空间的大小。
有张表T,其中字段columnC 为clob类型,其中columnC是经过Base64编码后的,所以其中的字符全部为英文,不包含汉字,数据库字符集为UTF-8.但是现象是这样的
select sum(length(columnC))/1024/1024,count(*) from T where columnC is not null
SUM(LENGTH(columnC))/1024/1024 COUNT(*)
5413.65621185302734375 35919
下面再看从dba_segment中看到的结果
select s.OWNER,s.SEGMENT_NAME,lob.TABLE_NAME,s.TABLESPACE_NAME,s.BYTES/1024/1024 from dba_segments s,dba_lobs lob
where s.SEGMENT_NAME = lob.SEGMENT_NAME
and lob.table_name = 'T'
OWNER SEGMENT_NAME TABLE_NAME TABLESPACE_NAME S.BYTES/1024/1024
aa SYS_LOB0000154534C00002$$ T ttt 12325
结果相差比较多,考虑到这张表T到目前为止都是在做insert,和select,没有delete和update,产生的碎片应该不会这么多,但是如果按照UTF-8存储的英文是一个字节的,这又怎么解释呢,于是我dump了数据文件中的其中的一些block出来。
00 35 00 4e 00 51 00 74 00 37 00 64 00 64 00 72 00 67 00 2b 00 61 00 65
00 69 00 41 00 5a 00 7a 00 55 00 71 00 61 00 6b 00 37 00 44 00 4c 00 51
00 32 00 73 00 65 00 52 00 67 00 67 00 5a 00 48 00 76 00 53 00 58 00 4b
00 74 00 35 00 42 00 63 00 77 00 77 00 73 00 46 00 35 00 33 00 50 00 32
00 2f 00 44 00 6e 00 2b 00 56 00 56 00 49 00 44 00 66 00 7a 00 79 00 6f
00 37 00 6c 00 49 00 34 00 73 00 35 00 32 00 6b 00 5a 00 4a 00 48 00 39
00 4b 00 75 00 47 00 56 00 49 00 35 00 46 00 6c 00 75 00 4d 00 46 00 51
00 77 00 44 00 45 00 6e 00 43 00 67 00 65 00 39 00 58 00 74 00 71 00 42
00 68 00 33 00 65 00 6f 00 4e 00 63 00 53 00 4d 00 73 00 4e 00 79 00 67
00 41 00 54 00 4d 00 6a 00 78 00 68 00 6a 00 6b 00 5a 00 41 00 77 00 4d
00 39 00 44 00 7a 00 37 00 56 00 6a 00 7a 00 70 00 4d 00 37 00 73 00 59
00 6a 00 76 00 69 00 59 00 35 00 38 00 78 00 51 00 63 00 6e 00 38 00 54
00 33 00 72 00 63 00 31 00 4f 00 39 00 73 00 57 00 75 00 58 00 57 00 78
00 74 00 6c 00 45 00 45 00 67 00 35 00 32 00 6e 00 49 00 63 00 6b 00 45
00 67 00 6a 00 36 00 66 00 31 00 72 00 4b 00 64 00 65 00 52 00 44 00 43
00 6a 00 73 00 70 00 64 00 51 00 47 00 42 00 34 00 4a 00 2b 00 67 00 72
发现其中的字符都是双字节的,这又是怎么回事呢,后来在文档中查到了以下描述
Varying-width character data in CLOB and NCLOB datatypes is stored in an internal format that is compatible with UCS2 Unicode character set format. This ensures that there is no storage loss of character data in a varying-width format. Also note the following if you are using LOBs to store varying-width character data:
可见oracle是内容都转换为了UCS2来存储的,而UCS2是固定双字节的。
今天我主要是关注了以下LOBSEGMENT占用的物理空间的大小。
有张表T,其中字段columnC 为clob类型,其中columnC是经过Base64编码后的,所以其中的字符全部为英文,不包含汉字,数据库字符集为UTF-8.但是现象是这样的
select sum(length(columnC))/1024/1024,count(*) from T where columnC is not null
SUM(LENGTH(columnC))/1024/1024 COUNT(*)
5413.65621185302734375 35919
下面再看从dba_segment中看到的结果
select s.OWNER,s.SEGMENT_NAME,lob.TABLE_NAME,s.TABLESPACE_NAME,s.BYTES/1024/1024 from dba_segments s,dba_lobs lob
where s.SEGMENT_NAME = lob.SEGMENT_NAME
and lob.table_name = 'T'
OWNER SEGMENT_NAME TABLE_NAME TABLESPACE_NAME S.BYTES/1024/1024
aa SYS_LOB0000154534C00002$$ T ttt 12325
结果相差比较多,考虑到这张表T到目前为止都是在做insert,和select,没有delete和update,产生的碎片应该不会这么多,但是如果按照UTF-8存储的英文是一个字节的,这又怎么解释呢,于是我dump了数据文件中的其中的一些block出来。
00 35 00 4e 00 51 00 74 00 37 00 64 00 64 00 72 00 67 00 2b 00 61 00 65
00 69 00 41 00 5a 00 7a 00 55 00 71 00 61 00 6b 00 37 00 44 00 4c 00 51
00 32 00 73 00 65 00 52 00 67 00 67 00 5a 00 48 00 76 00 53 00 58 00 4b
00 74 00 35 00 42 00 63 00 77 00 77 00 73 00 46 00 35 00 33 00 50 00 32
00 2f 00 44 00 6e 00 2b 00 56 00 56 00 49 00 44 00 66 00 7a 00 79 00 6f
00 37 00 6c 00 49 00 34 00 73 00 35 00 32 00 6b 00 5a 00 4a 00 48 00 39
00 4b 00 75 00 47 00 56 00 49 00 35 00 46 00 6c 00 75 00 4d 00 46 00 51
00 77 00 44 00 45 00 6e 00 43 00 67 00 65 00 39 00 58 00 74 00 71 00 42
00 68 00 33 00 65 00 6f 00 4e 00 63 00 53 00 4d 00 73 00 4e 00 79 00 67
00 41 00 54 00 4d 00 6a 00 78 00 68 00 6a 00 6b 00 5a 00 41 00 77 00 4d
00 39 00 44 00 7a 00 37 00 56 00 6a 00 7a 00 70 00 4d 00 37 00 73 00 59
00 6a 00 76 00 69 00 59 00 35 00 38 00 78 00 51 00 63 00 6e 00 38 00 54
00 33 00 72 00 63 00 31 00 4f 00 39 00 73 00 57 00 75 00 58 00 57 00 78
00 74 00 6c 00 45 00 45 00 67 00 35 00 32 00 6e 00 49 00 63 00 6b 00 45
00 67 00 6a 00 36 00 66 00 31 00 72 00 4b 00 64 00 65 00 52 00 44 00 43
00 6a 00 73 00 70 00 64 00 51 00 47 00 42 00 34 00 4a 00 2b 00 67 00 72
发现其中的字符都是双字节的,这又是怎么回事呢,后来在文档中查到了以下描述
Varying-width character data in CLOB and NCLOB datatypes is stored in an internal format that is compatible with UCS2 Unicode character set format. This ensures that there is no storage loss of character data in a varying-width format. Also note the following if you are using LOBs to store varying-width character data:
可见oracle是内容都转换为了UCS2来存储的,而UCS2是固定双字节的。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/13350499/viewspace-614693/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/13350499/viewspace-614693/