Accurately Determining Whether a Field Contains Chinese Characters, or Extracting Them

flg_inwind

Category: ORACLE. Tags: integer, sql, function, null, oo, database


Reposted from atgc's blog:

http://atgc.itpub.net/category/22412/38862

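Goal: write a function that accurately determines whether a field contains Chinese characters, or that extracts them.
Extracting Chinese characters from a table requires attention to the character set, because different character sets encode Chinese characters differently.
This article uses GB2312 as the example and writes a function that accurately extracts Simplified Chinese characters from a table.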

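Assume that the database character set is GB2312, that the client character set (set through an environment variable, the registry, or some other mechanism) is also GB2312,
and that the Chinese text stored in the table is GB2312-encoded as well.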
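Before relying on that assumption, it may help to confirm what the database actually reports. The query below is a minimal sketch that reads the database character set from the `nls_database_parameters` view. The functions in this article search the `dump` output for the name `ZHS16GBK`, so that is presumably the value the author's database returned; on a database with a different character set the functions would need a different name.

```sql
-- Show the database character set (for example ZHS16GBK).
select parameter, value
  from nls_database_parameters
 where parameter = 'NLS_CHARACTERSET';
```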

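Under that assumption every Chinese character occupies two bytes, and the Simplified Chinese characters fall in the code range
B0A1 - F7FE
Converted to decimal, byte by byte:
B0 A1 - F7 FE
176,161 - 247,254
That is, the first byte lies between 176 and 247 and the second byte between 161 and 254.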
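The hexadecimal-to-decimal conversion above can be checked in the database itself. This is a small sketch using `to_number` with the `XX` format model; the column aliases are only illustrative.

```sql
-- Convert the boundary bytes of the GB2312 Chinese character range to decimal.
select to_number('B0', 'XX') as first_byte_low,   -- 176
       to_number('A1', 'XX') as second_byte_low,  -- 161
       to_number('F7', 'XX') as first_byte_high,  -- 247
       to_number('FE', 'XX') as second_byte_high  -- 254
  from dual;
```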

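First, look at how the asciistr function is defined:
Non-ASCII characters are converted to the form \xxxx, where xxxx represents a UTF-16 code unit.
However, this does not mean that every character converted to a sequence starting with "\" is a Chinese character.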

Consider the following example.

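-- Two-argument version of get_chinese:
-- p_chinese = '1' returns the Chinese characters of p_name,
-- any other value returns the non-Chinese characters.
create or replace function get_chinese(p_name    in varchar2,
                                       p_chinese in varchar2)
  return varchar2 as
  v_code        varchar2(30000) := ''; -- comma-separated decimal byte values of p_name
  v_chinese     varchar2(4000) := '';  -- Chinese characters collected so far
  v_non_chinese varchar2(4000) := '';  -- all other characters collected so far
  v_comma       pls_integer;           -- position of the first comma in v_code
  v_code_q      pls_integer;           -- first byte of the current character
  v_code_w      pls_integer;           -- second byte of the current character
begin
  if p_name is not null then
    -- dump(p_name, 1010) returns text such as
    -- 'Typ=1 Len=4 CharacterSet=ZHS16GBK: 214,208,97,98'.
    -- Keep only the byte list; the character set name ZHS16GBK is hard-coded.
    select replace(substrb(dump(p_name, 1010),
                           instrb(dump(p_name, 1010), 'ZHS16GBK:')),
                   'ZHS16GBK: ',
                   '')
      into v_code
      from dual
     where rownum = 1;
    -- Walk p_name character by character; v_code always starts with the
    -- bytes of the current character.
    for i in 1 .. length(p_name) loop
      if lengthb(substr(p_name, i, 1)) = 2 then
        -- Double-byte character: read its two byte values.
        v_comma  := instrb(v_code, ',');
        v_code_q := to_number(substrb(v_code, 1, v_comma - 1));
        -- If no second comma exists (last character), abs() makes the
        -- length long enough to take the rest of the string.
        v_code_w := to_number(substrb(v_code,
                                      v_comma + 1,
                                      abs(instrb(v_code, ',', 1, 2) -
                                          v_comma - 1)));
        -- GB2312 Chinese characters: first byte 176-247, second byte 161-254.
        if v_code_q >= 176 and v_code_q <= 247 and v_code_w >= 161 and
           v_code_w <= 254 then
          v_chinese := v_chinese || substr(p_name, i, 1);
        else
          v_non_chinese := v_non_chinese || substr(p_name, i, 1);
        end if;
        -- Drop the first byte of the double-byte character.
        v_code := ltrim(v_code, '1234567890');
        v_code := ltrim(v_code, ',');
      else
        v_non_chinese := v_non_chinese || substr(p_name, i, 1);
      end if;
      -- Drop the remaining (or only) byte of the current character.
      v_code := ltrim(v_code, '1234567890');
      v_code := ltrim(v_code, ',');
    end loop;
    if p_chinese = '1' then
      return v_chinese;
    else
      return v_non_chinese;
    end if;
  else
    return '';
  end if;
end;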

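Here the fifth record contains a solid five-pointed star.
Now convert the column with the asciistr function and see what happens.

The solid five-pointed star in the last record is also converted to a sequence beginning with "\".
So the presence of "\" in asciistr(column) cannot be used to decide whether the column contains Chinese characters.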
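The effect can be reproduced without any table. The statement below is a sketch, not the author's original test: it builds one Chinese character (U+4E2D) and one solid star (U+2605) with `unistr`, so that the result does not depend on how the client encodes the script, and passes each through `asciistr`. Both should come back as a backslash followed by four hexadecimal digits (`\4E2D` and `\2605`), while plain ASCII text is returned unchanged.

```sql
-- Both non-ASCII characters are escaped the same way, Chinese or not.
select asciistr(unistr('\4E2D')) as chinese_char, -- a Chinese character
       asciistr(unistr('\2605')) as solid_star,   -- a symbol, not a Chinese character
       asciistr('abc')           as plain_ascii
  from dual;
```

In GB2312 the solid star is encoded as A1EF (161,239), which lies outside the B0A1 - F7FE range, so a byte-range test can tell it apart from a Chinese character where asciistr cannot.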

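My function is below. The basic idea is to check whether a character's code falls within the range that GB2312 assigns to Chinese characters.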

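-- One-argument version: returns only the Chinese characters of p_name.
create or replace function get_chinese(p_name in varchar2) return varchar2 as
  v_code    varchar2(30000) := ''; -- comma-separated decimal byte values of p_name
  v_chinese varchar2(4000) := '';  -- Chinese characters collected so far
  v_comma   pls_integer;           -- position of the first comma in v_code
  v_code_q  pls_integer;           -- first byte of the current character
  v_code_w  pls_integer;           -- second byte of the current character
begin
  if p_name is not null then
    -- Keep only the byte list from the dump output (ZHS16GBK is hard-coded).
    select replace(substrb(dump(p_name, 1010),
                           instrb(dump(p_name, 1010), 'ZHS16GBK:')),
                   'ZHS16GBK: ',
                   '')
      into v_code
      from dual
     where rownum = 1;
    for i in 1 .. length(p_name) loop
      if lengthb(substr(p_name, i, 1)) = 2 then
        -- Double-byte character: read its two byte values.
        v_comma  := instrb(v_code, ',');
        v_code_q := to_number(substrb(v_code, 1, v_comma - 1));
        v_code_w := to_number(substrb(v_code,
                                      v_comma + 1,
                                      abs(instrb(v_code, ',', 1, 2) -
                                          v_comma - 1)));
        -- GB2312 Chinese characters: first byte 176-247, second byte 161-254.
        if v_code_q >= 176 and v_code_q <= 247 and v_code_w >= 161 and
           v_code_w <= 254 then
          v_chinese := v_chinese || substr(p_name, i, 1);
        end if;
        -- Drop the first byte of the double-byte character.
        v_code := ltrim(v_code, '1234567890');
        v_code := ltrim(v_code, ',');
      end if;
      -- Drop the remaining (or only) byte of the current character.
      v_code := ltrim(v_code, '1234567890');
      v_code := ltrim(v_code, ',');
    end loop;
    return v_chinese;
  else
    return '';
  end if;
end;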
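To see the byte list that the function parses, `dump` can be called directly. This is an illustrative sketch with a made-up literal; it assumes a ZHS16GBK database and a correctly configured client character set.

```sql
-- Format 1010 = decimal byte values (10) plus the character set name (1000).
select dump('中ab', 1010) as byte_list
  from dual;
```

Under those assumptions the result should have the form `Typ=96 Len=4 CharacterSet=ZHS16GBK: 214,208,97,98` (Typ=96 because a string literal is of type CHAR; a VARCHAR2 value shows Typ=1). The pair 214,208 is the double-byte character and falls inside the 176-247 / 161-254 range, while 97 and 98 are the single-byte letters.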

Now let us run a few statements.

5 rows selected.

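1. List the records that contain Chinese characters

2. List the records that contain Chinese characters, showing only the Chinese characters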
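The two tasks above could be written along the following lines. This is only a sketch: the table `t_names` and its columns `id` and `name` are hypothetical stand-ins, since the original test table is not shown, and the one-argument `get_chinese` is assumed to be installed. The filter relies on the function returning an empty string, which Oracle treats as null, when no Chinese character is found.

```sql
-- 1. Records that contain at least one Chinese character.
select id, name
  from t_names                         -- hypothetical table
 where get_chinese(name) is not null;

-- 2. The same records, showing only the Chinese characters.
select id, get_chinese(name) as chinese_only
  from t_names                         -- hypothetical table
 where get_chinese(name) is not null;
```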

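Note that GB2312 contains 6763 Chinese characters in total, that is, 72*94-5=6763.
The range check in my function covers all 72*94 code points and does not exclude the 5 unassigned ones. I will exclude them once I have looked up which ones they are.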
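The arithmetic follows directly from the byte ranges, as this small sketch shows: 72 possible first bytes times 94 possible second bytes, minus the 5 unassigned positions.

```sql
select 247 - 176 + 1                           as first_byte_values,  -- 72
       254 - 161 + 1                           as second_byte_values, -- 94
       (247 - 176 + 1) * (254 - 161 + 1)       as code_points,        -- 6768
       (247 - 176 + 1) * (254 - 161 + 1) - 5   as chinese_characters  -- 6763
  from dual;
```

GB2312 code tables list the five unassigned positions as D7FA through D7FE (215,250 to 215,254), the tail of the last row of level-1 characters. Assuming that is correct, an extra condition such as `and not (v_code_q = 215 and v_code_w >= 250)` in the range test would exclude them.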
============

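The function can be rewritten to extract either the Chinese or the non-Chinese characters.
The rewritten function takes two parameters. The first is the string to process. The second selects the output: 1 extracts the Chinese characters, and any other value extracts the non-Chinese characters.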
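A call to the two-argument version might look like the sketch below. The input literal is made up for illustration, and a ZHS16GBK database with a matching client character set is assumed.

```sql
select get_chinese('abc中文123', '1') as chinese_part, -- Chinese characters only
       get_chinese('abc中文123', '0') as other_part    -- everything else
  from dual;
```

Under those assumptions the first column should contain the two Chinese characters and the second should contain `abc123`. Because both versions are created with `create or replace` under the same name, installing the two-argument function replaces the one-argument one; standalone functions cannot be overloaded, so earlier one-argument calls would need the extra parameter or a differently named function.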
