PostgreSQL Oracle 兼容性之 - substrb (基于字节的字符串截取)

最新推荐文章于 2024-01-10 09:30:05 发布

weixin_34259232

最新推荐文章于 2024-01-10 09:30:05 发布

阅读量2.8k

点赞数

文章标签：数据库

原文链接：https://yq.aliyun.com/articles/228289

版权

背景

Oracle的substrb函数，用于基于字节流的截取，需要考虑多字节字符串的编码问题，未截取完整字符，则不截取。

https://docs.oracle.com/cd/B12037_01/olap.101/b10339/x_stddev004.htm

substr则用于基于字符串的截取。

PostgreSQL也可以支持类似的功能。

orafce插件

安装orafce插件，里面包含了大量的oracle兼容函数。

https://pgxn.org/dist/orafce/

postgres=# \df *.*substrb*  
                               List of functions  
   Schema   |  Name   | Result data type |    Argument data types     |  Type    
------------+---------+------------------+----------------------------+--------  
 pg_catalog | substrb | varchar2         | varchar2, integer          | normal  
 pg_catalog | substrb | varchar2         | varchar2, integer, integer | normal  
(2 rows)

实际上这部分代码在PostgreSQL中已经存在，只是没有创建SQL函数。

src/backend/utils/adt/varlena.c

/*  
 * bytea_substr()  
 * Return a substring starting at the specified position.  
 * Cloned from text_substr and modified as required.  
 *  
 * Input:  
 *      - string  
 *      - starting position (is one-based)  
 *      - string length (optional)  
 *  
 * If the starting position is zero or less, then return from the start of the string  
 * adjusting the length to be consistent with the "negative start" per SQL.  
 * If the length is less than zero, an ERROR is thrown. If no third argument  
 * (length) is provided, the length to the end of the string is assumed.  
 */  
Datum  
bytea_substr(PG_FUNCTION_ARGS)  
{  
        PG_RETURN_BYTEA_P(bytea_substring(PG_GETARG_DATUM(0),  
                                                                          PG_GETARG_INT32(1),  
                                                                          PG_GETARG_INT32(2),  
                                                                          false));  
}  
  
  
static bytea *  
bytea_substring(Datum str,  
                                int S,  
                                int L,  
                                bool length_not_specified)  
{  
        int                     S1;                             /* adjusted start position */  
        int                     L1;                             /* adjusted substring length */  
  
        S1 = Max(S, 1);  
  
        if (length_not_specified)  
        {  
                /*  
                 * Not passed a length - DatumGetByteaPSlice() grabs everything to the  
                 * end of the string if we pass it a negative value for length.  
                 */  
                L1 = -1;  
        }  
        else  
        {  
                /* end position */  
                int                     E = S + L;  
  
                /*  
                 * A negative value for L is the only way for the end position to be  
                 * before the start. SQL99 says to throw an error.  
                 */  
                if (E < S)  
                        ereport(ERROR,  
                                        (errcode(ERRCODE_SUBSTRING_ERROR),  
                                         errmsg("negative substring length not allowed")));  
  
                /*  
                 * A zero or negative value for the end position can happen if the  
                 * start was negative or one. SQL99 says to return a zero-length  
                 * string.  
                 */  
                if (E < 1)  
                        return PG_STR_GET_BYTEA("");  
  
                L1 = E - S1;  
        }  
  
        /*  
         * If the start position is past the end of the string, SQL99 says to  
         * return a zero-length string -- DatumGetByteaPSlice() will do that for  
         * us. Convert to zero-based starting position  
         */  
        return DatumGetByteaPSlice(str, S1 - 1, L1);  
}

postgres=# select octet_length(public.substrb('nihao 中国 abc你好'::varchar,2,9));
 octet_length 
--------------
            9
(1 row)

postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,9);
 substrb 
---------
 ihao 中
(1 row)

postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,10);
 substrb 
---------
 ihao 中
(1 row)

postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,8);
 substrb 
---------
 ihao 中
(1 row)

postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,6);
 substrb 
---------
 ihao 
(1 row)

postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,7);
 substrb 
---------
 ihao 
(1 row)

postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,9);
 substrb 
---------
 ihao 中
(1 row)

postgres=# select public.substrb('nihao 中国 abc你好'::varchar,2,10);
 substrb 
---------
 ihao 中
(1 row)