生僻字在数据库中的存储解决方案-CSDN博客

一、Oracle数据库
就一般情况来说，Oracle存储中英文的字段用varchar2类型就可以了，但有些时候，遇到生僻字就不行了，在默认字符集环境下，实现Oracle储存生僻字：㛃、䶮…（使用nvarchar2字段类型实现）
首先，把生僻字转换为Unicode。工具链接https://tool.chinaz.com/tools/unicode.aspx
“㛃” 转为Unicode为 “\u36c3”（注意： \u 是Unicode的转义字符，使用的时候要去掉）然后，插入SQL进行比较一下。

insert into TEST(SNAME,TNAME)  values('u36c3','㛃');

二、mysql数据库
将该表或者改字段设置为字符集修改为utf8mb4
在这里插入图片描述
三、postgresql数据库
该数据库设置varchar可以存储生僻字

四、java处理生僻字

public class RareCharacterUtility {

    public static boolean containsUserDefinedUnicode(String string) {
        if (string == null) {
            throw new NullPointerException("Stirng must be non-null");
        }
        int[] code = toCodePointArray(string);
        // U+E000..U+F8FF
        for (int c : code) {
            if (c >= '\ue000' && c <= '\uf8ff') {
                return true;
            }
        }
        return false;
    }

    static int[] toCodePointArray(String str) {
        int len = str.length();
        int[] acp = new int[str.codePointCount(0, len)];

        for (int i = 0, j = 0; i < len; i = str.offsetByCodePoints(i, 1)) {
            acp[j++] = str.codePointAt(i);
        }
        return acp;
    }

    static String toHex(int[] chars) {
        String r = "[";
        for (int i=0; i<chars.length; i++) {
            if (r.length() > 1) {
                r += ",";
            }
            r += Integer.toHexString(chars[i]);
        }
        r += "]";
        return r;
    }

    public static void main(String[] argu) {
        String rr = ("\u5f20\ue0bf\uD86C\uDE70\uD840\uDC10\uD86D\uDF44\uD87E\uDCAC\u9fc6");
        System.out.println("Unicode = " + toHex(toCodePointArray(rr)));

        boolean r = (containsUserDefinedUnicode(rr));
        System.out.println("Test result = " + r + " should be true");
    }
}

参考:https://blog.csdn.net/qq_37312208/article/details/81869779?from=timeline