Emoji Typed into CSDN Turn into Garbled Text

Daydreaming Kid

Last modified 2024-02-29 10:49:02

Category: CodingBug. Tags: bug, CSDN开发云个人开发, java

First published 2023-11-24 10:43:51

Article link: https://blog.csdn.net/SpaceTravellers/article/details/134593282

Q:

When writing in CSDN's Markdown editor, some emoji turn into garbled text. For example:

1⃣️ and 1️⃣

The two look identical in the Apple input method, but the latter renders correctly in Markdown while the former turns into garbled text.

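The former comes from the emoji candidates offered by the Chinese input method; the latter comes from the emoji picker opened with Control + Command + Space.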

A quick check in Java reports UTF-8 as the encoding of both emoji:

import java.io.UnsupportedEncodingException;

public class Solution {
    public static void main(String[] args) {
        System.out.println(getEncoding("1⃣️"));
        System.out.println(getEncoding("1️⃣"));
        String a = new String("1⃣️");
        String b = new String("1️⃣");
        String c = new String("1⃣️");
        System.out.println(a.hashCode());
        System.out.println(b.hashCode());
        System.out.println(c.hashCode());
        
    }
    // Returns the first candidate charset under which the string round-trips,
    // or an empty string if none of them matches.
    public static String getEncoding(String str) {
        // Candidate charsets, tried in order.
        String[] candidates = {"GB2312", "ISO-8859-1", "UTF-8", "GBK"};
        for (String encode : candidates) {
            if (isEncoding(str, encode)) {
                return encode;
            }
        }
        return ""; // None matched: the input is not in one of these common encodings.
    }

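    // Returns true if the string's bytes (taken in the platform default charset)
    // decode back to the same string under the given charset name.
    public static boolean isEncoding(String str, String encode) {
        try {
            return str.equals(new String(str.getBytes(), encode));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return false;
        }
    }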
}
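
One caveat about the check above: `getBytes()` without an argument uses the platform default charset, so the result can vary from machine to machine. The following is a minimal sketch of the same round-trip idea with the charset passed explicitly on both sides. The class name `CharsetCheck` and the method `roundTrips` are hypothetical, and the two string literals are an assumption about which code points the two emoji consist of; paste your own strings in to check.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCheck {
    // True if str is unchanged after encoding to bytes and decoding again
    // with the same, explicitly named charset.
    static boolean roundTrips(String str, Charset charset) {
        return str.equals(new String(str.getBytes(charset), charset));
    }

    public static void main(String[] args) {
        // Assumed code point sequences for the two emoji (not taken from the post).
        String former = "1\u20E3\uFE0F";
        String latter = "1\uFE0F\u20E3";
        for (String s : new String[] {former, latter}) {
            System.out.println("UTF-8:      " + roundTrips(s, StandardCharsets.UTF_8));
            System.out.println("ISO-8859-1: " + roundTrips(s, StandardCharsets.ISO_8859_1));
        }
    }
}
```

Under these assumptions both strings survive the UTF-8 round trip and neither survives ISO-8859-1, so this kind of check cannot tell the two emoji apart.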

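However, when we take the hashCode of the two strings, the values differ. Likewise, when we convert the two emoji to UTF-8 bytes, the resulting byte sequences are not identical either.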
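
To see where the two strings differ, it may help to print their Unicode code points and UTF-8 bytes side by side. This is only a sketch: the class `CodePointDump` and its helper methods are hypothetical names, and the two literals are an assumption about what each input method produces (the same three code points in a different order). Replace them with the strings copied from your own editor.

```java
import java.nio.charset.StandardCharsets;

class CodePointDump {
    // Formats each Unicode code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Formats the UTF-8 bytes of s as two-digit hex values, separated by spaces.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed contents of the two emoji (not taken from the post).
        String former = "1\u20E3\uFE0F"; // digit, combining keycap, variation selector
        String latter = "1\uFE0F\u20E3"; // digit, variation selector, combining keycap
        for (String s : new String[] {former, latter}) {
            System.out.println(codePoints(s) + " | " + utf8Hex(s) + " | hashCode=" + s.hashCode());
        }
    }
}
```

With the assumed literals, the first line shows `U+0031 U+20E3 U+FE0F` and the second `U+0031 U+FE0F U+20E3`, which would explain why the hash codes and the UTF-8 byte sequences differ even though both strings are valid UTF-8.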