Java: Finding and Removing Emoji in a String

I. Background

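  1. Emoji are ordinary characters in the Unicode character set (UTF-8 is only one of its encodings). Most basic emoji are assigned to two ranges in Plane 1 of the Unicode code space, U+1F300–U+1F6FF and U+1F900–U+1FAFF. Because these code points lie outside the Basic Multilingual Plane, Java, which stores strings as UTF-16, represents each of them with two char values (a surrogate pair).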

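  2. Skin tone modifiers: most emoji that depict people are yellow by default, so five code points were later introduced as modifiers: U+1F3FB, U+1F3FC, U+1F3FD, U+1F3FE, and U+1F3FF. Appending a skin tone modifier to an existing emoji produces a new style: U+1F44B (👋) + U+1F3FD = 👋🏽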

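  3. Variation and combining sequences: an ordinary character followed by one or more variation selectors or combining marks forms an emoji: U+25C0 + U+FE0F = ◀️; U+27A1 + U+FE0F = ➡️; 1 + U+FE0F + U+20E3 = 1️⃣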

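  4. Flags: each national flag is a pair of regional indicator symbols, whose code points span U+1F1E6 to U+1F1FF, so a flag amounts to two ordinary emoji characters from that range (four char values in Java): U+1F1E8 + U+1F1F3 = 🇨🇳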

  5. Zero-width joiner (ZWJ): several basic emoji joined by the zero-width joiner (U+200D) form one complex emoji: 👩 + U+200D + 🌾 = 👩‍🌾; 👩 + U+200D + ❤️ + U+200D + 👩 = 👩‍❤️‍👩; 👨 + U+200D + ❤️ + U+200D + 💋 + U+200D + 👨 = 👨‍❤️‍💋‍👨

  6. Tag sequences: a base emoji followed by several tag characters (U+E0020 to U+E007E) and terminated by Cancel Tag (U+E007F) forms one complex emoji: U+1F3F4 (🏴) + U+E0067 + U+E0062 + U+E0065 + U+E006E + U+E0067 + U+E007F = 🏴󠁧󠁢󠁥󠁮󠁧󠁿

  7. Special symbols: these occupy a single char, and some of them are rendered as emoji in certain environments: ⏯, ⏫, ⏹.
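
The compositions listed above can be reproduced directly from their code points. The following is a minimal sketch, not part of the original utility; the class name EmojiSequenceDemo and the helper fromCodePoints are made up for illustration. It builds each kind of sequence and compares the number of char values with the number of code points.

```java
class EmojiSequenceDemo {

    // Hypothetical helper: builds a String from raw Unicode code points.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String[] samples = {
            fromCodePoints(0x1F44B),                  // single Plane 1 emoji
            fromCodePoints(0x1F44B, 0x1F3FD),         // base + skin tone modifier
            fromCodePoints('1', 0xFE0F, 0x20E3),      // digit + variation selector + keycap mark
            fromCodePoints(0x1F1E8, 0x1F1F3),         // two regional indicators (flag)
            fromCodePoints(0x1F469, 0x200D, 0x1F33E), // emoji + ZWJ + emoji
            fromCodePoints(0x1F3F4, 0xE0067, 0xE0062, 0xE0065,
                    0xE006E, 0xE0067, 0xE007F)        // base + tag characters + Cancel Tag
        };
        for (String s : samples) {
            System.out.println(s + " chars=" + s.length()
                    + " codePoints=" + s.codePointCount(0, s.length()));
        }
    }
}
```

Every code point above U+FFFF costs two char values, which is why the tag sequence occupies 14 char values for only 7 code points, while the keycap sequence needs just 3.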

Unicode only defines the mapping from code points to emoji; it does not prescribe the artwork. Each emoji font is free to draw the glyphs in its own way.
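
Since only the code points are standardized, listing them is a font-independent way to see what an emoji is made of. The sketch below is illustrative only (the class CodePointDump and its describe method are assumed names, not part of the utility in this article); it prints each code point of a string in U+ notation.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDump {

    // Returns the code points of the text in U+XXXX notation, in order.
    static List<String> describe(CharSequence text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // Waving hand followed by a skin tone modifier.
        String waveTone = new String(new int[]{0x1F44B, 0x1F3FD}, 0, 2);
        System.out.println(describe(waveTone)); // [U+1F44B, U+1F3FD]
        // Keycap sequence: digit, variation selector, combining keycap.
        System.out.println(describe("1\uFE0F\u20E3")); // [U+0031, U+FE0F, U+20E3]
    }
}
```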

II. Approach

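  1. Apart from the emoji that are single special symbols, every emoji takes at least two char values, so the scan first classifies the second char to decide whether the current position starts an emoji. Character.UnicodeBlock.of and Character.getType are used to classify each char.
  2. Once the second char shows that the current two char values form an emoji: (1) check whether anything follows the pair; (2) then try, in order: flags; skin tone modifiers; tag sequences; the zero-width joiner; runs of variation selectors and combining marks; otherwise treat the pair as a plain emoji.
  3. Handle the single-char special symbols. Some members of this category are emoji and some are not; for now all of them are simply treated as plain emoji.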
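
The two classification calls named in step 1 can be tried in isolation. This is a simplified sketch under stated assumptions: the class CharClassDemo and its method names are invented for illustration, and the second-char test only looks at the LOW_SURROGATES and VARIATION_SELECTORS blocks, whereas the complete code also accepts several combining-mark blocks.

```java
class CharClassDemo {

    // Step 1 (simplified): does the char after position i suggest that an emoji starts at i?
    static boolean secondCharMarksEmoji(CharSequence s, int i) {
        if (i + 1 >= s.length()) {
            return false;
        }
        Character.UnicodeBlock block = Character.UnicodeBlock.of(s.charAt(i + 1));
        return block == Character.UnicodeBlock.LOW_SURROGATES
                || block == Character.UnicodeBlock.VARIATION_SELECTORS;
    }

    // Step 3: single-char symbols are recognized by their general category.
    static boolean isOtherSymbol(char c) {
        return Character.getType(c) == Character.OTHER_SYMBOL;
    }

    public static void main(String[] args) {
        String wave = new String(new int[]{0x1F44B}, 0, 1);
        System.out.println(secondCharMarksEmoji(wave, 0));            // true: surrogate pair
        System.out.println(secondCharMarksEmoji("1\uFE0F\u20E3", 0)); // true: variation selector follows
        System.out.println(secondCharMarksEmoji("ab", 0));            // false
        System.out.println(isOtherSymbol('\u23EF'));                  // true: play/pause symbol
        System.out.println(isOtherSymbol('\u00B0'));                  // true: degree sign, not an emoji
    }
}
```

The degree sign shows the trade-off accepted in step 3: it has the same general category as the play/pause symbol, so treating every such symbol as emoji also removes characters that are not emoji.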

III. Complete code

package com.zpf.tool;

import java.util.List;

public class EmojiUtil {

    public static boolean isEmojiNationalFlag(int codePoint) {
        // Regional indicator symbols U+1F1E6..U+1F1FF
        return codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF;
    }

    // String str = new String(new int[]{0x1F44B, 0x1F3FD}, 0, 2);
    public static boolean isEmojiSkinColor(int codePoint) {
        // Skin tone modifiers U+1F3FB..U+1F3FF
        return codePoint >= 0x1F3FB && codePoint <= 0x1F3FF;
    }

    // String str = new String(new int[]{0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F}, 0, 7);
    public static boolean isEmojiTagEnd(int codePoint) {
        // Cancel Tag U+E007F terminates a tag sequence
        return codePoint == 0xE007F;
    }

    public static boolean isEmojiTagSpec(int codePoint) {
        // Tag characters U+E0020..U+E007E
        return codePoint >= 0xE0020 && codePoint <= 0xE007E;
    }

    public static boolean isEmojiDecorateBlock(Character.UnicodeBlock block) {
        if (block == null) {
            return false;
        }
        return block.equals(Character.UnicodeBlock.VARIATION_SELECTORS)
                || block.equals(Character.UnicodeBlock.VARIATION_SELECTORS_SUPPLEMENT)
                || block.equals(Character.UnicodeBlock.COMBINING_HALF_MARKS)
                || block.equals(Character.UnicodeBlock.COMBINING_MARKS_FOR_SYMBOLS)
                || block.equals(Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS)
                || block.equals(Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS_SUPPLEMENT);
    }

    public static void pickAllEmoji(CharSequence data, StringBuilder removeResult, List<String> emojiList) {
        if (removeResult == null && emojiList == null) {
            return;
        }
        if (removeResult != null) {
            removeResult.delete(0, removeResult.length());
        }
        if (emojiList != null) {
            emojiList.clear();
        }
        if (data == null || data.length() == 0) {
            return;
        }
        StringBuilder emojiBuilder = new StringBuilder();
        int i = 0;
        int j;
        Character.UnicodeBlock block;
        while (i < data.length()) {
            if (i + 1 < data.length()) {
                block = Character.UnicodeBlock.of(data.charAt(i + 1));
                if (isEmojiDecorateBlock(block) || Character.UnicodeBlock.LOW_SURROGATES.equals(block)) {
                    if (i + 2 >= data.length()) {
                        emojiBuilder.append(data, i, i + 2);
                        break;
                    }
                    j = handleNationalFlag(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    j = handleHumanSkin(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    j = handleTagSequence(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    emojiBuilder.append(data, i, i + 2);
                    i = handleNextChar(data, i + 2, emojiBuilder, emojiList);
                    continue;
                }
            }
            recordEmoji(emojiBuilder, emojiList);
            int type = Character.getType(data.charAt(i));
            if (type == (int) Character.OTHER_SYMBOL) { // special symbols are all treated as emoji
                if (emojiList != null) {
                    emojiList.add(String.valueOf(data.charAt(i)));
                }
            } else if (removeResult != null) {
                removeResult.append(data.charAt(i));
            }
            i++;
        }
        recordEmoji(emojiBuilder, emojiList);
    }

    private static int handleNextChar(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i >= data.length()) {
            return i;
        }
        char nextChar = data.charAt(i);
        if (nextChar == '\u200D') { // zero-width joiner: keep building the same emoji
            emojiBuilder.append(nextChar);
            return i + 1;
        }
        int j = i;
        Character.UnicodeBlock block;
        while (j < data.length()) {
            nextChar = data.charAt(j);
            block = Character.UnicodeBlock.of(nextChar);
            if (isEmojiDecorateBlock(block)) {
                emojiBuilder.append(nextChar);
                j++;
            } else {
                break;
            }
        }
        if (i != j) {
            recordEmoji(emojiBuilder, emojiList);
        }
        return j;
    }

    private static int handleNationalFlag(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        int codePoint = Character.codePointAt(data, i);
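        if (isEmojiNationalFlag(codePoint)) { // regional indicator: possible flag
            recordEmoji(emojiBuilder, emojiList); // flush any pending emoji first
            if (i + 3 < data.length()) {
                codePoint = Character.codePointAt(data, i + 2);
                if (isEmojiNationalFlag(codePoint)) {
                    // two regional indicators form one flag (4 char values)
                    emojiBuilder.append(data, i, i + 4);
                    recordEmoji(emojiBuilder, emojiList);
                    // return here so the char values after the flag are not skipped
                    return i + 4;
                }
            }
            // unpaired regional indicator: dropped from the text but not recorded
            i = i + 2;
        }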
        return i;
    }

    private static int handleHumanSkin(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i + 3 >= data.length()) {
            return i;
        }
        int codePoint = Character.codePointAt(data, i + 2);
        if (isEmojiSkinColor(codePoint)) { // skin tone modifier follows the base emoji
            emojiBuilder.append(data, i, i + 4);
            recordEmoji(emojiBuilder, emojiList);
            i = i + 4;
        }
        return i;
    }

    private static int handleTagSequence(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i + 3 >= data.length()) {
            return i;
        }
        int codePoint = Character.codePointAt(data, i + 2);
        if (isEmojiTagSpec(codePoint)) {
            emojiBuilder.append(data, i, i + 4);
            i = i + 4;
            while (i < data.length()) {
                codePoint = Character.codePointAt(data, i);
                if (isEmojiTagSpec(codePoint)) {
                    emojiBuilder.append(data, i, i + 2);
                    i = i + 2;
                } else if (isEmojiTagEnd(codePoint)) {
                    emojiBuilder.append(data, i, i + 2);
                    recordEmoji(emojiBuilder, emojiList);
                    i = i + 2;
                    break;
                } else { //error
                    break;
                }
            }
            emojiBuilder.delete(0, emojiBuilder.length());
        } else if (isEmojiTagEnd(codePoint)) {
            emojiBuilder.append(data, i, i + 4);
            recordEmoji(emojiBuilder, emojiList);
            i = i + 4;
        }
        return i;
    }

    private static void recordEmoji(StringBuilder builder, List<String> emojiList) {
        if (builder != null && builder.length() > 0) {
            if (emojiList != null) {
                emojiList.add(builder.toString());
            }
            builder.delete(0, builder.length());
        }
    }

}
