The Source Code Behind String.indexOf

Latest recommended article published on 2022-04-27 08:00:00

AlexTseGDUT


Tags: java

Article link: https://blog.csdn.net/weixin_45664397/article/details/116781755


Source code of String's indexOf() method

First, open the source of String.java and look at the code for indexOf:

public int indexOf(int ch, int fromIndex) {
    return isLatin1() ? StringLatin1.indexOf(value, ch, fromIndex)
        : StringUTF16.indexOf(value, ch, fromIndex);
}

It first checks isLatin1():

boolean isLatin1() {
    return COMPACT_STRINGS && coder == LATIN1;
}

Depending on the string's encoding, it calls the indexOf() method of either StringLatin1 or StringUTF16:

public static int indexOf(byte[] value, int ch, int fromIndex) {	// indexOf() in StringLatin1
        if (!canEncode(ch)) {
            return -1;
        }
        int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }
        byte c = (byte)ch;
        for (int i = fromIndex; i < max; i++) {
            if (value[i] == c) {
               return i;
            }
        }
        return -1;
    }


public static int indexOf(byte[] value, int ch, int fromIndex) {	// indexOf() in StringUTF16
        int max = value.length >> 1;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            return indexOfChar(value, ch, fromIndex, max);
        } else {
            return indexOfSupplementary(value, ch, fromIndex, max);
        }
    }
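
Both implementations treat fromIndex the same way: a negative value is clamped to 0, and a value at or past the end returns -1 without searching. A small sketch of how that looks from the caller's side, using only the public String API (the class name IndexOfBoundsDemo and the sample string are made up for illustration):

```java
import java.util.Arrays;

class IndexOfBoundsDemo {
    // Collects a few indexOf results on "hello" (h=0, e=1, l=2, l=3, o=4).
    static int[] results() {
        String s = "hello";
        return new int[] {
            s.indexOf('l', 0),   // normal search from the start
            s.indexOf('l', 3),   // search starts at the second 'l'
            s.indexOf('l', -5),  // negative fromIndex behaves like 0
            s.indexOf('l', 99),  // fromIndex past the end: nothing to search
            s.indexOf('z', 0)    // character not present
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(results())); // [2, 3, 2, -1, -1]
    }
}
```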

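How the StringLatin1 method works:

Check whether ch fits in a byte by testing whether an unsigned right shift by 8 bits gives 0. If it does, every bit above the low 8 bits is zero.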
```
public static boolean canEncode(int cp) {
        return cp >>> 8 == 0;
    }
```
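Validate fromIndex: a negative value is clamped to 0, and a value at or beyond the array length returns -1 immediately.
Cast ch to a byte.
Scan the array from fromIndex up to max and return the index of the first element equal to that byte.
If nothing matches, return -1.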
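
The steps above can be tried out on a hand-built byte array. The following is a standalone sketch that copies the logic quoted earlier into a made-up class (Latin1IndexOfSketch); it is not the JDK's internal StringLatin1, and the array is produced with getBytes(ISO_8859_1) only as an assumed stand-in for a Latin1-coded String's internal storage.

```java
import java.nio.charset.StandardCharsets;

class Latin1IndexOfSketch {
    // True only when every bit above the low 8 bits is zero.
    // The unsigned shift also rejects negative values.
    static boolean canEncode(int cp) {
        return cp >>> 8 == 0;
    }

    static int indexOf(byte[] value, int ch, int fromIndex) {
        if (!canEncode(ch)) {
            return -1;               // cannot occur in a one-byte-per-char array
        }
        int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;           // clamp negative start
        } else if (fromIndex >= max) {
            return -1;               // start is past the end
        }
        byte c = (byte) ch;          // e.g. 0xE9 becomes -23, matching the stored byte
        for (int i = fromIndex; i < max; i++) {
            if (value[i] == c) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // c=0 a=1 f=2 é=3 ' '=4 a=5 u=6
        byte[] value = "caf\u00E9 au".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(value, 0xE9, 0));    // 3
        System.out.println(indexOf(value, 'a', 2));     // 5
        System.out.println(indexOf(value, 0x4E2D, 0));  // -1, fails canEncode
    }
}
```

Note that values 128 to 255 become negative after the cast to byte, but the comparison still works because the array stores them the same way.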

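How the StringUTF16 method works:

I do not fully understand the UTF16 source yet, so here is a passage quoted from an earlier author's article:

Similarly, the UTF16 encoding is handled in much the same way. However, besides the Basic Multilingual Plane (BMP), Unicode also has supplementary planes. The argument is an int (4 bytes), so a code point outside the BMP needs 4 bytes, which hold the high surrogate and the low surrogate, and the search then has to compare 4 bytes.

Here is the source of the methods that StringUTF16 calls: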
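
To make the surrogate idea concrete, here is a small sketch using only public Character and String methods. The class name SurrogateDemo and the choice of code point U+1F600 are illustrative assumptions, not something taken from the JDK source.

```java
class SurrogateDemo {
    static final int CP = 0x1F600; // a code point above the BMP

    // "ab" + U+1F600 + "c": the supplementary code point takes two char positions.
    static String sample() {
        return "ab" + new String(Character.toChars(CP)) + "c";
    }

    public static void main(String[] args) {
        char hi = Character.highSurrogate(CP);
        char lo = Character.lowSurrogate(CP);
        System.out.println(Integer.toHexString(hi)); // d83d
        System.out.println(Integer.toHexString(lo)); // de00

        String s = sample();
        System.out.println(s.length());     // 5: a, b, hi, lo, c
        System.out.println(s.indexOf(CP));  // 2: index of the high surrogate
        System.out.println(s.indexOf('c')); // 4
    }
}
```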

public static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;

private static int indexOfChar(byte[] value, int ch, int fromIndex, int max) {
        checkBoundsBeginEnd(fromIndex, max, value);
        return indexOfCharUnsafe(value, ch, fromIndex, max);
    }



private static int indexOfSupplementary(byte[] value, int ch, int fromIndex, int max) {
        if (Character.isValidCodePoint(ch)) {
            final char hi = Character.highSurrogate(ch);
            final char lo = Character.lowSurrogate(ch);
            checkBoundsBeginEnd(fromIndex, max, value);
            for (int i = fromIndex; i < max - 1; i++) {
                if (getChar(value, i) == hi && getChar(value, i + 1) == lo) {
                    return i;
                }
            }
        }
        return -1;
    }
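
The loop in indexOfSupplementary can be read as: walk the 16-bit code units and stop where a high surrogate is immediately followed by the matching low surrogate. A minimal standalone sketch of that idea follows. The class Utf16SearchSketch and its getChar helper are hypothetical; the helper assumes a big-endian byte layout (the array is built with getBytes(UTF_16BE)), which may not match the byte order the JDK uses internally, and the JDK's bounds check is replaced by a simple clamp.

```java
import java.nio.charset.StandardCharsets;

class Utf16SearchSketch {
    // Hypothetical helper: reads the 16-bit code unit at char index i,
    // assuming two bytes per unit in big-endian order.
    static char getChar(byte[] value, int i) {
        int pos = i << 1;
        return (char) (((value[pos] & 0xff) << 8) | (value[pos + 1] & 0xff));
    }

    // Intended for supplementary code points only (ch >= 0x10000).
    static int indexOfSupplementary(byte[] value, int ch, int fromIndex) {
        int max = value.length >> 1;          // number of code units
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (!Character.isValidCodePoint(ch)) {
            return -1;
        }
        char hi = Character.highSurrogate(ch);
        char lo = Character.lowSurrogate(ch);
        // Stop at max - 1 because each candidate needs two code units.
        for (int i = fromIndex; i < max - 1; i++) {
            if (getChar(value, i) == hi && getChar(value, i + 1) == lo) {
                return i;
            }
        }
        return -1;
    }

    static byte[] sample() {
        String s = "ab" + new String(Character.toChars(0x1F600)) + "c";
        return s.getBytes(StandardCharsets.UTF_16BE); // 10 bytes, no byte-order mark
    }

    public static void main(String[] args) {
        System.out.println(indexOfSupplementary(sample(), 0x1F600, 0)); // 2
        System.out.println(indexOfSupplementary(sample(), 0x1F601, 0)); // -1
    }
}
```

Each comparison in the loop looks at two code units, which is where the "compare 4 bytes" description comes from.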

I will leave this part open for now and come back to it once I understand it better.
