Algorithm Study-Along Day 9 [代码随想录]

Article link: https://blog.csdn.net/qq_37016176/article/details/131229310

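This article shows how to use the KMP algorithm for substring matching, covering two problems: finding the index of the first match and checking for a repeated substring pattern. It explains the role of the next array in KMP and provides code. It also summarizes Java strings: thread safety, commonly used library methods, and the differences between String, StringBuffer and StringBuilder and when to use each.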


Chapter 4: Strings, part 02

Outline

● 28. Implement strStr()
● 459. Repeated Substring Pattern
● String summary
● Two-pointer review

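`leetcode 28` Find the Index of the First Occurrence in a String

Approach

Use the KMP algorithm to find the pattern string in the target string.

Details

KMP is the classic string-matching algorithm. When a mismatch occurs, it reuses what the failed attempt already revealed (the length of the longest common prefix and suffix of the matched part, stored in the next array) to decide where the next comparison starts, so the pointer into the target string never moves backward.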
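
To make that "failure information" concrete, here is a minimal sketch that only builds the next array for a sample pattern. The class name NextArrayDemo, the helper buildNext and the pattern "aabaaf" are illustrative choices, not part of the original solution.

```java
import java.util.Arrays;

class NextArrayDemo {
    // next[i] = length of the longest proper prefix of p[0..i] that is also a suffix of it.
    static int[] buildNext(String p) {
        int[] next = new int[p.length()];
        for (int i = 1, j = 0; i < p.length(); i++) {
            // On a mismatch, fall back to the next shorter candidate prefix.
            while (j > 0 && p.charAt(i) != p.charAt(j)) j = next[j - 1];
            if (p.charAt(i) == p.charAt(j)) j++;
            next[i] = j;
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildNext("aabaaf"))); // [0, 1, 0, 1, 2, 0]
    }
}
```

Reading the result: after matching "aabaa" and failing on the next character, next[4] = 2 says the first two pattern characters are already known to match, so comparison resumes at pattern index 2.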

Code

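class Solution {
    public int strStr(String haystack, String needle) {
        int m = haystack.length(), n = needle.length();
        if (n == 0) return 0;   // an empty pattern matches at index 0
        if (m < n) return -1;
        // next[i]: length of the longest proper prefix of needle[0..i] that is also its suffix
        int[] next = new int[n];
        for (int i = 1, j = 0; i < n; i++) {
            // on a mismatch, fall back to the next shorter candidate prefix
            while (j > 0 && needle.charAt(i) != needle.charAt(j)) j = next[j - 1];
            if (needle.charAt(i) == needle.charAt(j)) j++;
            next[i] = j;
        }
        // scan haystack; i never moves backward, j counts the needle characters matched so far
        for (int i = 0, j = 0; i < m; i++) {
            while (j > 0 && haystack.charAt(i) != needle.charAt(j)) j = next[j - 1];
            if (haystack.charAt(i) == needle.charAt(j)) j++;
            if (j == n) return i - j + 1;   // full match ending at i
        }
        return -1;
    }
}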

Complexity

Time O(m+n)
Space O(n)

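`leetcode 459` Repeated Substring Pattern

Approach

A classic application of KMP that relies on what the values in the next array actually mean.

Details

For each prefix of the string, the next array stores the length of its longest common prefix and suffix. If the string is built from a repeated substring, the next array must follow the shape xxx0123456… (for example, "abcabcabc" gives 0 0 0 1 2 3 4 5 6).
This is because every further copy of the repeated substring extends the common prefix and suffix, so the values in next keep increasing.
Note that the final check only involves the last element of the next array, which must satisfy both conditions:
- the last element is nonzero && length % (length - last element) == 0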
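
The condition above can also be used to recover the repeated unit itself, since length - last element is the unit length. The following is a sketch under that reading; the class RepeatUnitDemo and the method repeatingUnit are hypothetical names introduced here for illustration.

```java
class RepeatUnitDemo {
    // Returns the shortest unit whose repetition (at least twice) forms s, or null if there is none.
    static String repeatingUnit(String s) {
        int len = s.length();
        if (len == 0) return null;
        int[] next = new int[len];
        for (int i = 1, j = 0; i < len; i++) {
            while (j > 0 && s.charAt(i) != s.charAt(j)) j = next[j - 1];
            if (s.charAt(i) == s.charAt(j)) j++;
            next[i] = j;
        }
        int last = next[len - 1];   // longest common prefix/suffix of the whole string
        int unit = len - last;      // candidate unit length
        return (last != 0 && len % unit == 0) ? s.substring(0, unit) : null;
    }

    public static void main(String[] args) {
        System.out.println(repeatingUnit("abcabcabc")); // abc
        System.out.println(repeatingUnit("aba"));       // null
    }
}
```

For "aba" the last element is 1, so the candidate unit length is 2, which does not divide 3; that is why the divisibility test is needed in addition to the nonzero test.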

Code

class Solution {
    public boolean repeatedSubstringPattern(String s) {
        int len = s.length();
        // next[i]: length of the longest proper prefix of s[0..i] that is also its suffix
        int[] next = new int[len];
        for (int i = 1, j = 0; i < len; i++) {
            while (j > 0 && s.charAt(i) != s.charAt(j)) j = next[j - 1];
            if (s.charAt(i) == s.charAt(j)) j++;
            next[i] = j;
        }
        // len - next[len - 1] is the candidate unit length; it must divide len evenly
        return next[len - 1] != 0 && len % (len - next[len - 1]) == 0;
    }
}

Complexity

Time O(n)
Space O(n)

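String summary

Get fluent with the Java string library functions.

String

Underlying data structure: private final char value[]; (JDK 8 and earlier; since JDK 9 the field is a byte[] plus a coder flag)
Implemented interfaces
- Serializable
- CharSequence
- Comparable
Class declaration: public final class String extends Object implements Serializable, Comparable<String>, CharSequence
Field: static Comparator<String> CASE_INSENSITIVE_ORDER // orders strings as by compareToIgnoreCase; it does not take locale into account
Constructors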
- String()
- String(byte[] bytes)
- String(byte[] bytes, Charset charset)
- String(byte[] bytes, int offset, int length)
- String(byte[] bytes, int offset, int length, Charset charset)
- String(byte[] bytes, int offset, int length, String charsetName)
- String(byte[] bytes, String charsetName)
- String(char[] value)
- String(char[] value, int offset, int count)
- String(int[] codePoints, int offset, int count)
- String(String original)
- String(StringBuffer buffer)
- String(StringBuilder builder)
Common methods
- char charAt(int index)
- int codePointAt(int index) // Returns the character (Unicode code point) at the specified index.
- int compareTo(String anotherString)
- int compareToIgnoreCase(String str)
- String concat(String str)
- boolean contains(CharSequence s)
- boolean contentEquals(CharSequence cs)
- boolean contentEquals(StringBuffer sb)
- static String copyValueOf(char[] data)
- boolean endsWith(String suffix)
- boolean equalsIgnoreCase(String anotherString)
- byte[] getBytes()
- byte[] getBytes(Charset charset)
- byte[] getBytes(String charsetName)
- int indexOf(int ch)
- int indexOf(int ch, int fromIndex)
- int indexOf(String str)
- int indexOf(String str, int fromIndex)
- String intern() // Returns a canonical representation for the string object.
- boolean isEmpty()
- int lastIndexOf(int ch)
- int lastIndexOf(int ch, int fromIndex)
- int lastIndexOf(String str)
- int lastIndexOf(String str, int fromIndex)
- int length()
- boolean matches(String regex)
- String replace(char oldChar, char newChar)
- String replaceAll(String regex, String replacement)
- String replaceFirst(String regex, String replacement)
- String[] split(String regex)
- String[] split(String regex, int limit)
- boolean startsWith(String prefix)
- boolean startsWith(String prefix, int toffset)
- CharSequence subSequence(int beginIndex, int endIndex)
- String substring(int beginIndex)
- String substring(int beginIndex, int endIndex)
- char[] toCharArray()
- String toLowerCase()
- String toUpperCase()
- static String valueOf(boolean b / char c/ char[] data/ double d/ float f/ int i/ long l/ Object obj)
- static String valueOf(char[] data, int offset, int count) // Returns the string representation of a specific subarray of the char array argument.
Methods inherited from Object
- clone, finalize, getClass, notify, notifyAll, wait, wait, wait
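Thread safety
- String is an immutable class: its backing array is private and final and is never modified after construction, so String is thread-safe
String concatenation
- String s = "a" + "b"; is optimized at compile time into String s = "ab";
- String s1 = "a", s2 = new String("b"); String s3 = s1 + s2; is concatenated through a StringBuilder under the hood (this is what javac emits up to JDK 8; since JDK 9 it emits an invokedynamic call to StringConcatFactory instead)
- String s1 = new String("a"), s2 = new String("b"); followed by s1 + s2: how many objects are created in total?
  - 4~6: new StringBuilder(), new String("a"), new String("b"), toString()
  - The count is 4~6 because each of the literals "a" and "b" adds one more String object in the constant pool only when that string is not already there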
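
The difference between compile-time and run-time concatenation can be observed through reference comparison. This is a small sketch for illustration only (comparing strings with == is normally a bug; equals is the right tool for content). The class ConcatDemo and its method names are made up for this example.

```java
class ConcatDemo {
    // "a" + "b" is a constant expression, so the compiler folds it into the literal "ab".
    static boolean foldedIsSameObject() {
        String s = "a" + "b";
        return s == "ab";       // both refer to the same pooled literal
    }

    // s1 + s2 involves variables, so the result is built at run time as a new object.
    static boolean runtimeResultIsSameObject() {
        String s1 = "a", s2 = new String("b");
        String s3 = s1 + s2;
        return s3 == "ab";      // equal content, different object
    }

    public static void main(String[] args) {
        System.out.println(foldedIsSameObject());        // true
        System.out.println(runtimeResultIsSameObject()); // false
    }
}
```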
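Maximum length of a String
- The backing store is an array whose length is an int, so the theoretical maximum length is 2^31-1
- In the class file, however, a string constant is stored in a structure whose length field is a u2, i.e. an unsigned 2-byte value, which caps the length at 2^16-1
- One more value is unavailable in practice, so the final limit is 2^16-2, i.e. 65534
- Note that this is only the compile-time limit for literals; at run time a String can reach the theoretical 2^31-1, memory permitting
- In Java, no item stored in the constant pool may be longer than 65535, and that of course includes string literals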
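
As a quick check that the 65534 limit applies only to literals in source code, the sketch below builds a longer string at run time. The class LongStringDemo, the helper repeat and the length 100_000 are arbitrary choices for illustration.

```java
import java.util.Arrays;

class LongStringDemo {
    // Builds a string of `count` copies of `c` at run time, bypassing the literal limit.
    static String repeat(char c, int count) {
        char[] buf = new char[count];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        String s = repeat('x', 100_000);
        System.out.println(s.length()); // 100000
    }
}
```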
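Difference between new String() and new String("")
- None in content: both create a new String object representing the empty string (length 0), and neither is null; null is only what a String reference holds when no object has been assigned to it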
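
A short sketch to check what the two constructors actually produce. The class EmptyStringDemo and its helper are names introduced here for illustration.

```java
class EmptyStringDemo {
    // True when a and b hold the same characters but are two distinct objects.
    static boolean sameContentDifferentObject(String a, String b) {
        return a.equals(b) && a != b;
    }

    public static void main(String[] args) {
        String a = new String();
        String b = new String("");
        System.out.println(a.isEmpty());                       // true
        System.out.println(b.isEmpty());                       // true
        System.out.println(sameContentDifferentObject(a, b));  // true
    }
}
```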
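What intern() does
- The constant pool stored in the .class file is loaded by the JVM at run time and can be extended
- String's intern() method is one way to extend it
- When a String instance str calls intern(), Java checks whether the pool already contains a string equal to str; if so, it returns a reference to the pooled string, otherwise it adds str to the pool and returns a reference to it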
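
The lookup behaviour described above can be observed with reference comparison. A minimal sketch, with the class InternDemo and the helper isPooledInstance as illustrative names:

```java
class InternDemo {
    // True when s is itself the instance held in the string pool.
    static boolean isPooledInstance(String s) {
        return s == s.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";              // literals are pooled automatically
        String heap = new String("hello");     // a separate object outside the pool
        System.out.println(heap == literal);            // false
        System.out.println(heap.intern() == literal);   // true
        System.out.println(isPooledInstance(literal));  // true
        System.out.println(isPooledInstance(heap));     // false
    }
}
```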

StringBuffer

Underlying data structure: char[] value; inherited from AbstractStringBuilder
Class declaration: public final class StringBuffer extends AbstractStringBuilder implements Serializable, CharSequence
Constructors
- StringBuffer() // Constructs a string buffer with no characters in it and an initial capacity of 16 characters.
- StringBuffer(CharSequence seq) // Constructs a string buffer that contains the same characters as the specified CharSequence.
- StringBuffer(int capacity)
- StringBuffer(String str)
Common methods
- StringBuffer append(boolean b/ char c/ char[] str/ CharSequence s/ double d/ float f/ int i/ long l/ Object o/ String str/ StringBuffer sb)
- StringBuffer append(CharSequence s, int start, int end)
- int capacity()
- char charAt(int index)
- StringBuffer delete(int start, int end)
- StringBuffer deleteCharAt(int index)
- void ensureCapacity(int minimumCapacity)
- void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) // Characters are copied from this sequence into the destination character array dst.
- int indexOf(String str)
- int indexOf(String str, int fromIndex)
- StringBuffer insert(int offset, boolean b/…)
- int lastIndexOf(String str)
- int lastIndexOf(String str, int fromIndex)
- int length()
- StringBuffer replace(int start, int end, String str)
- StringBuffer reverse()
- void setCharAt(int index, char ch) // The character at the specified index is set to ch.
- void setLength(int newLength)
- CharSequence subSequence(int start, int end)
- String substring(int start)
- String substring(int start, int end)
- String toString()
- void trimToSize()
Inherited methods
- Object
  - clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
- CharSequence
  - chars, codePoints
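Thread safety
- Many of its methods are declared synchronized, which makes it thread-safe
Capacity growth
- The no-argument constructor starts with capacity 16; constructing from a string starts with the string's length + 16
- Growth is triggered when append() is called and the resulting length exceeds the current capacity; the new capacity is 2 × the old capacity + 2 (the +2 avoids problems such as 0 shifted left by 1 still being 0)
- If the grown capacity still cannot hold the content after the append, the capacity is set directly to the required size
Shrinking
- Shrinking must be requested manually by calling trimToSize(), similar to containers such as ArrayList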
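
The growth and shrink rules can be watched through capacity(). This is a sketch that assumes an OpenJDK-style implementation; the exact growth on append is an implementation detail, so other runtimes could report different numbers. CapacityDemo and capacityAfterAppending are hypothetical names.

```java
class CapacityDemo {
    // Appends `chars` characters one at a time to an empty buffer and reports the capacity.
    static int capacityAfterAppending(int chars) {
        StringBuffer sb = new StringBuffer();   // initial capacity 16
        for (int i = 0; i < chars; i++) sb.append('x');
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(new StringBuffer().capacity());       // 16
        System.out.println(new StringBuffer("abc").capacity());  // 19 = 3 + 16
        System.out.println(capacityAfterAppending(17));          // 34 = 16 * 2 + 2

        StringBuffer big = new StringBuffer();
        big.append(new char[100]);              // 2 * 16 + 2 is too small, so grow to the need
        System.out.println(big.capacity());     // 100

        StringBuffer sb = new StringBuffer("abc");
        sb.trimToSize();                        // manual shrink down to the content
        System.out.println(sb.capacity());      // 3
    }
}
```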

StringBuilder

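Underlying data structure: char[] value; inherited from AbstractStringBuilder
Class declaration: public final class StringBuilder extends AbstractStringBuilder implements Serializable, CharSequence
Constructors: same as StringBuffer
Common methods: same as StringBuffer, but without synchronized
Thread safety: not thread-safe; the array is mutable and the methods take no lock
Capacity growth: same as StringBuffer, since both inherit the logic from AbstractStringBuilder (new capacity = 2 × old capacity + 2)
Shrinking: same as StringBuffer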
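
To illustrate why the synchronization matters, the sketch below shares one StringBuffer between two threads; because append is synchronized, no append is lost. Doing the same with a StringBuilder carries no such guarantee, so it is deliberately not run here. The class SharedBufferDemo and the method appendFromTwoThreads are illustrative names.

```java
class SharedBufferDemo {
    // Two threads each append `perThread` characters to one shared StringBuffer.
    static int appendFromTwoThreads(int perThread) {
        StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) shared.append('x');
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromTwoThreads(10_000)); // 20000
    }
}
```

In single-threaded code, which is the common case for string building, StringBuilder is the usual choice because it avoids the locking cost.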

CharSequence

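An interface representing a sequence of characters; programming against it makes methods that operate on character sequences more flexible and general
Common methods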
- char charAt(int index)
- default IntStream chars()
- default IntStream codePoints()
- int length()
- CharSequence subSequence(int start, int end)
- String toString()
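
A brief sketch of the generality mentioned above: one method written against CharSequence accepts String, StringBuilder and StringBuffer alike. The class CharSequenceDemo and the method count are names chosen for this example.

```java
class CharSequenceDemo {
    // Counts occurrences of `target` using only the CharSequence interface.
    static int count(CharSequence cs, char target) {
        int n = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) == target) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("banana", 'a'));                    // 3
        System.out.println(count(new StringBuilder("banana"), 'n')); // 2
        System.out.println(count(new StringBuffer("banana"), 'b'));  // 1
    }
}
```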