Java Fundamentals Review 1: String

weixin_30666753

Original link: http://www.cnblogs.com/ssj234/p/6387966.html

Key points

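Strings are immutable. In the JDK source analyzed here, a String holds a final char[] that stores its characters and must be initialized in the constructor.
String literals and compile-time constant strings are stored in the constant pool in the method area when the class is loaded.
A code point represents one complete character and is an int value (32 bits), whereas a char is only 16 bits. A code point in the range 0000-FFFF fits in a single char; anything above FFFF is represented with surrogates. Unicode defines far more than 65536 code points (the range is 0-10FFFF), yet those above FFFF must still be encoded in 16-bit units. The Unicode standard therefore sets aside 2048 of the 65536 values as "surrogates" and uses them in pairs to represent the code points above FFFF.
Converting a byte array to a string requires a charset; the decoding is done by StringDecoder.
substring returns a new String instance unless the requested range covers the whole string. In the source analyzed here, the new instance shares the original char[] and only gets a different offset and count; no characters are copied.
The hashCode formula is: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]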
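
The formula can be checked against the real String.hashCode(). The following is a minimal sketch; the class name HashFormulaDemo and the helper polynomialHash are made up for illustration and are not part of the JDK.

```java
final class HashFormulaDemo {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with Horner's rule.
    // int overflow wraps around, which is also how String.hashCode() behaves.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99, n = 3
        int expanded = 97 * 31 * 31 + 98 * 31 + 99;
        System.out.println(expanded == "abc".hashCode());                              // true
        System.out.println(polynomialHash("hello world") == "hello world".hashCode()); // true
    }
}
```

Horner's rule is the same loop shape that the hashCode source further down uses (h = 31*h + val[off++]), so the two forms are just different ways of writing one polynomial.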

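JDK description

public final class String extends Object implements Serializable, Comparable<String>, CharSequence

The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class.

Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable, they can be shared. For example: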

 String str = "abc";

is equivalent to:

 char data[] = {'a', 'b', 'c'};
 String str = new String(data);

Here are some more examples of how strings can be used:

 System.out.println("abc");
 String cde = "cde";
 System.out.println("abc" + cde);
 String c = "abc".substring(2,3);
 String d = cde.substring(1, 2);

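The String class includes methods for examining individual characters of the sequence, for comparing strings, for searching strings, for extracting substrings, and for creating a copy of a string with all characters translated to uppercase or to lowercase. Case mapping is based on the Unicode Standard version specified by the Character class.

The Java language provides special support for the string concatenation operator ("+") and for conversion of other objects to strings.
String concatenation is implemented through the StringBuilder (or StringBuffer) class and its append method. String conversions are implemented through the toString method, which is defined by Object and inherited by all classes in Java. For additional information on string concatenation and conversion, see Gosling, Joy, and Steele, The Java Language Specification.

Unless otherwise noted, passing a null argument to a constructor or method in this class causes a NullPointerException to be thrown.

A String represents a string in the UTF-16 format, in which supplementary characters are represented by surrogate pairs (see the section on Unicode character representations in the Character class for details). Index values refer to char code units, so a supplementary character occupies two positions in a String.

The String class provides methods for dealing with Unicode code points (i.e., characters) as well as with Unicode code units (i.e., char values).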
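
To make the difference between code units and code points concrete, here is a small sketch. The class name CodeUnitDemo and the constant SAMPLE are illustrative only; U+1F600 is used simply because it lies outside the Basic Multilingual Plane.

```java
final class CodeUnitDemo {
    // U+1F600 is a supplementary character, so UTF-16 stores it as a surrogate pair.
    static final String SAMPLE = "a" + new String(Character.toChars(0x1F600)) + "b";

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                            // 4 char code units
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));  // 3 code points
        System.out.println(Character.isHighSurrogate(SAMPLE.charAt(1))); // true
        System.out.println(Integer.toHexString(SAMPLE.codePointAt(1))); // 1f600
    }
}
```

Because indexes count char values, charAt(1) returns only the first half of the pair, while codePointAt(1) reads both halves and returns the full character.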

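Code analysis

The String class implements three interfaces:

Serializable: supports serialization.
Comparable: provides the compareTo method, used to compare this string with another string.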
```
  public int compareTo(String anotherString)
```

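CharSequence: a readable sequence of char values, with the following methods:

  charAt(int index) returns the char value at the specified index
  length() returns the length of this character sequence
  subSequence(int start, int end) returns a new CharSequence that is a subsequence of this sequence
  toString() returns a string containing the characters in this sequence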
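
As a rough illustration of two of these interfaces, the sketch below calls compareTo and passes both a String and a StringBuilder to a method that only needs a CharSequence. InterfaceDemo and countDigits are hypothetical names chosen for this example.

```java
final class InterfaceDemo {
    // Works for any CharSequence (String, StringBuilder, StringBuffer, ...)
    // because it only relies on length() and charAt().
    static int countDigits(CharSequence cs) {
        int n = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (Character.isDigit(cs.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // First mismatch is 'c' vs 'd': result is 'c' - 'd'
        System.out.println("abc".compareTo("abd"));   // -1
        // One string is a prefix of the other: result is the length difference
        System.out.println("abc".compareTo("abcde")); // -2
        System.out.println(countDigits("a1b22"));                    // 3
        System.out.println(countDigits(new StringBuilder("2024")));  // 4
    }
}
```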

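Fields of String

private final char value[] stores the characters
private final int offset index in value of the first character that belongs to this string
private final int count number of characters
private int hash caches the hash code of the string; defaults to 0
private static final long serialVersionUID = -6849794470754667710L
private static final ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0]; declares the Serializable fields of the class; the array is empty

The char array, offset, and count are all final: they must be initialized in a constructor and cannot be reassigned afterwards.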

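Constructors

public String() The no-argument constructor: value is a zero-length array, and offset and count are both 0. The main job of every constructor is to set these three fields.

public String() {
    this.offset = 0;
    this.count = 0;
    this.value = new char[0];
}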

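public String(String original) Creates a new string from the given string. If the length of original's value array is greater than its count (which can happen when original shares a larger array, for example a string produced by substring), the relevant characters are copied out of the source array. If the length of value equals count, the new string's value simply points to the source string's array. Sharing is safe because value is final and a String never modifies its array after construction; if the contents could change, a copy would be needed here as well, otherwise changes to the source would affect the new string.

public String(String original) {
    int size = original.count;
    char[] originalValue = original.value; // backing array of original
    char[] v;
    if (originalValue.length > size) { // the backing array is larger than count
        int off = original.offset; // may be non-zero, e.g. for a substring
        v = Arrays.copyOfRange(originalValue, off, off + size); // copy size chars starting at original's offset
    } else {
        v = originalValue; // array length equals count (it can never be smaller), so share the array
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}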

public String(char valueOri[]) sets offset and count, and assigns a copy of valueOri to value using Arrays.copyOf(value, size)
public String(char value[], int offset, int count) uses Arrays.copyOfRange(value, offset, offset+count);
public String(int[] codePoints, int offset, int count) builds the string from an array of Unicode code points:

public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[] (how many chars the code points need)
        int n = count; // start from count; every code point above FFFF needs two chars, so n++ for each
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c)) // in 0000-FFFF, fits in a single 16-bit char
                continue;
            else if (Character.isValidCodePoint(c)) // valid code points lie in 0-10FFFF
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[] 
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c)) // in 0-FFFF
                v[j] = (char) c;
            else
                Character.toSurrogates(c, v, j++); // above FFFF: write a surrogate pair
        }
        }

        this.value  = v;
        this.count  = n;
        this.offset = 0;
    }
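
The two-pass logic above can be observed from the outside through the public constructor. This is only a sketch; CodePointCtorDemo and fromCodePoints are hypothetical names, and the code points were picked arbitrarily (one ASCII, one supplementary, one CJK).

```java
final class CodePointCtorDemo {
    // Thin wrapper around the public String(int[], int, int) constructor.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // 'J' and U+53F2 are in the BMP (one char each); U+1F600 needs a surrogate pair.
        String s = fromCodePoints('J', 0x1F600, 0x53F2);
        System.out.println(s.length());                        // 4 chars for 3 code points
        System.out.println(s.codePointCount(0, s.length()));   // 3

        try {
            fromCodePoints(0x110000); // above 10FFFF, not a valid code point
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```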

public String(byte ascii[], int hibyte, int offset, int count)
public String(byte ascii[], int hibyte) The byte/hibyte constructors are deprecated; use a Charset (see java.nio.charset) instead.

Decoding a byte array into a string

public String(byte bytes[], int offset, int length, String charsetName)
public String(byte bytes[], int offset, int length, Charset charset)
public String(byte bytes[], String charsetName) decodes the byte array using the named charset
public String(byte bytes[], int offset, int length) uses the default charset, obtained from Charset.defaultCharset().name(), which is determined by the file.encoding system property
public String(byte bytes[]) default charset, with offset=0 and length=bytes.length

public String(byte bytes[], int offset, int length, String charsetName)
        throws UnsupportedEncodingException
    {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length); // validate the arguments
        // StringCoding is a package-private class, so it cannot be used from other packages; decode has to look up the charset by name
        char[] v = StringCoding.decode(charsetName, bytes, offset, length);
        this.offset = 0;
        this.count = v.length;
        this.value = v;
    }
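
A short sketch of decoding through the public constructors follows. DecodeDemo, BYTES, and decodeOrNull are hypothetical names; the byte values are the UTF-8 encoding of U+53F2 followed by two ASCII letters.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // UTF-8 bytes of U+53F2 (E5 8F B2) followed by ASCII "ab"
    static final byte[] BYTES = {(byte) 0xE5, (byte) 0x8F, (byte) 0xB2, 'a', 'b'};

    // Decodes by charset name; returns null if the name is not supported.
    static String decodeOrNull(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, 0, bytes.length, charsetName);
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String viaName = decodeOrNull(BYTES, "UTF-8");
        String viaCharset = new String(BYTES, StandardCharsets.UTF_8);
        System.out.println(viaName.equals(viaCharset));             // true
        System.out.println(viaName.length());                       // 3 chars from 5 bytes
        System.out.println(decodeOrNull(BYTES, "no-such-charset")); // null
    }
}
```

The overloads that take a Charset object avoid the checked UnsupportedEncodingException, which is one reason they are often more convenient than the charsetName variants.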

public String(StringBuffer buffer) calls buffer.toString() and takes the value, offset, and count of the result
public String(StringBuilder builder) same as for StringBuffer

Other methods

public int length() returns count
public boolean isEmpty() checks whether count equals 0
public char charAt(int index) returns the character at index, read from the value array
public int hashCode() returns the hash code

public char charAt(int index) {
        if ((index < 0) || (index >= count)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index + offset]; // not simply value[index], because offset may be non-zero
    }
public int hashCode() {
        int h = hash;
        if (h == 0 && count > 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

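Comparing content

public boolean equals(Object anObject) checks for equality
public boolean contentEquals(StringBuffer sb)
public boolean contentEquals(CharSequence cs)
public boolean equalsIgnoreCase(String anotherString) ignores case by comparing the results of Character.toUpperCase and of Character.toLowerCase; CharacterData is an abstract class whose subclasses, such as CharacterData00 and CharacterData01, define the case conversions
public int compareTo(String anotherString) implements the Comparable interface; compares the two strings lexicographically, char by char starting from each string's offset, and returns the difference of the first pair of differing chars, or the difference of the lengths if one string is a prefix of the other
public boolean regionMatches(int, String, int, int) checks whether a region of this string equals a region of another string
public boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)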

public boolean equals(Object anObject) {
        if (this == anObject) { // same reference
            return true;
        }
        if (anObject instanceof String) { // the argument must also be a String
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count) { // compare lengths first
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0) { // n counts down, but the chars are compared front to back
                    if (v1[i++] != v2[j++])
                        return false;
                }
                return true;
            }
        }
        return false;
    }
    
public boolean contentEquals(CharSequence cs) {
        if (count != cs.length())
            return false;
        // AbstractStringBuilder has two subclasses: StringBuffer and StringBuilder
        if (cs instanceof AbstractStringBuilder) {
            char v1[] = value;
            char v2[] = ((AbstractStringBuilder)cs).getValue();
            int i = offset;
            int j = 0;
            int n = count;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
        // Argument is a String
        if (cs.equals(this)) // calls cs's own equals: String.equals if cs is a String, otherwise usually Object's identity check
            return true;
        // Argument is a generic CharSequence: compare char by char
        char v1[] = value;
        int i = offset;
        int j = 0;
        int n = count;
        while (n-- != 0) {
            if (v1[i++] != cs.charAt(j++))
                return false;
        }
        return true;
    }
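
The practical difference between equals and contentEquals shows up when the other object is not a String. The following sketch uses a hypothetical class ContentEqualsDemo with a helper compare that returns both results side by side.

```java
final class ContentEqualsDemo {
    // Returns { s.equals(cs), s.contentEquals(cs) } for side-by-side comparison.
    static boolean[] compare(String s, CharSequence cs) {
        return new boolean[] { s.equals(cs), s.contentEquals(cs) };
    }

    public static void main(String[] args) {
        boolean[] r = compare("abc", new StringBuilder("abc"));
        System.out.println(r[0]); // false: a StringBuilder is not a String
        System.out.println(r[1]); // true: same sequence of chars

        System.out.println("abc".equalsIgnoreCase("ABC"));                        // true
        // Compare "World" (starting at index 6) with "WORLD", ignoring case
        System.out.println("Hello World".regionMatches(true, 6, "WORLD", 0, 5));  // true
    }
}
```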

Code point related methods

A complete Unicode character is called a code point.

public int codePointAt(int index) returns the Unicode code point at index; for example, 史 is 53f2
public int codePointBefore(int index)
public int codePointCount(int beginIndex, int endIndex)
public int offsetByCodePoints(int index, int codePointOffset)

public int codePointAt(int index) {
        if ((index < 0) || (index >= count)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointAtImpl(value, offset + index, offset + count);
    }
// Mainly checks whether the char at index starts a surrogate pair
static int codePointAtImpl(char[] a, int index, int limit) {
        char c1 = a[index++];
        if (isHighSurrogate(c1)) { // range \uD800-\uDBFF
            if (index < limit) {
                char c2 = a[index];
                if (isLowSurrogate(c2)) { // range \uDC00-\uDFFF
                    return toCodePoint(c1, c2);
                }
            }
        }
        return c1;
    }
// Combines the high and low surrogates into one code point
public static int toCodePoint(char high, char low) {
        return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                       - (MIN_HIGH_SURROGATE << 10)
                                       - MIN_LOW_SURROGATE);
    }
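
The arithmetic in toCodePoint can be written in a more explicit form: subtract each surrogate's base, shift the high part by 10 bits, and add 0x10000. The sketch below (SurrogateMathDemo, split, and combine are hypothetical names) performs the conversion by hand in both directions so it can be compared with Character.toChars and Character.toCodePoint.

```java
final class SurrogateMathDemo {
    // Splits a supplementary code point (0x10000-0x10FFFF) into { high, low }.
    static char[] split(int codePoint) {
        int v = codePoint - 0x10000;                 // 20-bit value
        char high = (char) (0xD800 + (v >> 10));     // upper 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));    // lower 10 bits
        return new char[] { high, low };
    }

    // Inverse of split: rebuilds the code point from a surrogate pair.
    static int combine(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char[] pair = split(0x1F600);
        System.out.println(Integer.toHexString(pair[0]));                   // d83d
        System.out.println(Integer.toHexString(pair[1]));                   // de00
        System.out.println(combine(pair[0], pair[1]) == 0x1F600);           // true
        System.out.println(combine(pair[0], pair[1])
                == Character.toCodePoint(pair[0], pair[1]));                // true
    }
}
```

The JDK version folds the three constants into a single precomputed offset, which is why its expression looks different while producing the same value.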

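Getting char and byte arrays

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) copies characters from this string into the dst array
public byte[] getBytes(String charsetName) encodes the string into a byte array using the named charset
public byte[] getBytes(Charset charset) encodes the string into a byte array using the given charset
public byte[] getBytes() encodes the string into a byte array using the platform default charset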
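
A brief sketch of these methods, assuming nothing beyond the standard library; EncodeDemo and encodedLength are names invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodeDemo {
    // Number of bytes the string occupies in the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "\u53F2a"; // U+53F2 followed by ASCII 'a'
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));    // 3 + 1 = 4
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE)); // 2 + 2 = 4
        // ISO-8859-1 cannot represent U+53F2; getBytes substitutes '?'
        System.out.println((char) s.getBytes(StandardCharsets.ISO_8859_1)[0]); // ?

        // getChars copies chars [srcBegin, srcEnd) into dst starting at dstBegin
        char[] dst = {'-', '-', '-', '-'};
        "hello".getChars(1, 3, dst, 1);
        System.out.println(new String(dst)); // -el-
    }
}
```

Passing an explicit charset keeps the result independent of the platform default, which getBytes() without arguments relies on.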

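Position-related methods

public boolean startsWith(String prefix) checks whether the string starts with prefix; calls the two-argument method below with toffset = 0
public boolean endsWith(String suffix) checks whether the string ends with suffix; calls the two-argument method below with toffset = count - suffix.count
public boolean startsWith(String prefix, int toffset) checks whether the string, starting at toffset, matches prefix, comparing prefix.length() chars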
public int indexOf(int ch)
public int indexOf(int ch, int fromIndex)
public int lastIndexOf(int ch)
public int lastIndexOf(int ch, int fromIndex)
public int indexOf(String str) calls the method below with fromIndex = 0
public int indexOf(String str, int fromIndex)
public int lastIndexOf(String str)
public int lastIndexOf(String str, int fromIndex)
public String substring(int beginIndex)
public String substring(int beginIndex, int endIndex)
public CharSequence subSequence(int beginIndex, int endIndex)
public String concat(String str) appends str to the end of this string
public String replace(char oldChar, char newChar) creates a buffer, replaces the characters in it, and builds a new string from it
public boolean contains(CharSequence s) returns indexOf(s.toString()) > -1

// If beginIndex is 0 and endIndex equals count, this string itself is returned (bounds checks omitted here)
public String substring(int beginIndex, int endIndex) {
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }
    
public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) { // str is empty: return this
            return this;
        }
        char buf[] = new char[count + otherLen]; // new array for the result
        getChars(0, count, buf, 0); // copy this string's chars into buf[0, count)
        str.getChars(0, otherLen, buf, count); // copy str's chars into buf[count, count + otherLen)
        return new String(0, count + otherLen, buf);
    }
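
From the caller's side, the behavior of substring and concat looks as follows. This is a minimal sketch with a made-up class name, SubstringConcatDemo; the only identity check it makes is the one the concat documentation promises (an empty argument returns the same object).

```java
final class SubstringConcatDemo {
    public static void main(String[] args) {
        String s = "hello";

        // substring takes [beginIndex, endIndex): chars at index 1 and 2
        System.out.println(s.substring(1, 3));        // el
        System.out.println(s.substring(2));           // llo

        // concat with an empty string returns the receiver itself
        System.out.println(s.concat("") == s);        // true
        // otherwise a new string is built; the receiver is unchanged
        System.out.println(s.concat(" world"));       // hello world
        System.out.println(s);                        // hello
    }
}
```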

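Common operations

public boolean matches(String regex) checks whether the whole string matches a regular expression, e.g. g.matches("^a[0-9]+")
public String replaceFirst(String regex, String replacement) replaces the first match of the regex
public String replaceAll(String regex, String replacement) replaces every match of the regex
public String replace(CharSequence target, CharSequence replacement)
public String[] split(String regex) splits around matches of regex; calls the method below with limit = 0
public String[] split(String regex, int limit) splits around matches of regex, with limit controlling how often the pattern is applied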

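// The limit parameter controls how many times the pattern is applied and therefore affects the length of the resulting array.
// If limit n is greater than 0, the pattern is applied at most n-1 times, the array length is no greater than n, and the last element contains all input beyond the last matched delimiter.
// If limit is negative, the pattern is applied as many times as possible and the array can have any length.
// If limit is 0, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.
The string "boo:and:foo", for example, yields the following results with these parameters:
Regex  Limit  Result
:      2      { "boo", "and:foo" }            applied at most 2-1 = 1 time
:      5      { "boo", "and", "foo" }         applied at most 4 times
:      -2     { "boo", "and", "foo" }         applied as many times as possible
o      5      { "b", "", ":and:f", "", "" }   applied at most 4 times
o      -2     { "b", "", ":and:f", "", "" }   applied as many times as possible
o      0      { "b", "", ":and:f" }           applied as many times as possible; trailing empty strings removed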
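
The table can be reproduced with a few lines of code. SplitLimitDemo and show are hypothetical helper names; the input string and the regex/limit combinations are the ones from the table.

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // Splits the sample string and renders the resulting array as text.
    static String show(String regex, int limit) {
        return Arrays.toString("boo:and:foo".split(regex, limit));
    }

    public static void main(String[] args) {
        System.out.println(show(":", 2));   // [boo, and:foo]
        System.out.println(show(":", 5));   // [boo, and, foo]
        System.out.println(show(":", -2));  // [boo, and, foo]
        System.out.println(show("o", 5));   // [b, , :and:f, , ]
        System.out.println(show("o", -2));  // [b, , :and:f, , ]
        System.out.println(show("o", 0));   // [b, , :and:f]
    }
}
```

Note that Arrays.toString prints empty strings as nothing between the commas, so the trailing ", , " in the "o" rows corresponds to the two empty strings in the table.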

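public String toLowerCase() uses Locale.getDefault() as the locale
public String toLowerCase(Locale locale)
public String toUpperCase()
public String toUpperCase(Locale locale)
public String trim() strips leading and trailing characters that are not greater than ' ' (the space character). It scans forward from index 0 for the first char greater than ' ' (position st), scans backward from the end for the last such char (position len), and finally calls substring(st, len)
public String toString() returns this
public char[] toCharArray() returns a copy of the string's characters
public static String format(String format, Object ... args) formats the arguments according to the format string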
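
A few of these methods in action, as a sketch only. TrimCaseDemo is a made-up class name, and the Turkish locale is used merely as a well-known case where the locale changes the result of toLowerCase.

```java
import java.util.Locale;

final class TrimCaseDemo {
    public static void main(String[] args) {
        // trim removes every leading/trailing char <= ' ', including tab and newline
        System.out.println("[" + "\t abc \n".trim() + "]");             // [abc]

        // Case mapping depends on the locale: in Turkish, 'I' lowercases to dotless U+0131
        System.out.println("TITLE".toLowerCase(Locale.ROOT));           // title
        String turkish = "TITLE".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("t\u0131tle"));               // true

        // An explicit locale keeps the decimal separator predictable
        System.out.println(String.format(Locale.ROOT, "%05d|%-4s|%.2f", 42, "ab", 3.14159));
        // 00042|ab  |3.14
    }
}
```

Because the no-argument toLowerCase and toUpperCase use Locale.getDefault(), code that processes identifiers or protocol keywords usually passes Locale.ROOT explicitly.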

Conversion from other types

// Converts an object to a string by calling its toString method; null becomes "null"
public static String valueOf(Object obj) {
    return (obj == null) ? "null" : obj.toString();
}
// Converts a char array to a string; a new String is created
public static String valueOf(char data[]) {
    return new String(data);
}
public static String valueOf(char data[], int offset, int count) {
    return new String(data, offset, count);
}
public static String copyValueOf(char data[], int offset, int count) {
    // All public String constructors now copy the data.
    return new String(data, offset, count);
}
public static String copyValueOf(char data[]) {
    return copyValueOf(data, 0, data.length);
}
// boolean to String
public static String valueOf(boolean b) {
    return b ? "true" : "false";
}
// char to String
public static String valueOf(char c) {
    char data[] = {c};
    return new String(0, 1, data);
}
// int to String
public static String valueOf(int i) {
    return Integer.toString(i);
}
// long to String
public static String valueOf(long l) {
    return Long.toString(l);
}
// float to String
public static String valueOf(float f) {
    return Float.toString(f);
}
// double to String
public static String valueOf(double d) {
    return Double.toString(d);
}
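
Two details of valueOf are easy to verify: a null Object becomes the four-character string "null", and the char[] overload copies its input. The sketch below uses a hypothetical class name, ValueOfDemo.

```java
final class ValueOfDemo {
    public static void main(String[] args) {
        Object nothing = null;
        // Resolves to valueOf(Object), which maps null to the text "null"
        System.out.println(String.valueOf(nothing));          // null
        System.out.println(String.valueOf(nothing).length()); // 4

        // The char[] overload copies the data, so later changes do not leak in
        char[] data = {'a', 'b', 'c'};
        String s = String.valueOf(data);
        data[0] = 'x';
        System.out.println(s);                                // abc

        System.out.println(String.valueOf(true));             // true
        System.out.println(String.valueOf(42));               // 42
        System.out.println(String.valueOf(3.0f));             // 3.0
    }
}
```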

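public native String intern(); The String class maintains a pool of strings. When intern is called, if the pool already contains an equal string, the pooled instance is returned; otherwise this string is added to the pool and a reference to it is returned.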
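
The effect of intern is visible through reference comparison. A minimal sketch, with InternDemo as an invented class name:

```java
final class InternDemo {
    public static void main(String[] args) {
        String literal = "abc";           // literals are interned automatically
        String fresh = new String("abc"); // a distinct object with the same content

        System.out.println(fresh == literal);          // false: different objects
        System.out.println(fresh.equals(literal));     // true: same content
        System.out.println(fresh.intern() == literal); // true: pooled instance returned
    }
}
```

This is also why == should not be used to compare string contents: it only happens to work when both operands refer to the pooled instance.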

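Summary

Format String Syntax [https://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html#syntax]
Java Locale [http://blog.csdn.net/love_xsq/article/details/41908977]
Unicode explained in detail: http://www.jianshu.com/p/07b578adfbf8
How StringCoding.decode looks up the charset
For the concept of surrogates, see UTF-16.
Internally, Java represents character data in UTF-16, because the char type is 16 bits wide. A char can take 65536 values, i.e. 65536 numbers, each of which can stand for one character. However, Unicode contains far more characters than that (code points range over 0-10FFFF). How can code points above FFFF still be encoded with 16-bit units? The Unicode standard's answer is to set aside 2048 of those 65536 values as "surrogates" and to use them in pairs to represent the code points above FFFF.
More precisely, U+D800 through U+DBFF are designated "high surrogates" (1024 values), and U+DC00 through U+DFFF are designated "low surrogates" (also 1024 values). Used in high/low pairs, they can represent another 1024 * 1024 = 1048576 characters.
Arrays.copyOfRange and System.arraycopy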

Reposted from: https://www.cnblogs.com/ssj234/p/6387966.html
