Chapter 13: Strings
- Immutability of String
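- Java does not let programmers overload operators; + and += on String are the only two overloaded operators. When + is used to concatenate strings, the compiler (not the JVM) rewrites the expression automatically: javac up to Java 8 emits StringBuilder calls, and since Java 9 it usually emits an invokedynamic call instead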
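As a minimal sketch of the point above, the following compares concatenation with += against an explicit StringBuilder. The class and method names (ConcatDemo, joinWithPlus, joinWithBuilder) are made up for illustration; both methods return the same text, but the loop with += creates a new String on every pass, while the builder reuses one buffer.

```java
class ConcatDemo {
    // Each += builds a brand-new String; the old one is left unchanged.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p;
        }
        return result;
    }

    // One mutable buffer is reused for the whole loop.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"mango", "def", "47"};
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));

        // Immutability: t still refers to the original "abc" after s += ...
        String s = "abc";
        String t = s;
        s += "def";
        System.out.println(t + " / " + s);
    }
}
```

In a loop, the explicit builder is generally the safer choice, because the automatic rewrite is applied per expression rather than across iterations.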
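- StringBuilder/StringBuffer
- Both are mutable
- StringBuilder is not thread-safe and is therefore faster, whereas StringBuffer is thread-safe
- When no capacity is passed to the constructor, the internal character array is created with a default capacity of 16 characters
- Commonly used methods include insert(), replace(), substring(), reverse(), append(), toString(), and delete()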
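The methods listed above can be sketched in a short chain of edits. BuilderDemo and its method names are hypothetical; the comments show the buffer contents after each call.

```java
class BuilderDemo {
    // Applies several in-place edits to one StringBuilder.
    static String edit() {
        StringBuilder sb = new StringBuilder("hello");
        sb.append(" world");        // "hello world"
        sb.insert(0, ">> ");        // ">> hello world"
        sb.replace(3, 8, "HELLO");  // ">> HELLO world"
        sb.delete(0, 3);            // "HELLO world"
        sb.reverse();               // "dlrow OLLEH"
        return sb.toString();
    }

    // substring() returns a new String and leaves the builder untouched.
    static String firstFive() {
        StringBuilder sb = new StringBuilder("hello world");
        return sb.substring(0, 5);
    }

    // Capacity of a builder created without arguments.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    public static void main(String[] args) {
        System.out.println(edit());
        System.out.println(firstFive());
        System.out.println(defaultCapacity());
    }
}
```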
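- Overriding toString() can easily cause unintended recursion

```java
import java.util.ArrayList;
import java.util.List;

import org.junit.Test;

public class InfiniteRecurison {
    @Override
    public String toString() {
        // Buggy version: concatenating `this` calls this.toString() again,
        // so the method recurses until the stack overflows.
        // return " InifiniteRecurison address: " + this;
        return " InifiniteRecurison address: " + super.toString();
    }

    @Test
    public void test() {
        List<InfiniteRecurison> v = new ArrayList<InfiniteRecurison>();
        for (int i = 0; i < 10; i++) {
            v.add(new InfiniteRecurison());
        }
        System.out.println(v);
    }
}
```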
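- Operations on a String return a new String object; if the content does not change, they typically return a reference to the original object (a few methods are listed below; see the source code for the rest)
- valueOf(): returns a String representing the argument; for example, valueOf(true) returns the string "true"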
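- intern(): returns the canonical representation of the string object. A pool of strings, initially empty, is maintained privately by the class String. When intern() is called, if the pool already contains a string equal to this String object (as determined by equals(Object)), the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to it is returned.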
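The behaviour described above can be checked with reference comparisons. This is a small sketch with a hypothetical class name (InternDemo); it relies only on the documented behaviour of trim(), concat(), valueOf(), and intern().

```java
class InternDemo {
    // Returns { literal == built, literal.equals(built), literal == built.intern() }.
    static boolean[] compare() {
        String literal = "hello";            // literals come from the string pool
        String built = new String("hello");  // a distinct object with equal content
        return new boolean[] {
            literal == built,
            literal.equals(built),
            literal == built.intern()
        };
    }

    // trim() and concat("") are documented to return this when nothing changes.
    static boolean sameWhenUnchanged() {
        String s = "abc";
        return s.trim() == s && s.concat("") == s;
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
        System.out.println(sameWhenUnchanged());
        System.out.println(String.valueOf(true));
    }
}
```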
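- Formatted output: the Formatter class
- Formatter output is right-aligned by default; use the - flag to left-align it
- width: controls the minimum size of a field
- precision:
- Applied to a String: the maximum number of characters to output
- Applied to a floating-point number: the number of decimal places to display (6 by default)
- Applied to an integer: throws an exception (IllegalFormatPrecisionException)
- Conversion characters: note that %b yields true for any non-null argument that is not a boolean, so formatting 0 with %b produces true
- String.format() is a static method that accepts the same arguments as Formatter.format() but simply returns a String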
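The width, precision, and %b rules above might be exercised as follows. FormatDemo and its helper names are invented for this sketch, and Locale.ROOT is passed explicitly (an assumption made here so that the decimal separator does not depend on the machine's default locale).

```java
import java.util.Formatter;
import java.util.IllegalFormatPrecisionException;
import java.util.Locale;

class FormatDemo {
    // %-10.5s : left-aligned, width 10, at most 5 characters of the string
    // %5d     : right-aligned integer, width 5
    // %8.2f   : right-aligned, width 8, 2 decimal places
    static String row(String name, int qty, double price) {
        return String.format(Locale.ROOT, "%-10.5s|%5d|%8.2f", name, qty, price);
    }

    // The same kind of format string, written through a Formatter into a StringBuilder.
    static String viaFormatter(int value) {
        StringBuilder sb = new StringBuilder();
        try (Formatter f = new Formatter(sb, Locale.ROOT)) {
            f.format("%5d", value);
        }
        return sb.toString();
    }

    // %b applied to a non-null, non-boolean argument.
    static String booleanOfZero() {
        return String.format("%b", 0);
    }

    // A precision on an integer conversion is rejected at run time.
    static boolean precisionOnIntFails() {
        try {
            String.format("%5.2d", 4);
            return false;
        } catch (IllegalFormatPrecisionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(row("Jack's Magic Beans", 4, 4.25));
        System.out.println(viaFormatter(42));
        System.out.println(booleanOfZero());
        System.out.println(precisionOnIntFails());
    }
}
```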
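- Regular expressions (Java only)
- The meaning of \\ in Java versus other languages
- Other languages: \\ means "I want to insert a plain backslash into the regular expression; do not give it any special meaning"
- Java: \\ means "I am inserting a regular-expression backslash, so the character that follows has a special meaning". For example, a single digit is \\d and a plain backslash is \\\\; newline and tab, however, need only a single backslash: \n, \t, ...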
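- '+': one or more of the preceding expression
- '?': zero or one of the preceding expression [-? means zero or one minus sign]
- Parentheses group expressions
- '|': the OR operation [(-|\\+)? means zero or one - or +]
- \W matches a non-word character and \w matches a word character (written \\W and \\w in a Java string literal)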
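A brief sketch of the escaping and quantifier rules above, using String.matches(). The class RegexBasics and its method names are hypothetical.

```java
class RegexBasics {
    // Optional sign followed by one or more digits.
    static boolean isInteger(String s) {
        return s.matches("(-|\\+)?\\d+");
    }

    // Only an optional minus sign is allowed here.
    static boolean isIntegerMinusOnly(String s) {
        return s.matches("-?\\d+");
    }

    // Four backslashes in Java source = one literal backslash in the input.
    static boolean hasSingleBackslashBetweenAandB(String s) {
        return s.matches("a\\\\b");
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-1234"));
        System.out.println(isInteger("+911"));
        System.out.println(isIntegerMinusOnly("+911"));
        System.out.println(hasSingleBackslashBetweenAandB("a\\b"));
        System.out.println("!".matches("\\W") + " " + "a".matches("\\w"));
    }
}
```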
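- The String class
- Built-in regular expression support
- split()
- The parts of the input that match the regular expression do not appear in the result
- An overloaded version takes a limit that caps the number of pieces the string is split into
- Replacement
- replaceFirst() replaces only the first substring that matches the expression, while replaceAll() replaces every matching substring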
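The split and replacement methods above could be tried on a small input like this. SplitReplaceDemo and the sample string are assumptions chosen for the example.

```java
import java.util.Arrays;

class SplitReplaceDemo {
    static final String INPUT = "a1b22c333";

    // Splits on runs of digits; the digits themselves are dropped.
    static String splitAll() {
        return Arrays.toString(INPUT.split("\\d+"));
    }

    // With limit 2 the result has at most two elements.
    static String splitLimited() {
        return Arrays.toString(INPUT.split("\\d+", 2));
    }

    static String replaceFirstRun() {
        return INPUT.replaceFirst("\\d+", "#");
    }

    static String replaceEveryRun() {
        return INPUT.replaceAll("\\d+", "#");
    }

    public static void main(String[] args) {
        System.out.println(splitAll());
        System.out.println(splitLimited());
        System.out.println(replaceFirstRun());
        System.out.println(replaceEveryRun());
    }
}
```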
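- Characters:
- B: the specific character B
- \xhh: the character with hexadecimal value 0xhh
- \uhhhh: the Unicode character with hexadecimal value 0xhhhh
- \t: tab; \n: newline; \r: carriage return; \f: form feed; \e: escape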
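- Character classes:
- .: any character
- [abc]: any one of the characters a, b, or c (same as a|b|c)
- [^abc]: any character except a, b, and c
- [a-zA-Z]: any character from a through z or A through Z
- [abc[hij]]: any one of a, b, c, h, i, j (union; same as a|b|c|h|i|j)
- [a-z&&[hij]]: any one of h, i, j (intersection)
- \s: a whitespace character; \S: a non-whitespace character
- \d: a digit [0-9]; \D: a non-digit [^0-9]
- \w: a word character [a-zA-Z_0-9]; \W: a non-word character [^\w]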
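To make the character classes above concrete, here is a minimal sketch that tests single characters against each class. CharClassDemo and its helper are hypothetical names; the last line of main shows that | inside brackets is a literal character, not alternation.

```java
class CharClassDemo {
    // Returns true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]", "b"));          // member of the class
        System.out.println(matches("[^abc]", "d"));         // negation
        System.out.println(matches("[abc[hij]]", "h"));     // union
        System.out.println(matches("[a-z&&[hij]]", "a"));   // intersection excludes 'a'
        System.out.println(matches("\\w", "_"));            // underscore is a word character
        System.out.println(matches("\\x41", "A"));          // hexadecimal escape
        System.out.println(matches("[a|b]", "|"));          // '|' is literal inside brackets
    }
}
```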
```java
/*
 * Copyright (c) 2000, 2003, Oracle and/or its affiliates. All rights reserved.
 * ORACLE PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
 */

package java.lang;

/**
 * A <tt>CharSequence</tt> is a readable sequence of <code>char</code> values. This
 * interface provides uniform, read-only access to many different kinds of
 * <code>char</code> sequences.
 * A <code>char</code> value represents a character in the <i>Basic
 * Multilingual Plane (BMP)</i> or a surrogate. Refer to <a
 * href="Character.html#unicode">Unicode Character Representation</a> for details.
 *
 * <p> This interface does not refine the general contracts of the {@link
 * java.lang.Object#equals(java.lang.Object) equals} and {@link
 * java.lang.Object#hashCode() hashCode} methods. The result of comparing two
 * objects that implement <tt>CharSequence</tt> is therefore, in general,
 * undefined. Each object may be implemented by a different class, and there
 * is no guarantee that each class will be capable of testing its instances
 * for equality with those of the other. It is therefore inappropriate to use
 * arbitrary <tt>CharSequence</tt> instances as elements in a set or as keys in
 * a map. </p>
 *
 * @author Mike McCloskey
 * @since 1.4
 * @spec JSR-51
 */
public interface CharSequence {

    /**
     * Returns the length of this character sequence. The length is the number
     * of 16-bit <code>char</code>s in the sequence.</p>
     *
     * @return the number of <code>char</code>s in this sequence
     */
    int length();

    /**
     * Returns the <code>char</code> value at the specified index. An index ranges from zero
     * to <tt>length() - 1</tt>. The first <code>char</code> value of the sequence is at
     * index zero, the next at index one, and so on, as for array
     * indexing. </p>
     *
     * <p>If the <code>char</code> value specified by the index is a
     * <a href="{@docRoot}/java/lang/Character.html#unicode">surrogate</a>, the surrogate
     * value is returned.
     *
     * @param index the index of the <code>char</code> value to be returned
     *
     * @return the specified <code>char</code> value
     *
     * @throws IndexOutOfBoundsException
     *         if the <tt>index</tt> argument is negative or not less than
     *         <tt>length()</tt>
     */
    char charAt(int index);

    /**
     * Returns a new <code>CharSequence</code> that is a subsequence of this sequence.
     * The subsequence starts with the <code>char</code> value at the specified index and
     * ends with the <code>char</code> value at index <tt>end - 1</tt>. The length
     * (in <code>char</code>s) of the
     * returned sequence is <tt>end - start</tt>, so if <tt>start == end</tt>
     * then an empty sequence is returned. </p>
     *
     * @param start the start index, inclusive
     * @param end the end index, exclusive
     *
     * @return the specified subsequence
     *
     * @throws IndexOutOfBoundsException
     *         if <tt>start</tt> or <tt>end</tt> are negative,
     *         if <tt>end</tt> is greater than <tt>length()</tt>,
     *         or if <tt>start</tt> is greater than <tt>end</tt>
     */
    CharSequence subSequence(int start, int end);

    /**
     * Returns a string containing the characters in this sequence in the same
     * order as this sequence. The length of the string will be the length of
     * this sequence. </p>
     *
     * @return a string consisting of exactly this sequence of characters
     */
    public String toString();
}
```
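- Pattern and Matcher
- Pattern.compile(): compiles a regular expression string into a Pattern object; calling matcher(input) on that Pattern then produces a Matcher object
- Matcher.find(): finds successive matches in a CharSequence; the overloaded find(int start) lets you choose the position at which the search begins
- Matcher.lookingAt() and Matcher.matches() both test the input against the regular expression, with the following rules:
- 1. Both succeed only if the regular expression matches starting at the very beginning of the input
- 2. matches() succeeds only if the entire input matches the regular expression, whereas lookingAt() only needs the leading part of the input to match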
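The difference between find(), lookingAt(), and matches() can be sketched as follows. MatcherDemo, its method names, and the sample sentence are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    // Collects every match that find() reports, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Starts the search at the given index and returns the first match, or null.
    static String findFrom(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(start) ? m.group() : null;
    }

    static boolean lookingAt(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "Evening is full";
        System.out.println(findAll("\\w+", input));
        System.out.println(findFrom("\\w+", input, 8));
        System.out.println(lookingAt("\\w+", input)); // a prefix matches
        System.out.println(matches("\\w+", input));   // the whole input does not
    }
}
```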
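- Matcher.group():
- Groups are parts of a regular expression delimited by parentheses, and each can be referred to by its group number (group 0 is the whole expression)
- groupCount(): the number of groups in the pattern, not counting group 0
- group(): group 0 (the entire match) from the previous match operation
- group(int): the given group from the previous match operation
- start(int group): the start index of the given group in the previous match
- end(int group): the index of the last character of the given group in the previous match, plus one
- Matcher.start()/Matcher.end(): the start index of the previous match and the index just past its last character
- Matcher.reset(): reset(CharSequence) applies an existing Matcher object to a new character sequence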
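A short sketch of the group-related methods and reset(), assuming a made-up year-month pattern; GroupDemo and describeNext are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Group 1 is the four-digit year, group 2 the two-digit month.
    static final Pattern YEAR_MONTH = Pattern.compile("(\\d{4})-(\\d{2})");

    // Describes the next match found by the matcher, or reports that none is left.
    static String describeNext(Matcher m) {
        if (!m.find()) {
            return "no match";
        }
        return m.group() + " year=" + m.group(1) + " month=" + m.group(2)
            + " span=[" + m.start() + "," + m.end() + ")"
            + " monthSpan=[" + m.start(2) + "," + m.end(2) + ")";
    }

    public static void main(String[] args) {
        Matcher m = YEAR_MONTH.matcher("2005-10 and 2006-11");
        System.out.println(m.groupCount());
        System.out.println(describeNext(m));
        System.out.println(describeNext(m));

        // Reuse the same Matcher on a different input.
        m.reset("x 1999-01");
        System.out.println(describeNext(m));
    }
}
```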
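- Scanner: offers many methods for reading strings, integers, and other types, plus hasNext() methods that check whether another token is available
- By default, Scanner splits its input into tokens on whitespace
- The useDelimiter(Pattern) method specifies a different token delimiter
- Regular expressions can be used to scan complex data such as logs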
```java
import java.util.Scanner;
import java.util.regex.MatchResult;

public class ThreatAnalyzer {
    // Sample log data: one "ip@dd/mm/yyyy" record per line.
    static String threatFile =
        "58.27.82.161@02/10/2005\n" +
        "58.27.82.161@02/10/2005\n" +
        "58.27.82.161@02/10/2005\n" +
        "58.27.82.161@02/10/2005\n" +
        "58.27.82.161@02/10/2005\n" +
        "204.45.234.40@20/11/2005";

    public static void main(String[] args) {
        Scanner scanner = new Scanner(threatFile);
        // Group 1 captures the IP address, group 2 the date.
        String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@(\\d{2}/\\d{2}/\\d{4})";
        while (scanner.hasNext(pattern)) {
            scanner.next(pattern);
            // match() exposes the groups of the token that was just read.
            MatchResult result = scanner.match();
            String ip = result.group(1);
            String date = result.group(2);
            System.out.println("IP: " + ip + "; Date: " + date);
        }
    }
}
```
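For the delimiter point above, a minimal sketch: the first method relies on the default whitespace tokenizing, the second swaps in a comma delimiter through useDelimiter(Pattern). ScannerDemo and its method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerDemo {
    // Default behaviour: tokens are separated by whitespace.
    static List<String> words(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    // Custom delimiter: a comma with optional surrounding whitespace.
    static List<Integer> readInts(String input) {
        List<Integer> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(Pattern.compile("\\s*,\\s*"));
            while (sc.hasNextInt()) {
                out.add(sc.nextInt());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Sir Robin of Camelot"));
        System.out.println(readInts("12, 42, 78, 99, 42"));
    }
}
```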
- The StringTokenizer class (a legacy class; regular expressions or Scanner are generally preferred in new code)
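For comparison, this sketch tokenizes the same sentence with StringTokenizer and with String.split(). TokenizerDemo and the sample sentence are assumptions; for simple whitespace-separated input the two approaches give the same tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Legacy approach: iterate with hasMoreTokens()/nextToken().
    static List<String> withTokenizer(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Regex-based approach: split on single spaces.
    static List<String> withSplit(String input) {
        return Arrays.asList(input.split(" "));
    }

    public static void main(String[] args) {
        String input = "But I'm not dead yet!";
        System.out.println(withTokenizer(input));
        System.out.println(withSplit(input));
    }
}
```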