A Brief Analysis of String - 969251639/study GitHub Wiki
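String is a rather special class. It wraps a character array, and its main characteristics are:
- Immutability: once a string is created it cannot be changed. Because of this, strings can be shared and cached safely (in the string pool, or through a cached hash code), which can greatly improve efficiency. The cost is that every "modification" creates a new object, so memory is traded for speed.
- Constant pool: a cache pool of strings, so that equal string literals share one object.
- final class: String cannot be subclassed, so no program can override or alter its behavior.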
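The three properties above can be observed from ordinary code. The sketch below is illustrative only; the class `StringBasicsDemo` and its methods are hypothetical names. It relies on two guarantees: `toUpperCase` returns a new string instead of modifying the receiver, and identical string literals refer to the same pooled instance, while `new String(...)` always creates a distinct object.

```java
// Sketch: observing immutability and literal sharing (class and method names are hypothetical).
class StringBasicsDemo {
    // toUpperCase returns a new String; the original is left unchanged.
    static String[] upperDoesNotMutate(String s) {
        String upper = s.toUpperCase();
        return new String[] { s, upper };
    }

    // Two identical literals refer to the same pooled object.
    static boolean literalsShared() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // new String(...) always creates a distinct object, even with equal content.
    static boolean newStringIsDistinct() {
        String a = "hello";
        String c = new String("hello");
        return a != c && a.equals(c);
    }

    public static void main(String[] args) {
        String[] r = upperDoesNotMutate("abc");
        System.out.println(r[0] + " " + r[1]);     // abc ABC
        System.out.println(literalsShared());      // true
        System.out.println(newStringIsDistinct()); // true
    }
}
```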
public final class String implements java.io.Serializable, Comparable<String>, CharSequence
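String implements three interfaces, Serializable, Comparable and CharSequence, which means it supports serialization, comparison, and the usual character-sequence operations such as getting the length.
The contents are stored in a char array (this is the JDK 8 source; since JDK 9, "compact strings" store the contents in a byte[] together with a coder field):
private final char value[];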
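To make the interface list concrete, here is a small sketch (the class `StringInterfacesDemo` and its methods are hypothetical). `countChar` accepts any `CharSequence`, so it works on a `String` or a `StringBuilder`, and `sorted` relies on `Comparable<String>`, whose natural order is lexicographic by UTF-16 code unit (so uppercase letters sort before lowercase ones).

```java
import java.util.Arrays;

// Sketch: using String through the interfaces it implements (names are hypothetical).
class StringInterfacesDemo {
    // CharSequence view: length() and charAt() work on any CharSequence.
    static int countChar(CharSequence cs, char target) {
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Comparable<String>: Arrays.sort uses String.compareTo for the natural order.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(countChar("banana", 'a'));                        // 3
        System.out.println(Arrays.toString(sorted("pear", "Apple", "fig")));  // [Apple, fig, pear]
    }
}
```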
Next, let's analyze some commonly used methods.
Getting the length of a string:
public int length() {
    return value.length;
}
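It simply returns the length of the char array. Note that this is the number of UTF-16 code units, not the number of Unicode code points.
Getting the character at a given index: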
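Because `length()` returns the size of the char array, a character outside the Basic Multilingual Plane (stored as a surrogate pair) counts as 2. The sketch below (class name `StringLengthDemo` is hypothetical) contrasts `length()` with `codePointCount`.

```java
// Sketch: length() counts UTF-16 code units (array slots), not Unicode code points.
class StringLengthDemo {
    // Returns { length(), codePointCount } for the given string.
    static int[] lengths(String s) {
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) {
        String plain = "abc";
        // U+1F600 lies outside the BMP, so it is stored as two chars (a surrogate pair).
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(plain.length() + " " + emoji.length());  // 3 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
    }
}
```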
public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}
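It first checks that index is valid (0 <= index < length), then returns the element of the array at that position.
Comparing the contents of two strings: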
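From the caller's side, the bounds check shows up as a `StringIndexOutOfBoundsException`. The sketch below is a hypothetical helper (`CharAtDemo` is not part of the JDK) that applies the same condition itself to return a fallback instead of throwing, plus a probe that confirms the real `charAt` rejects out-of-range indexes.

```java
// Sketch: the charAt bounds check seen from the caller (names are hypothetical).
class CharAtDemo {
    // Returns the char at index, or the fallback when index is outside [0, length()).
    static char charAtOrDefault(String s, int index, char fallback) {
        if (index < 0 || index >= s.length()) {
            return fallback; // the same condition that makes String.charAt throw
        }
        return s.charAt(index);
    }

    // Reports whether String.charAt rejects the index with StringIndexOutOfBoundsException.
    static boolean throwsOutOfBounds(String s, int index) {
        try {
            s.charAt(index);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(charAtOrDefault("java", 1, '?')); // a
        System.out.println(charAtOrDefault("java", 4, '?')); // ?
        System.out.println(throwsOutOfBounds("java", -1));   // true
    }
}
```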
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}
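String overrides equals, so it no longer compares memory addresses the way Object.equals does.
It first checks whether the two references point to the same object. If not, and the argument is also a String of the same length, it compares the two char arrays one character at a time.
Checking whether a string starts with a given prefix: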
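The same three steps (identity check, length check, character-by-character comparison) can be written as a standalone method over char arrays. This is a sketch for illustration; `EqualsDemo.sameChars` is a hypothetical name, not a JDK method.

```java
// Sketch: the logic of String.equals applied to two char arrays (names are hypothetical).
class EqualsDemo {
    static boolean sameChars(char[] v1, char[] v2) {
        if (v1 == v2) {
            return true;              // same array: trivially equal
        }
        if (v1.length != v2.length) {
            return false;             // different lengths can never be equal
        }
        for (int i = 0; i < v1.length; i++) {
            if (v1[i] != v2[i]) {
                return false;         // the first mismatch ends the comparison
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello");
        System.out.println(a == b);                                      // false: different objects
        System.out.println(a.equals(b));                                 // true: same characters
        System.out.println(sameChars(a.toCharArray(), b.toCharArray())); // true
    }
}
```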
public boolean startsWith(String prefix, int toffset) {
    char ta[] = value;
    int to = toffset;
    char pa[] = prefix.value;
    int po = 0;
    int pc = prefix.value.length;
    // Note: toffset might be near -1>>>1.
    if ((toffset < 0) || (toffset > value.length - pc)) {
        return false;
    }
    while (--pc >= 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}
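It also compares character by character, starting at the given offset (the one-argument startsWith(prefix) uses offset 0). The offset check rejects both negative offsets and offsets that leave too little room for the prefix. endsWith is not a separate reverse comparison: in JDK 8 it simply calls startsWith(suffix, value.length - suffix.value.length), so it is not shown here.
Computing the hash code: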
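Here is a hand-written version of `startsWith(prefix, offset)` that follows the same steps on public `String` methods, together with `endsWith` expressed through it as JDK 8 does. The class `StartsWithDemo` and its method names are hypothetical.

```java
// Sketch: startsWith(prefix, offset) and endsWith rebuilt from charAt (names are hypothetical).
class StartsWithDemo {
    static boolean startsWithAt(String s, String prefix, int offset) {
        // Reject offsets that are negative or leave too little room for the prefix.
        if (offset < 0 || offset > s.length() - prefix.length()) {
            return false;
        }
        for (int i = 0; i < prefix.length(); i++) {
            if (s.charAt(offset + i) != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // endsWith is just startsWith at offset length() - suffix.length().
    static boolean endsWithViaStartsWith(String s, String suffix) {
        return startsWithAt(s, suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithAt("HelloWorld", "World", 5));      // true
        System.out.println("HelloWorld".startsWith("World", 5));         // true
        System.out.println(endsWithViaStartsWith("report.pdf", ".pdf")); // true
    }
}
```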
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
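String overrides hashCode. The hash is computed from every character of the string as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] (with int overflow), and the result is cached in the hash field so it is computed only once; a value of 0 means "not computed yet".
Finding the starting index of a substring (the static helper that indexOf(String) delegates to in JDK 8):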
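The loop above is Horner's rule for the polynomial in the formula. A sketch that recomputes it outside String (the name `HashDemo.polyHash` is hypothetical) gives the same value as `String.hashCode`; for "abc" that is 97*31^2 + 98*31 + 99 = 96354.

```java
// Sketch: recomputing String.hashCode with s[0]*31^(n-1) + ... + s[n-1] (names are hypothetical).
class HashDemo {
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule; int overflow wraps around, as in the JDK
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("abc"));  // 96354
        System.out.println("abc".hashCode()); // 96354
        System.out.println(polyHash(""));     // 0
    }
}
```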
static int indexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }
    char first = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);
    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first); // advance i to the next occurrence of the first character
        }
        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j]
                    == target[k]; j++, k++); // compare the remaining characters of source and target one by one
            if (j == end) { // every character of target matched
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}
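It first handles the boundary cases, then, starting at fromIndex, looks for the first character of the target and compares the remaining characters one by one. This is a plain brute-force search, O(n*m) in the worst case. lastIndexOf works the same way but scans backwards, so it is not shown here.
Taking a substring: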
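The same brute-force search can be written against two Strings instead of raw arrays and offsets. This sketch (`IndexOfDemo.naiveIndexOf` is a hypothetical name) keeps the JDK 8 boundary rules, including returning the source length when the target is empty and fromIndex is past the end.

```java
// Sketch: the brute-force substring search of JDK 8, on Strings (names are hypothetical).
class IndexOfDemo {
    static int naiveIndexOf(String source, String target, int fromIndex) {
        int n = source.length();
        int m = target.length();
        if (fromIndex >= n) {
            return m == 0 ? n : -1;
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (m == 0) {
            return fromIndex;
        }
        char first = target.charAt(0);
        int max = n - m; // last position where target could still fit
        for (int i = fromIndex; i <= max; i++) {
            // Skip ahead to the next occurrence of the first character.
            if (source.charAt(i) != first) {
                while (++i <= max && source.charAt(i) != first) {
                    // keep scanning
                }
            }
            if (i <= max) {
                // Compare the remaining characters one by one.
                int j = i + 1;
                int end = i + m;
                for (int k = 1; j < end && source.charAt(j) == target.charAt(k); j++, k++) {
                    // advance while characters match
                }
                if (j == end) {
                    return i; // whole target matched at position i
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveIndexOf("hello world", "world", 0)); // 6
        System.out.println("hello world".indexOf("world"));          // 6
    }
}
```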
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}
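It first checks the index bounds. If the requested range covers the whole string it returns this; otherwise it creates a new String from the selected part of the char array, using the following constructor: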
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
The constructor in turn calls Arrays.copyOfRange:
public static char[] copyOfRange(char[] original, int from, int to) {
    int newLength = to - from;
    if (newLength < 0)
        throw new IllegalArgumentException(from + " > " + to);
    char[] copy = new char[newLength];
    System.arraycopy(original, from, copy, 0,
            Math.min(original.length - from, newLength));
    return copy;
}
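In other words, it copies the characters in the offset range into a brand-new char array and builds the new string on that array.
Note: this code differs from older JDKs. Before JDK 7u6, substring did not copy anything: the new String shared the original char array and used offset and count fields to mark which part of it was visible. That avoided the copy but could keep a large backing array alive through a small substring, yet another space-for-time trade-off.
Splitting a string: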
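The copy-based approach can be sketched outside String. Since the real `value` field is private, this hypothetical `SubstringDemo.copySubstring` uses `toCharArray()` as a stand-in (which itself makes one extra full copy that the JDK does not need); the rest mirrors the checks, the whole-range shortcut, and the `Arrays.copyOfRange` call.

```java
import java.util.Arrays;

// Sketch: copy-based substring as in JDK 8 (names are hypothetical).
class SubstringDemo {
    static String copySubstring(String s, int beginIndex, int endIndex) {
        char[] value = s.toCharArray(); // stand-in for the private value field
        if (beginIndex < 0 || endIndex > value.length || endIndex < beginIndex) {
            throw new StringIndexOutOfBoundsException("begin " + beginIndex + ", end " + endIndex);
        }
        if (beginIndex == 0 && endIndex == value.length) {
            return s; // whole range: reuse the existing object
        }
        // Copy only the requested range; the result does not share an array with s.
        char[] copy = Arrays.copyOfRange(value, beginIndex, endIndex);
        return new String(copy);
    }

    public static void main(String[] args) {
        System.out.println(copySubstring("hello world", 6, 11));  // world
        System.out.println(copySubstring("abc", 0, 3) == "abc");  // true
    }
}
```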
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};
        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));
        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
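This method is fairly complex, so let's go through it step by step:
- First, decide whether the split can be done without a regular expression. The fast path is taken only when both of the following hold:
  (1) the delimiter is a single character that is NOT one of the regex metacharacters ".$|()[{^?*+\", or it is two characters where the first is a backslash and the second is not an ASCII digit or letter (0-9, a-z, A-Z);
  (2) the delimiter character is not a UTF-16 surrogate, i.e. it lies outside \uD800-\uDFFF.
  If both conditions hold, no regex is needed; otherwise the work is delegated to Pattern.compile(regex).split(this, limit).
- Scan the whole string; each time the delimiter is found, add the text before it to an ArrayList via substring. When limit > 0, stop after limit - 1 pieces and put the rest of the string into the last piece.
- If no delimiter was found at all, return a one-element array containing the current string.
- Add the last segment (the text after the final delimiter) to the list.
- If limit is 0, drop trailing empty strings, then convert the list into the resulting String array.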
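The rules above can be observed directly through the public `split` API. The class `SplitDemo` below is a hypothetical wrapper; the interesting case is `"a.b".split(".")`: "." is a regex metacharacter, so it skips the fast path, the pattern matches every character, and all resulting empty strings are trailing ones that limit 0 removes.

```java
import java.util.Arrays;

// Sketch: observing the split rules (class name is hypothetical).
class SplitDemo {
    static String show(String[] parts) {
        return Arrays.toString(parts);
    }

    public static void main(String[] args) {
        // Single non-metacharacter delimiter: handled by the fast path.
        System.out.println(show("a,b,,".split(",")));     // [a, b]  (limit 0 drops trailing empties)
        System.out.println(show("a,b,,".split(",", -1))); // [a, b, , ]  (negative limit keeps them)
        System.out.println(show("a,b,c".split(",", 2)));  // [a, b,c]  (at most 2 pieces)
        // "." is a regex metacharacter: goes to Pattern and matches every character.
        System.out.println("a.b".split(".").length);      // 0
        // Backslash plus a non-alphanumeric char: fast path, splitting on a literal dot.
        System.out.println(show("a.b".split("\\.")));     // [a, b]
        // No delimiter found: a one-element array holding the original string.
        System.out.println(show("abc".split(",")));       // [abc]
    }
}
```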
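intern returns the canonical instance from the string pool: if the pool already contains a string equal to this one (according to equals), that object is returned; otherwise this string is added to the pool and a reference to it is returned.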
public native String intern();
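A consequence of this contract is that `s.intern() == t.intern()` holds exactly when `s.equals(t)`, and since string literals are already in the pool, interning a string built at run time yields the literal's object. A minimal sketch (the class `InternDemo` is a hypothetical name):

```java
// Sketch: intern() maps equal strings to one canonical instance (class name is hypothetical).
class InternDemo {
    static boolean internMatchesLiteral() {
        // Built at run time, so it is a new object distinct from the pooled literal.
        String built = new StringBuilder("hel").append("lo").toString();
        return built != "hello" && built.intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(internMatchesLiteral());            // true
        System.out.println(new String("ab") == "ab");          // false
        System.out.println(new String("ab").intern() == "ab"); // true
    }
}
```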