A Brief Analysis of String - 969251639/study GitHub Wiki

String is a rather special class: it wraps a character array. Its main characteristics are:

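Immutability: once a string is created it cannot be changed. This lets strings be shared safely (between threads and through the constant pool) and lets the hash code be cached, which greatly helps performance. The price is that every "modification" creates a new object and uses more memory, i.e. space is traded for time.
Constant pool: a cache of strings, so identical string literals share a single instance.
final class: String cannot be subclassed, so no program can override or alter its behavior.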
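A minimal sketch of the first two points (the class name ImmutabilityDemo is illustrative only): methods such as concat and toUpperCase return new String objects and leave the original untouched, and two identical literals resolve to the same pooled instance.

```java
// Sketch: String "modification" methods return new objects, and
// identical literals share one pooled instance.
class ImmutabilityDemo {
    // Returns {original, concatenated, upper-cased} to show that s never changes.
    static String[] modify() {
        String s = "abc";
        String t = s.concat("def"); // a new String; s is untouched
        String u = s.toUpperCase(); // another new String
        return new String[] { s, t, u };
    }

    // Two identical string literals refer to the same pooled instance.
    static boolean literalsShared() {
        String a = "hello";
        String b = "hello";
        return a == b; // reference comparison, on purpose
    }

    public static void main(String[] args) {
        String[] r = modify();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // abc abcdef ABC
        System.out.println(literalsShared());               // true
    }
}
```

The third point needs no runtime demo: a declaration such as `class MyString extends String` is rejected by the compiler because String is final.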

public final class String implements java.io.Serializable, Comparable<String>, CharSequence

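String implements three interfaces, Serializable, Comparable<String> and CharSequence, which means it can be serialized, has a natural (lexicographic) ordering via compareTo, and supports the common character-sequence operations such as length() and charAt().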
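As a small illustration of two of these interfaces (InterfacesDemo, sorted and countChar are hypothetical names for this sketch), Arrays.sort relies on String's Comparable implementation, while a method typed against CharSequence accepts a String and a StringBuilder alike.

```java
import java.util.Arrays;

class InterfacesDemo {
    // Comparable<String>: Arrays.sort uses String.compareTo (natural ordering).
    static String[] sorted(String... words) {
        String[] copy = words.clone(); // leave the caller's array unchanged
        Arrays.sort(copy);
        return copy;
    }

    // CharSequence: works for String, StringBuilder, and other implementations.
    static int countChar(CharSequence cs, char target) {
        int n = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) == target) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sorted("pear", "apple", "Banana"))); // [Banana, apple, pear]
        System.out.println(countChar("banana", 'a'));                          // 3
        System.out.println(countChar(new StringBuilder("aXa"), 'a'));          // 2
    }
}
```

Natural ordering compares UTF-16 code units, so uppercase letters sort before lowercase ones.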

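The contents are stored in a character array (this page follows the JDK 8 source; since JDK 9, String stores a byte[] together with a coder field instead)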

private final char value[];

Next, let's look at some commonly used methods:

Getting the length of a string:

public int length() {
    return value.length;
}

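It simply returns the length of the character array, i.e. the number of UTF-16 code units, which is not always the number of Unicode characters.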
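Because length() reports value.length, it counts UTF-16 code units rather than Unicode characters. The sketch below (LengthDemo and its helpers are illustrative names) uses U+1F600, which lies outside the Basic Multilingual Plane and is therefore stored as a surrogate pair:

```java
class LengthDemo {
    // length(): number of UTF-16 code units, i.e. value.length
    static int units(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair
        System.out.println(units("hello"));    // 5
        System.out.println(units(emoji));      // 2
        System.out.println(codePoints(emoji)); // 1
    }
}
```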

Getting the character at a given index

public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}

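It first validates the index, throwing StringIndexOutOfBoundsException if it is negative or not less than the length, and then returns the array element at that index.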
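A short sketch of that bounds check in action; safeCharAt is a hypothetical helper, not part of the JDK, that maps the exception to a placeholder character:

```java
class CharAtDemo {
    // Hypothetical helper: returns the char at index, or '?' when
    // charAt rejects the index with StringIndexOutOfBoundsException.
    static char safeCharAt(String s, int index) {
        try {
            return s.charAt(index);
        } catch (StringIndexOutOfBoundsException e) {
            return '?';
        }
    }

    public static void main(String[] args) {
        System.out.println(safeCharAt("java", 0));  // j
        System.out.println(safeCharAt("java", 4));  // ? (index == length)
        System.out.println(safeCharAt("java", -1)); // ? (negative index)
    }
}
```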

Comparing the contents of two strings

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

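String overrides equals, so it no longer compares memory addresses the way Object.equals does.
It first checks whether the two references point to the same object; if not, and the argument is a String of the same length, it compares the two character arrays one character at a time.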
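The difference between reference comparison and content comparison shows up as soon as a second String object with the same characters is created. This sketch (EqualsDemo is an illustrative name) also shows that the instanceof check makes equals return false for a StringBuilder, where contentEquals is the right tool:

```java
import java.util.Arrays;

class EqualsDemo {
    static boolean[] compare() {
        String a = "hello";
        String b = new String("hello");             // distinct object, same characters
        StringBuilder sb = new StringBuilder("hello");
        return new boolean[] {
            a == b,             // false: different references
            a.equals(b),        // true: compared char by char
            a.equals("Hello"),  // false: same length, but 'h' != 'H'
            a.equals(sb),       // false: sb is not an instance of String
            a.contentEquals(sb) // true: compares against any CharSequence
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [false, true, false, false, true]
    }
}
```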

Checking whether a string starts with a given prefix

public boolean startsWith(String prefix, int toffset) {
    char ta[] = value;
    int to = toffset;
    char pa[] = prefix.value;
    int po = 0;
    int pc = prefix.value.length;
    // Note: toffset might be near -1>>>1.
    if ((toffset < 0) || (toffset > value.length - pc)) {
        return false;
    }
    while (--pc >= 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}

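This also compares character by character, starting at the given offset (the one-argument startsWith(prefix) simply passes 0); an out-of-range offset returns false instead of throwing. endsWith is built on the same method: in JDK 8 it calls startsWith(suffix, value.length - suffix.value.length), so it is not shown separately.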
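In JDK 8, endsWith(suffix) is implemented as startsWith(suffix, value.length - suffix.value.length). The sketch below exercises the offset variant; endsWithViaStartsWith is a hypothetical helper that mirrors that delegation rather than calling endsWith itself:

```java
class PrefixDemo {
    // Hypothetical helper: a suffix test is a prefix test that starts
    // at length() - suffix.length(). A negative offset yields false.
    static boolean endsWithViaStartsWith(String s, String suffix) {
        return s.startsWith(suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        String s = "HelloWorld";
        System.out.println(s.startsWith("Hello"));                // true (offset 0)
        System.out.println(s.startsWith("World", 5));             // true
        System.out.println(s.startsWith("World", 20));            // false: offset out of range
        System.out.println(endsWithViaStartsWith(s, "World"));    // true
        System.out.println(endsWithViaStartsWith("Hi", "Hello")); // false: negative offset
    }
}
```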

Computing the hash code

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

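String overrides hashCode. The hash is computed from every character as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic (overflow wraps around), and the result is cached in the hash field so later calls return it directly. Because 0 doubles as the "not yet computed" marker, a non-empty string whose hash really is 0 is recomputed on every call.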
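The recurrence in the loop can be checked by hand: for "abc", h = 97, then 97 * 31 + 98 = 3105, then 3105 * 31 + 99 = 96354. The sketch below (polyHash is an illustrative name) reimplements the loop without the caching, compares it with the JDK, and shows a well-known collision:

```java
class HashDemo {
    // Same recurrence as String.hashCode(): h = 31 * h + c for every char.
    // int overflow simply wraps around, exactly as in the JDK loop.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("abc"));  // 96354
        System.out.println("abc".hashCode()); // 96354
        // Equal hashes do not imply equal strings: "Aa" and "BB" collide.
        System.out.println(polyHash("Aa") + " " + polyHash("BB")); // 2112 2112
    }
}
```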

Finding where a target string first occurs

static int indexOf(char[] source, int sourceOffset, int sourceCount,
            char[] target, int targetOffset, int targetCount,
            int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first); // advance i to the next occurrence of the first character
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j]
                    == target[k]; j++, k++); // compare the remaining characters of source and target one by one
            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}

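It first checks the bounds, then scans from fromIndex: it looks for the first character of the target and, at each candidate position, compares the remaining characters one by one. This is a plain brute-force search, O(n*m) in the worst case. lastIndexOf works the same way but scans backwards, so it is not shown here.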
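To make the scan easier to follow, here is a simplified sketch of the same brute-force search working on whole Strings instead of (array, offset, count) triples. NaiveIndexOf is a hypothetical class; its early returns copy the boundary rules of the JDK 8 helper above:

```java
class NaiveIndexOf {
    // Returns the index of the first occurrence of target in source at or
    // after fromIndex, or -1. Same boundary rules as the JDK 8 helper.
    static int indexOf(String source, String target, int fromIndex) {
        int n = source.length();
        int m = target.length();
        if (fromIndex >= n) {
            return m == 0 ? n : -1;
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (m == 0) {
            return fromIndex;
        }
        char first = target.charAt(0);
        int max = n - m; // last position where a match can still start
        for (int i = fromIndex; i <= max; i++) {
            if (source.charAt(i) != first) {
                continue; // keep scanning for the first character
            }
            int j = 1;
            while (j < m && source.charAt(i + j) == target.charAt(j)) {
                j++; // compare the rest one char at a time
            }
            if (j == m) {
                return i; // whole target matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "world", 0)); // 6
        System.out.println(indexOf("abcabc", "abc", 1));        // 3
        System.out.println(indexOf("abc", "d", 0));             // -1
    }
}
```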

Taking a substring

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

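It first checks the index bounds. If the range covers the whole string it returns this; otherwise it creates a new String holding only the characters in the range, using the following constructor: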

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

public static char[] copyOfRange(char[] original, int from, int to) {
    int newLength = to - from;
    if (newLength < 0)
        throw new IllegalArgumentException(from + " > " + to);
    char[] copy = new char[newLength];
    System.arraycopy(original, from, copy, 0,
                     Math.min(original.length - from, newLength));
    return copy;
}

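In other words, it copies the characters between offset and offset + count into a new char array and builds the new string from that array.
Note: this code has changed between versions. In older JDKs (before JDK 7u6), substring copied nothing: the new String shared the parent's char array and used its own offset and count fields to mark the visible range. That was another space-for-time trade-off: no copying was needed, but a short substring could keep a very large parent array from being garbage collected.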
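Before JDK 7u6, substring shared the parent's char array and recorded an offset and count; since then it copies the requested range. The two designs can be contrasted with a toy model: SharedSlice below is hypothetical and only imitates the old idea, while copySlice does what the current code does with Arrays.copyOfRange. Bounds checks are omitted for brevity:

```java
import java.util.Arrays;

class SubstringStrategies {
    // Hypothetical model of the pre-7u6 design: a slice is a view that
    // shares the parent's array and records offset and count.
    static final class SharedSlice {
        final char[] value;
        final int offset;
        final int count;

        SharedSlice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // O(1): nothing is copied, but the whole parent array stays reachable.
        SharedSlice slice(int begin, int end) {
            return new SharedSlice(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count); // copies only for display
        }
    }

    // Current design: copy just the requested range.
    static char[] copySlice(char[] parent, int begin, int end) {
        return Arrays.copyOfRange(parent, begin, end);
    }

    public static void main(String[] args) {
        char[] big = "a very long parent string".toCharArray();
        SharedSlice view = new SharedSlice(big, 0, big.length).slice(2, 6);
        char[] copy = copySlice(big, 2, 6);
        System.out.println(view);              // very
        System.out.println(view.value == big); // true: the view pins the parent array
        System.out.println(new String(copy));  // very
        System.out.println(copy.length);       // 4: only the needed chars are kept
    }
}
```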

Splitting a string

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};
        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));
        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}

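This method is fairly involved, so let's take it step by step.

First, it decides whether the regex engine is needed at all. The fast path applies when:
(1) the separator is a single character that is not one of the regex metacharacters . $ | ( ) [ { ^ ? * + \, or it is two characters where the first is a backslash and the second is not an ASCII digit or ASCII letter; and
(2) the separator character is not a surrogate, i.e. it lies outside \uD800-\uDFFF.
When both conditions hold, the string is split without regular expressions:
Scan the string with indexOf; each time the separator is found, add the text before it to an ArrayList via substring. When limit > 0, stop after limit - 1 pieces and put the entire remainder into the last piece.
If no separator was found, return a one-element array containing the string itself.
Add the final segment after the last separator to the list.
Convert the list to the resulting String array; when limit == 0, trailing empty strings are dropped first.
Otherwise the method falls back to Pattern.compile(regex).split(this, limit).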
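A sketch of the observable behavior, including the limit rules and the metacharacter pitfall (SplitDemo and show are illustrative names):

```java
import java.util.Arrays;

class SplitDemo {
    // Formats the result of s.split(regex, limit) for printing.
    static String show(String s, String regex, int limit) {
        return Arrays.toString(s.split(regex, limit));
    }

    public static void main(String[] args) {
        String csv = "a,b,,c,,";
        // "," is a single non-metacharacter, so the fast path is taken.
        System.out.println(show(csv, ",", 0));  // [a, b, , c]     (limit 0: trailing "" dropped)
        System.out.println(show(csv, ",", 2));  // [a, b,,c,,]     (at most 2 pieces)
        System.out.println(show(csv, ",", -1)); // [a, b, , c, , ] (negative limit: trailing "" kept)
        // "\\." is backslash + '.', which also qualifies for the fast path.
        System.out.println(show("1.2.3", "\\.", 0)); // [1, 2, 3]
        // A bare "." is a regex metacharacter matching every char; once the
        // trailing empty strings are dropped, nothing is left.
        System.out.println(show("1.2.3", ".", 0));   // []
        // No separator found: the string itself is the only element.
        System.out.println(show("abc", ",", 0));     // [abc]
    }
}
```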

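intern returns the equal string from the string pool; if the pool does not contain one yet, this string is added to the pool and a reference to it is returned.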

public native String intern();
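A minimal sketch of intern() (InternDemo is an illustrative name): a string built at runtime is a separate heap object, but intern() hands back the pooled instance that the equal literal already refers to.

```java
class InternDemo {
    static boolean[] check() {
        String literal = "java"; // string literals are always pooled
        String built = new StringBuilder("ja").append("va").toString(); // new heap object
        return new boolean[] {
            built == literal,         // false: different objects
            built.intern() == literal // true: intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println(r[0]); // false
        System.out.println(r[1]); // true
    }
}
```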
