String in Detail

Editor: JAVA綜合教程 (Java Comprehensive Tutorials)

We use the String class constantly during development, so it is worth knowing not only its common methods but also how it is implemented internally.

I. Implementation of String

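In the JDK 8 source quoted in this article, String stores its characters in a char array named value. The field is private and final, the array is never handed out to callers, and no method writes to it after construction, so a String cannot change once it has been created. Note that final alone only fixes the reference; immutability comes from the class never modifying or exposing the array. The class also declares an int field named hash, which caches the hash code of the String.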

    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0
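The effect of this design can be observed from the outside. The following is a minimal sketch; the class and method names are made up for illustration. It shows that methods such as toUpperCase() return a new String and that the public char[] constructor copies its argument, so later changes to the array do not reach the String.

```java
class ImmutabilityDemo {
    // Calls "modifying" methods on a String and returns the original reference.
    static String originalAfterOperations() {
        String s = "hello";
        s.toUpperCase();    // returns a new String; the result is discarded here
        s.concat(" world"); // likewise returns a new String
        return s;           // still "hello"
    }

    // Builds a String from a char array, then changes the array afterwards.
    static String builtFromMutatedArray() {
        char[] src = {'a', 'b', 'c'};
        String s = new String(src); // the public constructor copies the array
        src[0] = 'z';               // does not affect s
        return s;
    }

    public static void main(String[] args) {
        System.out.println(originalAfterOperations()); // hello
        System.out.println(builtFromMutatedArray());   // abc
    }
}
```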

II. Constructors of String

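There are many ways to create a String, and the class has several constructors, but they all serve the same purpose: initializing the value array. The constructor parameters fall roughly into the following groups:

src: the source, that is, the text to be stored in value. It can be a String, a char array, an int array (of Unicode code points), a byte array, a StringBuffer, or a StringBuilder. There is no constructor that takes a boolean as the source; the boolean in the package-private String(char[] value, boolean share) is only a flag telling the constructor to share the array instead of copying it.

offset: the starting position within src.

count: the number of characters copied from src into value.

Charset: the character set used to decode a byte array.

Not every constructor takes all of these parameters, and there is no need to memorize each one. It is enough to know roughly which constructors exist and to look up the API when needed.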
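As an illustration of the parameter groups above, here is a small sketch that calls several of the public constructors. The class and method names are hypothetical; the constructors themselves are part of the standard API.

```java
import java.nio.charset.StandardCharsets;

class ConstructorDemo {
    // src = char array, offset = 1, count = 3
    static String fromChars() {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        return new String(chars, 1, 3); // "ell"
    }

    // src = int array of Unicode code points
    static String fromCodePoints() {
        int[] codePoints = {0x4A, 0x61, 0x76, 0x61}; // J, a, v, a
        return new String(codePoints, 0, codePoints.length);
    }

    // src = byte array, decoded with an explicit Charset
    static String fromBytes() {
        byte[] bytes = "Java".getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // src = StringBuilder
    static String fromBuilder() {
        return new String(new StringBuilder("Java"));
    }

    public static void main(String[] args) {
        System.out.println(fromChars());      // ell
        System.out.println(fromCodePoints()); // Java
        System.out.println(fromBytes());      // Java
        System.out.println(fromBuilder());    // Java
    }
}
```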

III. Common String Methods and Their Implementations

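1. Getting the character array

The method for copying characters out of a String:

    public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin)

It copies the characters from index srcBegin up to, but not including, srcEnd into dst, starting at dst[dstBegin]. The implementation is as follows: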

    public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
        // Validate the requested range
        if (srcBegin < 0) {
            throw new StringIndexOutOfBoundsException(srcBegin);
        }
        if (srcEnd > value.length) {
            throw new StringIndexOutOfBoundsException(srcEnd);
        }
        if (srcBegin > srcEnd) {
            throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
        }
        // Delegate the copy to the native method System.arraycopy
        System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }
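A short usage sketch may make the four parameters easier to follow. The class name and the sample text are assumptions for illustration only.

```java
import java.util.Arrays;

class GetCharsDemo {
    // Copies "world" out of the source string into the middle of a buffer.
    static char[] copyMiddle() {
        String s = "Hello, world";   // 'w' is at index 7, length is 12
        char[] dst = new char[8];
        Arrays.fill(dst, '-');
        s.getChars(7, 12, dst, 2);   // copies indices 7..11 into dst[2..6]
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(new String(copyMiddle())); // --world-
    }
}
```

An invalid range, for example srcBegin greater than srcEnd, triggers the StringIndexOutOfBoundsException checks shown in the implementation.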

2. equals(), equalsIgnoreCase(), regionMatches(), compareTo(), compareToIgnoreCase(), hashCode()

equals() determines whether two Strings are equal. The logic is as follows:

    public boolean equals(Object anObject) {
        if (this == anObject) { // Both references point to the same String object, so they are equal
            return true;
        }
        if (anObject instanceof String) { // Otherwise, if the lengths match, compare character by character from the start
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }
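The difference between the reference check and the content check in the code above can be seen in a small sketch. The class and method names are illustrative assumptions.

```java
class EqualsDemo {
    // Two distinct objects with the same characters are equal by content.
    static boolean sameContent() {
        String a = "java";
        String b = new String("java"); // a separate object
        return a.equals(b);
    }

    // The same two objects are not the same reference.
    static boolean sameReference() {
        String a = "java";
        String b = new String("java");
        return a == b;
    }

    // A non-String argument fails the instanceof check, even with equal text.
    static boolean equalsNonString() {
        Object other = new StringBuilder("java");
        return "java".equals(other);
    }

    public static void main(String[] args) {
        System.out.println(sameContent());     // true
        System.out.println(sameReference());   // false
        System.out.println(equalsNonString()); // false
    }
}
```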

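equalsIgnoreCase() performs the comparison while ignoring case. Internally it calls regionMatches().

regionMatches() determines whether a region of one string equals a region of another. String has two regionMatches() methods; the difference is that one of them takes an extra boolean that decides whether case is ignored. The implementation is as follows: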

 public boolean regionMatches(boolean ignoreCase, int toffset,
            String other, int ooffset, int len) {
        char ta[] = value;
        int to = toffset;
        char pa[] = other.value;
        int po = ooffset;
        if ((ooffset < 0) || (toffset < 0)
                || (toffset > (long)value.length - len)
                || (ooffset > (long)other.value.length - len)) {
            return false; // A negative offset or a region that runs past either string never matches
        }
        while (len-- > 0) {
            char c1 = ta[to++];
            char c2 = pa[po++];
            if (c1 == c2) {
                continue;
            }
            if (ignoreCase) { // When ignoring case, compare the upper-case forms, then the lower-case forms
                char u1 = Character.toUpperCase(c1);
                char u2 = Character.toUpperCase(c2);
                if (u1 == u2) {
                    continue;
                }   
                if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    continue;
                }
            }
            return false;
        }
        return true;
    }

There is also a case-sensitive method for comparing two string regions: public boolean regionMatches(int toffset, String other, int ooffset, int len). Its implementation does not reuse the method above.
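The two regionMatches() variants can be compared side by side. This is a minimal sketch with made-up class and method names; note that an out-of-range region yields false rather than an exception, matching the bounds check in the implementation.

```java
class RegionMatchesDemo {
    // Case-sensitive: "World" vs "world" differs in the first character.
    static boolean caseSensitive() {
        return "Hello World".regionMatches(6, "world", 0, 5);
    }

    // Case-insensitive variant with ignoreCase = true.
    static boolean ignoringCase() {
        return "Hello World".regionMatches(true, 6, "world", 0, 5);
    }

    // The region starting at index 3 of "Hello" cannot hold 5 characters.
    static boolean outOfRange() {
        return "Hello".regionMatches(true, 3, "lower", 0, 5);
    }

    public static void main(String[] args) {
        System.out.println(caseSensitive()); // false
        System.out.println(ignoringCase());  // true
        System.out.println(outOfRange());    // false
        System.out.println("JAVA".equalsIgnoreCase("java")); // true
    }
}
```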

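compareTo() compares two strings lexicographically, based on the Unicode value of each character. The character sequence represented by this String is compared with the character sequence represented by the argument. The result is a negative integer if this String precedes the argument, a positive integer if it follows the argument, and 0 if the two strings are equal. The implementation is as follows: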

 public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2; // Characters differ: return the difference of their Unicode values
            }
            k++;
        }
        return len1 - len2; // One string is a prefix of the other: return the length difference (0 means the strings are equal)
    }
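The three possible outcomes can be sketched as follows. The class and method names are assumptions; only the sign of the result is relied on, since that is what the method contract describes.

```java
class CompareToDemo {
    // Reduces a compareTo result to -1, 0, or 1.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        System.out.println(sign("apple", "banana")); // -1: 'a' comes before 'b'
        System.out.println(sign("apple", "app"));    // 1: "app" is a prefix, so the longer string is greater
        System.out.println(sign("apple", "apple"));  // 0: equal
        System.out.println(sign("Zebra", "apple"));  // -1: upper-case 'Z' has a smaller Unicode value than 'a'
    }
}
```

The last case shows why compareTo() is not an alphabetical comparison in the everyday sense: upper-case letters sort before all lower-case letters.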

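compareToIgnoreCase(): String defines a nested class named CaseInsensitiveComparator, which implements Comparator and compares two strings while ignoring case. An instance of it is exposed as String.CASE_INSENSITIVE_ORDER. It takes one character from each string in turn and compares them ignoring case; the rest of the logic is similar to compareTo() above.

hashCode() returns the hash code of the String. The value is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, and it is cached in the hash field described in section I.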
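The following sketch recomputes the documented hash formula by hand and uses the case-insensitive comparator for sorting. The class and method names are hypothetical; manualHash is an illustration of the formula, not the JDK's own code.

```java
import java.util.Arrays;

class HashAndCaseDemo {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with int overflow (Horner's rule).
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Sorts with String.CASE_INSENSITIVE_ORDER instead of the natural order.
    static String[] sortedIgnoringCase() {
        String[] words = {"banana", "apple", "Cherry"};
        Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
        return words;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("hello") == "hello".hashCode()); // true
        System.out.println("ABC".compareToIgnoreCase("abc"));          // 0
        System.out.println(Arrays.toString(sortedIgnoringCase()));     // [apple, banana, Cherry]
    }
}
```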

3. startsWith(), endsWith()

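startsWith(String prefix) tests whether the string begins with the specified prefix and returns a boolean.

startsWith(String prefix, int toffset) tests whether the substring beginning at the specified index starts with the given prefix and returns a boolean. It works by comparing the characters at corresponding positions one by one.

endsWith(String suffix) tests whether the string ends with the specified suffix. It is implemented by calling startsWith(String prefix, int toffset):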

 public boolean endsWith(String suffix) {
        return startsWith(suffix, value.length - suffix.value.length);
    }
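A brief sketch of the three methods follows; the class name, method name, and sample file name are illustrative assumptions. The last call shows what happens when the suffix is longer than the string: the computed offset is negative, and startsWith() returns false for a negative offset.

```java
class StartsEndsDemo {
    static final String NAME = "README.md"; // indices: R0 E1 A2 D3 M4 E5 .6 m7 d8

    static boolean hasExtension(String fileName, String extension) {
        return fileName.endsWith(extension);
    }

    public static void main(String[] args) {
        System.out.println(NAME.startsWith("READ"));       // true
        System.out.println(NAME.startsWith("ME", 4));      // true: "ME" begins at index 4
        System.out.println(hasExtension(NAME, ".md"));     // true: same as startsWith(".md", 9 - 3)
        System.out.println(hasExtension("md", "README.md")); // false: offset would be 2 - 9 = -7
    }
}
```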

4. indexOf(), lastIndexOf()

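indexOf(int ch): returns the index of the first occurrence of the specified character in this string.

indexOf(int ch, int fromIndex): returns the index of the first occurrence of the specified character, starting the search at the specified index.

lastIndexOf(int ch): returns the index of the last occurrence of the specified character in this string.

lastIndexOf(int ch, int fromIndex): returns the index of the last occurrence of the specified character, searching backward from the specified index.

indexOf(String str): returns the index of the first occurrence of the specified substring in this string.

indexOf(String str, int fromIndex): returns the index of the first occurrence of the specified substring, starting the search at the specified index.

lastIndexOf(String str): returns the index of the rightmost occurrence of the specified substring in this string.

lastIndexOf(String str, int fromIndex): returns the index of the last occurrence of the specified substring, searching backward from the specified index.

All of these methods return -1 when no match is found.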
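The eight overloads are easier to tell apart on a concrete string. This sketch uses "banana" (b0 a1 n2 a3 n4 a5); the class name is an assumption for illustration.

```java
class IndexOfDemo {
    static final String S = "banana";

    public static void main(String[] args) {
        System.out.println(S.indexOf('a'));          // 1
        System.out.println(S.indexOf('a', 2));       // 3: forward search from index 2
        System.out.println(S.lastIndexOf('a'));      // 5
        System.out.println(S.lastIndexOf('a', 4));   // 3: backward search from index 4
        System.out.println(S.indexOf("na"));         // 2
        System.out.println(S.indexOf("na", 3));      // 4
        System.out.println(S.lastIndexOf("na"));     // 4
        System.out.println(S.lastIndexOf("na", 3));  // 2: last match starting at or before index 3
        System.out.println(S.indexOf("xyz"));        // -1: not found
    }
}
```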

5. substring(), concat(), matches(), contains()

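substring(int beginIndex): returns a new string that is a substring of this string. The substring begins with the character at the specified index and extends to the end of this string.

substring(int beginIndex, int endIndex): returns a new string that is a substring of this string. The substring begins at beginIndex and extends to the character at index endIndex - 1.

concat(String str): returns a new string with the specified string appended to the end of this string.

matches(String regex): tests whether this entire string matches the given regular expression.

contains(CharSequence s): tests whether this string contains the given character sequence.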
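A small sketch of these four methods follows, with an assumed class name, method name, and sample values. It also shows that matches() requires the whole string to match, not just a part of it.

```java
class SubstringDemo {
    static final String S = "hello world"; // the space is at index 5, 'w' at index 6

    // Checks for a yyyy-MM-dd shape; the pattern is only an example.
    static boolean looksLikeDate(String text) {
        return text.matches("\\d{4}-\\d{2}-\\d{2}");
    }

    public static void main(String[] args) {
        System.out.println(S.substring(6));               // world
        System.out.println(S.substring(0, 5));            // hello (index 5 is excluded)
        System.out.println("hello".concat(" world"));     // hello world
        System.out.println(looksLikeDate("2024-01-15"));    // true
        System.out.println(looksLikeDate("on 2024-01-15")); // false: the whole string must match
        System.out.println(S.contains("lo w"));           // true
    }
}
```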

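6. replace(), replaceFirst(), replaceAll()

replace(char oldChar, char newChar): returns a new string in which every occurrence of oldChar in this string is replaced with newChar. If oldChar does not occur, the original string is returned. The implementation is as follows: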

public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */

            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }

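replaceFirst(String regex, String replacement): replaces the first substring of this string that matches the given regular expression with the given replacement.

replaceAll(String regex, String replacement): replaces each substring of this string that matches the given regular expression with the given replacement.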
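The three methods differ mainly in whether the first argument is a literal or a regular expression. The sketch below uses an assumed class name and sample inputs. It also calls replace(CharSequence, CharSequence), a literal-text overload not covered above, purely to contrast it with replaceAll().

```java
class ReplaceDemo {
    public static void main(String[] args) {
        // Character replacement: every 'a' becomes 'o'.
        System.out.println("banana".replace('a', 'o'));              // bonono

        // Regex replacement of the first match only.
        System.out.println("a1b22c333".replaceFirst("\\d+", "#"));   // a#b22c333

        // Regex replacement of every match.
        System.out.println("a1b22c333".replaceAll("\\d+", "#"));     // a#b#c#

        // "." is literal for replace(CharSequence, CharSequence) ...
        System.out.println("a.b.c".replace(".", "-"));               // a-b-c

        // ... but matches any character when treated as a regex.
        System.out.println("a.b.c".replaceAll(".", "-"));            // -----
    }
}
```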

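7. split(), join()

String[] split(String regex, int limit): splits this string around matches of the given regular expression. The limit parameter controls how many times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than 0, the pattern is applied at most n - 1 times, the array has at most n elements, and its last element contains all input beyond the last matched delimiter. If n is negative, the pattern is applied as many times as possible and the array can have any length. If n is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.

String[] split(String regex): splits this string around matches of the given regular expression. It behaves as if the two-argument split method were called with the given expression and a limit of 0.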

 public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
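The effect of the limit argument is clearest on an input with trailing empty fields. The following sketch uses an assumed class name and the sample input "a,b,c,,"; because "," is a single non-meta character, it takes the fast path in the code above.

```java
import java.util.Arrays;

class SplitDemo {
    static final String INPUT = "a,b,c,,";

    // limit == 0 (the one-argument form): trailing empty strings are dropped.
    static String[] noLimit() {
        return INPUT.split(",");
    }

    // limit == 2: the pattern is applied at most once.
    static String[] limitTwo() {
        return INPUT.split(",", 2);
    }

    // limit < 0: trailing empty strings are kept.
    static String[] negativeLimit() {
        return INPUT.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(noLimit()));       // [a, b, c]
        System.out.println(Arrays.toString(limitTwo()));      // [a, b,c,,]
        System.out.println(Arrays.toString(negativeLimit())); // [a, b, c, , ]
        System.out.println(Arrays.toString("abc".split(","))); // [abc]: no match returns the whole string
    }
}
```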

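String join(CharSequence delimiter, CharSequence... elements)

String join(CharSequence delimiter, Iterable<? extends CharSequence> elements)

Both methods were introduced in JDK 1.8. They work like string concatenation, except that a delimiter can be specified, and it is placed between consecutive elements. The varargs version is implemented as follows: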

 public static String join(CharSequence delimiter, CharSequence... elements) {
        Objects.requireNonNull(delimiter);
        Objects.requireNonNull(elements);
        // Number of elements not likely worth Arrays.stream overhead.
        StringJoiner joiner = new StringJoiner(delimiter);
        for (CharSequence cs: elements) {
            joiner.add(cs); // StringJoiner.add() inserts the delimiter between elements
        }
        return joiner.toString();
    }
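Both overloads can be exercised with a short sketch. The class and method names are illustrative assumptions. The last method shows that a null element is joined as the text "null".

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    // Varargs overload.
    static String joinVarargs() {
        return String.join("-", "a", "b", "c");
    }

    // Iterable overload.
    static String joinIterable() {
        List<String> parts = Arrays.asList("x", "y", "z");
        return String.join(", ", parts);
    }

    // A null element is added as the four characters "null".
    static String joinWithNull() {
        return String.join("-", "a", null, "b");
    }

    public static void main(String[] args) {
        System.out.println(joinVarargs());  // a-b-c
        System.out.println(joinIterable()); // x, y, z
        System.out.println(joinWithNull()); // a-null-b
    }
}
```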