LinuxOnly - String performance in Java

Introduction

Recently, I tried out the PMD Java source code analyzer, which pointed to some performance issues regarding String and StringBuffer usage. I am always sceptical about performance claims, but this message raised particular concern:

Avoid appending characters as strings in StringBuffer.append.

This means that you should not use a String of one character when appending something to a StringBuffer:

strBuffer.append("a"); // wrong
strBuffer.append('a'); // right

Of course, the two lines do exactly the same thing: they append a character to a StringBuffer. However, because the parameters are of different types, different methods are called, which can result in different performance.

In the following sections, we look into the performance of several string concatenation methods.

Appending one character

[Graph: time per call of StringBuffer.append(String) versus StringBuffer.append(char)]

The performance benchmark was done by appending a single character to a StringBuffer by using either append(char) or append(String). This was done many times, after which the time it took was measured using System.currentTimeMillis(). The code inside the while loop looked like this:

StringBuffer sentence = new StringBuffer("Sentence number ");
sentence.append("a");
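
For reference, the measurement loop described above might look like the following sketch. The class name AppendBenchmark, the method names, the sink field and the iteration count are my own assumptions, not the original test harness. Timings from such a simple loop are only indicative, because JIT warm-up and garbage collection are not controlled.

```java
public class AppendBenchmark {

    // Gives the loop an observable side effect, so the JIT is less likely to discard it.
    static int sink;

    /** Runs the append(String) loop and returns the elapsed time in milliseconds. */
    static long timeAppendString(int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            StringBuffer sentence = new StringBuffer("Sentence number ");
            sentence.append("a");
            sink += sentence.length();
        }
        return System.currentTimeMillis() - start;
    }

    /** Runs the append(char) loop and returns the elapsed time in milliseconds. */
    static long timeAppendChar(int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            StringBuffer sentence = new StringBuffer("Sentence number ");
            sentence.append('a');
            sink += sentence.length();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int iterations = 5000000; // hypothetical count, not taken from the article

        // Warm-up runs; their results are discarded.
        timeAppendString(iterations);
        timeAppendChar(iterations);

        System.out.println("append(String): " + timeAppendString(iterations) + " ms");
        System.out.println("append(char):   " + timeAppendChar(iterations) + " ms");
    }
}
```

Dividing the elapsed time by the number of iterations gives a rough time per call, which also includes the cost of constructing the StringBuffer.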

This and the other tests in this article were run on three versions of the Sun compiler and JVM. Although there were some differences in performance between the versions, the order from fastest to slowest was the same on every JVM. The time in the graphs is an average of the three JVMs. That said, the tests were significantly faster on newer JVMs. Upgrading from Java 1.4.2 to Java 7 can give you a performance improvement of 50%.

The results are shown in the graph. Indeed, append(char) is faster than append(String). The String version takes almost 20% longer than the fast character version.

It makes sense that the String version takes longer. The two methods are basically the same, except that the String version has to check the length of the String. However, now that the JDK is open source, we can check the implementation instead of guessing the cause.

public AbstractStringBuilder append(char c) {
    int newCount = count + 1;
    if (newCount > value.length)
        expandCapacity(newCount);
    value[count++] = c;
    return this;
}

This code copies character c to the end of a character array (line 5). Before doing this, it checks whether this character array is big enough (line 3) and increases its size if necessary (line 4).

public AbstractStringBuilder append(String str) {
    if (str == null) str = "null";
    int len = str.length();
    if (len == 0) return this;
    int newCount = count + len;
    if (newCount > value.length)
        expandCapacity(newCount);
    str.getChars(0, len, value, count);
    count = newCount;
    return this;
}

This code copies the character array from the String parameter to the character array in the StringBuffer (line 8). Like the previous method, it also checks whether the array is large enough, but now using the length of the String (lines 3-7). Getting the length of the String is not really a costly operation, because String stores its length in a field. However, there are a few more checks in this version, which makes it a bit slower.

The String version of append() is approximately 20% slower than the char version. In absolute terms, StringBuffer.append(String) takes 0.38 µs, whereas StringBuffer.append(char) takes 0.32 µs. This is an improvement of only 0.06 µs, which is very little. You won't notice the speedup until you call it thousands of times, and if that is the case, there are better ways to get a speedup.

So it really doesn't matter which version of append() you use. On the other hand, you get the speedup for free: all else being equal, you may as well use the char version.

Concatenating two strings

[Graph: time per operation for the following four expressions]

"Sentence number " + i
"Sentence number ".concat(String.valueOf(i))
new StringBuffer("Sentence number ").append(i).toString()
strBuffer.delete(16, 30).append(i).toString()

The graph shows four ways to append the String representation of an int to an existing String. The first two use the String class, the last two use the StringBuffer class. Because the StringBuffer class is mutable, it is usually faster for operations on strings. The String class, by contrast, is immutable, which means that the + operator and the concat() method create new String instances.

The last test uses the following code inside the loop:

strBuffer.delete(16, 30).append(i).toString()

The strBuffer variable is initialized to a StringBuffer with some content. Here is where the StringBuffer class really shines: changing existing strings. This way, the existing StringBuffer instance is repeatedly used and no new instances have to be made. Of course, it depends on the situation whether repeated usage is possible.
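
To make the comparison concrete, here is a small sketch that wraps the four expressions from the graph in methods and shows that they produce the same String. The class and method names are hypothetical; only the four expressions themselves come from the article.

```java
public class ConcatVariants {

    // Shared and reused across calls, so this sketch is meant for single-threaded use.
    private static final StringBuffer strBuffer = new StringBuffer("Sentence number ");

    static String withPlus(int i) {
        return "Sentence number " + i;
    }

    static String withConcat(int i) {
        return "Sentence number ".concat(String.valueOf(i));
    }

    static String withNewBuffer(int i) {
        return new StringBuffer("Sentence number ").append(i).toString();
    }

    static String withReusedBuffer(int i) {
        // The prefix is 16 characters long, so delete(16, 30) removes the previous
        // number. delete() clamps the end index to the current length.
        return strBuffer.delete(16, 30).append(i).toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(withPlus(i));
            System.out.println(withConcat(i));
            System.out.println(withNewBuffer(i));
            System.out.println(withReusedBuffer(i));
        }
    }
}
```

The reused buffer only pays off when the same prefix is needed again and again, which is the situation the last test measures.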

Since Java 5, java.lang.StringBuilder is also available. It is almost the same as StringBuffer, but its methods are not synchronized. This means it is not safe to modify the StringBuilder from different threads, but it is a little bit faster. I did not include it in the comparison, because I also wanted to test Java 1.4.2, in which StringBuilder is not yet available.
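
Although StringBuilder was not part of the measurements, switching to it is mostly a matter of changing the class name. The following minimal sketch (with a hypothetical class and method name) mirrors the third test and assumes Java 5 or later:

```java
public class BuilderVariant {

    /** Same as the StringBuffer version, but using the unsynchronized StringBuilder. */
    static String withBuilder(int i) {
        return new StringBuilder("Sentence number ").append(i).toString();
    }

    public static void main(String[] args) {
        System.out.println(withBuilder(42)); // prints "Sentence number 42"
    }
}
```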

The graph above shows that using a StringBuffer is indeed faster than the + operator. However, String.concat() is also surprisingly fast and can offer a shorter alternative to the StringBuffer version.

Normally, you should use the + operator because it makes the code much more readable. However, when you have determined through profiling that String concatenation is slowing your program down, a StringBuffer or StringBuilder implementation can improve the speed, especially when you use it in a smart way.

Concatenating many strings

[Graph: total time for new StringBuffer().append(String) versus new StringBuffer(12888897).append(char)]

In these tests, we continuously append an int to a string to get a very long string. The code below shows the implementation with a StringBuffer that is initialized with a String.

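int i = 2000000; // assumed starting value, consistent with the 12,888,897-character result
StringBuffer sentence = new StringBuffer("Sentence");
while (--i != 0) {
    sentence.append(i); // append() returns the same buffer, so no reassignment is needed
}
String result = sentence.toString();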

This code concatenates numbers to a StringBuffer to form a very long string. In our case, we are concatenating 2 million numbers to form a string of approximately 13 million characters.
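
The exact length of the resulting string can be checked without building it. The sketch below assumes that the counter starts at 2,000,000, which is my inference from the "2 million numbers" mentioned above and not a value stated with the test code. The class and method names are hypothetical.

```java
public class FinalLength {

    /**
     * Length of "Sentence" followed by the decimal forms of start-1 down to 1.
     * Assumes start >= 1, like the loop in the test.
     */
    static long finalLength(int start) {
        long length = "Sentence".length();
        int i = start;
        while (--i != 0) {
            length += String.valueOf(i).length();
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(finalLength(2000000)); // prints 12888897
    }
}
```

Under that assumption, the result is 12,888,897 characters, which is the "approximately 13 million" referred to in the text.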

The red bar in the graph shows the speedup when the StringBuffer is initialized with the total length of the final string. Normally, a StringBuffer that is constructed from a String gets an initial capacity of the length of that String plus 16. In our case, this is the length of "Sentence", which is 8, plus 16, giving 24. If you append to the StringBuffer so that it grows beyond 24 characters, it allocates a bigger chunk of memory. The new capacity is the old capacity plus one, doubled. In our case, this is (24 + 1) * 2 = 50. After it has allocated this memory, it copies the contents of the old array to the new array. Then, when we append to the StringBuffer so that it grows beyond 50 characters, it has to do this again. By the time we have a string of nearly 13 million characters, this has happened 19 times. The last expansion alone copies a buffer of 6,815,742 characters, and the 19 expansions together copy about 13.6 million characters.

If, instead, we tell the StringBuffer upon construction that we want to build a string of 12,888,897 characters, it initializes its buffer to that capacity. It then never has to be expanded and no copying is done.

Of course, this test is also possible using a String. However, it is rather slow. So slow, in fact, that it would not fit in the graph. String objects are immutable, which means that appending a String to a String cannot happen in place. Instead, when string one is appended to string two, a new String object is created and the contents of both string one and string two are copied into it.

So let's see what this means for the test described above. The code would be like this:
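
The growth described above can be replayed with a few lines of code. This is a simplified model, not the JDK implementation: it assumes the rule newCapacity = (oldCapacity + 1) * 2 and that the whole old buffer is copied on every expansion. The class and method names are hypothetical.

```java
public class GrowthModel {

    /**
     * Simulates buffer growth with the rule newCapacity = (oldCapacity + 1) * 2.
     * Returns { number of expansions, size of the last copy, total characters copied }.
     */
    static long[] simulate(long initialCapacity, long targetLength) {
        long capacity = initialCapacity;
        long expansions = 0;
        long lastCopy = 0;
        long totalCopied = 0;
        while (capacity < targetLength) {
            lastCopy = capacity;       // simplification: the whole old buffer is copied
            totalCopied += capacity;
            capacity = (capacity + 1) * 2;
            expansions++;
        }
        return new long[] { expansions, lastCopy, totalCopied };
    }

    public static void main(String[] args) {
        long[] r = simulate(24, 12888897L);
        System.out.println(r[0] + " expansions, last copy " + r[1] + ", total copied " + r[2]);
    }
}
```

Under this model, growing from a capacity of 24 to a length of 12,888,897 takes 19 expansions. The last one copies 6,815,742 characters, and all of them together copy 13,631,424 characters.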

String sentence = "Sentence";
while (--i != 0) {
    sentence += i;
}

What happens in line 3 is that the old value of sentence with i appended is copied to a new String, and sentence is then set to refer to the new String. Since we are making a String of approximately 13 million characters, that is a lot of copying. To make things easy, let's assume that the string we are appending has an average length of 6 and that the initial length of the string is 0. In the first pass, no existing characters would be copied. In the second pass, 6 characters are copied, in the third 12, and so on. So how many characters are copied in total?

0×6 + 1×6 + 2×6 + 3×6 + ... + 1,999,998×6 + 1,999,999×6
= 6 × (0 + 1 + 2 + 3 + ... + 1,999,998 + 1,999,999)
≈ 6 × (2,000,000 × 1,000,000)
≈ 12 trillion

No wonder it takes a long time.
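
The estimate can be checked with a short loop. This sketch uses the same simplification as the calculation above (every appended piece is 6 characters long and the string starts empty). The class and method names are hypothetical.

```java
public class CopyCount {

    /**
     * Number of existing characters copied when `appends` pieces of `pieceLength`
     * characters are appended one by one to an immutable string that starts empty.
     */
    static long copiedCharacters(long appends, long pieceLength) {
        long total = 0;
        for (long pass = 0; pass < appends; pass++) {
            total += pass * pieceLength; // the existing contents are copied on each pass
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(copiedCharacters(2000000L, 6L)); // prints 11999994000000
    }
}
```

The exact sum, 11,999,994,000,000, is indeed about 12 trillion. The work grows quadratically with the number of appends, whereas the StringBuffer copies each character only a small, bounded number of times.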

Conclusion

In this article, we saw that various methods of concatenating strings vary in performance. In most cases, the gains are negligible. Furthermore, the + operator gives by far the most readable code.

The only case in which you should use a StringBuffer or StringBuilder instead of a String is when you append multiple times to one string. In this case, the StringBuffer class considerably reduces the number of characters copied in memory.

The Sad Tragedy of Micro-Optimization Theater