我必须在内存中保留数千个字符串,以便在Java中串行访问。我应该把它们存储在数组中还是应该使用某种列表?
由于数组将所有数据保存在一个连续的内存块中(与list不同),使用数组存储数千个字符串会导致问题吗?
我必须在内存中保留数千个字符串,以便在Java中串行访问。我应该把它们存储在数组中还是应该使用某种列表?
由于数组将所有数据保存在一个连续的内存块中(与list不同),使用数组存储数千个字符串会导致问题吗?
当前回答
使用哪一种取决于问题本身。我们得看看大O。
图片来源:https://github.com/egonSchiele/grokking_algorithms
其他回答
没有一个答案有我感兴趣的信息——重复扫描同一个数组很多很多次。必须为此做一个JMH测试。
结果(Java 1.8.0_66 x32,迭代普通数组至少比ArrayList快5倍):
Benchmark Mode Cnt Score Error Units
MyBenchmark.testArrayForGet avgt 10 8.121 ? 0.233 ms/op
MyBenchmark.testListForGet avgt 10 37.416 ? 0.094 ms/op
MyBenchmark.testListForEach avgt 10 75.674 ? 1.897 ms/op
Test
package my.jmh.test;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MyBenchmark {
public final static int ARR_SIZE = 100;
public final static int ITER_COUNT = 100000;
String arr[] = new String[ARR_SIZE];
List<String> list = new ArrayList<>(ARR_SIZE);
public MyBenchmark() {
for( int i = 0; i < ARR_SIZE; i++ ) {
list.add(null);
}
}
@Benchmark
public void testListForEach() {
int count = 0;
for( int i = 0; i < ITER_COUNT; i++ ) {
for( String str : list ) {
if( str != null )
count++;
}
}
if( count > 0 )
System.out.print(count);
}
@Benchmark
public void testListForGet() {
int count = 0;
for( int i = 0; i < ITER_COUNT; i++ ) {
for( int j = 0; j < ARR_SIZE; j++ ) {
if( list.get(j) != null )
count++;
}
}
if( count > 0 )
System.out.print(count);
}
@Benchmark
public void testArrayForGet() {
int count = 0;
for( int i = 0; i < ITER_COUNT; i++ ) {
for( int j = 0; j < ARR_SIZE; j++ ) {
if( arr[j] != null )
count++;
}
}
if( count > 0 )
System.out.print(count);
}
}
数组建议你在任何地方使用它们而不是列表,特别是在你知道项目的数量和大小不会改变的情况下。
参见Oracle Java最佳实践:http://docs.oracle.com/cd/A97688_16/generic.903/bp/java.htm#1007056
当然,如果需要多次从集合中添加和删除对象,则使用简单列表。
Java的方式是,您应该考虑哪种数据抽象最适合您的需求。记住,在Java中,List是抽象的数据类型,而不是具体的数据类型。您应该将字符串声明为List,然后使用ArrayList实现初始化它。
List<String> strings = new ArrayList<String>();
抽象数据类型和特定实现的分离是面向对象编程的一个关键方面。
An ArrayList implements the List Abstract Data Type using an array as its underlying implementation. Access speed is virtually identical to an array, with the additional advantages of being able to add and subtract elements to a List (although this is an O(n) operation with an ArrayList) and that if you decide to change the underlying implementation later on you can. For example, if you realize you need synchronized access, you can change the implementation to a Vector without rewriting all your code.
事实上,ArrayList是专门为在大多数情况下替换低级数组构造而设计的。如果Java是今天设计的,那么完全有可能将数组完全排除在外,转而使用数组列表结构。
由于数组将所有数据保存在一个连续的内存块中(与list不同),使用数组存储数千个字符串会导致问题吗?
In Java, all collections store only references to objects, not the objects themselves. Both arrays and ArrayList will store a few thousand references in a contiguous array, so they are essentially identical. You can consider that a contiguous block of a few thousand 32-bit references will always be readily available on modern hardware. This does not guarantee that you will not run out of memory altogether, of course, just that the contiguous block of memory requirement is not difficult to fufil.
在存储字符串对象的情况下,数组还是列表的选择并不那么重要(考虑到性能)。因为数组和列表存储的都是字符串对象引用,而不是实际对象。
如果字符串的数量几乎是常数,则使用数组(或ArrayList)。但如果数字变化太大,那么你最好使用LinkedList。 如果有(或将会)需要在中间添加或删除元素,那么你当然必须使用LinkedList。
这里给出的许多微基准测试发现,像array/ArrayList读取这样的事情需要几纳秒。如果所有内容都在L1缓存中,这是非常合理的。
更高级别的缓存或主存访问的数量级可能是10nS-100nS,而L1缓存的数量级更接近1nS。访问ArrayList有一个额外的内存间接,在实际的应用程序中,你可以几乎从不或每次都支付这个代价,这取决于你的代码在访问之间所做的事情。当然,如果你有很多小的数组列表,这可能会增加你的内存使用,使你更有可能缓存丢失。
原来的海报似乎只使用一个,在短时间内访问了很多内容,所以应该没有太大的困难。但是对于其他人来说可能有所不同,在解释微基准测试时应该注意。
Java Strings, however, are appallingly wasteful, especially if you store lots of small ones (just look at them with a memory analyzer, it seems to be > 60 bytes for a string of a few characters). An array of strings has an indirection to the String object, and another from the String object to a char[] which contains the string itself. If anything's going to blow your L1 cache it's this, combined with thousands or tens of thousands of Strings. So, if you're serious - really serious - about scraping out as much performance as possible then you could look at doing it differently. You could, say, hold two arrays, a char[] with all the strings in it, one after another, and an int[] with offsets to the starts. This will be a PITA to do anything with, and you almost certainly don't need it. And if you do, you've chosen the wrong language.