Javaでサイズ付きMapを比べてみた

Javaで上限サイズ付きのMapが実装が必要でしたのでいくつか比べてみました。具体的にはLinkedHashMapとCaffeineを使ったので、その実装内容を見たうえで結果を比較していきたいと思います。

実装内容
- LinkedHashMap
- Caffeine
測定
- Insert
- Update
- Parallel Put
- Get
まとめ

実装内容

LinkedHashMap

まずデータ構造がどうなっているのか確認していきます。自分がAmazon CorrettoのJDK11を使っているのですが、逆コンパイルした結果をみるとLinkedHashMapはHashMapを拡張した形になっていました。

public class LinkedHashMap<K,V>
    extends HashMap<K,V>
    implements Map<K,V>
{

HashMapを拡張しているのでMap自体はNodeの配列で実現しており、配列のインデックス取得にhash値を使うというものになっています。

/* ---------------- Fields -------------- */

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

ではLinkedListのデータ構造をどう実現しているかというとNodeにbeforeとafterに参照を持たせるように拡張していました。Javaだと自分で書くことはないのですが、C言語で構造体のチェーン構造を実装するときのように参照を制御しています。

/**
 * HashMap.Node subclass for normal LinkedHashMap entries.
 */
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next);
    }
}

それからLinkedhashMapはNodeのリストのheadとlastは保持するようにしています。

    /**
     * The head (eldest) of the doubly linked list.
     */
    transient LinkedHashMap.Entry<K,V> head;

    /**
     * The tail (youngest) of the doubly linked list.
     */
    transient LinkedHashMap.Entry<K,V> tail;

putで新規に追加してきた時の動きを追ってみると,HashMap側の実装でtab[i] = newNode(hash, key, value, null);でnewNodeした後に配列に追加しています。

    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
~省略~

newNodeはLinkedHashMapの実装で以下のようにNodeを作り最後のNodeの参照先俊、最後だったNodeの参照先にも追加しています。

    Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
        LinkedHashMap.Entry<K,V> p =
            new LinkedHashMap.Entry<>(hash, key, value, e);
        linkNodeLast(p);
        return p;
    }

    // link at the end of list
    private void linkNodeLast(LinkedHashMap.Entry<K,V> p) {
        LinkedHashMap.Entry<K,V> last = tail;
        tail = p;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
    }

ConcurrentHashMapは結局管理対象のデータ構造が配列だけなのでvolatileにすることでattomicな操作ができるのですが、LinkedHashMapは管理対象のデータ構造が配列とLinkedListの2つになるので同じキーの更新を同時に行うとLinkedList側のデータ構造はすぐに壊れるかと思います。

LinkedHashMapで上限サイズを指定できるようにするには以下の関数をoverrideすればよいようです。

    protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
        return false;
    }

上限サイズの指定ありで複数スレッドでも安全に扱えるようにしたいので、以下のようにlombokの@Synchronizedでブロックするようにしています。ほかにもputIfAbsentなどもブロッキングする必要があるのですが、今回は確認対象のputがブロッキングになっているので良しとします。

public class CapacityBlockingLinkedHashMap<K,V> extends LinkedHashMap<K,V> {
    private final int capacity;
    public CapacityBlockingLinkedHashMap(int capacity) {
        super(capacity);
        this.capacity = capacity;
    }

    @Override
    @Synchronized
    public V put(K key, V value) {
        return super.put(key, value);
    }

    @Override
    @Synchronized
    public V remove(Object key) {
        return super.remove(key);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
        return this.size() > this.capacity;
    }
}

また、比較対象として自前でLinkedListを保持したものも準備します。この場合は更新、削除時にLinkedListの対象のキーにアクセスするのに時間がかかるので差が出るはずです。

public class CapacityBlockingLinkedHashMap2<K, V> extends HashMap<K, V> {
    private final LinkedList<K> list;
    private int capacity;

    public CapacityBlockingLinkedHashMap2(int capacity) {
        super(capacity);
        this.list = new LinkedList<>();
        this.capacity = capacity;
    }

    @Override
    @Synchronized
    public V put(K key, V value) {
        if(super.containsKey(key)) {
            this.remove(key);
        }
        if(this.list.size() >= capacity) {
            super.remove(list.poll());
        }
        list.addLast(key);
        return super.put(key, value);
    }

    @Override
    @Synchronized
    public V remove(Object key) {
        list.remove(key);
        return super.remove(key);
    }
}

Caffeine

次にCaffeine側の実装を見ていきますが、Cacheの実装としてローカルのオンメモリキャッシュであるBoundedLocalCacheを見ていきます。

/**
 * An in-memory cache implementation that supports full concurrency of retrievals, a high expected
 * concurrency for updates, and multiple ways to bound the cache.
 * <p>
 * This class is abstract and code generated subclasses provide the complete implementation for a
 * particular configuration. This is to ensure that only the fields and execution paths necessary
 * for a given configuration are used.
 *
 * @author ben.manes@gmail.com (Ben Manes)
 * @param <K> the type of keys maintained by this cache
 * @param <V> the type of mapped values
 */
abstract class BoundedLocalCache<K, V> extends BLCHeader.DrainStatusRef<K, V>
    implements LocalCache<K, V> {

データ自体は以下のようにConcurrentHashMapで管理されているのですが、

final ConcurrentHashMap<Object, Node<K, V>> data;

keyの方はnode.getKeyReference()で取得した値を使っています。

prior = data.putIfAbsent(node.getKeyReference(), node);

Caffeineでは自前でGCを実装していて、上限サイズを超えたものや期限切れになったものはそれで削除するようにしているようです。このNodeの実装クラスはいくつかあってweakKeysやsoftKeysなどがあるようです。

/**
 * Returns the reference that the cache is holding the entry by. This is either the key if
 * strongly held or a {@link java.lang.ref.WeakReference} to that key.
 */
public abstract Object getKeyReference();

weekKeysの方は以下のようになっており

  public final Object getKeyReference() {
    WeakValueReference<V> valueRef = (WeakValueReference<V>) VALUE.get(this);
    return valueRef.getKeyReference();
  }

softKeysの方は以下のようになっていました。

  public final Object getKeyReference() {
    return KEY.get(this);
  }

上限サイズを超えたものの削除やバッファーのメンテナンスなど、複数スレッドで整合性を合わせる必要があるものはブロックして行っているようです。

  /**
   * Performs the pending maintenance work and sets the state flags during processing to avoid
   * excess scheduling attempts. The read buffer, write buffer, and reference queues are
   * drained, followed by expiration, and size-based eviction.
   *
   * @param task an additional pending task to run, or {@code null} if not present
   */
  @GuardedBy("evictionLock")
  void maintenance(@Nullable Runnable task) {
    lazySetDrainStatus(PROCESSING_TO_IDLE);

    try {
      drainReadBuffer();

      drainWriteBuffer();
      if (task != null) {
        task.run();
      }

      drainKeyReferences();
      drainValueReferences();

      expireEntries();
      evictEntries();

      climb();
    } finally {
      if ((drainStatus() != PROCESSING_TO_IDLE) || !casDrainStatus(PROCESSING_TO_IDLE, IDLE)) {
        lazySetDrainStatus(REQUIRED);
      }
    }
  }

自前で実装したGCによりCaffeineの性能は高くなっているようでした。

測定

それでは性能を測定していきます。

Insert

まず新規のノードをputする場合の性能について、以下のように上限サイズとループの回数を指定できるようにしておいて

@SneakyThrows
public static long performInsert(int size, int time, int loop, Function<Integer, Supplier<DataAccess<Integer, Integer>>> gen) {
    long sum = 0L;
    for (int t = 0; t < time; t++) {
        DataAccess<Integer, Integer> da = gen.apply(size).get();
        long start = System.nanoTime();
        for (int l = 0; l < loop; l++) {
            da.put(l, l);
        }
        long end = System.nanoTime();
        sum += (end - start);
    }
    return sum / time;
}

Function<Integer, Supplier<DataAccess<Integer, Integer>>> blockLinkedHashMapGen =
        (capacity) -> () -> new DataAccessMap(new CapacityBlockingLinkedHashMap<>(capacity));
Function<Integer, Supplier<DataAccess<Integer, Integer>>> blockLinkedHashMapGen2 =
        (capacity) -> () -> new DataAccessMap(new CapacityBlockingLinkedHashMap2<>(capacity));
Function<Integer, Supplier<DataAccess<Integer, Integer>>> caffeineGen =
        (capacity) -> () -> new DataAccessCache<>(Caffeine.newBuilder().weakKeys().weakValues().maximumSize(capacity).build());
List.of(
        Tuples.of(1, 100, 1),
        Tuples.of(100, 100, 100),
        Tuples.of(10000, 100, 10000),
        Tuples.of(20, 100, 100),
        Tuples.of(20, 100, 10000)
).forEach(p -> {
    int size = p.getT1();
    int time = p.getT2();
    int loop = p.getT3();
    System.out.printf("performInsert size[%d] time[%d] loop[%d]%n", size, time, loop);
    System.out.printf("%25s:%dns%n", "blockLinkedHashmap", performInsert(size, time, loop, blockLinkedHashMapGen));
    System.out.printf("%25s:%dns%n", "blockLinkedHashmap2", performInsert(size, time, loop, blockLinkedHashMapGen2));
    System.out.printf("%25s:%dns%n", "caffeineCache", performInsert(size, time, loop, caffeineGen));
});

結果は以下のようになりました。1件登録するだけでみたらLinkedHashMapを拡張したものが早そうですが、複数件登録のスループットは特に差がなさそうでした。

performInsert size[1] time[100] loop[1]
       blockLinkedHashmap:868ns
      blockLinkedHashmap2:40689ns
            caffeineCache:50723ns
performInsert size[100] time[100] loop[100]
       blockLinkedHashmap:31970ns
      blockLinkedHashmap2:22311ns
            caffeineCache:460851ns
performInsert size[10000] time[100] loop[10000]
       blockLinkedHashmap:630141ns
      blockLinkedHashmap2:833954ns
            caffeineCache:4388130ns
performInsert size[20] time[100] loop[100]
       blockLinkedHashmap:17965ns
      blockLinkedHashmap2:26399ns
            caffeineCache:27371ns
performInsert size[20] time[100] loop[10000]
       blockLinkedHashmap:431842ns
      blockLinkedHashmap2:477565ns
            caffeineCache:3309402ns

Update

次に更新ですが、測定用関数は以下になります。

@SneakyThrows
public static long performUpdate(int size, int time, int loop, Function<Integer, Supplier<DataAccess<Integer, Integer>>> gen) {
    long sum = 0L;
    for (int t = 0; t < time; t++) {
        DataAccess<Integer, Integer> da = gen.apply(size).get();
        for (int w = 0; w < size; w++) {
            da.put(w, w);
        }
        long start = System.nanoTime();
        for (int l = 0; l < loop; l++) {
            int rand = (int) (Math.random() * size);
            da.put(rand, rand);
        }
        long end = System.nanoTime();
        sum += (end - start);
    }
    return sum / time;
}

結果は以下のようになりました、登録の時と比べると更新だとLinkedHashMapの拡張クラスが全体的に早いことが分かります。blockLinkedHashmap2はLinkedListを自前で定義したものを使っているの長くなるとノードへのアクセスが遅くなるので時間がかかっていることが分かります。caffeineも十分早そうです。

performUpdate size[1] time[100] loop[1]
       blockLinkedHashmap:2759ns
      blockLinkedHashmap2:3192ns
            caffeineCache:51632ns
performUpdate size[100] time[100] loop[100]
       blockLinkedHashmap:25216ns
      blockLinkedHashmap2:109116ns
            caffeineCache:118842ns
performUpdate size[10000] time[100] loop[10000]
       blockLinkedHashmap:729557ns
      blockLinkedHashmap2:240182840ns
            caffeineCache:2779608ns
performUpdate size[100] time[100] loop[1]
       blockLinkedHashmap:93ns
      blockLinkedHashmap2:205ns
            caffeineCache:2012ns
performUpdate size[10000] time[100] loop[1]
       blockLinkedHashmap:172ns
      blockLinkedHashmap2:35400ns
            caffeineCache:465ns

Parallel Put

次に並列でputする場合について、測定用の関数は以下になりまして

@SneakyThrows
public static long performParallelPut(int size, int thread, int loop, Function<Integer, Supplier<DataAccess<Integer, Integer>>> gen) {
    DataAccess<Integer, Integer> da = gen.apply(size).get();
    ExecutorService exec = Executors.newFixedThreadPool(thread);
    long start = System.nanoTime();
    for(int s = 0; s < thread; s++ ) {
        exec.execute(() -> {
            for (int l = 0; l < loop; l++) {
                int rand = (int) (Math.random() * size);
                da.put(rand, rand);
            }
        });
    }
    exec.shutdown();
    long end = System.nanoTime();
    return end - start;
}

結果は以下のようになり特に差がなさそうでした。

performParallelPut size[1] thread[4] loop[1]
       blockLinkedHashmap:2080000ns
      blockLinkedHashmap2:574500ns
            caffeineCache:343600ns
performParallelPut size[100] thread[4] loop[100]
       blockLinkedHashmap:376200ns
      blockLinkedHashmap2:1264401ns
            caffeineCache:430101ns
performParallelPut size[10000] thread[4] loop[10000]
       blockLinkedHashmap:455500ns
      blockLinkedHashmap2:504900ns
            caffeineCache:474700ns
performParallelPut size[20] thread[4] loop[100]
       blockLinkedHashmap:417300ns
      blockLinkedHashmap2:409000ns
            caffeineCache:417800ns
performParallelPut size[20] thread[4] loop[10000]
       blockLinkedHashmap:421100ns
      blockLinkedHashmap2:417500ns
            caffeineCache:15635699ns

Get

最後にgetの確認ですが、測定用の関数は以下になります。

@SneakyThrows
public static long performGet(int size, int time, int loop, Function<Integer, Supplier<DataAccess<Integer, Integer>>> gen) {
    long sum = 0L;
    for (int t = 0; t < time; t++) {
        DataAccess<Integer, Integer> da = gen.apply(size).get();
        for (int s = 0; s < size; s++) {
            da.put(s, s);
        }
        long start = System.nanoTime();
        for (int l = 0; l < loop; l++) {
            da.get((int) (Math.random() * size));
        }
        long end = System.nanoTime();
        sum += (end - start);
    }
    return sum / time;
}

結果は以下のようになりました。内部的にはHashMapとConcurrentHashMapのgetを呼び出しているだけですがCaffeineの方はイベントの呼び出しなど行っているのでその分は時間がかかっているようです。

performGet size[1] time[100] loop[1]
       blockLinkedHashmap:268ns
      blockLinkedHashmap2:224ns
            caffeineCache:3498ns
performGet size[100] time[100] loop[100]
       blockLinkedHashmap:8558ns
      blockLinkedHashmap2:6006ns
            caffeineCache:38825ns
performGet size[10000] time[100] loop[10000]
       blockLinkedHashmap:792030ns
      blockLinkedHashmap2:621276ns
            caffeineCache:1112894ns
performGet size[100] time[100] loop[1]
       blockLinkedHashmap:112ns
      blockLinkedHashmap2:57ns
            caffeineCache:1657ns
performGet size[10000] time[100] loop[1]
       blockLinkedHashmap:164ns
      blockLinkedHashmap2:155ns
            caffeineCache:859ns

まとめ

Caffeineは高機能で高性能ですが、キューなどレイテンシーがクリティカルな場面ではLinkedHashMapを使ったほうが良さそうです。putなどの操作をブロッキングにしても性能上問題なさそうです。レイテンシーがクリティカルでないのであればCaffeineは高機能なのでこちらを使えば良さそうです。