Java LinkedHashMap源码解析

上周把HashMap、TreeMap这两个Map体系中比较有代表性的类介绍完了，大家应该也能体会到，如果该类所对应的数据结构与算法掌握好了，再看这些类的源码真是太简单不过了。

其次，我希望大家能够触类旁通，比如我们已经掌握了HashMap的原理，我们可以推知HashSet的内部实现

HashSet 内部用一个HashMap对象存储数据，更具体些，只用到了key，value全部为一dummy对象。

HashSet这个类太简单了，我不打算单独写文章介绍。今天介绍个比较实用的类——LinkedHashMap。

本文源码分析基于Oracle JDK 1.7.0_71，请知悉。

$ java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

签名

public class LinkedHashMap<K,V>
       extends HashMap<K,V>
       implements Map<K,V>

可以看到，LinkedHashMap是HashMap的一子类，它根据自身的特性修改了HashMap的内部某些方法的实现，要想知道LinkedHashMap具体修改了哪些方法，就需要了解LinkedHashMap的设计原理了。

设计原理

双向链表

LinkedHashMap是key键有序的HashMap的一种实现。它除了使用哈希表这个数据结构，使用双向链表来保证key的顺序

双向链表算是个很常见的数据结构，上图中的头节点的prev、尾节点的next指向null，双向链表还有一种变种，见下图环型双向链表

可以看到，这种链表把首尾节点相连，形成一个环。

LinkedHashMap中采用的这种环型双向链表，环型双向链表的用途比较多，感兴趣可以看这里：

http://stackoverflow.com/questions/3589772/why-exactly-do-we-need-a-circular-linked-list-singly-or-doubly-data-structur

双向链表这种数据结构，最关键的是保证在增加节点、删除节点时不要断链，后面在分析LinkedHashMap具体代码时会具体介绍，这里就不赘述了。

LinkedHashMap 特点

一般来说，如果需要使用的Map中的key无序，选择HashMap；如果要求key有序，则选择TreeMap。但是选择TreeMap就会有性能问题，因为TreeMap的get操作的时间复杂度是O(log(n))的，相比于HashMap的O(1)还是差不少的，LinkedHashMap的出现就是为了平衡这些因素，使得

能够以O(1)时间复杂度增加查找元素，又能够保证key的有序性

此外，LinkedHashMap提供了两种key的顺序：

访问顺序（access order）。非常实用，可以使用这种顺序实现LRU（Least Recently Used）缓存
插入顺序（insertion orde）。同一key的多次插入，并不会影响其顺序

源码分析

首先打开eclipse的outline面版看看LinkedHashMap里面有那些成员

可以看到，由于LinkedHashMap继承自HashMap，所以大部分的方法都是根据`key的有序性`而重写了HashMap中的部分方法。

构造函数

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
    //accessOrder为true表示该LinkedHashMap的key为访问顺序
    //accessOrder为false表示该LinkedHashMap的key为插入顺序
    private final boolean accessOrder;

    public LinkedHashMap(int initialCapacity, float loadFactor) {
        super(initialCapacity, loadFactor);
        //默认为false，也就是插入顺序
        accessOrder = false;
    }
    public LinkedHashMap(int initialCapacity) {
        super(initialCapacity);
        accessOrder = false;
    }
    public LinkedHashMap() {
        super();
        accessOrder = false;
    }
    public LinkedHashMap(Map<? extends K, ? extends V> m) {
        super(m);
        accessOrder = false;
    }
    public LinkedHashMap(int initialCapacity,
                         float loadFactor,
                         boolean accessOrder) {
        super(initialCapacity, loadFactor);
        this.accessOrder = accessOrder;
    }

    /**
     * Called by superclass constructors and pseudoconstructors (clone,
     * readObject) before any entries are inserted into the map.  Initializes
     * the chain.
     */
    @Override
    void init() {
        header = new Entry<>(-1, null, null, null);
        //通过这里可以看出，LinkedHashMap采用的是环型的双向链表
        header.before = header.after = header;
    }

LinkedHashMap.Entry

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
    private static class Entry<K,V> extends HashMap.Entry<K,V> {
        // These fields comprise the doubly linked list used for iteration.
        //每个节点包含两个指针，指向前继节点与后继节点
        Entry<K,V> before, after;

        Entry(int hash, K key, V value, HashMap.Entry<K,V> next) {
            super(hash, key, value, next);
        }

        /**
         * Removes this entry from the linked list.
         */
        //删除一个节点时，需要把
        //1. 前继节点的后继指针 指向 要删除节点的后继节点
        //2. 后继节点的前继指针 指向 要删除节点的前继节点
        private void remove() {
            before.after = after;
            after.before = before;
        }

        /**
         * Inserts this entry before the specified existing entry in the list.
         */
        //在某节点前插入节点
        private void addBefore(Entry<K,V> existingEntry) {
            after  = existingEntry;
            before = existingEntry.before;
            before.after = this;
            after.before = this;
        }

        /**
         * This method is invoked by the superclass whenever the value
         * of a pre-existing entry is read by Map.get or modified by Map.set.
         * If the enclosing Map is access-ordered, it moves the entry
         * to the end of the list; otherwise, it does nothing.
         */
        void recordAccess(HashMap<K,V> m) {
            LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
            // 如果需要key的访问顺序，需要把
            // 当前访问的节点删除，并把它插入到双向链表的起始位置
            if (lm.accessOrder) {
                lm.modCount++;
                remove();
                addBefore(lm.header);
            }
        }

        void recordRemoval(HashMap<K,V> m) {
            remove();
        }
    }

为了更形象表示双向链表是如何删除、增加节点，下面用代码加图示的方式

删除节点

上图中，删除的是b节点 ``` private void remove() { before.after = after; //相当于上图中的操作 1 after.before = before; //相当于上图中的操作 3 } ``` #### 增加节点

上图中的c节点相当于下面代码中的existingEntry，要插入的是x节点

1
2
3
4
5
6
    private void addBefore(Entry<K,V> existingEntry) {
        after  = existingEntry;         //相当于上图中的操作 1
        before = existingEntry.before;  //相当于上图中的操作 3
        before.after = this;            //相当于上图中的操作 4
        after.before = this;            //相当于上图中的操作 2
    }

知道了增加节点的原理，下面看看LinkedHashMap的代码是怎么实现put方法的

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
    /**
     * This override alters behavior of superclass put method. It causes newly
     * allocated entry to get inserted at the end of the linked list and
     * removes the eldest entry if appropriate.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        super.addEntry(hash, key, value, bucketIndex);

        // Remove eldest entry if instructed
        Entry<K,V> eldest = header.after;
        //如果有必要移除最老的节点，那么就移除。LinkedHashMap默认removeEldestEntry总是返回false
        //也就是这里if里面的语句永远不会执行
        //这里removeEldestEntry主要是给LinkedHashMap的子类留下的一个钩子
        //子类完全可以根据自己的需要重写removeEldestEntry，后面我会举个现实中的例子🌰
        if (removeEldestEntry(eldest)) {
            removeEntryForKey(eldest.key);
        }
    }
    /**
     * This override differs from addEntry in that it doesn't resize the
     * table or remove the eldest entry.
     */
    void createEntry(int hash, K key, V value, int bucketIndex) {
        HashMap.Entry<K,V> old = table[bucketIndex];
        Entry<K,V> e = new Entry<>(hash, key, value, old);
        table[bucketIndex] = e;
        //这里把新增的Entry加到了双向链表的header的前面，成为新的header
        e.addBefore(header);
        size++;
    }   
    /**
     * Returns <tt>true</tt> if this map should remove its eldest entry.
     * This method is invoked by <tt>put</tt> and <tt>putAll</tt> after
     * inserting a new entry into the map.  It provides the implementor
     * with the opportunity to remove the eldest entry each time a new one
     * is added.  This is useful if the map represents a cache: it allows
     * the map to reduce memory consumption by deleting stale entries.
     *
     * <p>Sample use: this override will allow the map to grow up to 100
     * entries and then delete the eldest entry each time a new entry is
     * added, maintaining a steady state of 100 entries.
     * <pre>
     *     private static final int MAX_ENTRIES = 100;
     *
     *     protected boolean removeEldestEntry(Map.Entry eldest) {
     *        return size() > MAX_ENTRIES;
     *     }
     * </pre>
     *
     * <p>This method typically does not modify the map in any way,
     * instead allowing the map to modify itself as directed by its
     * return value.  It <i>is</i> permitted for this method to modify
     * the map directly, but if it does so, it <i>must</i> return
     * <tt>false</tt> (indicating that the map should not attempt any
     * further modification).  The effects of returning <tt>true</tt>
     * after modifying the map from within this method are unspecified.
     *
     * <p>This implementation merely returns <tt>false</tt> (so that this
     * map acts like a normal map - the eldest element is never removed).
     *
     * @param    eldest The least recently inserted entry in the map, or if
     *           this is an access-ordered map, the least recently accessed
     *           entry.  This is the entry that will be removed it this
     *           method returns <tt>true</tt>.  If the map was empty prior
     *           to the <tt>put</tt> or <tt>putAll</tt> invocation resulting
     *           in this invocation, this will be the entry that was just
     *           inserted; in other words, if the map contains a single
     *           entry, the eldest entry is also the newest.
     * @return   <tt>true</tt> if the eldest entry should be removed
     *           from the map; <tt>false</tt> if it should be retained.
     */
    protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
        return false;
    }

上面是LinkedHashMap中重写了HashMap的两个方法，当调用put时添加Entry（新增Entry之前不存在）整个方法调用链是这样的：

LinkedHashMap.put -> LinkedHashMap.addEntry -> HashMap.addEntry -> LinkedHashMap.createEntry

有了这个调用链，再结合上面createEntry方法中的注释，就可以明白如何在添加Entry保证双向链表不断链的了。

实战：LRU缓存

上面已经介绍了，利用访问顺序这种方式可以实现LRU缓存，正好最近在用flume向hadoop传数据，发现里面hdfs sink里面就用到了这种思想。

如果你不了解flume、hdfs、sink等这些概念，也不要紧，也不会影响阅读下面的代码，相信我😊。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
  /*
   * Extended Java LinkedHashMap for open file handle LRU queue.
   * We want to clear the oldest file handle if there are too many open ones.
   */
  private static class WriterLinkedHashMap
      extends LinkedHashMap<String, BucketWriter> {

    private final int maxOpenFiles;

    public WriterLinkedHashMap(int maxOpenFiles) {
      //这里的第三个参数为true，表示key默认的顺序为访问顺序，而不是插入顺序
      super(16, 0.75f, true); // stock initial capacity/load, access ordering
      this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    protected boolean removeEldestEntry(Entry<String, BucketWriter> eldest) {
      if (size() > maxOpenFiles) {
        // If we have more that max open files, then close the last one and
        // return true
        try {
          eldest.getValue().close();
        } catch (IOException e) {
          LOG.warn(eldest.getKey().toString(), e);
        } catch (InterruptedException e) {
          LOG.warn(eldest.getKey().toString(), e);
          Thread.currentThread().interrupt();
        }
        return true;
      } else {
        return false;
      }
    }
  }

可以看到，这里的WriterLinkedHashMap主要是重写了removeEldestEntry方法，我们上面介绍了，在LinkedHashMap中，这个方法总是返回false，在这里设定了一个阈值maxOpenFiles，如果打开的文件数超过了这个阈值，就返回true，即把之前最不经常访问的节点给删除掉，达到释放资源的效果。

更高效的LinkedHashIterator

由于元素之间用双向链表连接起来了，所以在遍历元素时只需遍历双向链表即可，这比HashMap中的遍历方式要高效。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
    private abstract class LinkedHashIterator<T> implements Iterator<T> {
        Entry<K,V> nextEntry    = header.after;
        Entry<K,V> lastReturned = null;

        /**
         * The modCount value that the iterator believes that the backing
         * List should have.  If this expectation is violated, the iterator
         * has detected concurrent modification.
         */
        int expectedModCount = modCount;
        //由于采用环型双向链表，所以可以用header.after == header 来判断双向链表是否为空
        public boolean hasNext() {
            return nextEntry != header;
        }

        public void remove() {
            if (lastReturned == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();

            LinkedHashMap.this.remove(lastReturned.key);
            lastReturned = null;
            expectedModCount = modCount;
        }

        Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            if (nextEntry == header)
                throw new NoSuchElementException();
            Entry<K,V> e = lastReturned = nextEntry;
            //在访问下一个节点时，直接使用当前节点的后继指针即可
            nextEntry = e.after;
            return e;
        }
    }

除了LinkedHashIterator利用了双向链表遍历的优势外，下面的两个方法也利用这个优势加速执行。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
    /**
     * Transfers all entries to new table array.  This method is called
     * by superclass resize.  It is overridden for performance, as it is
     * faster to iterate using our linked list.
     */
    @Override
    void transfer(HashMap.Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e = header.after; e != header; e = e.after) {
            if (rehash)
                e.hash = (e.key == null) ? 0 : hash(e.key);
            int index = indexFor(e.hash, newCapacity);
            e.next = newTable[index];
            newTable[index] = e;
        }
    }


    /**
     * Returns <tt>true</tt> if this map maps one or more keys to the
     * specified value.
     *
     * @param value value whose presence in this map is to be tested
     * @return <tt>true</tt> if this map maps one or more keys to the
     *         specified value
     */
    public boolean containsValue(Object value) {
        // Overridden to take advantage of faster iterator
        if (value==null) {
            for (Entry e = header.after; e != header; e = e.after)
                if (e.value==null)
                    return true;
        } else {
            for (Entry e = header.after; e != header; e = e.after)
                if (value.equals(e.value))
                    return true;
        }
        return false;
    }

总结

通过这次分析LinkedHashMap，我发现JDK里面的类设计确实巧妙，父类中很多为空的方法，看似无用，其实是为子类留的一个钩子，子类可以根据需要重写这个方法，像LinkedHashMap就重写了init方法，这个方法在HashMap中的实现为空。

其次我还想强调下一些基础数据结构与算法的重要性，语言现在很多，火的也多，我们不可能一一去学习，语法说白了就是一系列规则（也可以说是语法糖衣），不同的语言创建者所定的规则可能千差万别，但是他们所基于的数据结构与算法肯定是统一的。去伪存真，算法与数据结构才是我们真正需要学习的。

最近在看Y_combinator，函数式编程中最迷人的地方，希望自己完全理解后再与大家分享。Stay Tuned！