@xmruibi 2015-02-15T05:43:47.000000Z 字数 7893 阅读 773

Hash Map

Data_Structure

Acts as a binned (bucketed) hash table

0. Prerequisite

Hash Algorithm
Hashtable
No thread safety

1. Fields

    /**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;
    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     */
    transient Set<Map.Entry<K,V>> entrySet;
    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;
    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;
    /**
     * The next size value at which to resize (capacity * load factor).
     */
    int threshold;
    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

2. Constructors:

HashMap(): 构造一个具有默认初始容量 (16) 和默认加载因子 (0.75) 的空 HashMap。

HashMap(int initialCapacity): 构造一个带指定初始容量和默认加载因子 (0.75) 的空 HashMap。

HashMap(int initialCapacity, float loadFactor): 构造一个带指定初始容量和加载因子的空 HashMap。

HashMap(Map<? extends K, ? extends V> m): 构造一个与指定mapping对象相同的 HashMap。

NOTE that: two important parameter:
1) initialCapacity: is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.

2）loadFactor: is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

    /**
     * Constructs an empty HashMap with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        // initial capacity shouldn't less than 0
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        // cannot larger than MAXIMUM_CAPACITY: 2^30
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // check Factor
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        //设置HashMap的容量极限，当HashMap的容量达到该极限时就会进行扩容操作
        // in old version: threshold = (int) (capacity * loadFactor);
        this.threshold = tableSizeFor(initialCapacity); // 
    }
    /**
     * Constructs an empty HashMap with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
    /**
     * Constructs an empty HashMap with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
    /**
     * Constructs a new HashMap with the same mappings as the
     * specified Map.  The HashMap is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified Map.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

3. Entry - Node in HashMap:

Every time when instance a HashMap, it will initializes a table(Enrtry) array
Entry in array but why need next pointer??

    static class Node<K,V> implements Map.Entry<K,V> {
    // we can also call the node as entry
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }
        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }
        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

4. Input key-value:

4.1

    public V put(K key, V value) {
        //当key为null，调用putForNullKey方法，保存null与table第一个位置中，这是HashMap允许为null的原因
        if (key == null)
            return putForNullKey(value);
        //计算key的hash值
        int hash = hash(key.hashCode());                  ------(1)
        //计算key hash 值在 table 数组中的位置
        int i = indexFor(hash, table.length);             ------(2)
        //从i出开始迭代 e,找到 key 保存的位置
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            Object k;
            //判断该条链上是否有hash值相同的(key相同)
            //若存在相同，则直接覆盖value，返回旧value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;    //旧值 = 新值
                e.value = value;
                e.recordAccess(this);
                return oldValue;     //返回旧值
            }
        }
        //修改次数增加1
        modCount++;
        //将key、value添加至i位置处
        addEntry(hash, key, value, i);
        return null;
    }

通过源码我们可以清晰看到HashMap保存数据的过程为：首先判断key是否为null，若为null，则直接调用putForNullKey方法。若不为空则先计算key的hash值，然后根据hash值搜索在table数组中的索引位置，如果table数组在该位置处有元素，则通过比较是否存在相同的key，若存在则覆盖原来key的value，否则将该元素保存在链头（最先保存的元素放在链尾）。若table在该处没有元素，则直接保存。这个过程看似比较简单，其实深有内幕。有如下几点：

  1、 先看迭代处。此处迭代原因就是为了防止存在相同的key值，若发现两个hash值（key）相同时，HashMap的处理方式是用新value替换旧value，这里并没有处理key，这就解释了HashMap中没有两个相同的key。

  2、 在看（1）、（2）处。这里是HashMap的精华所在。首先是hash方法，该方法为一个纯粹的数学计算，就是计算h的hash值。

    // (1)
    final int hash(Object k) {  
        int h = hashSeed;  
        if (0 != h && k instanceof String) {//这里针对String优化了Hash函数，是否使用新的Hash函数和Hash因子有关  
            return sun.misc.Hashing.stringHash32((String) k);  
        }  
    // 其他对象 只需用 对象的hashcode即可
        h ^= k.hashCode();  
        h ^= (h >>> 20) ^ (h >>> 12);  
        return h ^ (h >>> 7) ^ (h >>> 4);  
    }

我们知道对于HashMap的table而言，数据分布需要均匀（最好每项都只有一个元素，这样就可以直接找到），不能太紧也不能太松，太紧会导致查询速度慢，太松则浪费空间。计算hash值后，怎么才能保证table元素分布均与呢？我们会想到取模，但是由于取模的消耗较大，HashMap是这样处理的：调用indexFor方法。

    // (2)
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

HashMap的底层数组长度总是2的n次方，在构造函数中存在：capacity <<= 1;这样做总是能够保证HashMap的底层数组长度为2的n次方。当length为2的n次方时，h&(length – 1)就相当于对length取模，而且速度比直接取模快得多，这是HashMap在速度上的一个优化。至于为什么是2的n次方下面解释。

 我们回到indexFor方法，该方法仅有一条语句：h&(length – 1)，这句话除了上面的取模运算外还有一个非常重要的责任：均匀分布table数据和充分利用空间。

4.2 Note that:

一是链的产生。

这是一个非常优雅的设计。系统总是将新的Entry对象添加到bucketIndex处。如果bucketIndex处已经有了对象，那么新添加的Entry对象将指向原有的Entry对象，形成一条Entry链，但是若bucketIndex处没有Entry对象，也就是e==null,那么新添加的Entry对象指向null，也就不会产生Entry链了。

二、扩容问题。

随着HashMap中元素的数量越来越多，发生碰撞的概率就越来越大，所产生的链表长度就会越来越长，这样势必会影响HashMap的速度，为了保证HashMap的效率，系统必须要在某个临界点进行扩容处理。该临界点在当HashMap中元素的数量等于table数组长度*加载因子。但是扩容是一个非常耗时的过程，因为它需要重新计算这些数据在新table数组中的位置并进行复制处理。所以如果我们已经预知HashMap中元素的个数，那么预设元素的个数能够有效的提高HashMap的性能。

5 Get by Key

    //获取key值为key的元素值  
    public V get(Object key) {  
        if (key == null)//如果Key值为空，则获取对应的值，这里也可以看到，HashMap允许null的key，其内部针对null的key有特殊的逻辑  
            return getForNullKey();  
        Entry<K,V> entry = getEntry(key);//获取实体  
        return null == entry ? null : entry.getValue();//判断是否为空，不为空，则获取对应的值  
    }  
    //获取key为null的实体  
    private V getForNullKey() {  
        if (size == 0) {//如果元素个数为0，则直接返回null  
            return null;  
        }  
        //key为null的元素存储在table的第0个位置  
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {  
            if (e.key == null)//判断是否为null  
                return e.value;//返回其值  
        }  
        return null;  
    }  
    //判断是否有键为key的元素  
    public boolean containsKey(Object key) {  
        return getEntry(key) != null;//这里需要遍历  
    }  
    //获取键值为key的元素  
    final Entry<K,V> getEntry(Object key) {  
        if (size == 0) {//元素个数为0  
            return null;//直接返回null  
        }  
        int hash = (key == null) ? 0 : hash(key);//获取key的Hash值  
        for (Entry<K,V> e = table[indexFor(hash, table.length)];//根据key和表的长度，定位到Hash桶  
             e != null;  
             e = e.next) {//进行遍历  
            Object k;  
            if (e.hash == hash &&  
                ((k = e.key) == key || (key != null && key.equals(k))))//判断Hash值和对应的key，合适则返回值  
                return e;  
        }  
        return null;  
    }