Dominic Williams

Occasionally useful posts about RIAs, Web scale computing & miscellanea

ConcurrentHashMap – avoid a common misuse!

with 15 comments

If you program systems with Java, you have probably long been using ConcurrentHashMap. This post explores a caveat.

ConcurrentHashMap is often introduced to simplify code and application logic. For example:

HashMap<String, MyClass> m = new HashMap<String, MyClass>();
...
synchronized (m) {
    for each (Entry<String, MyClass> e in m.entrySet())
        system.out.println(e.getKey()+"="+e.getValue());
}

might be replaced with the following so long as a consistent snapshot is not required:

ConcurrentHashMap<String, MyClass> m = new ConcurrentHashMap<String, MyClass>();
...
for each (Entry<String, MyClass> e in m.entrySet())
    system.out.println(e.getKey()+"="+e.getValue());

More often though, ConcurrentHashMap is introduced to improve performance: internally ConcurrentHashMap allows concurrent threads to read values without locking at all, except in a minority of cases, and for concurrent writers to add and update values while only acquiring locks over localised segments of the internal data structure. Read operations are therefore mostly synchronization free and have greatly improved performance, and if there are many concurrent writers then lock contention will be reduced improving performance even further.

The rub though is that the benefits on offer mean that ConcurrentHashMap is now often used Willy Nilly! As it happens, that is often not such a bad idea, but there is unfortunately a big caveat that 99% have missed: The designers of ConcurrentHashMap originally conceived the class for occasional use in in high concurrency areas at the center of systems and the default construction options reflect this!!

Specifically, the fully parametized constructor of ConcurrentHashMap has 3 parameters, initialCapacity, loadFactor and concurrencyLevel and this last one can cause problems.

Recall that ConcurrentHashMap shards its data into segments to reduce writer lock contention. Well, in actual fact the concurrencyLevel parameter directly specifies the number of shards that are created by the class internally. The idea is that the number of shards should equal the number of concurrent writer threads that you normally see. And, the default value for concurrencyLevel is 16!

So, if like most people you simply use the parameterless constructor, and accept the default configuration, your map will be instantiating the objects needed for 16 shards before you even add your first value…

Before proceeding to see how pernicious this can be, consider how unnecessary this is for most cases. The likelihood that 16 threads will actually be simultaneously spinning on your map trying to write to its data structures implies a huge level of concurrency given the fact that readers don’t even block writers. In most cases, even if several threads are accessing and writing the structure, a single shard will be completely sufficient to get most of the benefits.

But you say, it’s only a few extra objects, why should I care…

Well, once you start using ConcurrentHashMap as a matter of course, the number of instances allocated can grow dramatically. Consider for instance that you are building a game server that maintains contextual data for each connected user. You might associate a ConcurrentHashMap with each connected user allowing multiple threads to query and update online user’s contextual information with a high degree of performance and safety

In principle this works fine, but in practice you may find your game server has several thousand users connected, which drives the allocation of several thousand ConcurrentHashMap instances. Now, when you do a heap dump of your server and have a poke around, you notice there are millions and zillions of ConcurrentHashMap$Segment, ConcurrentHashMap$HashEntry[] and ReentrantLock$NonfairSync objects. This is because, in fact, the creation of a single ConcurrentHashMap instance for each of say 5000 users minimally results in 240,000 of these objects being allocated even before any values are added to the maps.

Depending upon how widespread your usage of ConcurrentHashMap is, how many maps are created per user and the manner in which third party authors have been using ConcurrentHashMap, usage of ConcurrentHashMap can end up adding serious memory and GC load to your system.

BEST PRACTICE USAGE

Because you often never really know how your classes might be re-used and therefore how many instances might be created in production, by default developers need to get in the habit of creating their ConcurrentHashMap instances with parameters something like this:

ConcurrentHashMap<String, MyClass> m =
new ConcurrentHashMap<String, MyClass>(8, 0.9f, 1);

In the above, only a single shard segment is created internally that allocates an initial capacity for its HashEntry[] table of 8, which allows for some reasonable number of values to be added before reallocation, and the load factor of 0.9 ensures reasonably dense packing. The single shard offers full read benefits and, unless you have very high concurrency sufficient write throughput without risking crazy unnecessary memory loading.

For completeness it is worth pointing out that there are indeed places where you might want to increase segment numbers, but such applications are fairly unusual and specialised. For example,  we have an EventRoutingManager class in our Starburst platform (an application layer platform that provides a horizontally scalable event publisher/consumer model) and it’s constructed something like as follows:

ConcurrentHashMap<String, EventPathContext> m =
new ConcurrentHashMap<String, MyClass>(524288, 0.75f, 32);

But to be clear, unfortunately for most usage the default ConcurrentHashMap parameters should be the exception, not the rule!

Written by dominicwilliams

December 12, 2011 at 2:18 pm

Posted in Uncategorized

15 Responses

Subscribe to comments with RSS.

  1. Excellent info, thanks for spreading the knowledge

    Mondain

    December 12, 2011 at 3:15 pm

  2. Hey Dominic, did you give a try to highscale lib? (Non-blocking hashmap)

    Patricio Echague

    December 12, 2011 at 3:40 pm

  3. Very nice article.
    Just noting that ConcurrentHashMap was revised in Java 7 to use lazy segment initialization, so the memory waste that used to occur with the default constructor is now mitigated. An empty map now contains a single segment. However, the lazy initialization is triggered by puts to missing segments and NOT by contention.
    ConcurrentHashMapV8 which may replace ConcurrentHashMap in Java 8, does not use a fixed lock striping scheme at all, instead each bucket serves as a “stripe” using intrinsic synchronization.

    Amir Hadadi

    January 14, 2012 at 7:17 pm

    • Hi thanks for that info. Will be moving to Java 7 asap I think!.. actually it’s amazing ConcurrentHashMap didn’t use segment initialization from the start, given what this can lead to, but good to hear things will be fixed.

      dominicwilliams

      January 15, 2012 at 9:58 pm

  4. Great catch, I agree 99% of people probably don’t know this!

    BTW, you should check out cliff c. implementation, I ran into this via the cassandra code base: http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable

    Salman

    February 10, 2012 at 2:39 pm

    • @Dominic Great article….
      You mentioned best practice is to create only one segment , but by creating one segment won’t the map act as simple Hashtable? What I understand is ConcurrentHashMap divides map in16 segments (default) and put a separate lock on each segment. But if we Initialize ConcurrentHashMap with one segment then its like one lock on full map i.e Hashtable!!

      aryashekhar

      July 13, 2012 at 8:06 pm

  5. Excelente article for Java6 users!!
    Thanks

    Francisco

    April 2, 2013 at 8:09 pm

  6. Excellent information thanks a lot and please keep sharing such valuable information to java community.

    Bhargav Shah

    July 3, 2013 at 6:54 am

  7. Excellent ! How can i learn more from you bro ?

    Java Pathshala

    September 9, 2013 at 8:15 am

  8. how to know how many thread are working in concurrentHashMap?

    dd

    April 3, 2014 at 4:50 pm

  9. Thanks for this – was diagnosing why one of our heaps jumped in size recently, and it was because of lots of ConcurrentHashMaps, which made me end up here.

    James Stanier

    June 30, 2014 at 1:48 pm

  10. Great article, Dominic! One thing I noticed though:

    ConcurrentHashMap m = new ConcurrentHashMap(524288, 0.75f, 32);

    EventPathContext vs MyClass.:P

    Haris Osmanagić

    June 29, 2015 at 10:26 am

  11. Very Useful Article… Thanks so much

    Dhanushka Nayanananda

    August 10, 2015 at 1:14 pm


Leave a comment