How to Use Java Collections Safely in Multi-Threaded Environments

Yiğit PIRILDAK
Published in Level Up Coding
4 min read · Mar 27, 2020


The Java Collections Framework provides a variety of interfaces and classes that implement well-known data structures such as maps, lists, and sets. Nearly all high-level programming languages provide some implementation of these data structures, and Java is no exception. Familiar classes such as ArrayList, HashMap and HashSet are part of this framework.

Using them in a single-threaded environment is trivial, but we need to be careful in multi-threaded applications: most of these classes have no internal synchronization, which means they are not thread-safe by default.

What is Thread Safety?

Thread safety for an object means that accessing or modifying that object from multiple threads at the same time does not leave it in an inconsistent state. There are many ways to achieve this. One of the easiest is mutual exclusion, which means only one thread can interact with the object at a time. By putting read/write operations behind a lock that only one thread can hold at any given time, race conditions on that object are eliminated and thread safety is achieved.
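To make mutual exclusion concrete, here is a minimal sketch of a counter guarded by a lock. The class name LockedCounter and the helper runDemo are made up for this illustration; ReentrantLock is the standard lock from java.util.concurrent.locks.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class: a counter protected by a single lock.
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int value = 0;

    void increment() {
        lock.lock();           // only one thread gets past this point at a time
        try {
            value++;           // read-modify-write is now safe
        } finally {
            lock.unlock();     // always release, even if an exception is thrown
        }
    }

    int get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Starts the given number of threads, each incrementing the same counter.
    static int runDemo(int threads, int incrementsPerThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000)); // prints 40000
    }
}
```

Without the lock, value++ from several threads could lose updates and the final total would usually come out lower than expected.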

Exceptions: Stack, Vector, Properties, Hashtable

Let’s get these out of the way first. When I said most of the implementations in the Collections framework are not thread-safe, these bad boys were the exceptions I had in mind. Yes, these are thread-safe.

The Stack, Vector, Properties and Hashtable classes have all been around since Java 1.0, so they are mostly considered legacy classes. If you look at their implementations, you will see that they are synchronized at the object level. In Stack, for example, pop() and peek() are declared with the synchronized keyword, and push() delegates to Vector’s synchronized addElement(), so every operation locks the entire object.
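The idea of object-level synchronization can be sketched with a small class of our own. This is not the JDK source; SimpleSyncStack is a hypothetical, simplified stack whose methods are all declared synchronized, which means each call locks the instance itself.

```java
import java.util.ArrayList;
import java.util.EmptyStackException;
import java.util.List;

// Hypothetical, simplified class for illustration; not the real java.util.Stack.
class SimpleSyncStack<E> {
    private final List<E> items = new ArrayList<>();

    // A synchronized instance method acquires the lock on "this",
    // so no two of these methods can run at the same time on one stack.
    public synchronized E push(E item) {
        items.add(item);
        return item;
    }

    public synchronized E pop() {
        if (items.isEmpty()) {
            throw new EmptyStackException();
        }
        return items.remove(items.size() - 1);
    }

    public synchronized E peek() {
        if (items.isEmpty()) {
            throw new EmptyStackException();
        }
        return items.get(items.size() - 1);
    }

    public synchronized int size() {
        return items.size();
    }

    public static void main(String[] args) {
        SimpleSyncStack<String> stack = new SimpleSyncStack<>();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop());  // prints b
        System.out.println(stack.size()); // prints 1
    }
}
```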

Why not synchronize all of them by default?

“Why would they implement the earliest collections with thread safety in mind and abandon that philosophy later on?” you may ask. Well, the answer is actually very simple: the cost of synchronization.

Synchronization is not cheap. Embedding it into every collection by default would make all code pay for locking, even code that never shares the collection between threads, and it would take away the developer’s freedom to choose. Because of this, most collections are now written to optimize throughput in single-threaded use.

So, what happens if you use a non-thread-safe collection in a multi-threaded application?

ConcurrentModificationException

The iterators of most collections are fail-fast. Being fail-fast means that an operation is terminated as soon as something unexpected is detected. Non-thread-safe collections do this by keeping a counter called modCount, which tracks the structural modifications made to the object. If modCount changes while an iterator is in use, other than through the iterator’s own methods, ConcurrentModificationException is thrown to inform you. You should not rely on handling this exception, however, since detection is best-effort and not a guaranteed way of catching concurrent modifications. The exception is thrown to help developers catch bugs and to let them know they may need to consider some extra synchronization.
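The fail-fast check does not even need a second thread to trigger. The sketch below (FailFastDemo and its method names are made up for this example) modifies an ArrayList in the middle of a for-each loop, which trips the modCount check, and then shows the safe alternative of removing through the iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class for the fail-fast behaviour of ArrayList iterators.
class FailFastDemo {
    // Removes an element directly from the list while a for-each loop is running.
    static boolean removeDuringForEachThrows() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer n : numbers) {
                if (n == 1) {
                    numbers.remove(n); // changes modCount behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the next call to next() noticed the change
        }
    }

    // Removes the same element through the iterator, which keeps modCount in sync.
    static List<Integer> removeViaIterator() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            if (it.next() == 1) {
                it.remove();
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(removeDuringForEachThrows()); // prints true
        System.out.println(removeViaIterator());         // prints [2, 3]
    }
}
```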

How to use Collections in multi-threaded applications?

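The Collections utility class actually provides factory methods, such as Collections.synchronizedList, Collections.synchronizedMap and Collections.synchronizedSet, that turn a regular collection into a synchronized one.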

Synchronization is achieved by putting the provided collection into a wrapper class that simply keeps a mutex and locks it before each operation. This is also object-level synchronization, which means the entire collection is locked before performing any type of operation.
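As a minimal sketch of the wrapper in action, the following example shares one wrapped ArrayList between several threads. The class name SynchronizedWrapperDemo and the helper fillFromThreads are invented for the illustration; Collections.synchronizedList is the real factory method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class for Collections.synchronizedList.
class SynchronizedWrapperDemo {
    // Each thread adds itemsPerThread elements to one shared, wrapped list.
    static int fillFromThreads(int threads, int itemsPerThread) {
        // Every method call on "shared" takes the wrapper's lock first.
        List<Integer> shared = Collections.synchronizedList(new ArrayList<Integer>());
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < itemsPerThread; j++) {
                    shared.add(j);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(fillFromThreads(4, 1_000)); // prints 4000
    }
}
```

With a bare ArrayList in place of the wrapper, the same code could lose elements or throw, because add is not atomic.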

ITERATORS ARE STILL NOT SYNCHRONIZED! If you want to use iterators to move through a synchronized collection, you need to synchronize manually on the collection for the whole iteration. Keep in mind that enhanced for loops (for-each) also use iterators behind the scenes, so the same thing applies.

While object-level synchronization may be good enough for many applications, it is the brute-force way of achieving thread safety. Every operation, reads included, competes for the same lock, which creates contention as the number of threads grows. Luckily, we have better alternatives.
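A minimal sketch of manual synchronization during iteration, assuming the list was created with Collections.synchronizedList. The class SynchronizedIterationDemo and its methods are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: iterating over a synchronized wrapper safely.
class SynchronizedIterationDemo {
    // Sums the list while holding the wrapper's lock for the whole loop.
    static int sum(List<Integer> syncList) {
        int total = 0;
        synchronized (syncList) {      // lock on the wrapper itself
            for (int n : syncList) {   // the for-each uses an iterator internally
                total += n;
            }
        }
        return total;
    }

    static int demo() {
        List<Integer> numbers =
                Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3)));
        return sum(numbers);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 6
    }
}
```

Other threads that call add or remove on the same wrapper block until the synchronized block finishes, so the iterator never sees a modification halfway through.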

java.util.concurrent Package

This package provides a variety of implementations to achieve concurrency in collections as efficiently as possible. Rather than relying on object-level synchronization, more sophisticated methods are used to handle concurrency.

Let’s look at some of the examples here:

  • ConcurrentHashMap
    A HashMap stores its entries in an array of hash buckets (16 by default). Before Java 8, ConcurrentHashMap grouped its buckets into Segments (16 by default), each guarded by its own lock, so an update locked one segment rather than the entire object. Since Java 8 the locking is even finer: an update locks only the bucket it touches. Either way, multiple threads can update different parts of the map at the same time, and read operations generally do not block at all.
  • CopyOnWriteArrayList
    Keeps an internal array to store elements. When you add an element to the list, the whole array is copied and the new element is appended to the end of the copy. The copy then replaces the internal array. This lets you iterate through the list safely: an iterator keeps working on the array that existed when it was created, so an update made in the meantime goes into a new array and never disturbs it. If you need to update the list often, copying becomes a huge overhead, so you should only use this class when reads vastly outnumber adds and removes.
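Both classes can be tried out with a short sketch. The class ConcurrentCollectionsDemo, its method names and the "page" key are made up for the example; merge, getOrDefault and the snapshot behaviour of CopyOnWriteArrayList iterators are standard library behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class for two java.util.concurrent collections.
class ConcurrentCollectionsDemo {
    // Several threads bump the same counter; merge() performs the update atomically.
    static int countHits(int threads, int hitsPerThread) {
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    hits.merge("page", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.getOrDefault("page", 0);
    }

    // Adds to the list while iterating; returns {elements visited, final size}.
    static int[] iterateWhileAdding() {
        List<String> names = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        int visited = 0;
        for (String name : names) {
            names.add(name + "-copy"); // no exception: the loop reads the old array
            visited++;
        }
        return new int[] {visited, names.size()};
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 1_000));                   // prints 4000
        System.out.println(Arrays.toString(iterateWhileAdding())); // prints [2, 4]
    }
}
```

Note that the loop visits only the two original elements even though the list ends up with four, which is the snapshot behaviour described above.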


A curious Software Engineer who is interested in Embedded Systems and ML. Wastes time by playing video games, watching TV Shows and reading fantasy novels.