ik

Ivan Kusalic - home page

A Collection With Identity Crisis

Yesterday we had a refactoring kata facilitated by Carlos Blé. The focus was on implementing refactoring changes that shouldn’t at any point break any of the existing unit tests.

We had a short discussion about a type of a certain collection. It seemed the situation we had was confusing to most of the people present. In this post I’ll outline my thinking with the reasons behind it.

The problem

We started with a trivial ShoppingCart class that could only have one item:

Original ShoppingCart class
1
2
3
4
5
public class ShoppingCart {
    private int price;

    // ...
}

We needed to change this class so it supports multiple items. It was agreed upon that the shopping cart needs to support duplicate items and that there shouldn’t be any semantics of ordering between the items.

We decided to implement this with the following code:

ShoppingCart supporting multiple items
1
2
3
4
5
public class ShoppingCart {
    private Collection<Integer> prices = new ArrayList<>();

    // ...
}

A friend later started a discussion about the type of the collection. In his opinion the type should not be Collection<Integer> but List<Integer> as there are collections like HashSet<Integer> that break the abstraction by not allowing duplicate elements:

1
private Collection<Integer> prices = new HashSet<>();

I was against the List<Integer> for the type as it implies ordering between items in the shopping cart. Most people understood both points, but it was not obvious what is better.

A case against the List type

If we are going to use an ArrayList instance, there are two acceptable types for the prices collection:

1
2
private ArrayList<Integer> prices = new ArrayList<>();
private Collection<Integer> prices = new ArrayList<>();

The following is not acceptable:

1
private List<Integer> prices = new ArrayList<>();

The prices collection is internal to the ShoppingCart class and so ArrayList<Integer> can be used directly. This is the simplest possibility and would not surprise a reader of the code. Most likely they would not even notice the type on the left side. What about coding to an interface you ask? In our current situation this doesn’t apply – prices collection is just a private, internal detail of the ShoppingCart class.

Having Collection<Integer> is also ok, it only reveals that there can be more than one item in a shopping cart, nothing more. Combined with instantiation of an ArrayList we now know that ordering does not matter – otherwise the type on the left side would be List and not Collection. Because a list is exactly that – a collection that has an ordering of items.

We can actually infer a bit more. Given that we now know it’s a collection where the ordering does not matter and we are still not using the Set type which prohibits duplicates – we can assume that the duplicates are allowed. In reality almost nobody gives this much thought to a private field. With experience we get to know that the List is a wrong type to use here and we develop the “intuition” that the Collection works well.

This brings me to the problem with using the List here: seeing an abstract type in this situation implies that the implementer has thought a bit about the types and has decided for undisclosed reasons to care about the type. After all they did not go with the simplest possible scenario of just having the ArrayList type. As the only thing that the List type adds to the Collection type is ordering, we are now led to believe that the ordering of the items in a shopping cart matters and we have previously agreed that is not the case!

Why is the List so enticing?

There are two reasons why people feel like the List could be used here.

First one1 is that lists are ubiquitously present in basically any codebase. Different lists types are often a good choice of a collection from performance perspective. They are used so often2 that most people think of a list when you talk about a collection and ignore the ordering implication. And that can be ok when you are using a concrete list type and are not saying that you explicitly want to use the List abstraction.

The other reason you already read – the List type can prevent bugs like:

1
private Collection<Integer> prices = new HashSet<>();

Things get a bit more complicated here.

Preventing bugs?

Can usage of the List prevent all bugs that don’t allow duplicate items in a shopping cart? The answer is no. Here is an example of a possible bug:

1
private List<Integer> prices = new SetUniqueList<>();

SetUniqueList is an example of a collection that conforms to the List type, but still only allows unique elements.

But really this should not matter at all. Even if it was possible to prevent usage of a concrete collection that allows only unique elements, it should not be done with an abstract type that implies false semantics. If this is a problem, why not just have a unit test that ensures you can have duplicates? You should have that test anyway and so the List abstraction provides no value!

You could of course reasonably want to have this ensured on a type level. Then you should have your own collection type that is semantically sound.

Conclusion

It is interesting to note that, depending on your language, it might be impossible to use an abstract class or an interface and at the same time prevent bugs like the one shown above. The type would need to be both final and abstract at the same time. This is another reason for creating types to encapsulate your collections.

Try to show only the intent you have and do not imply more. Be aware that anything that’s not the simplest possible thing implies additional semantical meaning!

Don’t break the semantics of your types as a means to prevent possible future bugs – tests are perfect for that.

Note that the type systems currently in use are imperfect and are not expressive enough. Even more so in the case of mainstream languages like Java or C#. Luckily we have additional means to express intent – having good names, using instances of appropriate concrete types and having readable usage patterns.

Thanks to Carlos Blé and Peter Kofler for discussion & comments.


  1. Actually let me correct myself here: it’s not the ‘first’ and ‘second’, but ‘one’ and the ‘other’. ;)

  2. Too much actually – sometimes even in places where semantics of a set are needed.