You use the Java Comparator
interface when you need a certain sort-order in some container. But Comparator
does not only declare compare(o1, o2)
, it implicitly also declares equality of o1
and o2
, which is the case when the comparator returns zero (0).
This will not affect you as long as you don't reuse your comparators with different kinds of containers like ArrayList
or TreeMap
. But when, you may experience the TreeMap
gotcha:
1 | import java.util.Comparator; |
What I do here is sort some strings which represent property-names. But I want the "xxx.name" properties to stay together, which would not be the case when using an alphabetical sort order, because then "end" would go between "employee.name" and "job.name". Following is the order I want to achieve (whereby the order of "xxx.name" properties does not matter):
- "end"
- "employee.name"
- "job.name"
- "start"
So I implemented a Comparator
that generates such a sort order. It cuts away anything before the last dot, and only then compares the two elements. And it actually works when used with an ArrayList
!
But not with TreeMap
. When you try out the source code above, you'll see that it throws the IllegalStateException
:
Exception in thread "main" java.lang.IllegalStateException: Lost some values (4) when using TreeMap (3) !
at TreeMapComparatorTest.main(TreeMapComparatorTest.java:32)
When you output the keys of the map, you'll see that one of the "name" properties is missing:
- "end"
- "employee.name"
- "start"
What's happening here?
I had in mind that comparators are responsible for sorting. That's true, but not the whole truth.
Let's look at the JavaDoc of the compare(T o1, T o2) method of interface Comparator<T>:
Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.
In the foregoing description, the notation sgn(expression) designates the mathematical signum function, which is defined to return one of -1, 0, or 1 according to whether the value of expression is negative, zero or positive.
The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)
The implementor must also ensure that the relation is transitive: ((compare(x, y) > 0) && (compare(y, z) > 0)) implies compare(x, z) > 0.
Finally, the implementor must ensure that compare(x, y) == 0 implies that sgn(compare(x, z)) == sgn(compare(y, z)) for all z.
It is generally the case, but not strictly required that (compare(x, y) == 0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."
That's how responsibilities are defined by scientists. Maybe you did as me and stopped reading that complicated text at about 50%. Then you missed the important part that partially explains what happens in the TreeMap
gotcha.
It is generally the case, but not strictly required that (compare(x, y) == 0) == (x.equals(y)).
What does that mean? You'll find clarity only when you debug TreeMap.put()
. Here is an excerpt of its source code:
int cmp = cpr.compare(key, t.key);
if (cmp < 0)
t = t.left;
else if (cmp > 0)
t = t.right;
else
return t.setValue(value);
You see that this implementation uses the comparator (variable cpr
) to detect equal keys. A Map
does not allow duplicate keys. Thus it replaces the old value of the key by the new one. And - as a side effect - you lost one element!
As recommended above, you should have added some JavaDoc to your comparator:
"Note: this comparator imposes orderings that are inconsistent with equals."
But would this really have helped :-?
You can find a discussion of this gotcha also on stackoverflow. As I said, it will happen when you use Comparator
implementations in different contexts:
ArrayList
that allows duplicates, the comparator will work, TreeMap
that allows no duplicates, it might not work.This "might" is the stuff that unreproducible bugs are made of.
Following is a workaround:
final Comparator<String> comparator = new Comparator<String>() {
@Override
public int compare(String s1, String s2) {
final int result = removeAllPrefixes(s1).compareTo(removeAllPrefixes(s2));
return (result == 0 && s1.equals(s2) == false) ? 1 : result;
}
.....
};
This implementation of compare()
avoids to return zero when the given arguments are not equal. Thus "job.name" would not be considered to be equal to "employee.name", and not be removed as duplicate. Nevertheless the sort order is sufficient now:
- "end"
- "employee.name"
- "job.name"
- "start"
ɔ⃝ Fritz Ritzberger, 2016-12-04