Java: The Object Equality Problem

December 18, 2020
java oop responsibility-decoupling


I was writing some Java test code when I ran into the pitfalls of the equals method. Despite its apparent simplicity, it poses a tricky problem.

I’d like to emphasize that this issue is not specific to Java. For example, C# has an analogous mechanism.

A bit of Java context

The Object class is the root of every class. It defines various methods, and equals is one of them. By default this method has a simple behavior: an object x is equal only to itself. Any other object is different.
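The default behavior can be seen with a minimal sketch. The Pamphlet class below is hypothetical and exists only for this illustration; it deliberately does not override equals.

```java
// Hypothetical class for illustration: it inherits equals from Object unchanged.
final class Pamphlet {
  private final String title;

  Pamphlet(final String title) {
    this.title = title;
  }

  String title() {
    return title;
  }
}

final class DefaultEqualsDemo {
  public static void main(final String[] args) {
    final Pamphlet first = new Pamphlet("Object Thinking");
    final Pamphlet second = new Pamphlet("Object Thinking");
    // Same instance: the default equals returns true.
    System.out.println(first.equals(first));
    // Different instances with the same title: the default equals returns false.
    System.out.println(first.equals(second));
  }
}
```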

The equals method is expected to have the properties of an equivalence relation. It is reflexive: x is equal to x. It is symmetric: if x is equal to y, then y is equal to x. It is transitive: if x is equal to y and y is equal to z, then x is equal to z.

Furthermore, a logical relationship links equals and the hashCode method. The latter returns the object’s hash, which in this context is an integer representation of the object. So if two objects are equal, their hashes must be equal too. The converse does not hold: two different objects may share the same hash.
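Hash-based collections rely on this relationship. Here is a small sketch, assuming a hypothetical Title value class that overrides both methods consistently, so that a HashSet can find an equal instance:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class used only for this illustration.
final class Title {
  private final String text;

  Title(final String text) {
    this.text = text;
  }

  @Override
  public boolean equals(final Object o) {
    if (this == o) return true;
    if (!(o instanceof Title)) return false;
    return text.equals(((Title) o).text);
  }

  // Derived from the same field that equals compares,
  // so equal titles always produce equal hashes.
  @Override
  public int hashCode() {
    return Objects.hash(text);
  }
}

final class HashContractDemo {
  public static void main(final String[] args) {
    final Set<Title> titles = new HashSet<>();
    titles.add(new Title("Dune"));
    // A distinct but equal instance is found because the hashes match.
    System.out.println(titles.contains(new Title("Dune"))); // true
  }
}
```

If hashCode were left at its default while equals is overridden, the lookup above would no longer be reliable, because equal objects could end up in different buckets.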

The problem


I’ll use a simple code example to highlight the main issue. Here is the starting point:

interface Book {
  String title();

  String author();
}

final class DbBook implements Book {
  ...
}

We can ask a Book instance for its title and its author. A DbBook instance represents a book stored in a database.

As I said, the Object class is the root of every class. This means that DbBook also inherits the equals method with its default behavior. So we should override equals to implement a custom equivalence relation. To respect the aforementioned logical implication, we should also override the hashCode method.

Now suppose that in our context two books are equal if they have the same title. This seems to be exactly what equals is for, and here is a common implementation (I generated it with the IDE):

final class DbBook implements Book {
  ...
  @Override
  public boolean equals(final Object o) {
    if (this == o) return true;
    if (!(o instanceof DbBook)) return false;
    final DbBook dbBook = (DbBook) o;
    return title().equals(dbBook.title());
  }

  @Override
  public int hashCode() {
    return Objects.hash(title());
  }
  ...
}

However, because of the instanceof check, aDbBook is only comparable to anotherDbBook. This means that an AnotherBookImplementation instance is always different from aDbBook, even if it has the same title.

The problem seems to be the instanceof check, so we can weaken it a bit:

final class DbBook implements Book {
  ...
  @Override
  public boolean equals(final Object o) {
    if (this == o) return true;
    if (!(o instanceof Book)) return false;
    final Book book = (Book) o;
    return title().equals(book.title());
  }
  ...
}

In this way we only require o to be a Book instance.

The effect of this change is the destruction of our software. As I anticipated, the equals method should be symmetric. This means that every Book implementation must exhibit this same equals behavior. In other words, every Book implementation is tightly coupled with every other one. It’s definitely a really bad approach.
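A minimal sketch of how the relaxed check breaks symmetry as soon as one implementation does not cooperate. DbBook is simplified here (plain fields stand in for the database access), and InMemoryBook is a hypothetical second implementation that keeps the default equals:

```java
import java.util.Objects;

interface Book {
  String title();

  String author();
}

// Simplified stand-in for DbBook: fields replace the database lookup.
final class DbBook implements Book {
  private final String title;
  private final String author;

  DbBook(final String title, final String author) {
    this.title = title;
    this.author = author;
  }

  @Override
  public String title() {
    return title;
  }

  @Override
  public String author() {
    return author;
  }

  // The relaxed check: any Book with the same title is considered equal.
  @Override
  public boolean equals(final Object o) {
    if (this == o) return true;
    if (!(o instanceof Book)) return false;
    final Book book = (Book) o;
    return title().equals(book.title());
  }

  @Override
  public int hashCode() {
    return Objects.hash(title());
  }
}

// Hypothetical implementation that does not override equals.
final class InMemoryBook implements Book {
  private final String title;
  private final String author;

  InMemoryBook(final String title, final String author) {
    this.title = title;
    this.author = author;
  }

  @Override
  public String title() {
    return title;
  }

  @Override
  public String author() {
    return author;
  }
}

final class SymmetryDemo {
  public static void main(final String[] args) {
    final Book db = new DbBook("Dune", "Frank Herbert");
    final Book memory = new InMemoryBook("Dune", "Frank Herbert");
    System.out.println(db.equals(memory)); // true: DbBook accepts any Book
    System.out.println(memory.equals(db)); // false: default identity comparison
  }
}
```

The only way to restore symmetry is to give InMemoryBook, and every future Book implementation, the very same equals and hashCode logic.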

A different approach


The main issue with the equals approach is that responsibilities aren’t decoupled correctly. There should be another object responsible for the comparison: an object that represents the comparison itself.

There may be various implementations of this approach. I’ll suggest two: one representation-based and the other behavior-based.

Representation-based equality

The first one derives from this article. Basically an object (like aDbBook) can give us a representation of itself. Then a Comparison<R> object represents a comparison between two R representations. In this way a representation is similar to the hash returned by hashCode, but it’s more generic because it could be based on bytes, strings and so on.

However, this means that aCat could be equal to aDog if they have the same representation. I consider this the main drawback of this approach.
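To make the drawback concrete, here is a rough sketch under my own assumptions. The Comparison<R> shape, the same() method and the representation() methods are hypothetical names chosen for this illustration; the referenced article may define them differently.

```java
import java.util.Objects;

// Hypothetical comparison between two representations of type R.
final class Comparison<R> {
  private final R first;
  private final R second;

  Comparison(final R first, final R second) {
    this.first = first;
    this.second = second;
  }

  // True when the two representations are equal.
  boolean same() {
    return Objects.equals(first, second);
  }
}

// Hypothetical classes: each one exposes its name as a string representation.
final class Cat {
  private final String name;

  Cat(final String name) {
    this.name = name;
  }

  String representation() {
    return name;
  }
}

final class Dog {
  private final String name;

  Dog(final String name) {
    this.name = name;
  }

  String representation() {
    return name;
  }
}

final class RepresentationDemo {
  public static void main(final String[] args) {
    final Cat cat = new Cat("Rex");
    final Dog dog = new Dog("Rex");
    // Same representation, so the comparison cannot tell a cat from a dog.
    System.out.println(new Comparison<>(cat.representation(), dog.representation()).same()); // true
  }
}
```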

Behavior-based equality

The behavior-based approach is born from an observation. I think that the only valid discriminating factor between objects is their behavior. It’s exposed through the methods or, more formally, through the messages the object supports. The protocol, or interface, is the collection of the supported messages.

For this reason the first step to define equality should be based on interfaces. Then an Equality object will represent the equality between two objects with the same interface.

In this way aCat will always be different from aDog because of their different interfaces. Presumably the former implements a Cat interface and the latter a Dog interface. Nonetheless, thanks to polymorphism, if they both implement a Pet interface, then they could be equal. This could be possible with anEquality limited to the Pet interface.

Here is an example related to the initial Book example. I defined two Equality classes to stress that equality is not a Book responsibility.

interface Equality {
  Boolean equals();
}

final class TitleBasedEquality implements Equality {
  TitleBasedEquality(final Book book, final Book anotherBook) {
    this.book = book;
    this.anotherBook = anotherBook;
  }

  @Override
  public Boolean equals() {
    return book.title().equals(anotherBook.title());
  }

  private final Book book;
  private final Book anotherBook;
}

final class PrefixBasedEquality implements Equality {
  PrefixBasedEquality(final Book book, final Book anotherBook, final Integer length) {
    this.book = book;
    this.anotherBook = anotherBook;
    this.length = length;
  }

  @Override
  public Boolean equals() {
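    // substring(0, length) keeps the first `length` characters, i.e. the prefix;
    // both titles are assumed to be at least `length` characters long.
    var first = book.title().substring(0, length);
    var second = anotherBook.title().substring(0, length);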
    return first.equals(second);
  }

  private final Book book;
  private final Book anotherBook;
  private final Integer length;
}

So a TitleBasedEquality object compares the full title. A PrefixBasedEquality compares only a prefix.

We gain a lot of flexibility, and we can choose the correct equality comparison based on the context. This is possible thanks to the responsibility decoupling.
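As a sketch of what choosing by context could look like, the hypothetical Shelf below receives its comparison rule from the outside. The Book and Equality types are repeated in compact form so the example compiles on its own, the prefix is taken with substring(0, length), and SimpleBook and Shelf are names invented for this illustration.

```java
import java.util.List;
import java.util.function.BiFunction;

interface Book {
  String title();
  String author();
}

interface Equality {
  Boolean equals();
}

final class TitleBasedEquality implements Equality {
  private final Book book;
  private final Book anotherBook;

  TitleBasedEquality(final Book book, final Book anotherBook) {
    this.book = book;
    this.anotherBook = anotherBook;
  }

  @Override
  public Boolean equals() {
    return book.title().equals(anotherBook.title());
  }
}

final class PrefixBasedEquality implements Equality {
  private final Book book;
  private final Book anotherBook;
  private final Integer length;

  PrefixBasedEquality(final Book book, final Book anotherBook, final Integer length) {
    this.book = book;
    this.anotherBook = anotherBook;
    this.length = length;
  }

  // Assumes both titles have at least `length` characters.
  @Override
  public Boolean equals() {
    return book.title().substring(0, length).equals(anotherBook.title().substring(0, length));
  }
}

// Hypothetical in-memory Book, only here to have something to compare.
final class SimpleBook implements Book {
  private final String title;
  private final String author;

  SimpleBook(final String title, final String author) {
    this.title = title;
    this.author = author;
  }

  @Override
  public String title() { return title; }

  @Override
  public String author() { return author; }
}

// Hypothetical client: it does not know which equality it is using.
final class Shelf {
  private final List<Book> books;
  private final BiFunction<Book, Book, Equality> rule;

  Shelf(final List<Book> books, final BiFunction<Book, Book, Equality> rule) {
    this.books = books;
    this.rule = rule;
  }

  boolean contains(final Book candidate) {
    for (final Book book : books) {
      if (rule.apply(book, candidate).equals()) return true;
    }
    return false;
  }
}

final class ShelfDemo {
  public static void main(final String[] args) {
    final Book stored = new SimpleBook("Dune Messiah", "Frank Herbert");
    final Book query = new SimpleBook("Dune", "Frank Herbert");
    // Full-title rule: the titles differ.
    System.out.println(new Shelf(List.of(stored), TitleBasedEquality::new).contains(query)); // false
    // Four-character prefix rule: both titles start with "Dune".
    System.out.println(
        new Shelf(List.of(stored), (a, b) -> new PrefixBasedEquality(a, b, 4)).contains(query)); // true
  }
}
```

The same Shelf code gives different answers depending on the rule it is handed, and no Book implementation had to change.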

However, as you can see, I’m using String.equals. I could replace it with StringEquality. But I consider this case as a reasonable compromise forced by the programming language.

A possible drawback of this approach concerns an interface with only void methods. In this case any two instances are always equal. But this means that objects of this type are only ever compared on their interface. I find it coherent and respectful towards the objects.

Conclusion

The object equality problem is definitely a tough one. I think the major issue is that we think about equality in terms of data. But objects are not data. This is why I support the idea of some sort of behavior-based comparison. After all, the exhibited behavior is what distinguishes one object from another. Nothing more.
