30장 Object Equality - codeport/scala GitHub Wiki

Equality in Scala

==, !=, equals : Natural equality
eq, ne : Reference equality

==는 override할 수 없다:

final def == (that: Any): Boolean =
  if (null eq this) {null eq that} else {this equals that}

Writing an equality method

Pitfall #1: Defining equals with the wrong signature.

class Point(val x: Int, val y: Int) {
  // An utterly wrong definition of equals
  def equals(other: Point): Boolean = this.x == other.x && this.y == other.y
}

이 equals 메소드는 Any.equals를 override 한 게 아니라 overload되록한 것이다. 그래서 Any.equals의 결과와 Point.equals의 결과가 다르다. Collections에서는 Any.equals을 사용할 것이기 때문에 모두 오작동한다.

override할 거면 아래처럼 해야 한다:

// A better definition, but still not perfect
override def equals(other: Any) = other match {
  case that: Point => this.x == that.x && this.y == that.y
  case _ => false
}

def ==(other: Point):Boolean 같은 메소드를 만들어도 overload된다. 절대로 이렇게 하지 말아야 하고 final def ==(other: Any):Boolean은 final이라 override할 수도 없다.

Pitfall #2: Changing equals without also changing hashCode

Hash Bucket 구조로 된 Collection은 equals과 hashCode는 함께 바꿔야 한다:

class Point(val x: Int, val y: Int) {
  override def hashCode = 41 * (41 + x) + y
  override def equals(other: Any) = other match {
    case that: Point => this.x == that.x && this.y == that.y
    case _ => false
  }
}

41이라는 소수를 더하고 곱한 것은 Distribution을 좋게 하려고 하는 건데, 자세한 건 밑에서 설명한다고(실제로도 설명하지만 나는 이해가 안된다ㅜㅜ)

Pitfall #3: Defining equals in terms of mutable fields

아래처럼 var로 만들지 말 것:

case class A(var x: Int)

val hs = HashSet[A]()
val a = A(1)
hs += a

println( hs.contains(a) ) // true
a.x = 2
println( hs.contains(a) ) // false

Pitfall #4: Failing to define equals as an equivalence relation

'not null' 객체에 대해서 Equivalence Relation이 돼야 한다.:

It is reflexive: for any non-null value x , the expression x.equals(x) should return true.
It is symmetric: for any non-null values x and y , x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any non-null values x , y , and z , if x.equals(y) returns true and y.equals(z) returns true , then x.equals(z) should return true.
It is consistent: for any non-null values x and y , multiple invocations of x.equals(y) should consistently return true or consistently return false , provided no information used in equals comparisons on the objects is modified.
For any non-null value x , x.equals(null) should return false.

이 원칙이 지켜지는 것으로 구현한다. 그 외는 생략….

Defining equality for parameterize types

아래와 같은 Tree를 만든다고 하면:

trait Tree[+T] {
  def elem: T
  def left: Tree[T]
  def right: Tree[T]
}

class Branch[T](
  val elem: T,
  val left: Tree[T],
  val right: Tree[T]) extends Tree[T] {

  ...

}

equals은 모든 엘리먼트가 포함되게 한다:

override def equals(other: Any) = other match {
  case that: Branch[_] => (that canEqual this) &&
    this.elem == that.elem &&
    this.left == that.left &&
    this.right == that.right
  case _ => false
}

Runtime에 엘리먼트 타입을 알 수 없기 때문에(Type Erasure) Pattern Matching에서 Branch[T]같은 엘리먼트 타입은 의미가 없다. Scala 컴파일러가 보여주는 unchecked Warning은 Branch[_]로 하면 사라진다.

canEqual은 아래처럼 만든다:

def canEqual(other: Any) = other.isInstanceOf[Branch[_]]

canEqual을 하는 이유는 타입도 비교해야 하기 때문이다. 아래와 같은 예를 보면 이해가 쉽다:

class Cranch[T](elem: T) extends Branch[T](elem, EmptyTree, EmptyTree){}
class Dranch[T](elem: T) extends Branch[T](elem, EmptyTree, EmptyTree){}

scala> val b1 = new Cranch(1)
b1: Cranch[Int] = Cranch@22c898d4

scala> val b2 = new Dranch(1)
b2: Dranch[Int] = Dranch@22c898d4

scala> b1 == b2
res0: Boolean = true

b1과 b2는 타입이 다른데 true라고 나온다. 아래 와 같이 canEqual을 다시 구현하면 된다:

class Cranch[T](elem: T) extends Branch[T](elem, EmptyTree, EmptyTree){
  override def canEqual(other: Any) = other.isInstanceOf[Cranch[_]]
}
class Dranch[T](elem: T) extends Branch[T](elem, EmptyTree, EmptyTree){
  override def canEqual(other: Any) = other.isInstanceOf[Dranch[_]]
}

이쯤 되면 왜 하필 Branch[_]로 표기하는지 궁금하다. Type Erasure로 어차피 타입이 사라지고 Branch[T], Branch도 있는데 Branch[_]를 사용한다. Branch[_]는 Existential Type이고, 간단히 말해서 엘리먼트 타입을 모를 때 _로 표기한다는 것…. 같은데…. (응?!)

책에 나오는 Existential Type의 정의:

An existential type includes references to type variables that are unknown. For example, Array[T] forSome { type T } is an existential type. It is an array of T , where T is some completely unknown type. All that is assumed about T is that it exists at all. This assumption is weak, but it means at least that an Array[T] forSome { type T } is indeed an array and not a banana.

hashCode는 소수를 곱하고 더해서 만든다. 빠진 필드 없이 곱하고 더한다. Overflow시 정보 손실을 적게 하려고 소수(odd prime number)를 이용한다는데 이것도 모르겠다:

override def hashCode: Int =
  41 * (
    41 * (
      41 + elem.hashCode
    ) + left.hashCode
  ) + right.hashCode