13. Collections - RobertMakyla/scalaWiki GitHub Wiki

Collection Traits

Traversable   (foreach, ++, map, flatmap, collect, toList/Set/Seq/Map, is/nonEmpty, head/last/headOption/tail, find)
      Iterable (iterator)
          Seq (has length, elements have fixed index starting from 0)
              LinearSeq
                  MutableList
                  List (very efficient if algorithm processes only head)
                  Stream
              IndexedSeq
                  Vector (gives efficient access to other elements, not only head)
                  Range
              Buffer  (allows adding/updating elements)
                  ListBuffer (backed by List, supports efficient conversion to List)
                  ArrayBuffer (backed by Array, supports efficient conversion to Array)
          Set (unordered, guarantees no duplicates)
              SortedSet
                  TreeSet
              LinkedHashSet
              HashSet (immutable)
              HashSet (mutable)
              BitSet
              EmptySet, Set1, Set2, Set3, Set4
          Map
              SortedMap
                  TreeMap
              ListMap (immutable) more efficient from standard map only if it's constructed in a way that first element is listed much more often than others.
              LinkedHashMap (mutable)
              HashMap (immutable)
              HashMap (mutable)
              EmptyMap, Map1, Map2, Map3, Map4

Iterable - all collections have iterator

         val it = Seq(1,2,3).iterator
         while( it.hasNext ){
             print( it.next )
         }

Each Collection has companion object: List(1) instead of new List(1)

Mutable and Immutable Collections

 Immutable collections can be passed as arguments safely.

                          scala.collection.Map
                   _______________|______________
                  |                              |
     scala.collection.mutable.Map     scala.collection.immutable.Map

 Scala prefers immutable collections:

      Map(1->"a", 2->"b")     // gives scala.collection.immutable.Map[Int, String]

 Don't need to import it because where are links to immutable collections: Predef.Map
                                                                           Predef.Set
 When I import mutable package :

     import scala.collection.mutable

     val myMap = Map()                //despite import, still immutable map (from Predef.Map)
     val myMap = mutable.Map          //mutable Map

Vector

 - immutable
 - shallow tree (efficient access to all elements, not only head as in list)
 - up to 32 elements can be represented in a single node

Range

 - immutable
 - integer sequence
 - stores only first element, last element and increment value
 - e.g.: 1 to 3          // 1, 2, 3
 - e.g.: 1 until 3       // 1, 2
 - e.g.: 5 to (1,-2)     // 5, 3, 1

Lists

List supports fast addition to head or splitting to head/tail as it's binary tree (says M. Odersky) That's why it's perfect for pattern matching in recursive functions

(Note: in Pattern matching it's always h :: tail)

   list
   / \
  0  / \
    1  / \
      2   3

List supports fast access to head.

List doesn't support fast access to last element as it requires iterating over all elements linearly.

A list is either empty, or it has a head and a tail which is again a list (empty or not).

     List(1,2,3).head         // --> 1
     List(1,2,3).tail         // --> (2,3)
     List(1,2,3).reverse      // --> (3,2,1)

     1 :: List(2,3)           // List (1,2,3)

     1 ::  2 ::  3 :: Nil     // List (1,2,3)      , where 'Nil' is empty List
     1 :: (2 :: (3 :: Nil))   // exactly the same

Note: Each operator ending with ':' is a right-side operator

     1 ::  2 ::  3 :: Nil    is exactly    Nil.::(3).::(2).::(1)

 Scala way of counting elements is recursion (in java it was iterating through list) :

     def mySum(lst: List[Int]):Int  =  if(lst == Nil) 0 else lst.head + mySum(lst.tail)
     mySum( 1::2::3::Nil )              // --> 6
     mySum( List(1, 2, 3) )             // --> 6

 or with tail recursion:

     def mySumTailRec(lst: List[Int], total:Int = 0):Int  =  if(lst == Nil) total else mySumTailRec(lst.tail, lst.head + total)

     mySumTailRec ( 1::2::3::Nil)    // --> 6

 or just

     1::2::3::Nil sum        // --> 6
     List(1, 2, 3).sum       // --> 6

List - pattern Matching

  def reverse[T](xs: List[T]): List[T] = xs match {
    case Nil => xs
    case h :: tail => reverse(tail) ::: List(h) // tail can be Nil and it's OK
  }

List covariance

The List type in Scala is covariant. This means that for each pair of types S and T, if S is a subtype of T, then List[S] is a subtype of List[T].

List - fill()

scala> List.fill(3)("hello")
res1: List[String] = List(hello, hello, hello)

scala> List.fill(2, 3)('b')
res2: List[List[Char]] = List(List(b, b, b), List(b, b, b))

List - tabulate()

scala> val squares = List.tabulate(5)(index => index * index)
squares: List[Int] = List(0, 1, 4, 9, 16)

Tail Recursion

Functions which call themselves in their last action (in tail) are called tail recursive.

To achieve it, there must be no other operation near this last call.

This is Scala way of providing optimized repetition. It' almost as fast as imperative way, but without ugly vars :

var total = 0  // Iiicckkk :-/
var tmp = lst  // GRRRRHHH :-/
while( tmp.size > 0) {
   total += tmp.head
   tmp = tmp.tail
}

Scala compiles is to the same byte code as while loop, which doesn't consume stack frames for each iteration. This prevents StackOverflow when there's many recursive calls.

Limits of Tail Recursion

The use of tail recursion in Scala is limited because implementing more advanced forms of tail recursion is very difficult in JVM.

Scala only optimizes directly recursive calls back to the same function making the call. If the recursion is indirect, no optimization is possible:

// here we don't profit from optimal tail recursion
def isEven(x: Int): Boolean = if (x == 0) true else isOdd(x - 1)
def isOdd(x: Int): Boolean = if (x == 0) false else isEven(x - 1)

You also won't get a tail-call optimization if the final call goes to a function value:

val funValue = myFun _ // function value, which wraps myFun
def myFun(x: Int) : Unit = { 
  if (x != 0) { println(x); funValue(x - 1) }
}

Scala compiler performs a tail-call optimization, when last operation is the call to the recursive function without going through function value or some other intermediary which makes the call indirect

To make sure that Scala compiler does optimization, we can use @tailrec annotation which will cause compiler error if optimization is impossible.

Arrays (mutable, size don't change)

Arrays are mutable (holding elements under 0-based indexes):

val myArray = Array(0,1,2)
myArray(0) = 9 //now myArray is (9,1,2)

ArrayBuffer (mutable, size can change)

Has all functionality of Array, but additionally it can add/remove elements to the beginning/end of sequence. As it wraps Array, it's a bit slower.

val arrbuf = scala.collection.mutable.ArrayBuffer.empty[Int]
buf += 1  // ArrayBuffer(1)

When you implement ArrayBuffer you don't need to specify length (in case of Array you must specify it)

ListBuffer (mutable, size can change)

List has fast access to first element but slow access to last element, so when I construct a list by appending (to the end) it's better to pre-pend (to the beginning) each element and reverse whole list at the end.

To avoid reversing I can use mutable ListBuffer which is more efficient for appending. Timing of appending/prepending to ListBuffer is constant. At the end just call.toList() to get a list.

ListBuffer is like an array buffer except that it uses a linked list internally instead of an array. If you plan to convert any buffer to a list once it is built up, use a list buffer instead of an array buffer.

Mutable Lists

 Works exactly like immutable with the difference that I can change head or tail, using elem/next:

    val lst = scala.collection.mutable.LinkedList(1, 2, 3)

    lst.elem = 100                                                  // --> lst is now (100,2,3)
    lst.next = scala.collection.mutable.LinkedList(200, 300)        // --> lst is now (100,200,300)

 Changing negative values to zero (possible to use elem/next to read from it, instead of head/tail)

    val lst = scala.collection.mutable.LinkedList(-1, 2, 3)
    var cur = lst                            // lst (-1, 2, 3)   cur (-1, 2, 3)
    while (cur != Nil) {
        if (cur.elem < 0) cur.elem = 0       // lst (0, 2, 3)   cur (0, 2, 3) - modifying the list hold by 2 refs
        cur = cur.next                       // lst (0, 2, 3)   cur (2, 3)    - cur.next returns reduced list
    }                                        // and lst holds modified but complete list

 At the end, cur is empty, and lst is (0, 2, 3)

 Simpler version:

     val lst = collection.mutable.LinkedList(-1, 2, 3)
     for(i <- 0 until lst.length) if(lst(i)<0) lst(i)=0
     lst

Sets (unordered)

 Trying to add an element that already exists - MAKES NO EFFECT

     Set(2, 0, 1) + 1                       //  -->  (2, 0, 1)

 Iterating over Set(1, 2, 3, 4, 5, 6)  gives  5 1 6 2 3 4  cause hash sets re-order elements for faster iterating

 That's why iterating over hash set - much faster than over list/array

 To have a set sorted, use:

     scala.collection.immutable.SortedSet(1, 2, 3, 4, 5, 6)

 (Scala 2.9 doesn't have mutable sorted set)


     Set(1, 2, 3) contains 1              // --> true

     Set(1) subsetOf Set(1, 2, 3)         // --> true

     Set(1) union Set(2)                  // Set (1,2)
     Set(1) |     Set(2)                  // Set (1,2)    the same
     Set(1) ++    Set(2)                  // Set (1,2)    the same

     Set(1, 2) diff Set(2)                // Set (1)
     Set(1, 2) &~   Set(2)                // Set (1)    the same
     Set(1, 2) --   Set(2)                // Set (1)    the same

     Set(1, 2, 3) intersect Set(2, 9)     // Set (2)
     Set(1, 2, 3) &         Set(2, 9)     // Set (2)    the same

Map and Set - Implementations

To maximize performance, for map/set smaller than 5 elements scala doesn't use hash tables. When set/map has 5 or more elements then the implementation uses hash tables.

Number of elements	Implementation
0	scala.collection.immutable.EmptySet
1	scala.collection.immutable.Set1
2	scala.collection.immutable.Set2
3	scala.collection.immutable.Set3
4	scala.collection.immutable.Set4
5 or more	scala.collection.immutable.HashSet

Number of elements	Implementation
0	scala.collection.immutable.EmptyMap
1	scala.collection.immutable.Map1
2	scala.collection.immutable.Map2
3	scala.collection.immutable.Map3
4	scala.collection.immutable.Map4
5 or more	scala.collection.immutable.HashMap

Operators for Adding or Removing Elements

 coll :+ elem              Seq(Vector/Range/List)  append to collection (creating a new one)
 elem +: coll                                      prepend before collection

 coll + elem               Seq/Map                 returns collection with additional elem
 coll + (e1,e2,..)

 coll - elem               Seq/Map/ArrayBuffer     returns collection without elem
 coll - (e1,e2,..)

 coll ++ coll2             Iterable                creating collection with elements from both
                                                   ( the same as   coll ++: coll2)

 coll -- coll2             Seq/Map/ArrayBuffer     creating collection with elements from coll but not coll2

 elem :: lst               List                    PREPENDING elem to list
 1 :: 2 :: Nil             List                    PREPENDING elem to list
 lst2 ::: lst              List                    ( the same as   lst2 ++: list   or   lst +: list )

 set | set                 Set                     the same as    union          or   ++
 set &~ set                                        the same as    diff           or   --
 set & set                                         the same as    intersect

Mutations

 coll += elem             Mutable Collections     mutates collection by APPENDING
 coll += (e1,e2,..)
 coll ++= coll2

 coll -= elem
 coll -= (e1,e2,..)
 coll --= coll2

elem +=: coll ArrayBuffer mutates collection by PREPENDING elem/coll2
coll2 ++=: coll

Methods of Iterable

 Iterable(1,2,3).head         // --> 1
 Iterable(1,2,3).tail         // --> (2,3)

 Iterable(1,2,3).init         // --> (1,2)
 Iterable(1,2,3).last         // --> 3

 length  isEmpty
 sum  max  min
 product                      // all elements multiplied

 map(f), foreach(f), flatMap(f) collect(f)    //applies function to each element

     Map(1->100, 2->200 )   map ((t:Tuple2[Int,Int]) => t._1 + t._2)   // List(101, 202)
     Map(1->100, 2->200     map ( t                  => t._1 + t._2)   // the same

     Iterable("a","b","c")  map (_.toUpperCase)     // A B C
     Iterable(1,2,3)        map (_*2)              // 1,4,6


 collect() // filter + map 

     List("a", 1, true, "b") collect { case s: String => s.toUpperCase } // List("A", "B")
     "-3+4".collect { case '+' => 1 ; case '-' => -1 } // Vector(-1, 1)

 reduceLeft(op) reduceRight(op)      // applies operation e.g. (a:Int,b:Int)=>a+b to each elem in given order

     List(1, 7, 2, 9).reduceLeft(_ - _)       // ((1 - 7) - 2) - 9
     List(1, 7, 2, 9).reduceRight(_ - _)      // 1 - (7 - (2 - 9))

 foldLeft(op) foldRight(op) -- to start with something else

     List(1, 7, 2, 9).foldLeft(0)(_ - _)      // 0 - 1 - 7 - 2 - 9
     List(1, 7, 2, 9).foldLeft("")(_ + _)    // 1729

 scanLeft() scanRight() -- combines mapping and folding

     (1 to 10).scanLeft(0)(_ + _)     //   Vector(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55)
                                     //   e.g.             6 = 0+1+2+3

 filter(cond) filterNot(cond)       // filters under condition, e.g. Iterable(1,2,3) filter( _ > 2)

 take(n) drop(n) takeRight(n) dropRight(n)

 slice( index_first_included, index_first_excluded )

 zip(coll)

     val a = List("a","b","c")
     val b = List(1,2,3)
     a zip b                           //  List((a,1), (b,2), (c,3))

     a zip b map ( p => p._1 * p._2 )  // applying function to each pair:    List(a, bb, ccc)
     a zip b map( (p:(String,Int)) => p._1 * p._2 )   // the same but with declared type

 mkString(before, separator, after)

     Iterable("a","b","c") mkString("<","-",">")        //   <a-b-c>

Iterators

     val myIter = Iterable(1,2,3).iterator           // iterator
     while (myIter.hasNext) println(myIter.next() )


     val myIter = Iterable(1,2,3).iterator           // iterator
     for( i <- myIter)  println(i)

     val myColl = Iterable(1,2,3)                    // collection
     for( i <- myColl)  println(i)



 After one iteration, Iterator is EMPTY !!!

 After calling a method such as map, filter, count, sum, or even length,
 the iterator is at the end of the collection, and you can't use it again.

 I can use a method such as toArray, toIterable, toSeq, toSet, or toMap to copy the values into a collection

     val toArray = Iterable(1,2,3).iterator toArray    // array
     for (i <- toArray) println(i)
     for (i <- toArray) println(i)
     for (i <- toArray) println(i)                    // many times, gets the same results (array never gets empty)

     val myColl = Iterable(1,2,3)           // collection
     for( i <- myColl)  println(i)
     for( i <- myColl)  println(i)          // many times, gets the same results (collection never gets empty)

Streams - immutable List, only the first element is evaluated

 Is an immutable alternative to iterators.

     val myTmpStream = 1 #:: 2 #:: 3 #:: 40 #:: 500 #:: Stream.empty       //  #:: is like :: for lists but it creates streams

     def createStream(n: BigInt): Stream[BigInt] = if(n < 0) Stream.empty else n #:: createStream(n - 1)

     val myStream = createStream(10)

 Stream is immutable List which is computed lazily. Only the first element is evaluated

     myStream               // Stream(10, ?)  stream not computed at all

 Stream evaluate everything till the element that is asked for. Every evaluation until this element is cached

     myStream.take(4).force // Stream(10, 9, 8, 7)

     myStream               // Stream(10, 9, 8, 7, ?)  stream not computed from 5th element

     myStream force         // Stream(10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

 Stream is immutable List in which the tail is computed lazily.

     myStream tail          // Stream(9, ?)  tail not computed yet

     myStream.tail.force    // Stream(9, 8, 7, 6, 5, 4, 3, 2, 1, 0)  computing the tail
     myStream.tail.toList   // another way of computing

     myStream tail          // Stream(9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

     myStream               // Stream(10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

Lazy Views

 It's like streams but totally unevaluated. Even the first element is unevaluated

     val myView = (0 to 10).view.map( _ * 2 )      // nothing evaluated yet

     myView(5)           // just pow(10,5) is computed, the rest is not.
                         // in Streams it was computed and cached for every element until 5th

 Views evaluate only the element that is asked for. Nothing before is evaluated or cached.

 Views are good for making things optimal :

     (0 to 10).map( math.pow(10,_) ).map( 1 / _ )

     (0 to 10).view.map( math.pow(10,_) ).map( 1 / _ ).force

In first way, first operation is applied to every element, and the second on the result collection.
In second way, both operations are applied to each element, without building an intermediate collection.
So second is better !!!

It's the same as:

    (0 to 10).map(x => math.pow(10, -x))

 Although sometimes it's not easy to make 1 function out of two.

Interoperability with Java Collections - each scala collection (mutable or not) can be converted to java and the other way around.

     import scala.collection.JavaConversions._

*Threadsafe Collections

 SynchronizedBuffer
 SynchronizedMap
 SynchronizedPriorityQueue
 SynchronizedQueue
 SynchronizedSet
 SynchronizedStack

 e.g. :

     val scores = new scala.collection.mutable.HashMap[String, Int] with
                      scala.collection.mutable.SynchronizedMap[String, Int]

*Parallel Collections

     (1 to 10).par.sum      // parallelize the sum of elements among different threads
                           //and add results at the end

     (1 to 10).par.count(_ % 2 == 0)  //  parallelize the count of every element
                                     // and adds the results at the end

 par() method produces parallel implementation of collection. This implementation extends one of the 3 :

             << trait >>
             ParIterable
    ______________|______________
    |             |             |
    |             |             |

<< trait >> << trait >> << trait >> ParSeq ParSet ParMap

     for (i <- (0 until 10).par) print(i + " ")     // different output every time e.g. : 5 0 1 7 8 2 9 3 6 4
                                                   // here, different threads are iterating different parts

     for (i <- (0 until 10).par) yield i + " "    // the result is always the same:  1 2 3 4 5 6 7 8 9
                                                 //  here, the result is assembled in order

 Parallel computations should not mutate shared variables - then the result is unpredictable.

     var count = 0
     for (i <- (0 until 10).par) { if (i % 2 == 0) count += 1 }        // count value is never sure at the end

Martin Odersky exercise:

  // last element
  def myLast[T] (xs: List[T] ): T = {
    xs match {
      case Nil => throw new Error ("no such element")
      case List(x) => x
      case y :: ys => myLast (ys)
    }
  } // complexity = n

  // everything without last element
  def myInit[T] (xs: List[T] ): T = {
    xs match {
      case Nil => throw new Error ("no such element")
      case List(_) => Nil
      case y :: ys => y :: myInit (ys)
    }
  }

  // adding 2 lists - concat
  def myConcat[T] (xs: List[T], ys: List[T] ): List[T]  = {
    xs match {
      case Nil => ys
      case z :: zs => z :: myConcat (zs, ys)
    }
  }  // complexity = n


  def myReverse[T](xs: List[T]): List[T] = {
    xs match {
      case Nil => xs
      case z :: zs => myConcat(myReverse(zs),List(z))
    }
  } // complexity = n^2   cause we use concat(of complexity n) and we call myReverse recursively