13. Collections - RobertMakyla/scalaWiki GitHub Wiki
Collection Traits
Traversable (foreach, ++, map, flatmap, collect, toList/Set/Seq/Map, is/nonEmpty, head/last/headOption/tail, find)
Iterable (iterator)
Seq (has length, elements have fixed index starting from 0)
LinearSeq
MutableList
List (very efficient if algorithm processes only head)
Stream
IndexedSeq
Vector (gives efficient access to other elements, not only head)
Range
Buffer (allows adding/updating elements)
ListBuffer (backed by List, supports efficient conversion to List)
ArrayBuffer (backed by Array, supports efficient conversion to Array)
Set (unordered, guarantees no duplicates)
SortedSet
TreeSet
LinkedHashSet
HashSet (immutable)
HashSet (mutable)
BitSet
EmptySet, Set1, Set2, Set3, Set4
Map
SortedMap
TreeMap
ListMap (immutable) more efficient from standard map only if it's constructed in a way that first element is listed much more often than others.
LinkedHashMap (mutable)
HashMap (immutable)
HashMap (mutable)
EmptyMap, Map1, Map2, Map3, Map4
Iterable - all collections have iterator
val it = Seq(1,2,3).iterator
while( it.hasNext ){
print( it.next )
}
Each Collection has companion object: List(1) instead of new List(1)
Mutable and Immutable Collections
Immutable collections can be passed as arguments safely.
scala.collection.Map
_______________|______________
| |
scala.collection.mutable.Map scala.collection.immutable.Map
Scala prefers immutable collections:
Map(1->"a", 2->"b") // gives scala.collection.immutable.Map[Int, String]
Don't need to import it because where are links to immutable collections: Predef.Map
Predef.Set
When I import mutable package :
import scala.collection.mutable
val myMap = Map() //despite import, still immutable map (from Predef.Map)
val myMap = mutable.Map //mutable Map
Seq << trait >> Seq |_________________________________________ | | << trait >> | IndexedSeq other immutable sequences _| (List, Stack, Queue, LinkedList, ...) | | Vector Range
- immutable
- shallow tree (efficient access to all elements, not only head as in list)
- up to 32 elements can be represented in a single node
- immutable
- integer sequence
- stores only first element, last element and increment value
- e.g.: 1 to 3 // 1, 2, 3
- e.g.: 1 until 3 // 1, 2
- e.g.: 5 to (1,-2) // 5, 3, 1
List supports fast addition to head or splitting to head/tail as it's binary tree (says M. Odersky) That's why it's perfect for pattern matching in recursive functions
(Note: in Pattern matching it's always h :: tail)
list
/ \
0 / \
1 / \
2 3
List supports fast access to head.
List doesn't support fast access to last element as it requires iterating over all elements linearly.
A list is either empty, or it has a head and a tail which is again a list (empty or not).
List(1,2,3).head // --> 1
List(1,2,3).tail // --> (2,3)
List(1,2,3).reverse // --> (3,2,1)
1 :: List(2,3) // List (1,2,3)
1 :: 2 :: 3 :: Nil // List (1,2,3) , where 'Nil' is empty List
1 :: (2 :: (3 :: Nil)) // exactly the same
Note: Each operator ending with ':' is a right-side operator
1 :: 2 :: 3 :: Nil is exactly Nil.::(3).::(2).::(1)
Scala way of counting elements is recursion (in java it was iterating through list) :
def mySum(lst: List[Int]):Int = if(lst == Nil) 0 else lst.head + mySum(lst.tail)
mySum( 1::2::3::Nil ) // --> 6
mySum( List(1, 2, 3) ) // --> 6
or with tail recursion:
def mySumTailRec(lst: List[Int], total:Int = 0):Int = if(lst == Nil) total else mySumTailRec(lst.tail, lst.head + total)
mySumTailRec ( 1::2::3::Nil) // --> 6
or just
1::2::3::Nil sum // --> 6
List(1, 2, 3).sum // --> 6
def reverse[T](xs: List[T]): List[T] = xs match {
case Nil => xs
case h :: tail => reverse(tail) ::: List(h) // tail can be Nil and it's OK
}
The List type in Scala is covariant. This means that for each pair of types S and T, if S is a subtype of T, then List[S] is a subtype of List[T].
scala> List.fill(3)("hello")
res1: List[String] = List(hello, hello, hello)
scala> List.fill(2, 3)('b')
res2: List[List[Char]] = List(List(b, b, b), List(b, b, b))
scala> val squares = List.tabulate(5)(index => index * index)
squares: List[Int] = List(0, 1, 4, 9, 16)
Functions which call themselves in their last action (in tail) are called tail recursive.
To achieve it, there must be no other operation near this last call.
This is Scala way of providing optimized repetition. It' almost as fast as imperative way, but without ugly vars :
var total = 0 // Iiicckkk :-/
var tmp = lst // GRRRRHHH :-/
while( tmp.size > 0) {
total += tmp.head
tmp = tmp.tail
}
Scala compiles is to the same byte code as while loop, which doesn't consume stack frames for each iteration. This prevents StackOverflow when there's many recursive calls.
The use of tail recursion in Scala is limited because implementing more advanced forms of tail recursion is very difficult in JVM.
Scala only optimizes directly recursive calls back to the same function making the call. If the recursion is indirect, no optimization is possible:
// here we don't profit from optimal tail recursion
def isEven(x: Int): Boolean = if (x == 0) true else isOdd(x - 1)
def isOdd(x: Int): Boolean = if (x == 0) false else isEven(x - 1)
You also won't get a tail-call optimization if the final call goes to a function value:
val funValue = myFun _ // function value, which wraps myFun
def myFun(x: Int) : Unit = {
if (x != 0) { println(x); funValue(x - 1) }
}
Scala compiler performs a tail-call optimization, when last operation is the call to the recursive function without going through function value or some other intermediary which makes the call indirect
To make sure that Scala compiler does optimization, we can use @tailrec annotation which will cause compiler error if optimization is impossible.
Arrays are mutable (holding elements under 0-based indexes):
val myArray = Array(0,1,2)
myArray(0) = 9 //now myArray is (9,1,2)
Has all functionality of Array, but additionally it can add/remove elements to the beginning/end of sequence. As it wraps Array, it's a bit slower.
val arrbuf = scala.collection.mutable.ArrayBuffer.empty[Int]
buf += 1 // ArrayBuffer(1)
When you implement ArrayBuffer you don't need to specify length (in case of Array you must specify it)
List has fast access to first element but slow access to last element, so when I construct a list by appending (to the end) it's better to pre-pend (to the beginning) each element and reverse whole list at the end.
To avoid reversing I can use mutable ListBuffer which is more efficient for appending. Timing of appending/prepending to ListBuffer is constant. At the end just call.toList() to get a list.
ListBuffer is like an array buffer except that it uses a linked list internally instead of an array. If you plan to convert any buffer to a list once it is built up, use a list buffer instead of an array buffer.
Works exactly like immutable with the difference that I can change head or tail, using elem/next:
val lst = scala.collection.mutable.LinkedList(1, 2, 3)
lst.elem = 100 // --> lst is now (100,2,3)
lst.next = scala.collection.mutable.LinkedList(200, 300) // --> lst is now (100,200,300)
Changing negative values to zero (possible to use elem/next to read from it, instead of head/tail)
val lst = scala.collection.mutable.LinkedList(-1, 2, 3)
var cur = lst // lst (-1, 2, 3) cur (-1, 2, 3)
while (cur != Nil) {
if (cur.elem < 0) cur.elem = 0 // lst (0, 2, 3) cur (0, 2, 3) - modifying the list hold by 2 refs
cur = cur.next // lst (0, 2, 3) cur (2, 3) - cur.next returns reduced list
} // and lst holds modified but complete list
At the end, cur is empty, and lst is (0, 2, 3)
Simpler version:
val lst = collection.mutable.LinkedList(-1, 2, 3)
for(i <- 0 until lst.length) if(lst(i)<0) lst(i)=0
lst
Sets (unordered)
Trying to add an element that already exists - MAKES NO EFFECT
Set(2, 0, 1) + 1 // --> (2, 0, 1)
Iterating over Set(1, 2, 3, 4, 5, 6) gives 5 1 6 2 3 4 cause hash sets re-order elements for faster iterating
That's why iterating over hash set - much faster than over list/array
To have a set sorted, use:
scala.collection.immutable.SortedSet(1, 2, 3, 4, 5, 6)
(Scala 2.9 doesn't have mutable sorted set)
Set(1, 2, 3) contains 1 // --> true
Set(1) subsetOf Set(1, 2, 3) // --> true
Set(1) union Set(2) // Set (1,2)
Set(1) | Set(2) // Set (1,2) the same
Set(1) ++ Set(2) // Set (1,2) the same
Set(1, 2) diff Set(2) // Set (1)
Set(1, 2) &~ Set(2) // Set (1) the same
Set(1, 2) -- Set(2) // Set (1) the same
Set(1, 2, 3) intersect Set(2, 9) // Set (2)
Set(1, 2, 3) & Set(2, 9) // Set (2) the same
To maximize performance, for map/set smaller than 5 elements scala doesn't use hash tables. When set/map has 5 or more elements then the implementation uses hash tables.
Number of elements | Implementation |
---|---|
0 | scala.collection.immutable.EmptySet |
1 | scala.collection.immutable.Set1 |
2 | scala.collection.immutable.Set2 |
3 | scala.collection.immutable.Set3 |
4 | scala.collection.immutable.Set4 |
5 or more | scala.collection.immutable.HashSet |
Number of elements | Implementation |
---|---|
0 | scala.collection.immutable.EmptyMap |
1 | scala.collection.immutable.Map1 |
2 | scala.collection.immutable.Map2 |
3 | scala.collection.immutable.Map3 |
4 | scala.collection.immutable.Map4 |
5 or more | scala.collection.immutable.HashMap |
Operators for Adding or Removing Elements
coll :+ elem Seq(Vector/Range/List) append to collection (creating a new one)
elem +: coll prepend before collection
coll + elem Seq/Map returns collection with additional elem
coll + (e1,e2,..)
coll - elem Seq/Map/ArrayBuffer returns collection without elem
coll - (e1,e2,..)
coll ++ coll2 Iterable creating collection with elements from both
( the same as coll ++: coll2)
coll -- coll2 Seq/Map/ArrayBuffer creating collection with elements from coll but not coll2
elem :: lst List PREPENDING elem to list
1 :: 2 :: Nil List PREPENDING elem to list
lst2 ::: lst List ( the same as lst2 ++: list or lst +: list )
set | set Set the same as union or ++
set &~ set the same as diff or --
set & set the same as intersect
Mutations
coll += elem Mutable Collections mutates collection by APPENDING
coll += (e1,e2,..)
coll ++= coll2
coll -= elem
coll -= (e1,e2,..)
coll --= coll2
- elem +=: coll ArrayBuffer mutates collection by PREPENDING elem/coll2
- coll2 ++=: coll
Methods of Iterable
Iterable(1,2,3).head // --> 1
Iterable(1,2,3).tail // --> (2,3)
Iterable(1,2,3).init // --> (1,2)
Iterable(1,2,3).last // --> 3
length isEmpty
sum max min
product // all elements multiplied
map(f), foreach(f), flatMap(f) collect(f) //applies function to each element
Map(1->100, 2->200 ) map ((t:Tuple2[Int,Int]) => t._1 + t._2) // List(101, 202)
Map(1->100, 2->200 map ( t => t._1 + t._2) // the same
Iterable("a","b","c") map (_.toUpperCase) // A B C
Iterable(1,2,3) map (_*2) // 1,4,6
collect() // filter + map
List("a", 1, true, "b") collect { case s: String => s.toUpperCase } // List("A", "B")
"-3+4".collect { case '+' => 1 ; case '-' => -1 } // Vector(-1, 1)
reduceLeft(op) reduceRight(op) // applies operation e.g. (a:Int,b:Int)=>a+b to each elem in given order
List(1, 7, 2, 9).reduceLeft(_ - _) // ((1 - 7) - 2) - 9
List(1, 7, 2, 9).reduceRight(_ - _) // 1 - (7 - (2 - 9))
foldLeft(op) foldRight(op) -- to start with something else
List(1, 7, 2, 9).foldLeft(0)(_ - _) // 0 - 1 - 7 - 2 - 9
List(1, 7, 2, 9).foldLeft("")(_ + _) // 1729
scanLeft() scanRight() -- combines mapping and folding
(1 to 10).scanLeft(0)(_ + _) // Vector(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55)
// e.g. 6 = 0+1+2+3
filter(cond) filterNot(cond) // filters under condition, e.g. Iterable(1,2,3) filter( _ > 2)
take(n) drop(n) takeRight(n) dropRight(n)
slice( index_first_included, index_first_excluded )
zip(coll)
val a = List("a","b","c")
val b = List(1,2,3)
a zip b // List((a,1), (b,2), (c,3))
a zip b map ( p => p._1 * p._2 ) // applying function to each pair: List(a, bb, ccc)
a zip b map( (p:(String,Int)) => p._1 * p._2 ) // the same but with declared type
mkString(before, separator, after)
Iterable("a","b","c") mkString("<","-",">") // <a-b-c>
Iterators
val myIter = Iterable(1,2,3).iterator // iterator
while (myIter.hasNext) println(myIter.next() )
val myIter = Iterable(1,2,3).iterator // iterator
for( i <- myIter) println(i)
val myColl = Iterable(1,2,3) // collection
for( i <- myColl) println(i)
After one iteration, Iterator is EMPTY !!!
After calling a method such as map, filter, count, sum, or even length,
the iterator is at the end of the collection, and you can't use it again.
I can use a method such as toArray, toIterable, toSeq, toSet, or toMap to copy the values into a collection
val toArray = Iterable(1,2,3).iterator toArray // array
for (i <- toArray) println(i)
for (i <- toArray) println(i)
for (i <- toArray) println(i) // many times, gets the same results (array never gets empty)
val myColl = Iterable(1,2,3) // collection
for( i <- myColl) println(i)
for( i <- myColl) println(i) // many times, gets the same results (collection never gets empty)
Streams - immutable List, only the first element is evaluated
Is an immutable alternative to iterators.
val myTmpStream = 1 #:: 2 #:: 3 #:: 40 #:: 500 #:: Stream.empty // #:: is like :: for lists but it creates streams
def createStream(n: BigInt): Stream[BigInt] = if(n < 0) Stream.empty else n #:: createStream(n - 1)
val myStream = createStream(10)
Stream is immutable List which is computed lazily. Only the first element is evaluated
myStream // Stream(10, ?) stream not computed at all
Stream evaluate everything till the element that is asked for. Every evaluation until this element is cached
myStream.take(4).force // Stream(10, 9, 8, 7)
myStream // Stream(10, 9, 8, 7, ?) stream not computed from 5th element
myStream force // Stream(10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
Stream is immutable List in which the tail is computed lazily.
myStream tail // Stream(9, ?) tail not computed yet
myStream.tail.force // Stream(9, 8, 7, 6, 5, 4, 3, 2, 1, 0) computing the tail
myStream.tail.toList // another way of computing
myStream tail // Stream(9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
myStream // Stream(10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
Lazy Views
It's like streams but totally unevaluated. Even the first element is unevaluated
val myView = (0 to 10).view.map( _ * 2 ) // nothing evaluated yet
myView(5) // just pow(10,5) is computed, the rest is not.
// in Streams it was computed and cached for every element until 5th
Views evaluate only the element that is asked for. Nothing before is evaluated or cached.
Views are good for making things optimal :
(0 to 10).map( math.pow(10,_) ).map( 1 / _ )
(0 to 10).view.map( math.pow(10,_) ).map( 1 / _ ).force
In first way, first operation is applied to every element, and the second on the result collection.
In second way, both operations are applied to each element, without building an intermediate collection.
So second is better !!!
It's the same as:
(0 to 10).map(x => math.pow(10, -x))
Although sometimes it's not easy to make 1 function out of two.
Interoperability with Java Collections - each scala collection (mutable or not) can be converted to java and the other way around.
import scala.collection.JavaConversions._
*Threadsafe Collections
SynchronizedBuffer
SynchronizedMap
SynchronizedPriorityQueue
SynchronizedQueue
SynchronizedSet
SynchronizedStack
e.g. :
val scores = new scala.collection.mutable.HashMap[String, Int] with
scala.collection.mutable.SynchronizedMap[String, Int]
*Parallel Collections
(1 to 10).par.sum // parallelize the sum of elements among different threads
//and add results at the end
(1 to 10).par.count(_ % 2 == 0) // parallelize the count of every element
// and adds the results at the end
par() method produces parallel implementation of collection. This implementation extends one of the 3 :
<< trait >>
ParIterable
______________|______________
| | |
| | |
<< trait >> << trait >> << trait >> ParSeq ParSet ParMap
for (i <- (0 until 10).par) print(i + " ") // different output every time e.g. : 5 0 1 7 8 2 9 3 6 4
// here, different threads are iterating different parts
for (i <- (0 until 10).par) yield i + " " // the result is always the same: 1 2 3 4 5 6 7 8 9
// here, the result is assembled in order
Parallel computations should not mutate shared variables - then the result is unpredictable.
var count = 0
for (i <- (0 until 10).par) { if (i % 2 == 0) count += 1 } // count value is never sure at the end
Martin Odersky exercise:
// last element
def myLast[T] (xs: List[T] ): T = {
xs match {
case Nil => throw new Error ("no such element")
case List(x) => x
case y :: ys => myLast (ys)
}
} // complexity = n
// everything without last element
def myInit[T] (xs: List[T] ): T = {
xs match {
case Nil => throw new Error ("no such element")
case List(_) => Nil
case y :: ys => y :: myInit (ys)
}
}
// adding 2 lists - concat
def myConcat[T] (xs: List[T], ys: List[T] ): List[T] = {
xs match {
case Nil => ys
case z :: zs => z :: myConcat (zs, ys)
}
} // complexity = n
def myReverse[T](xs: List[T]): List[T] = {
xs match {
case Nil => xs
case z :: zs => myConcat(myReverse(zs),List(z))
}
} // complexity = n^2 cause we use concat(of complexity n) and we call myReverse recursively