Set - sivakrsna/Java-Interview GitHub Wiki

Set

  • A Set is a Collection that cannot contain duplicate elements.
  • Sets contain no pair of elements e1 and e2 such that e1.equals(e2)
  • At most one null element.
  • Set also adds a stronger contract on the behavior of the equals and hashCode operations, allowing Set instances to be compared meaningfully even if their implementation types differ.

General-Purpose Set Implementations

  1. HashSet
  2. TreeSet
  3. LinkedHashSet

SortedSet

  • A SortedSet is a Set that maintains its elements in ascending order. The elements are ordered using their natural ordering, or by a Comparator typically provided at sorted set creation time.
  • All elements inserted into a sorted set must implement the Comparable interface
  • Furthermore, all such elements must be mutually comparable: e1.compareTo(e2) (or comparator.compare(e1, e2)) must not throw a ClassCastException for any elements e1 and e2 in the sorted set.

NavigableSet

  • A NavigableSet may be accessed and traversed in either ascending or descending order.
  • A SortedSet extended with navigation methods reporting closest matches for given search targets.
  • Methods lower, floor, ceiling, and higher return elements respectively less than, less than or equal, greater than or equal, and greater than a given element, returning null if there is no such element.
  • The return values of navigation methods may be ambiguous in implementations that permit null elements. However, even in this case the result can be disambiguated by checking contains(null). To avoid such issues, implementations of this interface are encouraged to not permit insertion of null elements.

HashSet

  • Duplicate values are not allowed, Contains unique elements only.
  • It makes no guarantees as to the iteration order of the set; in particular
  • This class permits the null element.
  • This class implements the Set interface, backed by a hash table (actually a HashMap instance)
  • That this implementation is not synchronized.

Set s = Collections.synchronizedSet(new HashSet(...));

How HashSet internally work?

All the classes of Set interface internally backed up by Map. HashSet uses HashMap for storing its object internally. You must be wondering that to enter a value in HashMap we need a key-value pair, but in HashSet we are passing only one value. Actually the value we insert in HashSet acts as key to the map Object and for its value java uses a constant variable. So in key-value pair all the keys will have same value.

TreeSet

  • Elements are sorted by keys.
  • TreeSet implements the SortedSet interface
  • Duplicate values are not allowed, Contains unique elements only
  • TreeSet does not preserve the insertion order of elements
  • The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
  • This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
  • Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.
  • Fail-fast iterators throw ConcurrentModificationException on a best-effort basis.
  • This implementation is not synchronized.

SortedSet s = Collections.synchronizedSortedSet(new TreeSet(...));

LinkedHashSet

  • Maintains insertion order.
  • Duplicate values are not allowed, Contains unique elements only
  • Hash table and linked list implementation of the Set interface.
  • This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.
  • This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).
  • Fail-fast iterators throw ConcurrentModificationException on a best-effort basis.
  • This implementation is not synchronized.

Set s = Collections.synchronizedSet(new LinkedHashSet(...));

HashSet is much faster than TreeSet (constant-time versus log-time for most operations) but offers no ordering guarantees. If you need to use the operations in the SortedSet interface, or if value-ordered iteration is required, use TreeSet; otherwise, use HashSet. It's a fair bet that you'll end up using HashSet most of the time.

Special-Purpose Set Implementations

  1. EnumSet
  2. CopyOnWriteArraySet

EnumSet

  • EnumSet is a high-performance Set implementation for enum types.
  • All of the members of an enum set must be of the same enum type. For example, given the enum declaration for the days of the week, you can iterate over the weekdays. The EnumSet class provides a static factory that makes it easy.
    for (Day d : EnumSet.range(Day.MONDAY, Day.FRIDAY))
        System.out.println(d);

Enum sets also provide a rich, typesafe replacement for traditional bit flags.

    EnumSet.of(Style.BOLD, Style.ITALIC)

CopyOnWriteArraySet

  • CopyOnWriteArraySet is a Set implementation backed up by a copy-on-write array.
  • It is thread-safe.
  • All mutative operations, such as add, set, and remove, are implemented by making a new copy of the array; no locking is ever required.
  • Even iteration may safely proceed concurrently with element insertion and deletion.
  • It is best suited for applications in which set sizes generally stay small, read-only operations vastly outnumber mutative operations, and you need to prevent interference among threads during traversal.

HashSet Vs TreeSet Vs LinkedHashSet

Performance and Speed : First difference between them comes in terms of speed. HashSet is fastest, LinkedHashSet is second on performance or almost similar to HashSet but TreeSet is bit slower because of sorting operation it needs to perform on each insertion. TreeSet provides guaranteed O(log(n)) time for common operations like add, remove and contains, while HashSet and LinkedHashSet offer constant time performance e.g. O(1) for add, contains and remove given hash function uniformly distribute elements in bucket.

Ordering : HashSet does not maintain any order while LinkedHashSet maintains insertion order of elements much like List interface and TreeSet maintains sorting order or elements.

Internal Implementation : HashSet is backed by an HashMap instance, LinkedHashSet is implemented using HashSet and LinkedList while TreeSet is backed up by NavigableMap in Java and by default it uses TreeMap.

null : Both HashSet and LinkedHashSet allows null but TreeSet doesn't allow null but TreeSet doesn't allow null and throw java.lang.NullPointerException when you will insert null into TreeSet. Since TreeSet uses compareTo() method of respective elements to compare them which throws NullPointerException while comparing with null, here is an example:

TreeSet cities
Exception in thread "main" java.lang.NullPointerException
        at java.lang.String.compareTo(String.java:1167)
        at java.lang.String.compareTo(String.java:92)
        at java.util.TreeMap.put(TreeMap.java:545)
        at java.util.TreeSet.add(TreeSet.java:238)

Comparison : HashSet and LinkedHashSet uses equals() method in Java for comparison but TreeSet uses compareTo() method for maintaining ordering. That's why compareTo() should be consistent to equals in Java. failing to do so break general contact of Set interface i.e. it can permit duplicates.

Similarities

  1. Duplicates : All three implements Set interface means they are not allowed to store duplicates.
  2. Thread safety : HashSet, TreeSet and LinkedHashSet are not thread-safe, if you use them in multi-threading environment where at least one Thread modifies Set you need to externally synchronize them.
  3. Fail-Fast Iterator : Iterator returned by TreeSet, LinkedHashSet and HashSet are fail-fast Iterator. i.e. If Iterator is modified after its creation by any way other than Iterators remove() method, it will throw ConcurrentModificationException with best of effort. read more about fail-fast vs fail-safe Iterator here