9.4 Union Find, Trie, Segment Tree, Index Tree - swchen1234/Algorithm GitHub Wiki

Union Find

应用

  • 原生操作:
    • 判断两个元素是否在同一群体
    • 合并多个群体
  • 派生操作:
    • 返回当前点所在群体的元素个数

    添加一个size array, 合并时只需更新parent size, 因为只关心parent size

    • 返回当前共有多少个群体

    添加一个count变量, 合并时只需count--

模版

    def countComponents(self, n, edges):
      """
        self.parents = [i for i in xrange(n)]
        # the height of the tree, always connect low height tree to high height tree, in order to keep the tree height low
        self.ranks = [1 for i in xrange(n)] 
        self.count = n

        for edge in edges:
            self.union(edge[0], edge[1])
        return self.count               
        
    def union(self, a, b):
        roota = self.find(a)
        rootb = self.find(b)
        if roota != rootb :
            if ranks[a] < ranks[b]:
                a, b = b, a
            self.parents[rootb] = roota
            if ranks[a] == ranks[b]:
               ranks[a] += 1
            self.count -= 1
        
    def find(self, a):
        if self.parents[a] == a:
            return a
        self.parents[a] = self.find(self.parents[a]) # path compression, s.t. future find takes close to O(constant time)
        return self.parents[a]

Problems

200. Number of Islands用bfs解更容易
305. Number of Islands II每次一块区域变成小岛,若仍用bfs,则O(mnk);而若用uf, 则O(mn+4k),其中O(mn)为初始化所需要的时间
827. Making A Large Island1)先遍历每个点,uion小岛。2)统计每个father的孩子数 3)遍历每个0,若邻居有岛, 加上节点数,用set去重,不同邻居有相同父节点
1258. Synonymous Sentences用UF建立每个单词的所有近义词,再用backtracking
2709. Greatest Common Divisor Traversal对每个数字找到其prime factor(PF),for i in rangen(n):seen[PF]存prime factor之前出现的数字index;若之前出现过,则union(seen[PF],i)
2092. Find All People With Secret建立meeting_at_same_time={t:[(x,y)]}.由小到大遍历t,每次将t所对应的所有[(x,y)]union起来,再逐个判断其是否与0相连,为了处理a,b开完会后a知道秘密的情况,每个t遍历结束时,将所有不与0相连的人的parents重新设为parents[x]=x\

number-of-connected-components-in-an-undirected-graph

Friend CircleThis problem can be solved both through DFS and Union-Find. When use DFS, use a visited node to keep track of node visited; When use Union-Find, initiate the group number to be the number of nodes, reduce the group number by one when doing union.

Graph Valid Tree Solved using almost the same algo.

longest-consecutive-sequence Union the index of the array, although can also be solved using a simple hashing trick by only searching from the beginning of the sequence.

most-stones-removed-with-same-row-or-column

take rownum and colnum are seperate elements, and each point connects its row and col.

Trie

适用情况:

  • 适用于查找前缀
  • 一个一个字母查找
  • 节省空间

Trie vs Hash

  • Trie 和 hash的时间复杂度都为O(n), 但空间复杂度上trie要明显优于hash.
Hash Trie
时间 O(len(word)) O(len(word))
空间 更多 更少

应用:

  1. 搜索
  2. 设计

模版

from collections import defaultdict

class TrieNode(object):

    def __init__(self, char, wordIdx):
        self.char = char ## 可有可无
        self.wordEnd= wordEnd
        self.kids = defaultdict(TrieNode) # 或者用TrieNode[256]存放<char, TrieNode>
        
class WordFilter(object):
    def __init__(self, words):
        self.root = TrieNode('', False)

    def insert(self, wordx):
        node = self.root
        for c in word:
            if c not in node.kids.keys():
                node.kids[c] = TrieNode(c, False)
            node = node.kids[c]
        node.wordEnd = True
                
    def find(self, word):
        node = self.root
        for c in word:
            if c not in node.kids.keys():
                return False
            node = node.kids[c]
        return node.wordEnd

    def startWith(self, word):
        node = self.root
        for c in word:
            if c not in node.kids.keys():
                return False
            node = node.kids[c]
        return True

Problems

1. 搜索

79. Word Searchbacktracking,O(N⋅3^L) where N is the number of cells in the board and L is the length of the word to be matched.
212. Word Search II因为需要查找多个单词,所以对于所有单词建立Trie,遍历grid中每一个点,若能match trieRoot children,则dfs
425. Word SquaresTrie+backtracking
720. Longest Word in Dictionary1)将所有单词加入trie,再bfs/dfs tire, O(mn) 2)排序+hash table, O(nlgn*m))
1233. Remove Sub-Folders from the Filesystem用trie, O(N*L);也可以用sort, 然后逐个比较每个path是否startWith last path\

2. 设计

208. Implement Trie (Prefix Tree)同上述模版
211. Design Add and Search Words Data Structure用trie实现;若遇见‘.', 对每个字符用recursion遍历
1166. Design File SystemTrieNode():self.kids={folder:TrieNode()}, self.val, TrieNode中存folder name, 而非char
588. Design In-Memory File System1)可以TrieNode{isFile:bool, child={folderName:TrieNode},content:str} 2)也可以直接用dict嵌套dict,ls最后一层判断是否为str,还是dict;方法1)更清晰
642. Design Search Autocomplete SystemTrieNode中存kids={c:TrieNode}和{sentence:cnt},返回前k个时, 用heap.nsmallest(得到cnt最多的sentences);AutocompleteSystem中,放temp str,当读到#时,将整个temp放入trie
1268. Search Suggestions System类似与642,用trie.curr表示trie的指针现在指哪里;TrieNode中存{kids:{}, 和所有有此前缀的[words], 提供top3()函数,返回给定前缀的最小3个words\

word-abbreviation

groups words by their greediest abbreviation, for group with more than one element, build a trie, and for each word within the group, search from root until the count of subtree is one. O(n) time and space.

word-squares

DFS + use trie to find possible words with given prefix.


Binary Index Tree

理论

Binary Index Tree use bit feature to store the sum up to index i.

  • 又名:Fenwick Tree 中文名:树状数组 简写:BIT 基于“前缀和”信息来实现—— Log(n) 修改任意位置值 Log(n) 查询任意区间和
getPrefixSum(16) = BIT[16]
getPrefixSum(15) = BIT[15] + BIT[14] + BIT[12] + BIT[8] 
getPrefixSum(14) = BIT[14] + BIT[12] + BIT[8] 
getPrefixSum(13) = BIT[13] + BIT[12] + BIT[8]
更改了A[1],受到影响的是 BIT[1], BIT[2], BIT[4], BIT[8], BIT[16] ... 
更改了A[5],受到影响的是 BIT[5], BIT[6], BIT[8], BIT[16] ... 
更改了A[9],受到影响的是 BIT[9], BIT[10], BIT[12], BIT[16] 

Binary Indexed Tree 事实上就是一个有部分区段累加和数组 把原先我们累加的方式从: for (int i = index; i >= 0; i = i - 1) sum += arr[i]; 改成了 for (int i = index+1; i >= 1; i = i - lowbit(i)) sum += bit[i];

模版

** BIT[0]不存信息,BIT[1] 存的是nums[0]的信息, 所以无论是update还是prefixSum, BIT index都从i+1开始。

    def _getLowBit(self, i):
        return i & (-i)
    
    def update(self, i, val):
        j = i + 1
        diff = val - self.nums[i]
        self.nums[i] = val
        while j < len(self.bit):
            self.bit[j] += diff
            j += self._getLowBit(j)

    def getPrefix(i):
        res = 0
        k = i + 1
        while k >= 1:
            res += self.bit[k]
            k -= self._getLowBit(k)
        return res      
  
    def sumRange(self, i, j):
        return getPrefix(j) - getPrefix(i - 1)

Problems

Range Sum Query - Mutable

Range Sum Query 2D - Mutable This is 2D vairants.

bit[i][j]记录第i行,构建的bit的第j index的指。 所以当计算从原点开始直至i,j所组成的长方形的和时, 分别将每行bit至index j的和相加。

count-of-smaller-numbers-after-self

self.bits[i] stores how many the count of items <= i, traverse from back, count the prefix sum of bit[:nums[i]], and then update val by 1.

Segment Tree

理论

  • Segment Tree store the sum with range [i, j].
  • 线段树(Segment Tree)特别万能, 基本不考,但是如果能掌握,可以一口气解决一大堆问题.
  • 线段树是基于分治法实现的,可以作为很好的分治法的练习

Problems

2407. Longest Increasing Subsequence II用dp会超时\