9.4 Union Find, Trie, Segment Tree, Index Tree - swchen1234/Algorithm GitHub Wiki

Union Find

应用

原生操作：
- 判断两个元素是否在同一群体
- 合并多个群体
派生操作：
- 返回当前点所在群体的元素个数
添加一个size array，合并时只需更新parent size, 因为只关心parent size
- 返回当前共有多少个群体
添加一个count变量，合并时只需count--

模版

    def countComponents(self, n, edges):
      """
        self.parents = [i for i in xrange(n)]
        # the height of the tree, always connect low height tree to high height tree, in order to keep the tree height low
        self.ranks = [1 for i in xrange(n)] 
        self.count = n

        for edge in edges:
            self.union(edge[0], edge[1])
        return self.count               
        
    def union(self, a, b):
        roota = self.find(a)
        rootb = self.find(b)
        if roota != rootb :
            if ranks[a] < ranks[b]:
                a, b = b, a
            self.parents[rootb] = roota
            if ranks[a] == ranks[b]:
               ranks[a] += 1
            self.count -= 1
        
    def find(self, a):
        if self.parents[a] == a:
            return a
        self.parents[a] = self.find(self.parents[a]) # path compression, s.t. future find takes close to O(constant time)
        return self.parents[a]

Problems

200. Number of Islands用bfs解更容易
305. Number of Islands II每次一块区域变成小岛，若仍用bfs，则O（mnk)；而若用uf, 则O（mn+4k)，其中O（mn)为初始化所需要的时间
827. Making A Large Island1)先遍历每个点，uion小岛。2）统计每个father的孩子数 3）遍历每个0,若邻居有岛，加上节点数，用set去重，不同邻居有相同父节点
1258. Synonymous Sentences用UF建立每个单词的所有近义词，再用backtracking
2709. Greatest Common Divisor Traversal对每个数字找到其prime factor(PF)，for i in rangen(n):seen[PF]存prime factor之前出现的数字index;若之前出现过，则union(seen[PF],i)
2092. Find All People With Secret建立meeting_at_same_time={t:[(x,y)]}.由小到大遍历t，每次将t所对应的所有[(x,y)]union起来，再逐个判断其是否与0相连，为了处理a,b开完会后a知道秘密的情况，每个t遍历结束时，将所有不与0相连的人的parents重新设为parents[x]=x\

number-of-connected-components-in-an-undirected-graph

Friend CircleThis problem can be solved both through DFS and Union-Find. When use DFS, use a visited node to keep track of node visited; When use Union-Find, initiate the group number to be the number of nodes, reduce the group number by one when doing union.

Graph Valid Tree Solved using almost the same algo.

longest-consecutive-sequence Union the index of the array, although can also be solved using a simple hashing trick by only searching from the beginning of the sequence.

most-stones-removed-with-same-row-or-column

take rownum and colnum are seperate elements, and each point connects its row and col.

Trie

适用情况：

适用于查找前缀
一个一个字母查找
节省空间

Trie vs Hash

Trie 和 hash的时间复杂度都为O(n), 但空间复杂度上trie要明显优于hash.

	Hash	Trie
时间	O(len(word))	O(len(word))
空间	更多	更少

应用：

搜索
设计

模版

from collections import defaultdict

class TrieNode(object):

    def __init__(self, char, wordIdx):
        self.char = char ## 可有可无
        self.wordEnd= wordEnd
        self.kids = defaultdict(TrieNode) # 或者用TrieNode[256]存放<char, TrieNode>
        
class WordFilter(object):
    def __init__(self, words):
        self.root = TrieNode('', False)

    def insert(self, wordx):
        node = self.root
        for c in word:
            if c not in node.kids.keys():
                node.kids[c] = TrieNode(c, False)
            node = node.kids[c]
        node.wordEnd = True
                
    def find(self, word):
        node = self.root
        for c in word:
            if c not in node.kids.keys():
                return False
            node = node.kids[c]
        return node.wordEnd

    def startWith(self, word):
        node = self.root
        for c in word:
            if c not in node.kids.keys():
                return False
            node = node.kids[c]
        return True

Problems

1. 搜索

79. Word Searchbacktracking，O(N⋅3^L) where N is the number of cells in the board and L is the length of the word to be matched.
212. Word Search II因为需要查找多个单词，所以对于所有单词建立Trie，遍历grid中每一个点，若能match trieRoot children，则dfs
425. Word SquaresTrie+backtracking
720. Longest Word in Dictionary1）将所有单词加入trie，再bfs/dfs tire， O(mn) 2)排序+hash table, O(nlgn*m))
1233. Remove Sub-Folders from the Filesystem用trie， O（N*L)；也可以用sort，然后逐个比较每个path是否startWith last path\

2. 设计

208. Implement Trie (Prefix Tree)同上述模版
211. Design Add and Search Words Data Structure用trie实现；若遇见‘.'，对每个字符用recursion遍历
1166. Design File SystemTrieNode():self.kids={folder:TrieNode()}, self.val, TrieNode中存folder name，而非char
588. Design In-Memory File System1）可以TrieNode{isFile:bool, child={folderName:TrieNode},content:str} 2）也可以直接用dict嵌套dict，ls最后一层判断是否为str，还是dict；方法1）更清晰
642. Design Search Autocomplete SystemTrieNode中存kids={c:TrieNode}和{sentence:cnt},返回前k个时，用heap.nsmallest(得到cnt最多的sentences);AutocompleteSystem中,放temp str,当读到#时，将整个temp放入trie
1268. Search Suggestions System类似与642,用trie.curr表示trie的指针现在指哪里；TrieNode中存{kids:{}, 和所有有此前缀的[words], 提供top3()函数，返回给定前缀的最小3个words\

word-abbreviation

groups words by their greediest abbreviation, for group with more than one element, build a trie, and for each word within the group, search from root until the count of subtree is one. O(n) time and space.

word-squares

DFS + use trie to find possible words with given prefix.

Binary Index Tree

理论

Binary Index Tree use bit feature to store the sum up to index i.

又名：Fenwick Tree 中文名：树状数组简写：BIT 基于“前缀和”信息来实现—— Log(n) 修改任意位置值 Log(n) 查询任意区间和

getPrefixSum(16) = BIT[16]
getPrefixSum(15) = BIT[15] + BIT[14] + BIT[12] + BIT[8] 
getPrefixSum(14) = BIT[14] + BIT[12] + BIT[8] 
getPrefixSum(13) = BIT[13] + BIT[12] + BIT[8]
更改了A[1]，受到影响的是 BIT[1], BIT[2], BIT[4], BIT[8], BIT[16] ... 
更改了A[5]，受到影响的是 BIT[5], BIT[6], BIT[8], BIT[16] ... 
更改了A[9]，受到影响的是 BIT[9], BIT[10], BIT[12], BIT[16]

Binary Indexed Tree 事实上就是一个有部分区段累加和数组把原先我们累加的方式从: for (int i = index; i >= 0; i = i - 1) sum += arr[i]; 改成了 for (int i = index+1; i >= 1; i = i - lowbit(i)) sum += bit[i];

模版

** BIT[0]不存信息，BIT[1] 存的是nums[0]的信息，所以无论是update还是prefixSum, BIT index都从i+1开始。

    def _getLowBit(self, i):
        return i & (-i)
    
    def update(self, i, val):
        j = i + 1
        diff = val - self.nums[i]
        self.nums[i] = val
        while j < len(self.bit):
            self.bit[j] += diff
            j += self._getLowBit(j)

    def getPrefix(i):
        res = 0
        k = i + 1
        while k >= 1:
            res += self.bit[k]
            k -= self._getLowBit(k)
        return res      
  
    def sumRange(self, i, j):
        return getPrefix(j) - getPrefix(i - 1)

Problems

Range Sum Query - Mutable

Range Sum Query 2D - Mutable This is 2D vairants.

bit[i][j]记录第i行，构建的bit的第j index的指。所以当计算从原点开始直至i，j所组成的长方形的和时，分别将每行bit至index j的和相加。

count-of-smaller-numbers-after-self

self.bits[i] stores how many the count of items <= i, traverse from back, count the prefix sum of bit[:nums[i]], and then update val by 1.

Segment Tree

理论

Segment Tree store the sum with range [i, j].
线段树（Segment Tree）特别万能, 基本不考，但是如果能掌握，可以一口气解决一大堆问题.
线段树是基于分治法实现的，可以作为很好的分治法的练习

Problems

2407. Longest Increasing Subsequence II用dp会超时\