9.4 Union Find, Trie, Segment Tree, Index Tree - swchen1234/Algorithm GitHub Wiki
Union Find
应用
- 原生操作:
- 判断两个元素是否在同一群体
- 合并多个群体
- 派生操作:
- 返回当前点所在群体的元素个数
添加一个size array, 合并时只需更新parent size, 因为只关心parent size
- 返回当前共有多少个群体
添加一个count变量, 合并时只需count--
模版
def countComponents(self, n, edges):
"""
self.parents = [i for i in xrange(n)]
# the height of the tree, always connect low height tree to high height tree, in order to keep the tree height low
self.ranks = [1 for i in xrange(n)]
self.count = n
for edge in edges:
self.union(edge[0], edge[1])
return self.count
def union(self, a, b):
roota = self.find(a)
rootb = self.find(b)
if roota != rootb :
if ranks[a] < ranks[b]:
a, b = b, a
self.parents[rootb] = roota
if ranks[a] == ranks[b]:
ranks[a] += 1
self.count -= 1
def find(self, a):
if self.parents[a] == a:
return a
self.parents[a] = self.find(self.parents[a]) # path compression, s.t. future find takes close to O(constant time)
return self.parents[a]
Problems
200. Number of Islands用bfs解更容易
305. Number of Islands II每次一块区域变成小岛,若仍用bfs,则O(mnk);而若用uf, 则O(mn+4k),其中O(mn)为初始化所需要的时间
827. Making A Large Island1)先遍历每个点,uion小岛。2)统计每个father的孩子数 3)遍历每个0,若邻居有岛, 加上节点数,用set去重,不同邻居有相同父节点
1258. Synonymous Sentences用UF建立每个单词的所有近义词,再用backtracking
2709. Greatest Common Divisor Traversal对每个数字找到其prime factor(PF),for i in rangen(n):seen[PF]存prime factor之前出现的数字index;若之前出现过,则union(seen[PF],i)
2092. Find All People With Secret建立meeting_at_same_time={t:[(x,y)]}.由小到大遍历t,每次将t所对应的所有[(x,y)]union起来,再逐个判断其是否与0相连,为了处理a,b开完会后a知道秘密的情况,每个t遍历结束时,将所有不与0相连的人的parents重新设为parents[x]=x
\
number-of-connected-components-in-an-undirected-graph
Friend CircleThis problem can be solved both through DFS and Union-Find. When use DFS, use a visited node to keep track of node visited; When use Union-Find, initiate the group number to be the number of nodes, reduce the group number by one when doing union.
Graph Valid Tree Solved using almost the same algo.
longest-consecutive-sequence
Union the index of the array, although can also be solved using a simple hashing trick by only searching from the beginning of the sequence.
most-stones-removed-with-same-row-or-column
take rownum and colnum are seperate elements, and each point connects its row and col.
Trie
适用情况:
- 适用于查找前缀
- 一个一个字母查找
- 节省空间
Trie vs Hash
- Trie 和 hash的时间复杂度都为O(n), 但空间复杂度上trie要明显优于hash.
Hash | Trie | |
---|---|---|
时间 | O(len(word)) | O(len(word)) |
空间 | 更多 | 更少 |
应用:
- 搜索
- 设计
模版
from collections import defaultdict
class TrieNode(object):
def __init__(self, char, wordIdx):
self.char = char ## 可有可无
self.wordEnd= wordEnd
self.kids = defaultdict(TrieNode) # 或者用TrieNode[256]存放<char, TrieNode>
class WordFilter(object):
def __init__(self, words):
self.root = TrieNode('', False)
def insert(self, wordx):
node = self.root
for c in word:
if c not in node.kids.keys():
node.kids[c] = TrieNode(c, False)
node = node.kids[c]
node.wordEnd = True
def find(self, word):
node = self.root
for c in word:
if c not in node.kids.keys():
return False
node = node.kids[c]
return node.wordEnd
def startWith(self, word):
node = self.root
for c in word:
if c not in node.kids.keys():
return False
node = node.kids[c]
return True
Problems
1. 搜索
79. Word Searchbacktracking,O(N⋅3^L) where N is the number of cells in the board and L is the length of the word to be matched.
212. Word Search II因为需要查找多个单词,所以对于所有单词建立Trie,遍历grid中每一个点,若能match trieRoot children,则dfs
425. Word SquaresTrie+backtracking
720. Longest Word in Dictionary1)将所有单词加入trie,再bfs/dfs tire, O(mn) 2)排序+hash table, O(nlgn*m))
1233. Remove Sub-Folders from the Filesystem用trie, O(N*L);也可以用sort, 然后逐个比较每个path是否startWith last path
\
2. 设计
208. Implement Trie (Prefix Tree)同上述模版
211. Design Add and Search Words Data Structure用trie实现;若遇见‘.', 对每个字符用recursion遍历
1166. Design File SystemTrieNode():self.kids={folder:TrieNode()}, self.val, TrieNode中存folder name, 而非char
588. Design In-Memory File System1)可以TrieNode{isFile:bool, child={folderName:TrieNode},content:str} 2)也可以直接用dict嵌套dict,ls最后一层判断是否为str,还是dict;方法1)更清晰
642. Design Search Autocomplete SystemTrieNode中存kids={c:TrieNode}和{sentence:cnt},返回前k个时, 用heap.nsmallest(得到cnt最多的sentences);AutocompleteSystem中,放temp str,当读到#时,将整个temp放入trie
1268. Search Suggestions System类似与642,用trie.curr表示trie的指针现在指哪里;TrieNode中存{kids:{}, 和所有有此前缀的[words], 提供top3()函数,返回给定前缀的最小3个words
\
groups words by their greediest abbreviation, for group with more than one element, build a trie, and for each word within the group, search from root until the count of subtree is one. O(n) time and space.
DFS + use trie to find possible words with given prefix.
Binary Index Tree
理论
Binary Index Tree use bit feature to store the sum up to index i.
- 又名:Fenwick Tree 中文名:树状数组 简写:BIT 基于“前缀和”信息来实现—— Log(n) 修改任意位置值 Log(n) 查询任意区间和
getPrefixSum(16) = BIT[16]
getPrefixSum(15) = BIT[15] + BIT[14] + BIT[12] + BIT[8]
getPrefixSum(14) = BIT[14] + BIT[12] + BIT[8]
getPrefixSum(13) = BIT[13] + BIT[12] + BIT[8]
更改了A[1],受到影响的是 BIT[1], BIT[2], BIT[4], BIT[8], BIT[16] ...
更改了A[5],受到影响的是 BIT[5], BIT[6], BIT[8], BIT[16] ...
更改了A[9],受到影响的是 BIT[9], BIT[10], BIT[12], BIT[16]
Binary Indexed Tree 事实上就是一个有部分区段累加和数组 把原先我们累加的方式从: for (int i = index; i >= 0; i = i - 1) sum += arr[i]; 改成了 for (int i = index+1; i >= 1; i = i - lowbit(i)) sum += bit[i];
模版
** BIT[0]不存信息,BIT[1] 存的是nums[0]的信息, 所以无论是update还是prefixSum, BIT index都从i+1开始。
def _getLowBit(self, i):
return i & (-i)
def update(self, i, val):
j = i + 1
diff = val - self.nums[i]
self.nums[i] = val
while j < len(self.bit):
self.bit[j] += diff
j += self._getLowBit(j)
def getPrefix(i):
res = 0
k = i + 1
while k >= 1:
res += self.bit[k]
k -= self._getLowBit(k)
return res
def sumRange(self, i, j):
return getPrefix(j) - getPrefix(i - 1)
Problems
Range Sum Query 2D - Mutable This is 2D vairants.
bit[i][j]记录第i行,构建的bit的第j index的指。 所以当计算从原点开始直至i,j所组成的长方形的和时, 分别将每行bit至index j的和相加。
count-of-smaller-numbers-after-self
self.bits[i] stores how many the count of items <= i, traverse from back, count the prefix sum of bit[:nums[i]], and then update val by 1.
Segment Tree
理论
- Segment Tree store the sum with range [i, j].
- 线段树(Segment Tree)特别万能, 基本不考,但是如果能掌握,可以一口气解决一大堆问题.
- 线段树是基于分治法实现的,可以作为很好的分治法的练习