Duplicate Sections of the Hotel - codepath/compsci_guides GitHub Wiki

TIP102 Unit 9 Session 2 Advanced (Click for link to problem statements)

Problem Highlights

💡 Difficulty: Hard
⏰ Time to complete: 30-35 mins
🛠️ Topics: Trees, Subtree Identification, Hashing

1: U-nderstand

Understand what the interviewer is asking for by using test cases and questions about the problem.

Established a set (2-3) of test cases to verify their own solution later.

Established a set (1-2) of edge cases to verify their solution handles complexities.

Have fully understood the problem and have no clarifying questions.

Have you verified any Time/Space Constraints for this problem?

What is the structure of the tree?
- The tree is a binary tree where each node represents a room in the hotel.
What operation needs to be performed?
- The function needs to identify and return the roots of all duplicate subtrees in the tree.
What should be returned?
- The function should return a list of TreeNode objects that are the roots of duplicate subtrees.

HAPPY CASE
Input: 
    rooms = ["Lobby", 101, 123, 201, None, 101, 201, None, None, 201]
Output: 
    Subtree 1   Subtree 2
       101         201
       /
     201
Explanation: 
    There are two duplicate subtrees in the given tree. 
    The first duplicate subtree is rooted at node 101 with a left child 201.
    The second duplicate subtree is rooted at node 201.

EDGE CASE
Input: 
    rooms = ["Lobby", None, None]
Output: 
    []
Explanation: 
    There are no duplicate subtrees in the given tree as the tree only has one node.

2: M-atch

Match what this problem looks like to known categories of problems, e.g., Linked List or Dynamic Programming, and strategies or patterns in those categories.

For Subtree Identification problems, we want to consider the following approaches:

Hash Maps: Use a hash map to store serialized forms of subtrees, allowing for quick identification of duplicate subtrees.
Tree Serialization: Serialize each subtree and use the serialized string as a key in the hash map to track occurrences.

3: P-lan

Plan the solution with appropriate visualizations and pseudocode.

General Idea:

Use a postorder traversal to serialize each subtree.
Store the serialized form of each subtree in a hash map.
If a serialized subtree appears more than once, it is a duplicate, and its root should be added to the result list.

1) Define a helper function `get_id(node)`:
    - If `node` is `None`, return 0 (identifier for null nodes).
    - Serialize the current subtree by creating a tuple `key` from the node's value and the identifiers of its left and right subtrees.
    - Check if `key` is already in the hash map:
        - If it is, increment its count.
        - If the count reaches 2, add the node to the duplicates list.
    - If `key` is not in the hash map, assign it a new identifier and add it to the map.
    - Return the identifier for the current subtree.

2) Initialize an empty dictionary `tree_map` to store subtree identifiers and their counts.
3) Initialize an empty list `duplicates` to store the roots of duplicate subtrees.
4) Call `get_id` on the root node to start the traversal.
5) Return the `duplicates` list.

⚠️ Common Mistakes

Forgetting to properly serialize the subtree structure, leading to incorrect identification of duplicates.
Not handling the case where the tree has no duplicates.

4: I-mplement

Implement the code to solve the algorithm.

class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.val = value
        self.left = left
        self.right = right

def find_duplicate_subtrees(hotel):
    def get_id(node):
        if not node:
            return 0  # Use 0 as the identifier for null nodes
        
        # Create a unique key for the current subtree
        key = (node.val, get_id(node.left), get_id(node.right))
        
        if key in tree_map:
            tree_id, count = tree_map[key]
            tree_map[key] = (tree_id, count + 1)
        else:
            tree_id = len(tree_map) + 1
            tree_map[key] = (tree_id, 1)
        
        # If this subtree has appeared twice, add it to duplicates
        if tree_map[key][1] == 2:
            duplicates.append(node)
        
        return tree_map[key][0]
    
    tree_map = {}
    duplicates = []
    get_id(hotel)
    
    return duplicates
    
# Example Usage:
rooms = ["Lobby", 101, 123, 201, None, 101, 201, None, None, 201]
hotel = build_tree(rooms)
subtree_lst = find_duplicate_subtrees(hotel)
for subtree in subtree_lst:
    print_tree(subtree)

5: R-eview

Review the code by running specific example(s) and recording values (watchlist) of your code's variables along the way.

- Example 1:
    - Input: 
        `rooms = ["Lobby", 101, 123, 201, None, 101, 201, None, None, 201]`
    - Execution: 
        - Traverse the tree in postorder and serialize each subtree.
        - Use the serialized form as a key in the hash map to track duplicates.
    - Output: 
        [Subtree 1: 101 -> 201, Subtree 2: 201]

- Example 2:
    - Input: 
        `rooms = ["Lobby", None, None]`
    - Execution: 
        - Traverse the single node tree.
        - No duplicates found.
    - Output: 
        []

6: E-valuate

Evaluate the performance of your algorithm and state any strong/weak or future potential work.

Time Complexity:

Time Complexity: O(N) where N is the number of nodes in the tree.
- Explanation: Each node is visited once during the postorder traversal, and each subtree is serialized and stored in constant time.

Space Complexity:

Space Complexity: O(N) where N is the number of nodes in the tree.
- Explanation: The hash map stores identifiers for each unique subtree, and in the worst case, all nodes have unique subtrees.