Protein Folding Loop Detection - codepath/compsci_guides GitHub Wiki

TIP102 Unit 6 Session 1 Advanced

Problem Highlights

  • 💡 Difficulty: Medium
  • Time to complete: 20-30 mins
  • 🛠️ Topics: Linked Lists, Cycle Detection, Slow and Fast Pointer Technique

1: U-nderstand

Understand what the interviewer is asking for by using test cases and questions about the problem.

  • Q: What does the problem ask for?
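    • A: The problem asks us to detect whether the linked list contains a cycle and, if it does, return a list of the values of the nodes in the cycle, starting from the node where the cycle begins. If there is no cycle, return an empty list.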
  • Q: What approach can be used?
    • A: The slow and fast pointer technique (Floyd's cycle detection) can be used to detect the cycle, locate the node where it begins, and then collect the nodes in the cycle.
HAPPY CASE
Input: protein_head = Node('Ala', Node('Gly', Node('Leu', Node('Val')))), with Val pointing back to Gly
Output: ['Gly', 'Leu', 'Val']
Explanation: The linked list has a cycle that includes the nodes 'Gly', 'Leu', and 'Val'.

EDGE CASE
Input: protein_head = None
Output: []
Explanation: An empty linked list has no cycle.

EDGE CASE
Input: protein_head = Node('Ala')
Output: []
Explanation: A single-node list with no cycle should return an empty list.
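The happy-case constructor call cannot express the back-pointer by itself, so one way to build these inputs is to create the nodes first and then close the loop by reassigning `next`. The `Node` class mirrors the one in the Implement section; `first_values` is a hypothetical helper (not part of the problem) that stops after a fixed number of steps so it never loops forever on a cyclic list.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def first_values(head, limit):
    """Hypothetical helper: return up to `limit` values by following `next`."""
    values = []
    node = head
    while node is not None and len(values) < limit:
        values.append(node.value)
        node = node.next
    return values

# Happy case: Ala -> Gly -> Leu -> Val, then Val.next points back to Gly
val = Node('Val')
leu = Node('Leu', val)
gly = Node('Gly', leu)
protein_head = Node('Ala', gly)
val.next = gly  # close the loop

# Edge cases
empty_head = None
single_head = Node('Ala')

print(first_values(protein_head, 7))  # ['Ala', 'Gly', 'Leu', 'Val', 'Gly', 'Leu', 'Val']
print(first_values(single_head, 7))   # ['Ala']
print(first_values(empty_head, 7))    # []
```

The repeated `'Gly', 'Leu', 'Val'` in the first output is the cycle showing up as soon as the traversal passes Val.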

2: M-atch

Match what this problem looks like to known categories of problems, e.g. Linked List or Dynamic Programming, and strategies or patterns in those categories.

For Linked List problems involving Cycle Detection and Cycle Node Identification, we want to consider the following approaches:

  • Two Pointers (Slow and Fast Pointer Technique): Use two pointers to first detect the cycle and then identify the nodes involved in the cycle.
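The detection half of the technique can be sketched on its own as a boolean check; the name `has_cycle` is hypothetical and not part of the problem statement. The `fast and fast.next` guard is what lets the empty and single-node cases fall through to `False`, while in a cyclic list the fast pointer eventually catches up with the slow one inside the loop.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def has_cycle(head):
    """Floyd's tortoise-and-hare check: True if following `next` ever revisits a node."""
    slow = fast = head
    while fast and fast.next:      # fast reaching None means the list ends, so no cycle
        slow = slow.next           # tortoise: one step
        fast = fast.next.next      # hare: two steps
        if slow is fast:           # the hare can only land on the tortoise inside a loop
            return True
    return False

gly = Node('Gly', Node('Leu', Node('Val')))
gly.next.next.next = gly            # Val -> Gly
print(has_cycle(Node('Ala', gly)))  # True
print(has_cycle(Node('Ala')))       # False
print(has_cycle(None))              # False
```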

3: P-lan

Plan the solution with appropriate visualizations and pseudocode.

General Idea: We will use the slow and fast pointer technique to first detect if a cycle exists. Once a cycle is detected, we will determine the start of the cycle and then collect all the nodes involved in the cycle.

1) Initialize two pointers, `slow` and `fast`, both pointing to the head of the list.
2) While `fast` and `fast.next` are not None:
    a) Move the `slow` pointer one step.
    b) Move the `fast` pointer two steps.
    c) If `slow` and `fast` point to the same node, a cycle is detected; stop the loop.
3) If the loop ends because `fast` reached the end of the list, there is no cycle: return an empty list.
4) Otherwise, find the start of the cycle:
    a) Reset `slow` to the head of the list, leaving `fast` at the meeting point.
    b) Move both pointers one step at a time until they meet; this meeting point is the start of the cycle.
5) Starting from the cycle's first node, walk around the cycle once and collect the value of each node.
6) Return the list of collected values.
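Resetting one pointer to the head finds the start of the cycle because of a short counting argument. Let a be the number of nodes before the cycle starts, L the cycle length, and b the number of steps the slow pointer takes after entering the cycle before the pointers meet. Both pointers are then on the same node inside the cycle, so the fast pointer's extra distance is a whole number of laps:

```latex
\begin{aligned}
d_{\text{slow}} &= a + b, \qquad d_{\text{fast}} = 2(a + b) \\
d_{\text{fast}} - d_{\text{slow}} &= a + b = kL \quad \text{for some integer } k \ge 1 \\
a &= kL - b
\end{aligned}
```

Walking a steps from the meeting point therefore moves kL - b positions around the cycle: it undoes the b steps past the cycle start plus some whole laps, and lands on the first node of the cycle. A pointer walking a steps from the head arrives at that same node, which is why the two pointers meet there.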

⚠️ Common Mistakes

  • Failing to handle cases where the list is empty or contains only one node.
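  • Incorrectly identifying the start of the cycle or collecting its nodes, for example starting the collection at the first meeting point of `slow` and `fast` (which is inside the cycle but not necessarily its first node), or writing a collection loop whose stopping condition is never reached.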
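One concrete way to get the collection wrong is to walk the cycle from the point where `slow` and `fast` first meet. The sketch below (helper names `meeting_point` and `collect_loop` are hypothetical) shows that on the happy case the pointers meet at Val, so collecting from there yields the right nodes in a rotated order.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def meeting_point(head):
    """Return the node where slow and fast first meet, or None if there is no cycle."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return slow
    return None

def collect_loop(start):
    """Walk the loop once starting at `start`, collecting each node's value."""
    values = [start.value]
    node = start.next
    while node is not start:   # stop when we are back where we began
        values.append(node.value)
        node = node.next
    return values

# Happy case: Ala -> Gly -> Leu -> Val -> (back to) Gly
gly = Node('Gly', Node('Leu', Node('Val')))
gly.next.next.next = gly
head = Node('Ala', gly)

meet = meeting_point(head)
print(meet.value)          # Val, not the cycle start Gly
print(collect_loop(meet))  # ['Val', 'Gly', 'Leu'], rotated relative to ['Gly', 'Leu', 'Val']
```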

4: I-mplement

Implement the code to solve the algorithm.

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

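def cycle_length(protein):
    """Return the values of the nodes in the cycle, starting at the node where the
    cycle begins, or [] if the list has no cycle."""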
    if not protein:
        return []

    slow = protein
    fast = protein

    # Step 1: Detect the cycle using slow and fast pointers
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next

        if slow is fast:  # same node object, not just equal values
            break
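    else:
        # while-else: this branch runs only when the loop ends without `break`,
        # i.e. fast reached the end of the list
        return []  # No cycle detected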

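    # Step 2: Find the start of the cycle. With slow restarted at the head and fast
    # left at the meeting point, moving both one step at a time makes them meet at
    # the first node of the cycle.
    slow = protein
    while slow is not fast: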
        slow = slow.next
        fast = fast.next

    # Step 3: Collect all nodes in the cycle
    cycle_nodes = []
    start_of_cycle = slow
    while True:
        cycle_nodes.append(slow.value)
        slow = slow.next
        if slow is start_of_cycle:
            break

    return cycle_nodes

5: R-eview

Review the code by running specific example(s) and recording values (watchlist) of your code's variables along the way.

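  • Example: Run the function on the happy case (Ala → Gly → Leu → Val, with Val pointing back to Gly). The pointers first meet at Val; after `slow` is reset to Ala, both pointers advance together and meet at Gly, the start of the cycle, and walking the cycle from Gly returns ['Gly', 'Leu', 'Val']. Also check that both edge cases (None and a single node) return [].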
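A minimal way to record the watchlist is to log `slow` and `fast` at every step. The sketch below repeats the Implement section's logic with an added `trace` parameter (a hypothetical addition for illustration, not part of the original signature) and runs it on the happy case and both edge cases.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def cycle_length_traced(protein, trace):
    """Same steps as cycle_length, but appends (phase, slow value, fast value) to `trace`."""
    if not protein:
        return []

    slow = fast = protein
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        # fast may have run off the end of an acyclic list, so guard before reading .value
        trace.append(('detect', slow.value, fast.value if fast else None))
        if slow is fast:
            break
    else:
        return []  # no cycle

    slow = protein
    while slow is not fast:
        slow = slow.next
        fast = fast.next
        trace.append(('find start', slow.value, fast.value))

    cycle_nodes = [slow.value]
    node = slow.next
    while node is not slow:
        cycle_nodes.append(node.value)
        node = node.next
    return cycle_nodes

gly = Node('Gly', Node('Leu', Node('Val')))
gly.next.next.next = gly  # Val -> Gly
trace = []
print(cycle_length_traced(Node('Ala', gly), trace))  # ['Gly', 'Leu', 'Val']
for step in trace:
    print(step)
# ('detect', 'Gly', 'Leu')
# ('detect', 'Leu', 'Gly')
# ('detect', 'Val', 'Val')
# ('find start', 'Gly', 'Gly')
print(cycle_length_traced(None, []))         # []
print(cycle_length_traced(Node('Ala'), []))  # []
```

The trace makes the two phases visible: detection stops at Val, and a single synchronized step from Ala and Val lands both pointers on Gly.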

6: E-valuate

Evaluate the performance of your algorithm and state any strong/weak or future potential work.

Assume N represents the number of nodes in the linked list.

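  • Time Complexity: O(N). The slow pointer takes at most N steps before the pointers meet (and the fast pointer at most 2N), finding the start of the cycle takes at most N more steps per pointer, and collecting the cycle takes at most N steps.
  • Space Complexity: O(1) auxiliary space for detecting the cycle and finding its start; the returned list holds K values, where K is the number of nodes in the cycle, so it takes O(K) space.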
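The linear bound can be sanity-checked by counting pointer moves. In the sketch below, `make_list` and `count_moves` are hypothetical helpers: `make_list(tail_len, cycle_len)` builds `tail_len` nodes leading into a cycle of `cycle_len` nodes, and `count_moves` re-runs the three phases while counting every pointer step. The 5N ceiling it asserts is our own arithmetic, not a figure from the original: detection costs at most 3N moves (slow takes at most N steps, fast two per slow step), finding the start costs 2a where a is the tail length, and collecting costs the cycle length, and 2a plus the cycle length is at most 2N.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def make_list(tail_len, cycle_len):
    """Hypothetical helper: tail_len nodes leading into a cycle of cycle_len nodes."""
    nodes = [Node(i) for i in range(tail_len + cycle_len)]
    for current, nxt in zip(nodes, nodes[1:]):
        current.next = nxt
    if cycle_len > 0:
        nodes[-1].next = nodes[tail_len]  # close the loop at the first cycle node
    return nodes[0] if nodes else None

def count_moves(head):
    """Count every pointer move made by the three phases of the algorithm."""
    moves = 0
    slow = fast = head
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
        moves += 3  # one move for slow, two for fast
        if slow is fast:
            break
    else:
        return moves  # no cycle: only the detection phase ran
    slow = head
    while slow is not fast:
        slow, fast = slow.next, fast.next
        moves += 2
    node = slow.next
    moves += 1
    while node is not slow:
        node = node.next
        moves += 1
    return moves

# Every tail/cycle combination up to 29 + 29 nodes stays within 5N moves
for tail_len in range(30):
    for cycle_len in range(30):
        n = tail_len + cycle_len
        assert count_moves(make_list(tail_len, cycle_len)) <= 5 * n
```

If the script finishes without an `AssertionError`, the move count stayed within a constant multiple of N for every shape tried, which is consistent with the O(N) time claim.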