Advanced Data Structures - CameronAuler/python-devops GitHub Wiki

Table of Contents

Collections Module

The collections module provides optimized alternatives to Python’s built-in data structures.

namedtuple (Immutable Lightweight Objects)

namedtuple is a subclass of tuple with named fields and is useful when working with structured data without creating a class. It is mainly used for replacing lightweight classes for read-only data.

from collections import namedtuple

# Define a namedtuple called 'Point' with fields 'x' and 'y'
Point = namedtuple("Point", ["x", "y"])

p = Point(3, 5)  # Creating an instance
print(p.x, p.y)  # Accessing values by name
# Output:
3 5

deque (Fast Append and Pop)

deque (double-ended queue) allows fast appends and pops from both ends (O(1) time complexity) and is more efficient than lists for queue-like behavior. It is mainly used for implementing queues, stacks, and sliding window algorithms efficiently.

NOTE: O(1) time complexity refers to the process speed of methods used at both ends of a data structure. Lists only offer fast appends and pops at the end since poppoing or appending at the beginning shifts the position of every other element.

from collections import deque

dq = deque([1, 2, 3])
dq.append(4)     # Add to the right
dq.appendleft(0) # Add to the left
dq.pop()         # Remove from the right
dq.popleft()     # Remove from the left

print(dq)
# Output:
deque([1, 2])

Counter (Counting Element in a Collection)

Counter is a dictionary subclass used for counting hashable objects. It is mainly used for word frequency analysis, histogram generation, and duplicate detection.

from collections import Counter

text = "banana"
counter = Counter(text)  # Counts occurrences of each letter
print(counter)
print(counter["a"])  # Access count of 'a'
# Output:
Counter({'a': 3, 'n': 2, 'b': 1})
3

OrderedDict (Dictionary That Remembers Order)

In Python 3.7+, regular dictionaries maintain order, but OrderedDict is still useful for compatibility with older versions and extra ordering features. It is mainly used for maintaining ordered key-value pairs where order matters.

from collections import OrderedDict

od = OrderedDict()
od["a"] = 1
od["b"] = 2
od["c"] = 3

print(od)  # Maintains insertion order
# Output:
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

defaultdict (Dictionary with Default Values)

defaultdict returns a default value when accessing a non-existent key instead of raising a KeyError. It is mainly used for avoiding manual key existence checks in counting and grouping operations.

from collections import defaultdict

dd = defaultdict(int)  # Default value of int() is 0
dd["a"] += 1  # No KeyError!

print(dd["a"])  # 1
print(dd["b"])  # 0 (default value)
# Output:
1
0

Heap Queue (heapq Module)

heapq provides an efficient min-heap (priority queue) implementation. The smallest element is always at index 0.

Pushing and Popping from a Heap

Mainly used for priority queues, scheduling tasks, and graph algorithms (e.g., Dijkstra's Algorithm).

NOTE: A heap is a specialized tree-based data structure that satisfies the heap property. In a min-heap, for any given node, the value of the node is less than or equal to the values of its children, ensuring the smallest element is always at the root.

import heapq

nums = [3, 1, 4, 1, 5]
heapq.heapify(nums)  # Convert list into a heap

heapq.heappush(nums, 2)  # Push new element
smallest = heapq.heappop(nums)  # Remove smallest element

print(smallest)  # 1
print(nums)  # Remaining heap structure
# Output:
1
[1, 2, 4, 3, 5]

Retrieving the nth Smallest or Largest Elements

Mainly used for finding top n elements efficiently.

import heapq

nums = [3, 1, 4, 1, 5, 9, 2, 6]
print(heapq.nsmallest(3, nums))  # Get 3 smallest elements
print(heapq.nlargest(3, nums))  # Get 3 largest elements
# Output:
[1, 1, 2]
[9, 6, 5]

Binary Search (bisect Module)

The bisect module is used for efficiently searching and inserting in sorted lists (O(log n) complexity).

Using bisect_left & bisect_right

bisect_left(arr, x) finds the leftmost index where x can be inserted, while bisect_right(arr, x) finds the rightmost index where x can be inserted. They are mainly used for efficiently finding insertion points in sorted data.

import bisect

nums = [1, 3, 3, 7]
index1 = bisect.bisect_left(nums, 3)  # Finds first position of 3
index2 = bisect.bisect_right(nums, 3)  # Finds last position of 3

print(index1, index2)
# Output:
1 3

Inserting While Maintaining Order

insort_left(arr, x) inserts x at the correct position while keeping the list sorted. It is mainly used for maintaining sorted lists while inserting dynamically.

import bisect

nums = [1, 3, 7]
bisect.insort_left(nums, 5)  # Insert 5 while maintaining order
print(nums)
# Output:
[1, 3, 5, 7]