Python Notes - LogeshVel/learning_resources GitHub Wiki

Enum

Enums are a safe way of defining a fixed set of constant values.

In Python, an enum (short for "enumeration") is a class that defines a set of symbolic names bound to unique, constant values. Enums are used to represent fixed sets of related constants in a readable and maintainable way.

Enums are defined using the Enum class from the enum module, introduced in Python 3.4.

Why Use Enums?

  • Readability: Replace magic numbers or strings with meaningful names.
  • Immutability: Enum members are constant and cannot be changed.
  • Type Safety: Prevents invalid values from being used.
  • Namespace Control: Groups related constants together.

Syntax

To create an enum, subclass the Enum class:

from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

Key Features

  1. Accessing Enum Members:

    Enum members can be accessed by name or value:

    print(Color.RED)       # Output: Color.RED
    print(Color.RED.name)  # Output: RED
    print(Color.RED.value) # Output: 1
  2. Iteration:

    You can iterate through all members of an enum:

    for color in Color:
        print(color)
    # Output:
    # Color.RED
    # Color.GREEN
    # Color.BLUE
  3. Comparisons:

    Enum members are unique and can be compared:

    if Color.RED == Color.RED:  # True
        print("Same color")
    if Color.RED != Color.GREEN:  # True
        print("Different colors")
  4. Restricting Arbitrary Values:

    Enums prevent assigning invalid values:

    try:
        invalid_color = Color(4)  # Raises ValueError
    except ValueError as e:
        print(e)  # Output: 4 is not a valid Color
  5. Some more

    red = Color.RED
    green = Color(2)
    
    print(f'{red=}')        # red=<Color.RED: 1>
    print(f'{red.name=}')   # red.name='RED'
    print(f'{red.value=}')  # red.value=1
    
    print(f'{green=}')       # green=<Color.GREEN: 2>
    print(f'{green.name=}')  # green.name='GREEN'
    print(f'{green.value=}') # green.value=2

Example: Days of the Week

from enum import Enum

class Weekday(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7

# Accessing members
print(Weekday.MONDAY)       # Output: Weekday.MONDAY
print(Weekday.MONDAY.value) # Output: 1

# Iteration
for day in Weekday:
    print(day)

Advanced Features

  1. Auto Value Generation:

    Use auto() to automatically assign values:

    from enum import Enum, auto
    
    class Status(Enum):
        NEW = auto()
        IN_PROGRESS = auto()
        COMPLETED = auto()
    
    print(Status.NEW.value)  # Output: 1
    print(Status.IN_PROGRESS.value)  # Output: 2
  2. Custom Methods:

    Enums can have custom methods:

    class Shape(Enum):
        CIRCLE = 1
        SQUARE = 2
    
        def describe(self):
            return f"This shape is a {self.name.lower()}."
    
    print(Shape.CIRCLE.describe())  # Output: This shape is a circle.
  3. Multiple Enums:

    Different enums are independent:

    class Color(Enum):
        RED = 1
    
    class Size(Enum):
        SMALL = 1
    
    print(Color.RED == Size.SMALL)  # Output: False
  4. Iterating by Value:

    Convert the enum to a list for iteration or value-based access:

    colors = list(Color)
    print(colors)  # Output: [<Color.RED: 1>, <Color.GREEN: 2>, <Color.BLUE: 3>]

Practical Use Cases

  1. State Machines:

    class TrafficLight(Enum):
        RED = 1
        YELLOW = 2
        GREEN = 3
  2. Configuration Constants:

    class LogLevel(Enum):
        DEBUG = 1
        INFO = 2
        WARNING = 3
        ERROR = 4
  3. API Response Status:

    class ResponseStatus(Enum):
        SUCCESS = 200
        NOT_FOUND = 404
        SERVER_ERROR = 500

Limitations

  • Enum members are immutable once defined.
  • Enums cannot have duplicate names; duplicate values are allowed by default (creating aliases) unless forbidden with the @unique decorator.

Enums make code cleaner and less error-prone when dealing with fixed sets of values.

In Python, enums can have duplicate values, but duplicate names are not allowed. However, we can enforce unique values using the @unique decorator from the enum module.


Example 1: Duplicate Names Are Not Allowed

If you define two members with the same name in an enum, Python will raise a TypeError.

from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    RED = 3  # Duplicate name

# This will raise an error:
# TypeError: 'RED' already defined as 1  (exact message varies by Python version)

Explanation:

  • Enum member names (like RED) must be unique. The enum cannot have two members with the same name, as this would cause ambiguity.

Example 2: Duplicate Values Are Allowed

By default, enums allow different members to share the same value. When this happens, only the first member with that value is considered the "canonical" name, and others become aliases.

from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 1  # Duplicate value, alias for RED

# Accessing members
print(Color.RED)    # Output: Color.RED
print(Color.BLUE)   # Output: Color.RED (alias)
print(Color.BLUE is Color.RED)  # Output: True (aliases point to the same member)

# Iterating members
for color in Color:
    print(color)
# Output:
# Color.RED
# Color.GREEN

Explanation:

  • Color.BLUE is an alias for Color.RED because they share the same value (1).
  • When iterating over Color, only the first occurrence (Color.RED) is included.

Enforcing Unique Values with @unique

To prevent duplicate values in an enum, use the @unique decorator from the enum module. This decorator raises a ValueError if duplicate values are found.

from enum import Enum, unique

@unique
class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 1  # Duplicate value is not allowed

# This will raise an error:
# ValueError: duplicate values found in <enum 'Color'>: BLUE -> RED

Explanation:

  • The @unique decorator ensures that no two members of the enum have the same value.
  • If duplicate values are present, a ValueError is raised at runtime.

When to Allow Duplicate Values

Allowing duplicate values (aliases) is useful when you want multiple names for the same constant for backward compatibility or clarity.

Example with Aliases:

from enum import Enum

class HttpStatus(Enum):
    OK = 200
    SUCCESS = 200  # Alias for OK
    NOT_FOUND = 404

print(HttpStatus.OK)       # Output: HttpStatus.OK
print(HttpStatus.SUCCESS)  # Output: HttpStatus.OK
print(HttpStatus.SUCCESS is HttpStatus.OK)  # Output: True

Here:

  • SUCCESS is an alias for OK, and both point to the value 200.
  • This can be helpful when maintaining compatibility with older code that uses SUCCESS but allows the use of OK in new code.

Summary

  • Duplicate Names: Not allowed; causes a TypeError.
  • Duplicate Values: Allowed by default; creates aliases.
  • Enforcing Unique Values: Use @unique to ensure no duplicate values exist in the enum.

The auto() function in Python's enum module is used to automatically assign values to members of an enumeration. While auto() by default assigns sequential integers starting from 1, you can customize its behavior by defining a custom Enum class with a modified _generate_next_value_ method.


Default Behavior of auto()

When used without customization, auto() generates values as follows:

  • The first member gets the value 1.
  • Subsequent members are assigned incrementing integer values (2, 3, and so on).

Example:

from enum import Enum, auto

class Color(Enum):
    RED = auto()
    GREEN = auto()
    BLUE = auto()

print(Color.RED.value)    # Output: 1
print(Color.GREEN.value)  # Output: 2
print(Color.BLUE.value)   # Output: 3

How It Works Internally

When auto() is used, the Enum class calls the _generate_next_value_ method to compute the value for each member. The default implementation generates an incrementing integer starting from 1.

Conceptually, the default _generate_next_value_ behaves like this simplified version:

def _generate_next_value_(name, start, count, last_values):
    return count + start
  • name: Name of the enum member (e.g., RED).
  • start: The starting value (default is 1).
  • count: The number of members defined before this one (0 for the first, 1 for the second, etc.).
  • last_values: The list of previously assigned values.

For the first member, count = 0 and start = 1, so 1 + 0 = 1.


Customizing auto() Behavior

You can override _generate_next_value_ in your custom enum class to generate values other than sequential integers. This allows more flexibility, such as generating strings, tuples, or custom sequences.

Example 1: Custom Sequential Values

Generate multiples of 10:

from enum import Enum, auto

class CustomEnum(Enum):
    def _generate_next_value_(name, start, count, last_values):
        return (count + 1) * 10

class Scale(CustomEnum):
    LOW = auto()
    MEDIUM = auto()
    HIGH = auto()

print(Scale.LOW.value)     # Output: 10
print(Scale.MEDIUM.value)  # Output: 20
print(Scale.HIGH.value)    # Output: 30

Example 2: Use Enum Member Names as Values

Make auto() assign the enum member name as its value (as a string):

from enum import Enum, auto

class StringEnum(Enum):
    def _generate_next_value_(name, start, count, last_values):
        return name

class Animal(StringEnum):
    DOG = auto()
    CAT = auto()
    BIRD = auto()

print(Animal.DOG.value)   # Output: DOG
print(Animal.CAT.value)   # Output: CAT
print(Animal.BIRD.value)  # Output: BIRD

Example 3: Generate Tuples

Assign tuples with a counter and the name:

from enum import Enum, auto

class TupleEnum(Enum):
    def _generate_next_value_(name, start, count, last_values):
        return (count + 1, name)

class Status(TupleEnum):
    NEW = auto()
    IN_PROGRESS = auto()
    COMPLETED = auto()

print(Status.NEW.value)          # Output: (1, 'NEW')
print(Status.IN_PROGRESS.value)  # Output: (2, 'IN_PROGRESS')
print(Status.COMPLETED.value)    # Output: (3, 'COMPLETED')

Summary of auto():

  1. Default Behavior: Sequential integers starting from 1.
  2. Custom Behavior: You can override _generate_next_value_ to define custom value generation logic.
  3. Common Use Cases:
    • Simple sequential numbering.
    • Using member names as values.
    • Generating custom data structures (e.g., strings, tuples, or calculated values).

To compare numbers

isclose()

With the absolute tolerance we set a fixed ± value and check whether the two numbers are within that distance of each other.

abs_tol is value based.

With the relative tolerance we specify how much tolerance is allowed as a fraction of the larger value; beyond that, the result is False.

rel_tol is percentage based.
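A short sketch with math.isclose (the numbers are illustrative):

import math

# abs_tol: the absolute difference must be <= 0.2
print(math.isclose(10.0, 10.1, abs_tol=0.2))  # True  (difference 0.1 <= 0.2)
print(math.isclose(10.0, 10.5, abs_tol=0.2))  # False (difference 0.5 > 0.2)

# rel_tol: the difference must be within 5% of the larger value
print(math.isclose(100.0, 104.0, rel_tol=0.05))  # True  (4% difference)
print(math.isclose(100.0, 110.0, rel_tol=0.05))  # False (10% difference)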

Don't loop and modify


Looping over a list is index based, so if the list is modified while looping, the indexes shift and items can be skipped or processed twice.
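A minimal sketch of the pitfall, and a safe alternative that iterates over a copy:

numbers = [1, 2, 2, 3]

# Buggy: removing while iterating shifts the indexes, so the second 2 is skipped
for n in numbers:
    if n % 2 == 0:
        numbers.remove(n)
print(numbers)  # [1, 2, 3] -- one even value survived

# Safe: iterate over a copy (or build a new list with a comprehension)
numbers = [1, 2, 2, 3]
for n in numbers[:]:
    if n % 2 == 0:
        numbers.remove(n)
print(numbers)  # [1, 3]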

Private or Name mangling

Resource link

Python Class - Doc

A variable whose name begins with a double underscore (__) is accessible by that name only inside the class that defines it. Neither a subclass nor an instance can access it by that name. The only other way to access it is via the mangled name:

_ClassName__varname

The reasons for using the double underscore:

  • To make the variable accessible only inside the class itself.
  • To avoid name clashes in multiple inheritance.

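A small sketch of name mangling (the Account class is illustrative):

class Account:
    def __init__(self):
        self.__balance = 100  # stored on the instance as _Account__balance

    def show(self):
        return self.__balance  # this name works only inside the class body

acc = Account()
print(acc.show())               # 100
# print(acc.__balance)          # AttributeError: no attribute '__balance'
print(acc._Account__balance)    # 100 -- accessed via the mangled name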

Single and Double underscore

Single

  • To name it as protected.
  • Accessible inside the class, subclasses, and via objects and the class itself.
  • Just a hint to developers that it's for internal use; not enforced.
  • import * won't import variables, functions, or classes whose names start with _.

Double

  • Double underscore is for private.
  • Accessible directly only inside the class.
  • Name mangling prevents clashes.
  • Can be accessed from outside using the mangled name.

import *

When you use the import * syntax, Python looks for an __all__ list in the module. This list specifies the names of objects that should be imported when the * is used. If __all__ is not defined, Python will import everything that doesn't start with an underscore (_).

Here's a breakdown:

Without __all__ defined:

Python will import all variables, functions, or classes that do not start with an underscore. It will skip names that start with a single underscore (e.g., _internal_var, _helper_function, etc.).

With __all__ defined:

Python will import only the names explicitly listed in the __all__ list, regardless of whether they start with an underscore or not.
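A sketch, assuming a hypothetical module mymodule.py:

# mymodule.py
__all__ = ["public_func", "_special"]

def public_func():
    return "public"

def _helper():      # never imported by *: starts with _ and is not in __all__
    return "helper"

def _special():     # imported by * because it is listed in __all__
    return "special"

# client code:
# from mymodule import *   # brings in public_func and _special only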

eval()

The eval() function in Python evaluates a string as a Python expression and returns the result of that evaluation.

(function) def eval(
    source: str | ReadableBuffer | CodeType,
    globals: dict[str, Any] | None = None,
    locals: Mapping[str, object] | None = None,
    /
) -> Any

Evaluate the given source in the context of globals and locals.

  • The source may be a string representing a Python expression or a code object as returned by compile().
  • The globals must be a dictionary and locals can be any mapping, defaulting to the current globals and locals.
  • If only globals is given, locals defaults to it.
x = 10
result = eval("x + 5")  # result will be 15

Why Do We Need eval()?

eval() can be useful in situations where:

  • Dynamic Evaluation: You need to execute Python code that's constructed as a string at runtime. Example: In interactive interpreters or applications where users can provide input that needs to be evaluated.
  • Flexible Logic: It allows dynamic expression parsing and computation without hardcoding all possible cases. Example: Custom formula calculations or user-defined mathematical expressions in applications.
  • Shortcuts for Prototyping: It can simplify certain dynamic code requirements during the prototyping phase.

Risks and Considerations

eval() can be extremely dangerous if not used carefully, especially with untrusted input, due to the following reasons:

  • Security Vulnerability: It can execute arbitrary code, leading to severe consequences like data leaks or system compromise.
eval("__import__('os').system('rm -rf /')")  # Example of malicious code.
  • Unintended Side Effects: It can alter program state, delete files, or perform unintended operations.
  • Performance Overhead: Parsing and executing strings dynamically is slower than native code execution.

Example of Safe eval() Usage

allowed_globals = {"__builtins__": None}
allowed_locals = {"x": 10, "y": 20}

expression = "x + y"
try:
    result = eval(expression, allowed_globals, allowed_locals)
    print(result)  # Output: 30
except Exception as e:
    print(f"Error: {e}")

Use Alternatives When Possible

Use ast.literal_eval() for evaluating strings containing only literals (numbers, strings, tuples, lists, etc.). It is safer than eval().

import ast
result = ast.literal_eval("[1, 2, 3]")  # Safely evaluates a list literal.

exec()

The exec() function in Python dynamically executes Python code. Unlike eval(), which evaluates a single expression, exec() can execute entire blocks of code, including statements, definitions, loops, and imports.

(function) def exec(
    source: str | ReadableBuffer | CodeType,
    globals: dict[str, Any] | None = None,
    locals: Mapping[str, object] | None = None,
    /,
    *,
    closure: tuple[CellType, ...] | None = None
) -> None

Execute the given source in the context of globals and locals.

  • The source may be a string representing one or more Python statements or a code object as returned by compile().
  • The globals must be a dictionary and locals can be any mapping, defaulting to the current globals and locals.
  • If only globals is given, locals defaults to it.
  • The closure must be a tuple of cellvars, and can only be used when source is a code object requiring exactly that many cellvars.
code = """
x = 10
y = 20
result = x + y
"""
exec(code)
print(result)  # Output: 30

Why Do We Need exec()?

exec() is useful in situations where:

  • Dynamic Code Execution: You need to run dynamically generated or loaded Python code. Example: Executing a Python script stored in a database or user-provided script.
  • Flexible Program Design: It enables execution of Python code defined outside the program at runtime. Example: Plugins or runtime scripting for applications.
  • Prototyping and Testing: Allows for quick testing of dynamically generated Python code.

Best Practices for Using exec()

  • Avoid Untrusted Input: Never pass unvalidated user input to exec().
  • Use a Restricted Environment: Limit access to sensitive resources using controlled globals and locals. Disable built-in functions by passing {"__builtins__": None}.

Limitations of exec()

  • Lack of Isolation: Code executed with exec() runs in the same interpreter, affecting or being affected by global and local states.

  • Complexity in Debugging: Tracing errors and bugs in dynamically executed code can be challenging.

  • Performance: exec() incurs runtime overhead and is generally slower than pre-compiled or static code.

  • No Return Value: Unlike eval(), which returns the result of the expression, exec() returns None. Any results must be extracted via the locals or globals dictionaries.

How to Overcome the Limitations

  • Restricted Execution Environment: Use restricted globals and locals to control variable and resource access.

  • Compile for Validation: Use compile() to pre-compile code for syntax validation before executing.

code = """
x = 10
y = 20
result = x + y
"""
compiled_code = compile(code, '<string>', 'exec')
exec(compiled_code)
print(result)  # Output: 30

Safe Example of exec() Usage

safe_globals = {"__builtins__": None}
safe_locals = {"x": 10, "y": 20}

code = """
result = x + y
"""
try:
    exec(code, safe_globals, safe_locals)
    print(safe_locals["result"])  # Output: 30
except Exception as e:
    print(f"Error: {e}")

When to Use exec() vs eval()

Use eval() when:

You need to evaluate a single Python expression and return its value. Example: result = eval("x + y", {"x": 10, "y": 20})

Use exec() when:

You need to execute more complex or multi-line Python code, such as loops, function definitions, or imports.

code = """
def greet():
    return "Hello, World!"
message = greet()
"""
exec(code)
print(message)  # Output: Hello, World!

Both functions should be used cautiously and only when necessary, with a strong emphasis on security and input validation.

Default Behavior of Equality Comparison (==) Without __eq__

When two instances of a class are compared using the equality operator (==) and the class does not implement the __eq__ method, Python falls back to the default behavior inherited from the object class. Here's what happens:

  • Identity Comparison:

By default, == checks whether the two instances are the same object in memory. This is equivalent to using the is operator. If the two instances have the same memory address (i.e., they are the same object), the comparison returns True. Otherwise, it returns False.

  • No Attribute-Based Comparison:

Python does not compare the attributes of the instances unless __eq__ is explicitly implemented.

  • Fallback to is:

Essentially, without a custom __eq__, a == b is the same as a is b.

class MyClass:
    pass

obj1 = MyClass()
obj2 = MyClass()

# Comparing two different instances
print(obj1 == obj2)  # Output: False (default is identity comparison)

# Assigning one reference to another
obj3 = obj1
print(obj1 == obj3)  # Output: True (same object in memory)

How to Customize Equality Behavior

To customize equality comparison, you can implement the __eq__ method in your class. This allows you to define specific rules for comparing instances based on their attributes or other logic.

class MyClass:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, MyClass):
            return self.value == other.value
        return False

obj1 = MyClass(10)
obj2 = MyClass(10)
obj3 = MyClass(20)

print(obj1 == obj2)  # Output: True (values are equal)
print(obj1 == obj3)  # Output: False (values are different)

Key Points to Remember

  • Default Behavior:

Without __eq__, == checks for object identity (whether they are the same object in memory). This is equivalent to a is b.

  • Customizing Comparison:

Implementing __eq__ allows you to define equality based on attributes or logic.

  • Implementation Best Practices:

Ensure __eq__ handles comparisons with objects of other types gracefully. Typically, you should implement __eq__ alongside __hash__ if instances of the class are to be used as dictionary keys or in sets. This ensures consistent behavior.

hash()

Dict storing and lookup

Lookups in Python's dict are incredibly fast because of its underlying hash table implementation. Here’s a step-by-step detailed explanation of how lookups work under the hood in a Python dictionary.


1. Hash Table Basics

A hash table is the data structure behind Python’s dict. It:

  • Maps keys to values using hashing.
  • Organizes data in an array-like structure where each index is a "bucket."
  • Uses a hash function to calculate an index based on the key.

Key Terms:

  • Hash Function: Computes a hash value for a key. Python uses the built-in hash() function for this.
  • Bucket: A slot in the hash table where key-value pairs are stored.
  • Collision: When two different keys produce the same hash value, requiring additional logic to distinguish them.

2. Anatomy of a Python Dictionary

  • A dict is backed by a dynamic array of fixed-size slots.

  • Each slot holds:

    1. The hash of the key.
    2. The key itself.
    3. The associated value.
  • This combination ensures that even in cases of hash collisions, Python can check for equality (__eq__) to find the right key.
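As a small illustration (a hedged sketch; the Point class is hypothetical): the stored hash narrows the search, and __eq__ confirms the key.

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __hash__(self):
        return hash((self.x, self.y))  # equal points share a hash

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

d = {Point(1, 2): "a"}
print(d[Point(1, 2)])  # "a" -- the hash locates the slot, __eq__ confirms the key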


3. The Lookup Process

Let’s break down how Python finds the value for a given key in a dict:

Step 1: Compute the Hash

  • Python calls the hash() function on the key to compute its hash value.
  • Example:
    key = "example"
    hash_value = hash(key)  # Example output: 123456789

Step 2: Map the Hash to a Bucket Index

  • The hash value is converted to a bucket index using the modulo operation:
    index = hash_value % len(hash_table)
  • This ensures the hash value fits within the size of the hash table.

Step 3: Probe the Bucket

  • Python looks at the computed index in the hash table.
  • If the bucket contains the hash, key, and value, and the key matches, the lookup succeeds.

Step 4: Handle Collisions

  • If there’s a collision (two keys hash to the same index), Python resolves it:
    • Python uses open addressing with probing:
      • It probes subsequent buckets (linear, quadratic, or another strategy) to find an empty or matching slot.
    • During probing, Python compares the hash and key (__eq__) to ensure it matches the desired key.

Step 5: Return the Value

  • Once the correct key is found, Python retrieves the associated value.

4. Example Walkthrough

# Create a dictionary
my_dict = {"name": "Alice", "age": 25}

# Lookup: my_dict["name"]
key = "name"
  1. Compute the Hash:

    hash("name")  # Example hash value: 452398476
  2. Map to a Bucket: Suppose the hash table size is 8:

    index = 452398476 % 8  # Example index: 4
  3. Probe the Bucket:

    • Python checks the bucket at index 4:
      • If it finds ("name", hash("name"), "Alice"), the lookup succeeds.
  4. Resolve Collisions (if necessary):

    • If there’s a collision at index 4:
      • Python probes nearby buckets (e.g., index 5, 6, etc.) until it finds the matching key or an empty slot.
  5. Return the Value:

    • Once the correct key is found, Python retrieves "Alice".

5. Handling Key Errors

If the key doesn’t exist:

  • Python probes the hash table as described but doesn’t find the key.
  • A KeyError is raised.
my_dict = {"name": "Alice"}
my_dict["age"]  # Raises KeyError

6. Hash Collisions and Probing

What is a Collision?

  • Two distinct keys may produce the same hash value.
  • Example:
    key1 = "abc"
    key2 = "xyz"
    hash(key1) == hash(key2)  # Possible, though rare

Collision Resolution in Python

  • Python resolves collisions using open addressing:
    • It searches for the next available bucket when a collision occurs.

7. Hash Table Resizing

When the dictionary grows too full, Python resizes the hash table to maintain efficiency.

Resizing Process:

  1. A new, larger hash table is allocated (typically double the size).
  2. All existing key-value pairs are rehashed and moved to the new table.
  3. This ensures lookups remain fast.

Load Factor:

  • Python resizes the hash table when the load factor (ratio of items to table size) exceeds a threshold (usually ~2/3 full).

8. Efficiency

  • Best Case:
    • Lookup, insertion, and deletion are O(1) on average.
  • Worst Case:
    • In the rare event of many collisions, lookup can degrade to O(n), but this is mitigated by good hash functions and resizing.

9. Advantages of Python's Dict Implementation

  1. Speed:
    • Lookups, insertions, and deletions are incredibly fast on average (O(1)).
  2. Dynamic Resizing:
    • Ensures performance is maintained even as the dictionary grows.
  3. Collision Handling:
    • Efficiently manages hash collisions with open addressing and probing.
  4. Memory Efficiency:
    • Combines hash values, keys, and values in a compact structure.

Summary of Lookup Workflow

  1. Compute the hash of the key using hash().
  2. Map the hash to a bucket index using hash % table_size.
  3. Probe the bucket for the matching key and value.
  4. Resolve collisions if needed.
  5. Return the value if found or raise a KeyError if not.

This design makes Python's dict one of the most efficient and versatile data structures for key-value storage.

Why Use hash % table_size to Map to a Bucket Index?

The % operator (modulo) is used in hash table implementations to ensure that a hash value is mapped to a valid index within the bounds of the hash table. This approach, known as modulo hashing, is both simple and effective for distributing keys across buckets. Let’s break it down in detail:


1. Why Do We Need to Map Hash Values to Buckets?

The hash() function in Python generates a large integer (possibly negative) based on the key. However:

  • The hash table (an array-like structure) has a finite size, typically much smaller than the range of possible hash values.
  • To place the key-value pair into the hash table, the large hash value must be reduced to fit within the table's size (e.g., an array of size n).

The challenge:

  • We need to map the potentially huge range of hash values to the limited range [0, table_size - 1].

2. How Modulo Operation Helps

The modulo operator % computes the remainder of a division. When applied to a hash value:

index = hash_value % table_size
  • This operation guarantees that the resulting index is always between 0 and table_size - 1, ensuring it fits within the array bounds.
  • For example:
    • If hash_value = 12345 and table_size = 8, then index = 12345 % 8 = 1.
    • If hash_value = -12345 and table_size = 8, then index = -12345 % 8 = 7 (Python's modulo always yields a non-negative result for a positive divisor).

3. Why Modulo Specifically?

Uniform Distribution

  • Modulo distributes hash values uniformly across the available buckets, minimizing clustering.
  • Even if the hash values are large or arbitrary, % table_size ensures a fair spread of keys.

Efficiency

  • The modulo operation is computationally inexpensive, making it ideal for real-time lookups.
  • It avoids more complex or slower methods of mapping hash values to indices.

Wrap-Around Behavior

  • Modulo inherently "wraps around" the range of possible indices. For example:
    • If table_size = 8, valid indices are [0, 1, 2, 3, 4, 5, 6, 7]. Any hash value is reduced into this range.

4. An Example Walkthrough

Let’s consider a hash table with 8 buckets (table_size = 8):

Keys and Their Hash Values

keys = ["apple", "banana", "cherry"]
hash_values = [hash("apple"), hash("banana"), hash("cherry")]

Calculating Bucket Indexes

bucket_indices = [hash(key) % 8 for key in keys]

Suppose the hash values are:

  • hash("apple") = 12345
  • hash("banana") = 67890
  • hash("cherry") = 13579

The modulo operation maps these to indices:

  • 12345 % 8 = 1 → "apple" is placed in bucket 1.
  • 67890 % 8 = 2 → "banana" is placed in bucket 2.
  • 13579 % 8 = 3 → "cherry" is placed in bucket 3.

5. Why Not Use Other Methods Instead of Modulo?

Direct Hash Value Using the raw hash value as the index would not work because:

  • Hash values can be arbitrarily large (or negative), so they would exceed the bounds of the table.
  • It would lead to excessive memory usage if the table had to accommodate all possible hash values.

Multiplication or Bitwise Operations While alternatives like multiplication or bitwise masking are also used in some hash table implementations (e.g., power-of-2 sizes), modulo is straightforward and versatile. However:

  • Bitwise AND (&) can replace % if the table size is a power of 2 (e.g., table_size = 8):
    index = hash_value & (table_size - 1)
    This is faster than modulo but less flexible because it requires a power-of-2 table size.

6. Limitations of Modulo Hashing

  1. Clustering

    • If table_size is poorly chosen (e.g., not a prime number), certain patterns in hash values can cause uneven bucket distribution (e.g., many keys end up in the same bucket).
    • To mitigate this, many hash table implementations choose a table size that is a prime number.
  2. Performance on Resize

    • When the table grows or shrinks, all hash values need to be re-mapped to new bucket indices.

7. A Deeper Dive into Python’s Implementation

Python’s dictionary implementation is slightly more sophisticated:

  • Instead of directly using hash % table_size, Python uses additional techniques (like a mask for power-of-2 table sizes) to improve performance and minimize clustering.
  • The hash table size is dynamically adjusted as the dictionary grows, maintaining a load factor below a threshold (e.g., ~2/3 full).

8. Summary

Using hash % table_size ensures:

  1. Index Validity: The resulting index is always within the bounds of the hash table.
  2. Uniformity: It spreads keys evenly across buckets, minimizing collisions.
  3. Simplicity: The modulo operation is efficient and easy to implement.

Without %, the hash value would either:

  • Be too large to fit in the hash table.
  • Cause indexing errors or memory wastage.

By using % table_size, Python ensures that its dictionaries remain fast, efficient, and reliable for key lookups.

When storing a key-value pair in a Python dictionary, the same hashing and modulo calculation used for lookups is performed to determine the index (bucket) where the pair will be stored. Let’s break this process down step by step:


1. Storing a Key-Value Pair in a Python Dictionary

Suppose you want to store a key-value pair like this:

my_dict["name"] = "Alice"

Step-by-Step Process

  1. Compute the Hash of the Key

    • Python computes the hash of the key using the hash() function.
    • For example:
      key = "name"
      hash_value = hash(key)  # Example output: 123456789
  2. Calculate the Bucket Index

    • Using the modulo operation, the hash value is mapped to a bucket index in the hash table:
      index = hash_value % table_size
    • If the hash table has 8 buckets (table_size = 8), the calculation might look like this:
      index = 123456789 % 8  # Example output: 5
  3. Store the Key-Value Pair in the Bucket

    • Python checks the bucket at the calculated index (index = 5).
    • If the bucket is empty:
      • The key-value pair is stored directly in the bucket.
    • If the bucket is already occupied (due to a hash collision):
      • Python uses probing (e.g., linear probing or quadratic probing) to find the next available slot.
      • The key and its hash are stored along with the value to allow proper collision resolution later.

2. Example Walkthrough

Let’s store multiple key-value pairs in a dictionary:

my_dict = {}
my_dict["name"] = "Alice"
my_dict["age"] = 25
my_dict["city"] = "New York"

Key-Value Pair: "name": "Alice"

  1. Compute the Hash:

    hash("name") = 123456789
  2. Calculate the Index:

    index = 123456789 % 8  # Example table_size = 8
    index = 5
  3. Store in Bucket:

    • The pair ("name", hash("name"), "Alice") is stored in bucket 5.

Key-Value Pair: "age": 25

  1. Compute the Hash:

    hash("age") = 987654321
  2. Calculate the Index:

    index = 987654321 % 8
    index = 1
  3. Store in Bucket:

    • The pair ("age", hash("age"), 25) is stored in bucket 1.

Key-Value Pair: "city": "New York"

  1. Compute the Hash:

    hash("city") = 567890123
  2. Calculate the Index:

    index = 567890123 % 8
    index = 3
  3. Store in Bucket:

    • The pair ("city", hash("city"), "New York") is stored in bucket 7.

3. Handling Collisions While Storing

When two keys produce the same hash value or map to the same index, Python uses collision resolution techniques to ensure that both key-value pairs can coexist.

Collision Example

If we add a new key "job" with hash("job") = 123456789, it would map to the same index as "name" (index = 5).

Steps to Handle the Collision:

  1. Python detects that bucket 5 is already occupied.
  2. It compares the new key's hash (hash("job")) with the existing hash in the bucket.
  3. If the hashes differ, Python probes (checks nearby buckets) to find the next available slot.
  4. The new pair ("job", hash("job"), "Engineer") is stored in a different bucket.

4. Why Hash Calculation Happens Both During Storage and Lookup

The same hashing and modulo operation are used for both storage and lookup to ensure consistency:

  • During Storage: The hash and modulo calculation determine the bucket where the key-value pair is stored.
  • During Lookup: The hash and modulo calculation are repeated to find the same bucket and locate the value.

This consistency is key to the efficiency of hash tables.


5. Dynamic Resizing of the Hash Table

When the dictionary becomes too full:

  1. Python allocates a larger hash table (usually doubling its size).
  2. All existing key-value pairs are rehashed and redistributed into the new table.

This resizing ensures that:

  • Collisions remain rare.
  • Performance remains efficient (O(1) average time complexity for operations).

6. Summary

  • During Storage:

    1. Compute the hash of the key.
    2. Map the hash to a bucket index using % table_size.
    3. Store the key, its hash, and the value in the bucket.
    4. Resolve collisions if needed.
  • During Lookup:

    • The same hash and index calculations are repeated to locate the correct bucket.

This approach ensures fast and reliable storage and retrieval of key-value pairs in Python dictionaries.

When two keys hash to the same index due to a hash collision (i.e., different hash values but the same bucket index from hash % table_size), Python uses a strategy called open addressing with probing to resolve the collision. Let’s break down how this works during storage and retrieval:


1. Storing Key-Value Pairs with Collisions

Example Scenario:

  • Hash table size (table_size) = 8.
  • Key 1:
    • hash(1) = 12347
    • index = hash(1) % 8 = 3
    • Stored in bucket 3.
  • Key 2:
    • hash(2) = 23459
    • index = hash(2) % 8 = 3 (collision with key 1).

How Python Resolves the Collision During Storage:

  1. Detect the Collision:

    • When trying to store key 2, Python finds bucket 3 already occupied (by key 1).
  2. Probe for the Next Free Bucket:

    • Python probes the next bucket (index 4, then 5, etc.) until it finds an empty bucket.
    • The probing strategy is typically linear probing (incrementing the index) or quadratic probing (skipping more buckets as the probing progresses).
    • Let’s assume Python finds bucket 5 is free.
  3. Store the Key-Value Pair in the Free Bucket:

    • Python stores ("key 2", hash(2), value_2) in bucket 5.

2. Retrieving Key-Value Pairs After a Collision

Now, when retrieving key 2, Python must locate it despite the collision. The process involves probing again to find the correct key.


How Python Resolves Collisions During Retrieval:

  1. Start at the Calculated Index:

    • Python computes the hash and initial index for key 2:
      index = hash(2) % 8 = 3
    • Python checks bucket 3.
  2. Check the Key in the Bucket:

    • Bucket 3 contains ("key 1", hash(1), value_1).
    • Python compares key 2 with the key stored in bucket 3:
      • The hash (hash(2)) does not match the stored hash (hash(1)), or
      • The keys are not equal (key 2 != key 1).
  3. Probe the Next Bucket:

    • Python continues probing to the next bucket (index 4, index 5, etc.).
    • At bucket 5, Python finds:
      • The hash matches (hash(2)) and the keys are equal (key 2 == key 2).
  4. Return the Value:

    • Python retrieves the value associated with key 2.

3. Why This Works:

  • During both storage and retrieval, Python uses the same probing sequence:

    • The initial index is always calculated as hash % table_size.
    • If a collision occurs, Python probes the table in a predictable order.
  • This ensures consistency:

    • The same key always maps to the same index and follows the same probing path.

4. Handling Multiple Collisions

If there are multiple collisions (e.g., keys 1, 2, and 3 all hash to index 3):

  • Python probes further to find empty buckets for storage.
  • During retrieval, it probes until the correct key is found or an empty bucket is encountered (indicating the key does not exist).

5. Efficiency of Probing

  • Average Case: Probing is efficient when the hash table is sparsely populated. Most lookups resolve in a few probes (O(1) on average).
  • Worst Case: If the table is densely populated, multiple collisions can lead to longer probing chains (O(n) in the worst case). To mitigate this:
    • Python resizes the table when it gets too full, ensuring a low load factor (typically ≤ 2/3).

6. Summary

  • When a collision occurs during storage:
    • Python detects it and probes for the next available bucket to store the new key-value pair.
  • During retrieval:
    • Python starts at the calculated index and probes in the same sequence until the correct key is found or an empty bucket is reached.
  • This method ensures that all keys are retrievable even when collisions occur, while maintaining consistent performance through resizing and efficient probing strategies.

While Python dictionaries are designed to achieve O(1) average time complexity for lookups, their actual performance depends on how densely the hash table is populated and the presence of collisions. This means that:

  • Lookups are average-case O(1) when collisions are rare or minimal.
  • Lookups can degrade to O(1 + k), where k is the number of probes required to find the desired key in the case of collisions.

Let’s explain this in more detail:


1. How Probing Affects Lookup Time

When a collision occurs during key lookup:

  1. The hash value of the key is used to calculate the initial index with hash % table_size.
  2. If the key is not in the bucket at that index, Python probes (checks other buckets) to locate the key.
  3. The number of probes required is determined by the collision resolution strategy (e.g., linear probing, quadratic probing).

Key Insight: The lookup time becomes O(1 + k), where k is the number of probes required to find the key. The value of k depends on:

  • The number of collisions.
  • The load factor of the hash table.

2. Factors Affecting k:

a. Load Factor

The load factor is the ratio of the number of entries in the dictionary to the total number of buckets in the hash table:

    load factor = number of entries / table size

  • Low Load Factor: Fewer collisions, k is small, performance is close to O(1).
  • High Load Factor: More collisions, k increases, performance degrades.

Python resizes the hash table when the load factor exceeds a certain threshold (typically ~2/3), keeping k small.


b. Quality of the Hash Function

  • A good hash function ensures an even distribution of keys across the table, reducing clustering and minimizing k.
  • Poor hash functions may produce many collisions, increasing k.

c. Probing Strategy

  • Linear Probing: k grows linearly with the number of collisions.
    • For example: If index 3 is full, check 4, then 5, etc.
  • Quadratic Probing: k grows quadratically, spreading probes out to reduce clustering.
    • For example: If index 3 is full, check 3+1², 3+2², 3+3², and so on.
  • Double Hashing: Uses a second hash function to determine the probing step size, reducing k.

3. Best Case vs. Worst Case Lookup Time

Best Case (O(1)):

  • The key maps directly to a bucket with no collisions.
  • The value is retrieved immediately.

Worst Case (O(1 + n)):

  • Every key hashes to the same index, causing n collisions.
  • Probing traverses all buckets in the table.
  • Python mitigates this by resizing the hash table when the load factor becomes too high.

4. Python's Optimizations to Keep k Small

Python dictionaries are highly optimized to keep k low:

  1. Dynamic Resizing:

    • When the table gets too full, it is resized (typically doubled in size), and all keys are rehashed and redistributed.
    • This keeps the load factor low, reducing collisions.
  2. Sparse Tables:

    • Python deliberately leaves the table sparsely populated (load factor ≤ 2/3) to minimize collisions and k.
  3. Hash Function Design:

    • Python's hash() function ensures good key distribution to reduce clustering.
  4. Efficient Probing:

    • Python uses a combination of techniques (including linear and secondary probing) to efficiently resolve collisions.

5. Practical Implications

While theoretical performance can degrade to O(n) in the worst case (e.g., extremely poor hash function or high load factor), Python's dictionary implementation ensures that:

  • k remains very small in most practical scenarios.
  • Lookup performance stays close to O(1) for well-designed use cases.

6. Summary

Yes, dictionary lookups in Python can be O(1 + k), where k is the number of probes required in case of collisions. However:

  • Python's design keeps k small by maintaining a low load factor and using a good hash function.
  • In practice, Python dictionaries are extremely efficient and perform close to O(1) in almost all cases.

“Banker's Rounding” or “round half to even”

The outputs of the code:

print(round(7.5)) # 8
print(round(6.5)) # 6

Explanation: Python's built-in round() function rounds numbers to the nearest integer. When the fractional part of a number is exactly 0.5, Python follows a specific rule to break the tie: it rounds to the nearest even integer. This rule is called "round half to even" or "bankers' rounding" and is used to reduce rounding bias in large datasets.

  1. round(7.5):

    • The fractional part is 0.5.
    • The nearest integers are 7 and 8.
    • Since 8 is even, Python rounds 7.5 to 8.
  2. round(6.5):

    • The fractional part is 0.5.
    • The nearest integers are 6 and 7.
    • Since 6 is even, Python rounds 6.5 to 6.

This behavior ensures that rounding errors are distributed evenly rather than consistently rounding up or down.


Dataclasses

from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    height: float
    weight: float
    # vip: bool = False # default can be provided like this as well for the immutable data types
    vip: bool = field(default=False)
    friends_list: list[str] = field(default_factory=list) # default value for the mutable data types must be provided like this

p1 = Person("Logesh", 24, 6, 70, True, ['all', 'in', 'all'])

print(p1)

__post_init__

Doc

This allows initializing field values that depend on one or more other fields.

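For example, a minimal sketch along the lines of the dataclasses docs:

from dataclasses import dataclass, field

@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)  # computed, so not an __init__ parameter

    def __post_init__(self):
        self.c = self.a + self.b  # depends on the other fields

print(C(1.0, 2.0))  # C(a=1.0, b=2.0, c=3.0)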

Note: __post_init__ runs only once, immediately after __init__.

InitVar

Init-only variables

If a field is an InitVar, it is considered a pseudo-field called an init-only field. As it is not a true field, it is not returned by the module-level fields() function. Init-only fields are added as parameters to the generated __init__() method, and are passed to the optional __post_init__() method. They are not otherwise used by dataclasses.


InitVar is meant to be used only for initialization, not as an instance attribute. In the sketch below, is_rare is an InitVar with a default value of 1; if we create an instance passing 1000 for is_rare, reading is_rare from the instance still gives the default value we assigned, not the value we passed.
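A sketch reconstructing that behavior (the Card class and the name field are illustrative):

from dataclasses import dataclass, InitVar

@dataclass
class Card:
    name: str
    is_rare: InitVar[int] = 1  # init-only: passed to __init__/__post_init__, not stored

    def __post_init__(self, is_rare):
        print(f"received is_rare={is_rare}")

c = Card("Dragon", 1000)  # prints: received is_rare=1000
print(c.is_rare)          # 1 -- falls back to the class-level default, not 1000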

property

As in a normal class, we can also use the property decorator in a dataclass:

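A minimal sketch (the Rectangle class is illustrative):

from dataclasses import dataclass

@dataclass
class Rectangle:
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height

r = Rectangle(3.0, 4.0)
print(r.area)  # 12.0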

asyncio

Python Asyncio

Cooperative multitasking

TaskGroup is a high-level construct in Python's asyncio library introduced in Python 3.11. It simplifies working with multiple asynchronous tasks by managing their lifecycle in a structured and safe manner, similar to how asyncio.gather or asyncio.create_task work but with improved error handling.

Key Features of TaskGroup

  1. Structured Concurrency: Tasks are managed as a group, ensuring that their lifecycle is bound to the TaskGroup context.
  2. Automatic Cancellation: If one task fails, all other tasks in the group are automatically canceled, ensuring that the program doesn't continue in an inconsistent state.
  3. Error Propagation: Errors are automatically propagated and aggregated, making debugging easier.

Example: Using TaskGroup

import asyncio

async def task1():
    await asyncio.sleep(1)
    print("Task 1 done")

async def task2():
    await asyncio.sleep(2)
    print("Task 2 done")

async def main():
    async with asyncio.TaskGroup() as tg:
        tg.create_task(task1())
        tg.create_task(task2())

asyncio.run(main())

Key Methods

  • create_task(coro): Schedules a coroutine as a task within the TaskGroup.

Exception Handling Example

async def faulty_task():
    await asyncio.sleep(1)
    raise ValueError("Something went wrong!")

async def main():
    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(task1())
            tg.create_task(faulty_task())
    except Exception as e:
        print(f"Caught an exception: {e}")

asyncio.run(main())

Why Use TaskGroup?

  • Simpler Code: Easier management of multiple tasks.
  • Better Error Handling: Automatic cancellation of related tasks on failure.
  • Cleaner API: More readable and maintainable than manual task management.


Walrus Operator

The walrus operator (:=), officially known as the assignment expression, was introduced in Python 3.8. It allows you to assign a value to a variable within an expression.

name := expression

Benefits:

  • Reduces code length: Combines assignment and evaluation into one step.
  • Improves readability: Keeps related expressions together.
  • Eliminates repetitive operations: Avoids calculating the same expression multiple times.

without Walrus operator

n = len(a)
if n > 10:
    print(f"List is too long ({n} elements, expected <= 10)")

with Walrus operator

if (n := len(a)) > 10:
    print(f"List is too long ({n} elements, expected <= 10)")

Generators

Stackoverflow

Python Wiki

Match-case

In Python, the match case is a control flow statement introduced in Python 3.10, used for pattern matching. It's similar to switch statements in other languages but is much more powerful and flexible, enabling you to match against complex data structures and conditions.

Syntax

match subject:
    case pattern1:
        # code block for pattern1
    case pattern2:
        # code block for pattern2
    case _:
        # code block for default case

Key Points

  1. match subject: The subject is the value being tested.
  2. case pattern: Patterns are matched against the subject. If a pattern matches, the corresponding code block runs.
  3. Wildcard _: The _ is a wildcard that matches anything (acts as a "default" case).

Examples

Basic Example

def http_status(status):
    match status:
        case 200:
            return "OK"
        case 404:
            return "Not Found"
        case 500:
            return "Internal Server Error"
        case _:
            return "Unknown Status"

print(http_status(200))  # Output: OK
print(http_status(403))  # Output: Unknown Status

Matching Data Structures

def process_data(data):
    match data:
        case {"type": "error", "code": code}:
            return f"Error with code {code}"
        case {"type": "success", "value": value}:
            return f"Success with value {value}"
        case _:
            return "Unknown data structure"

print(process_data({"type": "error", "code": 404}))  # Output: Error with code 404
print(process_data({"type": "success", "value": 42}))  # Output: Success with value 42

Nested Structures

def analyze_point(point):
    match point:
        case (0, 0):
            return "Origin"
        case (x, 0):
            return f"Point on the x-axis at {x}"
        case (0, y):
            return f"Point on the y-axis at {y}"
        case (x, y):
            return f"Point at ({x}, {y})"
        case _:
            return "Not a point"

print(analyze_point((0, 0)))  # Output: Origin
print(analyze_point((3, 0)))  # Output: Point on the x-axis at 3

Using Guards

Guards allow conditional matching using if.

def describe_number(num):
    match num:
        case x if x < 0:
            return "Negative number"
        case x if x == 0:
            return "Zero"
        case x if x > 0:
            return "Positive number"

print(describe_number(-5))  # Output: Negative number

When to Use match

  • For Structured Data: Works great with dictionaries, tuples, lists, and custom objects.
  • Cleaner Alternatives: When many if-elif statements make the code less readable.
  • Complex Matching: When you need to test patterns or conditions beyond basic equality.

This makes match versatile and suitable for modern, pattern-matching-based programming paradigms.

The success of a case depends on two things:

  • The pattern matching the subject.
  • [Optional] A guard clause: the pattern must match first, and only then is the guard condition evaluated. The case body executes only if the guard succeeds.

The variable in a case is first bound to the value from the match (if the pattern matches), and only then is the guard condition checked.


If the default case ends up executing, every preceding case whose pattern matched (even if its guard failed) will have had its capture variable bound to the value; the binding depends only on the pattern matching, not on the guard succeeding.

Only once the pattern has matched is the guard condition (if provided) evaluated. Without a guard, a pattern match alone is enough to execute the case.

  • If the first case matches successfully, only that case's variable is bound with the value. The rest stay unbound, since the remaining cases were never evaluated.


  • If the second case's guard condition succeeds, both the first and second case variables will be bound to the matched value, provided the first case matched the pattern but failed its guard clause.


It's not necessary to use a different variable name in each case, but if we do, the matched value is bound to each such variable whose pattern matches, regardless of whether its guard condition succeeds, as the sketch below shows.
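A minimal sketch of this binding behavior (the names x and y are illustrative):

value = 10

match value:
    case x if x > 100:  # pattern matches and binds x=10, but the guard fails
        print("big")
    case y if y > 5:    # pattern matches, binds y=10, and the guard succeeds
        print("medium")

print(x)  # 10 -- x stayed bound even though its guard failed
print(y)  # 10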


There should be only one default (wildcard) case. In the example below, both case default_case: and case _: try to act as the wildcard, so we encounter the error:

SyntaxError: name capture 'default_case' makes remaining patterns unreachable

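A minimal reconstruction of that situation (the subject value is illustrative):

value = 42

match value:
    case default_case:  # a bare name is a capture pattern: it matches anything
        print("default")
    case _:             # unreachable, so Python rejects the whole match block
        print("wildcard")

# SyntaxError: name capture 'default_case' makes remaining patterns unreachable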

Advanced matching

  • We can just play with the patterns (see the sketch below).



  • To conclude: the value is bound to the case variable only when the pattern matches. The pattern must match, and then the guard condition (if any if statement is provided) must be satisfied, in order for that case to execute.

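The original screenshots are unavailable; as an illustrative sketch, here are a few of the standard pattern forms you can play with:

def describe(value):
    match value:
        case 0 | 1:                   # OR pattern
            return "zero or one"
        case [x, y, *rest]:           # sequence pattern with captures
            return f"starts with {x}, {y}, plus {len(rest)} more"
        case {"id": ident}:           # mapping pattern
            return f"mapping with id={ident}"
        case str() as s:              # class pattern with a capture
            return f"a string: {s}"
        case _:
            return "something else"

print(describe(1))             # zero or one
print(describe([1, 2, 3, 4]))  # starts with 1, 2, plus 2 more
print(describe({"id": 7}))     # mapping with id=7
print(describe("hi"))          # a string: hi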

Memoization

Memoization is an optimization technique used to speed up programs by storing the results of expensive function calls and reusing those results when the same inputs occur again. In Python, memoization can be implemented manually or by using built-in tools like functools.lru_cache or functools.cache.


How Memoization Works

  1. Store Results: When a function is called with specific arguments, the result is stored in a data structure (e.g., dictionary).
  2. Reuse Results: If the function is called again with the same arguments, the stored result is returned instead of recomputing it.

This technique is particularly useful for recursive functions or functions with repeated computations.


Manual Implementation of Memoization

You can manually implement memoization using a dictionary to cache results.

Example: Fibonacci with Manual Memoization

def fib(n, memo={}):  # Default argument is a mutable dictionary
    if n in memo:  # Check if the result is already in the cache
        return memo[n]
    if n <= 1:  # Base cases
        return n
    # Compute and store the result in the cache
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

print(fib(10))  # Output: 55
print(fib(50))  # Output: 12586269025
  • memo is a dictionary that stores results for specific n.
  • When fib(10) is called, results for smaller Fibonacci numbers (like fib(9), fib(8), etc.) are stored in memo.

Using functools.lru_cache

Python provides a built-in way to add memoization using the functools.lru_cache decorator. It automatically caches function results without requiring manual caching logic.

Example: Fibonacci with lru_cache

from functools import lru_cache

@lru_cache(maxsize=None)  # Cache with unlimited size
def fib(n):
    if n <= 1:  # Base cases
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # Output: 55
print(fib(50))  # Output: 12586269025
  • @lru_cache: Caches results of the decorated function.
  • maxsize: Limits the cache size. If maxsize=None, the cache can grow indefinitely.
  • Efficient and Easy: Eliminates the need to manage a manual cache.

Comparison: Manual vs. lru_cache

Feature            | Manual Memoization             | functools.lru_cache
Setup              | Requires manual implementation | Simple decorator syntax
Flexibility        | Fully customizable             | Limited to lru_cache features
Cache Size Control | Unlimited or custom logic      | Controlled with maxsize
Ease of Use        | More effort                    | Easier, concise implementation

Practical Use Cases for Memoization

  1. Recursive Functions:

    • Fibonacci numbers
    • Factorials
    • Dynamic programming problems (e.g., knapsack, longest common subsequence)
  2. Expensive Calculations:

    • Complex mathematical computations
    • Simulations with repeated input parameters
  3. Caching in Web Applications:

    • Cache database or API query results to avoid redundant calls.

Clearing the Cache

If you need to clear the cache at runtime (e.g., when inputs change or memory needs to be freed), you can use the .cache_clear() method.
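For example, with the lru_cache-decorated fib above:

fib.cache_clear()        # drop all cached results (also resets the statistics)
print(fib.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)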

Cache Info

The @lru_cache and @cache decorators provide the .cache_info() method, which returns detailed statistics about the cache:

  • hits: Number of times a cached result was used.
  • misses: Number of times a result had to be computed.
  • maxsize: Maximum size of the cache.
  • currsize: Current number of items in the cache.

In Python 3.9 and later, functools.cache is introduced as an alternative to functools.lru_cache for memoization. It provides simple, unlimited caching without the additional features of lru_cache, such as limiting cache size.

How cache Works

  1. The @cache decorator automatically stores the results of function calls in an internal dictionary.
  2. If the function is called again with the same arguments, the stored result is returned directly, skipping the computation.

Comparison: cache vs. lru_cache

Feature          | functools.cache          | functools.lru_cache
Cache Size Limit | Unlimited                | Customizable (maxsize)
Eviction Policy  | None                     | Least Recently Used (LRU)
Simplicity       | Simpler, no parameters   | More flexible
Python Version   | Introduced in Python 3.9 | Available since Python 3.2

Advantages of cache

  • Unlimited caching: No need to worry about cache size.
  • Simple usage: No need to specify parameters like maxsize.
  • Good for Static/Fixed Inputs: Ideal when you know your inputs won't grow excessively.

Example

import time
from functools import lru_cache

@lru_cache(maxsize=None)  # Cache with unlimited size
def expensive_func(n):
    print('Expensive operation started...')
    time.sleep(5)
    print('Ended')
    return n*4

print(expensive_func(4))
print(expensive_func(4))
print(expensive_func.cache_info())
print(expensive_func(5))
print(expensive_func.cache_info())

# Output
# Expensive operation started...
# Ended
# 16
# 16
# CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
# Expensive operation started...
# Ended
# 20
# CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)

The same example using functools.cache:

import time
from functools import cache

@cache
def expensive_func(n):
    print('Expensive operation started...')
    time.sleep(5)
    print('Ended')
    return n*4

print(expensive_func(4))
print(expensive_func(4))
print(expensive_func.cache_info())
print(expensive_func(5))
print(expensive_func.cache_info())

# Output
# Expensive operation started...
# Ended
# 16
# 16
# CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
# Expensive operation started...
# Ended
# 20
# CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)

@cached_property

@cached_property is a decorator in Python, introduced in Python 3.8, that is used to cache the result of a property method the first time it is accessed. After the first computation, the cached value is returned for all subsequent accesses, without recomputing it. It is part of the functools module.


How @cached_property Works

  • When the property is accessed for the first time, the decorated method runs, and its result is stored in the instance's dictionary (__dict__).
  • On subsequent accesses, the cached value is retrieved directly from the instance's dictionary, avoiding the overhead of recalculating the value.

Use Case

@cached_property is useful for:

  • Expensive computations that do not change during the object's lifetime.
  • Lazily evaluated attributes that you don't want to calculate until needed.

Example: Basic Usage

from functools import cached_property

class Circle:
    def __init__(self, radius):
        self.radius = radius

    @cached_property
    def area(self):
        print("Calculating area...")
        return 3.14159 * self.radius ** 2

circle = Circle(10)

# First access computes and caches the area
print(circle.area)  # Output: "Calculating area..." followed by 314.159

# Subsequent access retrieves the cached value
print(circle.area)  # Output: 314.159 (no "Calculating area..." message)

Explanation:

  • On the first access to circle.area, the method runs, and the result is stored in circle.__dict__['area'].
  • On subsequent accesses, the cached result is used.
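
You can observe the cached entry directly (continuing the Circle example above):

print('area' in circle.__dict__)  # True after the first access
print(circle.__dict__['area'])    # 314.159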

Advantages

  1. Performance: Reduces repeated computations for attributes accessed multiple times.
  2. Convenience: Makes it easy to implement lazily evaluated properties.
  3. Simplifies Code: Eliminates the need to manage manual caching logic.

Clearing or Recomputing Cached Properties

If you want to clear or recompute a cached property, delete the cached value from the instance's dictionary using del.

Example:

del circle.area  # Remove the cached value
print(circle.area)  # Recomputes and caches again

Example with Mutable Objects

If the underlying data that the property depends on changes, you need to clear the cached value manually.

from functools import cached_property

class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @cached_property
    def full_name(self):
        return f"{self.first_name} {self.last_name}"

person = Person("John", "Doe")
print(person.full_name)  # Output: "John Doe"

# Update first_name
person.first_name = "Jane"

# Cached value remains outdated
print(person.full_name)  # Output: "John Doe"

# Clear the cache
del person.full_name
print(person.full_name)  # Output: "Jane Doe"

Comparison with @property

| Feature | @property | @cached_property |
| --- | --- | --- |
| Recomputed Every Time | Yes | No (cached after first access) |
| Performance | Slower for repeated access | Faster for expensive calculations |
| Cache Management | Not applicable | Cached in __dict__, can be cleared manually |

Limitations

  1. Immutable Usage: It works best with attributes that do not change.
  2. Instance-Specific: The cache is stored per instance, not shared among instances.
  3. Manual Cache Clearing: If the dependent data changes, you need to manage the cache explicitly.

Monkey patching

Monkey patching refers to the dynamic modification or extension of a class or module at runtime. It allows you to alter or replace methods, attributes, or behaviors of existing classes or modules without modifying their source code.

While powerful, monkey patching can lead to unexpected behavior if not used carefully, as it changes the original implementation of a class or module globally.


How Monkey Patching Works

Monkey patching in Python is possible because of Python's dynamic nature, which allows you to:

  • Add new attributes or methods to existing classes.
  • Replace existing methods or functions with new implementations.
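
For instance, adding a brand-new method to an existing class at runtime (a minimal sketch; Greeter and shout are made-up names):

class Greeter:
    def greet(self):
        return "hello"

def shout(self):
    # behavior attached from outside the class body
    return self.greet().upper() + "!"

Greeter.shout = shout     # add a method the class never defined
print(Greeter().shout())  # Output: HELLO!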

Example: Modifying a Class Method

Original Class

class Animal:
    def speak(self):
        return "I make a sound"

Applying a Monkey Patch

# Original behavior
animal = Animal()
print(animal.speak())  # Output: I make a sound

# Monkey patching the `speak` method
def new_speak(self):
    return "I am patched!"

Animal.speak = new_speak

# Behavior after patching
print(animal.speak())  # Output: I am patched!

Here, the speak method of the Animal class is replaced with the new_speak method.


Monkey Patching a Module

You can also modify or extend a module's behavior dynamically.

Example: Monkey Patching the math Module

import math

# Add a new function to the math module
def square(x):
    return x * x

math.square = square

# Use the new function
print(math.square(4))  # Output: 16

Real-World Example: Patching a Library

Example: Patching datetime for Testing Suppose you want to simulate a specific date and time for testing purposes.

import datetime

# Original behavior
print(datetime.datetime.now())  # Outputs current date and time

# Monkey patch the `datetime.now` method
class FixedDateTime(datetime.datetime):
    @classmethod
    def now(cls):
        return cls(2020, 1, 1, 12, 0, 0)

datetime.datetime = FixedDateTime

# Behavior after patching
print(datetime.datetime.now())  # Output: 2020-01-01 12:00:00

Advantages of Monkey Patching

  1. Flexibility: You can dynamically adapt or fix behavior without modifying the original source code.
  2. Testing: Useful for testing or simulating specific behaviors, such as fixed dates or mocked methods.
  3. Prototyping: Quick experimentation with changes or enhancements to a class or module.

Disadvantages of Monkey Patching

  1. Global Impact: Changes apply globally and can lead to unintended side effects.
  2. Readability: It can make the code harder to understand and debug, as the behavior of the patched class or module differs from its source.
  3. Fragility: Updates to the original class or module might break the patch.

Best Practices

  1. Use Sparingly: Avoid monkey patching unless absolutely necessary.
  2. Scope Patching: If possible, apply monkey patches only within the local scope where needed.
  3. Document Changes: Clearly document any monkey patches to avoid confusion for other developers.
  4. Use Libraries: Consider using libraries like unittest.mock for temporary patches during testing instead of permanent monkey patches.

Alternative to Monkey Patching: Mocking

For testing purposes, it's better to use the unittest.mock module to temporarily replace functionality.

Example with unittest.mock.patch:

from unittest.mock import patch
import datetime

# `datetime.datetime` is a C extension type, so its `now` attribute can't be
# patched directly; patch the whole class in the `datetime` module instead
fixed = datetime.datetime(2020, 1, 1, 12, 0, 0)
with patch('datetime.datetime') as mock_dt:
    mock_dt.now.return_value = fixed
    print(datetime.datetime.now())  # Output: 2020-01-01 12:00:00

# Outside the patch context, the original behavior is restored
print(datetime.datetime.now())  # Outputs the actual current date and time

Conclusion Monkey patching is a powerful but potentially risky technique in Python. While it allows for dynamic changes to existing code, it's important to use it judiciously, document its use, and consider alternatives like mocking or proper subclassing to avoid unintended side effects.

timeit module

The timeit module in Python is used for measuring the execution time of small code snippets. It is useful for benchmarking and performance tuning. The module provides two primary functions: timeit() and repeat().

The timeit.timeit() function measures the time it takes to execute a code snippet repeatedly for a specified number of times.

Syntax

timeit.timeit(stmt='pass', setup='pass', timer=<default_timer>, number=1000000, globals=None)
  • stmt: The code snippet to time as a string. Defaults to 'pass'.
  • setup: The setup code to run once before timing the main code. Defaults to 'pass'.
  • timer: Timer function to use. Defaults to time.perf_counter for high precision.
  • number: The number of times to execute the code. Defaults to 1,000,000.
  • globals: A dictionary of global variables accessible to stmt and setup.

Example

import timeit

# Measure the time to execute a simple statement
execution_time = timeit.timeit('sum(range(100))', number=100000)
print(f"Execution time: {execution_time} seconds")

timeit.repeat

The timeit.repeat() function is similar to timeit.timeit(), but it runs the timing multiple times and returns a list of results. This is useful for understanding variability in execution time.

Syntax

timeit.repeat(stmt='pass', setup='pass', timer=<default_timer>, repeat=5, number=1000000, globals=None)
  • repeat: The number of times the timing experiment is repeated.
  • Other arguments are the same as in timeit.timeit().

Example

import timeit

# Measure the execution time across multiple runs
times = timeit.repeat('sum(range(100))', repeat=5, number=100000)
print(f"Execution times: {times}")
print(f"Best time: {min(times)} seconds")

Difference Between timeit and repeat

| Feature | timeit.timeit | timeit.repeat |
| --- | --- | --- |
| Output | Single timing value (float) | List of timing values (list of floats) |
| Use Case | Quick timing of a code snippet | Measure timing variability over multiple runs |
| Customizability | Number of executions (number) | Multiple repetitions of timing experiments (repeat) |

Using setup for Imports

When benchmarking code that requires imported modules or setup code, use the setup parameter.

Example:

import timeit

# Use setup to import a module
execution_time = timeit.timeit('sqrt(100)', setup='from math import sqrt', number=100000)
print(f"Execution time: {execution_time} seconds")

Comparing Two Code Snippets

You can use timeit to compare the performance of two implementations.

Example: Compare sum() and Loop

import timeit

# Using sum()
time_sum = timeit.timeit('sum(range(1000))', number=100000)

# Using a loop
time_loop = timeit.timeit('total=0\nfor i in range(1000): total += i', number=100000)

print(f"Using sum(): {time_sum} seconds")
print(f"Using loop: {time_loop} seconds")

Real-World Application

timeit is ideal for:

  1. Benchmarking algorithms: Compare performance between different algorithms.
  2. Optimizing code: Identify bottlenecks and replace slower code with faster alternatives.
  3. Learning Python internals: Understand how different Python constructs behave in terms of speed.

Using timeit as a CLI Tool

The timeit module can also be used directly from the command line.

Example:

python -m timeit "sum(range(100))"

Sample Output:

1000000 loops, best of 5: 0.202 usec per loop
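
Setup code and counts can be passed with flags such as -s (setup), -n (number), and -r (repeat):

python -m timeit -s "from math import sqrt" "sqrt(100)"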

Caveats

  1. Do not include I/O in timing tests: File operations or print statements can skew results.
  2. Warm-up time: Ignore initial timing results due to interpreter warm-up effects.
  3. System load: Run tests in a low-load environment for consistent results.

timeit and repeat are versatile tools for performance testing, and understanding them can help in writing more efficient Python code.

File handling

Reading a file is exhaustive: once file.read() has consumed the content, the read position sits at end-of-file, so calling file.read() again returns an empty string (much like an exhausted generator). Rewind with file.seek(0) to read the content again.

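A minimal sketch of this behavior (assumes a non-empty example.txt in the working directory):

with open("example.txt") as f:
    first = f.read()   # consumes the whole file
    second = f.read()  # '' -- the read position is now at end-of-file

print(second == "")    # True

with open("example.txt") as f:
    f.read()
    f.seek(0)          # rewind the read position
    again = f.read()   # the full content is available again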

glob

The glob module in Python is used for file and directory path matching. It allows you to search for files and directories based on specific patterns. These patterns can include wildcards like * (matches any number of characters) and ? (matches a single character). The module is particularly useful for tasks such as listing files with specific extensions or finding files that match a naming convention.

Commonly Used Functions in glob

  1. glob.glob(pattern, recursive=False)

    • Returns a list of paths that match the specified pattern.
    • If recursive is set to True, it can match files in all subdirectories (using the ** wildcard).
    import glob
    
    # List all Python files in the current directory
    python_files = glob.glob("*.py")
    print(python_files)
    
    # List all files in subdirectories recursively
    all_files = glob.glob("**/*", recursive=True)
    print(all_files)
  2. glob.iglob(pattern, recursive=False)

    • Similar to glob.glob(), but returns an iterator instead of a list.
    for file in glob.iglob("*.txt"):
        print(file)
  3. glob.escape(pathname)

    • Escapes special characters in a path, making it suitable for literal matching.
    special_path = glob.escape("file[1].txt")
    print(glob.glob(special_path))  # Matches 'file[1].txt' exactly
  4. Using Wildcards in Patterns

    • *: Matches zero or more characters.
    • ?: Matches exactly one character.
    • [seq]: Matches any character in the sequence.
    • [!seq]: Matches any character not in the sequence.
    # Examples
    print(glob.glob("data_*.csv"))  # Matches files like 'data_1.csv', 'data_abc.csv'
    print(glob.glob("file?.txt"))   # Matches 'file1.txt', 'file2.txt', but not 'file10.txt'

Example Use Case

import glob
import os

# List all JPEG and PNG files in a directory
# (note: the pattern also matches the unlikely '.jng' and '.ppg')
image_files = glob.glob("images/*.[jp][pn]g")
for file in image_files:
    print(f"Found image: {os.path.basename(file)}")

About the recursive parameter

from glob import glob

print(glob('*.py', recursive=True))     # current dir only (./*.py); recursive=True has no effect without **
print(glob('**/*.py', recursive=False)) # without recursive=True, ** acts like *: one level deep (./some_dir/*.py)
print(glob('**/*.py', recursive=True))  # matches .py files at any depth (./recursive/all/the/way/to/end/*.py)

The glob module is efficient for pattern-based file searching, but if you need more advanced features, consider using the os or pathlib modules alongside it.
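
For comparison, pathlib offers equivalent recursive matching (a minimal sketch):

from pathlib import Path

# Equivalent of glob('**/*.py', recursive=True)
for path in Path('.').rglob('*.py'):
    print(path)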


Pickle

Serialization and deserialization

In Python, the pickle module is used for serializing (converting Python objects into a byte stream) and deserializing (converting byte streams back into Python objects). It is particularly useful for saving Python objects to files or transmitting them over a network.


Key Features of pickle

  • Supports Python-specific objects: Can serialize and deserialize complex Python objects, including custom classes, functions, and recursive structures.
  • Binary format: Encodes data in a compact binary format, making it faster than text-based formats like JSON.
  • Python-specific: Pickle is not cross-language; the serialized data can only be deserialized in Python.

Basic Usage of pickle

  1. Importing pickle

    import pickle
  2. Serialization (Pickling):

    • Converts a Python object into a byte stream.
    • Use pickle.dump() to write the byte stream to a file or pickle.dumps() to get it as a bytes object.

    Example:

    data = {"name": "Alice", "age": 30}
    
    # Serialize to a file
    with open("data.pkl", "wb") as file:
        pickle.dump(data, file)
    
    # Serialize to a bytes object
    byte_stream = pickle.dumps(data)
    print(byte_stream)
  3. Deserialization (Unpickling):

    • Converts a byte stream back into a Python object.
    • Use pickle.load() to read from a file or pickle.loads() to deserialize from a bytes object.

    Example:

    # Deserialize from a file
    with open("data.pkl", "rb") as file:
        loaded_data = pickle.load(file)
    print(loaded_data)
    
    # Deserialize from a bytes object
    loaded_data = pickle.loads(byte_stream)
    print(loaded_data)

Pickle in Custom Classes

pickle can also serialize and deserialize instances of custom classes.

Example:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"Person(name={self.name}, age={self.age})"

# Create an instance
person = Person("Alice", 30)

# Serialize
with open("person.pkl", "wb") as file:
    pickle.dump(person, file)

# Deserialize
with open("person.pkl", "rb") as file:
    loaded_person = pickle.load(file)
print(loaded_person)  # Output: Person(name=Alice, age=30)

Advantages of pickle

  1. Python-native serialization: Handles Python objects seamlessly, including those not supported by JSON (e.g., sets, tuples, and custom objects).
  2. Compact binary format: Makes it faster for serialization and deserialization compared to text-based formats.

Disadvantages of pickle

  1. Security Risk: Deserializing untrusted data can execute malicious code, leading to security vulnerabilities.
    • Avoid loading untrusted pickle files.
  2. Python-Specific: Pickle files are not portable across languages.
  3. Backward Compatibility: Changes in class definitions or Python versions can make old pickled files unreadable.

Best Practices

  1. Use for internal purposes: Only use pickle for applications where data security and cross-language compatibility are not concerns.
  2. Avoid untrusted sources: Always validate the source of a pickle file before unpickling it.
  3. Consider alternatives for safer serialization: Use formats like JSON for interoperability and simplicity or Protobuf for efficiency and schema validation.

The pickle module in Python provides several methods for working with serialization and deserialization beyond the basic dump and load methods. Here’s a detailed explanation of all the important methods in the pickle module:


Primary Methods

  1. pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)

    • Serializes a Python object obj and writes it to a binary file-like object file.
    • Parameters:
      • obj: The Python object to serialize.
      • file: A writable binary file-like object (e.g., a file opened in 'wb' mode).
      • protocol: Optional. The serialization protocol to use (see protocols below).
        • Defaults to DEFAULT_PROTOCOL (see below); HIGHEST_PROTOCOL is 5 in Python 3.8+.
      • fix_imports: Handles compatibility for Python 2 pickles (default: True).
      • buffer_callback: Used for out-of-band buffers with pickle5 (advanced usage).
    • Example:
      import pickle
      data = {"name": "Alice", "age": 30}
      with open("data.pkl", "wb") as f:
          pickle.dump(data, f)
  2. pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)

    • Serializes a Python object obj and returns it as a byte stream (in-memory).
    • Same parameters as dump.
    • Example:
      byte_stream = pickle.dumps({"name": "Alice", "age": 30})
      print(byte_stream)
  3. pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)

    • Deserializes a pickled object from a binary file-like object file.
    • Parameters:
      • file: A readable binary file-like object (e.g., a file opened in 'rb' mode).
      • encoding: Used for decoding str objects (default: "ASCII").
      • fix_imports: Same as in dump.
      • errors: Error handling during decoding (default: "strict").
    • Example:
      with open("data.pkl", "rb") as f:
          data = pickle.load(f)
      print(data)
  4. pickle.loads(data, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)

    • Deserializes a pickled object from a bytes-like object data.
    • Same parameters as load.
    • Example:
      byte_stream = b'\x80\x05\x95...\x94.'
      data = pickle.loads(byte_stream)
      print(data)

Advanced and Specialized Methods

  1. pickle.HIGHEST_PROTOCOL

    • The highest protocol version available. It is more efficient and supports the latest features.
    • Use this for optimal performance.
    • Example:
      pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
  2. pickle.DEFAULT_PROTOCOL

    • The default protocol version used by pickle (protocol 4 since Python 3.8; protocol 3 before that).
    • Lower than HIGHEST_PROTOCOL to maintain some backward compatibility.
    • Example:
      pickle.dumps(data, protocol=pickle.DEFAULT_PROTOCOL)
  3. pickle.Pickler

    • A class that allows more control over the pickling process.
    • Usage: Create a Pickler object for advanced serialization.
    • Example:
      with open("data.pkl", "wb") as f:
          pickler = pickle.Pickler(f)
          pickler.dump({"name": "Alice", "age": 30})
  4. pickle.Unpickler

    • A class that allows more control over the unpickling process.
    • Usage: Create an Unpickler object for advanced deserialization.
    • Example:
      with open("data.pkl", "rb") as f:
          unpickler = pickle.Unpickler(f)
          data = unpickler.load()
          print(data)

Customizing Serialization with __reduce__ and __setstate__

For custom classes, you can define how they should be pickled and unpickled by implementing special methods.

  1. __reduce__

    • Defines how an object is reduced to a serializable form.
    • Returns a tuple containing:
      1. A callable to recreate the object.
      2. Arguments to pass to that callable.
    • Example:
      class CustomClass:
          def __init__(self, name):
              self.name = name
      
          def __reduce__(self):
              return (CustomClass, (self.name,))
      
      obj = CustomClass("Alice")
      serialized = pickle.dumps(obj)
      deserialized = pickle.loads(serialized)
      print(deserialized.name)  # Output: Alice
  2. __setstate__ and __getstate__

    • __getstate__: Defines what data should be serialized.
    • __setstate__: Defines how to restore the state during deserialization.
    • Example:
      class CustomClass:
          def __init__(self, name):
              self.name = name
      
          def __getstate__(self):
              return {"name": self.name.upper()}
      
          def __setstate__(self, state):
              self.name = state["name"].lower()
      
      obj = CustomClass("Alice")
      serialized = pickle.dumps(obj)
      deserialized = pickle.loads(serialized)
      print(deserialized.name)  # Output: alice

Pickle Protocol Versions

Pickle supports multiple protocols:

  • Protocol 0: Human-readable (ASCII-based), oldest format.
  • Protocol 1: The old binary format, supported since early Python versions.
  • Protocol 2: More efficient binary format, introduced in Python 2.3.
  • Protocol 3: Introduced in Python 3.0, supports bytes objects.
  • Protocol 4: Introduced in Python 3.4, supports very large objects and more kinds of objects.
  • Protocol 5: Introduced in Python 3.8, adds out-of-band data buffers and improves performance for large data.

Important Notes

  1. Security Warning: Never unpickle data from untrusted sources. It can execute arbitrary code.
  2. Cross-Version Compatibility: Pickled data may not always be compatible across Python versions, especially if the data involves custom objects.
  3. Alternatives: For safer or cross-language serialization, consider using JSON, MessagePack, or Protobuf.
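
For example, the json alternative mentioned above handles basic types in a portable, human-readable way (a minimal sketch):

import json

data = {"name": "Alice", "age": 30}
text = json.dumps(data)  # portable, human-readable text
print(json.loads(text))  # {'name': 'Alice', 'age': 30}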

NOQA

noqa = NO-QA (NO Quality Assurance)

It's generally used in Python code to ignore PEP8 warnings.

Lines with # noqa at the end will be ignored by linter programs and won't raise any warnings.

In Python, # noqa is a comment directive that tells linters and code style checkers like flake8 and pycodestyle to ignore any errors or warnings on that particular line.

x = 1+2  # noqa

Specific error codes: You can also specify which specific error codes you want to ignore:

x = 1+2  # noqa: E226

When to use it:

  • When you have a good reason to deviate from PEP8:

Sometimes, adhering strictly to style rules can make your code less readable. In such cases, you can use # noqa to suppress the warning.

  • When dealing with third-party code:

If you're using a library that doesn't follow PEP8, you can use # noqa to avoid getting flooded with warnings.

  • Temporary fixes:

If you need a quick fix that doesn't follow PEP8, you can use # noqa as a temporary solution until you can refactor the code properly.
