Python Notes - LogeshVel/learning_resources GitHub Wiki
Enums are a safe way of defining a fixed set of constant values. In Python, an enum (short for "enumeration") is a class that defines a set of symbolic names bound to unique, constant values. Enums are used to represent fixed sets of related constants in a readable and maintainable way.
Enums are defined using the Enum class from the enum module, introduced in Python 3.4.
Why Use Enums?
- Readability: Replace magic numbers or strings with meaningful names.
- Immutability: Enum members are constant and cannot be changed.
- Type Safety: Prevents invalid values from being used.
- Namespace Control: Groups related constants together.
Syntax
To create an enum, subclass the Enum class:
from enum import Enum
class Color(Enum):
RED = 1
GREEN = 2
BLUE = 3
Key Features

- Accessing Enum Members: Enum members can be accessed by name or value:

print(Color.RED)        # Output: Color.RED
print(Color.RED.name)   # Output: RED
print(Color.RED.value)  # Output: 1

- Iteration: You can iterate through all members of an enum:

for color in Color:
    print(color)
# Output:
# Color.RED
# Color.GREEN
# Color.BLUE

- Comparisons: Enum members are unique and can be compared:

if Color.RED == Color.RED:    # True
    print("Same color")
if Color.RED != Color.GREEN:  # True
    print("Different colors")

- Restricting Arbitrary Values: Enums prevent assigning invalid values:

try:
    invalid_color = Color(4)  # Raises ValueError
except ValueError as e:
    print(e)  # Output: 4 is not a valid Color

- Some more:

red = Color.RED
green = Color(2)
print(f'{red=}')          # red=<Color.RED: 1>
print(f'{red.name=}')     # red.name='RED'
print(f'{red.value=}')    # red.value=1
print(f'{green=}')        # green=<Color.GREEN: 2>
print(f'{green.name=}')   # green.name='GREEN'
print(f'{green.value=}')  # green.value=2
Example: Days of the Week
from enum import Enum
class Weekday(Enum):
MONDAY = 1
TUESDAY = 2
WEDNESDAY = 3
THURSDAY = 4
FRIDAY = 5
SATURDAY = 6
SUNDAY = 7
# Accessing members
print(Weekday.MONDAY) # Output: Weekday.MONDAY
print(Weekday.MONDAY.value) # Output: 1
# Iteration
for day in Weekday:
print(day)
Advanced Features

- Auto Value Generation: Use auto() to automatically assign values:

from enum import Enum, auto

class Status(Enum):
    NEW = auto()
    IN_PROGRESS = auto()
    COMPLETED = auto()

print(Status.NEW.value)          # Output: 1
print(Status.IN_PROGRESS.value)  # Output: 2

- Custom Methods: Enums can have custom methods:

class Shape(Enum):
    CIRCLE = 1
    SQUARE = 2

    def describe(self):
        return f"This shape is a {self.name.lower()}."

print(Shape.CIRCLE.describe())  # Output: This shape is a circle.

- Multiple Enums: Different enums are independent:

class Color(Enum):
    RED = 1

class Size(Enum):
    SMALL = 1

print(Color.RED == Size.SMALL)  # Output: False

- Iterating by Value: Convert the enum to a list for iteration or index-based access:

colors = list(Color)
print(colors)  # Output: [<Color.RED: 1>, <Color.GREEN: 2>, <Color.BLUE: 3>]
Practical Use Cases

- State Machines:

class TrafficLight(Enum):
    RED = 1
    YELLOW = 2
    GREEN = 3

- Configuration Constants:

class LogLevel(Enum):
    DEBUG = 1
    INFO = 2
    WARNING = 3
    ERROR = 4

- API Response Status:

class ResponseStatus(Enum):
    SUCCESS = 200
    NOT_FOUND = 404
    SERVER_ERROR = 500
Limitations
- Enum members are immutable once defined.
- Enums cannot have duplicate member names. Duplicate values are allowed by default (they become aliases); use the @unique decorator to disallow them.
Enums make code cleaner and less error-prone when dealing with fixed sets of values.
In Python, enums can have duplicate values, but duplicate names are not allowed. However, we can enforce unique values using the @unique
decorator from the enum
module.
Example 1: Duplicate Names Are Not Allowed
If you define two members with the same name in an enum, Python raises a TypeError.
from enum import Enum
class Color(Enum):
    RED = 1
    GREEN = 2
    RED = 3  # Duplicate name
# This will raise an error:
# TypeError: 'RED' already defined as 1 (message varies by Python version)
Explanation:
- Enum member names (like RED) must be unique. The enum cannot have two members with the same name, as this would cause ambiguity.
Example 2: Duplicate Values Are Allowed
By default, enums allow different members to share the same value. When this happens, only the first member with that value is considered the "canonical" name, and others become aliases.
from enum import Enum
class Color(Enum):
RED = 1
GREEN = 2
BLUE = 1 # Duplicate value, alias for RED
# Accessing members
print(Color.RED) # Output: Color.RED
print(Color.BLUE) # Output: Color.RED (alias)
print(Color.BLUE is Color.RED) # Output: True (aliases point to the same member)
# Iterating members
for color in Color:
print(color)
# Output:
# Color.RED
# Color.GREEN
Explanation:
- Color.BLUE is an alias for Color.RED because they share the same value (1).
- When iterating over Color, only the first occurrence (Color.RED) is included.
Enforcing Unique Values with @unique
To prevent duplicate values in an enum, use the @unique decorator from the enum module. This decorator raises a ValueError if duplicate values are found.
from enum import Enum, unique
@unique
class Color(Enum):
RED = 1
GREEN = 2
BLUE = 1 # Duplicate value is not allowed
# This will raise an error:
# ValueError: duplicate values found in <enum 'Color'>: BLUE -> RED
Explanation:
- The @unique decorator ensures that no two members of the enum have the same value.
- If duplicate values are present, a ValueError is raised at runtime.
When to Allow Duplicate Values
Allowing duplicate values (aliases) is useful when you want multiple names for the same constant, for backward compatibility or clarity.
Example with Aliases:
from enum import Enum
class HttpStatus(Enum):
OK = 200
SUCCESS = 200 # Alias for OK
NOT_FOUND = 404
print(HttpStatus.OK) # Output: HttpStatus.OK
print(HttpStatus.SUCCESS) # Output: HttpStatus.OK
print(HttpStatus.SUCCESS is HttpStatus.OK) # Output: True
Here:
- SUCCESS is an alias for OK, and both point to the value 200.
- This can be helpful when maintaining compatibility with older code that uses SUCCESS while allowing the use of OK in new code.
Summary
- Duplicate Names: Not allowed; cause a TypeError.
- Duplicate Values: Allowed by default; create aliases.
- Enforcing Unique Values: Use @unique to ensure no duplicate values exist in the enum.
The auto() function in Python's enum module is used to automatically assign values to members of an enumeration. While auto() by default assigns sequential integers starting from 1, you can customize its behavior by defining a custom Enum class with a modified _generate_next_value_ method.
Default Behavior of auto()
When used without customization, auto() generates values as follows:
- The first member gets the value 1.
- Subsequent members are assigned incrementing integer values (2, 3, and so on).
Example:
from enum import Enum, auto
class Color(Enum):
RED = auto()
GREEN = auto()
BLUE = auto()
print(Color.RED.value) # Output: 1
print(Color.GREEN.value) # Output: 2
print(Color.BLUE.value) # Output: 3
How It Works Internally
When auto() is used, the Enum class calls the _generate_next_value_ method to compute the value for each member. The default implementation generates an incrementing integer starting from 1.
Default _generate_next_value_ (simplified; the actual implementation increments the last assigned value, which behaves the same for the default integer case):

def _generate_next_value_(name, start, count, last_values):
    return count + start

- name: Name of the enum member (e.g., RED).
- start: The starting value (default is 1).
- count: The number of members defined before this one (0 for the first, 1 for the second, etc.).
- last_values: The list of previously assigned values.
For the first member, count = 0 and start = 1, so 1 + 0 = 1.
Customizing auto() Behavior
You can override _generate_next_value_ in your custom enum class to generate values other than sequential integers. This allows more flexibility, such as generating strings, tuples, or custom sequences.
Example 1: Custom Sequential Values
Generate multiples of 10:
from enum import Enum, auto
class CustomEnum(Enum):
def _generate_next_value_(name, start, count, last_values):
return (count + 1) * 10
class Scale(CustomEnum):
LOW = auto()
MEDIUM = auto()
HIGH = auto()
print(Scale.LOW.value) # Output: 10
print(Scale.MEDIUM.value) # Output: 20
print(Scale.HIGH.value) # Output: 30
Example 2: Use Enum Member Names as Values
Make auto() assign the enum member name as its value (as a string):
from enum import Enum, auto
class StringEnum(Enum):
def _generate_next_value_(name, start, count, last_values):
return name
class Animal(StringEnum):
DOG = auto()
CAT = auto()
BIRD = auto()
print(Animal.DOG.value) # Output: DOG
print(Animal.CAT.value) # Output: CAT
print(Animal.BIRD.value) # Output: BIRD
Example 3: Generate Tuples
Assign tuples with a counter and the name:
from enum import Enum, auto
class TupleEnum(Enum):
def _generate_next_value_(name, start, count, last_values):
return (count + 1, name)
class Status(TupleEnum):
NEW = auto()
IN_PROGRESS = auto()
COMPLETED = auto()
print(Status.NEW.value) # Output: (1, 'NEW')
print(Status.IN_PROGRESS.value) # Output: (2, 'IN_PROGRESS')
print(Status.COMPLETED.value) # Output: (3, 'COMPLETED')
Summary of auto():
- Default Behavior: Sequential integers starting from 1.
- Custom Behavior: You can override _generate_next_value_ to define custom value generation logic.
- Common Use Cases:
  - Simple sequential numbering.
  - Using member names as values.
  - Generating custom data structures (e.g., strings, tuples, or calculated values).
With the absolute tolerance we set a fixed +/- margin and check how close the numbers are; abs_tol is value based.
With the relative tolerance we say how much percentage difference is allowed; above that, the result is False. rel_tol is percentage based (see the sketch below).
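A minimal sketch with math.isclose, which takes exactly these rel_tol and abs_tol keyword arguments (assuming these notes refer to math.isclose):

import math

# abs_tol: fixed +/- margin (value based)
print(math.isclose(100.0, 100.4, abs_tol=0.5))   # True: |100.0 - 100.4| <= 0.5

# rel_tol: fraction of the larger value (percentage based)
print(math.isclose(100.0, 104.0, rel_tol=0.05))  # True: 4% difference <= 5%
print(math.isclose(100.0, 106.0, rel_tol=0.05))  # False: 6% difference > 5%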
Looping over a list is index based, so if the list is modified while looping, the modification affects the loop (see the sketch below).
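A small sketch of the pitfall (removing an element mid-loop shifts the remaining items left, so the next index skips one):

nums = [1, 2, 3, 4]
for n in nums:
    if n == 2:
        nums.remove(n)  # shifts 3 and 4 left; the loop's next index lands on 4
print(nums)  # [1, 3, 4] -- but 3 was never visited by the loop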
A variable named with a double underscore (__) at the beginning is only accessible by that name inside the class itself. Neither subclasses nor class objects can access it directly; the only other way to reach it is via the mangled name:
_ClassName__varname
Reasons for using the double underscore:
- To make the variable accessible only inside the class.
- To avoid name clashes when using multiple inheritance.
Single underscore (_), conventionally "protected":
- Accessible inside the class, in subclasses, and via instances and the class itself.
- Just a hint to developers that it is for internal use; nothing is enforced.
- import * won't import variables, functions, or classes whose names start with _.
Double underscore (__), conventionally "private":
- Accessible directly only inside the class.
- Name mangling prevents clashes.
- Can still be accessed from outside using the mangled name (see the sketch below).
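A minimal sketch of name mangling (the Account class is just an illustration):

class Account:
    def __init__(self):
        self.__balance = 100  # stored as _Account__balance

acct = Account()
# print(acct.__balance)        # AttributeError: no such attribute
print(acct._Account__balance)  # 100 -- reachable only via the mangled name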
When you use the import * syntax, Python looks for an __all__ list in the module. This list specifies the names that should be imported when * is used. If __all__ is not defined, Python imports every top-level name that doesn't start with an underscore (_).
Here's a breakdown:
- Without __all__ defined: Python imports all variables, functions, and classes that do not start with an underscore. Names that start with a single underscore (e.g., _internal_var, _helper_function) are skipped.
- With __all__ defined: Python imports only the names explicitly listed in __all__, regardless of whether they start with an underscore (see the sketch below).
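A minimal sketch (the module and function names are hypothetical):

# mymodule.py
__all__ = ["public_func", "_exported_anyway"]

def public_func(): ...
def _exported_anyway(): ...   # exported because it is listed in __all__
def not_exported(): ...       # skipped by `from mymodule import *`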
The eval() function in Python evaluates a string as a Python expression and returns the result of that evaluation.
(function) def eval(
source: str | ReadableBuffer | CodeType,
globals: dict[str, Any] | None = None,
locals: Mapping[str, object] | None = None,
/
) -> Any
Evaluate the given source in the context of globals and locals.
- The source may be a string representing a Python expression or a code object as returned by compile().
- The globals must be a dictionary and locals can be any mapping, defaulting to the current globals and locals.
- If only globals is given, locals defaults to it.
x = 10
result = eval("x + 5") # result will be 15
Why Do We Need eval()?
eval() can be useful in situations where:
- Dynamic Evaluation: You need to execute Python code that's constructed as a string at runtime. Example: In interactive interpreters or applications where users can provide input that needs to be evaluated.
- Flexible Logic: It allows dynamic expression parsing and computation without hardcoding all possible cases. Example: Custom formula calculations or user-defined mathematical expressions in applications.
- Shortcuts for Prototyping: It can simplify certain dynamic code requirements during the prototyping phase.
Risks and Considerations
eval() can be extremely dangerous if not used carefully, especially with untrusted input, for the following reasons:
- Security Vulnerability: It can execute arbitrary code, leading to severe consequences like data leaks or system compromise.
eval("__import__('os').system('rm -rf /')") # Example of malicious code.
- Unintended Side Effects: It can alter program state, delete files, or perform unintended operations.
- Performance Overhead: Parsing and executing strings dynamically is slower than native code execution.
Example of Safe eval() Usage
allowed_globals = {"__builtins__": None}
allowed_locals = {"x": 10, "y": 20}
expression = "x + y"
try:
result = eval(expression, allowed_globals, allowed_locals)
print(result) # Output: 30
except Exception as e:
print(f"Error: {e}")
Use Alternatives When Possible
Use ast.literal_eval() for evaluating strings containing only literals (numbers, strings, tuples, lists, etc.). It is safer than eval().
import ast
result = ast.literal_eval("[1, 2, 3]") # Safely evaluates a list literal.
The exec() function in Python dynamically executes Python code. Unlike eval(), which evaluates a single expression, exec() can execute entire blocks of code, including statements, definitions, loops, and imports.
(function) def exec(
source: str | ReadableBuffer | CodeType,
globals: dict[str, Any] | None = None,
locals: Mapping[str, object] | None = None,
/,
*,
closure: tuple[CellType, ...] | None = None
) -> None
Execute the given source in the context of globals and locals.
- The source may be a string representing one or more Python statements or a code object as returned by compile().
- The globals must be a dictionary and locals can be any mapping, defaulting to the current globals and locals.
- If only globals is given, locals defaults to it.
- The closure must be a tuple of cellvars, and can only be used when source is a code object requiring exactly that many cellvars.
code = """
x = 10
y = 20
result = x + y
"""
exec(code)
print(result) # Output: 30
Why Do We Need exec()?
exec() is useful in situations where:
- Dynamic Code Execution: You need to run dynamically generated or loaded Python code. Example: Executing a Python script stored in a database or user-provided script.
- Flexible Program Design: It enables execution of Python code defined outside the program at runtime. Example: Plugins or runtime scripting for applications.
- Prototyping and Testing: Allows for quick testing of dynamically generated Python code.
Best Practices for Using exec()
- Avoid Untrusted Input: Never pass unvalidated user input to exec().
- Use a Restricted Environment: Limit access to sensitive resources using controlled globals and locals. Disable built-in functions by passing {"__builtins__": None}.
Limitations of exec()
- Lack of Isolation: Code executed with exec() runs in the same interpreter, affecting or being affected by global and local state.
- Complexity in Debugging: Tracing errors and bugs in dynamically executed code can be challenging.
- Performance: exec() incurs runtime overhead and is generally slower than pre-compiled or static code.
- No Return Value: Unlike eval(), which returns the result of the expression, exec() returns None. Any results must be extracted via the locals or globals dictionaries.
How to Overcome the Limitations
- Restricted Execution Environment: Use restricted globals and locals to control variable and resource access.
- Compile for Validation: Use compile() to pre-compile code for syntax validation before executing.
code = """
x = 10
y = 20
result = x + y
"""
compiled_code = compile(code, '<string>', 'exec')
exec(compiled_code)
print(result) # Output: 30
Safe Example of exec() Usage
safe_globals = {"__builtins__": None}
safe_locals = {"x": 10, "y": 20}
code = """
result = x + y
"""
try:
exec(code, safe_globals, safe_locals)
print(safe_locals["result"]) # Output: 30
except Exception as e:
print(f"Error: {e}")
Use eval() when:
You need to evaluate a single Python expression and return its value.
Example: result = eval("x + y", {"x": 10, "y": 20})
Use exec() when:
You need to execute more complex or multi-line Python code, such as loops, function definitions, or imports.
code = """
def greet():
return "Hello, World!"
message = greet()
"""
exec(code)
print(message) # Output: Hello, World!
Both functions should be used cautiously and only when necessary, with a strong emphasis on security and input validation.
When two instances of a class are compared using the equality operator (==) and the class does not implement the __eq__ method, Python falls back to the default behavior inherited from the object class. Here's what happens:
- Identity Comparison: By default, == checks whether the two instances are the same object in memory. This is equivalent to using the is operator. If the two instances have the same memory address (i.e., they are the same object), the comparison returns True; otherwise, it returns False.
- No Attribute-Based Comparison: Python does not compare the attributes of the instances unless __eq__ is explicitly implemented.
- Fallback to is: Essentially, without a custom __eq__, a == b is the same as a is b.
class MyClass:
pass
obj1 = MyClass()
obj2 = MyClass()
# Comparing two different instances
print(obj1 == obj2) # Output: False (default is identity comparison)
# Assigning one reference to another
obj3 = obj1
print(obj1 == obj3) # Output: True (same object in memory)
How to Customize Equality Behavior
To customize equality comparison, you can implement the __eq__ method in your class. This allows you to define specific rules for comparing instances based on their attributes or other logic.
class MyClass:
def __init__(self, value):
self.value = value
def __eq__(self, other):
if isinstance(other, MyClass):
return self.value == other.value
return False
obj1 = MyClass(10)
obj2 = MyClass(10)
obj3 = MyClass(20)
print(obj1 == obj2) # Output: True (values are equal)
print(obj1 == obj3) # Output: False (values are different)
Key Points to Remember
- Default Behavior: Without __eq__, == checks for object identity (whether they are the same object in memory). This is equivalent to a is b.
- Customizing Comparison: Implementing __eq__ allows you to define equality based on attributes or logic.
- Implementation Best Practices: Ensure __eq__ handles comparisons with objects of other types gracefully. Typically, you should implement __eq__ alongside __hash__ if instances of the class are to be used as dictionary keys or in sets; this ensures consistent behavior (see the sketch below).
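A minimal sketch of the __eq__/__hash__ pairing mentioned above (the Point class is just an illustration):

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if isinstance(other, Point):
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    def __hash__(self):
        # Equal objects must hash equally to behave consistently in sets/dicts
        return hash((self.x, self.y))

print(len({Point(1, 2), Point(1, 2)}))  # 1 -- the duplicates collapse in a set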
Lookups in Python's dict are incredibly fast because of its underlying hash table implementation. Here's a step-by-step explanation of how lookups work under the hood in a Python dictionary.
1. Hash Table Basics
A hash table is the data structure behind Python's dict. It:
- Maps keys to values using hashing.
- Organizes data in an array-like structure where each index is a "bucket."
- Uses a hash function to calculate an index based on the key.
Key Terms:
- Hash Function: Computes a hash value for a key. Python uses the built-in hash() function for this.
- Bucket: A slot in the hash table where key-value pairs are stored.
- Collision: When two different keys produce the same hash value, requiring additional logic to distinguish them.
2. Anatomy of a Python Dictionary
- A dict is backed by a dynamic array of fixed-size slots.
- Each slot holds:
  - The hash of the key.
  - The key itself.
  - The associated value.
- This combination ensures that even in cases of hash collisions, Python can check for equality (__eq__) to find the right key.
3. The Lookup Process
Let's break down how Python finds the value for a given key in a dict:
Step 1: Compute the Hash
- Python calls the hash() function on the key to compute its hash value.
- Example:

key = "example"
hash_value = hash(key)  # Example output: 123456789
Step 2: Map the Hash to a Bucket Index
- The hash value is converted to a bucket index using the modulo operation:
index = hash_value % len(hash_table)
- This ensures the hash value fits within the size of the hash table.
Step 3: Probe the Bucket
- Python looks at the computed index in the hash table.
- If the bucket contains the hash, key, and value, and the key matches, the lookup succeeds.
Step 4: Handle Collisions
- If there's a collision (two keys hash to the same index), Python resolves it using open addressing with probing:
  - It probes subsequent buckets (linear, quadratic, or another strategy) to find an empty or matching slot.
  - During probing, Python compares the hash and key (__eq__) to ensure it matches the desired key.
Step 5: Return the Value
- Once the correct key is found, Python retrieves the associated value.
4. Example Walkthrough
# Create a dictionary
my_dict = {"name": "Alice", "age": 25}
# Lookup: my_dict["name"]
key = "name"
- Compute the Hash:

hash("name")  # Example hash value: 452398476

- Map to a Bucket: Suppose the hash table size is 8:

index = 452398476 % 8  # Example index: 4

- Probe the Bucket:
  - Python checks the bucket at index 4: if it finds ("name", hash("name"), "Alice"), the lookup succeeds.
- Resolve Collisions (if necessary):
  - If there's a collision at index 4, Python probes nearby buckets (e.g., index 5, 6, etc.) until it finds the matching key or an empty slot.
- Return the Value:
  - Once the correct key is found, Python retrieves "Alice".
5. Handling Key Errors
If the key doesn’t exist:
- Python probes the hash table as described but doesn’t find the key.
- A KeyError is raised.
my_dict = {"name": "Alice"}
my_dict["age"] # Raises KeyError
6. Hash Collisions and Probing
What is a Collision?
- Two distinct keys may produce the same hash value.
- Example:
key1 = "abc"
key2 = "xyz"
hash(key1) == hash(key2)  # Possible, though rare
Collision Resolution in Python
- Python resolves collisions using open addressing:
- It searches for the next available bucket when a collision occurs.
7. Hash Table Resizing
When the dictionary grows too full, Python resizes the hash table to maintain efficiency.
Resizing Process:
- A new, larger hash table is allocated (typically double the size).
- All existing key-value pairs are rehashed and moved to the new table.
- This ensures lookups remain fast.
Load Factor:
- Python resizes the hash table when the load factor (ratio of items to table size) exceeds a threshold (usually ~2/3 full).
8. Efficiency
- Best Case: Lookup, insertion, and deletion are O(1) on average.
- Worst Case: In the rare event of many collisions, lookup can degrade to O(n), but this is mitigated by good hash functions and resizing.
9. Advantages of Python's Dict Implementation
- Speed: Lookups, insertions, and deletions are incredibly fast on average (O(1)).
- Dynamic Resizing: Ensures performance is maintained even as the dictionary grows.
- Collision Handling: Efficiently manages hash collisions with open addressing and probing.
- Memory Efficiency: Combines hash values, keys, and values in a compact structure.
Summary of Lookup Workflow
- Compute the hash of the key using hash().
- Map the hash to a bucket index using hash % table_size.
- Probe the bucket for the matching key and value.
- Resolve collisions if needed.
- Return the value if found or raise a KeyError if not.
This design makes Python's dict one of the most efficient and versatile data structures for key-value storage.
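To tie the workflow together, here is a toy open-addressing table with linear probing. This is an illustrative sketch only, not CPython's actual dict implementation (which is far more sophisticated):

class ToyDict:
    def __init__(self, size=8):
        self.slots = [None] * size  # each slot: (hash, key, value)

    def _probe(self, key):
        h = hash(key)
        i = h % len(self.slots)
        while self.slots[i] is not None:
            slot_hash, slot_key, _ = self.slots[i]
            if slot_hash == h and slot_key == key:
                return i                   # found the matching key
            i = (i + 1) % len(self.slots)  # linear probing on collision
        return i                           # empty slot: key absent / insert point

    def __setitem__(self, key, value):
        # No resizing here, so this toy table must never become completely full
        self.slots[self._probe(key)] = (hash(key), key, value)

    def __getitem__(self, key):
        slot = self.slots[self._probe(key)]
        if slot is None:
            raise KeyError(key)
        return slot[2]

d = ToyDict()
d["name"] = "Alice"
print(d["name"])  # Alice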
Why Use hash % table_size to Map to a Bucket Index?
The %
operator (modulo) is used in hash table implementations to ensure that a hash value is mapped to a valid index within the bounds of the hash table. This approach, known as modulo hashing, is both simple and effective for distributing keys across buckets. Let’s break it down in detail:
1. Why Do We Need to Map Hash Values to Buckets?
The hash() function in Python generates a large integer (possibly negative) based on the key. However:
- The hash table (an array-like structure) has a finite size, typically much smaller than the range of possible hash values.
- To place the key-value pair into the hash table, the large hash value must be reduced to fit within the table's size (e.g., an array of size n).
The challenge: we need to map the potentially huge range of hash values to the limited range [0, table_size - 1].
2. How Modulo Operation Helps
The modulo operator % computes the remainder of a division. When applied to a hash value:

index = hash_value % table_size

- This operation guarantees that the resulting index is always between 0 and table_size - 1, ensuring it fits within the array bounds.
- For example:
  - If hash_value = 12345 and table_size = 8, then index = 12345 % 8 = 1.
  - If hash_value = -12345 and table_size = 8, Python's modulo keeps the result non-negative: index = -12345 % 8 = 7.
3. Why Modulo Specifically?
Uniform Distribution
- Modulo distributes hash values uniformly across the available buckets, minimizing clustering.
- Even if the hash values are large or arbitrary, % table_size ensures a fair spread of keys.
Efficiency
- The modulo operation is computationally inexpensive, making it ideal for real-time lookups.
- It avoids more complex or slower methods of mapping hash values to indices.
Wrap-Around Behavior
- Modulo inherently "wraps around" the range of possible indices. For example, if table_size = 8, valid indices are [0, 1, 2, 3, 4, 5, 6, 7]; any hash value is reduced into this range.
4. An Example Walkthrough
Let's consider a hash table with 8 buckets (table_size = 8):
Keys and Their Hash Values
keys = ["apple", "banana", "cherry"]
hash_values = [hash("apple"), hash("banana"), hash("cherry")]
Calculating Bucket Indexes
bucket_indices = [hash(key) % 8 for key in keys]
Suppose the hash values are:
hash("apple") = 12345
hash("banana") = 67890
hash("cherry") = 13579
The modulo operation maps these to indices:
- 12345 % 8 = 1 → "apple" is placed in bucket 1.
- 67890 % 8 = 2 → "banana" is placed in bucket 2.
- 13579 % 8 = 3 → "cherry" is placed in bucket 3.
5. Why Not Use Other Methods Instead of Modulo?
Direct Hash Value
Using the raw hash value as the index would not work because:
- Hash values can be arbitrarily large (or negative), so they would exceed the bounds of the table.
- It would lead to excessive memory usage if the table had to accommodate all possible hash values.
Multiplication or Bitwise Operations
While alternatives like multiplication or bitwise masking are also used in some hash table implementations (e.g., power-of-2 sizes), modulo is straightforward and versatile. However:
- Bitwise AND (&) can replace % if the table size is a power of 2 (e.g., table_size = 8):

index = hash_value & (table_size - 1)

This is faster than modulo but less flexible because it requires a power-of-2 table size.
6. Limitations of Modulo Hashing
- Clustering:
  - If table_size is poorly chosen (e.g., not a prime number), certain patterns in hash values can cause uneven bucket distribution (e.g., many keys end up in the same bucket).
  - To mitigate this, many hash table implementations choose a table size that is a prime number.
- Performance on Resize:
  - When the table grows or shrinks, all hash values need to be re-mapped to new bucket indices.
7. A Deeper Dive into Python's Implementation
Python's dictionary implementation is slightly more sophisticated:
- Instead of directly using hash % table_size, Python uses additional techniques (like a mask for power-of-2 table sizes) to improve performance and minimize clustering.
- The hash table size is dynamically adjusted as the dictionary grows, maintaining a load factor below a threshold (e.g., ~2/3 full).
8. Summary
Using hash % table_size ensures:
- Index Validity: The resulting index is always within the bounds of the hash table.
- Uniformity: It spreads keys evenly across buckets, minimizing collisions.
- Simplicity: The modulo operation is efficient and easy to implement.
Without %, the hash value would either:
- Be too large to fit in the hash table.
- Cause indexing errors or memory wastage.
By using % table_size, Python ensures that its dictionaries remain fast, efficient, and reliable for key lookups.
Yes, exactly! When storing a key-value pair in a Python dictionary, the same hashing and modulo calculation is performed to determine the index (bucket) where the pair will be stored. Let’s break this process down step by step:
1. Storing a Key-Value Pair in a Python Dictionary
Suppose you want to store a key-value pair like this:
my_dict["name"] = "Alice"
Step-by-Step Process
- Compute the Hash of the Key: Python computes the hash of the key using the hash() function. For example:

key = "name"
hash_value = hash(key)  # Example output: 123456789

- Calculate the Bucket Index: Using the modulo operation, the hash value is mapped to a bucket index in the hash table:

index = hash_value % table_size

If the hash table has 8 buckets (table_size = 8), the calculation might look like this:

index = 123456789 % 8  # Example output: 5

- Store the Key-Value Pair in the Bucket:
  - Python checks the bucket at the calculated index (index = 5).
  - If the bucket is empty, the key-value pair is stored directly in the bucket.
  - If the bucket is already occupied (due to a hash collision), Python uses probing (e.g., linear or quadratic probing) to find the next available slot. The key and its hash are stored along with the value to allow proper collision resolution later.
2. Example Walkthrough
Let’s store multiple key-value pairs in a dictionary:
my_dict = {}
my_dict["name"] = "Alice"
my_dict["age"] = 25
my_dict["city"] = "New York"
Key-Value Pair: "name": "Alice"
- Compute the Hash:

hash("name")  # Example hash value: 123456789

- Calculate the Index:

index = 123456789 % 8  # Example table_size = 8
index = 5

- Store in Bucket:
  - The pair ("name", hash("name"), "Alice") is stored in bucket 5.

Key-Value Pair: "age": 25
- Compute the Hash:

hash("age")  # Example hash value: 987654321

- Calculate the Index:

index = 987654321 % 8
index = 1

- Store in Bucket:
  - The pair ("age", hash("age"), 25) is stored in bucket 1.

Key-Value Pair: "city": "New York"
- Compute the Hash:

hash("city")  # Example hash value: 567890123

- Calculate the Index:

index = 567890123 % 8
index = 3

- Store in Bucket:
  - The pair ("city", hash("city"), "New York") is stored in bucket 3.
3. Handling Collisions While Storing
When two keys produce the same hash value or map to the same index, Python uses collision resolution techniques to ensure that both key-value pairs can coexist.
Collision Example
If we add a new key "job" with hash("job") = 123456789, it would map to the same index as "name" (index = 5).
Steps to Handle the Collision:
- Python detects that bucket 5 is already occupied.
- It compares the new key's hash and key with those stored in the bucket.
- The hashes are equal here, but the keys are not ("job" != "name"), so Python probes (checks nearby buckets) to find the next available slot.
- The new pair ("job", hash("job"), "Engineer") is stored in a different bucket.
4. Why Hash Calculation Happens Both During Storage and Lookup
The same hashing and modulo operation are used for both storage and lookup to ensure consistency:
- During Storage: The hash and modulo calculation determine the bucket where the key-value pair is stored.
- During Lookup: The hash and modulo calculation are repeated to find the same bucket and locate the value.
This consistency is key to the efficiency of hash tables.
5. Dynamic Resizing of the Hash Table
When the dictionary becomes too full:
- Python allocates a larger hash table (usually doubling its size).
- All existing key-value pairs are rehashed and redistributed into the new table.
This resizing ensures that:
- Collisions remain rare.
- Performance remains efficient (O(1) average time complexity for operations).
6. Summary
- During Storage:
  - Compute the hash of the key.
  - Map the hash to a bucket index using % table_size.
  - Store the key, its hash, and the value in the bucket.
  - Resolve collisions if needed.
- During Lookup:
  - The same hash and index calculations are repeated to locate the correct bucket.
This approach ensures fast and reliable storage and retrieval of key-value pairs in Python dictionaries.
When two keys hash to the same index due to a hash collision (i.e., different hash values but the same bucket index from hash % table_size), Python uses a strategy called open addressing with probing to resolve the collision. Let's break down how this works during storage and retrieval:
1. Storing Key-Value Pairs with Collisions
Example Scenario:
- Hash table size (table_size) = 8.
- Key 1: hash(key_1) = 12347, so index = 12347 % 8 = 3. Stored in bucket 3.
- Key 2: hash(key_2) = 23459, so index = 23459 % 8 = 3 (collision with key 1).
How Python Resolves the Collision During Storage:
- Detect the Collision: When trying to store key 2, Python finds bucket 3 already occupied (by key 1).
- Probe for the Next Free Bucket: Python probes the next bucket (index 4, then 5, etc.) until it finds an empty bucket. The probing strategy is typically linear probing (incrementing the index) or quadratic probing (skipping more buckets as the probing progresses). Let's assume Python finds bucket 5 is free.
- Store the Key-Value Pair in the Free Bucket: Python stores ("key 2", hash(key_2), value_2) in bucket 5.
2. Retrieving Key-Value Pairs After a Collision
Now, when retrieving key 2, Python must locate it despite the collision. The process involves probing again to find the correct key.
How Python Resolves Collisions During Retrieval:
- Start at the Calculated Index: Python computes the hash and initial index for key 2 (index = hash(key_2) % 8 = 3) and checks bucket 3.
- Check the Key in the Bucket: Bucket 3 contains ("key 1", hash(key_1), value_1). Python compares key 2 with the key stored in bucket 3: the stored hash does not match, or the keys are not equal (key 2 != key 1).
- Probe the Next Bucket: Python continues probing to the next bucket (index 4, index 5, etc.). At bucket 5 it finds a slot where the hash matches (hash(key_2)) and the keys are equal (key 2 == key 2).
- Return the Value: Python retrieves the value associated with key 2.
3. Why This Works:
- During both storage and retrieval, Python uses the same probing sequence:
  - The initial index is always calculated as hash % table_size.
  - If a collision occurs, Python probes the table in a predictable order.
- This ensures consistency: the same key always maps to the same index and follows the same probing path.
4. Handling Multiple Collisions
If there are multiple collisions (e.g., keys 1, 2, and 3 all hash to index 3):
- Python probes further to find empty buckets for storage.
- During retrieval, it probes until the correct key is found or an empty bucket is encountered (indicating the key does not exist).
5. Efficiency of Probing
- Average Case: Probing is efficient when the hash table is sparsely populated. Most lookups resolve in a few probes (O(1) on average).
- Worst Case: If the table is densely populated, multiple collisions can lead to longer probing chains (O(n) in the worst case). To mitigate this, Python resizes the table when it gets too full, ensuring a low load factor (typically ≤ 2/3).
6. Summary
- When a collision occurs during storage:
- Python detects it and probes for the next available bucket to store the new key-value pair.
- During retrieval:
- Python starts at the calculated index and probes in the same sequence until the correct key is found or an empty bucket is reached.
- This method ensures that all keys are retrievable even when collisions occur, while maintaining consistent performance through resizing and efficient probing strategies.
You're absolutely correct! While Python dictionaries are designed to achieve O(1) average time complexity for lookups, their actual performance depends on how densely the hash table is populated and the presence of collisions. This means that:
- Lookups are average-case O(1) when collisions are rare or minimal.
- Lookups can degrade to O(1 + k), where k is the number of probes required to find the desired key in the case of collisions.
Let's explain this in more detail:
1. How Probing Affects Lookup Time
When a collision occurs during key lookup:
- The hash value of the key is used to calculate the initial index with
hash % table_size
. - If the key is not in the bucket at that index, Python probes (checks other buckets) to locate the key.
- The number of probes required is determined by the collision resolution strategy (e.g., linear probing, quadratic probing).
Key Insight: The lookup time becomes O(1 + k), where k is the number of probes required to find the key. The value of k depends on:
- The number of collisions.
- The load factor of the hash table.
2. Factors Affecting k
a. Load Factor
The load factor is the ratio of the number of entries in the dictionary to the total number of buckets in the hash table:
Load Factor = Number of Entries / Table Size
- Low Load Factor: Fewer collisions, k is small, performance is close to O(1).
- High Load Factor: More collisions, k increases, performance degrades.
Python resizes the hash table when the load factor exceeds a certain threshold (typically ~2/3), keeping k small.
b. Quality of the Hash Function
- A good hash function ensures an even distribution of keys across the table, reducing clustering and minimizing k.
- Poor hash functions may produce many collisions, increasing k.
c. Probing Strategy
- Linear Probing: k grows linearly with the number of collisions. For example: if index 3 is full, check 4, then 5, etc.
- Quadratic Probing: k grows quadratically, spreading probes out to reduce clustering. For example: if index 3 is full, check 3+1^2, 3+2^2, 3+3^2, ...
- Double Hashing: Uses a second hash function to determine the probing step size, reducing k.
3. Best Case vs. Worst Case Lookup Time
Best Case (O(1)):
- The key maps directly to a bucket with no collisions.
- The value is retrieved immediately.
Worst Case (O(1 + n)):
- Every key hashes to the same index, causing n collisions.
- Probing traverses all buckets in the table.
- Python mitigates this by resizing the hash table when the load factor becomes too high.
4. Python's Optimizations to Keep k Small
Python dictionaries are highly optimized to keep k low:
- Dynamic Resizing: When the table gets too full, it is resized (typically doubled in size), and all keys are rehashed and redistributed. This keeps the load factor low, reducing collisions.
- Sparse Tables: Python deliberately leaves the table sparsely populated (load factor ≤ 2/3) to minimize collisions and k.
- Hash Function Design: Python's hash() function ensures good key distribution to reduce clustering.
- Efficient Probing: Python uses a combination of techniques (including linear and secondary probing) to efficiently resolve collisions.
5. Practical Implications
While theoretical performance can degrade to O(n) in the worst case (e.g., extremely poor hash function or high load factor), Python's dictionary implementation ensures that:
- k remains very small in most practical scenarios.
- Lookup performance stays close to O(1) for well-designed use cases.
6. Summary
Yes, dictionary lookups in Python can be O(1 + k), where k is the number of probes required in case of collisions. However:
- Python's design keeps k small by maintaining a low load factor and using a good hash function.
- In practice, Python dictionaries are extremely efficient and perform close to O(1) in almost all cases.
The outputs of the code:
print(round(7.5)) # 8
print(round(6.5)) # 6
Explanation:
Python's built-in round()
function rounds numbers to the nearest integer. When the fractional part of a number is exactly 0.5, Python follows a specific rule to break the tie: it rounds to the nearest even integer. This rule is called "round half to even" or "bankers' rounding" and is used to reduce rounding bias in large datasets.
-
round(7.5)
:- The fractional part is 0.5.
- The nearest integers are 7 and 8.
- Since 8 is even, Python rounds 7.5 to 8.
-
round(6.5)
:- The fractional part is 0.5.
- The nearest integers are 6 and 7.
- Since 6 is even, Python rounds 6.5 to 6.
This behavior ensures that rounding errors are distributed evenly rather than consistently rounding up or down.
Instead of the traditional method, Python uses what's known as "Banker's Rounding" or "round half to even": if a number is equidistant between two integers, it rounds to the nearest even one.
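If the traditional round-half-up behavior is needed instead, the standard-library decimal module provides it; a minimal sketch:

from decimal import Decimal, ROUND_HALF_UP

print(Decimal("6.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))  # 7
print(Decimal("7.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))  # 8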
from dataclasses import dataclass, field
@dataclass
class Person:
name: str
age: int
height: float
weight: float
# vip: bool = False # default can be provided like this as well for the immutable data types
vip: bool = field(default=False)
friends_list: list[str] = field(default_factory=list) # default value for the mutable data types must be provided like this
p1 = Person("Logesh", 24, 6, 70, True, ['all', 'in', 'all'])
print(p1)
__post_init__ allows for initializing field values that depend on one or more other fields (see the sketch below). The __post_init__ method runs only once, immediately after __init__.
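A minimal sketch of a field derived from other fields in __post_init__ (the Rectangle class is just an illustration):

from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # computed, not passed to __init__

    def __post_init__(self):
        # Runs once, right after the generated __init__
        self.area = self.width * self.height

r = Rectangle(3.0, 4.0)
print(r.area)  # 12.0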
Init-only variables
If a field is an InitVar, it is considered a pseudo-field called an init-only field. As it is not a true field, it is not returned by the module-level fields() function. Init-only fields are added as parameters to the generated __init__() method, and are passed to the optional __post_init__() method. They are not otherwise used by dataclasses.
An InitVar is meant to be used only during initialization, not as an attribute. For example, if is_rare is declared as an InitVar with a default value of 1 and an instance is created passing 1000 for is_rare, reading is_rare from the instance still returns the default (1), not the value that was passed (see the sketch below).
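A minimal sketch of that behavior (the Item class and is_rare field are illustrative; the instance read falls back to the class-level default because no instance attribute is ever created):

from dataclasses import dataclass, InitVar

@dataclass
class Item:
    price: float
    is_rare: InitVar[bool] = False  # init-only: passed to __post_init__, not stored

    def __post_init__(self, is_rare):
        if is_rare:
            self.price *= 10

item = Item(5.0, is_rare=True)
print(item.price)    # 50.0
print(item.is_rare)  # False -- the class-level default, not the passed value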
As in a normal class, we can also use the property decorator in a dataclass.
TaskGroup is a high-level construct in Python's asyncio library, introduced in Python 3.11. It simplifies working with multiple asynchronous tasks by managing their lifecycle in a structured and safe manner, similar to asyncio.gather or asyncio.create_task but with improved error handling.
Key Features of TaskGroup
- Structured Concurrency: Tasks are managed as a group, ensuring that their lifecycle is bound to the TaskGroup context.
- Automatic Cancellation: If one task fails, all other tasks in the group are automatically canceled, ensuring that the program doesn't continue in an inconsistent state.
- Error Propagation: Errors are automatically propagated and aggregated, making debugging easier.
Example: Using TaskGroup
import asyncio
async def task1():
await asyncio.sleep(1)
print("Task 1 done")
async def task2():
await asyncio.sleep(2)
print("Task 2 done")
async def main():
async with asyncio.TaskGroup() as tg:
tg.create_task(task1())
tg.create_task(task2())
asyncio.run(main())
Key Methods
- create_task(coro): Schedules a coroutine as a task within the TaskGroup.
Exception Handling Example
async def faulty_task():
await asyncio.sleep(1)
raise ValueError("Something went wrong!")
async def main():
try:
async with asyncio.TaskGroup() as tg:
tg.create_task(task1())
tg.create_task(faulty_task())
except Exception as e:
print(f"Caught an exception: {e}")
asyncio.run(main())
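Note that a TaskGroup aggregates task failures into an ExceptionGroup, so the except* syntax (also new in Python 3.11) can be used to match specific exception types inside the group; a sketch reusing faulty_task from above:

async def main():
    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(faulty_task())
    except* ValueError as eg:
        print(f"Caught: {eg.exceptions}")  # tuple of the matching exceptions

asyncio.run(main())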
Why Use TaskGroup
?
- Simpler Code: Easier management of multiple tasks.
- Better Error Handling: Automatic cancellation of related tasks on failure.
- Cleaner API: More readable and maintainable than manual task management.
The walrus operator (:=), officially known as the assignment expression, was introduced in Python 3.8. It allows you to assign a value to a variable within an expression.
name := expression
Benefits:
- Reduces code length: Combines assignment and evaluation into one step.
- Improves readability: Keeps related expressions together.
- Eliminates repetitive operations: Avoids calculating the same expression multiple times.
without Walrus operator
n = len(a)
if n > 10:
print(f"List is too long ({n} elements, expected <= 10)")
with Walrus operator
if (n := len(a)) > 10:
print(f"List is too long ({n} elements, expected <= 10)")
In Python, match-case is a control flow statement introduced in Python 3.10, used for pattern matching. It's similar to switch statements in other languages but is much more powerful and flexible, enabling you to match against complex data structures and conditions.
match subject:
case pattern1:
# code block for pattern1
case pattern2:
# code block for pattern2
case _:
# code block for default case
Key Points
- match subject: The subject is the value being tested.
- case pattern: Patterns are matched against the subject. If a pattern matches, the corresponding code block runs.
- Wildcard _: The _ is a wildcard that matches anything (acts as a "default" case).
Examples
Basic Example
def http_status(status):
match status:
case 200:
return "OK"
case 404:
return "Not Found"
case 500:
return "Internal Server Error"
case _:
return "Unknown Status"
print(http_status(200)) # Output: OK
print(http_status(403)) # Output: Unknown Status
Matching Data Structures
def process_data(data):
match data:
case {"type": "error", "code": code}:
return f"Error with code {code}"
case {"type": "success", "value": value}:
return f"Success with value {value}"
case _:
return "Unknown data structure"
print(process_data({"type": "error", "code": 404})) # Output: Error with code 404
print(process_data({"type": "success", "value": 42})) # Output: Success with value 42
Nested Structures
def analyze_point(point):
match point:
case (0, 0):
return "Origin"
case (x, 0):
return f"Point on the x-axis at {x}"
case (0, y):
return f"Point on the y-axis at {y}"
case (x, y):
return f"Point at ({x}, {y})"
case _:
return "Not a point"
print(analyze_point((0, 0))) # Output: Origin
print(analyze_point((3, 0))) # Output: Point on the x-axis at 3
Using Guards
Guards allow conditional matching using if
.
def describe_number(num):
match num:
case x if x < 0:
return "Negative number"
case x if x == 0:
return "Zero"
case x if x > 0:
return "Positive number"
print(describe_number(-5)) # Output: Negative number
When to Use match
- For Structured Data: Works great with dictionaries, tuples, lists, and custom objects.
- Cleaner Alternative: When many if-elif statements make the code less readable.
- Complex Matching: When you need to test patterns or conditions beyond basic equality.
This makes match versatile and suitable for modern, pattern-matching-based programming paradigms.
The success of a case statement is based on two things:
- The pattern matching the case.
- [Optional] If the case has a condition (guard clause), the pattern must match first and then the guard is evaluated; only if the guard succeeds does the case body execute.
The variable in a case is first assigned the value from the match subject if it matches the pattern, and only then is the guard condition checked.
If the default case is reached, every capture variable in the cases before it that matched the pattern will have been assigned that value, regardless of whether its guard condition succeeded.
Once a pattern matches, the guard condition (if any) is evaluated; without a guard, a pattern match alone is enough to execute the case.
- If the first case matches successfully, only that case's variable is assigned the value; the remaining cases are never evaluated, so their variables are not assigned.
- If the second case's guard succeeds, both the first and second case variables are assigned the match value, provided the first case matched the pattern but failed its guard clause.
It is not always necessary to use a different variable name for each case. But if we do, the match value is assigned to those variables whenever they match the pattern (regardless of whether the guard succeeds).
Only one default (wildcard) case may exist. If both case default_case: and case _: try to catch the wildcard, we encounter the error:
SyntaxError: name capture 'default_case' makes remaining patterns unreachable
- We can just play with the patterns.
- To conclude: the pattern must match before the value is assigned to the case variable, and the guard condition (if provided) must then be satisfied for the case body to execute (see the sketch below).
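A small sketch of the capture-then-guard order described above (the function and variable names are illustrative):

def describe(x):
    match x:
        case n if n > 10:
            return f"big {n}"
        case m if m > 5:
            return f"medium {m}"
        case _:
            return "small"

# The capture pattern `n` matches any subject, so n is bound first; only then
# is the guard `n > 10` evaluated. When the guard fails, matching falls
# through to the next case, but n remains bound to the subject.
print(describe(7))  # medium 7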
Memoization is an optimization technique used to speed up programs by storing the results of expensive function calls and reusing those results when the same inputs occur again. In Python, memoization can be implemented manually or by using built-in tools like functools.lru_cache or functools.cache.
How Memoization Works
- Store Results: When a function is called with specific arguments, the result is stored in a data structure (e.g., dictionary).
- Reuse Results: If the function is called again with the same arguments, the stored result is returned instead of recomputing it.
This technique is particularly useful for recursive functions or functions with repeated computations.
Manual Implementation of Memoization
You can manually implement memoization using a dictionary to cache results.
Example: Fibonacci with Manual Memoization
def fib(n, memo={}): # Default argument is a mutable dictionary
if n in memo: # Check if the result is already in the cache
return memo[n]
if n <= 1: # Base cases
return n
# Compute and store the result in the cache
memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
return memo[n]
print(fib(10)) # Output: 55
print(fib(50)) # Output: 12586269025
- memo is a dictionary that stores results for specific n.
- When fib(10) is called, results for smaller Fibonacci numbers (like fib(9), fib(8), etc.) are stored in memo.
Python provides a built-in way to add memoization using the functools.lru_cache
decorator. It automatically caches function results without requiring manual caching logic.
Example: Fibonacci with lru_cache
from functools import lru_cache
@lru_cache(maxsize=None) # Cache with unlimited size
def fib(n):
if n <= 1: # Base cases
return n
return fib(n - 1) + fib(n - 2)
print(fib(10)) # Output: 55
print(fib(50)) # Output: 12586269025
- @lru_cache: Caches results of the decorated function.
- maxsize: Limits the cache size. If maxsize=None, the cache can grow indefinitely.
- Efficient and Easy: Eliminates the need to manage a manual cache.
Comparison: Manual vs. lru_cache

Feature | Manual Memoization | functools.lru_cache
---|---|---
Setup | Requires manual implementation | Simple decorator syntax
Flexibility | Fully customizable | Limited to lru_cache features
Cache Size Control | Unlimited or custom logic | Controlled with maxsize
Ease of Use | More effort | Easier, concise implementation
Practical Use Cases for Memoization
- Recursive Functions:
  - Fibonacci numbers
  - Factorials
  - Dynamic programming problems (e.g., knapsack, longest common subsequence)
- Expensive Calculations:
  - Complex mathematical computations
  - Simulations with repeated input parameters
- Caching in Web Applications:
  - Cache database or API query results to avoid redundant calls.
Clearing the Cache
If you need to clear the cache at runtime (e.g., when inputs change or memory needs to be freed), you can use the .cache_clear()
method.
Cache Info
The @lru_cache and @cache decorators provide the .cache_info() method, which returns detailed statistics about the cache:
- hits: Number of times a cached result was used.
- misses: Number of times a result had to be computed.
- maxsize: Maximum size of the cache.
- currsize: Current number of items in the cache.
In Python 3.9 and later, functools.cache is available as an alternative to functools.lru_cache for memoization. It provides simple, unlimited caching without the additional features of lru_cache, such as limiting cache size.
How cache Works
- The @cache decorator automatically stores the results of function calls in an internal dictionary.
- If the function is called again with the same arguments, the stored result is returned directly, skipping the computation.
Comparison: cache vs. lru_cache

Feature | functools.cache | functools.lru_cache
---|---|---
Cache Size Limit | Unlimited | Customizable (maxsize)
Eviction Policy | None | Least Recently Used (LRU)
Simplicity | Simpler, no parameters | More flexible
Python Version | Introduced in Python 3.9 | Available since Python 3.2
Advantages of cache
- Unlimited caching: No need to worry about cache size.
- Simple usage: No need to specify parameters like maxsize.
- Good for Static/Fixed Inputs: Ideal when you know your inputs won't grow excessively.
Example
import time
from functools import lru_cache
@lru_cache(maxsize=None) # Cache with unlimited size
def expensive_func(n):
print('Expensive operation started...')
time.sleep(5)
print('Ended')
return n*4
print(expensive_func(4))
print(expensive_func(4))
print(expensive_func.cache_info())
print(expensive_func(5))
print(expensive_func.cache_info())
# Output
# Expensive operation started...
# Ended
# 16
# 16
# CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
# Expensive operation started...
# Ended
# 20
# CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
import time
from functools import cache
@cache
def expensive_func(n):
print('Expensive operation started...')
time.sleep(5)
print('Ended')
return n*4
print(expensive_func(4))
print(expensive_func(4))
print(expensive_func.cache_info())
print(expensive_func(5))
print(expensive_func.cache_info())
# Output
# Expensive operation started...
# Ended
# 16
# 16
# CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
# Expensive operation started...
# Ended
# 20
# CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
@cached_property is a decorator in Python, introduced in Python 3.8, that caches the result of a property method the first time it is accessed. After the first computation, the cached value is returned for all subsequent accesses without recomputing it. It is part of the functools module.
How @cached_property Works
- When the property is accessed for the first time, the decorated method runs, and its result is stored in the instance's dictionary (__dict__).
- On subsequent accesses, the cached value is retrieved directly from the instance's dictionary, avoiding the overhead of recalculating the value.
Use Case
@cached_property is useful for:
- Expensive computations that do not change during the object's lifetime.
- Lazily evaluated attributes that you don't want to calculate until needed.
Example: Basic Usage
from functools import cached_property
class Circle:
def __init__(self, radius):
self.radius = radius
@cached_property
def area(self):
print("Calculating area...")
return 3.14159 * self.radius ** 2
circle = Circle(10)
# First access computes and caches the area
print(circle.area) # Output: "Calculating area..." followed by 314.159
# Subsequent access retrieves the cached value
print(circle.area) # Output: 314.159 (no "Calculating area..." message)
Explanation:
- On the first access to circle.area, the method runs, and the result is stored in circle.__dict__['area'].
- On subsequent accesses, the cached result is used.
Advantages
- Performance: Reduces repeated computations for attributes accessed multiple times.
- Convenience: Makes it easy to implement lazily evaluated properties.
- Simplifies Code: Eliminates the need to manage manual caching logic.
Clearing or Recomputing Cached Properties

If you want to clear or recompute a cached property, delete the cached value from the instance's dictionary using `del`.

Example:

```python
del circle.area     # Remove the cached value
print(circle.area)  # Recomputes and caches again
```
Example with Mutable Objects

If the underlying data that the property depends on changes, you need to clear the cached value manually.

```python
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @cached_property
    def full_name(self):
        return f"{self.first_name} {self.last_name}"

person = Person("John", "Doe")
print(person.full_name)  # Output: "John Doe"

# Update first_name
person.first_name = "Jane"

# Cached value remains outdated
print(person.full_name)  # Output: "John Doe"

# Clear the cache
del person.full_name
print(person.full_name)  # Output: "Jane Doe"
```
Comparison with `@property`

| Feature | `@property` | `@cached_property` |
|---|---|---|
| Recomputed Every Time | Yes | No (cached after first access) |
| Performance | Slower for repeated access | Faster for expensive calculations |
| Cache Management | Not applicable | Cached in `__dict__`, can be cleared manually |
Limitations
- Immutable Usage: It works best with attributes that do not change.
- Instance-Specific: The cache is stored per instance, not shared among instances (see the sketch after this list).
- Manual Cache Clearing: If the dependent data changes, you need to manage the cache explicitly.
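A minimal sketch of the instance-specific behavior, reusing a `Circle` class like the one above:

```python
from functools import cached_property

class Circle:
    def __init__(self, radius):
        self.radius = radius

    @cached_property
    def area(self):
        print(f"Calculating area for radius {self.radius}...")
        return 3.14159 * self.radius ** 2

a, b = Circle(1), Circle(2)
print(a.area)                # computes and caches on instance a
print(b.area)                # computes again -- b has its own cache
print('area' in a.__dict__)  # True: the cached value lives on the instance
```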
Monkey patching refers to the dynamic modification or extension of a class or module at runtime. It allows you to alter or replace methods, attributes, or behaviors of existing classes or modules without modifying their source code.
While powerful, monkey patching can lead to unexpected behavior if not used carefully, as it changes the original implementation of a class or module globally.
How Monkey Patching Works
Monkey patching in Python is possible because of Python's dynamic nature, which allows you to:
- Add new attributes or methods to existing classes.
- Replace existing methods or functions with new implementations.
Example: Modifying a Class Method

Original Class

```python
class Animal:
    def speak(self):
        return "I make a sound"
```

Applying a Monkey Patch

```python
# Original behavior
animal = Animal()
print(animal.speak())  # Output: I make a sound

# Monkey patching the `speak` method
def new_speak(self):
    return "I am patched!"

Animal.speak = new_speak

# Behavior after patching
print(animal.speak())  # Output: I am patched!
```

Here, the `speak` method of the `Animal` class is replaced with the `new_speak` function.
Monkey Patching a Module

You can also modify or extend a module's behavior dynamically.

Example: Monkey Patching the `math` Module

```python
import math

# Add a new function to the math module
def square(x):
    return x * x

math.square = square

# Use the new function
print(math.square(4))  # Output: 16
```
Real-World Example: Patching a Library

Example: Patching `datetime` for Testing

Suppose you want to simulate a specific date and time for testing purposes.

```python
import datetime

# Original behavior
print(datetime.datetime.now())  # Outputs the current date and time

# Monkey patch `datetime.datetime` with a subclass whose `now` is fixed
class FixedDateTime(datetime.datetime):
    @classmethod
    def now(cls, tz=None):
        return cls(2020, 1, 1, 12, 0, 0)

datetime.datetime = FixedDateTime

# Behavior after patching
print(datetime.datetime.now())  # Output: 2020-01-01 12:00:00
```
Advantages of Monkey Patching
- Flexibility: You can dynamically adapt or fix behavior without modifying the original source code.
- Testing: Useful for testing or simulating specific behaviors, such as fixed dates or mocked methods.
- Prototyping: Quick experimentation with changes or enhancements to a class or module.
Disadvantages of Monkey Patching
- Global Impact: Changes apply globally and can lead to unintended side effects.
- Readability: It can make the code harder to understand and debug, as the behavior of the patched class or module differs from its source.
- Fragility: Updates to the original class or module might break the patch.
Best Practices

- Use Sparingly: Avoid monkey patching unless absolutely necessary.
- Scope Patching: If possible, apply monkey patches only within the local scope where needed (see the sketch after this list).
- Document Changes: Clearly document any monkey patches to avoid confusion for other developers.
- Use Libraries: Consider using libraries like `unittest.mock` for temporary patches during testing instead of permanent monkey patches.
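For scope-limited patching, one option is a small context manager that restores the original attribute on exit. A minimal sketch; `temporary_patch` is a hypothetical helper, not a standard-library function:

```python
from contextlib import contextmanager

@contextmanager
def temporary_patch(obj, attr, replacement):
    """Hypothetical helper: apply a patch only inside a with-block."""
    original = getattr(obj, attr)
    setattr(obj, attr, replacement)
    try:
        yield
    finally:
        setattr(obj, attr, original)  # always restore the original

class Animal:
    def speak(self):
        return "I make a sound"

with temporary_patch(Animal, "speak", lambda self: "I am patched!"):
    print(Animal().speak())  # I am patched!

print(Animal().speak())  # I make a sound (original restored)
```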
Alternative to Monkey Patching: Mocking

For testing purposes, it's better to use the `unittest.mock` module to temporarily replace functionality.

Example with `unittest.mock.patch`:

```python
from unittest.mock import patch
import datetime

fixed_now = datetime.datetime(2020, 1, 1, 12, 0, 0)

# `datetime.datetime.now` can't be patched directly -- the class is
# implemented in C and its attributes are read-only -- so patch the class itself.
with patch('datetime.datetime') as mock_dt:
    mock_dt.now.return_value = fixed_now
    print(datetime.datetime.now())  # Output: 2020-01-01 12:00:00

# Outside the patch context, the original behavior is restored
print(datetime.datetime.now())  # Outputs the actual current date and time
```
Conclusion

Monkey patching is a powerful but potentially risky technique in Python. While it allows for dynamic changes to existing code, it's important to use it judiciously, document its use, and consider alternatives like mocking or proper subclassing to avoid unintended side effects.
The `timeit` module in Python is used for measuring the execution time of small code snippets. It is useful for benchmarking and performance tuning. The module provides two primary functions: `timeit()` and `repeat()`.

The `timeit.timeit()` function measures the time it takes to execute a code snippet a specified number of times.

Syntax

```python
timeit.timeit(stmt='pass', setup='pass', timer=<default_timer>, number=1000000, globals=None)
```
- `stmt`: The code snippet to time, as a string. Defaults to `'pass'`.
- `setup`: The setup code to run once before timing the main code. Defaults to `'pass'`.
- `timer`: Timer function to use. Defaults to `time.perf_counter` for high precision.
- `number`: The number of times to execute the code. Defaults to 1,000,000.
- `globals`: A dictionary of global variables accessible to `stmt` and `setup`.
Example

```python
import timeit

# Measure the time to execute a simple statement
execution_time = timeit.timeit('sum(range(100))', number=100000)
print(f"Execution time: {execution_time} seconds")
```
`timeit.repeat`

The `timeit.repeat()` function is similar to `timeit.timeit()`, but it runs the timing multiple times and returns a list of results. This is useful for understanding variability in execution time.

Syntax

```python
timeit.repeat(stmt='pass', setup='pass', timer=<default_timer>, repeat=5, number=1000000, globals=None)
```

- `repeat`: The number of times the timing experiment is repeated.
- Other arguments are the same as in `timeit.timeit()`.
Example

```python
import timeit

# Measure the execution time across multiple runs
times = timeit.repeat('sum(range(100))', repeat=5, number=100000)
print(f"Execution times: {times}")
print(f"Best time: {min(times)} seconds")
```
Difference Between `timeit` and `repeat`

| Feature | `timeit.timeit` | `timeit.repeat` |
|---|---|---|
| Output | Single timing value (float) | List of timing values (list of floats) |
| Use Case | Quick timing of a code snippet | Measure timing variability over multiple runs |
| Customizability | Number of executions (`number`) | Multiple repetitions of timing experiments (`repeat`) |
Using `setup` for Imports

When benchmarking code that requires imported modules or setup code, use the `setup` parameter.

Example:

```python
import timeit

# Use setup to import a module
execution_time = timeit.timeit('sqrt(100)', setup='from math import sqrt', number=100000)
print(f"Execution time: {execution_time} seconds")
```
Comparing Two Code Snippets

You can use `timeit` to compare the performance of two implementations.

Example: Compare `sum()` and a Loop

```python
import timeit

# Using sum()
time_sum = timeit.timeit('sum(range(1000))', number=100000)

# Using a loop
time_loop = timeit.timeit('total=0\nfor i in range(1000): total += i', number=100000)

print(f"Using sum(): {time_sum} seconds")
print(f"Using loop: {time_loop} seconds")
```
Real-World Application

`timeit` is ideal for:
- Benchmarking algorithms: Compare performance between different algorithms.
- Optimizing code: Identify bottlenecks and replace slower code with faster alternatives.
- Learning Python internals: Understand how different Python constructs behave in terms of speed.

Using `timeit` as a CLI Tool

The `timeit` module can also be used directly from the command line.

Example:

```
python -m timeit "sum(range(100))"
```

Sample Output:

```
1000000 loops, best of 5: 0.202 usec per loop
```
Caveats
- Do not include I/O in timing tests: File operations or print statements can skew results.
- Warm-up time: Ignore initial timing results due to interpreter warm-up effects.
- System load: Run tests in a low-load environment for consistent results.
`timeit` and `repeat` are versatile tools for performance testing, and understanding them can help in writing more efficient Python code.
File reading is exhaustive: once `file.read()` has consumed the content, calling `file.read()` again returns an empty string, because the file position stays at the end (similar to an exhausted generator). Use `file.seek(0)` to rewind and read again.
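A minimal sketch (it writes a throwaway `example.txt` first so the snippet is self-contained):

```python
# Create a small file to read
with open('example.txt', 'w') as f:
    f.write('hello world')

with open('example.txt') as f:
    print(f.read())  # 'hello world'
    print(f.read())  # '' -- the file object is exhausted
    f.seek(0)        # rewind the position to the start
    print(f.read())  # 'hello world' again
```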
The `glob` module in Python is used for file and directory path matching. It allows you to search for files and directories based on specific patterns. These patterns can include wildcards like `*` (matches any number of characters) and `?` (matches a single character). The module is particularly useful for tasks such as listing files with specific extensions or finding files that match a naming convention.
Commonly Used Functions in `glob`

1. `glob.glob(pattern, recursive=False)`
   - Returns a list of paths that match the specified pattern.
   - If `recursive` is set to `True`, it can match files in all subdirectories (using the `**` wildcard).

   ```python
   import glob

   # List all Python files in the current directory
   python_files = glob.glob("*.py")
   print(python_files)

   # List all files in subdirectories recursively
   all_files = glob.glob("**/*", recursive=True)
   print(all_files)
   ```

2. `glob.iglob(pattern, recursive=False)`
   - Similar to `glob.glob()`, but returns an iterator instead of a list.

   ```python
   for file in glob.iglob("*.txt"):
       print(file)
   ```

3. `glob.escape(pathname)`
   - Escapes special characters in a path, making it suitable for literal matching.

   ```python
   special_path = glob.escape("file[1].txt")
   print(glob.glob(special_path))  # Matches 'file[1].txt' exactly
   ```

4. Using Wildcards in Patterns
   - `*`: Matches zero or more characters.
   - `?`: Matches exactly one character.
   - `[seq]`: Matches any character in the sequence.
   - `[!seq]`: Matches any character not in the sequence.

   ```python
   # Examples
   print(glob.glob("data_*.csv"))  # Matches files like 'data_1.csv', 'data_abc.csv'
   print(glob.glob("file?.txt"))   # Matches 'file1.txt', 'file2.txt', but not 'file10.txt'
   ```
Example Use Case

```python
import glob
import os

# List all JPEG and PNG files in a directory
# (note: this character-class pattern would also match '.jng' or '.ppg')
image_files = glob.glob("images/*.[jp][pn]g")
for file in image_files:
    print(f"Found image: {os.path.basename(file)}")
```
About `recursive`

```python
from glob import glob

print(glob('*.py', recursive=True))      # current dir ./*.py -- recursive=True has no effect without **
# recursive=True only changes the meaning of the ** wildcard
print(glob('**/*.py', recursive=False))  # one-level nested search: ./some_dir/*.py
print(glob('**/*.py', recursive=True))   # recursive search to any depth: ./all/the/way/down/*.py
```
The `glob` module is efficient for pattern-based file searching, but if you need more advanced features, consider using the `os` or `pathlib` modules alongside it.
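For reference, `pathlib` offers equivalent pattern matching that returns `Path` objects instead of strings:

```python
from pathlib import Path

print(list(Path('.').glob('*.py')))   # like glob('*.py')
print(list(Path('.').rglob('*.py')))  # like glob('**/*.py', recursive=True)
```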
Serialization and Deserialization

In Python, the `pickle` module is used for serializing (converting Python objects into a byte stream) and deserializing (converting byte streams back into Python objects). It is particularly useful for saving Python objects to files or transmitting them over a network.

Key Features of `pickle`

- Supports Python-specific objects: Can serialize and deserialize complex Python objects, including custom classes, functions, and recursive structures.
- Binary format: Encodes data in a compact binary format, making it faster than text-based formats like JSON.
- Python-specific: Pickle is not cross-language; the serialized data can only be deserialized in Python.
Basic Usage of `pickle`

1. Importing `pickle`

   ```python
   import pickle
   ```

2. Serialization (Pickling):
   - Converts a Python object into a byte stream.
   - Use `pickle.dump()` to write the byte stream to a file or `pickle.dumps()` to get it as a bytes object.

   Example:

   ```python
   data = {"name": "Alice", "age": 30}

   # Serialize to a file
   with open("data.pkl", "wb") as file:
       pickle.dump(data, file)

   # Serialize to a bytes object
   byte_stream = pickle.dumps(data)
   print(byte_stream)
   ```

3. Deserialization (Unpickling):
   - Converts a byte stream back into a Python object.
   - Use `pickle.load()` to read from a file or `pickle.loads()` to deserialize from a bytes object.

   Example:

   ```python
   # Deserialize from a file
   with open("data.pkl", "rb") as file:
       loaded_data = pickle.load(file)
   print(loaded_data)

   # Deserialize from a bytes object
   loaded_data = pickle.loads(byte_stream)
   print(loaded_data)
   ```
Pickle in Custom Classes

`pickle` can also serialize and deserialize instances of custom classes.

Example:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"Person(name={self.name}, age={self.age})"

# Create an instance
person = Person("Alice", 30)

# Serialize
with open("person.pkl", "wb") as file:
    pickle.dump(person, file)

# Deserialize
with open("person.pkl", "rb") as file:
    loaded_person = pickle.load(file)

print(loaded_person)  # Output: Person(name=Alice, age=30)
```
Advantages of `pickle`

- Python-native serialization: Handles Python objects seamlessly, including those not supported by JSON (e.g., sets, tuples, and custom objects).
- Compact binary format: Makes serialization and deserialization faster compared to text-based formats.

Disadvantages of `pickle`

- Security Risk: Deserializing untrusted data can execute malicious code, leading to security vulnerabilities. Avoid loading untrusted pickle files.
- Python-Specific: Pickle files are not portable across languages.
- Backward Compatibility: Changes in class definitions or Python versions can make old pickled files unreadable.
Best Practices

- Use for internal purposes: Only use `pickle` for applications where data security and cross-language compatibility are not concerns.
- Avoid untrusted sources: Always validate the source of a pickle file before unpickling it.
- Consider alternatives for safer serialization: Use formats like JSON for interoperability and simplicity, or Protobuf for efficiency and schema validation (see the JSON sketch after this list).
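As a quick contrast with the JSON alternative mentioned above, note how JSON stays human-readable but only supports basic types:

```python
import json

data = {"name": "Alice", "scores": (90, 85)}

text = json.dumps(data)   # text-based, readable, cross-language
print(text)               # {"name": "Alice", "scores": [90, 85]}
print(json.loads(text))   # note: the tuple came back as a list
# json.dumps({1, 2, 3})   # would raise TypeError -- sets aren't JSON-serializable
```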
The `pickle` module provides several methods for working with serialization and deserialization beyond the basic `dump` and `load` methods. Here's a detailed explanation of the important methods in the `pickle` module:
Primary Methods

1. `pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)`
   - Serializes a Python object `obj` and writes it to a binary file-like object `file`.
   - Parameters:
     - `obj`: The Python object to serialize.
     - `file`: A writable binary file-like object (e.g., a file opened in `'wb'` mode).
     - `protocol`: Optional. The serialization protocol to use (see protocols below). Defaults to `DEFAULT_PROTOCOL`.
     - `fix_imports`: Handles compatibility for Python 2 pickles (default: `True`).
     - `buffer_callback`: Used for out-of-band buffers with protocol 5 (advanced usage).
   - Example:

   ```python
   import pickle

   data = {"name": "Alice", "age": 30}
   with open("data.pkl", "wb") as f:
       pickle.dump(data, f)
   ```

2. `pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)`
   - Serializes a Python object `obj` and returns it as a byte stream (in-memory).
   - Same parameters as `dump`.
   - Example:

   ```python
   byte_stream = pickle.dumps({"name": "Alice", "age": 30})
   print(byte_stream)
   ```

3. `pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)`
   - Deserializes a pickled object from a binary file-like object `file`.
   - Parameters:
     - `file`: A readable binary file-like object (e.g., a file opened in `'rb'` mode).
     - `encoding`: Used for decoding `str` objects (default: `"ASCII"`).
     - `fix_imports`: Same as in `dump`.
     - `errors`: Error handling during decoding (default: `"strict"`).
   - Example:

   ```python
   with open("data.pkl", "rb") as f:
       data = pickle.load(f)
   print(data)
   ```

4. `pickle.loads(data, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)`
   - Deserializes a pickled object from a bytes-like object `data`.
   - Same parameters as `load`.
   - Example:

   ```python
   # Any bytes produced by pickle.dumps() can be passed to pickle.loads()
   byte_stream = pickle.dumps({"name": "Alice", "age": 30})
   data = pickle.loads(byte_stream)
   print(data)
   ```
Advanced and Specialized Methods

1. `pickle.HIGHEST_PROTOCOL`
   - The highest protocol version available (5 in Python 3.8+). It is more efficient and supports the latest features.
   - Use this for optimal performance.
   - Example:

   ```python
   pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
   ```

2. `pickle.DEFAULT_PROTOCOL`
   - The default protocol version used by `pickle` (4 since Python 3.8; 3 in earlier Python 3 releases).
   - Lower than `HIGHEST_PROTOCOL` to maintain some backward compatibility.
   - Example:

   ```python
   pickle.dumps(data, protocol=pickle.DEFAULT_PROTOCOL)
   ```

3. `pickle.Pickler`
   - A class that allows more control over the pickling process.
   - Usage: Create a `Pickler` object for advanced serialization.
   - Example:

   ```python
   with open("data.pkl", "wb") as f:
       pickler = pickle.Pickler(f)
       pickler.dump({"name": "Alice", "age": 30})
   ```

4. `pickle.Unpickler`
   - A class that allows more control over the unpickling process.
   - Usage: Create an `Unpickler` object for advanced deserialization.
   - Example:

   ```python
   with open("data.pkl", "rb") as f:
       unpickler = pickle.Unpickler(f)
       data = unpickler.load()
   print(data)
   ```
Customizing Serialization with `__reduce__` and `__setstate__`

For custom classes, you can define how they should be pickled and unpickled by implementing special methods.

1. `__reduce__`
   - Defines how an object is reduced to a serializable form.
   - Returns a tuple containing:
     - A callable to recreate the object.
     - Arguments to pass to that callable.
   - Example:

   ```python
   class CustomClass:
       def __init__(self, name):
           self.name = name

       def __reduce__(self):
           return (CustomClass, (self.name,))

   obj = CustomClass("Alice")
   serialized = pickle.dumps(obj)
   deserialized = pickle.loads(serialized)
   print(deserialized.name)  # Output: Alice
   ```

2. `__getstate__` and `__setstate__`
   - `__getstate__`: Defines what data should be serialized.
   - `__setstate__`: Defines how to restore the state during deserialization.
   - Example:

   ```python
   class CustomClass:
       def __init__(self, name):
           self.name = name

       def __getstate__(self):
           return {"name": self.name.upper()}

       def __setstate__(self, state):
           self.name = state["name"].lower()

   obj = CustomClass("Alice")
   serialized = pickle.dumps(obj)
   deserialized = pickle.loads(serialized)
   print(deserialized.name)  # Output: alice
   ```
Pickle Protocol Versions

Pickle supports multiple protocols:
- Protocol 0: Human-readable (ASCII-based), the original format.
- Protocol 1: An old binary format, also from early Python versions.
- Protocol 2: A more efficient binary format, introduced in Python 2.3.
- Protocol 3: Introduced in Python 3.0, supports `bytes` objects.
- Protocol 4: Introduced in Python 3.4, supports very large objects and more kinds of objects.
- Protocol 5: Introduced in Python 3.8, adds out-of-band data buffers and improves performance for large data.
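A minimal sketch that compares the serialized size of the same object across the available protocols (the exact byte counts depend on your Python version):

```python
import pickle

data = list(range(1000))
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    size = len(pickle.dumps(data, protocol=proto))
    print(f"protocol {proto}: {size} bytes")
```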
Important Notes
- Security Warning: Never unpickle data from untrusted sources. It can execute arbitrary code.
- Cross-Version Compatibility: Pickled data may not always be compatible across Python versions, especially if the data involves custom objects.
- Alternatives: For safer or cross-language serialization, consider using JSON, MessagePack, or Protobuf.
noqa = NO-QA (no quality assurance)

It's generally used in Python code to ignore PEP 8 warnings: lines ending with `# noqa` are skipped by linter programs and won't raise any warnings.

In Python, `# noqa` is a comment directive that tells linters and code style checkers like flake8 and pycodestyle to ignore any errors or warnings on that particular line.

```python
x = 1+2  # noqa
```

Specific error codes: you can also specify which error codes to ignore:

```python
x = 1+2  # noqa: E226
```
When to use it:

- When you have a good reason to deviate from PEP 8: Sometimes, adhering strictly to style rules can make your code less readable. In such cases, you can use `# noqa` to suppress the warning.
- When dealing with third-party code: If you're using a library that doesn't follow PEP 8, you can use `# noqa` to avoid getting flooded with warnings.
- Temporary fixes: If you need a quick fix that doesn't follow PEP 8, you can use `# noqa` as a temporary solution until you can refactor the code properly.