PyPa - sgml/signature GitHub Wiki

Idioms

Passing

Use pass in an if/else block as a placeholder when you need to distinguish between a function and a procedure

Linting

Use a minimum set of things to exclude from pylint, for example: #pylint: disable=no-member, invalid-name, line-too-long

Logging

Log all method locals and method args
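A minimal sketch of this idea, assuming the standard logging module and a hypothetical function named apply_discount. Calling locals() at the point of logging captures both the arguments and the local variables defined so far:

```python
import logging

logger = logging.getLogger(__name__)


def apply_discount(price, rate):  # hypothetical function, for illustration only
    discounted = price * (1 - rate)
    # locals() returns a dict holding price, rate, and discounted
    logger.debug("apply_discount locals: %s", locals())
    return discounted
```

The message only appears when the logger level is set to DEBUG, so the call can stay in place in production code.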

String Concat

Use ''.join([foo, bar]) instead of foo + bar to distinguish string manipulation from arithmetic and for easier portability to other languages. Note that join takes a single iterable of strings, not separate arguments.

Pattern Matching

Use polymorphism to implement the multiple dispatch pattern:

class Animal:
    def speak(self):
        pass

class Dog(Animal):
    def speak(self):
        return "Woof!"

class Cat(Animal):
    def speak(self):
        return "Meow!"

def make_animal_speak(animal: Animal):
    return animal.speak()

Use the split and replace methods in tandem multiple times for simple pattern matching. For example:

# Split the connection string into components
# example:
# postgresql://fakeuser:[email protected]:5432/fakedbname
_, username, password_and_host, port_and_dbname = self.cstring.split(':')
username = username.replace("//","")
password, host = password_and_host.split('@')
port, dbname = port_and_dbname.split('/')
# Copy password to paste during interactive prompt

Use the re module for complex strings, and JSONPath (https://pypi.org/project/jsonpath/) and JSONPointer (https://pypi.org/project/jsonpointer/) for everything else, to avoid complex if/else/elif or case logic
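For a string that is too irregular for split and replace, a regular expression with named groups may be easier to maintain. The sketch below assumes a connection string shaped like scheme://user:password@host:port/dbname; the pattern, the function name, and the sample values are illustrative, not taken from any library:

```python
import re

# Hypothetical pattern for scheme://user:password@host:port/dbname
CONNECTION_RE = re.compile(
    r"^(?P<scheme>\w+)://(?P<username>[^:]+):(?P<password>[^@]+)"
    r"@(?P<host>[^:/]+):(?P<port>\d+)/(?P<dbname>.+)$"
)


def parse_connection_string(cstring):
    match = CONNECTION_RE.match(cstring)
    if match is None:
        raise ValueError("unrecognized connection string")
    # groupdict() maps each named group to the text it matched
    return match.groupdict()


# Placeholder values, made up for this example
parts = parse_connection_string(
    "postgresql://fakeuser:fakepass@fakehost:5432/fakedbname"
)
```

Every component is returned as a string, so the port still has to be converted with int() if a number is needed.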

Boxing

Use autoboxing via the python-box module (https://pypi.org/project/python-box/) to normalize access to nested key/value pairs

Use getattr to access methods within the python-box library (https://github.com/cdgriffith/Box/blob/master/test/test_box.py#L856)

Type Checking

Use multiple dispatch(https://pypi.org/project/multipledispatch/) instead of if/else/elif or case logic to handle method calls which need to handle both structured and unstructured data
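The idea can be sketched with functools.singledispatch from the standard library. This is a swapped-in technique, not the multipledispatch package: it dispatches on the type of the first argument only. The normalize function and its return shapes are assumptions made for the example:

```python
from functools import singledispatch


@singledispatch
def normalize(data):
    # Fallback for unstructured input: wrap the raw value
    return {"raw": str(data)}


@normalize.register
def _(data: dict):
    # Structured input is copied through unchanged
    return dict(data)


@normalize.register
def _(data: list):
    # A list is normalized item by item
    return [normalize(item) for item in data]
```

Each new input type gets its own registered function, so the caller never needs an if/elif chain on type.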

Safe Subset

Use a safe subset

State

Use the truths module to encapsulate boolean logic:

from truths import Truths

# Define your Boolean expressions
expressions = ['(a and b)', 'a and b or x', 'a and (b or x) or d']

# Generate the truth table
my_table = Truths(['a', 'b', 'x', 'd'], expressions)
print(my_table)

Data

Use adapters to maintain API compatibility layers

Use featuretools to generate mock data

Install pip-audit

Add pip-audit to your requirements.txt and run it as part of the CI build to find vulnerabilities in installed modules

Upgrade poetry

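Run poetry self update to upgrade Poetry itself. Run poetry lock --no-update (Poetry 1.x) to refresh the lock file without upgrading the locked dependency versions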

Clean up Dependencies

  • Move dev dependencies to dependencies if they are needed at runtime (boto3, jinja)
  • Move dependencies to dev dependencies if they are only needed for build time and testing (pytest, urllib3)
  • Make sure there is no overlap between dependencies and dev dependencies
  • Pin dependencies using < if there are compatibility issues

Optimize Deployment

  • Modules should be zipped to compress the deployed package
  • Make sure lockfiles are in the deployed package

Optimize Configuration

  • Make sure variable interpolation is using the correct syntax for XML/JSON/YAML files
  • Use functions instead of any other data type if there is an option
  • Make sure quotation marks are consistent (double vs single, unix vs windows)
  • Use a list instead of a dictionary if the spec requires a list

Use sums of boolean values instead of and/or clauses

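If you have conditions based on multiple boolean values, add up the boolean values rather than using complex logic. True counts as 1 and False as 0, so this only works when every operand is a real bool:

# Replace this:
if a or b:
    ...
# With this:
if a + b >= 1:
    ...

# Replace this:
if a and b:
    ...
# With this:
if a + b == 2:
    ...

# a + b == 1 means exactly one of the two is true (exclusive or)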

Raise if you do not get the type you expected

Use type to verify assumptions; if they are not true, raise an exception:

    if bool(type(meta) is dict) + bool(type(meta) is list) != 1:
        raise TypeError(f'Expected dict or list, got {type(meta).__name__}')

Raise TypeError or ValueError instead of a generic error

Use NameError, TypeError or ValueError to make exceptions more specific for class, type, or value related code respectively:

class MyException(Exception):
    pass
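As a sketch of the rule above, assuming a hypothetical parse_quantity helper: a wrong type raises TypeError, and a right type with an unusable value raises ValueError, so callers can handle the two cases separately.

```python
def parse_quantity(value):  # hypothetical helper, for illustration only
    if not isinstance(value, str):
        # The argument is the wrong type altogether
        raise TypeError(f"expected str, got {type(value).__name__}")
    if not value.isdigit():
        # The type is right but the content cannot be used
        raise ValueError(f"expected a numeric string, got {value!r}")
    return int(value)
```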

Use assert to raise errors for impossible calculations

# no matter what, discounted prices cannot be lower than 0 or higher than the listed price
assert 0 <= price <= product['price']

# check if the value of `a plus b` is less than 3
assert a + b < 3, f'No, the answer is {a + b}, which means someone changed the input types from boolean to integer'

# assert that a numeric string is numeric
def add_dollar_sign(numeric_string):
    assert numeric_string.isnumeric(), 'Not a numeric string'
    return '$' + numeric_string

Combine logging with uncaught exceptions to trace unexpected errors

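import logging
import sys

def excepthook(*args):
    # args is the (type, value, traceback) triple, which logging accepts as exc_info
    logging.getLogger().error('Uncaught exception:', exc_info=args)

sys.excepthook = excepthook

assert 1 == 2, 'Something went wrong'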

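Use __init__.py to share globals

"""
Foo
"""
import os
import requests
from foo import Foo

# get_secret is assumed to be defined or imported elsewhere in the project
foo_secrets = get_secret("foo")
foo_token = foo_secrets["token"]
foo = Foo(token=foo_token)

Then, in any other module, import the shared instance:

from util.foo import foo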

Use comments to determine method scope

Add comments first to avoid scope creep

Add a README and link to it in the comments for mission-critical details

Use Markdown within comments to add complex mixed content, like tables

Use absolute imports instead of relative imports

Instead of:

from . import hubspot

Use the name of the top-level directory instead:

from foo.bar import hubspot

Even if the name of the directory matches the name of the module:

from foo.hubspot import hubspot

Use eval to convert a string to a list

foo = '[[1]]'
bar = eval(foo)
type(bar)
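Because eval executes any expression it is given, a more cautious variant is ast.literal_eval from the standard library, which only accepts Python literals. A minimal sketch; the is_rejected helper is hypothetical and exists only to show the failure mode:

```python
import ast

foo = '[[1]]'
bar = ast.literal_eval(foo)  # parses literals only, never runs code


def is_rejected(text):
    # Returns True when the text is not a plain Python literal
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return True
    return False
```

This is a reasonable default whenever the string comes from user input, a file, or a network response.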

Check how many times a string is found in a list

lst = [1, 2, 3, 'Alice', 'Alice']

indices = [i for i in range(len(lst)) if lst[i]=='Alice']

print(indices)       # positions of 'Alice': [3, 4]
print(len(indices))  # number of occurrences: 2

Convert a one member tuple to a string for use with the DBAPI:

if isinstance(foo, tuple) and len(foo) == 1:
   bar = str(foo).replace(",","")

Convert a multi member tuple to a dict for use with the DBAPI:

   baz = [dict(bar) for bar in foo]

Use return statements instead of assignment statements with the or clause

return {"foo": foo or "N/A"}

vs

try:
   foo = bar
except Exception as e:
   foo = "N/A"
   print(e)

Use asyncio instead of threads to avoid running out of processes
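A minimal sketch of running I/O-bound work concurrently on a single thread. The fetch coroutine is a hypothetical stand-in for a real network or database call, and asyncio.sleep simulates the wait:

```python
import asyncio


async def fetch(name, delay):  # hypothetical stand-in for an I/O-bound call
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather() runs both coroutines concurrently and keeps the argument order
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.01))


results = asyncio.run(main())
```

No extra threads or processes are created; the event loop switches between the coroutines while each one is waiting.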

Use specific exceptions when using except clauses

  • Use library specific exceptions instead of the generic Exception class
  • Create custom Exception classes when writing user-defined modules
  • Use Python's built-in Exception classes when you expect a specific exception
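The points above can be sketched together, assuming a hypothetical load_config function and ConfigError class: the except clause catches the library-specific json.JSONDecodeError rather than Exception, and re-raises it as a custom exception owned by the module.

```python
import json


class ConfigError(Exception):
    """Hypothetical custom exception for a user-defined module."""


def load_config(text):
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # Library-specific exception, re-raised as the module's own type
        raise ConfigError(f"invalid config: {e.msg}") from e
```

Using raise ... from e keeps the original traceback attached, so the underlying parse error is still visible when debugging.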

Use nested try/except blocks to swallow errors:

try:
    foo
    try:
        bar
    except Exception as e:
        print("inner")
except Exception as e:
    print("outer")

Use raise and except to make non-200 responses throw exceptions
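A minimal sketch of this pattern with no HTTP library involved. HTTPStatusError, ensure_ok, and fetch_or_default are hypothetical names; in real code the status code and body would come from the response object of whichever client is in use:

```python
class HTTPStatusError(Exception):
    """Hypothetical exception for non-200 responses."""

    def __init__(self, status_code, body=""):
        super().__init__(f"unexpected status {status_code}")
        self.status_code = status_code
        self.body = body


def ensure_ok(status_code, body=""):
    # Raise instead of returning an error payload the caller might ignore
    if status_code != 200:
        raise HTTPStatusError(status_code, body)
    return body


def fetch_or_default(status_code, body, default=None):
    try:
        return ensure_ok(status_code, body)
    except HTTPStatusError as e:
        print(f"request failed: {e}")
        return default
```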


Use finally to print logs

    try:
        foo = bar
    except:
        bar = baz
    finally:
        print(foo)
        print(bar)
        print(baz)

Use the following built-ins to distinguish strings:

int(min("0", "hi")) # digits sort before letters, so min() picks the numeric string and this returns 0

str(max("9999", "A")) # digits sort before letters, so max() picks the alphanumeric string and this returns "A", not "9999"

If a sequence is length zero, it is falsy, so no need to check > 0

foo = []

if not foo:
    print(f"foo is empty: {len(foo)}")

If you check the length of a dictionary, it will return the number of keys

   foo = {"hi":"mom","my":"name","is":"kid"}
   print(f"number of keys in foo: {len(foo)}")

Use getattr to check for existence in an object

foo = str(getattr(name, 'first_name', None))

Use a dictionary for caching

name_cache = {}

if str(first_name) not in name_cache:
    new_name = search_name(first_name)
    if new_name:
        name_cache[str(first_name)] = new_name

Use .get() and or to search dictionaries

# if get() returns None, the or statement will return a string which can be parsed by the `in` operator
if str(foo) not in (bar.get("baz", "") or ""):
    bar["baz"] = str(foo)

Use split and join rather than string manipulation

number_list = "1 2 3 4 ".strip().split(' ')
# remove extra whitespace around each list item
normalized_number_list = [item.strip() for item in number_list]
normalized_number_list.remove('4')
# convert list back to a string
foo = " ".join(normalized_number_list)
# use the new length as a separate variable
foo_length = len(normalized_number_list)

Use str(), int() and min() to defensively add numbers

# Normalize all inputs as a string
foo = str("error")
bar = str(1)
# fallback to the numeric value if casting both as an integer fails
try:
    baz = int(foo) + int(bar) 
except ValueError:
    baz = int(min(foo, bar))

Logging to stdout and stderr

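import logging

FORMAT = "%(asctime)-15s %(clientip)s %(user)-8s %(message)s"
# Configure the root logger before the first logging call; otherwise the first
# call installs a default handler and this basicConfig() call is ignored
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
# The default level is WARNING, so this info message is not emitted
logging.info('This is the existing protocol.', extra=d)
# The default handler writes to stderr
logging.warning("Protocol problem: %s", "connection reset", extra=d)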

Fundamentals

isinstance => instanceOf

min/max => type coercion

repr => valueOf / toString

assert => console.assert

with => context scoping

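In JavaScript, the with statement makes access to named references inefficient, because the scopes for such access cannot be computed until runtime. In Python, the with statement does not change name resolution; it only calls the __enter__ and __exit__ methods of a context manager.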

Use strict standards for loops:

  • use a break to get out of a loop, especially if it uses recursive calls
  • check a loop variable is a list before iterating
  • check a loop variable has more than the zero index before iterating; otherwise change the type to a dictionary

Use the following pattern to create a switch/case statement:

from collections import namedtuple

Case = namedtuple('Case', ['condition', 'code'])

cases = (Case('i > 0.5',
            """print('greater than 0.5')"""),

         Case('i == 5',
            """print('it is equal to 5')"""),

         Case('i > 5 and i < 6',
            """print('somewhere between 5 and 6')"""))

def switch(cases, **namespace):
    for case in cases:
        if eval(case.condition, namespace):
            exec(case.code, namespace)
            break
    else:
        print('default case')

switch(cases, i=5)
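The same control flow can be sketched without eval or exec, assuming each case is a hypothetical (condition, handler) pair of callables. The cases here are ordered from most to least specific so that the first match wins:

```python
# Each case pairs a predicate with a handler; both receive the value
cases = (
    (lambda i: i == 5, lambda i: "it is equal to 5"),
    (lambda i: 5 < i < 6, lambda i: "somewhere between 5 and 6"),
    (lambda i: i > 0.5, lambda i: "greater than 0.5"),
)


def switch(cases, value, default="default case"):
    for condition, handler in cases:
        if condition(value):
            return handler(value)
    # No condition matched
    return default
```

Because the conditions are ordinary functions, they can be unit tested on their own and never execute strings as code.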

Use the following pattern to create a default value:

li1 = None
li2 = [1, 2, 3]

# li1 is None, so a is assigned li2
a = li1 or li2

Gotchas

If you define a method with a mutable default argument (such as a list or dictionary), the default object is created once and its mutations persist, so all future calls see the new data merged with the previously mutated data.

Use None as a default argument to avoid this.
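A short sketch of both behaviors; the function names are made up for the example:

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # A new list is created on every call that omits bucket
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)  # same list object as first, now [1, 2]
```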

If you bind a default value to a lambda, the value is evaluated once, when the lambda is defined, and stays bound to it:

f = lambda x=x: x
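The difference shows up most often when lambdas are created in a loop. In this sketch, the lambdas without a default look up x when they are called, while the lambdas with x=x keep the value x had when each one was created:

```python
# Looked up at call time: every lambda sees the final value of x
late = [lambda: x for x in range(3)]

# Bound at definition time: each lambda keeps its own value of x
early = [lambda x=x: x for x in range(3)]

late_results = [f() for f in late]
early_results = [f() for f in early]
```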

Decorating a Python method with staticmethod ensures that self will not be provided as an argument. A classmethod is different: unlike other methods in Python, its first argument is always the class object.
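A minimal sketch of the two decorators side by side, using a hypothetical Temperature class:

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    @staticmethod
    def to_fahrenheit(celsius):
        # Neither self nor cls is passed in
        return celsius * 9 / 5 + 32

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is the class object the method was called on
        return cls((fahrenheit - 32) * 5 / 9)
```

The classmethod works as an alternative constructor: called on a subclass, it builds an instance of that subclass.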

Use the following argument to the print function to flush the output buffer: flush=True

import time

for i in range(10):
    print(i, end=" ", flush=True)
    time.sleep(.2)
print()

Use raise to bubble up errors raised in a try clause:

try:
    raise NameError('HiThere')
except NameError:
    print('An exception flew by!')
    raise

Use if/else and raise inside a try clause to conditionally bubble up non-programmatic errors in a REST API:

try:
   if foo:
       return jsonify({"foo":str(foo)})
   else:
       raise Exception(f'foo is None so the response cannot be parsed')
except Exception as e:
   return jsonify({"exception": str(e)})

Use the build environment variable to return try/except data in responses for QA/Test environments:

try:
    foo()
except Exception as e:
    if BUILD_ENV == 'develop':
        return {"error": str(e), "status": 500}
    else:
        return {"data": [], "status": 500}

Use SQL regexp methods as a first resort, and re as a second resort (JS as a last resort) to do complex string manipulation:

import re

class Solution(object):
    
    def __init__(self):
        self.email_cache = {}
    
    def uniqueLocalName(self, email):
        sanitized_plus = re.sub(r'\+([^@]+)', '', email)
        unsanitized_local_name, domain_name = re.split('@', sanitized_plus)
        sanitized_local_name = re.sub(r'([^\.]+)\.?', r"\1", unsanitized_local_name)
        sanitized_email = sanitized_local_name + '@' + domain_name
        print(sanitized_email)
                                 
        if sanitized_email not in self.email_cache:
            self.email_cache[sanitized_email] = sanitized_email
            

    def numUniqueEmails(self, emails):
        """
        :type emails: List[str]
        :rtype: int
        """
        for email in emails:
            self.uniqueLocalName(email)
            
        return len(self.email_cache.keys())

Always catch exceptions when creating lists of dictionaries. If an exception happens, assign it to a variable in the except block then return it as a msg key/value pair. Otherwise, use the list index as the msg value:

        try:
            results = None
            payload = []
            results = db_session.execute(query).fetchall()
        except Exception as e:
            print(f'query: {e}')
            results = e
        finally:
            print(f'uuid: {uuid}')

        if isinstance(results, list):
            for idx, row in enumerate(results):
                payload.append(
                    {
                        "id": str(row[0]),
                        "msg": str(idx),
                    }
                )
        else:
                payload.append(
                    {    
                        "id": "",
                        "msg": str(results),
                    }
                )

Use an enum class to store constants:

from enum import IntEnum

class Day(IntEnum):
    MONDAY = 0
    TUESDAY = 1
    WEDNESDAY = 2
    THURSDAY = 3
    FRIDAY = 4
    SATURDAY = 5
    SUNDAY = 6

Run things locally to get stack traces that would otherwise be buried in the logs (the needle-in-a-haystack problem).

flask dev

Use an event loop to control asyncio:

import asyncio

def hello_world(loop):
    """A callback to print 'Hello World' and stop the event loop"""
    print('Hello World')
    loop.stop()

loop = asyncio.new_event_loop()

# Schedule a call to hello_world()
loop.call_soon(hello_world, loop)

# Blocking call interrupted by loop.stop()
try:
    loop.run_forever()
finally:
    loop.close()

Use type hint to generalize type checking:

from typing import Dict, List, Optional

class Node:
    ...

class SymbolTable(Dict[str, List[Node]]):
    def push(self, name: str, node: Node) -> None:
        self.setdefault(name, []).append(node)

    def pop(self, name: str) -> Node:
        return self[name].pop()

    def lookup(self, name: str) -> Optional[Node]:
        nodes = self.get(name)
        if nodes:
            return nodes[-1]
        return None

'''
SymbolTable is a subclass of dict and a subtype of Dict[str, List[Node]].
'''

Use Slack webhooks as a poor man's Persistent Logger

Marshmallow

from marshmallow import Schema, fields

class RSSItemSchema(Schema):
    title = fields.String()
    link = fields.Url()
    description = fields.String()
    pubDate = fields.DateTime()
    guid = fields.String()
from marshmallow import Schema, fields

class SAMLAssertionSchema(Schema):
    issuer = fields.String()
    subject = fields.String()
    audience = fields.String()
    conditions = fields.String()
    authn_statement = fields.String()
    attribute_statement = fields.String()

Abstract Syntax Tree

import ast
import astunparse

class PrintVisitor(ast.NodeTransformer):
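    def visit_Print(self, node):
        # Replace the old print statement with a new print function.
        # ast.Print nodes exist only in Python 2, so this visitor must run
        # under Python 2; Python 3's ast.parse rejects print statements.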
        new_node = ast.Expr(
            value=ast.Call(
                func=ast.Name(id='print', ctx=ast.Load()),
                args=node.values,
                keywords=[],
            )
        )
        return ast.copy_location(new_node, node)

def convert_print_statements(source_code):
    # Parse the source code into an AST
    tree = ast.parse(source_code)

    # Transform the AST
    PrintVisitor().visit(tree)

    # Generate the new source code from the AST
    new_source_code = astunparse.unparse(tree)

    return new_source_code

# Read the old source code
with open('old.py', 'r') as f:
    old_source_code = f.read()

# Convert the print statements
new_source_code = convert_print_statements(old_source_code)

# Write the new source code
with open('new.py', 'w') as f:
    f.write(new_source_code)

Docstrings

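Use curly braces in a triple-quoted f-string to do variable substitution. Note that an f-string is evaluated as an ordinary expression and is not stored as a docstring: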

name = "Foo"
bar = f"""
Hi, {name}!
"""
print(bar)

Index out of range

Use a try/except block, an initial assignment statement, and a nested try/except block to test and log index access:

try:
    foo = None

    try:
        foo = results["data"]
    except (KeyError, IndexError, TypeError) as e:
        logging.warning(f"failed to get results for: {results} ({e})")
    finally:
        foo = foo or {"ok": False}
except Exception as e:
    logging.error(f"unexpected error: {e}")

Open Source Ownership and Auditing

Glossary

Grammar

Gotchas

Data Model

pip

__init / sys.path

command-line args

Executable Scripts as Modules

Separate the __main__ logic for the module itself. For example:

def my_function():
    # Your function implementation here
    pass

if __name__ == "__main__":
    # Code to run when the script is executed directly
    print("This will only run if you run the script explicitly, not import it")

Activestate

Recipes

CMIS

Reflection

Sample Projects

Sample setup.py

# distutils was removed in Python 3.12; setuptools provides both names
from setuptools import find_packages, setup

setup(
    name="foobarbaz",
    version="0.9.8",
    description="utility belt",
    author="Foo Bar Bazman",
    author_email="foobarbaz@foobarbaz.example.com",
    url="",
    packages=find_packages(),
    package_data={'config': ['README.md']}, # full path: ~/foobarbaz/config/README.md
    install_requires=[
        "hubspot-api-client==3.4.2",
        "python-box>=5.3.0",
        "stripe==2.42.0",
    ],
)

Callable vs Non-Callable Native Properties

my_string = "Hello, World!"
string_methods = [method for method in dir(my_string) if callable(getattr(my_string, method))]
print("String methods:")
for method in string_methods:
    print(method)

Wrapper Method

Use a class, rather than a function, to expose a method wrapped in your own library. A function returns an object that must be called first, whereas a class includes the method as a property.

Instead of:

import bar

def foo():
    return bar

# import foo
# foo = foo()
# baz = foo.bar(True)

Use:

import bar

class Foo:
    def __init__(self):
        self.bar = bar

# import Foo
foo = Foo()
baz = foo.bar(True)

Import Error Handling

import traceback
import importlib

def find_bar_variable(module_path):
    try:
        # Import the module dynamically
        module = importlib.import_module(module_path)

        # Check if 'bar' is a callable attribute (method or function)
        if hasattr(module, 'bar') and callable(getattr(module, 'bar')):
            # Call the 'bar' method and print the returned value
            result = getattr(module, 'bar')()
            print(f"Variable returned by 'bar': {result}")
        else:
            print("Method 'bar' not found in the module.")
    except Exception as e:
        print(f"Exception occurred: {e}")
        traceback.print_exc()
    finally:
        # print_stack() writes the stack itself and returns None
        print("Call Stack:")
        traceback.print_stack()

# Example usage:
module_path = 'your_module_name'  # Replace with the actual module name or file path
find_bar_variable(module_path)

Class and Method Definitions

  • DECOUPLE, Decouple, decouple
  • Use Class definitions to organize state, define naming conventions, and enforce the order of operations
  • Use arguments to make things testable

Method Tracing

import sys
from functools import wraps

class TraceCalls(object):
    """ Use as a decorator on functions that should be traced. Several
        functions can be decorated - they will all be indented according
        to their call depth.
    """
    def __init__(self, stream=sys.stdout, indent_step=2, show_ret=False):
        self.stream = stream
        self.indent_step = indent_step
        self.show_ret = show_ret

        # This is a class attribute since we want to share the indentation
        # level between different traced functions, in case they call
        # each other.
        TraceCalls.cur_indent = 0

    def __call__(self, fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            indent = ' ' * TraceCalls.cur_indent
            argstr = ', '.join(
                [repr(a) for a in args] +
                ["%s=%s" % (a, repr(b)) for a, b in kwargs.items()])
            self.stream.write('%s%s(%s)\n' % (indent, fn.__name__, argstr))

            TraceCalls.cur_indent += self.indent_step
            ret = fn(*args, **kwargs)
            TraceCalls.cur_indent -= self.indent_step

            if self.show_ret:
                self.stream.write('%s--> %s\n' % (indent, ret))
            return ret
        return wrapper

And here's how we can use it:

@TraceCalls()
def iseven(n):
    return True if n == 0 else isodd(n - 1)

@TraceCalls()
def isodd(n):
    return False if n == 0 else iseven(n - 1)

print(iseven(7))

System Level Exception Handling

import sys

def custom_exception_hook(exctype, value, traceback):
    print(f"Caught {exctype.__name__}: {value}")
    # Handle the exception or perform other actions here

# Set the custom exception hook
sys.excepthook = custom_exception_hook

# Your code goes here...
# Any unhandled exceptions will now be caught by the custom_exception_hook.

metaclasses

Use tuples to return multiple values and implement multiple dispatch based on the method signature of a class method

Customizing attribute access can add behavior to a class, for example, adding dot notation to a dictionary. Note that the example below is a plain dict subclass that overrides special methods, not a metaclass:

class DotDict(dict):
    def __getattr__(self, attr):
        return self.get(attr)

    def __setattr__(self, key, value):
        self[key] = value

    def __delattr__(self, item):
        if item in self:
            del self[item]

which can be used like this:

d = DotDict()
d.foo = 'bar'  # equivalent to d['foo'] = 'bar'
print(d.foo)  # equivalent to print(d['foo'])

2to3

2to3 is one good example of a good use of eval

Installation/Upgrading cpython

pylint

pytest

Mock Servers

jupyter

pandas

arguments

special methods

return values

internal boxing and unboxing

switch/case method dispatch table

Performance Profiling

Debugging Modules

Recipes

site-packages

String Formatting

Async

Decorators / Mixins

Patterns

Parsers

Quiz

Multiprocessing

Lazy Eval

JWT

WSGI/ASGI

Sample Code

References

ipinfo API Design

use Geo::IPinfo;
my $access_token = 'your_api_token_here';
my $ipinfo = Geo::IPinfo->new($access_token);
my $ip_address = '216.239.36.21';
my $details = $ipinfo->info($ip_address);
my $city = $details->city;  # Emeryville
my $loc = $details->loc;    # 37.8342,-122.2900

ASCII Art

+-------------------+       +-------------------+       +-------------------+
|                   |       |                   |       |                   |
|   Cron Job        | ----> |   Shell Script    | ----> |   Boto3 Script    |
|                   |       |                   |       |                   |
+-------------------+       +-------------------+       +-------------------+
        |                           |                           |
        |                           |                           |
        |                           |                           |
        v                           v                           v
+-------------------+       +-------------------+       +-------------------+
|                   |       |                   |       |                   |
|   Execute Shell   | ----> |   Execute Boto3   | ----> |   Interact with   |
|   Script          |       |   Script          |       |   SQLite Database |
|                   |       |                   |       |                   |
+-------------------+       +-------------------+       +-------------------+

Prompt

Other than lists, strings, tuples, dictionaries, integers, classes, functions, and metaclasses, what other data structures are in Python core without the use of module import statements?

Utility Functionality

Definition

def run_if_main():
    # Your code here
    print("This code runs only if the script is executed directly.")

if __name__ == "__main__":
    run_if_main()

Usage

# another_module.py

from my_utils import run_if_main

# Other code here

# Call the utility function
run_if_main()
