Python - robbiehume/CS-Notes GitHub Wiki

Table of Contents

  1. General Notes
  2. Python Environment Notes
  3. Specific version features
  4. Python Language Notes
  5. Add to google colab
  6. requests, urllib(3), & http modules
  7. Concurrent, asynchronous, multiprocessing
  8. Performance / memory considerations
  9. itertools
  10. Misc.
  11. Do specific things

Colab Notebooks

Links

General resource websites:


Look into

General Notes

  • To make code FIPS compliant, use hashlib.sha256() instead of hashlib.md5()
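A minimal sketch of the swap, assuming the data to hash is already a bytes object (the payload value here is made up for illustration):

```python
import hashlib

payload = b"hello world"  # hypothetical input; hashlib works on bytes

# SHA-256 in place of MD5; the call pattern is otherwise identical
digest = hashlib.sha256(payload).hexdigest()
print(digest)  # 64 hex characters (256 bits)
```

Only the constructor name changes compared to an md5() call, so existing hashing code can usually be migrated line by line.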

Python Environment Notes

pip

  • Install pip: link
  • PyPI is the central repository of software that pip uses to download from
  • pip installs packages, which are usually bundled into 'wheels' (.whl) prior to installation
  • A wheel is a zip-style archive that contains all the files necessary for a typical package installation
  • It may be best to run it as python3 -m pip instead of using the system-installed pip
    • Why you should use python -m pip
    • This executes pip using the Python interpreter you specified as python
    • This is beneficial because you know which Python installation pip is installing into when you have multiple Python versions installed

Tips

  • See package dependencies: pipdeptree
    • Install it first with pip install pipdeptree
  • Loop through the first n elements of a list: for item in itertools.islice(my_list, n)

Wheel files (.whl)

Virtual environments (venv)

  • Complete guide to python virtual environments
  • Virtual environments are used when you want to be more explicit and limit packages to a specific project
  • You should never install stuff into your global Python interpreter when you develop locally
  • To create an environment: python -m venv <venv name>

Python Language Notes

Variables and parameter passing

  • Deep dive into variables in Python
  • Python uses pass-by-object-reference for function parameter passing
    • Any changes to the actual object held in the parameter will change the original variable
    • Any reassignment will not be reflected in the original variable
  • When passing a variable of a mutable object (list, dict, set, classes, etc.), make sure that you pass a copy if you don't want the original variable object to be modified by any changes in the function
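The three points above can be sketched as follows. The function names (mutate, reassign) are hypothetical and exist only for this illustration:

```python
import copy

def mutate(items):
    items.append(99)   # changes the object the caller also refers to

def reassign(items):
    items = [0]        # rebinds the local name only; the caller is unaffected
    return items

original = [1, 2, 3]
mutate(original)
print(original)        # [1, 2, 3, 99]

reassign(original)
print(original)        # [1, 2, 3, 99] (unchanged by the reassignment)

protected = [1, 2, 3]
mutate(copy.copy(protected))   # pass a shallow copy to protect the original
print(protected)       # [1, 2, 3]
```

For nested structures a shallow copy still shares the inner objects, so copy.deepcopy() may be the safer choice there.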

Duck typing

  • Duck typing is a programming concept where the suitability of an object is determined by the presence of certain methods and properties, rather than the object's actual type
  • Instead of checking an object's type explicitly, code assumes that if it "quacks like a duck" (i.e., has the necessary behavior), it can be used in the desired context
  • Key points:
    • Behavior over type: The focus is on what an object can do, not what class it belongs to.
    • Flexibility: Allows functions to operate on any object that implements the expected interface, promoting reusable and adaptable code.
    • Python's dynamic nature: Python commonly uses duck typing, enabling developers to write generic functions that work with various objects as long as they provide the required attributes or methods
  • Ex:
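    • def quack_and_walk(duck):
          duck.quack()
          duck.walk()
      
      class Duck:
          def quack(self):
              print('Quack!')
      
          def walk(self):
              print('Waddle waddle')
      
      class Person:
          def quack(self):
              print('I can quack like a duck!')
      
          def walk(self):
              print('I can walk like a duck!')
      
      # Both objects work with quack_and_walk() because they implement quack() and walk()
      quack_and_walk(Duck())
      quack_and_walk(Person())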

Dunder Methods (aka Magic Methods)

  • Dunder methods are methods that allow instances of a class to interact with the built-in functions and operators of the language
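As a rough illustration, the hypothetical Vector class below defines a few dunder methods so that its instances work with repr(), +, == and len():

```python
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):            # used by repr() and the interactive prompt
        return f"Vector({self.x}, {self.y})"

    def __add__(self, other):      # used by the + operator
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):       # used by the == operator
        return (self.x, self.y) == (other.x, other.y)

    def __len__(self):             # used by len()
        return 2

total = Vector(1, 2) + Vector(3, 4)
print(total)        # Vector(4, 6)
print(len(total))   # 2
```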

** (un)packing

Exceptions; try / except

  • Raising / handling custom exception:
    • class ExampleException(Exception):
          pass
      ...
      try:
          if not var:
              raise ExampleException('var is empty')
          else:
              pass  # do work
      except ExampleException as err:
          print(f'handling: {err}')  # handle the custom exception
      finally:
          # always runs, even if the other blocks have a return statement
          print('inside')
    • The finally block always runs regardless, even if the other blocks have a return statement
      • If the finally block has a return also, it overwrites the return from the other block
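A small sketch of the return behaviour described above (both function names are made up for this example):

```python
def normal():
    try:
        return "from try"
    finally:
        # runs before the function actually returns
        print("cleanup")

def overridden():
    try:
        return "from try"
    finally:
        return "from finally"   # replaces the value returned by the try block

print(normal())       # prints "cleanup", then "from try"
print(overridden())   # from finally
```

Because a return inside finally silently discards the earlier return value (and any in-flight exception), it is usually best avoided.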

Emptiness / None check

  • if not nums is the preferred emptiness check and is slightly quicker than if len(nums) == 0; note that it is also true when nums is None
  • To check explicitly for None, use if nums is None
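A short sketch of how the two checks differ, using a hypothetical describe() helper:

```python
def describe(nums):
    if nums is None:        # explicit None check comes first
        return "missing"
    if not nums:            # true for any empty container ([], (), {}, "")
        return "empty"
    return f"{len(nums)} items"

print(describe(None))    # missing
print(describe([]))      # empty
print(describe([1, 2]))  # 2 items
```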

Add to google colab:

  • String methods: .startswith() and .endswith() instead of string slicing
  • Create a dictionary from two different sequences / collections: dict(zip(list1, list2))
  • Update multiple key/value pairs in a dictionary: d1.update(key1=val1, key2=val2) or d1.update({'key1': 'val1', 'key2': 'val2'})
    • Ex: class_grades.update(algebra='A', english='B')
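The tips above combined in one sketch; the file name and grade values are invented for illustration:

```python
filename = "report_2024.csv"   # hypothetical value

# Clearer than filename[:7] == "report_" and filename[-4:] == ".csv"
is_report = filename.startswith("report_")
is_table = filename.endswith((".csv", ".tsv"))   # a tuple checks several suffixes

subjects = ["algebra", "english"]
grades = ["A", "B"]
class_grades = dict(zip(subjects, grades))       # pair keys with values

# Adds 'history' and overwrites 'english' in one call
class_grades.update(history="C", english="A")
print(class_grades)   # {'algebra': 'A', 'english': 'A', 'history': 'C'}
```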

Dictionaries:

Generators:

Classes / OOP

  • To access the base class methods and attributes, you can use super()
    • Ex: calls both display_info() functions
      class Polygon:
          def __init__(self, sides):
              self.sides = sides
      
          def display_info(self):
              print("A polygon is a two dimensional shape with straight lines.")
      
      class Triangle(Polygon):
          def display_info(self):
              print("A triangle is a polygon with 3 edges.")
              super().display_info() # call the display_info() method of Polygon
  • If you define an __init__() constructor in the child class, call super().__init__(...) inside it (passing the arguments the parent expects) so that the parent class attributes are initialized
    • Ex: Student sets its own attribute and lets Person set name
      class Person:
          def __init__(self, name):
              self.name = name
      
      class Student(Person):
          def __init__(self, name, student_id):
              super().__init__(name)  # initialize the Person attributes
              self.student_id = student_id
  • The @staticmethod decorator defines a method as static: it takes neither a self nor a cls parameter and can be called on the class or on an instance
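A minimal sketch of a static method, using a hypothetical Temperature class:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def c_to_f(celsius):
        # No self or cls: this is a plain function namespaced under the class
        return celsius * 9 / 5 + 32

    def fahrenheit(self):
        # Instance methods can still call the static method
        return self.c_to_f(self.celsius)

print(Temperature.c_to_f(100))       # 212.0 (called on the class)
print(Temperature(0).fahrenheit())   # 32.0  (called through an instance)
```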

Python Code Organization: Classes, Files, Modules, Packages, Libraries, Frameworks, and Imports

  • modules ⊆ packages ⊆ libraries ⊆ frameworks

Python Files (.py)

  • A Python file (.py) contains Python code (functions, variables, classes, etc.)
  • It can be executed directly or imported as a module

Modules (single file)

  • A module is simply a Python file (.py) that can be imported into another Python file
  • Modules help organize and reuse code
  • Standard-library modules (like math, os) ship with Python and need no installation

Packages (directory with modules)

  • A package is a directory containing multiple modules and an __init__.py file (optional in Python 3.3+)
  • Helps organize large projects
  • Example Package Structure:
      my_project/
      │── main.py
      │── my_package/
      │   │── __init__.py
      │   │── module1.py
      │   │── module2.py
    

Libraries (Collection of Modules & Packages)

  • A collection of modules or packages that provide reusable functionality
  • Examples: requests, numpy, flask

Frameworks (Structured Library for a Purpose)

  • A structured collection of libraries and conventions that help build applications
  • Examples: Django (web development), Flask (web framework), PyTorch (machine learning)

requests, urllib(3), & http modules

Newer / higher-level: urllib3 vs requests

  • urllib3 and requests
  • Feature | urllib3 | requests
    Level of Control | High (low-level, customizable) | Medium (high-level, abstracted API)
    Ease of Use | Easier than urllib, more setup than requests | Very easy, user-friendly
    Connection Pooling | Automatic, efficient pooling | Automatic, abstracted
    Performance | Lightweight, efficient for many requests | Slightly heavier due to extra features
    Retries and Timeouts | Customizable, easy to configure | Built-in, simplified
    Code Readability | More readable than urllib, but still verbose | Highly readable, concise
    Community/Documentation | Moderate community, adequate docs | Large community, extensive documentation
    Dependencies | Few dependencies | More dependencies (heavier library)
    Handling JSON and Sessions | Basic handling | Built-in, user-friendly

requests

  • requests Session to keep state
  • For most use cases, requests is the best choice. It is:
    • User-friendly: Easy to use with a simple API
    • Readable: Produces clean, concise, and maintainable code
    • Feature-rich: Includes built-in support for sessions, cookies, and JSON handling, plus configurable retries through transport adapters
    • Community: Has extensive documentation and a large user base

urllib3

  • placeholder

Older / lower-level: urllib vs http(.client)

  • Useful when you need to avoid third-party dependencies or require very low-level control

  Comparison table

  • Feature | urllib | http.client
    Level of Control | High (low-level API, manual setup) | Very high (raw HTTP control)
    Ease of Use | Complex, verbose | More complex, highly verbose
    Connection Pooling | No built-in pooling | Manual connection handling
    Performance | Lightweight but verbose | Lightweight, but requires manual setup
    Retries and Timeouts | Manual setup | Manual setup
    Code Readability | Verbose, not user-friendly | Highly verbose, least readable
    Community/Documentation | Limited (older, standard lib) | Limited (low-level library)
    Dependencies | No external dependencies (built-in) | No external dependencies (built-in)
    Handling JSON and Sessions | Manual handling | Manual handling
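To give a feel for the "manual handling" in the standard-library route, here is a rough sketch that builds a JSON POST request with urllib.request. The URL and payload are placeholders, and the actual network call is left commented out so the snippet runs offline:

```python
import json
import urllib.request

# JSON has to be encoded to bytes by hand
payload = json.dumps({"name": "example"}).encode("utf-8")

request = urllib.request.Request(
    "https://example.com/api/items",   # hypothetical URL
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(request.get_method())   # POST
print(request.full_url)       # https://example.com/api/items

# Sending and decoding are also manual:
# with urllib.request.urlopen(request, timeout=10) as response:
#     body = json.loads(response.read().decode("utf-8"))
```

Encoding, headers, timeouts and decoding are all spelled out explicitly here, which is the trade-off for having no third-party dependency.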

http

Concurrent, asynchronous, multiprocessing

  • concurrent.futures makes it easy to run certain parts of a mostly synchronous program concurrently, using thread or process pools
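A minimal sketch with a thread pool; slow_square() is a made-up stand-in for an I/O-bound call such as a network request:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(n):
    time.sleep(0.1)   # simulate waiting on I/O
    return n * n

# The rest of the program stays synchronous; only this step runs in threads
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_square, range(8)))   # results keep input order

print(results)   # [0, 1, 4, 9, 16, 25, 36, 49]
```

For CPU-bound work, ProcessPoolExecutor offers the same interface but runs the calls in separate processes.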

Performance / memory considerations

  • Sets have faster membership tests (average O(1)) than lists (O(n))
  • List comprehensions / generator expressions are typically faster than filter() / map() / reduce() combinations
  • Generator expressions are preferred over list comprehensions when the results only need to be consumed once
    • Generator expressions produce values on the fly, so they are more memory-efficient because no intermediate list is created
    • However, generator expressions can be slower than list comprehensions, especially for small datasets, due to the overhead of the iterator
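A rough way to see the memory difference and the set lookup in practice (the size n is arbitrary, and exact byte counts vary by Python version):

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # all values held in memory at once
squares_gen = (i * i for i in range(n))    # values produced one at a time

list_size = sys.getsizeof(squares_list)    # grows with n
gen_size = sys.getsizeof(squares_gen)      # small and constant
print(list_size, gen_size)

total = sum(squares_gen)                   # a generator can be consumed only once

allowed = set(range(n))
found = (n - 1) in allowed                 # hash lookup instead of a linear scan
```

Note that sys.getsizeof() reports only the container itself, not the integers it refers to, so the real gap is larger than the printed numbers suggest.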

itertools

itertools tutorial / documentation

Note: the operator module is used in some examples, but it is not necessary when using itertools

  • accumulate(): makes an iterator that returns accumulated results: running totals by default, or the running result of a two-argument function
    • itertools.accumulate(iterable[, func])
    • Passing a function
      data = [1, 2, 3, 4, 5]
      result = itertools.accumulate(data, operator.mul)
      print(list(result))   # [1, 2, 6, 24, 120]
    • Without passing a function (defaults to summation)
      result = itertools.accumulate(data)
      print(list(result))   # [1, 3, 6, 10, 15]
  • combinations(): takes an iterable and an integer r, and creates all the unique combinations that have r members
    • itertools.combinations(iterable, r)
      shapes = ['circle', 'triangle', 'square',]
      result = itertools.combinations(shapes, 2)
      print(list(result))   # [('circle', 'triangle'), ('circle', 'square'), ('triangle', 'square')]
  • count(): makes an iterator that returns evenly spaced values starting with number start
    • Similar to range(), but produces an infinite sequence and also accepts non-integer steps (both are lazy, so memory use is comparable)
    • itertools.count(start=0, step=1)
      for i in itertools.count(10,3):
          print(i)
          if i > 20:
              break
      # 10, 13, 16, 19, 22  (as individual lines)
  • cycle(): cycles through an iterator endlessly
    • itertools.cycle(iterable)
      colors = ['red', 'orange', 'yellow', 'green']
      for color in itertools.cycle(colors):
          print(color)
      # red, orange, yellow, green, red, orange, ...  (as individual lines)
  • chain(): treats several iterables as one continuous sequence
    • itertools.chain(*iterables)
      warm = ['red', 'orange']
      cool = ['green', 'blue']
      for color in itertools.chain(warm, cool):
          print(color)
      # red, orange, green, blue  (as individual lines)
  • islice():
    • Similar to index slicing ([:x]), but is more memory-efficient and can handle infinite and non-indexable iterables
    • itertools.islice(iterable, stop) or itertools.islice(iterable, start, stop[, step])
      colors = ['red', 'orange', 'yellow', 'green']
      for color in itertools.islice(colors, 2):
          print(color)
      # red, orange (as individual lines)
  • permutations(): returns all possible orderings of r elements from the iterable (r defaults to the full length)
    • itertools.permutations(iterable, r=None)
      alpha_data = ['a', 'b', 'c']
      result = itertools.permutations(alpha_data)
      list(result)  # [('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('b', 'c', 'a'), ('c', 'a', 'b'), ('c', 'b', 'a')]
  • product(): creates the Cartesian products from a series of iterables.
    • itertools.product(*iterables, repeat=1)
      num_data = [1, 2, 3]
      alpha_data = ['a', 'b', 'c']
      result = itertools.product(num_data, alpha_data)
      list(result)  # [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c'), (3, 'a'), (3, 'b'), (3, 'c')]

Misc.

Version features

3.8

  • Assignment expressions (walrus operator): := assigns a value and evaluates to that value inside a larger expression, such as the condition of a while loop or if statement
    • Ex: if (y := 2) > 1: # sets y = 2 and evaluates the expression as 2 > 1
    • Ex: while (user_input := input("Enter text: ")) != "stop": # keeps getting user input until "stop" is entered
    • Can also use it in list comprehensions: [result for i in range(5) if (result := func(i)) == True]
      • It is more efficient because it potentially only makes half the func() calls compared to [func(i) for i in range(5) if func(i) == True]
  • f-string improvements: Now supports the = specifier for debugging (f"{var=}")
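A small sketch of both 3.8 features. The expensive() function is hypothetical, and the calls list is only there to show that each element is evaluated once:

```python
calls = []

def expensive(i):
    calls.append(i)      # record each call for illustration
    return i * i

# One expensive() call per element: the result is both tested and kept
kept = [result for i in range(5) if (result := expensive(i)) > 3]
print(kept)         # [4, 9, 16]
print(len(calls))   # 5

value = 42
debug_text = f"{value=}"   # expands to the expression text plus its value
print(debug_text)   # value=42
```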

3.9

  • Dictionary union operators (| and |=):
    • d1 | d2 results in new dictionary resulting from the union of d1 and d2
    • Can use |= to do an in-place (update) union: d1 |= d2  # makes d1 equal to the resulting union; on duplicate keys, the value from d2 wins
  • .removeprefix() and .removesuffix(): methods to simplify removing prefixes and suffixes from strings
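A short sketch of the 3.9 additions (requires Python 3.9 or later; the dictionary contents and strings are invented for illustration):

```python
defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090}

merged = defaults | overrides     # new dict; right-hand values win on conflicts
print(merged)                     # {'host': 'localhost', 'port': 9090}

defaults |= {"debug": True}       # in-place update of defaults
print(defaults)                   # {'host': 'localhost', 'port': 8080, 'debug': True}

key = "stg_email".removeprefix("stg_")      # 'email'
stem = "report.csv".removesuffix(".csv")    # 'report'
same = "report.csv".removesuffix(".txt")    # unchanged when the suffix is absent
```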

3.10

  • Pattern matching (match and case) statements:
    • Introduces a match statement similar to switch-case, allowing pattern matching
    • Example
      match command:
          case 'start':
              start_process()
          case 'stop':
              stop_process()
  • Parenthesized Context Managers:
    • Allows using multiple context managers more neatly.
    • Example: with (open('file1') as f1, open('file2') as f2):

3.11

  • Significant Performance Improvements:
    • Python 3.11 includes performance improvements, claiming to be around 10-60% faster than Python 3.10
  • Exception Groups (ExceptionGroup) and except*:
    • Allows raising and handling multiple exceptions simultaneously.
    • except* is used to handle ExceptionGroup objects
      • This allows you to handle multiple exceptions raised together, enabling you to catch subsets of exceptions more precisely
    • Example:
      try:
        raise ExceptionGroup("Multiple Errors", [ValueError("Invalid value"), TypeError("Type mismatch")])
      except* ValueError as e:
        print(f"Caught ValueError: {e}")
      except* TypeError as e:
        print(f"Caught TypeError: {e}")
  • Task groups in asyncio (asyncio.TaskGroup):
    • Easier way to manage groups of asynchronous tasks.
    • Example:
      async def main():
          async with asyncio.TaskGroup() as tg:
              tg.create_task(some_coroutine())
      asyncio.run(main())

3.12

  • Enhanced async and await:
    • Improvements in asyncio and asynchronous task handling for better performance and simpler code patterns

Do specific things:

  • Debug print:
    • DEBUG = True
      def print_debug(*args, **kwargs):
          if DEBUG:
              print(' '.join(map(str, args)), **kwargs, flush=True)
  • Print traceback of error after catching an Exception: traceback.print_exc() (or traceback.format_exc() to get it as a string)
  • Check if variable or attribute exists, without causing an error if it doesn't:
    • if 'myVar' in locals():
          # myVar exists in local scope
      if 'myVar' in globals():
          # myVar exists in global scope
      if hasattr(obj, 'attr_name'):
          # obj.attr_name exists
  • Convert string to json, only if it exists (isn't empty)
    • data = json.loads(body) if (body := event.get("body")) else body
  • Clean use of ternary for dictionary value:
    • email = os.environ.get('stg_email' if 'stg' in env else 'prod_email')
  • Concise and efficient way to get a value based on a specific input (similar to case statement):
    • type_ = query.get('type')
      type_num = {'a': 1, 'b': 2, 'c': 3}.get(type_, 0)   # Provide a default value (e.g., 0)
  • Build dictionary from list of keys:
    • def get_data_values(data):   # Dict[str, Any] -> Dict[str, Any]
          keys = ['a', 'b', 'c', 'd']
          return {
              'type': 'data',
              'id': data.get('data_id', ''),
              **{key: data.get(key, '') for key in keys}
          }
  • See list of installed and built-in modules: help('modules')
  • Check if object is an instance of a specific class(es)
    • isinstance(var, str): returns True if var is a string object
    • isinstance(var, (str, int)): returns True if var is a string or an int object
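Two of the tips above (capturing a traceback and checking types with isinstance) sketched together. The safe_divide() helper is hypothetical:

```python
import traceback

def safe_divide(a, b):
    try:
        return a / b
    except Exception:
        # format_exc() returns the traceback text instead of printing it
        return traceback.format_exc()

result = safe_divide(1, 0)
is_text = isinstance(result, str)                         # True: a traceback string
is_number = isinstance(safe_divide(4, 2), (int, float))   # True: 2.0 is a float
print(result)
```

Returning the traceback as a string is only for demonstration; in real code it would more likely be logged.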