ORM Models and Schema Models - jbrucker/home-log GitHub Wiki

ORM Models describe the structure of entity objects that are persisted to database or other persistent storage. SqlAlchemy defines the syntax for ORM Models.

Schema Models describe how entity objects are represented. Schema models also provide validation rules and serialization/deserialization. Pydantic defines the syntax for Schema models.

You can Populate a ORM Model from a Schema Model and vice versa.

Referenced Objects and Reverse Relationships, how to define an attribute for an object referenced by a foreign key, and how to create a corresponding reverse relationship. By default, SqlAlchemy uses lazy instantiation but you can require eager instantiation either in the schema or as an option in a query.

Validation can be done by Pydantic or SqlAlchemy.

ORM Table Models

The SqlAlchemy 2.0 official style adds annotated type hints to database-mapped attributes:

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from sqlalchemy import Integer, String
from sqlalchemy.ext.async import AsyncAttrs

class Base(AsyncAttrs, DeclarativeBase):
    pass

class User(Base):
     __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str] = mapped_column(String(50), nullable=False)

class UserPassword(Base):
    __tablename__ = "user_passwords"
    hashed_password: Mapped[str] = mapped_column(String, nullable=False)
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id", ondelete="CASCADE"), nullable=False)

Inferred values can be omitted, such as Integer for primary key(in code above). You can completely omit mapped_column and accepted inferred or default values, as in:

   username: Mapped[str]  # inferred column type is `String`

Explanation

Mapped[int] is a generic marker meaning "this is an ORM-mapped attribute"
mapped_column(...) creates a Column object under the hood, but designed to work with Python typing.
in mapped_column it is not necessary to specify the datatype if it can be inferred from the type hint.
the entire mapped_column can be omitted if you want the default datatype and inferred properties, i.e. username: Mapped[str] defaults to mapped_column('String', nullable=True)
specify the data type if you need to add detail (String(50) or DateTime(timezone=True)), disambiguate, or add options: created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)

Old-Style ORM Models

Prior to SqlAlchemy 2.0, ORM table models were written as:

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True, nullable=False)
    username = Column(String(50), unique=True, nullable=False)

Schema Models

You can have multiple Pydantic schema models for one persistence model. Each schema provides a subset of attributes for a particular purpose.

import pydantic
from pydantic import EmailStr
from datetime import datetime, timezone

class UserCreate(pydantic.BaseModel):
    """User attributes that are given to a service endpoint to create a new User entity."""
    email: EmailStr
    username: Optional[str] = None


class User(UserCreate):
    """The complete User schema."""
    id: int
    # In model classes, these default to current time
    created_at: datetime = datetime.now(timezone.utc)
    updated_at: datetime = datetime.now(timezone.utc)
    # model_config replaces the Config inner-class in Pydantic 2.0
    model_config = ConfigDict(from_attributes=True)

or:

    model_config = ConfigDict(from_attributes=True, model_class="User")

Populate a ORM Model from a Schema Model

Any of these techniques can be used.

1. Use **user_data.model_dump() (Pydantic v2) or **user_data.dict() (v1 or v2)

def save_user(user_data: schemas.UserCreate):
    user = models.User(**user_data.model_dump())

This assumes attribute names in UserCreate schema match those in User model.

2. Explicit Assignment of Attributes

def save_user(user_data: schemas.UserCreate):
    user = models.User(
             username=user_data.username,
             email=user_data.email
           )

Requires manual updating if new attributes are added to model.

3. Factory method in User model or UserCreate schema

Define your own method to perform the conversion. Models should not depend on schema, so put the method in UserCreate schema or a separate factory class:

def save_user(user_data: schemas.UserCreate):
    user = user_data.as_model()

# schemas class
class UserCreate(BaseModel):

    def as_model(self) -> models.User:
        return models.User(username=self.username, email=self.email)

Validation

Validation is done by Pydantic.

Schema classes automatically apply validation rules when you create a new schema object, but not if you assign a new value to an object.

import schemas
user = schema.UserCreate(username="Santa", email="[email protected]")
# but doesn't validate email here:
user.email = "santa@"

**model_validate(obj)** class method validates the parameter. obj` can be a model, a dict, or another schema object and returns a new Pydantic model instance.

import schemas
user_in = schemas.UserCreate(username="Santa", email="[email protected]")
# validate & create a different schema object (User)
user = schemas.User.model_validate(user_in)

data = {'username': 'harry', 'email': 'hackers@com'}
user = schemas.User.model_validate(data)
# raises ValidationError because 'email' is malformed

model_validate_json(json_data) validate JSON data (against a schema class) and returns an instance of the schema class.

Update an Existing Model Instance from a Pydantic Schema instance

Suppose user is an existing SqlAlchemy model (models.User) and user_data is a schema instance (schemas.User). To update only explicitly set fields:

update_data = user_data.model_dump(exclude_unset=True)

for field, value in update_data.items():
    setattr(user, field, value)

Referenced Objects and Reverse Relationships

A DataSource model contains a owner_id foreign key referencing the primary key of a User:

class DataSource(Base):
    """A source of data values, such as a meter or sensor."""
    __tablename__ = "data_sources"
    id: Mapped[int] = mapped_column(primary_key=True, nullable=False)
    name: Mapped[str] = mapped_column(String(MAX_NAME), nullable=False)
    owner_id: Mapped[int] = mapped_column(
                                    Integer,
                                    # Poor Abstraction: "users.id" instead of User.id
                                    ForeignKey("users.id", ondelete="SET NULL"),
                                    nullable=True
                                    )

You can define an attribute for the object referenced by owner_id:

class DataSource(Base):

    owner: Mapped[User] = relationship("User")

By default, SqlAlchemy uses lazy instantiation so that this query:

source_id = 1
source = await session.get(DataSource, source_id)

does not instantiate the source.owner reference. owner is lazily instantiated when referenced in code:

owner_name = source.owner.name  # owner (User) is instantiated

this can cause an async error ("Missing Greenlet") or session expired error if done outside the scope of the session.

Eager Instantiation in Query

To request that owner be populated in the query for DataSource, use:

source = await session.get(DataSource, source_id, 
                           options=[joinedload(DataSource.owner)])

joinedload uses a JOIN so that both objects are fetched in a single query.

To do the same thing is a select statement:

stmt = select(DataSource).options(joinedload(DataSource.owner)).where(DataSource.id == source_id)
result = await session.execute(stmt)
source = result.scalar_one_or_none()

An alternative to joinedload is selectinload, which uses 2 queries but may be better for large 1-to-many collections.

Reverse Relationships

To define a synchronized, 2-way association use one of these options in the relationship:

backpopulates - an explicit, 2-sided definition
backref - 1-side defined both ends of the relationship

Both of these can be used with other options for lazy/eager instantiation, cascading (deletion), 1-to-many, and whether each end should be a list or single value.

Reverse Relationship using `back_populates`

Explicitly define each end of the association. Each end may have different options.

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    ...
    data_sources: Mapped[list[DataSource]] = relationship(
                      "DataSource",          # can omit class name if it can be inferred.
                      backpopulates="user",
                      cascade="all, delete-orphan",
                      lazy="select"  # lazily load objects when accessed
                      )

class DataSource(Base):
    __tablename__ = "data_sources"
    id: Mapped[int] = mapped_column(primary_key=True)
    owner_id: Mapped[int] = mapped_column(
                      Integer,
                      ForeignKey("users.id", ondelete="SET NULL"),
                      nullable=True
                      )
    owner = relationship("User",
                      back_populates="data_sources",
                      lazy="joined"   # use JOIN to eagerly fetch related object
                      )

Reverse Relationship using `backref`

One end declares and controls both ends of the relationship.

from sqlalchemy.orm import backref

class User(Base):
    __tablename__ = 'users'
    id = mapped_column(Integer, primary_key=True)
    data_sources = relationship("DataSource", backref="user")

This automatically creates the user attribute on UserPassword.

If you want to customize the other side, use backref(...) with parameters:

    data_sources = relationship(
                      "Datasource",
                      backref=backref("user", lazy="joined", cascade="all, delete")
                      )

How back_populates and backref differ:

Characteristic	`back_populates`	`backref`
Explicit declaration	Declare relationships on both sides	Declare both sides in one place
Control	More control; each side has its own config	Both sides share config
Clarity	Explicit; easier to read in large schemas	More concise for simple relationships
Customization	Each side can have independent `cascade`, `lazy`, etc.	Must use `backref()` to customize other side

Options for `back_populates` and `backref`

lazy= - loading strategy affects how related objects are loaded

Option	Behavior
`"select"`	Default. Loads related objects on access (lazy load).
`"joined"`	Uses `JOIN` to eagerly load related object in same query.
`"selectin"`	Loads related object(s) with a separate `SELECT ... IN` query.
`"immediate"`	Loads the related objects as soon as the parent is loaded (rarely used).
`"noload"`	Does not load the relationship at all (returns empty list or `None`).
`"raise"`	Raises an error if the relationship is accessed (used to enforce manual loading).

cascade= - how operations on parent affect the related children

Cascade Rule	Meaning
`"save-update"`	Propagate `session.add()` and updates.
`"merge"`	Merge operations will cascade.
`"expunge"`	Removing parent from session removes children too.
`"delete"`	Deleting parent will also delete the children (if not orphaned).
`"delete-orphan"`	If a child is removed from the collection, it will be deleted.
`"all"`	Includes all except `"delete-orphan"`.
`"all, delete-orphan"`	Typical for one-to-many where orphaned children should be deleted.

uselist= controls where relationship returns a list or scalar. Normally uselist=False needed only for 1-to-1 relationships. It is inferred on the "many" end of many-to-one relationships.

Setting Behavior

uselist=True (default) Returns a list (for one-to-many)

uselist=False Returns a scalar (for one-to-one or many-to-one)