post python · 2025-01-18 · 5 min read
Pydantic v2 for production data contracts
The line “we use Pydantic” gets thrown around like it’s just a faster dataclasses. It is not. Pydantic v2 is a contract layer for any place where data crosses from outside your code (HTTP request, JSON file, LLM tool response, queue message) to inside it. Used well, it eliminates an entire class of “we got bad data and the bug surfaced 4 layers downstream” problem.
This post is the patterns I actually reach for in production, with code, plus the cases where I deliberately choose dataclasses or attrs instead.
The basic shape
```python
from pydantic import BaseModel, Field

class User(BaseModel):
    id: int
    email: str
    age: int = Field(ge=0, le=150)
    is_active: bool = True
```

Validation happens at construction:

```python
User(id=1, email="x@y.com", age=30)      # valid
User(id="abc", email="x@y.com", age=30)  # ValidationError: id must be int
User(id=1, email="x@y.com", age=200)     # ValidationError: age must be ≤ 150
```

Field(ge=...) adds runtime constraints. The model becomes a self-documenting contract: read the class and you know exactly what’s allowed.
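When validation fails, the ValidationError carries structured details, not just a message. A minimal sketch of inspecting them (the User model is repeated so the snippet stands alone):

```python
from pydantic import BaseModel, Field, ValidationError

class User(BaseModel):
    id: int
    email: str
    age: int = Field(ge=0, le=150)
    is_active: bool = True

try:
    User(id=1, email="x@y.com", age=200)
except ValidationError as exc:
    # each error dict has a location tuple, a machine-readable type, and a message
    errors = exc.errors()
    print(errors[0]["loc"], errors[0]["msg"])
```

That structured `loc`/`msg` shape is what makes it practical to map errors back to specific fields in API responses or logs.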
Validators where types are not enough
For constraints types alone can’t express, use validators.
Single-field validator — runs after type coercion:
```python
from pydantic import BaseModel, field_validator

class User(BaseModel):
    email: str

    @field_validator("email")
    @classmethod
    def email_must_have_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("email must contain '@'")
        return v.lower()  # normalise on the way in
```

Note the @classmethod decorator and the cls first argument. Pydantic v2 enforces this.
Whole-model validator — runs after every field is parsed, sees the whole instance:
```python
from datetime import date

from pydantic import BaseModel, model_validator

class DateRange(BaseModel):
    start: date
    end: date

    @model_validator(mode="after")
    def end_must_follow_start(self) -> "DateRange":
        if self.end < self.start:
            raise ValueError("end must be on or after start")
        return self
```

Use model_validator for cross-field invariants (“end must follow start”, “if A is null then B must not be”). Use field_validator for per-field rules.
mode="before" for pre-coercion — runs on the raw input before type conversion:
```python
from pydantic import BaseModel, field_validator

class User(BaseModel):
    age: int

    @field_validator("age", mode="before")
    @classmethod
    def parse_age(cls, v) -> int:
        if isinstance(v, str) and v.endswith(" years"):
            return int(v.removesuffix(" years"))
        return v
```

Use mode="before" when the incoming data has a quirky encoding you need to normalise. Use the default (mode="after") when you just want to validate the parsed value.
Computed fields for derived state
```python
from pydantic import BaseModel, computed_field

class Rectangle(BaseModel):
    width: float
    height: float

    @computed_field
    @property
    def area(self) -> float:
        return self.width * self.height
```

Rectangle(width=3, height=4).area → 12.0, and area is included in model_dump() output. Useful when you want a derived value to ship with the serialised data without storing it as a field.
Discriminated unions for polymorphic JSON
This is where Pydantic v2 actually shines. You have JSON like:
```json
[
  { "kind": "click", "x": 10, "y": 20 },
  { "kind": "scroll", "delta": 5 },
  { "kind": "key", "code": "Enter" }
]
```

Three different shapes, distinguished by the kind field. Without Pydantic this is awful. With it:
```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter

class Click(BaseModel):
    kind: Literal["click"]
    x: int
    y: int

class Scroll(BaseModel):
    kind: Literal["scroll"]
    delta: int

class KeyEvent(BaseModel):
    kind: Literal["key"]
    code: str

Event = Annotated[
    Union[Click, Scroll, KeyEvent],
    Field(discriminator="kind"),
]

# Single event
event = TypeAdapter(Event).validate_python({"kind": "click", "x": 10, "y": 20})
# event is now correctly typed as Click

# List of events
events = TypeAdapter(list[Event]).validate_python(raw_json_list)
```

The discriminator="kind" tells Pydantic to look at the kind field first and pick the matching schema. Validation is fast (no trial-and-error) and errors point at the right place.
This pattern is the cleanest way I know to parse LLM tool-call payloads, agent message buses, or any polymorphic JSON.
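To see the routing and the failure mode end to end, here is a self-contained sketch (two event types only, for brevity, plus an unknown kind):

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError

class Click(BaseModel):
    kind: Literal["click"]
    x: int
    y: int

class Scroll(BaseModel):
    kind: Literal["scroll"]
    delta: int

Event = Annotated[Union[Click, Scroll], Field(discriminator="kind")]
adapter = TypeAdapter(list[Event])

# Valid payload: each dict is routed straight to its schema by "kind"
events = adapter.validate_python([
    {"kind": "click", "x": 10, "y": 20},
    {"kind": "scroll", "delta": 5},
])

# Unknown tag: rejected immediately, no trial-and-error against every member
try:
    adapter.validate_python([{"kind": "zoom", "level": 3}])
    tag_rejected = False
except ValidationError:
    tag_rejected = True
```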
TypeAdapter for non-model validation
You don’t need a BaseModel for everything. For a one-off shape:
```python
from pydantic import TypeAdapter

UserList = TypeAdapter(list[User])
parsed = UserList.validate_python(raw_data)
```

Or for a primitive with constraints:
```python
from typing import Annotated

from pydantic import Field, TypeAdapter

PostalCode = Annotated[str, Field(pattern=r"^\d{5}$")]
NL_PostalCode = Annotated[str, Field(pattern=r"^\d{4}\s?[A-Z]{2}$")]

TypeAdapter(NL_PostalCode).validate_python("1316XW")   # passes
TypeAdapter(NL_PostalCode).validate_python("13 16XW")  # raises ValidationError
```

Useful for validating function arguments, config values, or anywhere a full BaseModel is overkill.
model_config for cross-cutting behaviour
Tweak how a model behaves with model_config:
```python
from pydantic import BaseModel, ConfigDict

class StrictUser(BaseModel):
    model_config = ConfigDict(
        frozen=True,                # immutable after construction
        populate_by_name=True,      # accept either field name or alias
        extra="forbid",             # reject unknown keys
        str_strip_whitespace=True,  # auto-strip strings
    )

    id: int
    email: str
```

The four flags above are the ones I set most often:

- frozen=True — turns instances into pseudo-records. Hashable, so they can go in sets and be used as dict keys.
- populate_by_name=True — needed when you have field aliases (camelCase JSON ↔ snake_case Python) and want to construct from either name.
- extra="forbid" — strict mode for ingest contracts. Catch typos like enabel: true instead of silently ignoring them.
- str_strip_whitespace=True — common normalisation; saves a .strip() call everywhere.
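A quick sketch of what those flags buy you at runtime. Note that in v2, assigning to a frozen model raises ValidationError (not the TypeError a frozen dataclass would give you):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictUser(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid", str_strip_whitespace=True)
    id: int
    email: str

u = StrictUser(id=1, email="  x@y.com  ")
# str_strip_whitespace: normalised on the way in
stripped_email = u.email

try:
    u.id = 2  # frozen: mutation is rejected
    mutated = True
except ValidationError:
    mutated = False

try:
    StrictUser(id=1, email="x@y.com", enabel=True)  # typo'd key
    typo_accepted = True
except ValidationError:
    typo_accepted = False
```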
Serialisation: model_dump() is what you want
```python
user = User(id=1, email="x@y.com", age=30)

user.model_dump()                   # dict
user.model_dump_json()              # JSON string
user.model_dump(exclude={"email"})  # drop fields
user.model_dump(by_alias=True)      # use aliases (camelCase JSON)
```

Forget .dict() (the v1 API, now deprecated). Use model_dump() everywhere.
When to NOT use Pydantic
A few cases where dataclasses or attrs win:
- Internal data with no external boundary. If the data only flows between trusted internal functions and you have full type-checker coverage, the runtime validation overhead isn’t worth it. Use @dataclass(slots=True).
- Hot paths. Pydantic is fast (Rust core in v2), but it’s not free. Validating millions of objects per second is still measurably slower than constructing dataclasses. Profile before assuming.
- Existing attrs codebases. Don’t introduce a third schema library. Pick one and stick with it.
A real-world example: LLM tool argument schema
This is where Pydantic earns its keep in 2025:
```python
from pydantic import BaseModel, Field

class SearchArgs(BaseModel):
    query: str = Field(description="free-text search term, eg 'vegan restaurants'")
    center_lat: float = Field(description="latitude of search centre, decimal degrees")
    center_lon: float = Field(description="longitude of search centre, decimal degrees")
    radius_meters: int = Field(default=1000, ge=50, le=50_000)
    limit: int = Field(default=10, ge=1, le=50)
    min_rating: float | None = Field(default=None, ge=0, le=5)

# Hand the schema to the LLM
schema_json = SearchArgs.model_json_schema()

# Parse the LLM's tool call
def search(raw_args: dict) -> SearchResult:
    args = SearchArgs.model_validate(raw_args)  # ValidationError if bad
    # args is now fully typed, with bounds enforced
    ...
```

The Field(description="...") strings end up in the JSON schema the LLM sees. The bounds (ge, le) get enforced at parse time. Bad arguments fail fast with a structured error you can turn into a “retry with corrected args” envelope (covered in the LLM tool-design post).
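A sketch of one such envelope. The dict shape here is my own invention, not from any library; the only Pydantic API involved is exc.errors(). A smaller model stands in for SearchArgs so the snippet runs on its own:

```python
from pydantic import BaseModel, Field, ValidationError

class SearchArgs(BaseModel):
    query: str
    limit: int = Field(default=10, ge=1, le=50)

def to_retry_envelope(exc: ValidationError) -> dict:
    """Turn a ValidationError into a structured 'retry with corrected args' message."""
    return {
        "status": "invalid_arguments",
        "errors": [
            {"field": ".".join(map(str, e["loc"])), "problem": e["msg"]}
            for e in exc.errors()
        ],
        "hint": "fix the listed fields and call the tool again",
    }

try:
    SearchArgs.model_validate({"query": "cafes", "limit": 999})
except ValidationError as exc:
    envelope = to_retry_envelope(exc)
```

Returned as the tool result, this gives the model something concrete to correct rather than an opaque stack trace.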
Closing
Pydantic isn’t “dataclasses with validation”. It’s a contract layer for everywhere data is untrusted: HTTP, file, LLM, queue. Use it at boundaries, lean on validators for invariants types can’t express, lean on discriminated unions for polymorphism, and lean on TypeAdapter for one-off shapes. Inside the trusted core of your code, dataclasses are fine.
The mental shift: stop thinking of it as “schema for my API” and start thinking of it as “the gate at every boundary.” Once you do, the code that survives makes more sense.