post · python · 2025-01-18 · 5 min read

Pydantic v2 for production data contracts

#python #pydantic #data-contracts #validation

The line “we use Pydantic” gets thrown around like it’s just a faster dataclasses. It is not. Pydantic v2 is a contract layer for any place where data crosses from outside your code (HTTP request, JSON file, LLM tool response, queue message) to inside it. Used well, it eliminates an entire class of “we got bad data and the bug surfaced 4 layers downstream” problem.

This post is the patterns I actually reach for in production, with code, plus the cases where I deliberately choose dataclasses or attrs instead.

The basic shape

from pydantic import BaseModel, Field

class User(BaseModel):
    id: int
    email: str
    age: int = Field(ge=0, le=150)
    is_active: bool = True

Validation happens at construction:

User(id=1, email="x@y.com", age=30) # valid
User(id="abc", email="x@y.com", age=30) # ValidationError: id must be int
User(id=1, email="x@y.com", age=200) # ValidationError: age must be ≤ 150

Field(ge=...) adds runtime constraints. The model becomes a self-documenting contract: read the class, you know exactly what’s allowed.
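When construction fails, the raised ValidationError carries structured details, not just a message. A sketch of inspecting them (the `less_than_equal` error type string is Pydantic v2's naming for a `le` violation):

```python
from pydantic import BaseModel, Field, ValidationError

class User(BaseModel):
    id: int
    email: str
    age: int = Field(ge=0, le=150)

try:
    User(id=1, email="x@y.com", age=200)
except ValidationError as e:
    # e.errors() is a list of dicts with 'loc', 'msg', 'type', 'input'
    errors = e.errors()

print(errors[0]["loc"], errors[0]["type"])  # ('age',) less_than_equal
```

That structured list is what you log, or hand back to a caller, instead of a stringly-typed message.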

Validators where types are not enough

For constraints types alone can’t express, use validators.

Single-field validator — runs after type coercion:

from pydantic import BaseModel, field_validator

class User(BaseModel):
    email: str

    @field_validator("email")
    @classmethod
    def email_must_have_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("email must contain '@'")
        return v.lower()  # normalise on the way in

Note the @classmethod decorator and cls first arg. Pydantic v2 enforces this.

Whole-model validator — runs after every field is parsed, sees the whole instance:

from datetime import date

from pydantic import BaseModel, model_validator

class DateRange(BaseModel):
    start: date
    end: date

    @model_validator(mode="after")
    def end_must_follow_start(self) -> "DateRange":
        if self.end < self.start:
            raise ValueError("end must be on or after start")
        return self

Use model_validator for cross-field invariants (“end must follow start”, “if A is null then B must not be”). Use field_validator for per-field rules.

mode="before" for pre-coercion — runs on the raw input before type conversion:

from pydantic import BaseModel, field_validator

class User(BaseModel):
    age: int

    @field_validator("age", mode="before")
    @classmethod
    def parse_age(cls, v) -> int:
        if isinstance(v, str) and v.endswith(" years"):
            return int(v.removesuffix(" years"))
        return v

Use mode="before" when the incoming data has a quirky encoding you need to normalise. Default (mode="after") when you just want to validate the parsed value.

Computed fields for derived state

from pydantic import BaseModel, computed_field

class Rectangle(BaseModel):
    width: float
    height: float

    @computed_field
    @property
    def area(self) -> float:
        return self.width * self.height

Rectangle(width=3, height=4).area returns 12.0, and area is included in model_dump() output. Useful when you want a derived value to ship with the serialised data without storing it as a field.
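A quick check of both behaviours, repeating the model so the snippet stands alone:

```python
from pydantic import BaseModel, computed_field

class Rectangle(BaseModel):
    width: float
    height: float

    @computed_field
    @property
    def area(self) -> float:
        return self.width * self.height

r = Rectangle(width=3, height=4)
print(r.area)          # 12.0
print(r.model_dump())  # {'width': 3.0, 'height': 4.0, 'area': 12.0}
```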

Discriminated unions for polymorphic JSON

This is where Pydantic v2 actually shines. You have JSON like:

[
    { "kind": "click", "x": 10, "y": 20 },
    { "kind": "scroll", "delta": 5 },
    { "kind": "key", "code": "Enter" }
]

Three different shapes, distinguished by the kind field. Without Pydantic this is awful. With:

from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter
class Click(BaseModel):
kind: Literal["click"]
x: int
y: int
class Scroll(BaseModel):
kind: Literal["scroll"]
delta: int
class KeyEvent(BaseModel):
kind: Literal["key"]
code: str
Event = Annotated[
Union[Click, Scroll, KeyEvent],
Field(discriminator="kind"),
]
# Single event
event = TypeAdapter(Event).validate_python({"kind": "click", "x": 10, "y": 20})
# event is now correctly typed as Click
# List of events
events = TypeAdapter(list[Event]).validate_python(raw_json_list)

The discriminator="kind" tells Pydantic to look at the kind field first and pick the matching schema. Validation is fast (no trial-and-error) and errors point at the right place.

This pattern is the cleanest way I know to parse LLM tool-call payloads, agent message buses, or any polymorphic JSON.
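An unknown tag also fails cleanly, pointing at the discriminator rather than reporting one mismatch per union member. A minimal two-variant sketch (the `union_tag_invalid` error type is my reading of Pydantic v2's behaviour):

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter, ValidationError

class Click(BaseModel):
    kind: Literal["click"]
    x: int

class Scroll(BaseModel):
    kind: Literal["scroll"]
    delta: int

Event = Annotated[Union[Click, Scroll], Field(discriminator="kind")]

try:
    TypeAdapter(Event).validate_python({"kind": "hover", "x": 1})
except ValidationError as e:
    errors = e.errors()

print(errors[0]["type"])  # union_tag_invalid — the tag itself is wrong
```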

TypeAdapter for non-model validation

You don’t need a BaseModel for everything. For a one-off shape:

from pydantic import TypeAdapter
UserList = TypeAdapter(list[User])
parsed = UserList.validate_python(raw_data)

Or for a primitive with constraints:

from typing import Annotated
from pydantic import Field, TypeAdapter
PostalCode = Annotated[str, Field(pattern=r"^\d{5}$")]
NL_PostalCode = Annotated[str, Field(pattern=r"^\d{4}\s?[A-Z]{2}$")]
TypeAdapter(NL_PostalCode).validate_python("1316XW") # passes
TypeAdapter(NL_PostalCode).validate_python("13 16XW") # raises ValidationError

Useful for validating function arguments, config values, or anywhere a full BaseModel is overkill.
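For the function-argument case, the pattern I'd sketch is a module-level adapter reused across calls (the `Port` alias and `start_server` names here are mine, purely illustrative):

```python
from typing import Annotated
from pydantic import Field, TypeAdapter

Port = Annotated[int, Field(ge=1, le=65_535)]
_port = TypeAdapter(Port)  # build once at import time, reuse per call

def start_server(raw_port) -> int:
    # coerces "8080" -> 8080 and enforces the bounds, raising ValidationError if bad
    port = _port.validate_python(raw_port)
    return port

print(start_server("8080"))  # 8080
```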

model_config for cross-cutting behaviour

Tweak how a model behaves with model_config:

from pydantic import BaseModel, ConfigDict

class StrictUser(BaseModel):
    model_config = ConfigDict(
        frozen=True,                # immutable after construction
        populate_by_name=True,      # accept either field name or alias
        extra="forbid",             # reject unknown keys
        str_strip_whitespace=True,  # auto-strip strings
    )

    id: int
    email: str

The four flags above are the ones I set most often; the inline comments say what each buys you.
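A quick check of two of them, with the model repeated so the snippet runs standalone:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictUser(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid", str_strip_whitespace=True)
    id: int
    email: str

u = StrictUser(id=1, email="  x@y.com  ")
print(u.email)  # 'x@y.com' — whitespace stripped on the way in

try:
    StrictUser(id=1, email="x@y.com", nickname="x")
except ValidationError:
    print("unknown key rejected")  # extra='forbid' at work
```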

Serialisation: model_dump() is what you want

user = User(id=1, email="x@y.com", age=30)
user.model_dump() # dict
user.model_dump_json() # JSON string
user.model_dump(exclude={"email"}) # drop fields
user.model_dump(by_alias=True) # use aliases (camelCase JSON)

Forget .dict() (v1 API, deprecated). Use model_dump() everywhere.
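The inbound direction pairs with it: model_validate() and model_validate_json() replace the v1 parse_obj()/parse_raw(). A round trip, assuming the User model from the top of the post:

```python
from pydantic import BaseModel, Field

class User(BaseModel):
    id: int
    email: str
    age: int = Field(ge=0, le=150)

raw = '{"id": 1, "email": "x@y.com", "age": 30}'
user = User.model_validate_json(raw)            # JSON string -> model
again = User.model_validate(user.model_dump())  # dict -> model
print(user == again)  # True — v2 models compare equal by field values
```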

When to NOT use Pydantic

A few cases where dataclasses or attrs win: hot paths where per-instance validation cost matters, purely internal data that never crosses a trust boundary, and lightweight value objects where you don't want the dependency. Pydantic at the boundary, plain classes inside it.

A real-world example: LLM tool argument schema

This is where Pydantic earns its keep in 2025:

from pydantic import BaseModel, Field
class SearchArgs(BaseModel):
query: str = Field(description="free-text search term, eg 'vegan restaurants'")
center_lat: float = Field(description="latitude of search centre, decimal degrees")
center_lon: float = Field(description="longitude of search centre, decimal degrees")
radius_meters: int = Field(default=1000, ge=50, le=50_000)
limit: int = Field(default=10, ge=1, le=50)
min_rating: float | None = Field(default=None, ge=0, le=5)
# Hand the schema to the LLM
schema_json = SearchArgs.model_json_schema()
# Parse the LLM's tool call
def search(raw_args: dict) -> SearchResult:
args = SearchArgs.model_validate(raw_args) # ValidationError if bad
# args is now fully typed, with bounds enforced
...

The Field(description="...") strings end up in the JSON schema the LLM sees. The bounds (ge, le) get enforced at parse time. Bad arguments fail fast with a structured error you can turn into a “retry with corrected args” envelope (covered in the LLM tool-design post).
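That envelope can be as simple as flattening e.errors(). The shape below is illustrative (the `status`/`errors` keys are my invention, and the model is trimmed to two fields):

```python
from pydantic import BaseModel, Field, ValidationError

class SearchArgs(BaseModel):
    query: str
    radius_meters: int = Field(default=1000, ge=50, le=50_000)

def call_search(raw_args: dict) -> dict:
    try:
        args = SearchArgs.model_validate(raw_args)
    except ValidationError as e:
        # turn the structured error list into a "retry with corrected args" reply
        return {
            "status": "retry",
            "errors": [
                {"field": ".".join(str(p) for p in err["loc"]), "problem": err["msg"]}
                for err in e.errors()
            ],
        }
    return {"status": "ok", "args": args.model_dump()}

print(call_search({"query": "pizza", "radius_meters": 9}))
```

The LLM gets back exactly which field failed and why, which is what it needs to self-correct.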

Closing

Pydantic isn’t “dataclasses with validation”. It’s a contract layer for everywhere data is untrusted: HTTP, file, LLM, queue. Use it at boundaries, lean on validators for invariants types can’t express, lean on discriminated unions for polymorphism, and lean on TypeAdapter for one-off shapes. Inside the trusted core of your code, dataclasses are fine.

The mental shift: stop thinking of it as “schema for my API” and start thinking of it as “the gate at every boundary.” Once you do, the code that survives makes more sense.