post python · 2025-01-18 · 5 min read
Pydantic v2 for production data contracts
The line "we use Pydantic" gets thrown around as if it were just a faster dataclass. It is not. Pydantic v2 is a contract layer for any place where data crosses from outside your code (HTTP request, JSON file, LLM tool response, queue message) to inside it. Used well, it eliminates an entire class of "we got bad data and the bug surfaced four layers downstream" problems.
This post covers the patterns I actually reach for in production, with code, plus the cases where I deliberately choose dataclasses or attrs instead.
The basic shape
    from pydantic import BaseModel, Field

    class User(BaseModel):
        id: int
        email: str
        age: int = Field(ge=0, le=150)
        is_active: bool = True

Validation happens at construction:

    User(id=1, email="x@y.com", age=30)      # valid
    User(id="abc", email="x@y.com", age=30)  # ValidationError: id must be int
    User(id=1, email="x@y.com", age=200)     # ValidationError: age must be ≤ 150

Field(ge=...) adds runtime constraints. The model becomes a self-documenting contract: read the class, you know exactly what's allowed.
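To make the failure mode concrete, here is a minimal sketch that catches the ValidationError and looks inside it. The model repeats the User class above; the helper name collect_errors and the bad input are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class User(BaseModel):
    id: int
    email: str
    age: int = Field(ge=0, le=150)
    is_active: bool = True


def collect_errors(data: dict) -> list[tuple[tuple, str]]:
    """Return (location, error type) pairs, or an empty list if data is valid."""
    try:
        User(**data)
    except ValidationError as exc:
        # errors() returns one dict per failing field
        return [(err["loc"], err["type"]) for err in exc.errors()]
    return []


# Both problems are reported in one pass, not just the first one hit.
problems = collect_errors({"id": "abc", "email": "x@y.com", "age": 200})
for loc, kind in problems:
    print(loc, kind)

# In the default (lax) mode a numeric string is coerced rather than rejected.
coerced = User(id="1", email="x@y.com", age=30)
```

If coercing "1" to 1 is not what you want at a given boundary, Pydantic's strict mode (for example ConfigDict(strict=True)) switches that off.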
Validators where types are not enough
For constraints types alone can't express, use validators.
Single-field validator (runs after type coercion):
    from pydantic import BaseModel, field_validator

    class User(BaseModel):
        email: str

        @field_validator("email")
        @classmethod
        def email_must_have_at(cls, v: str) -> str:
            if "@" not in v:
                raise ValueError("email must contain '@'")
            return v.lower()  # normalise on the way in

Note the @classmethod decorator and cls first arg. Pydantic v2 treats field validators as class methods and rejects an instance-method (self) signature.
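A short sketch of the validator in action, using the same User model as above with made-up inputs. It shows the two things a field validator can do: replace the value, or reject it.

```python
from pydantic import BaseModel, ValidationError, field_validator


class User(BaseModel):
    email: str

    @field_validator("email")
    @classmethod
    def email_must_have_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("email must contain '@'")
        return v.lower()


# The returned value replaces the input, so the stored email is lower-cased.
user = User(email="Alice@Example.COM")
print(user.email)

# A ValueError raised inside the validator surfaces as a ValidationError.
try:
    User(email="not-an-email")
    error_types = []
except ValidationError as exc:
    error_types = [err["type"] for err in exc.errors()]
```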
Whole-model validator (runs after every field is parsed, sees the whole instance):

    from datetime import date

    from pydantic import BaseModel, model_validator

    class DateRange(BaseModel):
        start: date
        end: date

        @model_validator(mode="after")
        def end_must_follow_start(self) -> "DateRange":
            if self.end < self.start:
                raise ValueError("end must be on or after start")
            return self

Use model_validator for cross-field invariants ("end must follow start", "if A is null then B must not be"). Use field_validator for per-field rules.
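The "if A is null then B must not be" case can be sketched the same way. Contact and its fields are hypothetical, chosen only to illustrate the shape of the rule.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class Contact(BaseModel):  # hypothetical model, for illustration only
    email: Optional[str] = None
    phone: Optional[str] = None

    @model_validator(mode="after")
    def need_one_channel(self) -> "Contact":
        # Cross-field rule: at least one of the two must be provided.
        if self.email is None and self.phone is None:
            raise ValueError("provide email or phone")
        return self


ok = Contact(phone="+31 20 555 0100")

try:
    Contact()
    failed_loc = None
except ValidationError as exc:
    # Model-level errors are not tied to a field, so loc is an empty tuple.
    failed_loc = exc.errors()[0]["loc"]
```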
mode="before" for pre-coercion (runs on the raw input before type conversion):

    from pydantic import BaseModel, field_validator

    class User(BaseModel):
        age: int

        @field_validator("age", mode="before")
        @classmethod
        def parse_age(cls, v) -> int:
            if isinstance(v, str) and v.endswith(" years"):
                return int(v.removesuffix(" years"))
            return v

Use mode="before" when the incoming data has a quirky encoding you need to normalise. Stick with the default (mode="after") when you just want to validate the parsed value.
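A quick sketch of how the before-validator behaves on three inputs, assuming the same User model. The inputs are invented for the example.

```python
from pydantic import BaseModel, ValidationError, field_validator


class User(BaseModel):
    age: int

    @field_validator("age", mode="before")
    @classmethod
    def parse_age(cls, v):
        if isinstance(v, str) and v.endswith(" years"):
            return int(v.removesuffix(" years"))
        return v


quirky = User(age="30 years")   # suffix stripped by the before-validator
plain = User(age=30)            # already an int, passed through untouched

# Anything the before-validator does not recognise still goes through the
# normal int validation afterwards.
try:
    User(age="thirty")
    rejected = False
except ValidationError:
    rejected = True
```

The before-validator only normalises; the int check that follows is what actually guards the field.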
Computed fields for derived state
    from pydantic import BaseModel, computed_field

    class Rectangle(BaseModel):
        width: float
        height: float

        @computed_field
        @property
        def area(self) -> float:
            return self.width * self.height

Rectangle(width=3, height=4).area → 12.0, included in model_dump() output. Useful when you want a derived value to ship with the serialised data without storing it as a field.
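For reference, a small sketch of what the dump looks like for the Rectangle above, and of the fact that the value is derived on access, not stored.

```python
from pydantic import BaseModel, computed_field


class Rectangle(BaseModel):
    width: float
    height: float

    @computed_field
    @property
    def area(self) -> float:
        return self.width * self.height


rect = Rectangle(width=3, height=4)
dumped = rect.model_dump()
print(dumped)

# The value is recomputed on access, so it follows later changes to the inputs.
rect.width = 10
new_area = rect.area
```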
Discriminated unions for polymorphic JSON
This is where Pydantic v2 actually shines. You have JSON like:
    [
      { "kind": "click", "x": 10, "y": 20 },
      { "kind": "scroll", "delta": 5 },
      { "kind": "key", "code": "Enter" }
    ]

Three different shapes, distinguished by the kind field. Without Pydantic this is awful. With:
    from typing import Annotated, Literal, Union
    from pydantic import BaseModel, Field, TypeAdapter

    class Click(BaseModel):
        kind: Literal["click"]
        x: int
        y: int

    class Scroll(BaseModel):
        kind: Literal["scroll"]
        delta: int

    class KeyEvent(BaseModel):
        kind: Literal["key"]
        code: str

    Event = Annotated[
        Union[Click, Scroll, KeyEvent],
        Field(discriminator="kind"),
    ]

    # Single event
    event = TypeAdapter(Event).validate_python({"kind": "click", "x": 10, "y": 20})
    # event is now correctly typed as Click

    # List of events
    events = TypeAdapter(list[Event]).validate_python(raw_json_list)

The discriminator="kind" tells Pydantic to look at the kind field first and pick the matching schema. Validation is fast (no trial-and-error) and errors point at the right place.
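A sketch of the same idea end to end, parsing straight from a JSON string and checking what an unknown kind produces. It uses a trimmed two-variant union to stay short; the payloads are invented, and building the adapter once for reuse is my habit, not a requirement.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Click(BaseModel):
    kind: Literal["click"]
    x: int
    y: int


class Scroll(BaseModel):
    kind: Literal["scroll"]
    delta: int


Event = Annotated[Union[Click, Scroll], Field(discriminator="kind")]
events_adapter = TypeAdapter(list[Event])  # build once, reuse

raw = '[{"kind": "click", "x": 10, "y": 20}, {"kind": "scroll", "delta": 5}]'
events = events_adapter.validate_json(raw)
names = [type(e).__name__ for e in events]

# An unknown tag fails immediately, and the error says which item was bad.
try:
    events_adapter.validate_python([{"kind": "hover", "x": 1, "y": 2}])
    bad = None
except ValidationError as exc:
    first = exc.errors()[0]
    bad = (first["loc"][0], first["type"])  # (list index, error type)
```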
This pattern is the cleanest way I know to parse LLM tool-call payloads, agent message buses, or any polymorphic JSON.
TypeAdapter for non-model validation
You don't need a BaseModel for everything. For a one-off shape:

    from pydantic import TypeAdapter

    UserList = TypeAdapter(list[User])
    parsed = UserList.validate_python(raw_data)

Or for a primitive with constraints:

    from typing import Annotated
    from pydantic import Field, TypeAdapter

    PostalCode = Annotated[str, Field(pattern=r"^\d{5}$")]
    NL_PostalCode = Annotated[str, Field(pattern=r"^\d{4}\s?[A-Z]{2}$")]

    TypeAdapter(NL_PostalCode).validate_python("1316XW")   # passes
    TypeAdapter(NL_PostalCode).validate_python("13 16XW")  # raises ValidationError

Useful for validating function arguments, config values, or anywhere a full BaseModel is overkill.
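As a sketch of the config-value case: the Port type and read_port helper below are hypothetical, standing in for any single value you read as a string and want bounded.

```python
from typing import Annotated

from pydantic import Field, TypeAdapter, ValidationError

# Hypothetical constrained type for a TCP port read from config.
Port = Annotated[int, Field(ge=1, le=65535)]
port_adapter = TypeAdapter(Port)


def read_port(raw: str) -> int:
    """Parse a port from a string such as an environment variable value."""
    return port_adapter.validate_python(raw)


port = read_port("8080")   # numeric strings are coerced to int in lax mode

try:
    read_port("70000")
    out_of_range = False
except ValidationError:
    out_of_range = True
```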
model_config for cross-cutting behaviour
Tweak how a model behaves with model_config:
    from pydantic import BaseModel, ConfigDict

    class StrictUser(BaseModel):
        model_config = ConfigDict(
            frozen=True,                # immutable after construction
            populate_by_name=True,      # accept either field name or alias
            extra="forbid",             # reject unknown keys
            str_strip_whitespace=True,  # auto-strip strings
        )

        id: int
        email: str

The four flags above are the ones I set most often:
- frozen=True: turns instances into pseudo-records. Hashable, can go in sets and as dict keys.
- populate_by_name=True: needed when you have field aliases (camelCase JSON ↔ snake_case Python) and want to construct from either name.
- extra="forbid": strict mode for ingest contracts. Catches typos like enabel: true instead of silently ignoring them.
- str_strip_whitespace=True: common normalisation; saves a .strip() call everywhere.
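Here is a rough sketch that exercises all four flags at once. The aliased email_address field and the error_type helper are my additions for the demo, not part of the StrictUser shown above.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class StrictUser(BaseModel):
    model_config = ConfigDict(
        frozen=True,
        populate_by_name=True,
        extra="forbid",
        str_strip_whitespace=True,
    )

    id: int
    email_address: str = Field(alias="emailAddress")  # hypothetical alias


# populate_by_name: alias and field name both work as input keys.
# str_strip_whitespace: the padded value is trimmed on the way in.
a = StrictUser.model_validate({"id": 1, "emailAddress": "  x@y.com "})
b = StrictUser(id=1, email_address="x@y.com")

# frozen: equal instances hash equally, so they collapse in a set.
unique = {a, b}


def error_type(action) -> str:
    """Run action and return the type of the first validation error raised."""
    try:
        action()
    except ValidationError as exc:
        return exc.errors()[0]["type"]
    return "no error"


# extra="forbid": the misspelt key is rejected instead of being dropped.
extra_key = error_type(
    lambda: StrictUser.model_validate(
        {"id": 1, "emailAddress": "x@y.com", "enabel": True}
    )
)
# frozen: assignment after construction is refused.
mutation = error_type(lambda: setattr(a, "id", 2))
```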
Serialisation: model_dump() is what you want
    user = User(id=1, email="x@y.com", age=30)

    user.model_dump()                   # dict
    user.model_dump_json()              # JSON string
    user.model_dump(exclude={"email"})  # drop fields
    user.model_dump(by_alias=True)      # use aliases (camelCase JSON)

Forget .dict() (v1 API, deprecated). Use model_dump() everywhere.
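The User model above has no aliases, so by_alias has nothing to do there. A sketch with a hypothetical Account model shows the difference between the Python-side and wire-side views, plus a JSON round trip.

```python
import json

from pydantic import BaseModel, ConfigDict, Field


class Account(BaseModel):  # hypothetical model with one aliased field
    model_config = ConfigDict(populate_by_name=True)

    id: int
    display_name: str = Field(alias="displayName")
    password_hash: str


acct = Account(id=7, display_name="Ada", password_hash="not-a-real-hash")

# Same data, two key styles; exclude keeps the secret out of both.
python_view = acct.model_dump(exclude={"password_hash"})
wire_view = acct.model_dump(by_alias=True, exclude={"password_hash"})

# model_dump_json() returns a str; round-trip it to confirm nothing is lost.
as_json = acct.model_dump_json(by_alias=True)
restored = Account.model_validate_json(as_json)
```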
When NOT to use Pydantic
A few cases where dataclasses or attrs win:
- Internal data with no external boundary. If the data only flows between trusted internal functions and you have full type-checker coverage, the runtime validation overhead isn't worth it. Use @dataclass(slots=True).
- Hot paths. Pydantic is fast (Rust core in v2), but it's not free. Validating millions of objects per second is still measurably slower than constructing dataclasses. Profile before assuming.
- Existing attrs codebases. Don't introduce a third schema library. Pick one and stick with it.
A real-world example: LLM tool argument schema
This is where Pydantic earns its keep in 2025:
    from pydantic import BaseModel, Field

    class SearchArgs(BaseModel):
        query: str = Field(description="free-text search term, eg 'vegan restaurants'")
        center_lat: float = Field(description="latitude of search centre, decimal degrees")
        center_lon: float = Field(description="longitude of search centre, decimal degrees")
        radius_meters: int = Field(default=1000, ge=50, le=50_000)
        limit: int = Field(default=10, ge=1, le=50)
        min_rating: float | None = Field(default=None, ge=0, le=5)

    # Hand the schema to the LLM
    schema_json = SearchArgs.model_json_schema()

    # Parse the LLM's tool call
    def search(raw_args: dict) -> "SearchResult":  # SearchResult is defined elsewhere
        args = SearchArgs.model_validate(raw_args)  # ValidationError if bad
        # args is now fully typed, with bounds enforced
        ...

The Field(description="...") strings end up in the JSON schema the LLM sees. The bounds (ge, le) get enforced at parse time. Bad arguments fail fast with a structured error you can turn into a "retry with corrected args" envelope (covered in the LLM tool-design post).
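One possible shape for that retry envelope, as a sketch only. The parse_tool_call helper and the {"ok": ..., "errors": [...]} layout are assumptions of mine, not something Pydantic prescribes, and SearchArgs is trimmed to three fields to keep the example short.

```python
from pydantic import BaseModel, Field, ValidationError


class SearchArgs(BaseModel):
    query: str = Field(description="free-text search term")
    radius_meters: int = Field(default=1000, ge=50, le=50_000)
    limit: int = Field(default=10, ge=1, le=50)


def parse_tool_call(raw_args: dict) -> dict:
    """Return {'ok': True, 'args': ...} or {'ok': False, 'errors': [...]}."""
    try:
        args = SearchArgs.model_validate(raw_args)
    except ValidationError as exc:
        # Keep only what the model needs to correct itself.
        errors = [
            {"field": ".".join(str(p) for p in err["loc"]), "problem": err["msg"]}
            for err in exc.errors()
        ]
        return {"ok": False, "errors": errors}
    return {"ok": True, "args": args}


# Descriptions and bounds both show up in the schema handed to the LLM.
schema = SearchArgs.model_json_schema()
radius_schema = schema["properties"]["radius_meters"]

good = parse_tool_call({"query": "vegan restaurants", "limit": 5})
bad = parse_tool_call({"query": "vegan restaurants", "limit": 500})
```

Sending the errors list back as the tool result gives the model a field name and a human-readable problem to act on, which in my experience is enough for it to retry with corrected arguments.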
Closing
Pydantic isn't "dataclasses with validation". It's a contract layer for everywhere data is untrusted: HTTP, file, LLM, queue. Use it at boundaries, lean on validators for invariants types can't express, lean on discriminated unions for polymorphism, and lean on TypeAdapter for one-off shapes. Inside the trusted core of your code, dataclasses are fine.
The mental shift: stop thinking of it as "schema for my API" and start thinking of it as "the gate at every boundary." Once you do, the code that survives makes more sense.