project · 2023-2024
PPDA Data Validator, schema-aware validation for analytics pipelines
A Pydantic-based data-validation library for the team that runs API analytics. Engineers point it at a JSON event and it generates a schema; they edit the schema to add field-level rules (null, range, allowed-values, custom), and the orchestrator runs the validations on every ingest. Distributed as an installable wheel.
A Python library I built so any engineer on the API analytics team could add validation to a new event type in under an hour, without rolling their own Pydantic schema by hand.
What it does
Three steps:
- Generate a schema from a sample JSON event. The library reads the event and emits a Pydantic-flavoured schema file with placeholder rules.
- Engineer edits the schema to add field-level rules. The library supports three out of the box:
  - Null check: whether a field is allowed to be null.
  - Range check: numeric `[lo, hi]` bounds.
  - Allowed-values check: whitelist for enumerated fields.

  Custom rules drop in as Python functions when a built-in is not enough.
- Run the orchestrator with the input dataset. It applies the schema and any custom validators, then returns a structured pass/fail report.
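To make the second and third steps concrete, here is a minimal sketch of how field-level rules and a pass/fail report could fit together. It is not the library's actual API: `FieldRule`, `check_field`, `run_validation`, and `is_https` are hypothetical names, the report shape is an assumption, and the sketch uses plain dataclasses instead of Pydantic so that it runs with no dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


# Hypothetical rule container; illustrative only, not the library's real types.
@dataclass
class FieldRule:
    nullable: bool = True                               # null check
    value_range: Optional[tuple[float, float]] = None   # inclusive [lo, hi]
    allowed: Optional[set] = None                       # allowed-values check
    custom: list[Callable[[Any], bool]] = field(default_factory=list)


def check_field(name: str, value: Any, rule: FieldRule) -> list[str]:
    """Return failure messages for one field (an empty list means pass)."""
    if value is None:
        return [] if rule.nullable else [f"{name}: null not allowed"]
    errors = []
    if rule.value_range is not None:
        lo, hi = rule.value_range
        if not (lo <= value <= hi):
            errors.append(f"{name}: {value} outside [{lo}, {hi}]")
    if rule.allowed is not None and value not in rule.allowed:
        errors.append(f"{name}: {value!r} not in allowed values")
    for fn in rule.custom:
        # A custom rule is any function returning True when the value is fine.
        if not fn(value):
            errors.append(f"{name}: failed custom rule {fn.__name__}")
    return errors


def run_validation(schema: dict[str, FieldRule], events: list[dict]) -> dict:
    """Apply every rule to every event and build a structured report."""
    failures = []
    for index, event in enumerate(events):
        for name, rule in schema.items():
            # A missing field is treated the same as an explicit null here.
            for message in check_field(name, event.get(name), rule):
                failures.append({"event": index, "error": message})
    return {"passed": not failures, "checked": len(events), "failures": failures}


def is_https(url: str) -> bool:
    return url.startswith("https://")


schema = {
    "status": FieldRule(nullable=False, value_range=(100, 599)),
    "method": FieldRule(nullable=False, allowed={"GET", "POST"}),
    "url": FieldRule(custom=[is_https]),
}
events = [
    {"status": 200, "method": "GET", "url": "https://example.com/a"},
    {"status": 700, "method": "PATCH", "url": None},
]
report = run_validation(schema, events)
```

In this sketch the second event fails the range and allowed-values checks, while its null `url` passes because that field is left nullable. Collecting every failure into the report, rather than stopping at the first, is a design choice of the sketch; raising on the first failure would be the stricter "fail loud" variant.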
Why this matters for analytics teams
Schema drift in upstream telemetry is a silent killer for downstream dashboards. The team was discovering bad data via “the dashboard looks weird” rather than at ingest. The validator pushes detection upstream: ingest the event, validate, fail loud. The schema files become living documentation for what each event is supposed to look like.
Stack
- Python, Pydantic for the runtime types.
- CLI for schema generation; library API for embedding in larger pipelines.
- Distributed internally as a `.whl` so consumer teams just `pip install` and import.
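As a rough illustration of the CLI side, the sketch below reads one sample JSON event and writes a draft schema with placeholder rules. Everything in it is assumed for illustration: the function names `infer_schema` and `main`, the `-o/--output` flag, and the JSON output format. The real tool emits a Pydantic-flavoured schema file, which this sketch does not attempt to reproduce.

```python
import argparse
import json


def infer_schema(sample: dict) -> dict:
    """Map each top-level field to its Python type name plus placeholder rules."""
    return {
        name: {
            "type": type(value).__name__,
            "nullable": value is None,  # placeholder; the engineer tightens this
            "range": None,              # placeholder for numeric [lo, hi] bounds
            "allowed": None,            # placeholder for an allowed-values list
        }
        for name, value in sample.items()
    }


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        description="Generate a draft schema from one sample JSON event."
    )
    parser.add_argument("sample", help="path to a JSON file holding one event")
    parser.add_argument("-o", "--output", help="schema path (default: stdout)")
    args = parser.parse_args(argv)

    with open(args.sample, encoding="utf-8") as handle:
        sample = json.load(handle)

    text = json.dumps(infer_schema(sample), indent=2)
    if args.output:
        with open(args.output, "w", encoding="utf-8") as handle:
            handle.write(text + "\n")
    else:
        print(text)
    return 0
```

Calling `main(["event.json", "-o", "schema.json"])` would write the draft next to the sample. Keeping `infer_schema` separate from argument parsing means the same function can be imported by a larger pipeline, which matches the CLI-plus-library split described above.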
Why this earns a spot in projects
It is a small library, but it is the kind of thing that quietly lifts the floor of a whole team. Six months after I shipped it, every new event type on the team got a schema written before the data even started flowing, because that was the path of least resistance. That is the bar: make the right thing the easy thing.