project · 2023-2024

Vehicle-telemetry silver-layer ETL

Refactored a vehicle-telemetry processing pipeline that transforms raw nested JSON / protobuf from millions of in-car navigation clients into clean Delta tables powering navigation-quality dashboards. Designed for query simplicity: PMs answer "route success rate in country X this week" with a single SELECT.

The output is the silver layer of TomTom’s medallion-architecture data lake: clean, queryable Delta tables that downstream PMs, ops engineers, and partner-facing dashboards all read from.
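To make the "single SELECT" claim concrete, here is a minimal sketch of the kind of query a PM can run against the silver layer. The table and column names (`route_planning`, `country`, `route_status`, `event_date`) are illustrative, not the real schema, and an in-memory SQLite database stands in for Databricks SQL over Delta:

```python
import sqlite3

# In-memory SQLite as a stand-in for the Delta table `route_planning`.
# All names and sample rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE route_planning (country TEXT, route_status TEXT, event_date TEXT)"
)
conn.executemany(
    "INSERT INTO route_planning VALUES (?, ?, ?)",
    [
        ("NL", "SUCCESS", "2024-03-04"),
        ("NL", "SUCCESS", "2024-03-05"),
        ("NL", "FAILED",  "2024-03-05"),
        ("DE", "SUCCESS", "2024-03-04"),
    ],
)

# The one-SELECT question: route success rate in country X this week.
rate = conn.execute(
    """
    SELECT AVG(CASE WHEN route_status = 'SUCCESS' THEN 1.0 ELSE 0.0 END)
    FROM route_planning
    WHERE country = 'NL'
      AND event_date BETWEEN '2024-03-04' AND '2024-03-10'
    """
).fetchone()[0]
```

The point of the silver-layer design is that this query needs no joins, no JSON unpacking, and no engineering help: one flat table, one aggregate.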

What was wrong with the old pipeline

The “before” was a Databricks pipeline that sort of worked but wasn’t sustainable.

What I built

Architecture

in-car navigation client
↓ MQTT
telemetry backend
↓
Azure Event Hub
↓
ingest service
↓
Azure Data Lake (bronze, raw protobuf)
↓ ── MY WORK ──
Silver-layer ETL (Databricks / PySpark)
↓
Delta tables (silver: route_planning, connectivity, traffic_health, …)
↓
Grafana / partner dashboards / SLA reports
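The "MY WORK" step above is the bronze→silver reshaping: exploding nested client messages into one flat row per event. In production this runs in PySpark over Delta; as a plain-Python sketch of the transform, with an entirely hypothetical event shape and field names:

```python
import json

# Hypothetical bronze-layer message, roughly the nested JSON shape a
# navigation client might emit. Field names are illustrative only.
raw = json.loads("""
{
  "device":  {"id": "veh-123", "country": "NL"},
  "session": {"ts": "2024-03-05T08:14:00Z"},
  "events": [
    {"type": "route_planning", "status": "SUCCESS", "latency_ms": 420},
    {"type": "route_planning", "status": "FAILED",  "latency_ms": 9800}
  ]
}
""")

def flatten(msg):
    """Explode one nested message into flat rows, one per event --
    the shape a silver table needs so it can be read with a plain SELECT."""
    for ev in msg["events"]:
        yield {
            "device_id":  msg["device"]["id"],
            "country":    msg["device"]["country"],
            "event_ts":   msg["session"]["ts"],
            "event_type": ev["type"],
            "status":     ev["status"],
            "latency_ms": ev["latency_ms"],
        }

rows = list(flatten(raw))
```

In PySpark the same reshaping is typically an `explode` over the nested array followed by column selection; the design choice is the same either way: push all the unnesting into the ETL once, so no reader ever touches raw protobuf or nested JSON.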

Why this matters for the business

Navigation telemetry isn’t a vanity metric; it’s contractual. OEM partners pay for navigation that meets agreed quality bars. If route success rates dip, TomTom needs to know within hours, not weeks. The silver layer is what makes that detection-and-explanation loop fast.

Why this earns a spot in projects

Data engineering work is invisible when it’s good; nobody compliments your ETL. But the bar for a good silver layer is simple: can a non-technical PM answer their own question in one query, without engineering help? On that bar, this one shipped.

The silver layer is also what unlocks the DCP Guardian AI agent built on top of it. The agent walks an event registry and metric catalog that exist only because this pipeline exists. Stable schemas in the data layer → reliable answers in the agent layer: they are the same project at two levels of abstraction.

← all projects