tag

#data-engineering

10 items · 9 projects · 1 feed item

projects

project

Vehicle-telemetry silver-layer ETL

Refactored a vehicle-telemetry processing pipeline that transforms raw nested JSON / protobuf from millions of in-car navigation clients into clean Delta tables powering navigation-quality dashboards. Designed for query simplicity: PMs can answer 'route success rate in country X this week' with a single SELECT.
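The core of a silver layer like this is flattening nested events into flat, queryable rows. A minimal sketch of that step, assuming a hypothetical event layout and field names (not the pipeline's actual schema, which is not shown here):

```python
import json

# Hypothetical nested telemetry event, as an in-car client might emit it.
RAW_EVENT = json.dumps({
    "device": {"id": "nav-001", "country": "NL"},
    "route": {"requested": True, "completed": True, "duration_s": 1840},
    "ts": "2024-05-01T08:30:00Z",
})

def flatten(raw: str) -> dict:
    """Flatten one nested JSON event into a flat row for a silver table."""
    e = json.loads(raw)
    return {
        "device_id": e["device"]["id"],
        "country": e["device"]["country"],
        "route_completed": e["route"]["completed"],
        "duration_s": e["route"]["duration_s"],
        "event_ts": e["ts"],
    }

row = flatten(RAW_EVENT)
```

Once rows have this flat shape, 'route success rate in country X this week' really is a single SELECT with a filter and an aggregate.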

project

ETL template for API analytics pipelines

Refactored the inconsistent set of ETL pipelines feeding TomTom's API analytics into a single OOP template, distributed as an internal Python package. Engineers subclass the base, get live Azure Data Explorer connections for free, and only write the business logic in extract / transform / load. Result: consistent, version-controlled pipelines across volume, response-time, and error-rate use cases.
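The subclass-the-base pattern described above can be sketched with an abstract base class; the class and method names here are illustrative assumptions (the internal package's real base also manages Azure Data Explorer connections, omitted here):

```python
from abc import ABC, abstractmethod

class EtlPipeline(ABC):
    """Template base class: subclasses supply only the business logic."""

    def run(self):
        # The template method fixes the pipeline shape once, for every team.
        data = self.extract()
        data = self.transform(data)
        return self.load(data)

    @abstractmethod
    def extract(self): ...

    @abstractmethod
    def transform(self, data): ...

    @abstractmethod
    def load(self, data): ...

class VolumeReportPipeline(EtlPipeline):
    """Hypothetical subclass for an API-volume report."""

    def extract(self):
        return [{"api": "search", "calls": 120}, {"api": "routing", "calls": 80}]

    def transform(self, data):
        return {row["api"]: row["calls"] for row in data}

    def load(self, data):
        return data  # in reality: write to Azure Data Explorer
```

Because the base class owns `run()`, every pipeline built this way has the same shape, which is what makes them consistent and version-controllable as a set.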

project

Developer-portal analytics APIs

REST API layer that powers the analytics dashboards on developer.tomtom.com. Sits on top of an Azure Data Explorer (Kusto) backend that ingests every API call across TomTom's developer products. Volume reports, response-time percentiles, error-rate breakdowns, and per-product usage all flow through this layer.
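A response-time percentile report, as one of these endpoints might return, can be sketched in a few lines; the function name and shape are assumptions for illustration (the real layer computes aggregates in Kusto, not in Python):

```python
import statistics

def latency_percentiles(latencies_ms, points=(50, 95, 99)):
    """Compute response-time percentiles from a list of latency samples."""
    # quantiles(n=100) returns the 99 cut points p1..p99.
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {f"p{p}": qs[p - 1] for p in points}
```

For example, `latency_percentiles([12, 15, 20, 300, ...])` yields a `{"p50": ..., "p95": ..., "p99": ...}` payload of the kind a dashboard widget consumes.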

project

Enterprise data lake on GCP, from scratch

Architected an enterprise data lake on Google Cloud (BigQuery, Cloud Storage, Cloud Composer / Airflow) at Oyo Vacation Homes. Designed a metadata-driven ETL framework in PySpark that cut new-pipeline development time by ~70%. Established schema-versioning and quality-validation patterns the team still uses.
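"Metadata-driven" means a pipeline is declared as data and a generic runner interprets it, so a new pipeline is mostly a new config rather than new code. A toy sketch of the idea, with the spec shape and step names as illustrative assumptions (the real framework ran on PySpark):

```python
# A pipeline declared as metadata: a source plus an ordered list of steps.
PIPELINE_SPEC = {
    "source": [
        {"city": "amsterdam", "bookings": 12},
        {"city": "lisbon", "bookings": 7},
    ],
    "steps": [
        {"op": "filter", "field": "bookings", "min": 10},
        {"op": "select", "fields": ["city"]},
    ],
}

def run_spec(spec):
    """Generic runner: interpret the spec's steps against its source rows."""
    rows = spec["source"]
    for step in spec["steps"]:
        if step["op"] == "filter":
            rows = [r for r in rows if r[step["field"]] >= step["min"]]
        elif step["op"] == "select":
            rows = [{f: r[f] for f in step["fields"]} for r in rows]
        else:
            raise ValueError(f"unknown op: {step['op']}")
    return rows
```

The ~70% cut in new-pipeline development time comes from exactly this split: the runner is written once, and each new pipeline is a spec.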

project

PySpark ETL optimisation for Citibank Singapore

Optimised long-running ETL pipelines for Citibank Singapore (TCS engagement). Cut DataStage job execution from 4 hours to 1 hour through Python multiprocessing and PySpark parallelisation. Designed real-time message handling with Kafka, plus Avro / Parquet conversion and HDFS compression strategies for the bank's machine-learning data pipelines.
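The 4-hours-to-1-hour win rests on a partition-and-parallelise pattern: split the input, process chunks concurrently, recombine. A self-contained sketch of that pattern (the real jobs used multiprocessing and PySpark; a thread pool and the stand-in `process_partition` keep this example portable):

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(rows):
    """Stand-in for one partition's worth of ETL work."""
    return [r * 2 for r in rows]

def run_parallel(rows, n_workers=4):
    """Split input into one chunk per worker and process chunks concurrently."""
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(process_partition, chunks)
    # Recombine the per-partition outputs into one result set.
    return [x for chunk in results for x in chunk]
```

When the per-partition work is independent, wall-clock time drops roughly in proportion to the worker count, which is the effect exploited here.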
