project · 2019-2021
ADBConnectors, Databricks integration library
A reusable PySpark integration library that abstracts the I/O patterns between Azure Databricks and the half-dozen sinks an enterprise data pipeline typically touches: Synapse, SQL Server, Cosmos DB, blob/parquet, JDBC. Built during the ABN AMRO Bank engagement and open-sourced internally to other teams there. It eliminates the boilerplate every team had been reinventing per pipeline.
What it gives you
- One unified API: `read(source, ...)` / `write(target, ...)` across all supported backends.
- Credential injection from Azure Key Vault baked in: no secrets in notebooks.
- Connection pooling at the cluster level, not the notebook level.
- Partition-aware writes: dynamic partition overwrite semantics for Synapse, schema-evolution policies handled at the package level.
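A minimal sketch of what the unified `read`/`write` dispatch could look like. The connector classes, registry, and option names here are illustrative assumptions, not the package's actual API; in the real library the `load`/`save` bodies would call the PySpark reader/writer for each backend:

```python
# Hypothetical sketch of a unified read/write facade over per-backend
# connectors. Connector, CONNECTORS, read and write are assumed names,
# not the real ADBConnectors API.

class Connector:
    """Base class: each backend implements load/save for DataFrames."""
    def load(self, **options):
        raise NotImplementedError
    def save(self, df, **options):
        raise NotImplementedError

class BlobParquetConnector(Connector):
    def load(self, path, **options):
        # In the real package this would be spark.read.parquet(path);
        # a string stands in for the DataFrame in this sketch.
        return f"DataFrame<parquet:{path}>"
    def save(self, df, path, mode="overwrite", **options):
        return f"wrote {df} to {path} (mode={mode})"

# One registry keyed by backend name gives the single entry point.
CONNECTORS = {"blob_parquet": BlobParquetConnector()}

def read(source, **options):
    """Unified entry point: dispatch to the named backend's loader."""
    return CONNECTORS[source].load(**options)

def write(target, df, **options):
    """Unified entry point: dispatch to the named backend's writer."""
    return CONNECTORS[target].save(df, **options)
```

With this shape, each notebook stage collapses to a `read(...)`/`write(...)` pair, and connection handling and error semantics live in one place per backend instead of in every pipeline.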
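The credential-injection idea can be sketched as follows. The helper names, secret scope, and option keys are hypothetical; on Databricks the lookup would delegate to `dbutils.secrets.get` against a Key Vault-backed secret scope, and here a plain dict stands in for the vault:

```python
# Hypothetical sketch of Key Vault-backed credential injection.
# A dict stands in for the vault; on a cluster, get_secret would wrap
# dbutils.secrets.get(scope, key) over a Key Vault-backed scope.

_FAKE_VAULT = {("kv-scope", "synapse-jdbc-password"): "s3cr3t"}

def get_secret(scope, key):
    """Stand-in for dbutils.secrets.get(scope, key)."""
    return _FAKE_VAULT[(scope, key)]

def jdbc_options(scope):
    # The connector assembles the full JDBC option map itself, so
    # notebooks never see or hard-code the password.
    return {
        "url": "jdbc:sqlserver://example.database.windows.net",
        "user": "pipeline_svc",
        "password": get_secret(scope, "synapse-jdbc-password"),
    }
```

The design point is that secrets are resolved inside the connector at call time, so neither notebooks nor version control ever contain a credential.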
Adoption
Open-sourced inside ABN AMRO so other engineering teams could pick it up. A typical pipeline lost 100+ lines of plumbing per stage and gained consistent error semantics. Reviews stopped including “did you handle the connection close in the failure path?” because the package handled it.
Stack
Python · PySpark · Azure Databricks · Azure Key Vault · Synapse · Cosmos DB · JDBC.