project · 2019-2021
ADBConnectors, Databricks integration library
A reusable PySpark integration library that abstracts the I/O patterns between Azure Databricks and the half-dozen sinks an enterprise data pipeline typically touches: Synapse, SQL Server, Cosmos DB, blob/parquet, JDBC. Built during the ABN AMRO Bank engagement and open-sourced internally to other teams there. It eliminates the boilerplate every team had been reinventing per pipeline.
What it gives you
- One unified API: `read(source, ...)` / `write(target, ...)` across all supported backends.
- Credential injection from Azure Key Vault baked in: no secrets in notebooks.
- Connection pooling at the cluster level, not the notebook level.
- Partition-aware writes: dynamic partition overwrite semantics for Synapse, schema-evolution policies handled at the package level.
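A minimal sketch of what the unified `read`/`write` dispatch could look like. The connector classes, registry, and option names here are illustrative assumptions, not the package's actual API; in the real library the `load`/`save` bodies would call the PySpark reader/writer for each backend:

```python
# Hypothetical sketch of a unified read/write facade over per-backend
# connectors. Connector, CONNECTORS, read and write are assumed names,
# not the real ADBConnectors API.

class Connector:
    """Base class: each backend implements load/save for DataFrames."""
    def load(self, **options):
        raise NotImplementedError
    def save(self, df, **options):
        raise NotImplementedError

class BlobParquetConnector(Connector):
    def load(self, path, **options):
        # In the real package this would be spark.read.parquet(path);
        # a string stands in for the DataFrame in this sketch.
        return f"DataFrame<parquet:{path}>"
    def save(self, df, path, mode="overwrite", **options):
        return f"wrote {df} to {path} (mode={mode})"

# One registry keyed by backend name gives the single entry point.
CONNECTORS = {"blob_parquet": BlobParquetConnector()}

def read(source, **options):
    """Unified entry point: dispatch to the named backend's loader."""
    return CONNECTORS[source].load(**options)

def write(target, df, **options):
    """Unified entry point: dispatch to the named backend's writer."""
    return CONNECTORS[target].save(df, **options)
```

With this shape, each notebook stage collapses to a `read(...)`/`write(...)` pair, and connection handling and error semantics live in one place per backend instead of in every pipeline.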
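The credential-injection idea can be sketched as follows. The helper names, secret scope, and option keys are hypothetical; on Databricks the lookup would delegate to `dbutils.secrets.get` against a Key Vault-backed secret scope, and here a plain dict stands in for the vault:

```python
# Hypothetical sketch of Key Vault-backed credential injection.
# A dict stands in for the vault; on a cluster, get_secret would wrap
# dbutils.secrets.get(scope, key) over a Key Vault-backed scope.

_FAKE_VAULT = {("kv-scope", "synapse-jdbc-password"): "s3cr3t"}

def get_secret(scope, key):
    """Stand-in for dbutils.secrets.get(scope, key)."""
    return _FAKE_VAULT[(scope, key)]

def jdbc_options(scope):
    # The connector assembles the full JDBC option map itself, so
    # notebooks never see or hard-code the password.
    return {
        "url": "jdbc:sqlserver://example.database.windows.net",
        "user": "pipeline_svc",
        "password": get_secret(scope, "synapse-jdbc-password"),
    }
```

The design point is that secrets are resolved inside the connector at call time, so neither notebooks nor version control ever contain a credential.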
Adoption
Open-sourced inside ABN AMRO so other engineering teams could pick it up. A typical pipeline lost 100+ lines of plumbing per stage and gained consistent error semantics. Reviews stopped including “did you handle the connection close in the failure path?” because the package handled it.
Stack
Python · PySpark · Azure Databricks · Azure Key Vault · Synapse · Cosmos DB · JDBC.