project · 2025-2026
DCP Guardian, AI agent that validates data-collection plans
An AI-powered Slack bot that reads a data-collection-plan document on Confluence, cross-references it against the live event registry and metrics catalog, optionally analyses a GitHub PR alongside, and posts a structured pass/fail review back into the thread. Multi-agent workflow with Agno + ChromaDB vector search + deterministic result caching.
An AI agent that lives in Slack and replaces the most tedious part of a data engineer’s review cycle: reading a data-collection-plan document, checking it field by field against the event registry, validating that the proposed schema matches what the code actually emits, and writing up the review.
Sits on top of the vehicle-telemetry silver layer: once that pipeline turns raw protobuf into clean Delta tables, this agent is what lets a non-engineer ask “is this DCP safe to deploy?” and get a defensible answer in 90 seconds. Same data, two layers up the stack.
See it work (scripted demo)
Press play. The dialogue below is hardcoded but mirrors the actual workflow. No live data, no live LLM, just a faithful reproduction of one happy-path review.
The deterministic cache means a re-submission of the same DCP returns instantly with no LLM calls. The conditional branch means a failed template skips event/metric validation entirely (no point validating schemas inside a malformed document). Both decisions came from watching the team actually use the bot for a week.
How it works
A reviewer mentions the bot in a Slack thread with a link to a Confluence DCP page (and optionally a GitHub PR). The bot then runs a multi-stage Agno workflow:
- Input validation parses the Slack submission into structured inputs.
- DCP template validation reviews the document’s required sections against a rubric and returns PASS / FAIL with specific reasons.
- Conditional branching (sketched after this list):
  - On PASS, optionally analyse the GitHub PR, validate referenced events against the registry, validate the metric definitions, optionally analyse the PKD (production data) layer, then synthesise a single comprehensive report.
  - On FAIL, post a concise summary listing missing or incomplete fields.
- Caching: results are keyed by a deterministic fingerprint of the Confluence content, so identical submissions get cached responses immediately (fingerprint sketch after this list).
- Posting: final report goes back to the Slack thread.
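To make the control flow concrete, here is a minimal runnable sketch of the branch. Every function name, section name, and return format is a hypothetical stand-in; the real sub-agents are LLM-backed Agno agents with grounded knowledge bases, not string checks.

```python
from dataclasses import dataclass

@dataclass
class TemplateResult:
    passed: bool
    reasons: list[str]  # specific missing/incomplete sections on FAIL

def validate_template(dcp: str) -> TemplateResult:
    # Stub: the real sub-agent reviews the document against a rubric with an
    # LLM; the section names here are invented for the sketch.
    required = ["Purpose", "Events", "Metrics", "Retention"]
    missing = [s for s in required if s.lower() not in dcp.lower()]
    return TemplateResult(passed=not missing,
                          reasons=[f"missing section: {s}" for s in missing])

def analyse_pr(pr_url: str) -> str:
    return f"PR analysis for {pr_url}"   # stub for the PR-analyser agent

def validate_events(dcp: str) -> str:
    return "events: OK"                  # stub for the event-validator

def validate_metrics(dcp: str) -> str:
    return "metrics: OK"                 # stub for the metric-validator

def review(dcp: str, pr_url: str | None = None) -> str:
    result = validate_template(dcp)
    if not result.passed:
        # FAIL branch: skip event/metric validation entirely; schema checks
        # inside a malformed document are meaningless.
        return "FAIL\n" + "\n".join(f"- {r}" for r in result.reasons)
    # PASS branch: optional PR analysis, then the grounded validators,
    # then one synthesised report.
    findings = [analyse_pr(pr_url)] if pr_url else []
    findings += [validate_events(dcp), validate_metrics(dcp)]
    return "PASS\n" + "\n".join(findings)
```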
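And a sketch of the cache key. The whitespace normalisation and SHA-256 are assumptions about the exact scheme; the property that matters is that the key is a pure function of the submission, so an identical re-submission makes zero LLM calls.

```python
import hashlib
import json

def dcp_fingerprint(page_content: str, pr_url: str | None = None) -> str:
    """Deterministic cache key over a DCP submission (illustrative scheme)."""
    normalised = " ".join(page_content.split())
    payload = json.dumps({"dcp": normalised, "pr": pr_url}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Usage: look the key up in the bot's SQLite cache before running the
# workflow; on a hit, post the stored report straight back to the thread.
```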
There is also a partial-review workflow for incremental DCP updates: when only a few new events or metrics are added to a previously approved plan, the agent skips template validation and runs only the relevant sub-agents.
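A rough sketch of that dispatch, assuming events can be diffed out of the document text (the extraction pattern is purely illustrative):

```python
import re

def extract_event_names(dcp: str) -> set[str]:
    # Purely illustrative: assume events are referenced as "event: <name>".
    return set(re.findall(r"event:\s*(\w+)", dcp, flags=re.IGNORECASE))

def partial_review_targets(new_dcp: str, approved_dcp: str) -> set[str]:
    # Template validation is skipped: the plan was already approved once.
    # Only the newly added events go to the event-validator sub-agent.
    return extract_event_names(new_dcp) - extract_event_names(approved_dcp)
```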
Stack
- Multi-agent orchestration: Agno Workflows v2 with explicit hand-off contracts between sub-agents (template-validator, event-validator, metric-validator, PR-analyser, report-synthesiser); a contract sketch follows this list.
- Knowledge sources: vector search over `event_definitions.json` (telemetry events extracted from Java sources) and `overview_per_dcp.json` (metric definitions extracted from Confluence), with ChromaDB as the vector store (query sketch below).
- Tools: Confluence and GitHub API integrations exposed as Agno tools so the agent can read pages and PRs at runtime.
- Schema knowledge: Unity Catalog table schemas in the silver layer, so the agent can generate correct SQL when asked “what does the success-rate metric for event X look like in code?”
- Slack interface: Slack Bolt app, deployed on an Azure VM, with persistent SQLite storage for sessions and cache (handler sketch below).
- Pipelines: Databricks notebooks extract upstream data (Confluence pages, Java event definitions, metric definitions, schemas) into blob storage on a schedule. The bot reads from there.
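To illustrate what a hand-off contract between sub-agents can look like, here is a hypothetical Pydantic sketch; the field names are assumptions, not the production schemas.

```python
from enum import Enum
from pydantic import BaseModel

class Verdict(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"

class TemplateValidation(BaseModel):
    verdict: Verdict
    reasons: list[str]      # specific, human-readable failure reasons

class EventFinding(BaseModel):
    event_name: str
    in_registry: bool       # does the event exist in the registry?
    schema_matches: bool    # does the DCP schema match what the code emits?
    notes: str = ""

class EventValidation(BaseModel):
    verdict: Verdict
    findings: list[EventFinding]
```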
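A minimal ChromaDB sketch of how a validator grounds itself before answering; the collection name, example event, and document format are assumptions:

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma")   # local persistent store
events = client.get_or_create_collection("event_definitions")

# Ingestion (done by the scheduled Databricks extract in the real system):
# one document per telemetry event pulled from the Java sources.
events.add(
    ids=["vehicle_door_opened"],
    documents=["vehicle_door_opened: emitted when a door opens; "
               "fields: vin, door_id, ts"],
    metadatas=[{"source": "event_definitions.json"}],
)

# At review time the event-validator queries before it answers.
hits = events.query(query_texts=["door open event schema"], n_results=3)
print(hits["documents"][0])
```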
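And a bare-bones Slack Bolt mention handler in the same shape; the link parsing and the `run_review` hand-off are illustrative, not the production code.

```python
import os
import re

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])

@app.event("app_mention")
def handle_mention(event, say):
    thread_ts = event.get("thread_ts") or event["ts"]
    # Slack wraps links as <url> or <url|label>; pull them out of the mention.
    urls = re.findall(r"<(https?://[^>|]+)", event.get("text", ""))
    if not urls:
        say(text="Please include a link to the Confluence DCP page.",
            thread_ts=thread_ts)
        return
    say(text="On it: validating the DCP…", thread_ts=thread_ts)
    # run_review(urls, thread_ts) is the hypothetical entry point that kicks
    # off the Agno workflow and posts the final report back into this thread.

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```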
My role
Architected end-to-end. Owned requirements gathering with the data-engineering team, conceptual design (the multi-agent decomposition, the deterministic cache, the conditional branch on template PASS / FAIL), and technical design (the sub-agent boundaries, the knowledge-source schemas, the cache-key fingerprint, the Agno Workflows v2 graph). Implementation was delivered by junior engineers on the team under my design oversight, code review, and mentoring. The hardest decisions (where to put a sub-agent boundary, what to ground in vector search versus deterministic schema, when to cache) were the design ones, and those are mine.
Why this earns a spot in projects
This is the most honest demonstration I have shipped of the difference between a single LLM call and an agentic system. A single GPT-4 call given a Confluence URL would hallucinate half the validation. The agent works because each sub-agent has narrowly scoped responsibilities, structured inputs and outputs, and grounded knowledge bases it must consult before answering. The deterministic cache is the kind of detail that only emerges after watching a real team use the thing for a week and noticing they re-ran identical reviews.