project · 2023-2024

TheNewsBid, AI-synthesised news + RAG study chatbot

Co-founded a news-aggregation platform that uses GPT models to synthesise stories from multiple sources, reducing single-source bias. The second product is a RAG-based study chatbot for students preparing for India's public-service exams, built from an embeddings pipeline, vector search, and a conversational interface.

#llm #rag #vector-search #full-stack #news-aggregation #edtech #startup

Two products under one umbrella, both built around the same insight: when consumers cannot tell a sourced fact from a confidently-stated opinion, an LLM that cites its sources is more useful than a faster one.

Product 1: news synthesis

The reader inputs a topic. The platform pulls coverage from many publishers, summarises each independently, and synthesises a balanced article that surfaces the points of agreement and the points of contention. Every claim links back to the source articles. The reader sees a synthesis, not an opinion.
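The agreement/contention split can be illustrated with a small sketch. It assumes each per-publisher summary has already been reduced to claims carrying a normalised topic key and a stance; the Claim shape, the key and stance fields, and the split_claims name are all hypothetical, and in practice that extraction step would be done by the model rather than by hand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One claim taken from one publisher's summary (hypothetical shape)."""
    key: str         # normalised label for what the claim is about
    stance: str      # what this source says about it
    source_url: str  # link back to the source article


def split_claims(claims):
    """Bucket claims into agreed, contested, and single-source groups.

    A key is 'agreed' when two or more sources give the same stance,
    'contested' when sources give different stances, and 'single-source'
    when only one publisher covers it.
    """
    by_key = defaultdict(list)
    for claim in claims:
        by_key[claim.key].append(claim)

    agreed, contested, single_source = {}, {}, {}
    for key, group in by_key.items():
        sources = {c.source_url for c in group}
        stances = {c.stance for c in group}
        if len(sources) < 2:
            single_source[key] = group
        elif len(stances) == 1:
            agreed[key] = group
        else:
            contested[key] = group
    return agreed, contested, single_source
```

Keeping the source URL on every claim is what lets each sentence in the synthesised article link back to the articles that support it.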

Product 2: RAG chatbot for India’s public-service exam students

A conversational study assistant grounded in the curated study material for India’s UPSC-style civil-service exams. Built as a RAG pipeline:

Source material is chunked, embedded, and indexed in a vector store.
User questions retrieve the top-k chunks and feed them into a GPT-style model that answers strictly from the retrieved context, with citations.
Conversation history is preserved per user so follow-up questions resolve correctly.
Difficult questions surface a “show me the source passage” affordance so students can read the original material.
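The retrieval steps above can be sketched in a few lines. This is a toy version under stated assumptions, not the production pipeline: the embedding is a plain word-count vector standing in for a real embedding model, the index is an in-memory list standing in for the vector store, chunking is by blank line, and every function name here is illustrative.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """documents maps source_id -> text. Split on blank lines and embed each chunk."""
    index = []
    for source_id, text in documents.items():
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        for n, chunk in enumerate(paragraphs):
            index.append({"id": f"{source_id}#{n}", "text": chunk, "vector": embed(chunk)})
    return index


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda c: cosine(query, c["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer using only the context below and cite chunk ids in brackets. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Because every chunk keeps its id, the same id can back both the citation in the answer and a "show me the source passage" lookup.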

Build and stack

Backend: Python (FastAPI) for both products, serving streaming responses.
Embeddings + vector search: production embeddings pipeline; vector store with daily reindexing as the source corpus grows.
LLM layer: GPT-class models with structured output where the use case demanded structure (citations, claim attributions).
Frontend: React for the news product; a chat-style UI for the study chatbot.
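As an illustration of why structured output helps with citations, here is a minimal sketch of validating a model response before it is rendered. The JSON shape (a "claims" list whose items have "text" and "sources") and the parse_cited_answer name are assumptions made for this example, not the schema the products actually used.

```python
import json


def parse_cited_answer(raw_json, allowed_source_ids):
    """Parse structured model output and reject unsupported claims.

    Raises ValueError when a claim cites nothing, or cites a source that
    was not among the documents supplied to the model.
    """
    payload = json.loads(raw_json)
    allowed = set(allowed_source_ids)
    claims = []
    for item in payload["claims"]:
        cited = item.get("sources", [])
        if not cited:
            raise ValueError(f"uncited claim: {item['text']!r}")
        unknown = [s for s in cited if s not in allowed]
        if unknown:
            raise ValueError(f"unknown sources {unknown} for claim: {item['text']!r}")
        claims.append((item["text"], tuple(cited)))
    return claims
```

Checking cited ids against the set that was actually supplied catches the case where the model invents a plausible-looking source.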

My role

Co-founder, lead developer. Owned product direction, technical architecture, deployment, and the iterative loop with early users.

Why this earns a spot in projects

The study-bot in particular taught me what “production RAG” actually has to do: a chunking strategy that respects the source material’s structure, retrieval evaluation against real student questions (not synthetic ones), citation rendering that students trust, and aggressive caching for the queries that repeat (and they repeat a lot, because exam syllabi do not change often). Every RAG system I have built since reuses those lessons.
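The caching lesson can be sketched as a cache keyed on a normalised form of the question. The normalisation rule, the AnswerCache class, and the corpus_version parameter are all assumptions for illustration; tying the key to a corpus version is one simple way to drop stale answers after a reindex.

```python
import re


def normalise(question):
    """Lowercase, drop punctuation, and collapse whitespace so that
    near-identical phrasings of a question share one cache key."""
    return " ".join(re.findall(r"[a-z0-9]+", question.lower()))


class AnswerCache:
    """In-memory answer cache keyed by (corpus version, normalised question)."""

    def __init__(self, corpus_version):
        self.corpus_version = corpus_version
        self._store = {}
        self.hits = 0

    def get_or_compute(self, question, compute):
        """Return a cached answer, or call compute(question) once and store it."""
        key = (self.corpus_version, normalise(question))
        if key in self._store:
            self.hits += 1
            return self._store[key]
        answer = compute(question)
        self._store[key] = answer
        return answer
```

Bumping corpus_version after a reindex changes every key, so answers computed against the older corpus are no longer served.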