ML-Powered Data Product

GameThread: Multi-Sport Predictive Analytics

A production-grade NBA analytics platform built across 7 architectural phases — from raw data ingestion to an agentic AI chatbot. Combines an XGBoost/LightGBM/TensorFlow ensemble with SHAP explainability and Kelly Criterion bet sizing, a LangGraph multi-agent Gemini RAG chatbot (ChromaDB + pgvector, Langfuse tracing), and a full MLOps loop with drift detection and retrain-policy evaluation — all deployed on Google Cloud Run with Cloud SQL, Vertex AI, Pub/Sub, and Cloud Scheduler orchestrating daily pipelines.

Problem

Sports analytics data is fragmented across APIs. Building a single platform that reliably ingests live game data, trains and serves explainable ML predictions at production scale, and surfaces insights through both an agentic AI chatbot and a developer SQL playground is hard enough; keeping that entire system observable, auditable, and continuously deployable on cloud infrastructure is an engineering challenge most hobby projects never attempt at this depth.

Approach

Designed a 7-phase architecture on Google Cloud.

Phases 1–2: PostgreSQL on Cloud SQL with incremental watermark ingestion via a dedicated Cloud Run Job, raw snapshots to GCS, and Pub/Sub events triggering downstream jobs.

Phases 3–4: XGBoost + LightGBM + TensorFlow Wide & Deep ensemble exported to ONNX, registered in Vertex AI Model Registry, and served via a live Vertex AI Endpoint, with SHAP per-game attributions and Kelly Criterion stake sizing.

Phase 5: LangGraph supervisor-specialist multi-agent chatbot. A Supervisor node routes queries to StatsAgent (SQLQueryTool), NewsAgent (RAGRetrieverTool), and PredictionAgent (ExplainabilityTool), all implemented as typed LangChain BaseTool subclasses; Google Gemini generates answers from ChromaDB + pgvector retrieval, with Langfuse chain tracing and citation guardrails.

Phase 6: MLOps loop with KL-divergence drift detection, statsmodels SPRT significance testing, automated retrain-policy evaluation, and Prometheus metrics.

Phase 7: Cloud Build CI/CD builds four Docker images (API, ingestion, RAG, Prefect agent), pushes them to Artifact Registry, runs Alembic schema migrations via a Cloud Run Job before deploying to Cloud Run, and triggers a GitHub Actions regression gate, with all secrets managed via Secret Manager.

Outcome

Deployed a fully automated GCP pipeline: Cloud Scheduler triggers the NBA ingestion Cloud Run Job daily, which writes raw snapshots to GCS and publishes to Pub/Sub — driving the RAG refresh and prediction scoring jobs downstream with end-to-end audit lineage.

Shipped a production ML serving stack: XGBoost + LightGBM + TensorFlow Wide & Deep ensemble registered in Vertex AI Model Registry and served via a live Vertex AI Endpoint, producing per-game win-probability scores, SHAP feature attributions, and Kelly Criterion stake sizing (optimal for long-run log growth when the model's win probabilities are well calibrated).
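For reference, the Kelly fraction behind the stake sizing above reduces to a one-liner. This is a minimal sketch, not the project's actual API; `win_prob` and `decimal_odds` are illustrative names.

```python
def kelly_fraction(win_prob: float, decimal_odds: float) -> float:
    """Fraction of bankroll to stake: f* = (b*p - q) / b,
    where b = decimal_odds - 1, p = win probability, q = 1 - p.
    Clamped at 0 so negative-edge bets are skipped entirely."""
    b = decimal_odds - 1.0
    if b <= 0:
        return 0.0
    p, q = win_prob, 1.0 - win_prob
    return max(0.0, (b * p - q) / b)

# A 60% win probability at even money (decimal odds 2.0) stakes 20% of bankroll.
print(round(kelly_fraction(0.60, 2.0), 4))  # → 0.2
```

In practice many systems bet a fixed fraction of the Kelly stake ("half-Kelly") to reduce variance from miscalibrated probabilities.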

Built a LangGraph supervisor-specialist multi-agent system powered by Google Gemini: a Supervisor node routes each query to StatsAgent, NewsAgent, and/or PredictionAgent, then a Synthesiser node merges results — with pgvector RAG retrieval, Langfuse LLM-chain tracing, and citation guardrails keeping answers grounded.
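The supervisor-specialist routing above can be sketched framework-free. The real system uses LangGraph nodes with a Gemini-backed router; here a keyword heuristic stands in for the LLM, and the agent functions are hypothetical stubs for the LangChain BaseTool-backed specialists.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the specialists; the real system backs each
# with a typed LangChain BaseTool (SQLQueryTool, RAGRetrieverTool, ...).
def stats_agent(query: str) -> str:
    return f"[StatsAgent] SQL stats for: {query}"

def news_agent(query: str) -> str:
    return f"[NewsAgent] RAG headlines for: {query}"

def prediction_agent(query: str) -> str:
    return f"[PredictionAgent] win probability + SHAP for: {query}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "stats": stats_agent,
    "news": news_agent,
    "prediction": prediction_agent,
}

# Keyword routing table standing in for the LLM-based Supervisor node.
ROUTES = {
    "stats": ["points", "rebounds", "record", "average"],
    "news": ["injury", "trade", "report", "news"],
    "prediction": ["win", "odds", "predict", "stake"],
}

def supervisor(query: str) -> List[str]:
    """Route the query to every specialist whose keywords match."""
    q = query.lower()
    picked = [name for name, kws in ROUTES.items() if any(k in q for k in kws)]
    return picked or ["stats"]  # fall back to a default specialist

def synthesiser(query: str) -> str:
    """Fan out to the routed agents and merge their answers."""
    return "\n".join(AGENTS[name](query) for name in supervisor(query))

print(synthesiser("Will the Lakers win, and any injury news?"))
```

A compound question fans out to multiple specialists (here, NewsAgent and PredictionAgent) before the merge step, which mirrors the Supervisor → specialists → Synthesiser flow described above.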

Established a full MLOps loop: KL-divergence data drift detection, SPRT significance testing for retrain decisions, automated retrain-policy evaluation, and a GitHub Actions CI gate enforcing DB-backed invariant tests on every push.
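The KL-divergence drift check above can be sketched with stdlib Python over binned feature histograms; the bin edges, smoothing, and threshold here are illustrative, not the project's tuned values.

```python
import math
from collections import Counter
from typing import Iterable, List

def histogram(values: Iterable[float], edges: List[float]) -> List[float]:
    """Bin values into a smoothed, normalised histogram. Add-one smoothing
    avoids empty bins, which would make KL divergence infinite."""
    counts = Counter()
    for v in values:
        idx = sum(1 for e in edges if v >= e)  # bin index for v
        counts[idx] += 1
    n_bins = len(edges) + 1
    total = sum(counts.values()) + n_bins  # +1 pseudo-count per bin
    return [(counts[i] + 1) / total for i in range(n_bins)]

def kl_divergence(p: List[float], q: List[float]) -> float:
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def drift_detected(train: List[float], live: List[float],
                   edges: List[float], threshold: float = 0.1) -> bool:
    """Flag drift when the live feature distribution diverges from training."""
    return kl_divergence(histogram(train, edges), histogram(live, edges)) > threshold

edges = [0.25, 0.5, 0.75]
train = [i / 100 for i in range(100)]          # roughly uniform baseline
shifted = [min(0.99, v + 0.4) for v in train]  # simulated distribution shift
print(drift_detected(train, train, edges), drift_detected(train, shifted, edges))
```

A drift flag like this is a trigger, not a verdict: in the pipeline described above it feeds the SPRT test and retrain-policy evaluation rather than retraining directly.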

Implemented zero-downtime Cloud Build CI/CD: builds 4 Docker images per commit, pushes to Artifact Registry, runs Alembic schema migrations as a Cloud Run Job (blocking deploy on failure), then deploys the API service to Cloud Run — all secrets sourced from Secret Manager.

Architected for multi-sport scale: incremental watermark ingestion, idempotent upserts, and a feature store design ready to extend from NBA to NBL, Football, Tennis, F1, and Cricket without schema rewrites.
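The incremental-watermark plus idempotent-upsert pattern can be sketched with stdlib sqlite3. The production system targets PostgreSQL on Cloud SQL, where `INSERT ... ON CONFLICT ... DO UPDATE` works the same way; the table and column names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (
        game_id    TEXT PRIMARY KEY,
        home_pts   INTEGER,
        away_pts   INTEGER,
        updated_at TEXT
    );
    CREATE TABLE ingest_watermark (
        source    TEXT PRIMARY KEY,
        last_seen TEXT
    );
""")

def ingest(rows):
    """Idempotent upsert: replaying the same batch changes nothing,
    while newer rows for an existing game_id overwrite the old ones."""
    conn.executemany(
        """INSERT INTO games (game_id, home_pts, away_pts, updated_at)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(game_id) DO UPDATE SET
               home_pts   = excluded.home_pts,
               away_pts   = excluded.away_pts,
               updated_at = excluded.updated_at""",
        rows,
    )
    # Advance the watermark so the next run only fetches newer data.
    high = max(r[3] for r in rows)
    conn.execute(
        """INSERT INTO ingest_watermark (source, last_seen) VALUES ('nba', ?)
           ON CONFLICT(source) DO UPDATE SET last_seen = excluded.last_seen""",
        (high,),
    )
    conn.commit()

batch = [("0022300001", 112, 104, "2024-01-01T22:00:00")]
ingest(batch)
ingest(batch)  # replay is a no-op thanks to the upsert
print(conn.execute("SELECT COUNT(*) FROM games").fetchone()[0])  # → 1
```

Because replays are harmless, the ingestion job can safely retry after Pub/Sub redelivery or a crashed run, which is what makes the daily pipeline's audit lineage trustworthy.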