ml-pipeline.ai

Autonomous ML Pipeline — Raw Data to Trained Model, Zero Human Intervention

Overview

ml-pipeline.ai is built on a Supervisor + Specialist Node pattern using LangGraph's StateGraph. A central state object flows through six specialist nodes, with a Critic node that can route execution backward for iterative refinement. Give it a CSV and a goal in plain English — it handles the rest.

The Critic loop is what makes this autonomous rather than just automated. Traditional pipelines execute linearly. This one reflects, reasons about quality, and takes corrective action — profiling the data, engineering features, training multiple model candidates with Optuna hyperparameter tuning, and iterating up to 3 times until quality thresholds are met.
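
As a concrete picture of that wiring, here is a minimal LangGraph sketch of the loop. The state keys, node names, and stub bodies are illustrative stand-ins, not the project's actual code:

  from typing import TypedDict
  from langgraph.graph import StateGraph, START, END

  class PipelineState(TypedDict, total=False):
      profile: dict            # Phase 1 output
      metrics: dict            # Phase 5 output
      critic_decision: str     # "refine" | "retrain" | "finalize"
      loop_count: int          # bounds Critic iterations
      last_node: str           # stub bookkeeping

  def make_node(name: str):
      def node(state: PipelineState) -> dict:
          # Real nodes profile, engineer, train, etc.; the stub just records itself.
          return {"last_node": name}
      return node

  def route(state: PipelineState) -> str:
      # Conditional edge: the Critic's decision picks the next node.
      if state.get("loop_count", 0) >= 3:
          return "finalize"
      return state.get("critic_decision", "finalize")

  graph = StateGraph(PipelineState)
  phases = ["profiler", "features", "visualize", "train", "evaluate", "critic"]
  for name in phases:
      graph.add_node(name, make_node(name))
  graph.add_edge(START, "profiler")
  for src, dst in zip(phases, phases[1:]):
      graph.add_edge(src, dst)
  graph.add_conditional_edges("critic", route, {
      "refine": "features",    # loop back to Feature Engineer
      "retrain": "train",      # loop back to Model Trainer
      "finalize": END,
  })
  pipeline = graph.compile()

With these stubs, pipeline.invoke({}) walks the six phases once and finalizes, since the stub Critic never requests another pass; the real Critic writes a decision into state that the conditional edge acts on.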

LLM Provider

Claude Sonnet 4.5

Architecture

LangGraph StateGraph

Status

Active Development

6-Phase Pipeline

Each phase is a LangGraph node — a pure function that reads from PipelineState, executes work, and writes results back

Data Profiler

Phase 1

Automated statistical profiling — shape detection, dtype analysis, missing values, correlations, task type inference (classification vs. regression), and target column identification.
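
In pandas terms, the profile amounts to a summary of this shape. The sketch below is illustrative; in particular, the nunique threshold for task inference and the explicit target parameter are assumptions, since the real pipeline identifies the target column itself:

  import pandas as pd

  def profile(df: pd.DataFrame, target: str) -> dict:
      numeric = df.select_dtypes("number")
      return {
          "shape": df.shape,
          "dtypes": df.dtypes.astype(str).to_dict(),
          "missing": df.isna().sum().to_dict(),
          "correlations": numeric.corr().round(3).to_dict(),
          # Heuristic stand-in: object dtype or few distinct values
          # suggests classification; otherwise regression.
          "task_type": ("classification"
                        if df[target].dtype == object or df[target].nunique() <= 20
                        else "regression"),
      }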

Feature Engineer

Phase 2

LLM-generated feature transformations based on data profile insights. New columns, dropped columns, shape changes, and validation — all tracked. The Critic may route back here if model performance is insufficient.
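
A hedged sketch of the apply-and-track step, assuming the generated transform arrives as a callable after sandbox validation (names and the row-count check are illustrative):

  import pandas as pd

  def apply_transform(df: pd.DataFrame, transform) -> tuple[pd.DataFrame, dict]:
      before_cols, before_shape = set(df.columns), df.shape
      out = transform(df.copy())
      # Validation stand-in: must still be a DataFrame with rows preserved.
      assert isinstance(out, pd.DataFrame) and len(out) == len(df)
      changes = {
          "added": sorted(set(out.columns) - before_cols),
          "dropped": sorted(before_cols - set(out.columns)),
          "shape": {"before": before_shape, "after": out.shape},
      }
      return out, changes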

Visualizer

Phase 3

Automated EDA visualizations using a custom Seaborn dark theme — count plots, histograms with KDE, correlation heatmaps, violin plots, and scatter matrices rendered at 150 DPI.
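
A minimal sketch of the theming and export settings involved; the palette values are placeholders rather than the project's actual theme:

  import matplotlib
  matplotlib.use("Agg")            # headless rendering inside the pipeline
  import matplotlib.pyplot as plt
  import seaborn as sns

  sns.set_theme(style="darkgrid", rc={
      "figure.facecolor": "#0d1117",
      "axes.facecolor": "#161b22",
      "text.color": "#e6edf3",
  })

  def save_histogram(df, column, path):
      fig, ax = plt.subplots()
      sns.histplot(df[column], kde=True, ax=ax)   # histogram with KDE overlay
      fig.savefig(path, dpi=150, bbox_inches="tight")
      plt.close(fig)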

Model Trainer

Phase 4

Trains 3+ model candidates with cross-validation, compares accuracy/precision/recall/F1, runs Optuna hyperparameter tuning on the best candidate, and produces feature importance rankings.
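
Sketched with scikit-learn's cross-validation utilities; the candidate set, metric, and synthetic data below are illustrative stand-ins for what the trainer generates:

  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import cross_val_score
  from xgboost import XGBClassifier

  X, y = make_classification(n_samples=500, n_features=12, random_state=0)

  candidates = {
      "logreg": LogisticRegression(max_iter=1000),
      "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
      "xgboost": XGBClassifier(n_estimators=300, eval_metric="logloss"),
  }
  # 5-fold CV per candidate; the best one proceeds to Optuna tuning.
  scores = {name: cross_val_score(model, X, y, cv=5, scoring="f1_weighted").mean()
            for name, model in candidates.items()}
  best = max(scores, key=scores.get)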

Evaluator

Phase 5

Cross-validation analysis, overfitting risk assessment, test metrics, and LLM-synthesized evaluation summaries with confusion matrices and ROC curves.
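
A sketch of the evaluation pass; the train-versus-CV gap used for overfitting risk is an assumed heuristic, shown only to make the idea concrete:

  from sklearn.metrics import confusion_matrix, roc_auc_score
  from sklearn.model_selection import cross_val_score

  def evaluate(model, X_train, y_train, X_test, y_test) -> dict:
      model.fit(X_train, y_train)
      cv_score = cross_val_score(model, X_train, y_train, cv=5).mean()
      train_score = model.score(X_train, y_train)
      y_pred = model.predict(X_test)
      return {
          "cv_score": round(cv_score, 4),
          # Assumed heuristic: a large train-vs-CV gap flags overfitting.
          "overfit_risk": "high" if train_score - cv_score > 0.10 else "low",
          "confusion_matrix": confusion_matrix(y_test, y_pred).tolist(),
          # Binary case shown; multiclass needs multi_class="ovr".
          "roc_auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
      }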

Critic Review

Phase 6
Routes backward

The differentiator. Evaluates the full pipeline state — model metrics, overfitting risk, feature quality — and decides whether to finalize, refine features, or retrain. This creates a self-improving loop.

Neural Observatory

The Neural Observatory is the real-time monitoring dashboard. It polls pipeline state every 2 seconds, rendering each phase as it completes — with live graph updates, animated timelines, and immediate results visualization. A sketch of the status endpoint it polls appears after the panels below.

Pipeline Graph

Node-by-node progress with Critic loop visualization and conditional edge highlighting

Phase Timeline

Duration tracking with loop-aware status — amber “will re-run” indicators when Critic iterates

Results Panels

Phase-specific visualization — data profiling, feature diffs, Seaborn charts, model comparison, evaluation metrics
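
Under assumed route and field names (the project's real API may differ), the polled endpoint could look like this:

  from fastapi import FastAPI
  from pydantic import BaseModel

  app = FastAPI()

  class RunStatus(BaseModel):
      run_id: str
      current_phase: str            # e.g. "train"
      completed_phases: list[str]
      loop_count: int               # Critic iterations so far

  def load_state(run_id: str) -> dict:
      # Stand-in for the SQLite-backed state store.
      return {"run_id": run_id, "current_phase": "train",
              "completed_phases": ["profiler", "features", "visualize"],
              "loop_count": 1}

  @app.get("/runs/{run_id}/status", response_model=RunStatus)
  def run_status(run_id: str) -> RunStatus:
      # The dashboard hits this every 2 seconds and re-renders on change.
      return RunStatus(**load_state(run_id))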

Architecture


  ┌───────────────────────────────────────────────────┐
  │          Neural Observatory (Next.js 15)          │
  │      Real-time Pipeline Monitoring Dashboard      │
  └─────────────────────────┬─────────────────────────┘
                            │ 2s polling
  ┌─────────────────────────▼─────────────────────────┐
  │               FastAPI + Pydantic v2               │
  │     REST API, status polling, dataset upload      │
  └─────────────────────────┬─────────────────────────┘
                            │
  ┌─────────────────────────▼─────────────────────────┐
  │        LangGraph StateGraph Orchestration         │
  │  ┌─────────────────────────────────────────────┐  │
  │  │  Profiler → Features → Viz → Train → Eval   │  │
  │  │                                    ↓        │  │
  │  │                             ┌────────────┐  │  │
  │  │        ← refine ←───────────│   CRITIC   │  │  │
  │  │        ← retrain ←──────────│ (LLM Q/A)  │  │  │
  │  │                             └──────┬─────┘  │  │
  │  │                                    ↓        │  │
  │  │                                finalize     │  │
  │  └─────────────────────────────────────────────┘  │
  └─────────────────────────┬─────────────────────────┘
                            │
                 ┌──────────┼──────────┐
                 │          │          │
             ┌───▼───┐  ┌───▼───┐  ┌───▼────┐
             │SQLite │  │  LLM  │  │Sandbox │
             │ State │  │Claude │  │ Exec   │
             └───────┘  └───────┘  └────────┘

Technology Stack

Orchestration

LangGraph
LangChain Core
LangSmith

LLM Providers

Claude Sonnet 4.5
GPT-4o

Backend

Python 3.12
FastAPI
Pydantic v2
uv

ML / Data

scikit-learn
XGBoost
LightGBM
Optuna
pandas
NumPy

Visualization

Seaborn
Matplotlib

Frontend

Next.js 15
TypeScript
Tailwind CSS
Lucide

Infrastructure

Docker Compose
SQLite
structlog

Part of the AI Ecosystem

Three specialized platforms designed to compose — each solving a distinct domain while sharing architectural patterns

Commander.ai

Orchestration & Command

Multi-agent task decomposition and coordination — the brain that delegates work across specialist agents.

View project →

WorldMaker.ai

Lifecycle Intelligence

Enterprise digital asset lifecycle analysis — understands what exists, how it's connected, and what code to generate.

View project →

ml-pipeline.ai

Autonomous ML

This Project

Self-improving machine learning pipeline — takes raw data to trained model with zero human intervention.

The strategic intersection: building smaller specialized solutions that aggregate into an ecosystem. LangGraph state machines, LLM-driven specialist nodes, and real-time observation UIs are shared architectural patterns. Commander.ai orchestrates agents. WorldMaker.ai generates intelligence about digital assets. ml-pipeline.ai trains models autonomously. Together, they form a composable AI platform where each system amplifies the others.

Implementation Highlights

Self-Improving Critic Loop

The Critic node evaluates the full pipeline state — model metrics, overfitting risk, feature quality — and autonomously decides to finalize, refine features, or retrain. The loop is bounded by a configurable MAX_LOOPS (default: 3), yielding a self-improving system that reasons about its own output quality.
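
As a control-flow sketch only: in the project the judgment comes from the LLM, so the fixed thresholds and state keys below merely stand in for it:

  MAX_LOOPS = 3   # configurable bound on Critic iterations

  def critic_decide(state: dict) -> str:
      if state.get("loop_count", 0) >= MAX_LOOPS:
          return "finalize"                 # hard stop at the loop bound
      if state.get("overfit_risk") == "high":
          return "refine"                   # route back to Feature Engineer
      if state.get("metrics", {}).get("f1", 1.0) < 0.80:
          return "retrain"                  # route back to Model Trainer
      return "finalize"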

Sandboxed Code Execution

LLM-generated Python code executes in a subprocess sandbox with enforced timeouts, memory limits, and import restrictions. Every phase generates code dynamically — from feature transformations to model training scripts — then validates output before updating pipeline state.
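
A standard-library sketch of this pattern; the allow-list, limits, and timeout values are placeholders for what the project configures:

  import ast, resource, subprocess, sys

  ALLOWED = {"pandas", "numpy", "sklearn", "xgboost", "lightgbm"}  # placeholder allow-list

  def check_imports(code: str) -> None:
      # Static allow-list check before anything executes.
      for node in ast.walk(ast.parse(code)):
          if isinstance(node, ast.Import):
              mods = [alias.name for alias in node.names]
          elif isinstance(node, ast.ImportFrom):
              mods = [node.module or ""]
          else:
              continue
          for mod in mods:
              if mod.split(".")[0] not in ALLOWED:
                  raise ValueError(f"import not allowed: {mod}")

  def _limits() -> None:
      # Runs in the child before exec (POSIX only): cap CPU time and memory.
      resource.setrlimit(resource.RLIMIT_CPU, (10, 10))
      resource.setrlimit(resource.RLIMIT_AS, (512 * 1024**2, 512 * 1024**2))

  def run_generated(code: str, timeout: float = 30.0) -> str:
      check_imports(code)
      result = subprocess.run(
          [sys.executable, "-I", "-c", code],   # -I: isolated interpreter mode
          capture_output=True, text=True, timeout=timeout, preexec_fn=_limits,
      )
      if result.returncode != 0:
          raise RuntimeError(result.stderr)
      return result.stdout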

Multi-Candidate Model Training

Trains 3+ model candidates (scikit-learn, XGBoost, LightGBM) with cross-validation, compares across accuracy/precision/recall/F1, then runs Optuna hyperparameter tuning on the best candidate. All metrics stream to the Neural Observatory in real time.
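
An illustrative Optuna pass on an assumed winning candidate; the search space and synthetic data are examples, not the dynamically generated ones:

  import optuna
  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score

  X, y = make_classification(n_samples=500, n_features=12, random_state=0)

  def objective(trial: optuna.Trial) -> float:
      model = RandomForestClassifier(
          n_estimators=trial.suggest_int("n_estimators", 100, 600),
          max_depth=trial.suggest_int("max_depth", 3, 16),
          min_samples_leaf=trial.suggest_int("min_samples_leaf", 1, 10),
          random_state=0,
      )
      return cross_val_score(model, X, y, cv=5, scoring="f1_weighted").mean()

  study = optuna.create_study(direction="maximize")
  study.optimize(objective, n_trials=50)
  print(study.best_params, round(study.best_value, 4))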

Cloud-Ready Architecture (AWS)

Production CDK deployment maps to ECS Fargate (private subnets), Cognito authentication, EFS for artifact persistence, S3 for model storage, and Secrets Manager for API keys. Internal-facing by design — LLM-generated code execution demands network isolation and defense-in-depth.
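
A hedged CDK sketch of the internal-facing service (construct IDs, sizing, and the image are placeholders; the Cognito, EFS, S3, and Secrets Manager wiring described above is elided):

  from aws_cdk import App, Stack
  from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs, aws_ecs_patterns as patterns
  from constructs import Construct

  class PipelineStack(Stack):
      def __init__(self, scope: Construct, id: str, **kwargs) -> None:
          super().__init__(scope, id, **kwargs)
          vpc = ec2.Vpc(self, "Vpc", nat_gateways=1)
          cluster = ecs.Cluster(self, "Cluster", vpc=vpc)
          patterns.ApplicationLoadBalancedFargateService(
              self, "Api",
              cluster=cluster,
              cpu=1024, memory_limit_mib=4096,
              task_image_options=patterns.ApplicationLoadBalancedTaskImageOptions(
                  image=ecs.ContainerImage.from_asset("."),
              ),
              public_load_balancer=False,   # internal-facing by design
              task_subnets=ec2.SubnetSelection(
                  subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
          )

  app = App()
  PipelineStack(app, "MlPipeline")
  app.synth()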