ML Model Drift Monitor

MLOps / Data Science

Detect data drift, concept drift, and model performance degradation in production

Monitors ML model inference in production. Analyzes input feature distributions vs training data, tracks prediction confidence over time, detects concept drift via sliding window comparison, and triggers retraining alerts with drift magnitude and affected feature reports.

Time Saved

Continuous automated monitoring replaces ~2 hours of weekly manual checks

Cost Reduction

Prevents revenue loss from stale models ($100K+ impact)

Risk Mitigation

Catches model degradation before it impacts business metrics

System Prompt

You are an MLOps monitoring specialist. Analyze model inference data for drift and degradation. Rules:
- Compare feature distributions: KS test (p < 0.05 = drift), PSI (> 0.2 = significant)
- Track prediction confidence: alert if mean drops >10% from baseline
- Monitor label distribution shift for classification models
- Check feature importance stability: top-5 features shouldn't change >20%
- Classify drift: none | minor | significant | critical
- Output JSON: { overallStatus: string, featureDrifts: [...], performanceMetrics: {...}, retrainRecommended: boolean, urgency: string }
Never recommend retraining without specifying which features drifted and by how much.

Skills

drift-thresholds

<skill name="drift-thresholds">
Drift detection thresholds:
- PSI (Population Stability Index):
  - < 0.1: no drift
  - 0.1-0.2: minor drift (monitor)
  - 0.2-0.5: significant drift (investigate)
  - > 0.5: critical drift (retrain)
- KS test p-value < 0.05: statistically significant shift
- Prediction confidence drop > 10%: model uncertainty increasing
- Label distribution shift > 15%: concept drift likely
- Feature importance rank change > 3 positions: investigate
</skill>
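The thresholds above can be sketched as a small classifier that maps per-feature statistics to the drift levels the system prompt requires. The function name and the rule that a significant KS test alone escalates "none" to "minor" are illustrative assumptions, not part of the original spec.

```python
def classify_drift(psi: float, ks_p_value: float) -> str:
    """Map PSI and KS-test results to a drift level.

    PSI bands follow the drift-thresholds skill:
    < 0.1 none, 0.1-0.2 minor, 0.2-0.5 significant, > 0.5 critical.
    Assumption: a KS p-value < 0.05 with low PSI still flags "minor".
    """
    if psi > 0.5:
        return "critical"
    if psi > 0.2:
        return "significant"
    if psi > 0.1:
        return "minor"
    # Low PSI but statistically significant KS shift: worth monitoring.
    return "minor" if ks_p_value < 0.05 else "none"
```

Using both statistics guards against each test's blind spot: PSI is binning-sensitive, while the KS test can flag tiny but statistically significant shifts on large samples.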

Tools

compute_psi

Description: Calculates Population Stability Index between two distributions

Parameters:

{
  "baseline": { "type": "array" },
  "current": { "type": "array" },
  "bins": { "type": "number" }
}
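A minimal sketch of what `compute_psi` could do, assuming quantile-based binning on the baseline sample (the tool's actual binning strategy is not specified):

```python
import numpy as np

def compute_psi(baseline, current, bins: int = 10) -> float:
    """Population Stability Index between two 1-D samples.

    Bin edges are taken from the baseline's quantiles so each baseline
    bin holds roughly equal mass; outer edges are widened to +/-inf so
    out-of-range current values are still counted. A small epsilon
    avoids log(0) for empty bins.
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    base_frac = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_frac = np.histogram(current, bins=edges)[0] / len(current)
    eps = 1e-6
    base_frac = np.clip(base_frac, eps, None)
    curr_frac = np.clip(curr_frac, eps, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))
```

Quantile bins are preferred over equal-width bins here because they keep every baseline bin populated, which makes the PSI estimate less sensitive to outliers.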

get_inference_logs

Description: Fetches recent model inference data with features and predictions

Parameters:

{
  "modelId": { "type": "string" },
  "window": { "type": "string" }
}

MCP Integration

Hourly cron job: sample recent inference data, then POST feature distributions and predictions to /api/mcp. The agent returns a drift analysis; critical drift triggers a PagerDuty alert and auto-starts the retraining pipeline.
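A sketch of the cron job's payload assembly and POST, assuming the inference logs arrive as records with `features` and `prediction` fields; the field names, helper names, and URL are placeholders, since the /api/mcp contract is defined by the MCP server:

```python
import json
import urllib.request

def build_drift_payload(model_id: str, records: list[dict]) -> dict:
    """Pivot sampled inference records into per-feature value lists.

    Assumes each record looks like {"prediction": ..., "features": {...}}.
    """
    features: dict[str, list] = {}
    predictions = []
    for rec in records:
        predictions.append(rec["prediction"])
        for name, value in rec["features"].items():
            features.setdefault(name, []).append(value)
    return {"modelId": model_id, "features": features, "predictions": predictions}

def post_drift_check(payload: dict, url: str = "https://example.internal/api/mcp") -> bytes:
    """POST the payload to the MCP endpoint (URL is a placeholder)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Sampling (rather than sending every inference) keeps the hourly payload bounded while still giving the agent enough data for stable distribution comparisons.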

Grading Suite

Detect significant feature drift

Input:

Feature "user_age": baseline mean=35, current mean=48, PSI=0.35. Feature "income": baseline mean=55K, current mean=54K, PSI=0.02.

Criteria:

- output_match: flags user_age as significant drift (weight: 0.4)
- output_match: income marked as no drift (weight: 0.2)
- output_match: retrainRecommended is true (weight: 0.2)
- schema_validation: valid JSON (weight: 0.2)
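A grader for these criteria could be sketched as below. The weights mirror the suite; the response field shapes (`feature`/`level` keys inside `featureDrifts`) are assumptions about how the agent structures its JSON.

```python
def grade_drift_output(output: dict) -> float:
    """Score an agent response against the weighted criteria above."""
    score = 0.0
    drifts = {d.get("feature"): d for d in output.get("featureDrifts", [])}
    # output_match: user_age flagged as significant drift (0.4)
    if drifts.get("user_age", {}).get("level") == "significant":
        score += 0.4
    # output_match: income marked as no drift (0.2)
    if drifts.get("income", {}).get("level") == "none":
        score += 0.2
    # output_match: retrainRecommended is true (0.2)
    if output.get("retrainRecommended") is True:
        score += 0.2
    # schema_validation: all required top-level keys present (0.2)
    required = {"overallStatus", "featureDrifts", "performanceMetrics",
                "retrainRecommended", "urgency"}
    if required <= output.keys():
        score += 0.2
    return round(score, 2)
```
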