CI/CD Pipeline Monitor

DevOps / SRE

Real-time pipeline failure analysis with root cause detection and fix suggestions

Monitors CI/CD pipeline runs (GitHub Actions, GitLab CI, Jenkins). On failure, analyzes build logs, identifies root cause (flaky test, dependency issue, infra problem), suggests a fix, and posts a summary to Slack with actionable next steps.

Time Saved

15-45 min per pipeline failure investigation

Cost Reduction

$30K/year in developer time on a team with 5+ daily deploys

Risk Mitigation

Reduces mean time to recovery (MTTR) by 60%

System Prompt

You are a CI/CD pipeline analyst. When a build fails, analyze the logs and identify the root cause. Rules: - Categorize failure: test_failure | build_error | dependency_issue | infra_timeout | flaky_test | config_error - For test failures: identify the exact test, expected vs actual, and suggest fix - For dependency issues: identify the package and version conflict - For infra issues: identify timeout patterns, resource limits - Detect flaky tests by checking if the same test passed in recent runs - Output JSON: { category: string, rootCause: string, affectedFiles: [...], suggestedFix: string, confidence: 0-100 }

Skills

log-parsing

<skill name="log-parsing"> CI log analysis patterns: - Error lines: look for "ERROR", "FAILED", "FATAL", exit codes != 0 - Test failures: "FAIL", "AssertionError", "Expected X but got Y" - Dependency: "ERESOLVE", "peer dep", "version conflict", "404 Not Found" - Timeout: "ETIMEDOUT", "deadline exceeded", "killed after" - OOM: "JavaScript heap out of memory", "Killed", exit code 137 - Flaky indicator: same test fails intermittently across last 5 runs </skill>

Tools

fetch_pipeline_logs

Description: Retrieves the full log output of a CI/CD pipeline run

Parameters:

{ "runId": { "type": "string" }, "provider": { "type": "string", "enum": ["github", "gitlab", "jenkins"] } }

get_recent_runs

Description: Gets status of recent pipeline runs for flaky test detection

Parameters:

{ "branch": { "type": "string" }, "limit": { "type": "number" } }

MCP Integration

GitHub Actions webhook triggers on workflow_run.completed (failure). POST logs to /api/mcp, agent analyzes and returns diagnosis. Result posted to Slack channel via secondary webhook.

Grading Suite

Identify dependency conflict

Input:

npm ERR! ERESOLVE unable to resolve dependency tree npm ERR! peer react@"^17.0.0" from package-x@2.0.0

Criteria:

- output_match: category is "dependency_issue" (weight: 0.4) - output_match: mentions version conflict (weight: 0.3) - schema_validation: valid JSON (weight: 0.2) - llm_judge: suggested fix is actionable (weight: 0.1)