Pulse Agent // AI SRE for OpenShift

// CAPABILITIES

Full-spectrum cluster operations. Autonomous when you trust it.

SRE Diagnostics AI

Investigate pod failures, crash loops, OOM kills, scheduling problems. Correlate events, logs, and Prometheus metrics for root cause analysis.

Security Scanner SEC

Privileged containers, RBAC audit, network policy gaps, image security, SCC analysis, secret hygiene. Full cluster security posture in one scan.

Autonomous Monitor AUTO

23 scanners running every 60 seconds. 11 reactive (crashloops, pending pods, failed deployments, node pressure, cert expiry, firing alerts, OOM kills, image pull errors, degraded operators, DaemonSet gaps, HPA saturation), SLO burn rate, security posture, 5 audit scanners, 5 predictive trend scanners.

Auto-Fix Engine AUTO

Default trust level 2: propose + confirm. Opt into level 3 for safe auto-fix, level 4 for full autonomous. Rollback snapshots for every action. Rate-limited with cooldown.

ORCA Intelligence AI

6-channel skill selector (keyword, component, historical, semantic, taxonomy, temporal). 7 skills with trigger patterns. Phased plan execution: triage, diagnose, remediate, verify, postmortem. Self-improving learned weights.

Self-Improving Memory AI

Learns from every interaction. Extracts runbooks from confirmed resolutions. Detects patterns. Augments prompts with relevant past incidents.

Confidence Scoring

Every finding, investigation, and action includes a 0-100% confidence score. Reasoning chains with evidence and alternatives considered.

Error Intelligence

Structured error types with 7 categories. Thread-safe ring buffer tracking. Circuit breaker with auto-recovery. Database resilience decorators.

Morning Briefing

Time-aware greeting, action summary, investigation count, current cluster state. The 30-second standup replacement for your cluster.

Generative Dashboards AI

83 PromQL recipes across 16 categories. Semantic auto-layout engine. discover_metrics → verify_query → build. Data-first, never empty charts.

Intelligence Loop AI

Feeds analytics back into the system prompt. Query reliability scores, dashboard patterns, error hotspots, token efficiency. The agent gets smarter over time.

Prompt Optimization

71% prompt reduction (28KB → 8KB). Selective component schemas. Selective runbooks. Token tracking per turn. Cache hit rate monitoring.

Tool Analytics AI

Full audit log of every tool invocation. Chain discovery via bigram analysis. Next-tool hints injected into prompts. Token tracking and usage stats API.

Fleet Management

Multi-cluster operations across your fleet. 5 fleet tools for cross-cluster visibility, comparison, and coordinated actions.

GitOps / ArgoCD

6 ArgoCD integration tools. Application sync status, health checks, rollback operations, and drift detection across GitOps-managed clusters.

Adaptive Tool Selection AI

TF-IDF prediction from real usage, Haiku LLM fallback for cold start, co-occurrence bundles, mid-turn chain expansion. Negative signals suppress wasted tools. Self-improving — LLM trains itself out of a job.

Eval Framework & Scores

16 eval suites, 192 scenarios. Release gate: 99.6%. Safety, adversarial, autofix, capacity, postmortem, SLO management suites. Selector: 59 scenarios. Ablation testing, regression tracking, baseline comparison.

Modular Architecture

49 modules across 3 packages. No file over 910 lines. Typo auto-correction for 130+ K8s terms. Centralized Pydantic config, zero raw os.environ.

PULSE AGENT