AI SRE – The Verbalization Layer

Today, internal platform teams at banks, logistics firms, telcos, and retailers are actively building in-house AI SRE capabilities. The product category they are reproducing is structurally open to any competent engineering organization because it consists strictly of model access wired into operational surfaces the buyer already owns: telemetry, ticketing, runbooks, and deployment pipelines. The commercial vendor market sits adjacent to this internal activity, offering merely to configure a language model against those same pre-existing, pre-paid surfaces.

This economic reality is telling. It means the price of a vendor’s product must converge toward the commodity cost of its underlying tokens. Differentiation is hard to sustain, so vendors fall back on scaffolding, integration glue, and a marketing surface that gives the assemblage a category name. A buyer who employs competent platform engineers already has an internal alternative that continuously improves, operates on proprietary data, and retains the margin in-house. In contrast, a buyer who seeks outsourced engineering judgment from a vendor faces a structural deficit that an AI SRE product can only partially address. The economics reveal the true nature of today’s “AI SRE” product: it is plumbing.

Builders of Words, Not Worlds!

In current practice, building an AI SRE consists entirely of connecting a language model to a buyer’s existing data surfaces: Datadog for metrics and traces, Sentry for errors, PagerDuty for incidents, Jira for tickets, and a git repository for institutional memory. The agent is granted access through MCP servers or equivalent gateways. A prompt instructs it to summarize, correlate, and hypothesize. The output is text.

The resulting artifact is a verbalization layer over states. Records describe what the system was at discrete moments captured by instrumentation. The agent paraphrases these records; the operator reads the paraphrase.
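
As a minimal sketch of that arrangement, the whole layer reduces to a function from stored records to text. Everything named here is an assumption made for illustration: the Record type, the source labels, and the stub_model callable stand in for real surfaces and a real model call, and none of it reflects any vendor's API or an actual MCP client.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Record:
    """A stored observation of what the system was at one captured moment."""
    source: str      # hypothetical surface label, e.g. "metrics" or "tickets"
    timestamp: str   # when the instrumentation captured it
    text: str        # the recorded content


def build_prompt(records: Iterable[Record]) -> str:
    """Flatten records into a prompt asking for a summary and hypotheses."""
    instruction = "Summarize, correlate, and hypothesize about these records:"
    lines = [f"[{r.source} @ {r.timestamp}] {r.text}" for r in records]
    return "\n".join([instruction, *lines])


def verbalize(records: Iterable[Record], model: Callable[[str], str]) -> str:
    """Records in, prose out. Nothing in this path models the running system."""
    return model(build_prompt(records))


def stub_model(prompt: str) -> str:
    """Stand-in for a language model call (an assumption, not a real API)."""
    n_records = len(prompt.splitlines()) - 1  # first line is the instruction
    return f"Summary of {n_records} records."


records = [
    Record("metrics", "12:00", "p99 latency 840 ms on checkout"),
    Record("tickets", "12:03", "customers report slow checkout"),
]
summary = verbalize(records, stub_model)
```

The only inputs are records and the only output is a string. Swapping the stub for a production model changes the quality of the prose, not the shape of the function.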

The builders of these solutions talk about integrations, schemas, retrieval, embeddings, context windows, and evaluations against historical incident corpora. However, the vocabulary of actual system behavior, such as coupling, propagation, saturation, hysteresis, control loops, and situational awareness, resides elsewhere. The thing being constructed sits outside the system. The system itself remains in its original state: unknown and unmodeled.

Three Legitimate Uses

The verbalization layer has genuine utility at three distinct levels.

The agent summarizes tickets, incidents, logs, and runbooks. It compresses incident histories into narratives and searches across surfaces that an operator would otherwise have to traverse manually.
The agent helps operators explore telemetry faster by translating natural-language questions into structured queries against metrics, traces, and logs. It surfaces data the operator would otherwise have to compose by hand.
The agent generates initial hypotheses, postmortem skeletons, and status-page summaries as first drafts. A human then applies engineering judgment to the text, verifying facts and correcting errors.

These uses highlight how the inflated marketing claims of AI SRE vendors confuse such tools with autonomous AI SREs.

The Vanishing Model

Site reliability engineering emerged as a discipline predicated on a specific shape of operator competence. The senior engineer carried an internal representation of the service: a mental model that included error budgets, request flows, saturation curves, queueing dynamics, failure modes, dependency graphs, deployment topologies, and a working theory of how the system would degrade. This representation was forged through the friction of post-incident reviews, code reviews, pager rotations alongside mentors, and the slow, painful accretion of incidents survived. The representation lived inside the person and was activated in the moment of operation.

This representation is precisely what the discipline meant by a model. It was internal, partial, revisable, and indispensable. Dashboards were prompts to this model; metrics were inputs to it; runbooks were artifacts of it.

The AI SRE product fundamentally shifts the location of the work. The records, dashboards, and runbooks remain, but the model itself persists only in fragments: in code, architecture diagrams, deployment topologies, SLO definitions, and the minds of human experts who were trained before the transition. The model that used to sit at the center of operational competence has been evicted to the periphery. The center is now occupied by a language model holding parameters trained on text. The system model has been replaced by a plausible prose artifact.

The Geometry of the Shadow

To understand what the agent sees, we must view it as a projection. A distributed system is fundamentally high-dimensional, governed by latencies, queue depths, cache hit rates, garbage collection pauses, network jitter, retry storms, partial failures, request shapes, deployment versions, configuration drift, and the tight temporal coupling between them all. Instrumentation inevitably collapses this high-dimensional reality onto a flat surface of dashboards and logs. The agent queries this flat surface and verbalizes its findings. The operator inhabits this verbalization. The verbalization feels complete because every query returns an answer. Every incident receives a summary; every alert receives a hypothesis. Yet, the dimensions collapsed away by the projection are precisely where novel failures live.

The shadow on the wall is a faithful projection, and the cave wall is a real wall. However, treating the projection as the system itself is the foundational category error that organizes the entire AI SRE industry.
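
The many-to-one nature of the projection can be shown with a toy example. The field names, the values, and the choice of which dimensions count as instrumented are all assumptions made for illustration; the point is only that two different system states can cast the same shadow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemState:
    """A toy slice of system state; the field names are illustrative."""
    p99_latency_ms: int     # instrumented in this toy example
    error_rate_pct: float   # instrumented in this toy example
    retry_queue_depth: int  # NOT instrumented in this toy example
    config_version: str     # NOT instrumented in this toy example


# The dimensions that survive the projection onto the dashboard.
INSTRUMENTED = ("p99_latency_ms", "error_rate_pct")


def project(state: SystemState) -> dict:
    """Collapse a state onto the dimensions the dashboard actually records."""
    return {name: getattr(state, name) for name in INSTRUMENTED}


healthy = SystemState(120, 0.1, retry_queue_depth=3, config_version="v41")
pre_failure = SystemState(120, 0.1, retry_queue_depth=9500,
                          config_version="v42-drifted")

same_dashboard = project(healthy) == project(pre_failure)  # identical shadows
same_system = healthy == pre_failure                       # different systems
```

Any agent that reads only the output of project() receives identical input for both states, so no amount of fluency downstream can tell them apart.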

Corpus-Bounded Competence

An agent’s grasp of failure is inherently retrospective. Its competence is concentrated strictly where text describing past failures already exists: postmortems, runbooks, and training corpora. But failures well-represented in text are, by definition, failures that have already been recognized, neutralized, automated away, or absorbed into the platform.

The interesting failures in any mature system are those that have yet to be seen. They live outside the training data, outside the runbooks, and outside the pattern library.

SRE as a craft has always been epistemic work performed at the edge of the known. The value of a senior engineer is the ability to form a viable working hypothesis from sparse, contradictory, and partially instrumented signals.

The agent is highly competent at the problems competence has already solved. At the edge of the known, where true competence is required, the agent remains trapped inside its corpus.

The Apprenticeship Path

Systems competence has traditionally been passed down through apprenticeship: sharing pager rotations with seasoned engineers, conducting post-incident reviews with those who possessed the mental model, and sitting through code reviews where architectural assumptions were exposed. The senior engineer reinforced the model through teaching, while the junior engineer instantiated it through learning. The AI SRE product disrupts this transmission chain.

The junior engineer reads the agent’s summary, while the senior engineer relies on it. The model that was meant to be passed down is left stranded; it is no longer operationally active. Institutional memory is now outsourced to a static statistical process that retains only what it saw at training time.

This hollowing-out is both immediate and generational. The first cohort of operators to learn the discipline through AI-mediated incident response will interact with the agent instead of a system model. The next cohort will then enter a profession in which the internal representation appears in neither the curriculum nor the lineage.

Situational intelligence tools should not merely act or summarize. They must educate the operator and expose systemic assumptions during peacetime, thereby preserving the apprenticeship loop rather than shattering it.

Plausibility as Product

The most dangerous attribute of a large language model in this domain is its plausibility.

Correlation-as-narrative is what an agent produces, and the narrative is compelling because language models are optimized for fluency. Plausibility satisfies the auditor, fits the postmortem template, and pacifies the executive dashboard. Plausibility is what is purchased. Meanwhile, the system continues to behave exactly as its underlying dynamics dictate, regardless of how well the plausible narrative approximates reality.

When the narrative aligns with the dynamics, operators feel satisfied. When the narrative diverges from the dynamics, operators experience the same level of satisfaction through a different mechanism: the narrative and the postmortem remain coherent, and the dashboard remains green. The divergence goes unnoticed until the system fails in a manner that exceeds the narrative’s capacity to explain it.

The vendors selling AI SRE products have simply rediscovered the monitoring industry of the early 2000s, adding a language model to the front-end. The shape of the product is identical; the mechanism of value capture is familiar. The only new ingredient is plausibility at scale. Verbalization is the honest name for this product category.

The Failure Mode

The failure mode is already present in today’s deployments. A future outage will only make it visible. The failure involves context-trashing at a single-agent scale, occupying the cognitive space that should be used for critical human inquiry. Operators read the narrative and skip the crucial investigation. The skipped inquiry is where failure hardens.

An agent’s confidence is a function of context, bounded tightly by what the records show. The instrumentation gap is the ultimate failure surface, yet this surface is exactly where the agent’s confidence runs highest because the verbalization mechanism produces fluent prose regardless of underlying data coverage.

The agent speaks most elegantly about what it can see, which is invariably what is already well-understood. It will speak with that exact same fluency about the parts beyond its visibility, because fluency is a structural property of the verbalization mechanism itself, completely decoupled from underlying truth.

The outage will arrive. It will be explained, after the fact, by an agent summarizing a postmortem written by humans who attempted to diagnose it without the agent. The summary will be plausible. It will be published. And the next outage will be quietly set up by the illusions of the last.

Ashby’s Law of Requisite Variety

Ashby’s Law of Requisite Variety asserts that the variety of a regulator must be equal to or greater than the variety of the system it regulates. However, the AI SRE’s variety is constrained by the variety of its inputs. These inputs are limited by the existing instrumentation, which is designed to detect specific failure modes. In contrast, the running system’s variety far surpasses all these constraints, and this gap increases exponentially with architectural complexity.

The industry is selling regulators whose variety is structurally inadequate to govern the systems they claim to manage. This inadequacy is a structural property of placing a verbalization layer over an instrumentation projection of a system whose dynamics extend far beyond that projection. Because it is structural, this inadequacy cannot be fixed by prompt tuning, larger corpora, or expanding context windows.
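
The variety gap can be made concrete with a counting sketch. It is a toy under loudly stated assumptions, not a derivation from Ashby’s work: components are assumed to vary independently, each with the same number of local states, and the regulator is assumed to see nothing but a fixed set of on/off alerts. The function names are hypothetical.

```python
def system_variety(components: int, states_per_component: int) -> int:
    """Joint states of a toy system whose components vary independently."""
    return states_per_component ** components


def observable_variety(binary_alerts: int) -> int:
    """Upper bound on situations distinguishable from on/off alerts alone."""
    return 2 ** binary_alerts


def states_hidden_behind_one_reading(components: int, states: int,
                                     alerts: int) -> int:
    """Pigeonhole bound: at least one alert reading must stand for this many
    distinct system states (ceiling of system variety / observable variety)."""
    v_sys = system_variety(components, states)
    v_obs = observable_variety(alerts)
    return -(-v_sys // v_obs)  # ceiling division on integers


# Doubling both the architecture and the instrumentation still widens the gap.
small = states_hidden_behind_one_reading(components=10, states=4, alerts=12)
large = states_hidden_behind_one_reading(components=20, states=4, alerts=24)
```

In this toy, the smaller system hides at least 256 states behind a single alert reading and the larger one hides at least 65,536, even though the number of alerts doubled along with the number of components.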

Situating Intelligence

The work of building genuine situational intelligence at the system layer remains entirely untouched. It demands operators, human or otherwise, who inhabit an actual representation of the running system: a representation that grasps behavior, holds dynamics, anticipates propagation, and supports rigorous inquiry at the edge of the known. It requires restoring the model to a position where it can perform real operational work. It demands giving the system itself something to think with: signs over signals, situations over states, and holonic scoping over flat record graphs.

The architectural layer this work points toward has a name worth fixing: the situational intelligence layer. This layer actively participates in the regulation of the system. It carries the model, supports inquiry, and transmits competence through daily use. This work has its own discipline, its own concepts, and its own pace of maturation. Its market is structurally defensible. The verbalization market will inevitably compress toward the marginal token cost of the model behind it. The situational intelligence layer is where engineering judgment, systems understanding, and regulation live.