Observability: The Unsustainable Path

The observability industry needs to say something plainly that it has been avoiding.

The current path is not sustainable.

The reason is not that observability is unnecessary. The opposite is true. The systems we build are getting harder to understand, harder to predict, and harder to trust, and the need to know what is happening has never been greater. The trouble is the answer the industry keeps reaching for. When in doubt, collect more.

More traces.
More logs.
More metrics.
More spans.
More attributes.
More prompt captures.
More completions.
More tool calls.
More context.
More everything.

Every new uncertainty becomes an instrumentation opportunity. Every new technology becomes another source of telemetry. Every new failure mode becomes another reason to retain more data. With generated code and agentic systems, that reflex is now being pushed to an extreme. This is a condition that should be named: Telemetry Inflation.

Telemetry inflation occurs when the industry responds to increasing uncertainty by expanding collection at a faster rate than its understanding. This approach relies on the assumption that by gathering more data, preserving context, and transmitting enough records to various backends, a deeper understanding will eventually emerge. It will not.

The shortage is not telemetry. The shortage is discrimination at the point where telemetry is produced. We record more and more of what happened while still failing to express what mattered. And AI-generated software and agentic systems are about to expose that weakness brutally.

The world the tools were built for

For a long time, instrumentation looked like a manageable problem. You had services. Services made calls. Calls had latency and either succeeded or failed. Logs described notable events, metrics described changing quantities, traces described paths through distributed execution. Not perfect, but the shape of the tools matched the shape of the world.

A request came in.
A span started.
A database was called.
A downstream service timed out.
A metric crossed a threshold.
An engineer opened a dashboard.

That model was designed for a world where the primary challenge was reconstructing the execution process across different components. It was a forensic approach, and the evidence available was relatively limited.

The world has changed.

We generate code with models. We let agents change systems, call tools, retrieve documents, write summaries, compose actions, and hand work to other agents. Structured calls have given way to vast amounts of free-form context. The old world had parameters and API contracts. The new world has prompts, retrieved fragments, chat histories, tool results, policy instructions, model responses, generated code, and memory.

The reflex has not changed. It still says: collect it.

Capture the prompt. Capture the completion. Trace the tool call. Record the retrieval. Count the tokens. Store the generated code. Keep the plan, keep the failure, keep the summary the second model wrote about the first model’s failure. At some point this stops being observability and becomes hoarding.

Paying for the same contenT

The absurdity is that the same content is now paid for repeatedly. We pay to send context into the model. We pay to capture that same context as telemetry. Then we pay to store it, index it, govern it, query it, redact it, summarize it, and feed it back into another model so the second model can explain what the first one did. The model context becomes telemetry. The telemetry becomes future model context. That future context becomes more telemetry.

That loop is inflation wearing the costume of intelligence.

The industry will call this “GenAI observability” or “agent observability” or “AI-native monitoring.” Much of it is the old model with larger payloads. Prompts become span attributes. Completions become events. Token counts become metrics. Agent runs become traces. Conversation gets flattened into the language of distributed execution.

A conversation is not a call graph.

A trace can show that an agent called a search tool, then a code tool, then wrote a file, then returned a response. Useful. It will not tell you whether the agent understood the task, used the right context, stayed inside its authority, confused a user preference with a policy, or made a commitment it could not keep. Those are situational questions, and this is the gap the industry keeps stepping around.

Standardizing is not understanding

The vendors have become very good at collecting, transporting, indexing, correlating, and visualizing residue. They are much weaker at helping systems express local meaning. They built enormous machinery for moving evidence downstream and very little for helping a component say what it knows while the work is happening.

So the answer is always more data.
More traces for microservices.
More labels and pipelines for Kubernetes.
More dashboards for cloud.
More metrics for cost.

Correlation for complexity. Sampling for too much telemetry. AI summarization for too much complexity. And now, for agents: record the prompt, the completion, the tool calls, the context, the tokens, the whole interaction.

This is the same paradigm inflated.

The vendors are not innocent bystanders here. When Grafana or anyone else reports that observability complexity is rising, the report may be accurate, but it is not neutral. The major players, such as Grafana, Datadog, Dynatrace, Elastic, Splunk, New Relic, Dash0, the OpenTelemetry ecosystem, have defined the categories that now shape the problem.

They taught the industry to see modern systems through traces, metrics, logs, profiles, collectors, exporters, pipelines, dashboards, and alerts. But standardizing the exhaust of systems is a different act from making systems intelligible.

The industry standardized the movement of telemetry, not the meaning of behavior. The world changed and the ontology stayed put. We carry old categories and ask them to explain new realities. Logs, metrics, and traces were already strained by distributed systems. They strain further under agentic ones, where the live questions are what was believed, what was intended, what was promised, what was violated, and what should have happened instead.

A metric can tell you a value crossed a threshold. It cannot tell you a component broke a promise. A trace can tell you service A called service B. It cannot tell you that A handed off work B was no longer in a position to honor. A log can tell you an event occurred. It cannot tell you whether the event mattered in the situation.

These tools are not altogether useless. They are simply not enough, and the danger is treating them as if they were.

What agents require

In an agentic workflow the important facts are usually not low-level execution facts. They are facts about context, authority, expectation, and judgment.

What task was the agent trying to complete?
What context did it treat as authoritative?
What constraint was active?
What boundary was it forbidden to cross?
What did it commit to the user?
What condition would require escalation?
What evidence made it change course?
What did it know was uncertain?
What did it fail to notice?

When a system cannot express these distinctions locally, all we can do is reconstruct them later from residue.

That’s the forensic model: the system operates without awareness, releases emissions, and an external observer deduces what was significant after the event. As the system becomes more intricate, its sustainability diminishes.

Meaning must converge closer to the point of action.

A component should not only emit the fact that it performed an action but also provide information about its intentions and the nature of the boundary it crossed. It should not only indicate a call failure but also specify the commitment that the failure threatens. Additionally, it should not only report latency but also whether it remains within a promise made.

That is the difference between telemetry and signs.
Telemetry says: something happened.
A sign says: something happened that matters in this way.

The industry uses “signal” casually, and the word has gone soft. A metric is a signal, a span is a signal, a log line, a profile, a prompt, a token count — all signals. When everything is a signal, the word stops helping.

A sign in the stronger sense is a meaningful distinction made by a participant in the system. Late. Blocked. Overloaded. Contradicted. Out of authority. Commitment missed. Expectation violated. Escalation required. Context insufficient. Evidence weak. These sit above telemetry rather than replacing it. They turn raw happening into situated meaning.

How the inflation shows up

The current architecture’s weakest point is its lack of local judgment, understanding at the source, thin semantics, comprehension, awareness, and downstream guesses about residue. The problem is not that traces, logs, metrics, and profiles exist. It is that the industry treats them as the natural shape of observability. They are not natural. They are historical artifacts of an earlier age of systems, stretched far past their original limits. So every new thing gets forced into the same machinery. Serverless becomes cold-start metrics. CI/CD becomes pipeline telemetry. GenAI becomes prompt spans. Agents become traces with tool calls. The machinery expands while the imagination holds still.

Telemetry Inflation has a recognizable set of symptoms.

The first is capture without a theory of selection. Teams instrument out of fear of missing data later. The decision to capture is not grounded in an account of what will be learned, who will use it, how long it should live, or what risk its storage creates.

The second is duplication disguised as observability. The same content appears in the application, the model context, the trace, the logs, the analytics system, the audit store, and the incident summary. Each copy is justified locally. Together they are a governance and cost problem.

The third is semantic poverty under technical sophistication. The stack advances while the meaning stays thin. Better pipelines, storage, indexing, dashboards, compression, sampling, visualization — and still no good way to say what the system thought was happening.

The fourth is AI used as a mop. Having built a telemetry environment too large for humans to read, the industry adds AI to summarize it. That can help in places. It also adds another interpretive layer after the fact, and that layer needs context, and the context it consumes becomes another thing to observe.

The fifth is loss of proportionality. A small action throws a huge observational shadow. One agent interaction can produce prompt content, completion content, token metrics, tool traces, retrieval metadata, policy decisions, logs, code diffs, evaluation outputs, and summaries. The observation grows larger than the act.

That last symptom is the one to sit with.

A healthy observability system stays proportionate to the system it observes. It makes that system more understandable rather than spawning a second system just as hard to understand. Modern observability increasingly becomes a shadow system with its own costs, failures, complexity, access controls, retention rules, pipelines, and specialists. We built systems that require observability, then observability systems that require observability.

That is the model eating itself.

A difference That Makes A Difference

The sustainable path starts from a different default. Record what is justified, meaningful, and governed, rather than recording everything until someone intervenes. Systems need to emit more local signs. Instead of only “request started, tool called, completion received, span ended,” we want: task accepted, context judged sufficient, authority boundary checked, tool result contradicted plan, commitment made, commitment at risk, output failed contract, escalation required, memory write rejected, evidence weak, user intent unresolved, generated change touched a protected boundary. These signs carry meaning raw traces lack. They make the system more sighted and lighten the load on the external observer, because the participant is already drawing distinctions while the work happens.

This is not a call to abandon observability. It is a call to move past its current industrial form. The next step is not another backend, another dashboard, another “signal type.” Adding prompts to “logs, metrics, traces, profiles” only extends the old taxonomy. The next step is a different contract between a component and the world around it.

What do you promise?
What do you expect?
What do you depend on?
What boundary are you crossing?
What distinction have you drawn?
What departure have you detected?
What consequence follows?
What evidence justifies retention?

That is a richer model of observability, because it brings meaning close to action.

The industry has to stop treating more telemetry as more understanding. More telemetry helps when the missing thing is evidence. It hurts when the missing thing is judgment. Right now the missing thing is judgment.

When code originates from agents, our confidence in its authorship diminishes. We require more than just the code’s execution history; we need to understand the intent that shaped it, the assumptions that guided it, the boundaries it crossed, and the commitments it impacted. Multi-agent systems amplify this complexity. One agent delegates tasks to another, which in turn calls upon tools, modifies its state, informs another model, and ultimately responds to a user.

The trace meticulously preserves this chain of events. However, it’s important to recognize that the chain itself is not the situation. The situation encompasses trust, authority, evidence, role, expectation, and consequence. While a trace can accurately capture the sequence of events, it lacks the ability to bestow a system with a conscience.

That is the actual problem.

We are granting systems more autonomy without giving them enough ways to express the judgments autonomy requires. We make systems act more freely and watch them with tools designed for machinery that had no freedom at all. The gap between action and understanding widens, and Telemetry Inflation fills the void when we lack a better solution. It involves data, exhaust, transcripts, traces, dashboards, and AI summaries of those dashboards.

The path is unsustainable because it rests on a false hope: that enough after-the-fact evidence compensates for the absence of in-the-moment meaning. A system that cannot say what matters while it acts will always need someone else to reconstruct meaning later, and at agentic scale that reconstruction stops being feasible by brute force.

The future of observability is situated selection. Local signs. Systems that know enough about their own commitments to report when those commitments are strained, broken, contradicted, or at risk.

The industry provided teams with tools to observe distributed systems, standardized instrumentation, enhanced portability, and easier diagnosis of failures. However, it also created a restrictive framework that views the world as traces, metrics, logs, profiles, pipelines, dashboards, and alerts. This framework is inadequate for modern systems, as agents, generated code, conversational context, situational awareness, and even meaning are beyond its reach.

The industry can keep expanding the cage with more fields, conventions, storage, backends, summaries, retention rules, cost controls. Or it can admit the cage is the wrong shape. Telemetry Inflation is the warning. It tells us we have mistaken accumulation for awareness, that the cost of observing is outrunning the value of what is observed, that our systems produce more evidence than they can explain, and that the human is being buried under the very data meant to help them see. The way out is better sight. More discrimination. Evidence with purpose. More signs.

One question is enough to test the path: are we helping systems become more understandable, or only making their confusion more searchable? If it is the latter, the current path deserves to fail. Observability cannot survive the agentic era as a doctrine of record everything. It has to become a discipline of meaning, selection, and situated judgment.

The future is not more telemetry. The future is systems that can say what matters.