
How Tooling Consumed and Corrupted a Concept
Walk into any observability or site reliability engineering conference today, and you’ll find sessions on OpenTelemetry instrumentation patterns, OTLP optimization techniques, and “creative” ways to model domain problems as spans and metrics. What you won’t find is a serious discussion about what observability means, what it’s meant to achieve, or how it relates to the fundamental human challenge of understanding complex systems under conditions of uncertainty.
OpenTelemetry has taken all the oxygen out of the room.
This isn’t merely a complaint about tool dominance. It’s a diagnosis of an epistemological collapse. When an entire engineering discipline conflates the instrument of observation with the act of observing, and worse, with the purpose of observation, it has ceased to think critically about its own foundations. The consequence is an industry trapped within a conceptual ceiling that it doesn’t recognize as such.
The Conflation Problem
The pattern is now so normalized that few notice it. Engineers speak of “doing observability” when they mean “collecting telemetry.” Conference talks present OpenTelemetry adoption as a key to achieving observability maturity. The unstated assumption permeating the discourse is this: if we standardize telemetry export formats and emit enough structured data, understanding will emerge. This is cargo cult engineering at an industrial scale.
Consider what’s actually happening. A team instruments their service with OpenTelemetry. They export traces, metrics, and logs to a backend. They build dashboards. They set up alerts. They declare themselves observable. But observable with respect to what? Observable for what purpose? These questions rarely surface because the tooling has become both the means and the end.
The original concept from control theory carries a specific meaning: a system is observable if its internal state can be reconstructed from its external outputs. This is inherently purposive—you need to infer state because you intend to maintain control, to achieve homeostasis, to respond to perturbations. Observability isn't data collection. It's a property that enables effective action under uncertainty. What the industry practices is something else entirely. It's instrumentation without intention. It's measurement without meaning-making. It's the accumulation of data without the interpretation that would elevate it to signs carrying significance.
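For readers who want the formal grounding, here is a minimal sketch of the classical rank test for the linear time-invariant case, written with numpy purely for illustration. The point it makes is structural: observability is a property of the system and its outputs, and collecting more of an insufficient output changes nothing.
```python
import numpy as np

# Control-theory observability check (linear time-invariant case):
# a system x' = Ax, y = Cx is observable iff the observability matrix
# O = [C; CA; CA^2; ...; CA^(n-1)] has full rank n.
def is_observable(A: np.ndarray, C: np.ndarray) -> bool:
    n = A.shape[0]
    blocks = [C @ np.linalg.matrix_power(A, k) for k in range(n)]
    return np.linalg.matrix_rank(np.vstack(blocks)) == n

# Toy example: two internal states, but the output only ever exposes their sum,
# so the individual states can never be reconstructed from outputs alone,
# no matter how much of that output is exported.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
C = np.array([[1.0, 1.0]])
print(is_observable(A, C))  # False: the telemetry is insufficient by construction
```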
The Situational Awareness Vacuum
Perhaps the most telling absence in contemporary observability discourse is any serious engagement with situational awareness—the very thing engineers presumably seek when they instrument their systems. Mica Endsley’s foundational model describes three levels: perception of elements in the environment, comprehension of the current situation, and projection of future states. This maps directly onto what engineers believe they’re doing, yet the tooling-centric approach addresses only the first level, and even that incompletely.
Perception requires not just data but relevant data filtered and organized within a frame of reference. OpenTelemetry gives you spans and metrics. It doesn’t give you relevance. It doesn’t give you context. It certainly doesn’t give you the interpretive framework that transforms raw observations into situational understanding.
Comprehension, understanding what the data means, requires moving from syntax to semantics. A trace showing high latency on a database call is syntactic information. Understanding that this indicates connection pool exhaustion due to a downstream service’s retry storm is a semantic interpretation. The former is what tools capture. The latter is what engineers need but must construct entirely in their heads, with no assistance from the observability stack itself.
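A minimal sketch makes the gap concrete. Assuming the standard opentelemetry Python API and a hypothetical checkout service, the instrumentation below records only syntactic facts; the semantic reading lives in a comment, because there is nowhere else for it to go.
```python
from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

def run_query(order_id: str) -> dict:
    # Placeholder for the real database call.
    return {"order_id": order_id}

def load_order(order_id: str) -> dict:
    # Syntactic layer: the span captures duration, status, and attributes.
    # (Attribute names loosely follow OpenTelemetry's database semantic conventions.)
    with tracer.start_as_current_span("db.query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.operation", "SELECT")
        span.set_attribute("app.order_id", order_id)
        return run_query(order_id)

# Semantic layer: nothing emitted above can say "the connection pool is exhausted
# because a downstream retry storm is holding connections open." That reading comes
# from a causal model the engineer carries in their head, not from the telemetry.
```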
Projection, anticipating what will happen next, requires causal models and an understanding of system dynamics. This is where the current paradigm fails most completely. Traces are archaeological artifacts of past behavior. They tell you what happened, not what it means for what's coming. The mental leap from historical telemetry to future-state prediction happens entirely outside the tooling, in the cognitive processes of experienced engineers who've internalized models that no dashboard captures.
The result is an industry that has optimized for data export while ignoring the cognitive processes that actually constitute situational awareness. We’ve built elaborate pipelines to move bytes while neglecting the far harder problem of enabling meaning.
OpenTelemetry as Conceptual Ceiling
OpenTelemetry’s success is also its trap. By providing a standard way to instrument applications and export telemetry, it legitimized a particular way of thinking about observability, one centered on the three pillars of traces, metrics, and logs. This framework, while useful for certain purposes, has become the conceptual vocabulary within which all observability problems must be expressed.
This creates Maslow’s hammer at the civilizational scale. Can’t model your concern as a trace? Then it’s not a valid observability concern. Can’t express your insight as a metric? Then it’s not actionable. The tool defines not just the solution space but the problem space itself.
Consider phenomena that matter deeply for system understanding but fit awkwardly into the three-pillars model: emergent behaviors arising from component interactions, gradual state drift that manifests as subtle distribution shifts, coordination failures that leave no single clear trace, or the loss of semantic coherence when subsystems evolve independently. These are observability concerns in any meaningful sense—they affect our ability to understand and control systems—yet they resist expression in OpenTelemetry's vocabulary.
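To make one of these concrete, here is a small sketch of gradual drift that a point metric hides, using numpy and scipy purely as conveniences. The p50 gauge barely moves while the latency distribution grows a second, slow mode.
```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Baseline latency sample (ms) and a later window where 10% of requests have
# drifted into a slow mode. A p50 gauge sees almost nothing; the distribution
# as a whole has clearly shifted.
baseline = rng.normal(loc=100.0, scale=10.0, size=5000)
drifted = np.concatenate([
    rng.normal(loc=98.0, scale=10.0, size=4500),   # bulk of traffic, unchanged
    rng.normal(loc=220.0, scale=30.0, size=500),   # emerging slow mode
])

print(round(np.median(baseline), 1), round(np.median(drifted), 1))  # p50 barely moves
statistic, p_value = ks_2samp(baseline, drifted)
print(p_value < 0.01)  # True: the two windows are drawn from different distributions
```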
The response from the community is telling: extend OpenTelemetry to handle these cases. Add semantic conventions. The tool must grow to encompass every concern because we’ve forgotten how to think outside it. This is how a conceptual ceiling operates, not by explicitly prohibiting alternative approaches but by making them cognitively invisible.
Meanwhile, the conferences reinforce the ceiling. Vendors sponsor talks that showcase their products’ clever uses of OpenTelemetry. Engineers present war stories framed as “how we bent OTLP to solve X.” The implicit message is consistent: this is the vocabulary, work within it. Critical examination of whether the vocabulary is adequate for the problems we face becomes professionally risky, even somewhat heretical.
The Theory Deficit
Beneath the tooling fixation lies a deeper problem: the observability discipline lacks theoretical grounding. Engineers adopt practices without frameworks for understanding why those practices should work or what assumptions they encode. This isn’t unique to observability; much of software engineering operates on folklore and best practices rather than theory, but the consequences are particularly severe here because observability sits at the intersection of complex systems, human cognition, and decision-making under uncertainty.
Without theory, we can’t distinguish between essential and accidental complexity in our approaches. Is the three-pillars model fundamental to observability, or is it an artifact of what was tractable to implement given mid-2010s infrastructure constraints? Without theory, we can’t evaluate alternatives. Is there a better way to represent system behavior than traces? How would we even know?
Control theory, from which observability borrows its name, provides such a framework. So does cybernetics, with its emphasis on feedback, regulation, and purposive behavior. Semiotics offers vocabulary for discussing how raw data becomes meaningful signs within interpretive contexts. Cognitive systems engineering addresses how humans develop and maintain awareness of complex systems. These theoretical traditions could inform observability practice, but they remain almost absent from the discourse.
The absence is self-perpetuating. Without theory, we can’t articulate what’s wrong with current approaches beyond vague dissatisfaction. Without articulation, we can’t propose alternatives. Without alternatives, tooling incumbents face no conceptual competition. The ecosystem ossifies around whatever gained early traction, regardless of whether it’s actually adequate for the challenges we face.
The Vendor Economics of Stagnation
We should be honest about what sustains this state of affairs: there’s no product revenue in conceptual advancement. Vendors profit from tool adoption, not from engineers developing better mental models of their systems. The conference circuit is largely vendor-funded, and the talks that get selected are those that drive adoption of sellable things—observability platforms, APM solutions, log aggregators.
A talk titled “Rethinking Observability Beyond the Three Pillars” is professionally risky for the presenter and commercially uninteresting to sponsors. A talk titled “Advanced Distributed Tracing Patterns with OpenTelemetry” is safe, on-brand, and reinforces the conceptual status quo that makes vendor products relevant.
This creates a knowledge production system optimized for incremental refinement of existing approaches rather than fundamental reconsideration. The smart people in the field, and there are many, direct their intelligence toward problems like "how do we handle trace context propagation in complex async scenarios?" rather than "is trace-centric thinking adequate for understanding system behavior at all?"
The oxygen has been consumed not by malice but by economic selection pressure. Ideas that reinforce tool adoption survive and replicate. Ideas that question foundational assumptions struggle for airtime.
What Observability Could Be
If we could step outside the OpenTelemetry paradigm, even momentarily, what might we see? Observability could be understood as the discipline of enabling systems, including the humans operating them, to maintain adequate awareness of themselves. This reframing immediately opens different questions.
Rather than asking “what telemetry should we collect?” we might ask “what does this system need to know about itself to function effectively?” Rather than optimizing for data export throughput, we might focus on meaning-making: how do raw observations become actionable understanding? Rather than treating observability as instrumentation applied post-hoc, we might consider it an intrinsic system property designed in from the start.
This perspective suggests that the unit of observability shouldn’t be the trace or the metric but the sign, an observation that carries significance within a context. Signs emerge from interpretation, not from instrumentation. They exist at the intersection of what’s measured and what matters.
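As a purely hypothetical illustration (not any existing library's API), a sign would have to carry its interpretive context and its orientation toward action alongside the observation itself:
```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical sketch of a "sign": an observation bound to the context that makes
# it significant and to the response it implies. None of these field names come
# from an existing tool; they only illustrate what a raw span or metric lacks.
class Significance(Enum):
    EXPECTED = "expected"
    DEGRADED = "degraded"
    CRITICAL = "critical"

@dataclass(frozen=True)
class Sign:
    subject: str            # what part of the system the sign is about
    observation: str        # the underlying measurement or event
    context: str            # the frame that makes the observation meaningful
    significance: Significance
    implied_response: str   # what the sign calls for, if anything

checkout_sign = Sign(
    subject="checkout.payment",
    observation="p99 latency 4.2s (baseline 40ms)",
    context="connection pool saturated during downstream retry storm",
    significance=Significance.DEGRADED,
    implied_response="shed retries upstream; widen the pool only if the storm persists",
)
```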
It also suggests that observability is fundamentally about enabling appropriate response, not about accumulating data. A system that generates petabytes of telemetry but can’t answer the question “what should I do right now?” has failed at observability regardless of how comprehensive its instrumentation is.
The AIOps Mirage
Into this conceptual vacuum steps artificial intelligence, promising to solve observability’s problems through machine learning and natural language interfaces. The pitch is seductive: let AI analyze your telemetry, detect anomalies, correlate signals, and explain what’s happening in plain English. Finally, the thinking goes, we’ll have the understanding we’ve been missing.
This may be the most dangerous development yet, because it provides the illusion of progress while cementing inadequate foundations.
Consider what AI-powered observability actually does. It takes telemetry data structured according to existing paradigms and applies pattern recognition to surface correlations. It then generates natural language explanations of what it finds. The fundamental problem is immediately apparent: AI can’t extract meaning that isn’t present in the data. It can only reformulate what’s already there.
When an AIOps system says, “I’ve detected an anomaly in your service mesh that correlates with increased error rates in the payment service,” it hasn’t achieved understanding. It has performed statistical correlation on events and presented the result in prose. The interpretation (what this means, whether it matters, what should be done) still happens entirely in the human mind. The AI has translated from one syntactic representation (graphs and charts) to another (English sentences). Semantics remain unaddressed.
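A deliberately small caricature, assuming nothing beyond numpy, shows how thin that translation is: the "explanation" below is a correlation coefficient wrapped in prose, with no causal model, intention, or relevance added anywhere along the way.
```python
import numpy as np

rng = np.random.default_rng(1)
load = rng.normal(size=500).cumsum()                    # some service's load signal
errors = load * 0.8 + rng.normal(scale=2.0, size=500)   # a correlated error signal

# "Analysis": compute a correlation, then verbalize it.
r = np.corrcoef(load, errors)[0, 1]
explanation = (
    f"Detected a strong correlation (r={r:.2f}) between service-mesh load "
    f"and payment-service errors over the last window."
)
print(explanation)  # reads like insight; it is the same correlation, verbalized
```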
This is the crucial distinction the industry refuses to confront: there’s a vast gulf between information presentation and meaning-making. AI excels at the former. It can summarize, correlate, and verbalize. But meaning, the significance of observations within a purposive context, requires interpretive frameworks that don’t exist in the telemetry itself. No amount of machine learning can conjure models of causality, system intention, or operational relevance from data that encodes none of these things.
Worse, the AI overlay actively discourages fixing foundational problems. If an LLM can generate plausible-sounding explanations of system behavior from inadequate telemetry, why invest in better observability foundations? The prose sounds intelligent. The correlations seem insightful. The fact that it’s all syntactic manipulation of semantically impoverished data gets obscured by the fluency of the output.
We’re witnessing a doubling down on failure. Instead of questioning whether trace-centric telemetry captures what matters about system behavior, we train models to find patterns in traces. Instead of asking whether the three pillars adequately represent the system state, we build AI that cross-correlates pillars more efficiently. The energy goes into making inadequate representations more palatable rather than developing adequate representations in the first place.
The economics reinforce this. AI features are differentiators. Natural language interfaces are marketable. “Ask questions of your data in plain English” sells better than “we’ve fundamentally reconceived how system behavior should be represented.” So vendors compete on AI capabilities applied to the same foundational telemetry, and the conceptual ceiling lowers further.
The tragedy is that AI could be genuinely transformative if applied to richer representations. If systems emitted not just events but signs, observations carrying semantic weight within interpretive contexts, then AI could reason about meaning, not just correlate syntax. If telemetry encoded models of system intention and behavior, AI could identify meaningful deviations from expected dynamics. But this requires the foundational work that the industry is using AI to avoid.
Instead, we get sophisticated prose generation about impoverished data. The AI explains, in increasingly fluent language, patterns in information that never captured what mattered. We waste enormous computational and cognitive energy regurgitating the same inadequate representations in different forms, mistaking verbosity for insight and correlation for comprehension.
This is what makes the current moment so precarious. AI provides just enough apparent progress to forestall genuine reconsideration. Teams can boast about their AI-powered observability platform, but the core issues—lack of semantic foundations, conflation of measurement with understanding, and neglect of situational awareness as a cognitive phenomenon—remain unaddressed and increasingly invisible.
The oxygen crisis deepens. Now we’re not just trapped in OpenTelemetry’s conceptual ceiling—we’re using AI to make the ceiling feel spacious.
Where Do We Go From Here?
Is there a way out of this? The honest answer is: probably not, at least not one that will gain traction within current structures. OpenTelemetry has achieved the network effects that make alternatives nearly impossible to establish. The economic incentives point toward tool adoption, not conceptual revolution. The conference circuits will continue optimizing for what sponsors want to hear.
Perhaps the most realistic outcome is that the current paradigm will persist until its inadequacies become undeniable. When agentic systems, with their complex autonomous behaviors and novel failure modes, prove intractable to trace-centric analysis, the limitations will become visible. When distributed AI systems exhibit emergent behaviors that leave no coherent trace, the three-pillar model will reveal its insufficiency. When the cost of not understanding becomes high enough, the conceptual constraints might finally be questioned.
Until then, those who recognize the problem face a choice. They can work within the paradigm, accepting its limitations while trying to solve real problems despite inadequate conceptual tools. Or they can work at the margins, developing alternative frameworks that may or may not find adoption, but at least preserve the possibility of different thinking.
What they can’t do is pretend the problem doesn’t exist. The oxygen crisis is real. OpenTelemetry has become the conceptual ceiling of observability, and most practitioners don’t realize they’re operating in an enclosed space. The first step toward solving any problem is recognizing its existence.
The second step, finding alternatives that can actually gain traction, remains unclear. Humainary’s Substrates is one way forward, but I am no longer sure whether its combination of simplicity and sophistication can be appreciated by an engineering community that still carries a sysadmin mindset. Perhaps naming the crisis is enough for now. Perhaps acknowledging that contemporary observability has conflated tooling with understanding, standardization with progress, and data collection with situational awareness is a necessary precursor to something better. The alternative is to keep renaming TraceCon and pretending it’s about observability. That path leads nowhere interesting.
