Why Observability Needed a Systems Engineer

Imagine embarking on an ambitious project to construct a spacecraft capable of traversing the vast distances between planets. You assemble a team of experts, secure the necessary funding, and then face a pivotal decision: who should lead the design of the navigation system? The choice comes down to a technician with exceptional skill at installing and calibrating measurement instruments, or an aerospace engineer with a deep understanding of orbital mechanics and complex flight dynamics. You choose the technician.

The resulting spacecraft boasts flawlessly calibrated sensors, perfectly standardized data formats, and comprehensive measurement protocols. This instrumentation enables remarkable precision in measuring thrust, velocity, temperature, and radiation. Yet despite these advanced capabilities, the spacecraft cannot plot an efficient course through space. It collects perfect readings but lacks the intelligence to translate them into meaningful navigation decisions. This is the fundamental issue at the core of OpenTelemetry and modern observability.

How We Got Here

To understand how we arrived at this impasse, we need to examine the historical DNA of observability tooling. The culprit? The deeply ingrained habits of system administrators who spent decades living inside log files. For the traditional sysadmin, logging was the universal answer. Logs were the diagnostic lifeblood.

Everything left a trace, and everything was written to a file. When something went wrong, the admin’s sacred ritual was to grep, tail, and awk their way to the root cause. It was tangible, familiar, and deceptively simple—one line per event, all context compressed into a single entry. This logging-first mentality became the subconscious blueprint for how observability evolved. When systems grew more complex, instead of developing new paradigms, we extended old ones.

  • More logs equals better visibility. Wrong—more logs lead to more noise.
  • All data should fit in one log line. Wrong—data belongs in context, not static snapshots.
  • One event should capture the full picture. Wrong—meaning arises from sequences, not single entries.

The result? Our modern observability tools still rely on “fat” wide events—giant, bloated records where every conceivable detail is crammed into a single entry. Not because this approach makes sense for distributed systems, but because sysadmins wanted telemetry they could read like familiar Unix log files.
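To make this concrete, here is a minimal sketch of such a wide event, written as a Python literal for readability. The field names are hypothetical, but the shape (dozens of attributes frozen at a single instant) is typical:

    # A hypothetical "wide event": every conceivable detail crammed into
    # one record. Easy to read like a classic log line, but it carries no
    # notion of sequence, causality, or relationship to other events.
    wide_event = {
        "timestamp": "2024-03-01T12:00:00Z",
        "level": "ERROR",
        "service": "checkout",
        "message": "payment failed",
        "http.method": "POST",
        "http.status_code": 502,
        "db.pool.active": 47,
        "cache.hit_ratio": 0.91,
        "retry.count": 3,
        "user.tier": "premium",
        # ...and dozens more attributes, all frozen at a single instant
    }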

Throughout this period, the community has introduced various labels for logging, including “standard”, “structured”, and “semantic”. Tracing and metrics inherited the same model: the tools generate event-based data, which is essentially another form of logging. Consequently, traces and metrics are frequently presented as lists of events or lists of values over time rather than as higher-order forms.
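Strip a trace span down to its essentials and the kinship is plain: it is a structured event with identifiers and two timestamps. A minimal sketch (the field names loosely follow the common span data model; this is an illustration, not any particular SDK's output):

    # A trace span reduced to its essentials: identifiers, a start time,
    # an end time, and a bag of attributes. Structurally, it is another
    # log record.
    span = {
        "trace_id": "abc123",
        "span_id": "def456",
        "parent_span_id": None,  # None marks a root span
        "name": "GET /checkout",
        "start_time": 1700000000.000,
        "end_time": 1700000000.250,
        "attributes": {"http.status_code": 200},
    }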

The Temple of Echoes

Deep within a forgotten temple, long buried by time, scholars of an ancient civilization inscribed messages onto stone tablets. These inscriptions took different forms—some used pictograms, others developed structured hieroglyphs, and a few introduced color-coded etchings to clarify records. Over generations, the scribes perfected their craft, chiseling ever more intricate symbols into the stone, convinced that better inscriptions would unlock deeper truths. Yet, all they had truly done was refine the medium of recording, not the understanding of the echoes within the temple. Hidden chambers remained unexplored, mechanisms lay dormant, and the temple’s true purpose—deciphering the patterns of sound that resonated through its walls—was overlooked. The scholars mistook better inscriptions for better wisdom, unaware that the temple itself was designed to reveal meaning through the way sound moved through its structure.

Much like these scribes, modern observability has focused on refining how we log events—polishing the stone tablets—rather than unlocking how systems truly resonate over time. The real challenge isn’t in perfecting the format of the inscriptions but in understanding the patterns of echoes that reveal the underlying system’s nature.

Instrumentation Without Understanding

We’ve built complicated systems for collecting, standardizing, and transmitting telemetry, all meant to illuminate the living, breathing complexity beneath our digital landscape. But somewhere along the way, we conflated the means with the end. We’ve allowed data collection to become synonymous with understanding.

The OpenTelemetry ecosystem seeks to answer questions like:

  • How do we collect log-like messages across heterogeneous systems?
  • How do we configure data pipelines?
  • How do we standardize formats so tools can interoperate?

These are administrator questions—focused on operational consistency, coverage, and connection. They matter, but they’re merely prerequisites for the deeper questions we need to answer:

  • How do sequences of signals reveal emergent patterns over time?
  • What feedback loops are driving system stability or instability?
  • Which higher-order abstractions can help us predict behavior rather than merely respond to it?

These are systems engineering questions—focused on comprehension, causality, and control.

The Astronomer’s Lost Orrery

A forgotten celestial orrery, a complex mechanical model of the universe, lay buried within a ruined temple. Its intricate design, the product of ancient scholarship, once meticulously charted celestial movements, revealing the cosmos’ inherent order. Though entombed for centuries and laden with dust, its mechanisms retained their capacity to unlock the universe’s secrets. Modern digital archaeologists have painstakingly restored this ancient device, re-aligning its movements with the stars. However, its function extended beyond mere astronomical mapping: it aimed to decipher underlying forces, predict patterns, and unveil the hidden structure within the seemingly chaotic heavens. Similarly, contemporary telemetry systems endeavor to illuminate the dynamism and complexity of our digital realm, transforming raw data into actionable knowledge and revealing the inherent meaning within seemingly random digital activity.

From Lines to Sequences

Logs are like frozen snapshots of a system’s past. They show you what happened at a point in time, but they don’t tell you how things are connected. It’s like trying to understand a symphony by looking at individual notes on separate pieces of paper. You might have all the details about each note – its pitch, how long it lasts, and how loud it is – but you’d miss the big picture. You’d miss the melody, harmony, and meaning that come from playing the notes together.

In complex, orchestrated systems, meaning emerges from sequences, not isolated events. A failure isn’t just an error message—it’s a pattern of deviations, retries, compensations, and state transitions unfolding across different services and subsystems. A single log line, no matter how bloated with context, can’t capture this emergent behavior.
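As a minimal sketch of what reading sequences rather than lines looks like, consider detecting a retry storm: a burst of retries clustered in time, the signature of a failure propagating through a system. The event shape and thresholds here are illustrative assumptions, not an existing API:

    from dataclasses import dataclass

    @dataclass
    class Event:
        service: str
        kind: str         # e.g. "error", "retry", "compensation"
        timestamp: float  # seconds since epoch

    def find_retry_storms(events, window=5.0, threshold=3):
        """Report time windows containing `threshold` or more retries.
        No single event is alarming on its own; the pattern emerges
        only from the sequence."""
        retries = sorted(e.timestamp for e in events if e.kind == "retry")
        storms = []
        start = 0
        for end in range(len(retries)):
            while retries[end] - retries[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                storms.append((retries[start], retries[end]))
        return storms

A sketch like this is trivially simple, but it already answers a question that no amount of per-event context can: not “what happened at this instant” but “what unfolded across this interval”.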

OpenTelemetry, with its roots in the logging tradition, still fundamentally treats observability as a data collection problem instead of a pattern recognition challenge. It excels at gathering notes but struggles to hear the music.

Diagnostics Masquerading as Management

Consider the distinction between a car’s diagnostic system and an autonomous driving system. The former raises real-time alerts when components malfunction; the latter comprehends road dynamics, anticipates potential hazards, and navigates complex environments.

OpenTelemetry has provided us with diagnostics tools, but we’ve mistakenly positioned them as management tools. We can detect when services are failing, connections are dropping, or latencies are spiking, but we remain largely blind to the deeper system dynamics that explain why these events occur and how they interrelate.

It’s like having a doctor who can measure your vital signs with perfect precision but lacks the medical knowledge to diagnose your condition or recommend treatment. The measurements are flawless, but the understanding is missing.

The Administrator vs. Engineer Mindset

This distinction reflects a fundamental contrast in professional orientation. The sysadmin mindset focuses on:

  • Collection and standardization
  • Operational stability
  • Measurement accuracy
  • Tool integration
  • Line thinking: examining isolated events
  • Immediate response

Whereas the engineering mindset instead prioritizes:

  • System modeling and abstraction
  • Dynamic relationships and feedback loops
  • Predictive capability
  • Structural understanding
  • Pattern thinking: interpreting sequences and relationships
  • Proactive design

OpenTelemetry appears to have been designed by a sysadmin rather than a systems engineer, whose core competency is understanding complexity, systemic interactions, and meaningful abstraction.

Beyond the Three Pillars

The conventional “three pillars” of observability—metrics, logs, and traces—reflect this operational bias. They’re fundamentally decomposed, retrospective records that try to answer “What happened?” but struggle with “Why did it happen?” and “What is likely to happen next?”

A systems engineering approach would transcend these data silos, focusing instead on the following (a brief sketch in code follows the list):

  • Control systems that reveal how outputs feed back into inputs
  • State transitions that expose how systems change over time
  • Boundary conditions that indicate when systems approach critical thresholds
  • Emergence patterns that show how component interactions create system-level behaviors
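A minimal sketch of the second and third items, with illustrative states and thresholds (nothing here is a real tool’s API): instead of logging every reading, interpret the measurement stream as state transitions and flag the approach to a boundary.

    def classify(error_rate):
        """Map a raw measurement onto a coarse system state."""
        if error_rate < 0.01:
            return "healthy"
        if error_rate < 0.05:
            return "degraded"
        return "critical"

    def watch(readings, boundary=0.05, margin=0.8):
        """Report state transitions and boundary approaches: how the
        system is changing over time, not snapshots of where it is."""
        state = None
        for t, rate in readings:
            new_state = classify(rate)
            if new_state != state:
                print(f"t={t}: {state} -> {new_state}")
            elif boundary * margin <= rate < boundary:
                print(f"t={t}: approaching critical boundary ({rate:.3f})")
            state = new_state

    # Four raw readings collapse into a few transitions and one early warning.
    watch([(0, 0.002), (1, 0.030), (2, 0.045), (3, 0.060)])

The point is not the specific thresholds but the shape of the output: transitions and trajectories instead of a stream of undifferentiated readings.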

In other words, we’ve perfected the art of collecting breadcrumbs while neglecting the science of mapping forests.

Breaking Free from the Logging Mindset

We need to recognize that logs—and their bloated JSON offspring—are holding us back. Instead of treating observability as an extension of old-school system administration, we need to treat it as a discipline of meaning-making.

Let’s be clear: this isn’t about throwing away operational expertise. But there’s a fundamental difference between operating a system and understanding a system—between maintaining machinery and designing it. Both perspectives matter, but the operational view has dominated the observability field for far too long. To move past it, we need to:

  • Stop stuffing everything into single, monolithic events
  • Shift from “line thinking” to “pattern thinking”
  • Move past the idea that more data means better understanding

Countering Counterarguments

Before outlining the path forward, let’s address some likely objections to this critique:

“But OpenTelemetry is just plumbing—it’s not meant to solve the comprehension problem.”

This is precisely the problem. By framing observability as a data collection challenge, we’ve allowed the plumbing to define the architecture. When instrumentation becomes the central focus, understanding becomes secondary. The medium shapes the message. As long as we treat OpenTelemetry as “just plumbing,” we’ll continue building systems that excel at collecting data but struggle to extract meaning from it.

“We still need standardization—you can’t have understanding without data.”

Of course, standardization matters. But standardizing the wrong abstractions is worse than not standardizing at all. When we standardize around logs, metrics, and traces—snapshot-based data models—we hardwire snapshot thinking into our tools and practices. Imagine if early database designers had standardized only on flat files instead of relational models. We’d have perfectly standardized, utterly inadequate data systems.

“Engineers can build comprehension tools on top of OpenTelemetry’s collection layer.”

This layered approach sounds reasonable but ignores how collection models constrain comprehension models. When your foundation is built to collect and process discrete events and metrics, the higher-order tools inevitably inherit these limitations. It’s like trying to understand fluid dynamics using only static photographs. No matter how sophisticated your analysis tools are, the static nature of the inputs fundamentally limits what you can perceive.

“But many organizations have successfully implemented OpenTelemetry.”

Success depends on how we define it. Organizations have standardized their telemetry collection, reduced integration costs, and improved operational visibility. But have they improved their understanding of system behavior? Have they moved beyond reactive firefighting to proactive design? Have they developed deeper insights into how their systems function? Standardized instrumentation is different from improved comprehension.

The Path Forward

The next evolution must come from a fundamental shift in perspective. We need to move beyond data collection to genuine system comprehension. This doesn’t mean completely abandoning OpenTelemetry’s standardization efforts, but recognizing its foundational flaws and conceptual limitations, and exploring new approaches.

We need to supplement our instrumentation tools with true systems thinking tools that help us do the following (a toy model appears after the list):

  • Model dynamic relationships between components
  • Visualize feedback loops and their effects
  • Identify system archetypes and patterns
  • Track system trajectories, not just system states
  • Predict emergent behaviors before they manifest
  • Design controls that shape system behavior
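As an illustration of the first two items, here is a toy model of a feedback loop every operator has met: failed requests are retried, retries add load, and added load causes more failures. All coefficients are invented for the sketch:

    def simulate(base_load, capacity=100.0, retry_fraction=0.9, steps=10):
        """Discrete-time retry feedback loop. Below capacity the loop is
        inert; above it, retries amplify load well past original demand
        (and without bound if retry_fraction >= 1)."""
        retries = 0.0
        for step in range(steps):
            load = base_load + retries            # demand plus fed-back retries
            failures = max(0.0, load - capacity)  # requests beyond capacity fail
            retries = failures * retry_fraction   # failures return as retries
            print(f"step {step:2d}: load={load:7.1f} failures={failures:6.1f}")

    simulate(base_load=90)    # stable: failures never appear
    simulate(base_load=110)   # a 10% overload snowballs to nearly double the load

Even this toy exposes the systems engineering insight: the interesting quantity is not any single reading of load, but the loop gain that determines whether the system converges or runs away.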

This is the toolkit a systems engineer would have designed from the beginning—one that treats measurements not as ends in themselves but as inputs to deeper understanding.

Conclusion

This isn’t merely an academic critique; it has profound practical implications for how we build, operate, and evolve our digital systems. The current situation reveals a broader truth about how we approach complex systems: our tools reflect our thinking, and in turn they shape it. By prioritizing administrator concerns in our observability strategies, we’ve built measurement systems that hinder rather than help our ability to comprehend how our systems actually behave. This doesn’t just lead to conceptual confusion; it also results in practical inefficiency, as teams spend more time collecting and organizing data than extracting actionable insights.

True observability isn’t just about seeing what’s happening—it’s about comprehending why and anticipating what will happen next. It’s about moving from collecting footprints to understanding the journey. Logging got us here. But logging won’t take us forward. The future of observability won’t be built by system administrators who are still thinking in terms of log entries. It’ll be built by systems engineers—professionals who understand cybernetics, adaptive control, and how to extract meaningful patterns from noisy, complex interactions. That’s the difference between an administrator’s toolkit and an engineer’s vision—the evolution observability desperately needs.

Addendum: The Deeper Currents

Why have we approached observability this way in the first place? The answer reveals deeper philosophical and cultural underpinnings that have shaped our entire approach to understanding complex systems.

The Cartesian Legacy and the Limits of Reductionism

Our industry’s approach to observability reflects a deeply ingrained Cartesian mindset—the belief that observing systems is a matter of breaking them down into measurable, quantifiable parts. This reductionist worldview assumes that if we can just measure enough components with sufficient precision, we’ll understand the whole. This explains the relentless pursuit of more granular metrics, more detailed traces, and more comprehensive logs. It’s a philosophy that says that with enough data points, complexity becomes manageable.

But complex distributed systems are fundamentally emergent—their behavior arises from interactions between components in ways that can’t be predicted from examining those components in isolation. They’re more like living ecosystems than mechanical clockwork. A log entry tells you as much about a distributed system as a fallen leaf tells you about a forest. This is why even perfectly instrumented systems can still surprise us with “unknown unknowns”—behaviors that emerge from interactions our measurement models never anticipated. No amount of logging can capture what we don’t know to look for.

Experience, Intuition, and the Human Element

In our rush to quantify everything, we’ve inadvertently pushed aside one of our most valuable observability tools: human intuition based on experience. Seasoned engineers often develop an intuitive “feel” for their systems—an ability to sense when something isn’t right, even before metrics show it. This intuition isn’t mystical; it’s pattern recognition operating at a level our conscious minds can’t easily articulate. It’s the same phenomenon that allows a veteran mechanic to diagnose an engine problem from the quality of its sound.

OpenTelemetry and similar approaches emphasize standardized measurements while implicitly devaluing this subjective human expertise. The administrator mindset privileges what can be measured over what can be sensed. Yet the most effective observability often comes from combining quantitative data with qualitative human expertise. The engineer mindset recognizes that measurements inform understanding but don’t replace it.

The Culture of Complexity Avoidance

Perhaps most fundamentally, our current approaches to observability reflect a deeper cultural anxiety about complexity. The obsession with collecting every possible data point, standardizing every data record, and monitoring every component betrays an underlying fear: if something happens that we didn’t measure, we’ve failed. This fear drives us to build ever more comprehensive instrumentation rather than more insightful abstractions. It leads us to value completeness over comprehension, and coverage over clarity.

A true systems engineering approach would instead embrace complexity—not by trying to measure every aspect of it, but by developing mental models and abstractions that help us navigate it. It’d recognize that understanding complex systems isn’t about eliminating uncertainty but about developing the tools to reason about it. In this light, the administrator versus engineer distinction is more than a matter of professional training—it reflects fundamentally different relationships to complexity. The administrator seeks to tame complexity through comprehensive management; the engineer seeks to harness complexity through insightful modeling.

The path forward for observability lies not just in new tools but in a new relationship with complexity itself—one that values understanding over mere measurement, patterns over isolated events, and human insight alongside machine precision.

The Data Addiction Problem

Let’s be brutally honest: our industry is grappling with a peculiar form of addiction—an obsession with data itself. It started as a rational response to a genuine need. When systems were opaque black boxes, more data genuinely meant better visibility. Each new metric, log, or trace delivered tangible value. However, as with many addictions, what began as a solution has become a problem in itself.

Organizations now collect data compulsively, far exceeding their capacity to truly comprehend it. They measure everything because they can, not because they should. They suffer from a “fear of missing out” on potential signals—what if the one metric we neglected holds the key to understanding a production issue? The classic signs of addiction are all present:

  • Increasing Tolerance: Need for increasingly larger “doses” of data to feel secure
  • Withdrawal Anxiety: Discomfort when suggesting we collect less data
  • Ignorance of Consequences: Maintaining massive observability infrastructures even as costs balloon
  • Neglect of Alternatives: Focusing on data collection at the expense of developing better mental models

Breaking this addiction won’t be easy. As with any dependency, change usually occurs only when the pain of the current state surpasses the pain of making a change. We haven’t yet reached the crisis point where the costs and limitations of our data-hoarding approach become intolerable enough to force a paradigm shift. Most organizations still believe that if they just collect enough data, if they just perfect their dashboards, if they just fine-tune their alerts, they’ll finally achieve the observability nirvana they’ve been promised. They’re not yet ready to accept that the fundamental approach might be flawed. The new relationship with complexity described above will not arrive on its own: the transformation must overcome the stubborn resistance of an industry that has yet to acknowledge its data dependency.