Unfortunately, many of the solutions promoted in the Observability space, such as distributed tracing, metrics, and logging, have not offered a suitable mental model in any form whatsoever. Situation awareness is still sorely lacking in most teams, who appear permanently stalled at ground zero and overly preoccupied with data and details.
Author: William David Louth
Observability is Yesteryear’s Monitoring
Looking back over 20 years of building application performance monitoring and management tooling, little has changed, except that today's tooling collects more data from far more sources. Effectiveness and efficiency have not improved; it could be argued that both have regressed.
The Solution is not Distributed Tracing
Science and technology have made it possible to observe the motion of atoms, but humans don’t watch atomic movements as they navigate physical spaces. Our perception, attention, and cognition have evolved into a model that serves us effectively in most situations. Distributed tracing spans, and the data items attached to them, are the atoms of observability.
Observability – The Two Hemispheres
Two distinct hemispheres seem to be forming within the application monitoring and observability space: one dominated by measurement, data collection, and decomposition, the other by meaning, system dynamics, and (re)construction of the whole.
Scaling Observability for IT Ops
The underlying observability model is the primary reason distributed tracing, metrics, and event logging fail to deliver much-needed capabilities and benefits to systems engineering teams. There is no natural or inherent way to transform and scale the collection and analysis of such observability data to generate signals and infer states.
Humanizing Observability and Controllability
Humanism is a philosophical stance at the heart of what Humainary aims to bring to service management operations. It runs counter to the misguided trend of wanton, wasteful, and extensive data collection so heavily touted by those focused on selling a service rather than solving a problem, both now and in the future.
Simplicity and Significance in Observability
As computing and complexity scaled up, the models and methods should have reduced and simplified the communication and control surface area between man and machine. Instead, monitoring (passive) and management (reactive) solutions have lazily mirrored that complexity at a level devoid of simplicity and significance and polluted with noise.
Observability – A Multitude of Memories
There are at least two distinct paths to the future of observability. One would keep increasing the volume of collected data in an attempt to reconstruct reality in high definition on a single plane, with little consideration for effectiveness or efficiency. Another would focus on seeing the big picture in near-real-time from the perspective of human or artificial agents.
AIOps – A Postmodern Observability Model
We propose a model that can better serve site reliability engineering and service operations by serving as a foundation for situational awareness capabilities and system resilience capacities, particularly adaptability and experimentation, as in dynamic configuration and chaos engineering.
AIOps – The Double Cone Model
The Double Cone Model is a valuable conceptualization for thinking about more efficient and effective ways to handle data overload and to generate far more actionable insight, using a model much closer to how the human mind reasons about physical and social spaces.