This article was originally posted in 2020 on the OpenSignals website, which is now defunct.
Failing in Observability
Numerous Observability initiatives (sometimes referred to as Visibility in the business domain) fail to meet expectations because engineers naively expect that, once data is being collected, all that remains is to put up a dashboard and then sit back and stare blankly at large monitoring screens, hoping for a signal to emerge magically from the rendered pixels. This is particularly so when users mindlessly adopt projects such as Grafana and Prometheus, where data and charts have replaced or circumvented genuine understanding through patterns, structures, and models.
This anti-pattern seems to repeat consistently in organizations with insufficient expertise and experience in systems dynamics, situational awareness, and resilience engineering. Once the first data-laden dashboard is rolled out to management for prominent display within an office, it seems like the work is all but done, other than to keep creating hundreds more derivatives of the same ineffective effort.
Once again, the system itself, its dynamics, and the situations arising within it are given little regard.
Many project teams fail because they believe they can leap from data to dashboard in one jump.
Fear of Unknowns
This is not helped by the many niche vendors talking up unknown unknowns and deep systems, which is akin to giving someone standing on the tip of an iceberg a shovel and asking them to dig away at the surface. There is nothing profound or fulfilling in this, only the discovery of detail after detail without ever seeing the big picture of the system moving and changing below the surface of visibility. That is what comes from event capture that is not guided by knowledge or wisdom.
The industry has gone from being dominated by blame to being dominated by fear, which shuts off all consideration of effectiveness.
Data != Information
Many of the continued failings in the Observability industry center on the customary referencing, and somewhat confused understanding, of the Knowledge Hierarchy, also known as DIKW (data, information, knowledge, wisdom). Many next-generation application performance monitoring product pitches or roadmaps roll out a pyramid graphic, explaining how they will first collect all this data, lots of it from numerous sources, and then whittle it down to knowledge throughout the company’s remaining evolution and product development.
What invariably happens is that the engineering teams get swamped by maintenance efforts around data and pipelines and the never-ceasing battle to keep instrumentation kits and extensions up-to-date with changes in platforms, frameworks, and libraries.
Unfailingly, the team slaps on a dashboard and advanced query capabilities, a declaration of defeat that delegates the effort to users. Naturally, this defeat is marketed as a win for users.
This sad state of affairs comes about because the hierarchy is seen as a one-way ladder of understanding. From data, the information will emerge; from information, the knowledge will emerge, and so on. All too often, instead of aiming for vision, teams go straight from data to visualizations.
The confusion lies in treating this as a bottom-up approach, when in fact the layers above steer, condition, and constrain the layers below through a continuous process of adaptation and transformation. Each layer frames the operational context of the layers below it, directly and indirectly.
A vision for an intelligent solution comes from values and beliefs; this then contextualizes wisdom and, in turn, defines the goals that frame knowledge exploration and acquisition processing.
One or more mental models are chosen for knowledge to emerge from information – a selection aligned to the overarching goals.
It is here that we firmly believe we have lost our way as an engineering profession. Our models, if we can call them that, are too far removed from purpose, goal, and context. We have confused a data storage model of trace trees, metrics, log records, and events with a model of understanding.
In the context of Observability, an example of a goal in deriving wisdom would be to obtain intelligent near-real-time situation awareness over a large, connected, complex, and continually changing landscape of distributed services.
Here, understanding via a situation model must be compatible with and conducive to cooperative work performed by both machines and humans. Ask any vendor to demonstrate a situation’s representation, and all you will get is a dashboard with various jagged lines automatically scrolling. Nowhere to be found are signals and states, essential components of a past, present, and unfolding situation.
Downward Shaping of Sensemaking
Without a model acting as a lens and filter, augmenting our senses and reasoning and defining importance (the utility and relevance of information in context), there is never knowledge. Likewise, information never exists without rules, shaped by knowledge, for extracting, collecting, and categorizing data.
Data and information are not surrogates for a model. Likewise, a model is not a dashboard built lazily and naively on top of a lake of data and information. A dashboard, and the many metrics, traces, and logs that come with it, do not constitute a situation.
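As a rough illustration of a model acting as a lens and filter, the sketch below lets a small set of explicit rules decide which raw records matter and what they mean. Everything in it is hypothetical and chosen only for illustration: the Signal names, the RULES table, the record fields, and the thresholds are assumptions, not taken from any product or from the original article.

```python
from enum import Enum
from typing import Callable, Optional


class Signal(Enum):
    """Hypothetical vocabulary of meaning defined by the model."""
    SUCCEED = "succeed"
    FAIL = "fail"
    SLOW = "slow"


# Hypothetical rules: each one encodes knowledge about what matters.
# Rules are checked in order; the first match decides the signal.
RULES: list[tuple[Callable[[dict], bool], Signal]] = [
    (lambda r: r.get("status", 0) >= 500, Signal.FAIL),
    (lambda r: r.get("latency_ms", 0) > 250, Signal.SLOW),
    (lambda r: "status" in r, Signal.SUCCEED),
]


def interpret(record: dict) -> Optional[Signal]:
    """Turn a raw record into a signal, or None if the model ignores it."""
    for matches, signal in RULES:
        if matches(record):
            return signal
    return None  # outside the model, so it never becomes information


records = [
    {"status": 200, "latency_ms": 40},
    {"status": 503, "latency_ms": 12},
    {"status": 200, "latency_ms": 900},
    {"heap_bytes": 123456},  # collected, but meaningless to this model
]
signals = [interpret(r) for r in records]
```

The direction of dependency is what matters here: the rules exist before any data arrives, and a record that no rule recognizes stays data.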
A situation is formed and shaped by changing signals and states of structures and processes within an environment of nested contexts (observation points of assessment) – past, present, and predicted.
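To make this definition a little more concrete, here is a minimal sketch of a situation as the states of nested contexts, each state inferred from recent signals. The Context class, the three State values, the window size, and the failure-ratio thresholds are all assumptions made for this example. Prediction is left out entirely.

```python
from collections import deque
from enum import IntEnum


class State(IntEnum):
    """Hypothetical states, ordered from best to worst."""
    OK = 0
    DEGRADED = 1
    DOWN = 2


class Context:
    """A hypothetical observation point that may contain nested contexts."""

    def __init__(self, name, window=5):
        self.name = name
        self.children = []
        self.recent = deque(maxlen=window)  # only the latest signals count

    def add(self, child):
        self.children.append(child)
        return child

    def emit(self, signal):
        self.recent.append(signal)

    def own_state(self):
        # Infer a state from the share of "fail" signals in the window.
        if not self.recent:
            return State.OK
        ratio = sum(s == "fail" for s in self.recent) / len(self.recent)
        if ratio >= 0.8:
            return State.DOWN
        if ratio >= 0.4:
            return State.DEGRADED
        return State.OK

    def state(self):
        # A context is at least as bad as its worst nested context.
        return max([self.own_state()] + [c.state() for c in self.children])

    def situation(self):
        # Snapshot of states across this context and everything nested in it.
        result = {self.name: self.state()}
        for child in self.children:
            result.update(child.situation())
        return result


region = Context("region")
checkout = region.add(Context("checkout"))
search = region.add(Context("search"))

past = region.situation()  # before any signals arrive

for s in ["succeed", "fail", "fail", "fail", "succeed"]:
    checkout.emit(s)
for s in ["succeed"] * 5:
    search.emit(s)

present = region.situation()
```

Comparing past with present shows the situation changing: checkout moves from OK to DEGRADED and the enclosing region follows it, while search remains OK. No chart is involved at any point.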
Models: Abstraction and Attention
Models are critical for gaining understanding in a world of increasing complexity. A model is a compact and abstract representation of a system under observation and control that facilitates conceptualization and communication about its structure and dynamics.
Modeling is a simplification process that helps focus attention on significance for higher-level reasoning, problem-solving, and prediction. Suitable models (of representation in structure and form) are designed and developed through abstraction and the need to view a system from multiple perspectives without creating a communication disconnect for all involved.
Coherence, like conciseness and context, is an essential characteristic of a model.
The mismatch between what a developer conceptualizes at the level of instrumentation and what is presented within the tooling, visualizations, and interfaces is seen as a mere inconvenience. It is an inconvenient truth stemming from an industry that does far more selling of meme-like nonsense and yesteryear thinking and tooling than educating in theory and practice. A focus on systems and dynamics must win over data and details if we are to return to designing and building agile, adaptive, and reliable enterprise systems.