The Data Fog of Observability

This article was first published in 2019.

Complexity and Change

The continued growth in complexity and the increasing rate of change have many Site Reliability Engineering (SRE) and DevOps teams turning to Observability for a better understanding of an evolving System of Systems (SoS) landscape. How effective Observability can be when it rests on yesteryear's instrumentation, measurement, and collection approaches, such as tracing, metrics, and logging, is debatable.

The question we need to ask of Observability, as currently defined and employed, is whether it can scale beyond the low-hanging fruit now visible and being picked, once we have moved past this initial frame of reference.

Exploration and Exploitation

The overemphasis on data instead of signals and states has created a great fog. This data fog leads many organizations to lose their way and overindulge in data exploration instead of exploiting acquired knowledge and understanding. It has come about while the community has remained somewhat unconcerned with steering processes such as monitoring or cybernetics. Within the thickening data fog, many miss the big picture and the changing landscape of system state, and cannot pick up the signals needed to orient themselves. Not all activity with data is useful, least of all when it is performed aimlessly. Any insights gleaned while trekking through a fog of data will likely reveal more about the individual trees than about the various local maxima in the landscape or the underlying tectonic and fundamental forces of system dynamics.

Organizations collect and process massive amounts of sensory data. Still, this comes at a cost when the data overloads human cognitive capacity, reduces the visibility of what is significant and deserves attention, and makes orientation nearly impossible. More than ever, there is a need for expert guidance and adaptive tooling.

Abstraction and Simulation

In 2020, two broad movements will emerge to help organizations break away from the data fog in which many have found themselves floundering: abstraction and simulation. Ever-increasing levels and layers of abstraction will be employed to rise above the fog and more accurately assess the situation and state of play at scale. Abstraction, in the form of data reduction and new higher-order model representations, will better help teams identify when to continue to exploit (move and expand rapidly) or to explore (slow down and consolidate understanding).

The situational orientation gained from abstraction will bring focus and frame the effort whenever an engineering team needs to dive deep into the fog. Briefings for exploratory missions into the "unknown unknowns" quadrant will consist primarily of intelligence based on communicated signals and inferred states, along with maps of signposts. Tooling will be judged on how precisely it targets incursions into the fog, how much learning is acquired during them, the cost and time spent doing so, and the intelligence gathered and relayed to other teams moving elsewhere across the landscape, so that every team is always guided and helped.

While abstraction looks to laws, formulae, and models to offer operational teams and their activities a more effective bird's-eye vantage point, simulation attempts to recollect and reconstruct the foundational and functional fabric of reality for systems that are expected to be under some degree of change and control.

Simulation exists beneath the data fog, with data pushed into the background and used solely to power the playback or play-forward of execution, which can then be experienced and introspected immersively. Within the fog, traces, metrics, and logs cloud the view of reality under execution. The fog offers hints of software execution behavior, but that behavior is never genuinely experienced to the extent that it can be natively mapped to code, constructs, and context. Within the fog, engineering teams are always observing data rather than actual execution. Simulation, conversely, deals exclusively with episodic memories of the past or projections of future potential. Although simulation is far more expansive and expressive, it is still greatly simplified, containing only a few fundamental and primitive elements of concern. Teams will use simulations to train and tune decision-making tooling.