Simplicity and Significance in Observability

This article was originally posted in 2020 on the OpenSignals website, which is now defunct.

System Complexity

Over the last few years, complexity has risen within computing infrastructure, especially with the move to finer-grained deployment units. Some companies have adopted microservices so enthusiastically that what was once a monolith is now broken up into hundreds, even thousands, of execution units, most of them interconnected. One might naively expect that the tools and approaches employed should have grown in complexity to match. We believe this was, and still is, a grave misconception.

Wrestling with Data

As computing and complexity scaled up, the models and methods should have reduced and simplified the communication and control surface area between man and machine. Instead, monitoring (passive) and management (reactive) solutions have lazily mirrored the complexity itself, at a level devoid of simplicity and significance and polluted with noise.

Today, engineering teams are far too busy wrestling with and wandering around an ever-expanding data fog of metrics, logs, and distributed traces. Fearful of the complexity, many engineering teams worry over their ability to collect, store, and analyze more and more data and details – but never question or reflect on how effective all of it is.

Firefighting with Fire

One could very well argue that the complex has been replaced with the complicated. We are not understanding or solving the complexity problem; we are just attending to and acting on a different problem because it feels much more familiar than today’s changing world.

There is seeing but no perceiving. There is doing but no direction. There is collection but no cognition.

Application monitoring and management solutions are far more complicated than needed.

A Single Pane (of Pain)

That single pane of glass many vendors talk up consists of hundreds of layers, tabs, views, charts, and navigation aids. It has become such a sorry tale that some vendors have created an onboarding experience that consists of a game leading users along a path to some golden nugget of information. The problem is that, unless one is a machine, the data is detached from the service domain and system dynamics. Such onboarding is a temporary band-aid for a far more troubling problem: data is valued over information and useful models.

Imitating Intelligence

The Humainary initiative aims to bring simplicity and significance back into monitoring, observability, controllability, and management. The basic idea, like many practical innovations, is pretty simple: see, perceive, model, and reason about the computing world of microservices much as humans do within societies and cultures made up of many agents offering services.

Communication and Cooperation

We find signals and inferred states at the heart of all human (and animal) communication and coordination. Signals are emitted or received. Signals indicate operations or outcomes – signs and traces of the past and a sliver of the present. Signals influence others and, over time and on reflection, let us infer the state of others and ourselves. A signal is a direct and meaningful unit of information within a (social) context, much like an emoji. It is not a message that needs to be picked apart and then interpreted. In any interaction, humans emit and receive signals through body language and vocalization, alongside whatever is physically passed and contextually communicated.

Signal processing and transmission are paramount to practical cooperation and coordination. But the signals are just a means to an end, and that end is the assessment of ourselves and others – state inference.

Framing and Focusing

When it comes to monitoring environments, the focus and frame of reference should always be the operational status of a service from the perspective of each other service that interacts with it. An assessment of service quality should not be based on what a service tells us by way of published metrics – this misses the point that no service exists in isolation anymore within a highly interconnected network. Instead, an assessment should reflect how other services perceive a service through signals and the state inferred from them, which can differ depending on each observing service’s sensitivity to those signals. Sensitivity manifests in the different weighting of signals and the decay rate of memories each service is configured with.
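To make the idea of sensitivity concrete, here is a minimal sketch in Python. It is not the Humainary API: the function perceived_score, the signal names, the weights, and the half-life decay rule are all assumptions chosen for illustration. It shows two observers receiving the same signals about one service and arriving at different perceptions.

```python
def perceived_score(signals, weights, half_life, now):
    """Sum signal weights, discounting each by its age (exponential decay).

    All names and the half-life rule are illustrative assumptions.
    """
    total = 0.0
    for name, timestamp in signals:
        age = now - timestamp
        total += weights.get(name, 0.0) * 0.5 ** (age / half_life)
    return total


# The same signals about one service, as (name, seconds since start).
signals = [("fail", 0.0), ("succeed", 10.0), ("succeed", 20.0)]

# A strict observer: failures weigh heavily and are remembered for a long time.
strict = perceived_score(
    signals, {"succeed": 1.0, "fail": -4.0}, half_life=60.0, now=20.0
)

# A lenient observer: failures weigh little and are forgotten quickly.
lenient = perceived_score(
    signals, {"succeed": 1.0, "fail": -1.0}, half_life=5.0, now=20.0
)
```

With these assumed settings the strict observer ends up with a negative score and the lenient one with a positive score, even though both saw exactly the same signals.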

Representing Reality

Humainary brings simplicity and sensibility by focusing on what is effective and significant for the vast majority of service management attention – what is the status of this service, this cluster (of services), or this system (of services)? The conceptual model and language (terms) are small, and the processing sequence is straightforward. A service creates a context representing the world. Within this context, the service is represented alongside the other services it interacts with. During an interaction, the service owning the context, acting like a mind or model, records signals against the representations of itself and the other services.
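The sequence just described can be sketched as a small data model. This is an illustration only, not the Humainary API: the class names Context and ServiceView, the method names, and the signal and service names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceView:
    """How the context owner represents one service (itself or a peer)."""
    name: str
    signals: list = field(default_factory=list)


@dataclass
class Context:
    """A service's model of its world: itself plus the services it talks to."""
    owner: str
    services: dict = field(default_factory=dict)

    def service(self, name):
        # Create the representation on first use, then reuse it.
        return self.services.setdefault(name, ServiceView(name))

    def record(self, name, signal):
        # Record a signal against the representation of the named service.
        self.service(name).signals.append(signal)


ctx = Context(owner="checkout")
ctx.record("checkout", "start")   # a signal about the owner itself
ctx.record("payments", "call")    # the owner calls a peer...
ctx.record("payments", "fail")    # ...and records the outcome it observed
ctx.record("checkout", "stop")
```

Note that the signals about "payments" live in the context owned by "checkout": they capture how checkout perceives payments, not what payments reports about itself.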

Signaling and Synthesizing

The recorded signals are then scored based on the configuration used to create the context and mapped to a status bucket, with one bucket for each possible status value per service represented. The scorecard tallies each bucket and makes a generalized assessment of the service, with a decay mechanism in play, much like human memory. The context can transmit status changes to other interested observers through a plugin, where collective intelligence can manifest in additional aggregation, ranking, weighting, etc.
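A scorecard along these lines might look like the following sketch. Everything in it is an assumption made for illustration rather than Humainary's actual design: the status names, the points per signal, the rule that the fullest bucket wins, the per-tick decay factor, and the plain callback list standing in for a plugin.

```python
from collections import defaultdict

# Hypothetical configuration: each signal adds points to one status bucket.
SCORING = {
    "succeed": ("ok", 1.0),
    "fail": ("degraded", 2.0),
    "timeout": ("down", 3.0),
}


class Scorecard:
    """Tallies scored signals per status bucket for a single service."""

    def __init__(self, scoring, decay=0.5):
        self.scoring = scoring
        self.decay = decay                 # fraction of each tally kept per tick
        self.buckets = defaultdict(float)
        self.status = "ok"
        self.listeners = []                # stand-in for a plugin

    def record(self, signal):
        status, points = self.scoring[signal]
        self.buckets[status] += points
        self._assess()

    def tick(self):
        # Memory fades: shrinking every bucket by the same factor keeps their
        # order but lets newer signals outweigh older ones.
        for status in self.buckets:
            self.buckets[status] *= self.decay

    def _assess(self):
        # Assumed rule: the status with the largest tally wins.
        new = max(self.buckets, key=self.buckets.get)
        if new != self.status:
            self.status = new
            for listener in self.listeners:
                listener(new)


changes = []
card = Scorecard(SCORING)
card.listeners.append(changes.append)

for step in ["succeed", "fail", "fail", "tick", "succeed", "tick", "succeed"]:
    if step == "tick":
        card.tick()
    else:
        card.record(step)
```

In this run the two failures move the service to "degraded". Because their tally decays on each tick, two later successes are enough to bring it back to "ok", so the listener sees only two changes. Without decay, more successes would be needed to outweigh the old failures.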