The Humainary project aims to bring much-needed sensibility, streamlining, simplicity, and sophistication back to an area that seems to fight forcefully against moving past yesteryear technologies such as logging and tracing.
When we started with this initiative to bring a more humane approach back into observability, various proposed standards were being pushed, such as OpenTracing and OpenCensus.
Even after many reasoned rejections from experienced systems and performance-monitoring engineers, OpenTelemetry seems to be the only game in town. We consider this a significant problem: an evolutionary dead-end.
It has been clear from the outset of the OTel saga that the project's aim was never to truly consider and address the needs of those managing modern self-adaptive systems and services. Controllability and situational awareness were never treated as the end game.
With Google and others pushing hard to rubber-stamp a decade-old and questionable approach to observability, the engineering effort focused on how quickly what was “out there in the wild” could be made to work under various guises.
This was all done without much in the way of serious design thinking or value analysis beyond “let’s ship the same expensive and intensive data collection, harvesting, and pipelining technology.”
All targeted at an external, and invariably dumb, storage endpoint with the hope that machine learning will magically and miraculously solve the noisy big data problem just created and transported afield.
Tipping the Scales
It is clear from the current course taken within the site reliability engineering and observability communities that there is a severe imbalance between the perceived value and the ultimate utility of bloated data pipelines and platforms, developed and deployed repeatedly without delivering the promised operational improvements.
The emphasis on doing with data (collecting, storing, querying, debugging) instead of informed decision-making with models centered on signals, significance, and situations has resulted in very little progress.
We still fail to tackle system complexity, increasing rates of change, and the higher resilience requirements expected of systems.
We continue to entangle every aspect of life with computation in a remote cloud of components beyond our comprehension and control.
No one can be sure of the optimal response to increasing business and environmental uncertainty and complexity, coupled with the cognitive overload caused by wanton data collection.
We are drowning ourselves and the environment, including machines and networks, in data, doing without deciding, reacting without reasoning, fetching without filtering, searching without signifying, and collecting without controlling.
The current approach is unsustainable for humans and the environment alike, in particular for systems of systems and services.
We have allowed our fear of complexity and inability to manage change to take hold of our cognitive and communication capacities.
We must change course and admit that value is not evenly distributed across data points and pipelines.
We must experiment with novel approaches to modeling present situational states and predicted and projected futures.
We must consider the cost of data collection at holistic and system levels.
A design must reflect communication constraints and coordination costs, both between humans and machines and from machine to machine.
We must better balance the distribution of collective intelligence and the resulting spatial and temporal locality of action.
We need greater diversity in the design of instruments while maintaining a common culture, contract, and communication code.
We must employ a more pragmatic process of framing measurement, modeling, and memory.