A Modern Observability Library Toolkit

Arrival Imminent

We have had this website up for several years, with notices indicating that we would launch somewhere around 2019. But before the launch, Instana came knocking, interested in technology we had recently piloted successfully at customer sites. So, a few years on, after an exciting and innovative R&D detour into control towers for parcel logistics at a postal company, we are readying the first official milestone preview release of a suite of open source observability instrumentation libraries.

The aim is to bring much-needed sensibility, streamlining, simplicity, and sophistication back to an area that seems to fight forcefully against moving past yesteryear technologies such as logging and tracing.

Looking Back

When we started this initiative to bring a more humane approach back into the fold of observability, various proposed standards were being pushed, such as OpenTracing and OpenCensus. Now, even after many reasoned rejections from experienced systems and performance monitoring engineers, the only game in town seems to be OpenTelemetry. We consider this a significant problem, an evolutionary dead-end.

It has been clear from the outset of the OTel saga that the project’s aim was never to truly consider and address the needs of those managing modern self-adaptive systems and services. No one thought of controllability and situation awareness as the end game.

Someone just picked up an old discarded tool, Dapper, found in a derelict cave in Mountain View, and started swinging it aimlessly around in fear of the unknown unknowns and deep systems that had been shouted and screamed at them by a charlatan high priestess strutting manically in front of a crumbling three-pillared temple of data, details, and decline.

Rubber Stamping

With Google and others pushing hard to rubber-stamp a ten-year-old and questionable approach to observability, the engineering effort centered on how quickly they could make what was “out there in the wild” work under various guises, or at least appear to work more consistently across languages, runtimes, and agent-vendor boundaries. This was all done without much in the way of serious design thinking or value analysis beyond “let’s ship the same expensive and intensive data collection, harvesting, and pipelining technology.”

All of it targeted at an external, and invariably dumb, storage endpoint, in the hope that machine learning will magically and miraculously solve the noisy big data problem just created and transported afield.

At the first distributed tracing workshop, held in Budapest, these and many other issues and concerns were raised, discussed, and half-heartedly accepted. Still, there was just too much attachment to the past-present, with little course change.

We can well understand why application performance monitoring vendors like Dynatrace, AppDynamics, and New Relic would cling so hard to the past. However, for practitioners to do likewise shocked us and, to this day, saddens us.

Tipping the Scales

It is clear from the current course taken within the site reliability engineering and observability communities that there is a severe imbalance between the perceived value and the ultimate utility of the bloated data pipelines and platforms, developed and deployed repeatedly without delivering the promised operational improvements. The emphasis on doing with data (collecting, storing, querying, debugging) instead of informed decision-making with appropriate models centered on signals, significance, and situations has resulted in very little progress.
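
To make the contrast concrete, here is a minimal and purely hypothetical sketch in Java – our own illustration, not the API of this toolkit or of any library named above – in which a probe classifies each raw measurement into a signal at the point of observation and assesses a situation from recent signal history, so that only the assessment, rather than the raw data, ever needs to travel anywhere:

```java
// Hypothetical sketch: signals and situations over raw data shipping.
// All names (Probe, Signal, Situation) and thresholds are illustrative inventions.

import java.util.ArrayDeque;
import java.util.Deque;

enum Signal { OK, DEGRADED, FAILED }
enum Situation { NORMAL, WARNING, CRITICAL }

final class Probe {
  private static final int WINDOW = 10;
  private final Deque<Signal> recent = new ArrayDeque<>();

  // Classify a raw latency measurement into a signal at the point of observation.
  Signal observe(long latencyMillis) {
    Signal s = latencyMillis < 100 ? Signal.OK
             : latencyMillis < 1000 ? Signal.DEGRADED
             : Signal.FAILED;
    if (recent.size() == WINDOW) recent.removeFirst();
    recent.addLast(s);
    return s;
  }

  // Assess the situation from the significance of recent signals,
  // not from the volume of raw data collected.
  Situation assess() {
    long failed = recent.stream().filter(s -> s == Signal.FAILED).count();
    long degraded = recent.stream().filter(s -> s == Signal.DEGRADED).count();
    if (failed > WINDOW / 4) return Situation.CRITICAL;
    if (failed + degraded > WINDOW / 2) return Situation.WARNING;
    return Situation.NORMAL;
  }
}

public class SignalsOverData {
  public static void main(String[] args) {
    Probe probe = new Probe();
    long[] latencies = {40, 60, 1200, 90, 1500, 2000, 80, 1100, 70, 1300};
    for (long l : latencies) probe.observe(l);
    // Only the assessment (a single enum value) need leave the local node.
    System.out.println("situation = " + probe.assess());
  }
}
```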

We are still failing to tackle system complexity, increasing rates of change, and the higher resilience requirements that society now expects of systems.

We continue to entangle every aspect of life with computation in a remote cloud of components beyond our comprehension and control – this is a Matrix without a Neo to save us.

Changing Course

No one can be sure of the optimal solution to increasing business and environmental uncertainty and complexity, coupled with the cognitive overload caused by wanton data collection. What we do know is that we are drowning ourselves and the environment, including machines and networks, in data: doing without deciding, reacting without reasoning, fetching without filtering, searching without signifying, collecting without controlling.

What is certain is that this approach is unsustainable, both for humans and for the environment, in particular for systems of systems and services. We have allowed our fear of complexity and our inability to keep managing change to take hold of our entire cognitive and communication capacities, further exacerbating the problem.

We must change course and admit that value isn’t evenly distributed across data points and pipelines. More is less when it comes to decision-making at scales ill-suited to our minds.

Engaging Experimentation

We need to experiment with novel approaches to observability, especially technologies designed to reflect present problems as well as predicted and projected futures. These new approaches must consider the cost (of collection) at holistic and system levels. Their overall design must better reflect the communication constraints and coordination costs between humans and machines, as well as machine-to-machine.
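
One way to picture cost-aware collection – again a hypothetical sketch of ours, not a prescribed design – is a reporter that forwards an observation across the costly machine-to-machine channel only when it changes, suppressing insignificant repetition at the source:

```java
// Hypothetical sketch: paying the communication cost only for change.
// ChangeOnlyReporter is an illustrative invention, not a real library class.

import java.util.Objects;
import java.util.function.Consumer;

final class ChangeOnlyReporter<T> {
  private final Consumer<T> transport; // the costly machine-to-machine channel
  private T last;

  ChangeOnlyReporter(Consumer<T> transport) {
    this.transport = transport;
  }

  // Emit only significant (changed) observations; suppress the rest locally.
  void report(T observation) {
    if (!Objects.equals(observation, last)) {
      last = observation;
      transport.accept(observation);
    }
  }
}

public class CostAwareCollection {
  public static void main(String[] args) {
    ChangeOnlyReporter<String> reporter =
        new ChangeOnlyReporter<>(status -> System.out.println("sent: " + status));
    // Ten observations, but only three crossings of the (simulated) network.
    String[] statuses = {"OK", "OK", "OK", "DEGRADED", "DEGRADED",
                         "OK", "OK", "OK", "OK", "OK"};
    for (String s : statuses) reporter.report(s);
  }
}
```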

We have to better balance the distribution of collective intelligence and the spatial and temporal locality of the resulting action and coordination.

We need greater diversity in the design of instruments while maintaining a common culture, contract, and code of understanding and meaning – a pragmatic process of framing measurement, modeling, and memory in context.
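
What “diverse instruments, common contract” might look like in code – once more a speculative sketch of our own (Java 16+ for records), with every type name invented for illustration – is many different instruments, each framing its own measurement, all emitting through one shared shape whose meaning is fixed:

```java
// Hypothetical sketch: diversity of instruments behind a common contract.

import java.time.Instant;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// The common contract: what an emitted observation means, regardless of source.
record Observation(String context, String name, double value, Instant at) {}

interface Instrument {
  Observation sample(); // each instrument frames its own measurement in context
}

final class QueueDepthGauge implements Instrument {
  private final Queue<?> queue;
  QueueDepthGauge(Queue<?> queue) { this.queue = queue; }
  public Observation sample() {
    return new Observation("orders-service", "queue.depth", queue.size(), Instant.now());
  }
}

final class HeapUsageGauge implements Instrument {
  public Observation sample() {
    Runtime rt = Runtime.getRuntime();
    double used = rt.totalMemory() - rt.freeMemory();
    return new Observation("orders-service", "heap.used.bytes", used, Instant.now());
  }
}

public class CommonContract {
  public static void main(String[] args) {
    Queue<String> q = new ArrayDeque<>(List.of("a", "b"));
    Instrument[] instruments = {new QueueDepthGauge(q), new HeapUsageGauge()};
    // Diverse measurements, one shared shape and meaning.
    for (Instrument i : instruments) System.out.println(i.sample());
  }
}
```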

Achieving AIOps

Were artificial intelligence (AI) to come into existence, we would speculate that it would push sensing, reasoning, and actioning computation outwards to peripheral (edge) points, augmenting and extending itself using the environment, much like the human collective – this is not how we have approached observability to date, ignoring controllability in the process.

We expect the pendulum to swing, though maybe not as soon as we would like. When it does, we hope that the toolkit and libraries we continue to experiment with, design, and prototype will prevent it from swinging back to old, defunct ways of doing with data for the sake of appearances or vendor market dominance.