Inspectis – Observers for Observability

Project Inception

Last week we completed the first significant design milestone for the Inspectis project, the reference implementation of the Humainary Observers Instruments API. We are very pleased with the result and see it as confirmation of the rock-solid design of the Substrates project, which is the foundation for all our instrument libraries and other product and service offerings. A post on the inspectis.io website will walk through the interfaces in detail later this week. In this post, we explain the origin of the project and its underlying design thinking, which emerged as we developed new instruments and reflected on past interface design choices made within opensignals.io.

Service Signaling

To understand the Observer approach we have taken with the Inspectis project, it helps to briefly explain the Service interface of the Humainary Services API. A Service is an instrument representing a named activity such as a task, stage, job, workflow step, service, action, or method. During the execution of the activity, one or more signals are emitted, and these fall into two categories: operation and outcome. The Service interface has 16 signatures for recording the possible signals. A developer or site reliability engineer maps the different execution paths of application code, including thrown exceptions, to one or more of these signaling methods on the Service interface.
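As an illustration of mapping execution paths to signals, the sketch below wraps an activity so that a normal return and a thrown exception each produce a distinct signal sequence. `Category`, `Signal`, and `RecordingService` are hypothetical names invented for this example, and the five signals are an illustrative subset, not the 16 signatures of the actual Service interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical signal vocabulary: an illustrative subset, NOT the actual
// signals defined by the Humainary Services API.
enum Category { OPERATION, OUTCOME }

enum Signal {
  START(Category.OPERATION),
  STOP(Category.OPERATION),
  CALL(Category.OPERATION),
  SUCCEED(Category.OUTCOME),
  FAIL(Category.OUTCOME);

  final Category category;

  Signal(Category category) { this.category = category; }
}

// Hypothetical stand-in for a Service instrument: it simply records
// the signals emitted while a named activity executes.
final class RecordingService {
  final String name;
  final List<Signal> emitted = new ArrayList<>();

  RecordingService(String name) { this.name = name; }

  void start()   { emitted.add(Signal.START); }
  void stop()    { emitted.add(Signal.STOP); }
  void succeed() { emitted.add(Signal.SUCCEED); }
  void fail()    { emitted.add(Signal.FAIL); }

  // Maps each execution path of the activity to a signal sequence:
  // normal return -> START, SUCCEED, STOP
  // thrown runtime exception -> START, FAIL, STOP (exception is rethrown)
  <T> T execute(Supplier<T> activity) {
    start();
    try {
      T result = activity.get();
      succeed();
      return result;
    } catch (RuntimeException e) {
      fail();
      throw e;
    } finally {
      stop();
    }
  }
}
```

The instrumented code only decides which signal each path corresponds to; what those signals mean for the health of the service is left to whatever processes them.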

Status Inference

Under the hood, the signals emitted by a service are processed by the provider implementation behind the service provider interface (SPI) of the Humainary Services API, which infers a Status for the service. The reference implementation of the Humainary Services API does this with a configurable scorecard, and the inferred status is accessible through the Service interface. The Service interface, which extends the Instrument interface within the Substrates API, therefore acts as both a push sensor (signals) and a model (status) of the sensory processing.
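To make the split between sensor and model concrete, here is a minimal sketch of a weighted scorecard. The `Signal`, `Status`, and `Scorecard` types, the weights, the thresholds, and the weighted-average rule are all hypothetical; this is not the reference implementation's scoring logic, only one plausible way a configurable scorecard could turn a stream of signals into a status.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical outcome signals and statuses, for illustration only;
// they are not the actual Humainary Services API enumerations.
enum Signal { SUCCEED, FAIL, RETRY, REJECT }

enum Status { OK, DEGRADED, DEFECTIVE }

// A minimal weighted scorecard: signals go in (sensor side),
// a status comes out (model side).
final class Scorecard {
  private final Map<Signal, Double> weights = new EnumMap<>(Signal.class);
  private final Map<Signal, Integer> counts = new EnumMap<>(Signal.class);
  private final double degradedAt;
  private final double defectiveAt;

  Scorecard(Map<Signal, Double> weights, double degradedAt, double defectiveAt) {
    this.weights.putAll(weights);
    this.degradedAt = degradedAt;
    this.defectiveAt = defectiveAt;
  }

  // Sensor side: count each emitted signal.
  void accept(Signal signal) {
    counts.merge(signal, 1, Integer::sum);
  }

  // Weighted average over all signals seen so far; signals without a
  // configured weight contribute 0. An empty history scores 0.
  double score() {
    int total = 0;
    double sum = 0.0;
    for (Map.Entry<Signal, Integer> entry : counts.entrySet()) {
      total += entry.getValue();
      sum += weights.getOrDefault(entry.getKey(), 0.0) * entry.getValue();
    }
    return total == 0 ? 0.0 : sum / total;
  }

  // Model side: map the score onto a status using the configured thresholds.
  Status status() {
    double s = score();
    if (s >= defectiveAt) return Status.DEFECTIVE;
    if (s >= degradedAt) return Status.DEGRADED;
    return Status.OK;
  }
}
```

Because the weights and thresholds are plain configuration, two processes fed identical signals can infer different statuses, which is the per-process sensitivity the scorecard is meant to capture.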

A Network Model

While OpenSignals is arguably one of the most innovative technologies in the observability arena today, able to infer service status within a local process without sending data outbound, we felt we could widen the range of subjective assessments of service levels. Before OpenSignals, application performance monitoring offered only a single global status value for each service, typically displayed in a management dashboard. With OpenSignals, officially released in 2018, every process within a network holds a model of all the services it depends on, together with its own subjective view of their status, derived from its locally measured and monitored interactions. Each process needed only to configure the scorecard weights that reflected its own sensitivity to particular signals.

A Multi(Obser)verse

But what should those weights be? How effective are the inferences made with them? Could a different model and method be used instead of the scorecard to infer the state of a service? In thinking about how best to continue developing OpenSignals toward a multiverse of service level management, we realized we needed to separate the sensor inputs more clearly from the outputs, namely the status model. We needed to allow multiple inference engines to run side by side without reimplementing the whole Humainary Services API. Before pulling status out of the Services API, we concluded that an observer framework, itself an instrument library, was required to create and manage multiple observers of the same signals or of any other events emitted by other instruments. The Humainary Substrates API already supported the transmission of events across contexts; what we also required was a library that offered much of the plumbing out of the box and abstracted the differences between push-based and pull-based delivery of events and state. And so the Humainary Observers API came to be.
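The separation of sensor inputs from inferred state can be sketched as a source that pushes each event to several independent observers, each folding those events into its own state that callers can pull at any time. `Source`, `Observer`, and the two toy inference rules are hypothetical and are not the Humainary Observers API; the sketch only shows that several inference engines can consume the same signals side by side without changing the instrument that emits them.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiFunction;

// Hypothetical signals for illustration; not the Humainary API.
enum Signal { SUCCEED, FAIL }

// An observer folds pushed events into a state that can be pulled at any time.
final class Observer<E, S> {
  private final BiFunction<S, E, S> fold;
  private volatile S state;

  Observer(S initial, BiFunction<S, E, S> fold) {
    this.state = initial;
    this.fold = fold;
  }

  // Push side: the source delivers each event here.
  synchronized void onEvent(E event) {
    state = fold.apply(state, event);
  }

  // Pull side: callers read the latest inferred state.
  S state() {
    return state;
  }
}

// A source that fans each emitted event out to every registered observer.
final class Source<E> {
  private final List<Observer<E, ?>> observers = new CopyOnWriteArrayList<>();

  <S> Observer<E, S> observe(S initial, BiFunction<S, E, S> fold) {
    Observer<E, S> observer = new Observer<>(initial, fold);
    observers.add(observer);
    return observer;
  }

  void emit(E event) {
    for (Observer<E, ?> observer : observers) {
      observer.onEvent(event);
    }
  }
}
```

For example, one observer might count every `FAIL` while another tracks only the current streak of consecutive failures; both receive the same emitted signals but arrive at different states, and adding a third engine requires no change to the source.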