Scaling Observability Resources

A significant challenge in building observability agents, or heavily instrumented applications and frameworks, is scaling the resources they consume, both up and down. The trade-off is between time and cost. The more resources an agent is given, the faster it can process events and determine the current situation (state). Allocating sufficient resources also reduces the performance impact on calls that cross the boundary between application and instrumentation. But the more resources assigned to an agent or instrumentation library, the more likely it is to degrade the application at the overall system service level.

Unfortunately, not one existing open-source observability library or commercial agent from an application performance monitoring vendor addresses resource management at the interface level, beyond perhaps offering a system property that sets the size of a bounded queue or the policy for dropping traces when overloaded. Callers of Instrument interfaces are oblivious to what happens when they invoke a method on an Instrument. There is no means to align and tailor resource consumption to the specifics of the application's behavior. Software with critical performance requirements and strict resource constraints is not served. Far too much is hidden and non-standards-based, creating portability issues and difficult-to-resolve in-process agent performance perturbations.

The Substrates API, on the other hand, provides the building blocks for effective and efficient resource monitoring and management via the Circuit and Conduit interfaces. An agent or application developer scales the underlying Event pipeline by creating one or more Circuits. Circuits are the primary resource allocation unit in the Substrates API, much like a CPU reservation. A Circuit logically equates to a single Thread and Event queue: its Current (flow). A Circuit performs the work that Instruments submit via Inlets. An Instrument in the Substrates API has no control over the Event processing resources; it can only emit values. This starkly contrasts with all other libraries today, where an Instrument, such as a Counter, Timer, or Tracer, creates and consumes resources, such as Threads, utterly unbeknownst to the application or agent. The Substrates API inverts this misaligned control.
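To make the "one Circuit, one Thread, one queue" idea concrete, here is a minimal sketch in plain Java. It does not use the Substrates API; the names CircuitSketch, MiniCircuit, inlet, and run are hypothetical and exist only for illustration. It assumes a Circuit can be approximated by a single-thread executor, whose one worker thread and task queue stand in for the Current.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: these types are NOT the Substrates API.
final class CircuitSketch {

  /** One worker thread plus its FIFO queue: the unit of resource allocation. */
  static final class MiniCircuit implements AutoCloseable {
    // A single-thread executor owns exactly one thread and one task queue.
    private final ExecutorService current = Executors.newSingleThreadExecutor();

    /** Returns an inlet that hands emitted values over to this circuit's thread. */
    <E> Consumer<E> inlet(Consumer<? super E> pipeline) {
      return value -> current.execute(() -> pipeline.accept(value));
    }

    /** Stops accepting work, drains what is queued, and lets the thread exit. */
    @Override
    public void close() {
      current.shutdown();
      try {
        current.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
      }
    }
  }

  /** Returns { distinct worker threads used, total events processed }. */
  static int[] run(int circuits, int eventsPerCircuit) {
    Set<String> threads = ConcurrentHashMap.newKeySet();
    AtomicInteger processed = new AtomicInteger();
    List<MiniCircuit> allocated = new ArrayList<>();
    for (int i = 0; i < circuits; i++) {
      MiniCircuit circuit = new MiniCircuit();
      allocated.add(circuit);
      // The "instrument" side only sees the inlet; it never creates a thread.
      Consumer<Integer> inlet = circuit.inlet(value -> {
        threads.add(Thread.currentThread().getName());
        processed.incrementAndGet();
      });
      for (int e = 0; e < eventsPerCircuit; e++) {
        inlet.accept(e);
      }
    }
    for (MiniCircuit circuit : allocated) {
      circuit.close(); // resources are released by the owner, at will
    }
    return new int[] { threads.size(), processed.get() };
  }

  public static void main(String[] args) {
    int[] result = run(3, 100);
    System.out.println(result[0] + " threads processed " + result[1] + " events");
  }
}
```

In this sketch the caller, not the instrument, decides how many threads exist: scaling up means creating more MiniCircuit instances, and scaling down means closing them.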

A Substrates Instrument developer does not allocate resources beyond the mere object allocation for an Instrument instance; even then, this allocation is managed by a Conduit. A Conduit is a factory (and container) of Instrument instances. A Conduit cannot perform work alone; its capacity to do work comes from a Circuit, which provides methods to create one or more named Conduits. A Conduit provides the Inlet to the Instrument, allowing it to emit values, which become Events that the Circuit processes. Close the Conduit of an Instrument, and no further work, such as dispatching Events to Subscribers and Outlets, is performed on its behalf. Close a Circuit, and all Conduits created by that Circuit cease functioning. With other solutions, there is no means to release resources at will when needed.
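The ownership chain described above (a Circuit creates Conduits, a Conduit creates Instruments, and closing either level stops the flow) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Substrates API: the classes Circuit, Conduit, and Counter, and the methods conduit, counter, increment, and demo, are hypothetical stand-ins. The thread and queue are deliberately elided so that only the lifecycle is shown, and delivery to a subscriber happens synchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch: these types only mimic the roles named in the text.
final class ConduitSketch {

  /** An instrument holds nothing but an inlet: no threads, no queues. */
  static final class Counter {
    private final Consumer<Long> inlet;
    Counter(Consumer<Long> inlet) { this.inlet = inlet; }
    void increment() { inlet.accept(1L); }
  }

  /** Factory and container of instruments; it forwards emitted values while open. */
  static final class Conduit implements AutoCloseable {
    private final Map<String, Counter> counters = new ConcurrentHashMap<>();
    private final Consumer<Long> subscriber;
    private volatile boolean open = true;

    Conduit(Consumer<Long> subscriber) { this.subscriber = subscriber; }

    /** Creates the named instrument once and returns the same instance after. */
    Counter counter(String name) {
      return counters.computeIfAbsent(name, n -> new Counter(this::emit));
    }

    private void emit(Long value) {
      if (open) subscriber.accept(value); // a closed conduit silently drops values
    }

    @Override
    public void close() { open = false; }
  }

  /** Owner of conduits; closing it closes every conduit it created. */
  static final class Circuit implements AutoCloseable {
    private final List<Conduit> conduits = new CopyOnWriteArrayList<>();

    Conduit conduit(String name, Consumer<Long> subscriber) {
      // The name is kept in the signature for illustration only.
      Conduit conduit = new Conduit(subscriber);
      conduits.add(conduit);
      return conduit;
    }

    @Override
    public void close() { conduits.forEach(Conduit::close); }
  }

  /** Returns how many values each subscriber received: [requests, errors]. */
  static List<Integer> demo() {
    List<Long> requestsSeen = new ArrayList<>();
    List<Long> errorsSeen = new ArrayList<>();
    Circuit circuit = new Circuit();
    Conduit requests = circuit.conduit("requests", requestsSeen::add);
    Conduit errors = circuit.conduit("errors", errorsSeen::add);
    Counter r = requests.counter("http");
    Counter e = errors.counter("http");

    r.increment(); e.increment(); // both conduits open: both delivered
    requests.close();
    r.increment(); e.increment(); // only the errors conduit still delivers
    circuit.close();
    r.increment(); e.increment(); // circuit closed: nothing is delivered
    return List.of(requestsSeen.size(), errorsSeen.size());
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The point of the design is that the Counter never learns whether its values are processed: that decision sits with whoever holds the Conduit or the Circuit, which is where the text places resource control.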