Scaling Coherence
A significant challenge with siloed telemetry toolkits, such as OpenTelemetry, is that each instrument employs its own data collection and transmission process. Each pillar (tracing, metrics, and logging) exposes numerous configuration parameters for managing underlying resources, such as buffers and execution threads. Coordinating these pillars and instruments is also difficult: with no central component to orchestrate work execution across them, obtaining a unified, ordered view of events is challenging. There is no inherent mechanism to synchronize or share context among these systems, resulting in fragmented and potentially conflicting telemetry data.
Achieving coherence within a local process is already arduous. However, when the data is relayed to a backend endpoint, where it must be assembled at different levels of aggregation, the challenges become even more complex. Moreover, achieving coherence at scale, especially in distributed systems, further exacerbates these difficulties. Without a centralized synchronization mechanism, fragmented telemetry data hinders root cause analysis, performance tuning, and system observability.
Circuits Simplify
The Substrates API elevates resource management and data flow coordination to new heights with the introduction of the Circuit interface. At the core of a Circuit resides a transmission and transformation engine that processes all data flowing through Channels, or designated Pipes, established by Conduits in a single-threaded manner, akin to a single event queue. A Circuit functions as a Virtual CPU Core, enabling scalability by augmenting the number of Circuits and distributing the Conduits across them.
All Conduits and Channels created by a Circuit share the same event queue and executor, ensuring that data delivery is ordered. This design simplifies reasoning about the system and is well suited to real-time telemetry, where coherent, ordered streams are essential. The ability to scale horizontally means it can handle large, distributed systems. The Circuit and Conduit abstractions lend themselves well to modular, event-driven architectures where components process data independently but remain part of a cohesive whole. Circuits’ lightweight nature and ability to scale via partitioning make this model an excellent choice for edge computing, where resources are limited and operations must remain predictable.
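The single event queue shared by everything a Circuit creates can be pictured with a minimal analog built on `java.util.concurrent`. This is a sketch, not the Substrates API: the class name `CircuitSketch` and its methods are hypothetical stand-ins, with a single-threaded executor playing the role of the Circuit's engine so that every emitted value is delivered in enqueue order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Circuit analog: one single-threaded executor serves as the shared event
// queue, so values emitted through any channel are delivered in enqueue order.
public class CircuitSketch {
  private final ExecutorService engine = Executors.newSingleThreadExecutor ();
  private final List < Long > delivered = new ArrayList <> ();

  // analog of a Channel emit: enqueue the value onto the circuit's queue
  public void emit ( long value ) {
    engine.execute ( () -> delivered.add ( value ) );
  }

  // analog of awaiting the queue: drain it before reading results
  public List < Long > await () {
    engine.shutdown ();
    try {
      engine.awaitTermination ( 10, TimeUnit.SECONDS );
    } catch ( InterruptedException e ) {
      Thread.currentThread ().interrupt ();
    }
    return delivered;
  }
}
```

Because all delivery funnels through one thread, no locking is needed in the consumer, which is the property that makes reasoning about ordering trivial.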
Quality of Service
The strategic allocation of Conduits across Circuits forms the foundation of quality of service (QoS) management in the Substrates API. The system can maintain optimal performance characteristics even under varying loads by thoughtfully distributing conduits, each with its channels and processing requirements, across different circuits. This approach resembles how a well-designed computer architecture allocates processes across CPU cores to prevent resource contention.
The assignment of Conduits to Circuits enables fine-grained control over resource utilization and processing priorities. High-priority Conduits can be allocated to dedicated Circuits to ensure consistent processing latency. Conversely, bulk processing Conduits can be grouped on separate Circuits to prevent them from impacting critical paths. This separation of concerns at the Circuit level creates natural processing boundaries that prevent slow consumers or heavyweight processing in one Conduit from affecting the performance of others. For instance, a Conduit handling real-time market data updates can be assigned to a dedicated Circuit, ensuring its events are processed with minimal latency.
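The allocation strategy described above can be sketched with plain executors standing in for Circuits. Again this is an analog, not the Substrates API: `QosSketch`, the conduit names, and the `value * 2` stand-in workload are all hypothetical, illustrating only the partitioning idea of pinning each conduit to its own single-threaded engine.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// QoS partitioning analog: each named "conduit" is pinned to its own
// single-threaded "circuit", so heavyweight bulk processing on one
// circuit cannot delay the latency-critical path on another.
public class QosSketch {
  private final Map < String, ExecutorService > circuits = Map.of (
    "market-data", Executors.newSingleThreadExecutor (), // dedicated, low latency
    "bulk-export", Executors.newSingleThreadExecutor ()  // isolated bulk work
  );

  // process a value on the circuit assigned to the named conduit
  public CompletableFuture < Long > emit ( String conduit, long value ) {
    // stand-in processing: double the value on the conduit's own circuit
    return CompletableFuture.supplyAsync ( () -> value * 2, circuits.get ( conduit ) );
  }

  public void shutdown () {
    circuits.values ().forEach ( ExecutorService::shutdown );
  }
}
```

A slow task submitted to "bulk-export" queues behind other bulk work but never blocks "market-data", which is exactly the isolation boundary the Circuit assignment provides.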
Brain-Inspired
The Substrates API’s design of Circuits, Conduits, and Channels closely resembles the intricate architecture of neural circuitry in the brain. Circuits function like specialized cortical regions, while Conduits act as neural pathways. Just as neurons form intricate networks through synaptic connections, Circuits establish connections through Subscribers and registered outlet Pipes, creating a dynamic mesh of communication pathways. Each Circuit processes its signals (data) in a coordinated manner, similar to how specialized brain regions process information. Conduits, akin to axonal pathways, provide type-safe Channels for signal transmission. The registration of Pipes through Subscribers mirrors the formation of synaptic connections, where neurons establish precise communication links.
When multiple Circuits are connected, they form higher-order processing networks, reminiscent of how different brain regions collaborate through neural pathways to process complex information. This biological parallel extends to the system’s adaptability, just as neural pathways can be strengthened or pruned, the connections between Circuits can be dynamically established, modified, or removed through the Subscriber and Registrar interfaces. The single-threaded nature of each Circuit ensures ordered processing within its domain, similar to how specialized brain regions maintain specific firing patterns. Additionally, the ability to connect Circuits enables distributed processing across the system, akin to how the brain coordinates activity across multiple areas to accomplish complex tasks.
Substrates Benchmarking
The Circuit interface doesn’t reveal much of the underlying queueing and execution details. This allows vendors to innovate in their implementations, tailoring them to specific workloads and environments.
In the following benchmark reporting, we’ll focus on evaluating the performance of the reference implementation currently under development. Let’s begin by creating two instances of a Circuit.
```java
var cortex = cortex ();

var ingress = cortex.circuit ( cortex.name ( "system.ingress" ) );
var egress  = cortex.circuit ( cortex.name ( "system.egress" ) );
```
We start with a single sensor, a Long-typed Pipe. We will revisit the construction of the Conduit later.
```java
var sensors = ingress.conduit ( Long.class, Inlet::pipe );
var sensor  = sensors.compose ( cortex.name ( "benchmark" ) );
```
Our initial benchmark will emit one billion data points into the Pipe and through the Circuit using a for-loop. Because of queuing, we will await completion using the Queue interface accessible via the Circuit.
```java
var value = Long.valueOf ( 1L );

for ( int i = 0; i < LIMIT; i++ ) {
  sensor.emit ( value );
}

ingress.queue ().await ();
```
The best run took 12.5 seconds to complete, which is 80 million emits/sec. Mind you, not much is being done with the data values. There is no transformation of the value or transmission onto an observer.
There’s a more efficient way to generate a billion data points. The current Circuit implementation is optimized for internally generated data flow events rather than external ones. In the code below we create a loopback by registering the Conduit as a Subscriber to itself. When a sensor emits a value, it forwards the value back to itself, creating a recursive loop until a limit is reached. Normally such subscribing happens across circuits.
```java
var sensors = ingress.conduit (
  Long.class,
  channel -> channel.pipe ( path -> path.limit ( LIMIT ) )
);

sensors
  .source ()
  .subscribe ( sensors );

sensor.emit ( 1L );
```
The above emit will result in the firing of a billion emits in 8.25 seconds, that is 121 million emits/sec.
The Circuit’s execution model optimizes for internally generated events, mimicking neural activations. A single event triggers multiple internal events, similar to how a single action potential in a neuron can cause downstream firings. This amplification pattern is inherent in event processing systems. The Circuit’s single-threaded engine efficiently manages this multiplication, ensuring ordered processing and preventing queue saturation. This aligns with how neural circuits manage signal propagation, where each firing triggers multiple downstream neurons. The Circuit achieves this through intelligent queue management and event batching, processing related events in cohesive batches like neural circuits process activation waves. This optimization is crucial because internally generated emits usually outnumber external emits.
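The amplification pattern, where one external emit fans out into a long chain of internal emits, can be modeled with a small analog. This sketch is not the Substrates API: `LoopbackSketch` and its members are hypothetical, and a single-threaded executor plays the Circuit's engine. One external `emit` re-enqueues itself on the engine's own thread until a limit is reached, mirroring the benchmark's loopback.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Loopback analog: one external emit triggers a chain of internally
// generated emits on the circuit's own thread until a limit is reached.
public class LoopbackSketch {
  static final long LIMIT = 1_000L;
  private final ExecutorService engine = Executors.newSingleThreadExecutor ();
  private final AtomicLong count = new AtomicLong ();

  public void emit ( long value ) {
    engine.execute ( () -> {
      if ( count.incrementAndGet () < LIMIT )
        emit ( value ); // internal re-emit: the loopback
    } );
  }

  // wait for the chain to stop growing, then report total emits processed
  public long await () {
    try {
      long last;
      do {
        last = count.get ();
        Thread.sleep ( 100L );
      } while ( count.get () != last );
      engine.shutdown ();
      engine.awaitTermination ( 5, TimeUnit.SECONDS );
    } catch ( InterruptedException e ) {
      Thread.currentThread ().interrupt ();
    }
    return count.get ();
  }
}
```

Note that each internal re-emit is enqueued and drained by the same thread, so the queue depth stays at one pending event per step; this is the property that lets a real engine batch related events without saturating.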
Let’s add work to the Circuit beyond the recursive loopback. In the code below, another Conduit is created on the same circuit. This Conduit creates a pipeline that emits the running total for a sensor.
```java
var observers = ingress.conduit (
  Long.class,
  channel -> channel.pipe ( path -> path.reduce ( 0L, Long::sum ) )
);

sensors
  .source ()
  .subscribe ( observers );
```
With observers included, processing one billion sensor emits takes 20.75 seconds, or 48 million emits/sec. However, counting the emits the sensors make onto the observers as well, the rate doubles to 96 million emits/sec. The remaining delta is due to the Long::sum reduce step. Moving the observers to another circuit requires a small change.
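The reduce step itself is simple to picture in isolation. Below is a plain-Java sketch of what `path.reduce ( 0L, Long::sum )` does conceptually; `ReduceSketch` and `runningTotals` are hypothetical names, not Substrates API members, modeling how each incoming value folds into an accumulator whose running total is emitted downstream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Conceptual model of the observers pipeline's reduce step: each incoming
// value folds into an accumulator, and the running total is emitted.
public class ReduceSketch {
  public static List < Long > runningTotals (
    List < Long > emits, long seed, BinaryOperator < Long > op ) {
    var totals = new ArrayList < Long > ();
    var acc = seed;
    for ( var value : emits ) {
      acc = op.apply ( acc, value );
      totals.add ( acc ); // one downstream emit per input
    }
    return totals;
  }
}
```

The per-value accumulator update is why the reduce step accounts for the remaining throughput delta: it is extra work performed on every single emit.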
```java
var observers = egress.conduit ( // switching circuit
  Long.class,
  channel -> channel.pipe ( path -> path.reduce ( 0L, Long::sum ) )
);
```
Because of the minimalistic nature of the workload, this does not improve throughput; in fact, the benchmark now takes 50 seconds, or 40 million emits/second. The reason is the overhead of moving work from one circuit to another: remember, circuits are optimized for processing internally generated work. Crossing circuit boundaries, which involves separate threads and queues, has a cost that should be offset by substantial work on the other side. There are solutions to this, including chunking, batching, and some internal development work in our research workstream, which we will reveal and explore in a future post.
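The chunking idea can be illustrated with a small analog. This is a sketch, not the Substrates implementation: `BatchingSketch` and `sumAcross` are hypothetical names, and a plain executor stands in for the target circuit. The point is that one queue crossing per batch, rather than per value, amortizes the boundary cost.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Sketch of amortizing the cross-circuit handoff: rather than one queue
// crossing per value, values are chunked so each crossing carries a batch.
public class BatchingSketch {
  public static long sumAcross ( List < Long > values, int chunk, ExecutorService target ) {
    long total = 0L;
    for ( int i = 0; i < values.size (); i += chunk ) {
      var batch = values.subList ( i, Math.min ( i + chunk, values.size () ) );
      // one handoff per batch: the target circuit sums the whole chunk
      total += CompletableFuture
        .supplyAsync ( () -> batch.stream ().mapToLong ( Long::longValue ).sum (), target )
        .join ();
    }
    return total;
  }
}
```

With a chunk size of 10, a hundred values cost ten crossings instead of a hundred; the per-crossing overhead is paid once per batch, which is why the boundary cost must be offset by substantial work per handoff.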
Wrapping Up
The Substrates API minimizes instrumentation impact on monitored applications by isolating Circuits that manage resources independently. The Circuit’s queue prevents backpressure on instrumented code. This separation is crucial for production systems where monitoring and instrumentation don’t interfere with business operations. Running Circuits as independent execution engines decouples telemetry tasks from application processes, preventing resource contention. This ensures lightweight instrumentation with minimal overhead, even in resource-intensive environments. Developers can confidently integrate observability without sacrificing application reliability or responsiveness.
Benchmark Setup
```
Model: Mac14,10
Chip: Apple M2 Pro
Cores: 12 (8 performance and 4 efficiency)
Memory: 32 GB
Java(TM) SE Runtime Environment (build 23.0.1+11-39)
Java HotSpot(TM) 64-Bit Server VM (build 23.0.1+11-39, mixed mode, sharing)
```