Benchmarking Substrates

Measuring the Cost of Signal Circulation

Humainary’s Substrates is built for systems where one signal can lead to many. A reading can become a condition, a condition can become a judgment, a judgment can become a projection, and a projection can become a situation.

The performance question for such systems has two parts: the cost of signals entering the circuit, and the cost of signals circulating once they are inside.

This article examines the throughput characteristics of Humainary’s Service Provider Implementation of the Substrates Java API. Internally, JMH is used for micro-benchmarking. For this article, the examples are presented in a more direct form, using simple setup code and timing, so the benchmark scenarios are easier to follow.

The objective is to establish the operating envelope for local signal cascades.

Why Transit Matters

Substrates targets workloads where most computation happens after a signal has entered the system.

A digital twin for site reliability engineering may receive a single environmental reading, then produce a cascade of derived signals: state correlations, anomaly classifications, predictive propagations, and updates across a model of the production system. A data fusion system may converge multiple sensor streams into composite observations. Each composition may then produce its own downstream emissions. A granular computing system may decompose a coarse signal into nested local computations. An analytics pipeline may transform each ingress event into a tree of feature extractions, aggregations, and judgments. In each case, a modest external ingress rate generates internal traffic many times larger than the original input stream.

This internal traffic is the critical workload. Substrates separates two kinds of traffic:

Ingress traffic is traffic entering a circuit from outside its processing thread.
Transit traffic is traffic produced from within the circuit after processing has already begun.

Transit traffic is where higher-level meaning forms. Judgments, fusions, propagations, classifications, and derived signals all depend on the ability to circulate signals cheaply inside the circuit.

These benchmarks establish a boundary condition. They show how much raw signal circulation the substrate can sustain before higher-level constructs become the limiting cost.

Benchmark Units

Before looking at the numbers, it is useful to clarify what is being measured.

An emission is a value emitted through a pipe.
An ingress emission originates outside the circuit’s processing thread and enters the circuit’s execution path.
A transit emission originates from work already running on the circuit’s processing thread and remains inside the circuit’s execution path.
A subscriber observes subjects and registers receptors.
A receptor receives emissions.
A stem dispatch propagates an emission upward through a hierarchical name structure, allowing observers to attach at different levels of granularity.

The following benchmarks isolate these costs one by one.

Ingress Traffic

The first benchmark pushes one billion emissions through a Pipe, with no subscribers registered. The objective is to gauge the cost of ingress traffic: calls that originate outside the circuit’s processing thread and enqueue work for the circuit to process.

var LIMIT  = 1_000_000_000;

var circuit  = CORTEX.circuit ();
var conduit  = circuit.conduit ( Long.class );
var pipe     = conduit.get ( CORTEX.name ( "pipe" ) );

// use the same emission for all calls to reduce GC pressure
var emission = Long.valueOf ( 1L );

var start = nanoTime ();

for ( int i = 0; i < LIMIT; i++ ) {
  pipe.emit ( emission );
}

// wait for *all emissions* to be completed
circuit.await ();

var time = nanoTime () - start;

A benchmark run produced the following output:

duration:  3792 ms
call cost: 3.79 ns
thru-put:  263.67 M/sec

This is the base ingress cost: emitting into the circuit from the outside, with no further work attached. More than a quarter billion emissions per second establishes the raw throughput available before any higher-level work is added.
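The reported figures follow from the measured duration and the number of emissions. The helper below is a minimal sketch of that arithmetic in plain Java. The class and method names are hypothetical and are not part of the Substrates API.

```java
import java.util.Locale;

/** Illustrative helper (hypothetical, not part of Substrates): derives figures from a timing. */
final class BenchmarkFigures {

  /** Average cost of one emission in nanoseconds. */
  static double callCostNanos(long durationNanos, long emissions) {
    return (double) durationNanos / emissions;
  }

  /** Throughput in millions of emissions per second. */
  static double throughputMillionsPerSecond(long durationNanos, long emissions) {
    double seconds = durationNanos / 1_000_000_000.0;
    return emissions / seconds / 1_000_000.0;
  }

  public static void main(String[] args) {
    long emissions = 1_000_000_000L;
    long durationNanos = 3_792L * 1_000_000L; // the reported 3792 ms, in nanoseconds
    System.out.printf(Locale.ROOT, "call cost: %.2f ns%n",
        callCostNanos(durationNanos, emissions));
    System.out.printf(Locale.ROOT, "thru-put:  %.2f M/sec%n",
        throughputMillionsPerSecond(durationNanos, emissions));
  }
}
```

For 3792 ms and one billion emissions this gives about 3.79 ns per call and about 263.7 M/sec. The small difference from the reported 263.67 M/sec is presumably because the original figure was computed from the unrounded nanosecond duration.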

Ingress with a Subscriber

The next benchmark adds a subscriber that registers a no-op receptor for each subject it is informed about.

conduit.subscribe (
  circuit.subscriber (
    CORTEX.name ( "no-op" ),
    ( subject, registrar ) -> {
      registrar.register ( 
        // register a no-op receptor
        emission -> {} 
      );
    }
  )
);

The benchmark output shows that the presence of a no-op receptor makes little difference.

duration:  3710 ms
call cost: 3.71 ns
thru-put:  269.53 M/sec

The result is essentially unchanged because the pipe enqueues every emission through the circuit’s ordering mechanism, whether or not a receptor is registered. The API specification requires this: subscription registration is itself a work unit on the same ordered queue, and the natural ordering between emissions and registrations must be preserved. A subscription registered before an emission observes that emission. A subscription registered after it observes only subsequent emissions. Every emission therefore carries the full cost of ordered enqueue, and more than a quarter billion ordered enqueues per second is the raw throughput available before any higher-level work is added.

There is no short-circuit before the circuit!
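The ordering requirement can be illustrated with a toy model in which emissions and subscription registrations share one FIFO work queue. This is a sketch in plain Java under that assumption only. It is not the Substrates implementation, and every name in it is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.LongConsumer;

/** Toy model (hypothetical): emissions and subscriptions share one ordered work queue. */
final class OrderedQueueModel {

  private final Queue<Runnable> work = new ArrayDeque<>();
  private final List<LongConsumer> receptors = new ArrayList<>();

  /** Enqueues a subscription; it takes effect only when its turn in the queue comes. */
  void subscribe(LongConsumer receptor) {
    work.add(() -> receptors.add(receptor));
  }

  /** Enqueues an emission unconditionally, even if no receptor is active yet. */
  void emit(long value) {
    work.add(() -> receptors.forEach(r -> r.accept(value)));
  }

  /** Drains the queue in FIFO order, as a single processing thread would. */
  void await() {
    Runnable unit;
    while ((unit = work.poll()) != null) {
      unit.run();
    }
  }

  static List<Long> demo() {
    var model = new OrderedQueueModel();
    var seen = new ArrayList<Long>();
    model.emit(0L);                   // queued before the subscription: not observed
    model.subscribe(v -> seen.add(v));
    model.emit(1L);                   // receptor list is still empty at call time,
    model.emit(2L);                   // yet both must be observed once the queue drains
    model.await();
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [1, 2]
  }
}
```

In this model, an emit that skipped the queue whenever the receptor list looked empty would drop emissions 1 and 2, because the subscription ahead of them has not yet been processed when they are submitted.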

Transit Traffic

Substrates is specifically designed for the generation of new emissions from existing emissions. This is transit traffic.

A signal enters from the outside. Once inside the circuit, it may produce further signals. Those signals may produce still more signals. This is the basic shape of local cascade computation. The pattern resembles a signaling pathway. One environmental stimulus enters. A chain of internal effects follows. Meaning forms through propagation.

In the next benchmark, a subscriber consumes emissions by dispatching each emission back to the same pipe through a pool. To bound the loop, a Fiber is installed that limits the number of emissions. A Fiber creates a processing pipeline for each named pipe (each subject). Since this benchmark uses a single pipe repeatedly, the limit on the fiber applies to the total number of emissions.

var circuit = CORTEX.circuit ();
var conduit = circuit.conduit ( Long.class );

var fiber   = CORTEX.fiber ( Long.class ).limit ( LIMIT );
var pool    = conduit.pool ( fiber );

// pipe the output of a pipe back into itself by way of a pool
var subscriber   = circuit.subscriber ( CORTEX.name ( "loop" ), pool );
var subscription = conduit.subscribe ( subscriber );

var pipe     = pool.get ( CORTEX.name ( "pipe" ) );
var emission = Long.valueOf ( 1L );

var start = nanoTime ();

// trigger the first and only ingress emission
pipe.emit ( emission );

// wait for all emissions to be completed
circuit.await ();

var time = nanoTime () - start;

The benchmark output:

duration:  2569 ms
call cost: 2.57 ns
thru-put:  389.24 M/sec

The first emission is ingress traffic. It enters the circuit from the outside. The remaining emissions are transit traffic. They originate from a receptor already running on the circuit’s processing thread. Dispatch through the pool stays in-thread. The signal remains on a single execution path, and the per-emission cost reduces to the dispatcher’s own work.

This is the operating mode Substrates is built for: local signal circulation.
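The distinction between the two kinds of traffic can be sketched with a toy model: one queue that is safe for cross-thread handoff, and one plain queue touched only by the processing loop. The code below is an illustrative assumption in plain Java, not the Substrates implementation. All names are hypothetical, and draining transit work before ingress work is a modelling choice made here.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Toy model (hypothetical): ingress versus transit work on one processing loop. */
final class TransitLoopModel {

  // Ingress crosses a thread boundary, so it needs a concurrent queue.
  private final ConcurrentLinkedQueue<Long> ingress = new ConcurrentLinkedQueue<>();
  // Transit is produced and consumed by the processing loop only.
  private final ArrayDeque<Long> transit = new ArrayDeque<>();

  private final long limit;
  private long processed;

  TransitLoopModel(long limit) {
    this.limit = limit;
  }

  /** Called from outside the processing loop. */
  void emit(long value) {
    ingress.add(value);
  }

  /** Receptor: feeds each emission back in as transit until the limit is reached. */
  private void receive(long value) {
    processed++;
    if (processed < limit) {
      transit.add(value);
    }
  }

  /** Drains transit work first and falls back to ingress when transit is empty. */
  long run() {
    while (true) {
      Long next = transit.poll();
      if (next == null) next = ingress.poll();
      if (next == null) return processed;
      receive(next);
    }
  }

  public static void main(String[] args) {
    var model = new TransitLoopModel(1_000_000L);
    model.emit(1L);                  // the first and only ingress emission
    System.out.println(model.run()); // total emissions processed
  }
}
```

A single ingress emission drives the loop until the limit is reached, and every emission after the first is handled without touching the concurrent queue.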

Stems

The next benchmark concerns hierarchical routing.

When creating a Conduit, a routing option selects dispatch behavior. The default routing mode is PIPE. In this mode, an emission is dispatched to subscribers registered against the target named pipe’s subject. The alternative mode is STEM. With STEM, an emission propagates upward through the name hierarchy, leaf first.

An emission to:

a.b.c

is dispatched to observers of:

a.b.c
a.b
a

Ancestors are created lazily. They serve as attachment points for hierarchical observation, without requiring pipes of their own.
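The leaf-first order can be made concrete with a small sketch that computes the chain of names for a dotted path. It uses plain strings rather than the Substrates Name type, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical): the leaf-first chain of names for a dotted path. */
final class StemPath {

  /** Returns the name itself followed by each enclosing prefix, leaf first. */
  static List<String> of(String name) {
    List<String> path = new ArrayList<>();
    String current = name;
    while (true) {
      path.add(current);
      int dot = current.lastIndexOf('.');
      if (dot < 0) return path;          // reached the root part
      current = current.substring(0, dot);
    }
  }

  public static void main(String[] args) {
    System.out.println(of("a.b.c")); // prints [a.b.c, a.b, a]
  }
}
```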

Many operational systems are naturally hierarchical. A metric emitted at:

service.api.requests

may need to be observed at:

service.api.requests
service.api
service

Each level represents a different granularity of concern. The leaf belongs to the emitter. The parent belongs to a subsystem observer. The root belongs to an operational dashboard, aggregate, or situational model.

Hierarchical routing allows emitters to publish at the level they own, while observers subscribe at the level that matches their concern. The emitter publishes; the hierarchy delivers. That is the architectural value of stems.
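The separation between emitters and observers can be sketched with a toy registry keyed by name. The emitter publishes at the leaf, and the model walks upward to deliver to whoever is attached at each level. This is an illustration in plain Java with hypothetical names, not the Substrates API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongConsumer;

/** Toy model (hypothetical): observers attach at any level of a dotted name hierarchy. */
final class HierarchyModel {

  private final Map<String, List<LongConsumer>> observers = new HashMap<>();

  /** Attaches an observer at the given level of the hierarchy. */
  void observe(String name, LongConsumer observer) {
    observers.computeIfAbsent(name, k -> new ArrayList<>()).add(observer);
  }

  /** Delivers a value to observers of the name and of every ancestor, leaf first. */
  void emit(String name, long value) {
    String current = name;
    while (current != null) {
      for (LongConsumer o : observers.getOrDefault(current, List.of())) {
        o.accept(value);
      }
      int dot = current.lastIndexOf('.');
      current = dot < 0 ? null : current.substring(0, dot);
    }
  }

  static List<String> demo() {
    var model = new HierarchyModel();
    var log = new ArrayList<String>();
    model.observe("service.api.requests", v -> log.add("leaf:" + v));
    model.observe("service.api", v -> log.add("subsystem:" + v));
    model.observe("service", v -> log.add("dashboard:" + v));
    model.emit("service.api.requests", 42L); // reaches all three levels
    model.emit("service.db.queries", 7L);    // reaches only the root observer
    return log;
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

The emitter of service.db.queries knows nothing about the dashboard observer at service, yet the value still arrives there.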

Before STEM Routing

Hierarchical propagation can be manually implemented. When a subscriber receives a new subject notification, it traverses the subject’s name enclosure and registers each ancestor pipe as a receptor on that subject’s pipe. An emission from the leaf node is then forwarded to every ancestor pipe in sequence, and each ancestor pipe, being itself a subject, triggers the same registration traversal. The cascade unfolds through ordinary dispatch.

var circuit = CORTEX.circuit ();
var conduit = circuit.conduit ();

conduit.subscribe (
  circuit.subscriber (
    CORTEX.name ( "pre-stem" ),
    ( subject, registrar ) ->
      subject.name ().enclosure (
        prefix -> registrar.register ( 
          conduit.get ( prefix ) 
        )
      )
  )
);

The benchmark used the name:

a.b.c.d.e

This means that each leaf emission cascaded through five name parts.

The benchmark output:

duration:  24684 ms
call cost: 4.94 ns
thru-put:  202.55 M/sec

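The cost reflects dispatch across the nested pipe structure. The duration grows with the depth of the hierarchy because each ingress emission produces work at every ancestor level. The call cost is reported per dispatch step: the figures are consistent with one billion leaf emissions across five levels, or five billion dispatches in total.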

The result is sound for many workloads. It also reveals a structural property: with manual propagation, hierarchy is expressed as user-level cascade logic running on top of dispatch. Each level pays the full per-dispatch cost. That observation motivates a more direct routing strategy.

STEM Routing

With the STEM option provided at conduit construction, hierarchical dispatch becomes part of the conduit’s routing behavior.

var circuit = CORTEX.circuit ();
var conduit = circuit.conduit ( Object.class, STEM );

The benchmark output with STEM routing and no registered subscribers:

duration:  6595 ms
call cost: 1.32 ns
thru-put:  758.04 M/sec

Five billion stem dispatch steps are completed in approximately 6.6 seconds. With STEM, hierarchy is part of the routing structure of the conduit. Named subjects, local circulation, hierarchical observation, and low-cost propagation through levels of concern align with the intent of Substrates.
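The two hierarchical results can be put side by side with a short calculation. The sketch below assumes one billion leaf emissions at depth five for both runs, which is consistent with the reported figures but is not stated explicitly for the manual run. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Illustrative arithmetic (hypothetical helper): per-dispatch cost at a given depth. */
final class StemComparison {

  /** Total dispatch steps when each leaf emission visits every level once. */
  static long dispatches(long leafEmissions, int depth) {
    return leafEmissions * depth;
  }

  /** Average cost of one dispatch step in nanoseconds. */
  static double costPerDispatchNanos(long durationMillis, long dispatches) {
    return durationMillis * 1_000_000.0 / dispatches;
  }

  public static void main(String[] args) {
    long steps = dispatches(1_000_000_000L, 5); // assumed: 1e9 emissions to a.b.c.d.e
    double manual = costPerDispatchNanos(24_684L, steps); // manual propagation run
    double stem = costPerDispatchNanos(6_595L, steps);    // STEM routing run
    System.out.printf(Locale.ROOT, "manual: %.2f ns per dispatch%n", manual);
    System.out.printf(Locale.ROOT, "stem:   %.2f ns per dispatch%n", stem);
    System.out.printf(Locale.ROOT, "ratio:  %.1fx%n", manual / stem);
  }
}
```

Under these assumptions the per-dispatch cost falls from about 4.94 ns to about 1.32 ns, a factor of roughly 3.7. The comparison is not strictly like for like, since the STEM run had no registered subscribers while the manual run relied on a forwarding subscriber.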

What the Benchmarks Show

The benchmark results stand for the substrate, not for any particular application. Real systems perform real work. They classify. They aggregate. They allocate. They compare. They query. They persist. They communicate.

Those operations dominate the cost profile.

The purpose of these benchmarks is to establish that the substrate itself can support very high rates of signal circulation before domain logic is added. That matters for systems in which signals do not remain isolated. A sensor reading may update state. A state update may change status. A status change may alter situation. A situation change may trigger projection. A projection may produce recommendations, mitigations, or alerts. The cost of moving through that chain has to be low enough that architects can design with cascades as a normal form of computation.

The benchmark results show that this is feasible. Ingress traffic is fast. Transit traffic is faster. Hierarchical routing can be made extremely cheap. Together, these results define the performance envelope for signal-based systems built on Substrates.

Why This Matters

Most observability and event-processing systems treat emitted data as material to be collected, transported, stored, queried, and interpreted elsewhere. Substrates is organized around a different assumption.

Signals should be able to circulate locally.
Meaning should be able to form near the source.
Observers should be able to attach at the level of concern they govern.
Derived signals should not require leaving the execution context unless the architecture requires it.

This shifts the design center.

Instead of treating signals as records waiting for later interpretation, Substrates treats them as active material inside a governed execution medium. That is why transit performance matters. If transit is expensive, internal meaning is expensive. If internal meaning is expensive, systems fall back to external interpretation. If systems fall back to external interpretation, they accumulate logs, metrics, traces, queries, dashboards, and human cognitive load.

Cheap transit opens another path.

Signals can become signs.
Signs can become status.
Status can become situation.
Situation can become action.

Not after the fact. Inside the system.

Hardware and Runtime Information

The benchmark run used the following hardware and runtime environment:

Chip: Apple M4  
CPU Cores: 10 total (4 performance, 6 efficiency)  
Memory: 24 GB

java version "26" 2026-03-17
Java(TM) SE Runtime Environment (build 26+35-2893)
Java HotSpot(TM) 64-Bit Server VM (build 26+35-2893, mixed mode, sharing)

Closing

These benchmarks establish that local signal circulation is cheap enough to serve as an architectural primitive. A signal enters. The circuit carries it. Observers attach. Derived emissions follow. Hierarchy forms. Meaning circulates. The performance question for systems of this kind is how much internal consequence the system can sustain once the first signal arrives. The results above answer that question by establishing the operating envelope. Signal cascades become first-class computational material.