From Pipelines to Pathways
Traditionally, observability data pipelining operates like a single assembly line, where workers halt, examine items, and consult manuals before proceeding. This approach is functional but slow due to inspections.
Instead, consider micro-assembly lines for each item type, pre-programmed with specific actions. These lines share a common conveyor belt and workforce for efficiency, combining dedicated processing with shared resources. This strategy enhances speed and reduces overhead, transforming the pipeline model into a flexible and adaptable pathway system, akin to a dynamic flow graph or neural-synapse circuitry.
Alternatively, envision a post office with a single sorting line for all packages, regardless of destination. Workers at each station look up package destinations in a reference book. Even though there is only one line, every package incurs the same constant checking and routing. This is how observability toolkits, products, and platforms operate today.
Now, imagine dedicated paths for each package destination, like pre-established routes. Once a package is on a path, no further intervention is needed; the path itself guides it. Despite the multiple paths, all packages are handled by the same shared processing resources – decoupling routing from resourcing.
Instead of thinking of observability as one long pipeline, think of it as a graph with many possible routes. Once we know where something needs to go, we can create a specific route for it, like how a map app creates a particular path for your destination instead of having to check a map at every intersection.
Routing through Registrations
In the last post in this series, we demonstrated how to subscribe to subject-specific emission feeds using the Substrates API’s Source, Subscriber, and Registrar interfaces. In this post, we explore why subscribers are separated from the registration of pipes – a design unlike the traditional approach in many data pipelines, whether they transmit or transform observability data.
In stream processing or event-driven architectures, a subscriber or processor functions like a stage in a pipeline. Data flows sequentially from one subscriber to another, with each subscriber encapsulating transformation, transmission, and state management logic. A typical subscriber extracts a key from the data, looks up the current value in a key-value store, updates it with the new value, and then forwards the data to the next stage. For instance, consider a counter that generates increments. In the pipeline, one stage maintains cumulative totals for each counter, keyed by name. This stage emits the cumulative total per key, which another stage then uses to calculate averages over a time window for each key.
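To make the traditional model concrete, here is a minimal plain-Java sketch of such a stage. The `Event` record, the `TotalsStage` class, and the wiring are hypothetical illustrations of the pattern, not part of any toolkit – note how every single event pays for a key lookup in shared state:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class TotalsPipeline {

  // A hypothetical pipeline event: a counter name plus a numeric value.
  record Event ( String key, long value ) {}

  // A traditional pipeline stage: every event pays for a key lookup in
  // shared state before the cumulative total is forwarded downstream.
  static final class TotalsStage implements Consumer< Event > {
    private final Map< String, Long > totals = new HashMap<> ();
    private final Consumer< Event > next;

    TotalsStage ( Consumer< Event > next ) {
      this.next = next;
    }

    @Override
    public void accept ( Event event ) {
      // the per-event lookup-and-update that repeats for every message
      var total = totals.merge ( event.key (), event.value (), Long::sum );
      next.accept ( new Event ( event.key (), total ) );
    }
  }

  public static void main ( String[] args ) {
    List< Event > seen = new ArrayList<> ();
    var stage = new TotalsStage ( seen::add );

    stage.accept ( new Event ( "requests", 1 ) );
    stage.accept ( new Event ( "requests", 2 ) );

    // downstream sees the running totals: 1, then 3
    seen.forEach ( e -> System.out.println ( e.key () + " -> " + e.value () ) );
  }
}
```

A windowed-average stage would be wired in the same way, consuming the totals this stage emits – and paying the same per-event key lookup again.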
Logging a value to a hierarchical log category and delivering it to each enclosing log category is more challenging to implement as a traditional pipeline: a subscriber must traverse the hierarchy, from child node to parent node, every time a message is delivered.
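A minimal plain-Java sketch of that per-message traversal (the `deliver` method and the dotted-name convention are hypothetical illustrations, not the Substrates API):

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryWalk {

  // Deliver a message to a dotted log category and every enclosing category.
  // In a traditional pipeline this parent walk repeats for every message.
  static List< String > deliver ( String category, String message ) {
    var deliveries = new ArrayList< String > ();
    var current = category;
    while ( true ) {
      deliveries.add ( current + " -> " + message );
      var dot = current.lastIndexOf ( '.' );
      if ( dot < 0 ) break;
      current = current.substring ( 0, dot ); // step from child to parent
    }
    return deliveries;
  }

  public static void main ( String[] args ) {
    // walks corp.dept.inbox, then corp.dept, then corp
    deliver ( "corp.dept.inbox", "hello" )
      .forEach ( System.out::println );
  }
}
```

The string parsing and hierarchy walk here run on every delivery – exactly the cost the pathway model eliminates.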
The Substrates API offers a different approach to pipeline design. Unlike traditional pipelines, a Subscriber functions more as a factory of pathways (assembly lines) than as a data processor. When a Subscriber is notified of a new Subject created within a Conduit, it constructs a subject-specific pipeline by registering one or more Pipes with the Registrar for that Subject. The lookup that other pipeline solutions repeat per message is done only once. Let’s take the logger example and construct it with raw data pipes.
```java
var cortex = cortex ();
var circuit = cortex.circuit ();

var conduit =
  circuit.conduit (
    String.class,
    Inlet::pipe
  );

var source = conduit.source ();

var subscription =
  source.subscribe (
    ( subject, registrar ) -> {
      var name = subject.name ();
      // print an announcement of the subject
      out.println ( name + " !!" );
      // link the parent pipe with the child pipe
      name
        .enclosure ()
        .ifPresent (
          parent ->
            registrar.register (
              conduit.compose ( parent )
            )
        );
    }
  );
```
In the above code, we create a Conduit that serves up named Pipes with a String-typed emittance. The code then calls the subscribe method on the Source underlying the Conduit, providing a Subscriber that inspects the name of the Subject and determines whether it has an enclosing prefix. If there is a prefix, the Subscriber creates a direct connection between the Subject’s pipe and the pipe of the Subject’s parent.
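The contrast with the traditional approach can be sketched in plain Java: resolve the delivery path once, when a subject is first seen, and reuse it for every emission afterwards. The `Router` class and its methods below are hypothetical illustrations of this idea, not the Substrates API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class PathwayCache {

  // A hypothetical router that resolves a subject's delivery path once,
  // at first sight of the subject, and reuses it for every emission.
  static final class Router {
    private final Map< String, List< Consumer< String > > > pathways = new HashMap<> ();
    private final Consumer< String > sink;

    Router ( Consumer< String > sink ) {
      this.sink = sink;
    }

    // Registration-time work: walk the name hierarchy a single time.
    private List< Consumer< String > > pathway ( String subject ) {
      return pathways.computeIfAbsent ( subject, name -> {
        var pipes = new ArrayList< Consumer< String > > ();
        var current = name;
        while ( true ) {
          var target = current;
          pipes.add ( value -> sink.accept ( target + " -> " + value ) );
          var dot = current.lastIndexOf ( '.' );
          if ( dot < 0 ) break;
          current = current.substring ( 0, dot );
        }
        return pipes;
      } );
    }

    // Emission-time work: no name parsing, just direct delivery.
    void emit ( String subject, String value ) {
      pathway ( subject ).forEach ( pipe -> pipe.accept ( value ) );
    }
  }

  public static void main ( String[] args ) {
    var router = new Router ( System.out::println );
    router.emit ( "corp.dept.inbox", "hello" );
    router.emit ( "corp.dept.inbox", "goodbye" );
  }
}
```

The second emission never touches the name hierarchy; it rides the pathway built for the first.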
Let’s observe data flowing along pathways after emitting two values into the “corp.dept.inbox” pipe.
```java
// sniff on the flowing of data
source.consume (
  capture ->
    out.printf (
      "%s -> %s%n",
      capture.subject ().name (),
      capture.emission ()
    )
);

var pipe =
  conduit.compose (
    cortex.name ( "corp.dept.inbox" )
  );

pipe.emit ( "hello" );
pipe.emit ( "goodbye" );
```
Below is the console output, where the registration of each hierarchically named subject is printed only once, followed by the chaining of emitted values from one pipe to another – extremely efficient pathways!
```
corp.dept.inbox !!
corp.dept.inbox -> hello
corp.dept !!
corp.dept -> hello
corp !!
corp -> hello
corp.dept.inbox -> goodbye
corp.dept -> goodbye
corp -> goodbye
```