AIOps – Visibility and Cognition

Caller Callee

This is the fourth post in a series on forming a working definition and possible future realization of AIOps. This post focuses on the degree of visibility observers have into the events occurring within a service interaction between two components separated by a communications network: a caller (service consumer) and a callee (service provider). The network also separates the observer.

To keep the post simple and short, the remote procedure call style is assumed to be synchronous. The caller sends a request over a connection, and the callee replies to the received request with a response over the same connection. Below is a topology depiction.

Visibility Topology

There are at least three points within the topology where instruments can be deployed. The most obvious and valuable is within the software component itself, at the exact point of execution (experience) in the application workflow. Another option is for the instrument to use an underlying interception mechanism in the networking layer to capture packets. An alternative in the same networking domain is to employ a proxy. Each has drawbacks in what can be accurately verified.

All points of experience within a topology offer some visibility, but the language (codes, syntax) and model (concepts) employed can differ significantly. This is problematic when the goal is to determine the intent and outcome of an interaction’s operation(s).

The networking interception option deals with packets: it attempts to reconstruct a request or response from them and then pair each request with its response. The network proxy (sidecar or service mesh) option observes service interaction at the request and response level but cannot easily discern the application-level workflow. It is like seeing a SQL statement executed without the enclosing transaction control or the sequence of related statements executed before and after it.

While a network proxy can easily observe responses, it cannot determine how the caller will interpret them when assessing the outcome. Nor can it associate follow-up operations, such as a recourse behavior like a cache fallback, with the original request. The proxy cannot even be sure that the payloads passing through it arrive at and are acted on by the destination. It sorely lacks execution context.
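To make the gap concrete, here is a minimal Python sketch of the same response seen from two vantage points. Everything in it is hypothetical and chosen only for illustration: the `Response` type, the `proxy_view` and `caller_view` functions, the outcome labels, and the rule that a status of 500 or above counts as an error.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int  # assumed HTTP-like status code
    body: str


def proxy_view(response: Response) -> str:
    """A proxy can only classify the response that passes through it."""
    return "ok" if response.status < 500 else "error"


def caller_view(response: Response, cache: dict, key: str) -> str:
    """The caller knows what it did next, including any recourse."""
    if response.status < 500:
        return "succeeded"
    if key in cache:
        # Recourse behavior: the operation is served from a local cache.
        return "succeeded-via-fallback"
    return "failed"


resp = Response(status=503, body="")
print(proxy_view(resp))
print(caller_view(resp, {"user:42": "cached profile"}, "user:42"))
```

Under these assumptions the proxy reports an error while the caller, which fell back to its cache, reports a success. Neither is wrong; they simply describe different things.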

Instruments within the software component offer optimal visibility and association with the application and workflow context. Still, they are not without problems when there is a network between the caller and the callee. For example, the caller can receive an error from the networking layer for a request that the callee successfully received and processed but could not reply to. Such edge cases can leave the caller or callee unsure of the outcome of an operation and, just as in transaction management, require heuristics or a subsequent confirmation (acknowledgment) request.
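The lost-reply case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed protocol: the `Callee` class, its `drop_reply` switch, the `Outcome` values, and the `confirm` step are all hypothetical names, and the network failure is simulated with a plain `ConnectionError`.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"  # the caller cannot tell what happened


class Callee:
    """Hypothetical in-process stand-in for a remote service."""

    def __init__(self):
        self.processed = set()
        self.drop_reply = False  # simulate a reply lost on the network

    def handle(self, request_id: str) -> str:
        self.processed.add(request_id)  # the work is done first
        if self.drop_reply:
            raise ConnectionError("reply lost")
        return "ok"

    def was_processed(self, request_id: str) -> bool:
        return request_id in self.processed


def call(callee: Callee, request_id: str) -> Outcome:
    try:
        callee.handle(request_id)
        return Outcome.SUCCEEDED
    except ConnectionError:
        # A network error alone does not prove the operation failed.
        return Outcome.UNKNOWN


def confirm(callee: Callee, request_id: str) -> Outcome:
    """Follow-up acknowledgment request that resolves an unknown outcome."""
    return Outcome.SUCCEEDED if callee.was_processed(request_id) else Outcome.FAILED


callee = Callee()
callee.drop_reply = True
first = call(callee, "r1")
callee.drop_reply = False
resolved = confirm(callee, "r1") if first is Outcome.UNKNOWN else first
print(first.value, resolved.value)
```

The design choice worth noting is the explicit `UNKNOWN` outcome: an instrument that can only record success or failure is forced to guess in exactly the cases that matter most.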

Cognition Distributed

In a perfect world, a super observer would reconcile each option’s discrepancies and visibility deficiencies. In reality, this is impractical, as each option observes, measures, and records the world differently and, to compound matters, reports at different time resolutions and intervals. There is also the problem of each having a distinct way of referencing subjects (services, resources).

A more realistic approach would be to define a communicative language of signs (tokens) and a model (concepts) that distributes some of the cognitive processing, namely state assessment and situation awareness, to the points of experience (execution), bringing consistency and conciseness to transmissions. Such a mini-language would describe the behavioral semantics of service interaction independent of technology and topology, rising far above meaningless quantitative data points. It would make it possible to extract patterns of processual structures, infer system status (stability) dynamics, catalog symptoms (mapping of sequences to states), record causality (histories), and aid prediction (trajectories) by continuously narrowing probable paths (scenarios). With qualitative sign(al)s (tokens), all of this becomes possible through token sequencing coupled with local contextual processing.
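As a rough illustration of token sequencing with local contextual processing, the Python sketch below has a point of experience emit qualitative signs and infer a status from a short window of recent ones. The sign vocabulary (`CALL`, `SUCCEED`, `FAIL`, `RETRY`, `RECOURSE`), the status names, the window size, and the scoring rule are all assumptions made for this example; they are not a defined standard or the vocabulary of any particular library.

```python
from collections import deque
from enum import Enum


class Sign(Enum):
    CALL = "call"
    SUCCEED = "succeed"
    FAIL = "fail"
    RETRY = "retry"
    RECOURSE = "recourse"


class Status(Enum):
    STABLE = "stable"
    DEGRADED = "degraded"
    DEFECTIVE = "defective"


class Observer:
    """Local observer that maps a short sequence of signs to a status."""

    def __init__(self, window: int = 4):
        self.recent = deque(maxlen=window)  # only the latest signs are kept

    def emit(self, sign: Sign) -> Status:
        self.recent.append(sign)
        return self.status()

    def status(self) -> Status:
        fails = sum(1 for s in self.recent if s is Sign.FAIL)
        strained = sum(1 for s in self.recent if s in (Sign.RETRY, Sign.RECOURSE))
        if fails * 2 > len(self.recent):
            # More than half of the recent signs are failures.
            return Status.DEFECTIVE
        if fails or strained:
            return Status.DEGRADED
        return Status.STABLE


observer = Observer(window=4)
sequence = [Sign.CALL, Sign.SUCCEED, Sign.CALL, Sign.FAIL,
            Sign.RETRY, Sign.FAIL, Sign.FAIL]
history = [observer.emit(sign) for sign in sequence]
print([status.value for status in history])
```

Only the small stream of signs and the derived status need to be transmitted, rather than raw measurements, which is where the consistency and conciseness described above would come from.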