• Systems, Silos, and Simplicity
    Organizational silos form in complex systems when collaboration becomes costly or uncertain, leading to inefficiencies and communication barriers. Effective integration requires balancing standardization with simplification, fostering collaboration across units, and managing the tension between short-term metrics and long-term transformative work.
  • Observability: A to Z
    This article presents an A-Z glossary of key concepts related to observability in complex systems and software engineering. It covers topics ranging from Attention and Boundaries to Topologies, emphasizing the importance of intelligent data analysis, contextual understanding, and adaptive learning in monitoring and managing modern distributed systems.
  • Task-Centricity: The Future of Human-AI Collaboration
    In an era where AI is rapidly transforming our digital landscape, how can we ensure that human-AI collaboration reaches its full potential? The answer lies in a paradigm shift towards task-centricity.
  • Observability: New Tooling Metaphors
    The observability community should move away from traditional metaphors like pillars and pipelines and adopt new ones like substrates and circuits. By doing this, we can gain a new and innovative outlook on tools and techniques, leaving behind outdated thinking that prioritizes data over decisions and content over control.
  • Observability: Rethinking Metaphors
    The prevailing metaphors of pillars and pipelines in observability have limited our understanding and hindered progress. These metaphors promote siloed thinking and a focus on data collection over actionable insights.
  • Observability Standards: Backward vs Forward
    Forward-looking standards, also known as anticipatory standards, are designed to shape and guide the future development of technologies.
  • From Abstraction to Simplicity
    Abstraction and simplification are two fundamental principles that often work together in the design of systems. With abstraction, we reduce system complexity by focusing on the essential aspects in the area of structure, elements, and behavior.
  • The OSSification of Observability
    Here we explore why the industry needs to move beyond the legacy tools and embrace a more dynamic and adaptable approach to gleaning genuine value from the ever-growing ocean of data collected.
  • Climbing the Conceptual System
    As engineering systems grow ever more complex, the engineering community’s focus on simplistic measurement and reporting hinders achieving operational scalability by way of sensemaking and steering of such systems of systems.
  • Software Performance Optimization Heuristics
    To acquire the knowledge of suitable software performance heuristics, developers must experience software execution in a new, more modern manner – a simulated environment of episodic machine memory replay.
  • The Past, Present, and Future will be Simulated
    The mirroring of software execution behavior, as performed by Simz (online) and Stenos (offline), has the potential to be one of the most significant advances in software systems engineering. Its impact could be as significant as that of distributed computing.
  • Introducing Signals – The Next Big Thing
    This post introduces the reasoning, thinking, and concepts behind a technology we call Signals, which we believe has the potential to have a profound impact on the design and development of software, the performance engineering of systems, and the management of distributed interconnected applications and services.
  • Transcending Code, Containers, and Cloud
    There is always tension between adaptability and structural stability in engineering and possibly life. We want our designs to be highly adaptable. With adaptation, our designs attempt to respond to change, sensed within the environment, intelligently with more change, though far more confined and possibly transient, at least initially. But there are limits to how far we can accelerate adaptation without putting incredible stress on the environment and the very system contained within.
  • Beyond Big Data – Mirrored Algorithmic Simulation
    Today, the stimulus used to develop machine intelligence is sensory data, which is transferred between devices and the cloud – the same data that concerns many consumers. But what if instead of sending data related to such things as a thermostat’s temperature set point, what was transmitted mostly concerned the action taken by the embedded software machine – an episodic memory of the algorithm itself?
  • Circuits, Conduits, and Counters
    Our brain houses billions of neurons (nerve cells) that communicate with each other through intricate networks of neural circuits. These circuits play a fundamental role in various cognitive functions, sensory processing, motor control, and generating thoughts and emotions. Why should it be different for Observability?
  • Observability – The Significant Parts
    Most current observability technologies don’t fair well as a source of behavioral signals or inferred states. They are not designed to reconstruct behavior that would allow the level of inspection we would need to translate from measurement to signal and, in turn, the state effectively. They are designed with data collection and reporting in mind of the event, not the signal or state.
  • Observability – Flat and Stateless
    We should not differentiate whether an agent is deployed, especially with companies electing to manually instrument some parts of an application’s codebase using open-source observability libraries. Instead, we should consider whether the observer, an agent or library, is stateless concerning what and how it observes, measures, composes, collects, and transmits observations.
  • Bounded Observability
    Reducing and compressing measurements is critical, which is much helped by representations extracted from the environment via hierarchical boundary determination. When this is not done automatically, what happens then is that the custom dashboard capabilities of the Observability solution need to be used to reconstruct some form of structure that mirrors the boundaries all but lost in the data fog. Naturally, this is extremely costly and inefficient for an organization.
  • The Data Fog of Observability
    The overemphasis on data instead of signals and states has created a great fog. This data fog leads to many organizations losing their way and overindulging in data exploration instead of exploiting acquired knowledge and understanding. This has come about with the community still somewhat unconcerned with a steering process such as monitoring or cybernetics.
  • Observability in Perspective
    There are many perspectives one could take in considering the observability and monitoring of software services and systems of services, but here below are a few perspectives, stacked in layers, that would be included.
  • Change in Observability
    Observability is effectively a process of tracking change. At the level of a measurement device, software or hardware-based, change is the difference in the value of two observations taken at distinct points in time. This change detection via differencing is sometimes called static or happened change. Observability is all about happenings.
  • A Story of Observability
    Once upon a time, there was a period in the world where humans watched over applications and services by proxy via dashboards housed on multiple screens hoisted in front of them – a typical mission control center. The interaction between humans and machines was relatively static and straightforward, like the environment and systems enclosed.
  • Bi-directional Observability Pipelines
    Substrates changed everything by introducing the concept of a Circuit consisting of multiple Conduits fed by Instruments that allowed Observers to subscribe to Events and, in processing such Events, generate further Events by way of calling into another Instrument. But with the introduction of Percept and Adjunct, it is now possible for Observers attached to Circuit and its locally registered Sources to process Events that have come from a far-off Circuit within another process.
  • The Evolution of Substrates
    With the latest update to the Substrates API, the metamorphosis to a general-purpose event-driven data flow library interface supporting the capture, collection, communication, conversion, and compression of perceptual data through a network of circuits and conduits has begun.
  • A Situational Control Tower
    It is time for new direction closer aligned to goals, focused more on the dynamics of systems that humans are already highly adapted to with their social intelligence, within which situation is a crucial conceptual element of the cognitive model. Understanding and appropriately responding to different social situations is fundamental to social cognition and effective interpersonal interactions.
  • Observability: Disruptions
    Disruptions are one factor affecting the maintenance of service quality levels. A disruption is an interruption in the flow of (work) items through a network that can, for a while, make it inoperable or where the network flow performance is subpar. Depending on the severity of the disruption, a network may need to replan and restructure itself for a period afterward. There are two main categories of disruptions: disturbance and deviation.
  • Observability: Projecting Ahead
    The low-level data captured in volume by observability instruments has closed our eyes to salient change. We’ve built a giant wall of white noise. The human mind’s perception and prediction capabilities evolved to detect significant changes to our survival. Observability has no steering mechanism to guide effective and efficient measurement, modeling, and memory processes. Companies are gorging on ever-growing mounds of observability data collected that should be of secondary concern.
  • Priming Observability for Situations
    The Recognition-Primed Decision (RPD) model asserts that individuals assess the situation, generate a plausible course of action (CoA), and then evaluate it using mental simulation. The authors claim that decision-making is primed by recognizing the situation and not entirely determined by recognition. The model contradicts the common thinking that individuals employ an analytical model in complex time-critical operational contexts.
  • Observability: The OODA Loop
    The OODA loop emphasizes two critical environmental factors – time constraints and information uncertainty. The time factor is addressed by executing through the loop as fast as possible. Information uncertainty is tackled by acting accurately. The model’s typical presentation is popular because it closes the loop between sensing (observe and orient) and acting (decide and act).
  • From Data to Dashboard
    Data and information are not surrogates for a model. Likewise, a model is not a dashboard built lazily and naively on top of a lake of data and information. A dashboard and many metrics, traces, and logs that come with it are not what constitutes a situation. A situation is formed and shaped by changing signals and states of structures and processes within an environment of nested contexts (observation points of assessment) – past, present, and predicted.
  • Verbal Protocol Analysis for Observability
    VPA is a technique used by researchers across many domains, including psychology, engineering, and architecture. The basic idea is that during a task, such as solving a problem, a subject will concurrently verbalize, think aloud, what is resident in their working memory – what they are thinking during the doing. Using Protocol Analysis, researchers can elicit the cognitive processes from start to completion of a task. After further processing, the information captured is analyzed to provide insights that can improve performance.
  • Streamlining Observability Pipelines
    The next generation of Observability technologies and tooling will most likely take two distinctly different trajectories from the ever-faltering middle ground that distributed tracing and event logging currently represent. The first trajectory, the high-value road, will introduce new techniques and models to address complex and coordinated system dynamics in a collective social context rebuilding a proper foundation geared to aiding both humans and artificial agents.
  • Hierarchies in Observability
    When designing observability and controllability interfaces for systems of services, or any system, it is essential to consider how it connects the operator to the operational domain regarding the information content, structure, and visual forms. What representation is most effective in the immediate grounding of an operator within a situation?
  • Observability Scaled: Attention & Awareness
    Because of limited processing resource capacities, brains focus more on some signals than others – signals compete for the brain’s attention. This internal competition is partially under the bottom-up influence of a sensory stimuli model and somewhat under the top-down control of other mental states, including goals – this is very similar to how situational awareness is theorized to operate optimally.
  • Situational Awareness in Systems of Services
    Unfortunately, many of the solutions promoted in the Observability space, such as distributed tracing, metrics, and logging, have not offered a suitable mental model in any form whatsoever. The level of situation awareness is still sorely lacking in most teams, who appear to be permanently stalled at ground zero and overtly preoccupied with data and details.
  • Observability is Yesteryear’s Monitoring
    Looking back over 20 years of building application performance monitoring and management tooling, little has changed, though today’s tooling does collect more data from far more data sources. But effectiveness and efficiency have not improved; it could be argued that both have regressed.
  • The Solution is not Distributed Tracing
    Science and technology have made it possible to observe the motion of atoms, but humans don’t actively watch such atomical movements in their navigation of physical spaces. Our perception, attention, and cognition have evolutionary scaled to an effective model for us in most situations. Distributed tracing spans, and the data items attached, are the atoms of observability.
  • Observability – The Two Hemispheres
    Two distinct hemispheres seem to form within the application monitoring and observability space – one dominated by measurement, data collection, and decomposition, the other by meaning, system dynamics, and (re)construction of the whole.
  • Scaling Observability for IT Ops
    The underlying observability model is the primary reason for distributed tracing, metrics, and event logging failing to deliver much-needed capabilities and benefits to systems engineering teams. There is no natural or inherent way to transform and scale such observability data collection analysis to generate signals and inferring states.
  • Humanizing Observability and Controllability
    Humanism is a philosophical stance at the heart of what Humainary aims to bring to service management operations. It runs counter to the misguided trend of wanton and wasteful extensive data collection so heavily touted by those focused on selling a service rather than solving a problem, now and in the future.
  • Simplicity and Significance in Observability
    As computing and complexity scaled up, the models and methods should have reduced and simplified the communication and control surface area between man and machine. Instead, monitoring (passive) and management (reactive) solutions have lazily reflected the complexity’s nature at a level devoid of simplicity and significance but instead polluted with noise.
  • Observability – A Multitude of Memories
    There are at least two distinct paths to the future of observability. One path that would continue increasing the volume of collected data in its attempt to reconstruct reality in high-definition on a single plane with little consideration for effectiveness or efficiencies. Another would focus on seeing the big picture in near-real-time from the perspective of human or artificial agents.
  • AIOps – A Postmodern Observability Model
    We propose a model which can better serve site engineering reliability and service operations by being foundational to developing situational awareness capabilities and system resilience capacities, particularly adaptability and experimentation, as in dynamic configuration and chaos engineering.
  • AIOps – The Double Cone Model
    The Double Cone Model is a valuable conceptualization in thinking about more efficient and effective methods to handle data overload and generate far more actionable insight from a model much closer to how the human mind reasons about physical and social spaces.
  • AIOps – Visibility and Cognition
    All points of experience within a topology offer some visibility, but the language (codes, syntax) and model (concepts) employed can differ greatly. This is problematic when the goal is to determine the intent and outcome of an interaction’s operation(s).
  • AIOps – Why Service Cognition?
    Today’s data, such as logs, traces, and metrics, are too far removed to be the basis for a language and model that illuminates the dynamic nature of service interaction and system stability inference and state prediction formed across distributed agents.
  • AIOps – The Observer
    Observability is purposefully seeing a system in terms of operations and outcomes. In control theory, this is sometimes simplified to monitoring inputs and outputs, with the comparative prediction of the output from input, possibly factoring in history.
  • The Intelligence in AIOps
    It could be argued that no one fully understands what AIOps pertains to now in its aspirational rise within the IT management industry and community. AIOps is a moving target and a term hijacked by Observability vendor marketing. It’s hard to pin down.
  • Contextualizing with Circuits
    In interpreting a script or a scene within a movie, humans must identify the setting and actors and understand the dialog from the multi-sensory feed flowing into the brain. Observability is somewhat similar, except that solutions today have not had a billion years to evolve efficient and effective ways of detecting the salient features.
  • Contextualizing Observability
    Context is crucial when it comes to the Observability of systems. But Context is an abstract term that is hard to pin down. Does it represent structure as in the configuration of software components? Does it represent behavior as in tracing a service request? Does it represent some attributes associated with a metric? Does it encompass purpose?
  • Circuiting Clock Cycles
    Many software systems have self-regulation routines that must be scheduled regularly. Observability libraries and toolkits are no different in this regard, with the sampling of metric values or resource states being notable examples; another less common one would be the status of inflight workflow constructs.
  • From Event Pipelining to Stream Processing
    With an event-driven architectural approach as Substrates, whenever a value needs to be calculated from a series of events, a stateful consumer invariably must use another circuit component to continuously publish the result after each event processing. But there is an alternative option in the Substrates API.
  • Interactionless Observability Instruments
    The Substrates API has two basic categories of Instrument interfaces. The first category includes Instrument interfaces that offer a direct means of interaction by a caller. The second type of Instrument is one with no direct means of triggering Event publishing in the interface.
  • From Many to One Observability Model
    Today the approach to observability at various stages within the data pipeline, from Application to Console, has been to create different models in terms of concepts, structure, features, and values. But what if the model employed were the same across all stages in a data pipeline?
  • Multi-Event-Casting with Inlets
    The typical flow of execution for an observability Instrument is for instrumentation within the application code to make a single call to a method defined in the Instrument interface. But there are cases where a single call into an Instrument interface causes the dispatching of multiple events.
  • Scaling Observability Resources
    A significant challenge in building observability agents or heavily instrumented applications or frameworks is in scaling, both up and down, the resources consumed. There is a trade-off here in terms of time and cost.
  • Composing Instrument Circuits
    In this post, we walk through one of the showcases in the project’s code repository that demonstrates how the complexity of hooking up components in a Circuit is greatly simplified.
  • Designing for Extensibility
    The interfaces defined within Substrates API are designed with extensibility and evolution in mind, both from a client library and provider implementation perspective.
  • Pipelines and Projections
    An objective of the Substrates API is that it should be possible for developers to be location independent in that code can execute without change within the local process itself or a remote server that is being served the data via a Relay.
  • Simplifying Instrument Synthesis
    Three overriding principles are applied in the design of the Substrates API – consistency (standardizing), conciseness (simplifying), and correctness (sequencing).
  • API Design – A Moving Target
    Good design takes time, over many iterations (of converging and diverging design paths), in developing, discovering, discussing, discounting, and occasionally destroying.
  • Data Pipeline Segmentation
    The stages within a local data pipeline managed by the Substrates runtime are detailed, from a client call to an Instrument interface method to dispatching an Event to an Outlet registered by a Subscriber.
  • Playing with Pipelines
    A walk through of one of the Substrates showcase examples hosted on GitHub, demonstrating two of the most critical aspects of pipelines – the production and consumption of captured data and published events.
  • Pipelines Reimagined
    Using Substrates, the fusion of multiple streams of data from multiple sources, an essential process of any monitoring and management solution, can be done in-process and in real-time.
  • Circuits, not Pipelines!
    The first official release of the Substrates API is nearing completion. With that, it is time to explore one of the toolkit’s most significant differences compared to other approaches, such as OpenTelemetry.
  • Humainary vs OpenTelemetry – Spatial
    For Humainary, the goal is to encourage as much as possible the analytical processing of observations at the source of event emittance and in the moment of the situation. To propagate assessments, not data.
  • A Roadmap for an Observability Toolkit
    We have divided up our mission for the Humainary Toolkit into three phases, with each phase mapping to one or more layers: Measure, Model, and Memory
  • Measurement and Control 2022
    Since the very beginning of the hype of Observability, we have contended that the link with Controllability must be maintained for there ever to be a return on investment (ROI) that matches the extravagant claims from vendors pushing a message of more-is-better.
  • Humane Factors in Observability – Part 1
    This two-part series will discuss critical factors that weighed heavily in our rethinking of Observability and how they manifest in our toolkit under the headings: conceptualization, communication, coordination, collaboration, and cognition.
  • A Modern Observability Library Toolkit
    The Humainary project aims to bring much-needed sensibility, streamlining, simplicity, and sophistication back to an area that seems to fight forcefully not to move past yesteryear technologies like logging and tracing.