The future of observability lies in creating new tools that deepen our understanding of how systems work, not merely in refining the tools we already have. Companies that invest in extensible observability platforms gain a significant edge: they learn more about their systems and improve how those systems behave, making them faster, more reliable, and better for users. That requires thinking differently about observability and moving beyond the old three-pillar approach.
- Observability X – Pipes & Pathways
- Observability X – Sources
- Observability X – Contexts
- Observability X – Naming Percepts
- Observability X – Staging State
- Observability X – eXtensibility
- Observability X – Location Agnostic
In this age of AI, how can we ensure that humans and AI work together as effectively as possible? The answer is to change how we think about working with AI: instead of treating AI as a monolithic whole, focus on the specific tasks it can perform. Doing so bridges the gap between humans and machines and allows them to work together seamlessly.
It’s hard to say exactly what AIOps means right now, since it’s still a new and evolving concept in the IT world. But most people agree that the goal of AIOps is to help humans manage complex software systems and microservices more efficiently.
- Climbing the Conceptual System
- Streamlining Observability Pipelines
- AIOps – A Postmodern Observability Model
- AIOps – The Double Cone Model
- AIOps – Visibility and Cognition
- AIOps – Why Service Cognition?
- AIOps – The Observer
- The Intelligence in AIOps
We propose a fresh model for improving site reliability engineering and service operations, one that sets the stage for developing situational awareness and system resilience. The model focuses on adaptability and experimentation, which are key to dealing with dynamic configurations and chaos engineering. The framework is clear and concise, designed to create effective and efficient layered cognitive structures and processes that allow machines and humans to work together seamlessly.
- Bounded Observability
- The Data Fog of Observability
- Change in Observability
- A Situational Control Tower
- Observability: Disruptions
- Observability: Projecting Ahead
- Observability: The OODA Loop
- Situational Awareness in Systems of Services
- Observability – The Two Hemispheres
- Scaling Observability for IT Ops
- Humanizing Observability and Controllability
- Observability – A Multitude of Memories
- AIOps – A Postmodern Observability Model
- AIOps – The Observer
- Measurement and Control 2022
- Observability X – Pipes & Pathways
  Traditionally, observability data pipelining operates like a single assembly line, where workers halt, examine items, and consult manuals before proceeding. This approach is functional but slow due to inspections. Instead of thinking of observability as one long pipeline, think of it as a graph with many possible routes. Once we know where something needs to go, we can create a specific route for it, like how a GPS creates a particular path for your destination instead of having to check a map at every intersection.
- Observability X – Sources
  Observability has been reduced to the straightforward processing of data collected and its forwarding to a remote centralized endpoint. While frameworks such as OpenTelemetry have standardized this, they overlook a fundamental aspect of true observability: the dynamic nature of observation itself. A more effective and efficient observability approach establishes live, adaptable connections between information sources and observers, both locally and remotely, as well as online and offline.
- Observability X – Contexts
  In our pursuit of comprehending complex systems through observability, we’ve developed increasingly convoluted tools that focus on capturing the structural context—the static backdrop of components, configurations, and infrastructure against which our systems function. However, these tools overlook the rich tapestry of behavioral and situational contexts that lend meaning to system events.
- Observability X – Naming Percepts
  In the realm of observability, naming holds paramount importance. It directly influences measurement tracking, correlation, and interpretation, significantly impacting system visibility and problem-solving capabilities. Names are essential for distinguishing the diverse range of system measurements collected by observability tools.
- Observability X – Staging State
  Managing state in observability instrumentation presents unique challenges, particularly when dealing with concurrent operations. The Humainary Substrates API offers an elegant solution by offloading state management to pipeline processing. Through its Stage interface and built-in concurrency controls, developers can create custom instruments (percepts) without getting entangled in thread safety issues.
  (A small illustrative sketch of this idea appears after this list.)
- Observability X – eXtensibility
  Today’s observability toolkits represent a one-size-fits-all approach that fails to capture the nuanced needs of modern systems. This post advocates for extensible observability toolkits that empower teams to construct custom instruments (percepts) tailored to their requirements.
- Observability X – Location Agnostic
  Our approach to observability has been constrained by the notion that instruments and observers are fundamentally distinct entities. This post offers an alternative perspective: observers are themselves instruments, constructing more comprehensive and insightful observations by integrating data from lower-level instruments.
- The Complexity of Simplification
  Simplification initiatives in organizations often paradoxically increase complexity due to misinterpretation and uncoordinated implementation across different levels. Achieving meaningful simplification requires a holistic approach, clear communication, and an understanding of complex systems dynamics to avoid the pitfalls of oversimplification or mere tactical efficiency improvements.
- Systems, Silos, and Simplicity
  Organizational silos form in complex systems when collaboration becomes costly or uncertain, leading to inefficiencies and communication barriers. Effective integration requires balancing standardization with simplification, fostering collaboration across units, and managing the tension between short-term metrics and long-term transformative work.
- Observability: A to Z
  This article presents an A-Z glossary of key concepts related to observability in complex systems and software engineering. It covers topics ranging from Attention and Boundaries to Topologies, emphasizing the importance of intelligent data analysis, contextual understanding, and adaptive learning in monitoring and managing modern distributed systems.
- Task-Centricity: The Future of Human-AI Collaboration
  In an era where AI is rapidly transforming our digital landscape, how can we ensure that human-AI collaboration reaches its full potential? The answer lies in a paradigm shift towards task-centricity.
- Observability: New Tooling Metaphors
  The observability community should move away from traditional metaphors like pillars and pipelines and adopt new ones like substrates and circuits. By doing this, we can gain a new and innovative outlook on tools and techniques, leaving behind outdated thinking that prioritizes data over decisions and content over control.
- Observability: Rethinking Metaphors
  The prevailing metaphors of pillars and pipelines in observability have limited our understanding and hindered progress. These metaphors promote siloed thinking and a focus on data collection over actionable insights.
- Observability Standards: Backward vs Forward
  Forward-looking standards, also known as anticipatory standards, are designed to shape and guide the future development of technologies.
- From Abstraction to Simplicity
  Abstraction and simplification are two fundamental principles that often work together in the design of systems. With abstraction, we reduce system complexity by focusing on the essential aspects of structure, elements, and behavior.
- The OSSification of Observability
  Here we explore why the industry needs to move beyond the legacy tools and embrace a more dynamic and adaptable approach to gleaning genuine value from the ever-growing ocean of data collected.
- Climbing the Conceptual System
  As engineering systems grow ever more complex, the engineering community’s focus on simplistic measurement and reporting hinders achieving operational scalability by way of sensemaking and steering of such systems of systems.
- Software Performance Optimization Heuristics
  To acquire the knowledge of suitable software performance heuristics, developers must experience software execution in a new, more modern manner – a simulated environment of episodic machine memory replay.
- The Past, Present, and Future will be Simulated
  The mirroring of software execution behavior, as performed by Simz (online) and Stenos (offline), has the potential to be one of the most significant advances in software systems engineering. Its impact could be as significant as that of distributed computing.
- Introducing Signals – The Next Big Thing
  This post introduces the reasoning, thinking, and concepts behind a technology we call Signals, which we believe has the potential to have a profound impact on the design and development of software, the performance engineering of systems, and the management of distributed interconnected applications and services.
- Transcending Code, Containers, and Cloud
  There is always tension between adaptability and structural stability in engineering and possibly life. We want our designs to be highly adaptable. With adaptation, our designs attempt to respond to change, sensed within the environment, intelligently with more change, though far more confined and possibly transient, at least initially. But there are limits to how far we can accelerate adaptation without putting incredible stress on the environment and the very system contained within.
- Beyond Big Data – Mirrored Algorithmic Simulation
  Today, the stimulus used to develop machine intelligence is sensory data, which is transferred between devices and the cloud – the same data that concerns many consumers. But what if instead of sending data related to such things as a thermostat’s temperature set point, what was transmitted mostly concerned the action taken by the embedded software machine – an episodic memory of the algorithm itself?
- Circuits, Conduits, and Counters
  Our brain houses billions of neurons (nerve cells) that communicate with each other through intricate networks of neural circuits. These circuits play a fundamental role in various cognitive functions, sensory processing, motor control, and generating thoughts and emotions. Why should it be different for Observability?
- Observability – The Significant Parts
  Most current observability technologies don’t fare well as a source of behavioral signals or inferred states. They are not designed to reconstruct behavior at the level of inspection needed to translate effectively from measurement to signal and, in turn, to state. They are designed with data collection and reporting of the event in mind, not the signal or state.
- Observability – Flat and Stateless
  We should not differentiate whether an agent is deployed, especially with companies electing to manually instrument some parts of an application’s codebase using open-source observability libraries. Instead, we should consider whether the observer, an agent or library, is stateless concerning what and how it observes, measures, composes, collects, and transmits observations.
- Bounded Observability
  Reducing and compressing measurements is critical, and it is greatly helped by representations extracted from the environment via hierarchical boundary determination. When this is not done automatically, the custom dashboard capabilities of the Observability solution must be used to reconstruct some form of structure that mirrors the boundaries all but lost in the data fog. Naturally, this is extremely costly and inefficient for an organization.
- The Data Fog of Observability
  The overemphasis on data instead of signals and states has created a great fog. This data fog leads to many organizations losing their way and overindulging in data exploration instead of exploiting acquired knowledge and understanding. This has come about with the community still somewhat unconcerned with a steering process such as monitoring or cybernetics.
- Observability in Perspective
  There are many perspectives one could take in considering the observability and monitoring of software services and systems of services, but below are a few perspectives, stacked in layers, that would be included.
- Change in Observability
  Observability is effectively a process of tracking change. At the level of a measurement device, software or hardware-based, change is the difference in the value of two observations taken at distinct points in time. This change detection via differencing is sometimes called static or happened change. Observability is all about happenings.
- A Story of Observability
  Once upon a time, there was a period in the world where humans watched over applications and services by proxy via dashboards housed on multiple screens hoisted in front of them – a typical mission control center. The interaction between humans and machines was relatively static and straightforward, like the environment and systems enclosed.
- Bi-directional Observability Pipelines
  Substrates changed everything by introducing the concept of a Circuit consisting of multiple Conduits fed by Instruments that allowed Observers to subscribe to Events and, in processing such Events, generate further Events by way of calling into another Instrument. But with the introduction of Percept and Adjunct, it is now possible for Observers attached to a Circuit and its locally registered Sources to process Events that have come from a far-off Circuit within another process.
  (See the composition sketch after this list.)
- The Evolution of Substrates
  With the latest update to the Substrates API, the metamorphosis to a general-purpose event-driven data flow library interface supporting the capture, collection, communication, conversion, and compression of perceptual data through a network of circuits and conduits has begun.
- A Situational Control Tower
  It is time for a new direction, more closely aligned with goals and focused on the dynamics of systems that humans are already highly adapted to through their social intelligence, within which the situation is a crucial conceptual element of the cognitive model. Understanding and appropriately responding to different social situations is fundamental to social cognition and effective interpersonal interactions.
- Observability: Disruptions
  Disruptions are one factor affecting the maintenance of service quality levels. A disruption is an interruption in the flow of (work) items through a network that can, for a while, make it inoperable or where the network flow performance is subpar. Depending on the severity of the disruption, a network may need to replan and restructure itself for a period afterward. There are two main categories of disruptions: disturbance and deviation.
- Observability: Projecting Ahead
  The low-level data captured in volume by observability instruments has closed our eyes to salient change. We’ve built a giant wall of white noise. The human mind’s perception and prediction capabilities evolved to detect changes significant to our survival. Observability has no steering mechanism to guide effective and efficient measurement, modeling, and memory processes. Companies are gorging on ever-growing mounds of collected observability data that should be of secondary concern.
- Priming Observability for Situations
  The Recognition-Primed Decision (RPD) model asserts that individuals assess the situation, generate a plausible course of action (CoA), and then evaluate it using mental simulation. The authors claim that decision-making is primed by recognizing the situation and not entirely determined by recognition. The model contradicts the common thinking that individuals employ an analytical model in complex time-critical operational contexts.
- Observability: The OODA Loop
  The OODA loop emphasizes two critical environmental factors – time constraints and information uncertainty. The time factor is addressed by executing through the loop as fast as possible. Information uncertainty is tackled by acting accurately. The model’s typical presentation is popular because it closes the loop between sensing (observe and orient) and acting (decide and act).
- From Data to Dashboard
  Data and information are not surrogates for a model. Likewise, a model is not a dashboard built lazily and naively on top of a lake of data and information. A dashboard and many metrics, traces, and logs that come with it are not what constitutes a situation. A situation is formed and shaped by changing signals and states of structures and processes within an environment of nested contexts (observation points of assessment) – past, present, and predicted.
- Verbal Protocol Analysis for Observability
  VPA is a technique used by researchers across many domains, including psychology, engineering, and architecture. The basic idea is that during a task, such as solving a problem, a subject will concurrently verbalize, think aloud, what is resident in their working memory – what they are thinking during the doing. Using Protocol Analysis, researchers can elicit the cognitive processes from start to completion of a task. After further processing, the information captured is analyzed to provide insights that can improve performance.
- Streamlining Observability Pipelines
  The next generation of Observability technologies and tooling will most likely take two distinctly different trajectories from the ever-faltering middle ground that distributed tracing and event logging currently represent. The first trajectory, the high-value road, will introduce new techniques and models to address complex and coordinated system dynamics in a collective social context rebuilding a proper foundation geared to aiding both humans and artificial agents.
- Hierarchies in Observability
  When designing observability and controllability interfaces for systems of services, or any system, it is essential to consider how it connects the operator to the operational domain regarding the information content, structure, and visual forms. What representation is most effective in the immediate grounding of an operator within a situation?
- Observability Scaled: Attention & Awareness
  Because of limited processing resource capacities, brains focus more on some signals than others – signals compete for the brain’s attention. This internal competition is partially under the bottom-up influence of a sensory stimuli model and somewhat under the top-down control of other mental states, including goals – this is very similar to how situational awareness is theorized to operate optimally.
- Situational Awareness in Systems of Services
  Unfortunately, many of the solutions promoted in the Observability space, such as distributed tracing, metrics, and logging, have not offered a suitable mental model in any form whatsoever. The level of situation awareness is still sorely lacking in most teams, who appear to be permanently stalled at ground zero and overtly preoccupied with data and details.
- Observability is Yesteryear’s Monitoring
  Looking back over 20 years of building application performance monitoring and management tooling, little has changed, though today’s tooling does collect more data from far more data sources. But effectiveness and efficiency have not improved; it could be argued that both have regressed.
- The Solution is not Distributed Tracing
  Science and technology have made it possible to observe the motion of atoms, but humans don’t actively watch such atomic movements in their navigation of physical spaces. Our perception, attention, and cognition have evolved to scale to an effective model for us in most situations. Distributed tracing spans, and the data items attached, are the atoms of observability.
- Observability – The Two Hemispheres
  Two distinct hemispheres seem to form within the application monitoring and observability space – one dominated by measurement, data collection, and decomposition, the other by meaning, system dynamics, and (re)construction of the whole.
- Scaling Observability for IT Ops
  The underlying observability model is the primary reason for distributed tracing, metrics, and event logging failing to deliver much-needed capabilities and benefits to systems engineering teams. There is no natural or inherent way to transform and scale such observability data collection and analysis into generating signals and inferring states.
- Humanizing Observability and Controllability
  Humanism is a philosophical stance at the heart of what Humainary aims to bring to service management operations. It runs counter to the misguided trend of wanton and wasteful extensive data collection so heavily touted by those focused on selling a service rather than solving a problem, now and in the future.
- Simplicity and Significance in Observability
  As computing and complexity scaled up, the models and methods should have reduced and simplified the communication and control surface area between man and machine. Instead, monitoring (passive) and management (reactive) solutions have lazily reflected the nature of that complexity at a level devoid of simplicity and significance, polluted instead with noise.
- Observability – A Multitude of Memories
  There are at least two distinct paths to the future of observability. One would continue increasing the volume of collected data in an attempt to reconstruct reality in high definition on a single plane, with little consideration for effectiveness or efficiency. The other would focus on seeing the big picture in near-real-time from the perspective of human or artificial agents.
- AIOps – A Postmodern Observability Model
  We propose a model which can better serve site reliability engineering and service operations by being foundational to developing situational awareness capabilities and system resilience capacities, particularly adaptability and experimentation, as in dynamic configuration and chaos engineering.
- AIOps – The Double Cone Model
  The Double Cone Model is a valuable conceptualization in thinking about more efficient and effective methods to handle data overload and generate far more actionable insight from a model much closer to how the human mind reasons about physical and social spaces.
- AIOps – Visibility and Cognition
  All points of experience within a topology offer some visibility, but the language (codes, syntax) and model (concepts) employed can differ greatly. This is problematic when the goal is to determine the intent and outcome of an interaction’s operation(s).
- AIOps – Why Service Cognition?
  Today’s data, such as logs, traces, and metrics, are too far removed to be the basis for a language and model that illuminates the dynamic nature of service interaction, system stability inference, and state prediction across distributed agents.
- AIOps – The Observer
  Observability is purposefully seeing a system in terms of operations and outcomes. In control theory, this is sometimes simplified to monitoring inputs and outputs, with the comparative prediction of the output from input, possibly factoring in history.
- The Intelligence in AIOps
  It could be argued that no one fully understands what AIOps pertains to now in its aspirational rise within the IT management industry and community. AIOps is a moving target and a term hijacked by Observability vendor marketing. It’s hard to pin down.
- Contextualizing with Circuits
  In interpreting a script or a scene within a movie, humans must identify the setting and actors and understand the dialog from the multi-sensory feed flowing into the brain. Observability is somewhat similar, except that solutions today have not had a billion years to evolve efficient and effective ways of detecting the salient features.
- Contextualizing Observability
  Context is crucial when it comes to the Observability of systems. But Context is an abstract term that is hard to pin down. Does it represent structure as in the configuration of software components? Does it represent behavior as in tracing a service request? Does it represent some attributes associated with a metric? Does it encompass purpose?
- Circuiting Clock Cycles
  Many software systems have self-regulation routines that must be scheduled regularly. Observability libraries and toolkits are no different in this regard, with the sampling of metric values or resource states being notable examples; another less common one would be the status of inflight workflow constructs.
- From Event Pipelining to Stream Processing
  With an event-driven architectural approach such as Substrates, whenever a value needs to be calculated from a series of events, a stateful consumer invariably must use another circuit component to continuously publish the result after each event processing. But there is an alternative option in the Substrates API.
- Interactionless Observability Instruments
  The Substrates API has two basic categories of Instrument interfaces. The first category includes Instrument interfaces that offer a direct means of interaction by a caller. The second type of Instrument is one with no direct means of triggering Event publishing in the interface.
- From Many to One Observability Model
  Today the approach to observability at various stages within the data pipeline, from Application to Console, has been to create different models in terms of concepts, structure, features, and values. But what if the model employed were the same across all stages in a data pipeline?
- Multi-Event-Casting with Inlets
  The typical flow of execution for an observability Instrument is for instrumentation within the application code to make a single call to a method defined in the Instrument interface. But there are cases where a single call into an Instrument interface causes the dispatching of multiple events.
  (See the multi-event sketch after this list.)
- Scaling Observability Resources
  A significant challenge in building observability agents or heavily instrumented applications or frameworks is in scaling, both up and down, the resources consumed. There is a trade-off here in terms of time and cost.
- Composing Instrument Circuits
  In this post, we walk through one of the showcases in the project’s code repository that demonstrates how the complexity of hooking up components in a Circuit is greatly simplified.
- Designing for Extensibility
  The interfaces defined within Substrates API are designed with extensibility and evolution in mind, both from a client library and provider implementation perspective.
- Pipelines and Projections
  An objective of the Substrates API is that developers should be location independent: code can execute without change within the local process itself or on a remote server that is served the data via a Relay.
- Simplifying Instrument Synthesis
  Three overriding principles are applied in the design of the Substrates API – consistency (standardizing), conciseness (simplifying), and correctness (sequencing).
- API Design – A Moving Target
  Good design takes time, over many iterations (of converging and diverging design paths), in developing, discovering, discussing, discounting, and occasionally destroying.
- Data Pipeline Segmentation
  The stages within a local data pipeline managed by the Substrates runtime are detailed, from a client call to an Instrument interface method to dispatching an Event to an Outlet registered by a Subscriber.
- Playing with Pipelines
  A walkthrough of one of the Substrates showcase examples hosted on GitHub, demonstrating two of the most critical aspects of pipelines – the production and consumption of captured data and published events.
- Pipelines Reimagined
  Using Substrates, the fusion of multiple streams of data from multiple sources, an essential process of any monitoring and management solution, can be done in-process and in real-time.
- Circuits, not Pipelines!
  The first official release of the Substrates API is nearing completion. With that, it is time to explore one of the toolkit’s most significant differences compared to other approaches, such as OpenTelemetry.
- Humainary vs OpenTelemetry – Spatial
  For Humainary, the goal is to encourage as much as possible the analytical processing of observations at the source of event emittance and in the moment of the situation. To propagate assessments, not data.
- A Roadmap for an Observability Toolkit
  We have divided up our mission for the Humainary Toolkit into three phases, with each phase mapping to one or more layers: Measure, Model, and Memory.
- Measurement and Control 2022
  Since the very beginning of the hype of Observability, we have contended that the link with Controllability must be maintained for there ever to be a return on investment (ROI) that matches the extravagant claims from vendors pushing a message of more-is-better.
- Humane Factors in Observability – Part 1
  This two-part series will discuss critical factors that weighed heavily in our rethinking of Observability and how they manifest in our toolkit under the headings: conceptualization, communication, coordination, collaboration, and cognition.
- A Modern Observability Library Toolkit
  The Humainary project aims to bring much-needed sensibility, streamlining, simplicity, and sophistication back to an area that seems to fight forcefully not to move past yesteryear technologies like logging and tracing.
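The “Observability X – Staging State” entry above describes offloading instrument state into pipeline processing so that custom percepts stay clear of thread-safety concerns. Below is a minimal Java sketch of that general idea only; the names (StagedCounter, inc, drain) are illustrative assumptions and not the Humainary Substrates API. The instrument merely enqueues values, while a single pipeline thread owns and updates the state.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.LongConsumer;

// Hypothetical sketch: an instrument (percept) that holds no shared mutable
// state. Callers on any thread only enqueue deltas; one pipeline thread owns
// the running total, so the instrument needs no locks or atomics.
final class StagedCounter {

  private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();

  // Hot path: called concurrently from application threads.
  void inc(long delta) {
    queue.offer(delta);
  }

  // Cold path: a single pipeline "stage" drains the queue and publishes the
  // aggregated value to a downstream observer after each event.
  void drain(LongConsumer observer) throws InterruptedException {
    long total = 0L;
    while (true) {
      total += queue.take();   // state is confined to this thread
      observer.accept(total);  // emit the derived observation
    }
  }

  public static void main(String[] args) throws Exception {
    StagedCounter counter = new StagedCounter();
    Thread stage = new Thread(() -> {
      try {
        counter.drain(total -> System.out.println("total=" + total));
      } catch (InterruptedException stopped) {
        // pipeline shut down
      }
    });
    stage.start();
    counter.inc(1);
    counter.inc(2);
    counter.inc(3);
    Thread.sleep(200);   // let the stage process the queued values
    stage.interrupt();   // stop the pipeline thread
  }
}
```

The point being illustrated is state confinement: because only the draining thread ever reads or writes the running total, the hot path needs no synchronization beyond the queue itself.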
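The “Bi-directional Observability Pipelines” and “Observability X – Location Agnostic” entries treat observers as instruments in their own right: an observer processing events can generate further events by calling into another instrument. The sketch below shows that composition pattern in plain Java with invented names (MiniCircuit, subscribe, emit); it is not the Substrates API itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of observers acting as instruments: an observer on one
// conduit derives a new observation and publishes it on another conduit of
// the same circuit. Names are illustrative, not the Humainary Substrates API.
final class MiniCircuit {

  private final Map<String, List<Consumer<Long>>> conduits = new HashMap<>();

  // Subscribe an observer to a named conduit.
  void subscribe(String conduit, Consumer<Long> observer) {
    conduits.computeIfAbsent(conduit, k -> new ArrayList<>()).add(observer);
  }

  // An "instrument": emitting on a conduit fans the value out to observers.
  void emit(String conduit, long value) {
    conduits.getOrDefault(conduit, List.of()).forEach(o -> o.accept(value));
  }

  public static void main(String[] args) {
    MiniCircuit circuit = new MiniCircuit();

    // Low-level instrument: raw request latencies in milliseconds.
    // The observer below flags slow requests and re-emits a derived event
    // on another conduit, acting as an instrument in its own right.
    circuit.subscribe("latency.ms", ms -> {
      if (ms > 500) circuit.emit("latency.slow", ms);
    });
    circuit.subscribe("latency.slow",
        ms -> System.out.println("slow request: " + ms + " ms"));

    circuit.emit("latency.ms", 120);
    circuit.emit("latency.ms", 750);   // triggers the derived event
  }
}
```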
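Finally, the “Multi-Event-Casting with Inlets” entry notes that a single call into an instrument can dispatch multiple events. A hypothetical sketch of that shape, again using invented names (MultiCastTimer, with inlets modeled as plain LongConsumer callbacks) rather than the actual Substrates interfaces:

```java
import java.util.function.LongConsumer;

// Hypothetical sketch of multi-event-casting: one call into an instrument
// dispatches several events through separate inlets.
final class MultiCastTimer {

  private final LongConsumer elapsedInlet;  // receives elapsed nanoseconds
  private final LongConsumer countInlet;    // receives invocation counts

  MultiCastTimer(LongConsumer elapsedInlet, LongConsumer countInlet) {
    this.elapsedInlet = elapsedInlet;
    this.countInlet = countInlet;
  }

  // A single call site in application code...
  void time(Runnable task) {
    long start = System.nanoTime();
    task.run();
    // ...casts two distinct events into the pipeline.
    elapsedInlet.accept(System.nanoTime() - start);
    countInlet.accept(1L);
  }

  public static void main(String[] args) {
    MultiCastTimer timer = new MultiCastTimer(
        nanos -> System.out.println("elapsed.ns=" + nanos),
        count -> System.out.println("calls+=" + count));
    timer.time(() -> System.out.println("doing work"));
  }
}
```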