First, a philosophical curiosity.
Have you ever wondered whether you experience colours the same as other people?
Most of us agree that an orange is orange and the sky on a sunny day is blue. But is your sky blue the same as my sky blue? When we look up at the sky together and agree (as we surely would) that it’s blue, are we agreeing that we’re seeing the same thing? Or are we merely agreeing that the thing we’re both seeing is best described by the word “blue”? If you were able to hop into my brain and stare out through my eyes, what’s to say that the sky wouldn’t appear orange, or an orange blue?
Come on, hop aboard, what do you see?
Though it doesn’t really matter. As long as our perceptions are internally consistent (i.e., a clear daytime sky doesn’t appear to change colour randomly), we can agree and make sense of the world, without worrying whether the image projecting onto the visual cortex of your brain is identical to the one in mine.
The point here is that our perception of reality is just that: a perception. We expect our perceptions to offer an accurate illustration of reality, such that we can make sense of the world, both as individuals and collectively. It’s helpful, essential even, to be able to experience the world and agree on what’s being experienced, and most of us, most of the time, experience perception and reality to be one and the same.
And what does this have to do with incident response? Quite a bit it turns out.
Software, unlike physical objects, can’t be experienced directly. Rather, software exposes itself to the world through interfaces: GUIs, APIs, and so on. While alignment between perception and reality in the physical world is the subject of philosophical conjecture, we know for sure that there is practically zero alignment between our perception of software and what’s going on behind the screen.
- That button isn’t really a button.
- That API endpoint isn’t ‘hittable’.
- That database table isn’t really a table.
- The edge isn’t really on the edge of anything.
- That line of code isn’t the thing that runs.
- Neither is that line of assembly.
With software, all the way “down” from UI to binary digits, there’s no “there” there to be perceived directly in any useful way. And why is that important? It’s important because our entire understanding of software lives inside our heads as mental models.
Now they say “all models are wrong, but some are useful”, and nowhere is this more true than in software. However, when models exist entirely in our heads, how can you tell if my (useful, wrong) model is consistent with yours? Whereas it really doesn’t matter whether you perceive colour the same way as I do, it really does matter if we’re dealing with a production outage at 3am and you have a different mental model of the system from mine. It matters and it’s inevitable.
Modern technology-led businesses tend to be so complicated and complex that no single individual can maintain a perfect mental model of their inner workings. This is one of the reasons why incident response is a ‘team sport’, requiring multiple individuals to engage in joint activity in attending to surprising, ambiguous and uncertain scenarios. While team members observe, probe and operate on an IT system via visual interfaces, their mental models, which are both the source of their diagnostic and therapeutic choices and the target of their results, remain visible only to the individuals themselves. So we’re all working with our own, unique mental models. Some are general and broad, some narrow, highly detailed and specialised; some are similar, many are overlapping, and none identical.
With this realisation, it would appear imperative for individuals to take steps to externalise their mental models in order to compare, contrast, learn and update from those of their peers. A key element of effective incident response is the mindful maintenance and development of mental models amongst responders. Everyone’s mental models will be different, but it helps if they’re coherent.
It’s also clear that incidents offer opportunities for updating mental models. Incidents demonstrating fundamental surprise are only surprising because the scenario was incompatible with our mental model when it occurred.
So what practical steps can we take?
- Communicate. The standard API for human mental models is speech! “What are you thinking?” may be an annoying question, but you can spare folks the ignominy of having to ask by volunteering your thoughts, especially when what you’re observing contradicts your mental model.
- Externalise – diagram, write, draw, visualise. Visualisations frequently form key elements of mental models. If folks share a common understanding of a system through a shared visualisation, then coherence will increase. Ensure static visualisations (diagrams) are updated to reflect changes in mental models.
- Learn from incidents together. How many times have you had that moment, during or after an incident: “Oh, that’s how it works!” That’s the sound of your mental model updating. And if yours is updating, it’s useful if others’ are updated too.
- Learn together why incidents didn’t happen. Incidents are not the only learning opportunities. You’ll be surprised how much can be learned from understanding why an incident didn’t happen!
- Practise together. Incident response is a team sport. Understanding your peers and their mental models can only be achieved through active engagement and practice. You can wait for real incidents to occur, or you can be proactive and practise together regularly via drills and simulations.
The concept described in this post comes from the ‘above the line/below the line’ framework, first described in the Stella Report from the SNAFUcatchers Workshop on Coping With Complexity, 2017. John Allspaw, who was also part of the SNAFUcatchers workshop, breaks the concept down here, and in a conference talk here. Follow these links for a much deeper dive into this topic.