How albelli-Photobox Group is using Uptime Labs to nurture a healthy incident response culture

albelli-Photobox experiences a high level of variation in seasonal demand, so the likelihood and impact of incidents vary similarly throughout the year. albelli-Photobox initially engaged Uptime Labs to fast-track their Incident Management preparedness for the busy 2022 winter holiday season.

Key Benefits
  • Increased confidence of incident responders
  • Increased numbers of engineers able to cover support rotas
  • Improved incident management performance during genuine severity 1 & 2 incidents
  • Improved understanding of and adherence to company incident response conventions
About the Interviewee
Alex Hibbitt

Engineering Director, SRE & Fulfilment at albelli-Photobox Group. A technologist and leader with a proven history of creating and nurturing high-performing teams across a range of technological disciplines.

Overview

albelli-Photobox Group are experts in printing high-quality personalised photo products, from photo books and wall decor to calendars, cards and more. Since their recent merger, albelli and Photobox together serve over 7 million customers across Europe, designing, printing and delivering personalised products through their online software platform.

The Challenge

albelli and Photobox joined forces in early 2022. The newly combined company employs over 1,000 people across the United Kingdom and Europe. The merger brought together two distinct technology stacks and two distinct organisational cultures, making it a challenge to create a single, coherent incident management capability. Their aim is to build a single technology organisation, a key part of which is how it responds when things go wrong. They’re building a culture where employees are passionate about resilience, incidents and incident resolution, and they’re looking to establish practices for continuous improvement and learning. Unlike some companies that silo their incident response capability in a single team, albelli-Photobox wish to establish widespread competency throughout their technology department.

Technology and Tooling vs Culture

Effective incident response is as much a cultural challenge as a matter of technology and tooling. How organisations think about incidents, and how they organise, communicate, problem-solve and learn under stress, is just as important as the tools and technology they employ. albelli-Photobox are well equipped with excellent observability, monitoring and alerting tools, but wished to engage Uptime Labs to “seize the people and process piece”. Alex Hibbitt, Engineering Director at albelli-Photobox, explained: “Incidents aren’t just about muscle memory, they’re a cultural thing, how you think, how you respond, how you manage stakeholders, which makes it really hard to train. You need something that’s close to a real event”.

The Solution

Establishing a healthy incident response culture requires practice. As with any team endeavour, practising together is essential to ensure that the team has mutual understanding and trust when an incident is for real. Unfortunately, practising incident response often enough to build proficiency tends to be time-consuming, expensive, inconvenient or simply too unrealistic to produce useful learning. albelli-Photobox chose to work with Uptime Labs to address this problem, gaining the ability to practise incident response scenarios in a realistic setting and to build confidence and expertise quickly through experience.

What We Did

During the 8 weeks before the 2022 holiday season, five albelli-Photobox engineers used Uptime Labs to work through 33 simulated incidents. The Uptime Labs scenarios were specifically selected to emphasise the first principles of incident response. In addition to using Uptime Labs’ autonomous simulation platform, albelli-Photobox benefited from coaching and feedback sessions with Uptime Labs experts.

“Incidents aren’t just about muscle memory, they’re a cultural thing, how you think, how you respond, how you manage stakeholders, which makes it really hard to train. You need something that’s close to a real event.”

Alex Hibbitt

Results

During simulated incidents, Uptime Labs detects participant behaviours, providing visual feedback on positive actions and hints when required actions have not been performed. Uptime Labs scores participant performance against criteria indicative of high-quality incident response, such as time to restore, triage and communication quality.

During the initial 8-week period, 4 out of 5 participants demonstrated significant overall performance improvement. Participant 3 improved in some areas but was inconsistent in others, leading to a flat overall trajectory.

Figure: Participant improvement between initial and later sessions

Transfer to Real Life

We were pleased to observe that Uptime Labs participants were able to transfer their confidence to real-life incidents. Following the initial Uptime Labs engagement, albelli-Photobox has been able to fast-track employees’ progression onto incident rotas, and these people have since performed effectively during Sev 1 and Sev 2 incidents.

Next Steps

Establishing a healthy incident response culture is an exercise in continual improvement, and improvement comes through practice. Uptime Labs continues to work with albelli-Photobox on their journey.

Participant Feedback

“We’ve been able to create a set of people who are really excited about what the future of incident management at APG could be, just by having experienced the Uptime Labs piece. It’s been a really pleasing effect of having Uptime Labs and it’s exactly what I hoped for.”

— Alex

“Those who hadn’t done IM before had their eyes opened. It helped them understand not only what being an Incident Manager is like, but also how they respond to incident managers. This was an unexpected outcome.”

— Alex

“Very, very useful, actually seeing how to break it down and structure it: technical, process and personnel. It’s very helpful as it helps me create a framework around how I run a postmortem itself: what types of areas you want to look into and for what reasons, and then, I think, for each of them find a space.”

— Emanuel, on practising running an incident postmortem

“I’m way less anxious, and I’m less afraid of asking for updates or giving commands than I was at first.”

— Valeria