Kintaba – Incident Management Done Right
Today, we’re talking with John Egan, Co-Founder & CEO of Kintaba, an incident management platform. John is a serial entrepreneur and a co-founder of Caffeinated Mind, a startup that tackled the enterprise problem of big data transfer with a tool called Expresso and was acquired by Facebook in 2012. At Facebook, John co-founded and led Facebook’s first major enterprise offering, Workplace by Facebook. In 2019, John co-founded Kintaba, a modern incident management tool that helps companies build a more resilient future.
In this interview episode, we talk about the challenges for a startup working in incident management, the common misconceptions about the incident management process, and how Kintaba makes it easy to adopt the right process with the right tool.
John, how do you define Kintaba as a business?
Kintaba is an off-the-shelf software solution for implementing incident management best practices, rather than hiring a consultant for a three-hour session on how to better manage major incidents.
In the long term, if you want to implement a process that lasts for the life of your organization, it has to be a tool, and Kintaba is that tool. It covers everything: declaring incidents, giving employees visibility into current incidents, managing and mitigating those incidents, bringing together the responders, and running the postmortem and learning process after the incident is closed. It’s all in one product suite that any company can implement.
For Incident Management, it’s essential to build a knowledge base of how to deal with typical issues to improve the quality of service and reduce the overall cost of support. How can Kintaba help achieve these goals?
When I think about building a knowledge base through incident management, it’s two things: making sure you write up incidents, and then putting those write-ups in one place that’s available to everyone.
In modern incident management, you’re adding to a document library where you track all of the learnings, and you distribute each document as soon as it’s written to everyone who subscribed to or responded to the incident, as well as to anyone interested who comes back to Kintaba to learn from it.
How can Incident Management be integrated into product release processes?
Integration with the DevOps process is the least well-defined piece of implementing incident management. Typically, we see the two running alongside each other: if something happens in the DevOps process that warrants an incident, you immediately kick off your incident management process.
What are the challenges for a startup working in the Incident Management area?
Ten years ago, for most companies, incident management meant routing an alert from a monitoring system to wake someone up and have them deal with the problem. It has evolved over the last five years, especially after Google published its Site Reliability Engineering (SRE) book, which laid out how Google deals with incidents across the organization, not only within the SRE teams.
For many companies taking their first steps in implementing incident management practices, it becomes a challenge to define who should be involved in the process. At Kintaba, we take the stance that the whole organization should be part of this process, because if you have an outage, it affects everyone from your engineering teams to your sales organization to your PR teams. We believe the process should be self-service: anyone in the company should be able to adopt it without having to go through training documentation or read Sidney Dekker’s books. That’s our offering and what makes Kintaba unique as a product.
What are the most common things that companies get wrong about Incident Management?
Many organizations draw the line so that it’s not an incident until the site is down. We, on the contrary, are establishing a process that lowers the barrier to declaring incidents. We want to capture sev 2s, sev 3s, and all the near misses.
If something affected only a small group of customers but there are still learnings to be gathered from it, you ought to go through the process: declare the incident, record the mitigation, and write your short postmortem document.
Often, organizations only document sev 1s, because the amount of damage those incidents potentially caused justifies the effort. It’s tough to go from that world to documenting sev 3s that maybe lasted only five minutes and need only a five-sentence postmortem. That’s what most organizations get wrong: they shy away from recording those incidents or putting them through the process.
Which companies are adopting your incident management tool to improve their work?
Our early adopters are companies that are hiring employees from large organizations that already practice incident management. When you look at our customer list, including Gusto, Canix, and Vercel, you’ll see that these organizations already practice incident management and need tools to make it easier over time.
How is Incident Management at large organizations different from small or medium-sized companies?
Across Facebook, there’s a common toolset that tracks incidents and makes them available to the entirety of the organization. You can be a new sales hire who’s been at the company for one day, and you have 100% visibility into all of the major incidents currently happening within the company. This openness is the critical piece of their incident practice.
The company distributes not just responsibility (you’re responsible for creating these tasks) but also the good part of accountability: the person who created that task, or that incident, should be the person who is accountable for it in a positive way. This maps to a core tenet of incident management best practices: the person who has the context of the incident should be the carrier of the actual learnings. It’s the propagation of responsibility across the company.
John, what does ‘Kintaba’ mean?
Kintaba’s name comes from a Japanese art form called ‘kintsugi’: reassembling broken pottery by using golden inlay where the cracks were, so that the end result becomes even more valuable. It’s a beautiful metaphor, and it runs counter to the movement happening now throughout the Valley around resiliency, which is about hiding breakages, trying to prevent them, and getting all of our numbers down so we can say we break less, not more. At Kintaba, we go the other direction and say: no, breakages happen all the time, and what matters is whether we respond well to them internally.
Where do you aspire to take Kintaba in the next couple of years?
At Kintaba, we want a more resilient world. We want anyone to feel comfortable practicing real-time operations that address critical situations beyond the SRE world.
Three key takeaways about Incident Management:
● Strong incident management practice and radical openness about incidents are critical for business revenue;
● Incidents are inevitable for companies that are growing and constantly innovating;
● Not all incidents are created equal. Be prepared to learn from sev 2s, sev 3s and all the near misses.
John, thanks for the good vibes and for enabling better and more efficient incident management for companies across many industries. I look forward to watching the progress of the Kintaba team!
Stay tuned for more great interviews coming your way!