Be Stubborn or Quit: Observability, Success and Sacrifices
This week we are delighted and honoured to welcome Charity Majors to The Prime View. Charity is a co-founder and CTO at Honeycomb, which builds robust observability tools for distributed systems. Before starting her venture at Honeycomb, she worked at Facebook, Parse, and Linden Lab. Charity co-authored “Database Reliability Engineering”, published by O’Reilly.
In this episode, we talked about how Observability differs from Monitoring, modern trends in DevOps and their impact on the development of the product suite at Honeycomb, the new round of investment and how Honeycomb will use it, the sacrifices of a startup founder, how to build diverse teams, and advice for young women who aspire to build their careers in technology.
Co-Founder&CTO at Honeycomb
Charity, could you tell us what steps in your career have led you to starting your own company?
Before Honeycomb, I did Operations Engineering for Linden Lab (Second Life), Parse, and Facebook. We worked on cutting-edge engineering, and we worked on microservices before they became known as microservices. There we realized that the last generation of Ops tools didn’t help us find and solve our problems. So we started using some Facebook tools as a basis and built new tools around them. That’s what led us to build Honeycomb.
Recently Honeycomb got a new round of funding. How do you plan to use the funds? Are you going to invest it in R&D and build new products? Or do you plan to increase Sales and Marketing operations to get a higher share of the market?
Part of our mission and what we’ve promised our customers is to stay on the leading edge, not the bleeding edge. We need to continue to innovate; we need to continue bringing Observability to more people, making it more accessible, making it discoverable, making it fast to get started, and making it more understandable to teams. So we’ve got work to do there.
We’re going to be investing in our free tier. Not many people know that we have a robust free tier that engineers can do real work on, and it’s not like a 60-day trial. You can sign up, and it’ll run forever, and you can run real production workloads on it. So we’re going to be pouring a lot of resources into making that a better experience and making sure more people know about it. Because once people start using Observability tools, they never want to go back. So, to sum up, it’s going to be both R&D and raising awareness.
We could see many of your posts on Twitter about how Monitoring is different from Observability. So, how is Observability distinct from Monitoring?
Observability is very distinct from Monitoring. Monitoring is about your known unknowns, and Observability is about your unknown unknowns. At the technical level, this goes all the way down to how you write bits out to disk. Monitoring uses aggregates, and Observability uses arbitrarily wide structured data blobs, which you can slice and dice to answer complex questions. Observability is about how we can understand the inner workings of a system just by asking questions from the outside.
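To make the contrast between aggregates and wide structured events concrete, here is a minimal Python sketch. It is an editorial illustration, not Honeycomb's implementation: the field names (`endpoint`, `build_id`, and so on) and the helper functions are hypothetical. The point it shows is that a pre-aggregated counter can only answer the question it was built for, while raw events can be grouped by any field chosen at query time.

```python
from collections import Counter

# Hypothetical wide events: one structured record per request.
# Real events would typically carry many more fields.
EVENTS = [
    {"endpoint": "/export", "status": 500, "duration_ms": 2300, "user_id": "u1", "build_id": "b42"},
    {"endpoint": "/export", "status": 200, "duration_ms": 180, "user_id": "u2", "build_id": "b41"},
    {"endpoint": "/home", "status": 200, "duration_ms": 35, "user_id": "u1", "build_id": "b42"},
    {"endpoint": "/export", "status": 500, "duration_ms": 2100, "user_id": "u3", "build_id": "b42"},
]


def monitoring_view(events):
    """Monitoring-style aggregate: a single error count, per-request detail discarded."""
    return sum(1 for e in events if e["status"] >= 500)


def slice_events(events, group_by, where=lambda e: True):
    """Observability-style query: filter raw events, then group by any field."""
    counts = Counter()
    for e in events:
        if where(e):
            counts[e[group_by]] += 1
    return dict(counts)


def is_error(event):
    return event["status"] >= 500


# Known unknown: "how many errors are there?"
error_total = monitoring_view(EVENTS)

# Unknown unknowns: questions nobody planned for when the data was written.
errors_by_build = slice_events(EVENTS, "build_id", where=is_error)
errors_by_endpoint = slice_events(EVENTS, "endpoint", where=is_error)
```

In this toy data set the aggregate only says that errors occurred, while slicing the same events by `build_id` or `endpoint` points at where they come from.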
How have trends in DevOps impacted the development of your product suite at Honeycomb?
Over the past five years, we’ve seen a gravitational shift from pre-production tooling toward production tooling and working on live systems. Whether it’s Chaos Engineering, like Gremlin, or feature flags with LaunchDarkly, or observability with us, it’s all about the idea that only in production can you find the most exciting problems, where you’ve got the scale with millions of connections. This environment is not replicable in staging.
Testing in production is another trend of continuous integration and continuous deployment. Because deployment happens a million times a day, now you should be able to find what went wrong in production by leveraging your tools. And that’s not a trivial task.
The solution starts with capturing raw logs that allow complex correlations across events. They are aggregated around the user’s request, so we can ask specific questions and slice the data into specific answers in real time. We also start with SLOs, which basically are your error budget: you have an agreement between your engineers and your business side that you can tolerate a certain rate of error for your business. If you get alerted that you’re running out of your budget too quickly, then you come to Honeycomb to see what exactly is causing you to go over budget and fix it.
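The error-budget idea can be sketched with a little arithmetic. The Python below is an editorial illustration under stated assumptions, not how Honeycomb computes SLOs: the function names, the 99.9% target and the alert threshold of 2.0 are all hypothetical. It shows how an agreed error rate turns into a budget, and how a burn rate above 1.0 signals that the budget is being spent faster than agreed.

```python
def _check_target(slo_target):
    # An SLO target of exactly 1.0 would leave no error budget at all.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")


def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent; negative means overspent."""
    _check_target(slo_target)
    if total_requests == 0:
        return 1.0  # no traffic yet, nothing spent
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def burn_rate(slo_target, window_requests, window_failures):
    """Observed error rate divided by the agreed error rate; 1.0 is exactly on budget."""
    _check_target(slo_target)
    if window_requests == 0:
        return 0.0
    observed_error_rate = window_failures / window_requests
    return observed_error_rate / (1.0 - slo_target)


def should_alert(rate, threshold=2.0):
    """Hypothetical policy: alert when the budget burns faster than the threshold."""
    return rate > threshold


# Assumed example: a 99.9% SLO, 1,000,000 requests this period, 250 failures so far.
remaining = error_budget_remaining(0.999, 1_000_000, 250)

# Assumed example: in the last window, 50 of 10,000 requests failed.
recent_burn = burn_rate(0.999, 10_000, 50)
alert = should_alert(recent_burn)
```

With these assumed numbers, roughly three quarters of the budget is still unspent, but the recent window burns it about five times faster than agreed, which is the moment to go and look at what is causing the errors.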
How does Observability help engineering teams see what is happening inside their systems in real time?
Observability allows you to ship code, look at it in production, and ask if it is doing what you expected. You can combine it with progressive deployment so that you’re running side by side the old version and the new one, and you’re comparing these versions to each other. With Observability, you are gaining confidence in the code that you’ve written.
It’s easier to build your systems with Observability, and it becomes essential when you’re working on a large scale. Otherwise, you’re just developing in the dark, and when you’re shipping the code off, you hope nothing will come out. But most bugs are too subtle; they’ll slide under the radar and then get bigger. By the time you find a bug, you will have forgotten what you were building there.
Have microservices added to the complexity of debugging?
The most complex problem in distributed systems isn’t debugging the code. It’s figuring out where the code lives that you need to debug.
What kind of people are you looking to bring into your organization to foster this culture of innovation and build your best product?
One of our company’s values is that we hire people capable of taking care of themselves, who can ask for help when they need help. We’re not looking for people who work themselves to death. We are parents, and we want our jobs to be compatible with parents’ lives. That’s why from the very beginning, we emphasized having a distributed culture. Even though we do have a central office, we wanted it to be okay for people to take meetings from soccer games. I think decoupling work from a location is super important when it comes to diversity.
Another one of our values is about having autonomy and acting with ownership over your stuff. When you’re a startup, everyone’s an owner. You might own three or four different things, each of which might have a whole team at a larger company. Loving that kind of work, being okay with that, and being able to juggle responsibilities is key.
How difficult is it to find talent on the market?
We have almost a hundred people, and we’re just hiring our very first recruiter. It’s been amazingly easy for us. I chalk that up to our philosophy for teams and our philosophy for running the company. To many people, it’s appealing, and they want to run this experiment with us.
Often founders ask me how to build a more diverse engineering team, and then I ask back: “Do you have any women at the C-level? Do you have any female VPs? Are they technical?” Because if you do, diverse candidates will flock to you. Men more than women are sick of reporting to exec teams made up entirely of white men. Maybe we’ll fail, but at least we’ll fail in new and exciting ways.
Can you say that diversity is part of the strategy at Honeycomb?
I don’t particularly like to talk about diversity, but I would rather let my actions speak for themselves.
What would be your advice for those who’d like to start their own company?
Don’t do it! If I had to start all over again, I wouldn’t do it. Our first four or five years were miserable – I lost my marriage and most of my friends. If you’re stubborn, maybe you’ll succeed, but most startups don’t succeed; almost all startups fail. Christine and I are pathologically stubborn, and we don’t give up for better or for worse.
Is there anything else to your success except stubbornness?
We were fortunate to be working on very cutting-edge software and observing the Facebook tool ecosystem. It gave us a glimpse of the future. And once you’ve tasted it, it’s really hard to go back.
Could you distil some advice for young women who aspire to build their career in technology?
While you’re junior, keep your head down, try not to be too sensitive, and try to learn everything you can. Don’t take things too personally, and get to be a senior software engineer. And once you’re a senior engineer, once you have some power, allow yourself to get sensitive on behalf of others.
Research shows that women who advocate for themselves get punished for it, but women who advocate for others get rewarded. So what we can do as women is try to reward each other, and hopefully, it’ll come back.
Charity, thanks so much for sharing and for taking the time. Hope we can have you on the show again sometime soon.
Stay tuned for more great interviews coming your way!