The VOID
The VOID makes public software-related incident reports available to everyone, raising awareness and increasing understanding of software-based failures in order to make the internet a more resilient and safe place. This podcast is an insider's look at software-related incident reports. Each episode, we pull an incident report from the VOID (https://www.thevoid.community/), and invite the author(s) on to discuss their experience both with the incident itself, and also the process of analyzing and writing it up for others to learn from.
Episode 4: Emily Ruppe and The Inaugural LFI Conference
In this episode we take a delightful detour from our usual VOID programming to have Emily Ruppe, a Solutions Engineer at Jeli.io and member of the Learning From Incidents (LFI) community, on the program to discuss the upcoming LFI Conference happening in Denver in February. Find out more about the goals and some of the featured speakers for the event, and we hope to see you there!
Discussed in this episode:
Jeli.io
Learning From Incidents
The LFI Conference (Feb 15-16, 2023 in Denver, CO)
Courtney Nash:Hello, and welcome back to the first episode of the VOID podcast for 2023. I am joined by Emily Ruppe, a solutions engineer at Jeli, and she's here to talk about the inaugural Learning From Incidents Conference. Welcome, Emily. Thank you for joining.
Emily Ruppe:Thank you for having me, Courtney. I'm excited to be here.
Courtney Nash:Okay, so let's start with where you work, Jeli. Can you tell me a little bit about what Jeli does? We are big fans over here of Jeli at the VOID.
Emily Ruppe:Yeah. Jeli.io is an incident management tool. We have a free incident response bot that we make available to anyone who wants to use it. We think that incident response, and making that easier, should be table stakes for folks and shouldn't be super involved to set up and integrate and get going. And we have our incident analysis platform, with tooling that makes it easier for people to build timelines, gain insights from incidents, and then look across different incidents to really understand trends and things that are happening within their organization and find places to make improvements and changes. So that they can learn from incidents.
Courtney Nash:Aha. So that's what we're here to talk about. Well, first of all, Jeli and the VOID to some degree have both emerged from this larger community that Nora, the CEO of Jeli, was really integral in starting a few years back: the Learning From Incidents community, or LFI as we often like to call it. Talk to me a little bit about that, and then we'll talk about what's happening this year that's very exciting.
Emily Ruppe:Yeah. The LFI community is this group of researchers and practitioners, folks who are on the ground doing incident response and analysis in the tech industry, but also in a lot of industries across the globe, really. We have folks who are in medicine, who are in, say, planes.
Courtney Nash:Oh, a fun time. Just for the audience following along at home: today, as we are recording, is Wednesday, January 11th, and this morning the United States FAA flight system, for lack of a better term, one of their critical flight systems, just sort of stopped working, and every single plane coming to or leaving or on the ground or in the air in the United States sort of had to just stop what it was doing. And so that's... we'll come back to that, maybe. But yes, we have people in the aviation industry involved in the community.
Emily Ruppe:It's a really cool group of folks because it makes it accessible to talk to people who are dealing with, I mean, issues at the scale of the FAA event today. You know, that's at a scale that is massive and involves just a lot of people, a lot of different systems and hardware. And being able to kind of discuss that within groups of folks who have different areas of expertise, it's really fascinating. It's really helpful for the day-to-day practices of engineers, people who are just trying to do work, to understand how we can kind of solve these problems and learn from different areas how to make this work easier for ourselves.
Courtney Nash:Yeah. And as a member of that community myself, one of the things I'm most struck by is, in particular, I think a lot of people don't understand that there are different mindsets in the world about safety. What it even means, how one achieves it, how people maintain it, how large groups of people maintain it. And I think one of the really cool things about LFI is the effort to not just advance our own practices, you know, in our industry and other industries, but to try to get the rest of the world to come along to maybe some new ways of thinking and doing things. And to that end, this year is the first time that we are going to have a Learning From Incidents conference, an in-person event, which is super exciting. So maybe give me all the details about that.
Emily Ruppe:Yeah. Our first ever Learning From Incidents conference is going to be in Denver this year on February 15th and 16th. We really want to normalize talking about failures and being transparent with handling incidents and difficulties and successes, things that work really well, and, and just a way for software folks to collaborate on learning from incidents, not just software folks, but really kind of bring folks together to have these conversations.
Courtney Nash:Do you know how many speakers are going to be there?
Emily Ruppe:Oh, 60.
Courtney Nash:Oh wow. Okay.
Emily Ruppe:But yeah, we have a ton of really exciting talks lined up, and we're also really focusing on providing a really robust hallway track. The magic of what the Learning From Incidents community has been is connecting people who usually wouldn't have access to each other to talk things through. Like, "Hey, how do you deal with this?" Or, "Hey, in this talk, you said that you all kind of confronted this issue in this way," or, you know, "you dealt with thousands of planes being grounded at once. Like, how did you even approach this? What stuff do you know?" Every conversation I have is adding to my reading list of papers and amazing reports,
Courtney Nash:There is that side of it, it's a reading crowd.
Emily Ruppe:Mm-hmm. But yeah, we also have some really wonderful talks. Dr. David Woods, John Allspaw, let's see, Dr. Lorin Hochstein from Netflix. Laura Nolan, Dr. Laura Maguire, who's with us at Jeli. Courtney Nash, I believe.
Courtney Nash:Oh! She's all right...
Emily Ruppe:I'm really excited to hear from her. There's a lot of really awesome folks from all over the tech industry, as well as other kind of safety sciences and resilience areas, that I'm really excited to nerd out with and hear everything they have to say.
Courtney Nash:Is the hallway track the same thing as sort of like the mini Cases Conf? Because I know that Dr. Laura Maguire and Nora have done a Cases Conf at a couple of other events, which is really just people getting up and talking about... our (LFI) nomenclature. In the LFI world we call these things "cases." Right? Is that what the hallway track is, or is there a whole separate sort of just really case-focused track as well?
Emily Ruppe:I think there are numerous talks that we have in here that are people talking through specific incident cases, which are really exciting. There's some that I'm very interested to hear from. But yeah, the hallway track. I think also we might have some individual kind of Cases Conf type events going on. I kind of like to refer to them as like incident ghost stories, or like, we're kind of gathering around the campfire and talking about these times. It's so much fun to just hear about incidents in this way. It's one of the ways that I, internally as an incident commander and an incident analyst at other companies, have encouraged as the best way to learn and retain knowledge about incidents that have happened and understand that kind of stuff. And I think that doing that with other companies, and hearing those similar themes and really familiar stuff, is oddly comforting. To know like, oh, yes, you all also are dealing with, you know, these specific switches give you issues. Or like, oh, there's always this OOM kill that has to happen. Or, oh, we forgot to turn Chef off. And it's always really
Courtney Nash:It's like a very healthy form of group therapy, I've found at times, right? Where you're like, I am not alone. Sometimes we really feel that way too. The more remote our work becomes, the harder it is to have that kind of connection and that remembrance that we are all human in these systems that frustrate us immensely at times.
Emily Ruppe:Yeah. So the hallway track should be really exciting. I'm not great at sitting still and paying attention for long periods of time, but I really love talking to people, so I'm really excited for everything that we are going to be offering. And we really hope that folks are able to take away, one, that feeling that they're not alone, that there are a lot of other people who are trying to make this progress. I know that for me personally, it kind of felt like I was an island trying to say, "Hey, we could be doing this differently. This doesn't have to be so hard." And finding the LFI community helped provide me with resources and ways to help kind of make my case and really get other people on board with that. And primarily making it feel like you're not alone. There are other people who are trying to make the same kind of progress, and we can kind of share, pool our resources, and help each other out. But also really learn some incredible stuff from people who are on the cutting edge of system safety and resilience.
Courtney Nash:It's very pragmatic in the way of like marrying theory and practice together. You know, one can go read all the art. Architectural, whatever, you know. But to talk to people who both pay attention to the research out in the world, right, out in academia, in industry research, and are connecting that to what we know on the ground, I think is really great. Sometimes in the tech industry, we are really good at just doing stuff and making things up and hoping it'll work.
Emily Ruppe:Mm-hmm.
Courtney Nash:And I like seeing that research be brought in more formally, but in a way that I think people won't find onerous or, you know,
Emily Ruppe:It makes it accessible. I think, especially pairing the research folks with people who are doing this practically, because we can have those really open conversations of, yes, in an ideal world, I would like to do these things, but I'm working within an existing culture, with an antiquated bunch of different systems that are working together. And so how do I start? How do I measure any sort of progress here? For me at least, LFI is the only place where I've had that kind of interaction with the folks who are talking about the theory and doing the research, to be able to say, okay, but how do I actually do this thing? How do I actually put this into motion? I've gotten so much advice and value out of having access to those kinds of folks in LFI, so I'm excited for us to bring that to the conference. learningfromincidents.io is our site, where you can check out all of our conference information and register. We have a ton more information about where it is and who's speaking. You can see a lot of details and, yeah, register to come hang out with us.
Courtney Nash:I will put all of the various things that we've talked about in the show notes for the episode. We'll get that out very shortly so people still have time to buy tickets, and I hope that we are all actually able to fly there in February.
Emily Ruppe:Yeah. Yes.
Courtney Nash:And lots of love to the people on the ground trying to solve what sounds like a very unfun problem today.
Emily Ruppe:It's the thing that, I think it's on our site, but it's been like the kind of core of LFI. We are a bunch of incident nerds. When I woke up this morning, I was reading about it. My husband came in, he was like, "Do you want coffee?" I was like, "The FAA is down." And he was like, "Okay..." I was like, this is unprecedented. I don't think you understand the wealth of information we're gonna get about how all of these systems work and how to deal with things at this kind of scale. Like, my little nerd brain was just like, we're gonna learn so much from this! And I think that's kind of the core of us: we're not ambulance chasers, but when events like this happen, we're like, oh my God, what an incredible opportunity to learn about
Courtney Nash:Yes.
Emily Ruppe:all of these different systems and how in real life we actually can recover from things like this.
Courtney Nash:For me, the immediate thing is just the like gut punch of like, what these people are going through right now. Right?
Emily Ruppe:Yeah. That immediate like, oh no, that gut punch in your chest. I'm like, oh, okay. Like, it's, yeah. It's a physical hit that we kinda all relate to. But like, it's a thrill. And also the, oh, sinking feeling of, oh, this is a big one.
Courtney Nash:Well, maybe next year at the second annual LFI Conf, someone from the FAA will come and tell us true tales of how they tackled this. So we shall see. Thank you so much, Emily, for joining me.
Emily Ruppe:Yeah. Thank you, Courtney.
Courtney Nash:I wish you an incident-free day.
Emily Ruppe:Thank you.