I'm really happy to have Justin Richter on stage. He will talk about "Let the Robots Help: Protecting the Build and Deployment Chain."
Justin, what can we expect within the next 20 minutes? What will you share with us?
Well, it's a short presentation, only 74 slides, so... 74! I only have 20 minutes, so I think we can just get started. Perfect. Then go ahead. All right. Good afternoon, everybody. My name is Justin Richter. I'm the CTO of a company called UberEther, and today I am here to talk about the build and deployment chain of how we get software.
Usually a lot of the work that I do is talking about API security, and so you have an API, you wrap it with security, and then when the bad guys show up (apparently they're invisible on this version of the PDF, that's interesting), they get foiled by your great security. I'm not here today to talk about that particular part of the problem. What I am here to talk about today is how we get to that little squishy bit in the middle, the actual API. This seems like it would be a fairly straightforward process, right? You have a developer, they create some code, and you get the API out the other end.
This is how you build your running systems. But of course, we know that that is an oversimplification of the process, and in order to think about this right, I want you all to think about getting a glass of water.
Now, this seems like it is also a very simple process. To get a glass of water that you can drink, you go up to the tap, you fill the glass, and then you put the water in you. Pretty straightforward, right?
Well, I am of a generation where, when I was growing up, there was this book series, and later television series, called The Magic School Bus, and The Magic School Bus taught us that the water delivery system is vastly more complex than our little minds had assumed when we figured we could just go to the tap and get some water. It turns out there are a lot of different parts to this.
You need to have clean water at your source, you need to store it in a way where it doesn't get contaminated, but you still have to treat it and filter it and then deliver it through a system that doesn't recontaminate it, and then it can finally get to you. And each part of this system is in turn fairly complex. There's a lot of different moving parts here. This is a filtration diagram. This is just one step of that overall process for how you treat water that you are already storing at a fairly clean reservoir and then getting out to the distribution system. But it doesn't stop there.
If you're like me, you may have a device in your house that actually filters the clean water coming out from your tap before you drink it. Mine, it's built into the refrigerator. It also chills the water on the way out. It's really handy. And even though I know that the water coming out of my tap is safe and reasonable and stuff like that, I still filter it at the end. Why?
One, you know, it can make the water a little bit more pleasant. There might be something left over from the distribution system that maybe didn't get caught. But it's also a bit of peace of mind: I have one more filter, one more check, at the end of the system. But it doesn't actually stop there, does it? You have clean water at the source, you have filtration, you have clean distribution, you're even filtering it in your house, but you still wash your glasses.
If you put all of that into a dirty glass at the end of the day, well, then you've kind of undone everything that gets to that point. And this last stage is the API security that we're not talking about today. But it's still very important. So I will say that in some use cases, there are different levels of cleanliness that apply. I don't know if people drink from the garden hose in Germany, but this is how my generation absolutely grew up.
Water delivery is a very complex system, in that there are a lot of moving parts; they address different aspects, and they have checks on each other as the water makes its way through to the end of the system. This is far from the only complex system in the world. Another one that I really like to talk about is biometric authentication.
Now, this is from the NIST SOFA-B paper, and one of the things that I love about this diagram of biometric authentication is that it shows the five main parts of the biometric authentication process and the 12 places where those five parts can actually be exploited by attackers. Now, this is not to say that biometric authentication is worthless or unusable or anything. It just says that this is a complex system. There are a lot of different ways that things can go wrong.
If, as an attacker, I can say, all right, you've protected the sensor enough that I can't override it, then I'm just going to override the decision down at the far end instead. We need to protect every part of the system, and we need to understand how those parts of the system come together.
Well, guess what? Building software goes through a similar process. There are a lot of different moving parts here between the developer and actually getting out to a running system. And that's what I would like to really talk about today, because software development is itself also similarly a complex system, but the right tools can help us. And I think that there's a lot that we can do today with modern systems that can really help us create better software. First step, we start at the source. We need to develop good code.
And there's a lot of practices that we can do and a lot of tooling that we can use to help us develop good code. First of these, and I know it's a little bit hard to read up there, is code formatting. This seems like a really trivial thing. Developers just want to go in and hack something in and paste it in and have it work. Going back and finding out what that code actually meant and what it does at a later date is a hugely important problem. So something as simple as a format scanner is a really, really simple tool you can add to your build chain that is hugely, hugely valuable.
Now, I will say that there are tools that claim to automatically format your code so that it's well formatted. I haven't had good luck with those yet. On the slide, this is what I would like the code to look like, and it fits good formatting standards, and this is what the automatic formatter gives me. The formatter's output is much less readable and understandable to a human, and readability for humans is ultimately the goal here. So right now, in a lot of this tooling, automated detection is ahead of what automated creation can give us. And therein lies our first lesson: apply the tools where they are most powerful.
Automatic analysis, though, goes beyond formatting to tell us, like, you know, maybe this module is too large, or maybe you're calling out to too many sub-functions, or maybe you're not using bracing and looping in the ways that you think you are. There are some really great tools out there that you can, again, apply automatically to code as it goes into the system.
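As a rough sketch of what that kind of gate can look like, assuming a Python codebase with black and flake8 installed (the source directory and complexity threshold are just placeholders), something like this can run as a pre-commit hook or an early pipeline stage:

```python
#!/usr/bin/env python3
"""Rough sketch of a formatting and static-analysis gate for a build chain.

Assumes a Python codebase with black and flake8 installed; the source
directory and the complexity threshold are placeholders, not recommendations.
"""
import subprocess
import sys

CHECKS = [
    # Fail if any file is not formatted to the agreed standard.
    ["black", "--check", "--diff", "src/"],
    # Fail on lint findings such as unused imports or overly complex functions.
    ["flake8", "--max-complexity=10", "src/"],
]

def main() -> int:
    failed = False
    for cmd in CHECKS:
        print("Running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```

The important design choice is that the gate only detects; it doesn't try to rewrite anything on the developer's behalf.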
Speaking of which, it's 2024. I shouldn't have to say this to as many people as I do, but use source control, for crying out loud. I can't tell you the number of systems that I've gone to, even today, that just have a version of a script sitting on a computer somewhere, and that's it. But it's not enough to just throw everything in Git. You have to use source control, again, in the way that it's good at. Make sure your commits are atomic. Make sure that you're actually saying things in your commit logs, not just "made a change," right? This is for humans to understand. The whole point of using source control is so that you can comprehend the history that got us to today. Tag your releases. Use branches.
I worked with a company a few years ago that when I joined, it had started as a single developer project, and that developer was just basically committing everything to the main branch, and whatever the current state of the main branch was, that was the production. When I showed up, I was like, can we have versions on things? Can we maybe put dev on a different branch? And this was kind of a radical idea for that company at the time. Another step that we're starting to take in the industry is signing those releases.
Now, a lot of source control systems already have authenticated check-ins, but we can go a step beyond that so that the check-ins are actually auditable and tied back to the individual developers. Now, these systems, they're not necessarily new, but how they're getting deployed is new, so we have to ask the questions. What does the key represent? Is it something that is just installed on the developer's laptop all the time? Is it something that is in a managed key checkout system? There's a lot of good tooling that we can use to apply this.
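To make that concrete, here is a minimal sketch of a pipeline step that checks commit signatures, assuming GPG-signed commits and a runner that already trusts the developers' public keys; the branch names are placeholders:

```python
#!/usr/bin/env python3
"""Sketch of a pipeline step that insists every new commit carries a valid signature.

Assumes GPG-signed commits and a runner that already trusts the developers'
public keys; the branch names are placeholders.
"""
import subprocess
import sys

def new_commits(base: str = "origin/main", head: str = "HEAD") -> list[str]:
    # List the commits that this branch adds on top of the base branch.
    out = subprocess.run(
        ["git", "rev-list", f"{base}..{head}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.split()

def main() -> int:
    unsigned = []
    for commit in new_commits():
        # git verify-commit exits non-zero when the signature is missing or invalid.
        if subprocess.run(["git", "verify-commit", commit]).returncode != 0:
            unsigned.append(commit)
    if unsigned:
        print("Commits without a verifiable signature:", ", ".join(unsigned))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```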
This is all automation that can be built into the system. And a really important point here, you don't write all of your own code.
Now, I'm not just talking about the stuff that you copied and pasted out of Stack Overflow, which, let's face it, is a lot. There's all of the dependencies and libraries and stuff that you pulled in.
So, for heaven's sake, use dependency management. There are at least three dependency management systems for every major language out there these days, it seems. Make use of them. And this isn't a panacea. This doesn't solve all of your problems, but at least it gives you a toehold on figuring out what it is that you depend on and how you depend on it. Because it's not just about what you use. It's about what the libraries that you use also use. Right? And this is where things start to get really complex, and you start to get things like a software bill of materials.
This is something that allows you to look at a package and say, this is everything that is tied up into it. Think about SBOMs, though. They don't work if you don't read them. I have seen so many systems that go out of their way to create complex SBOMs, and then they get posted in the system, and nobody ever checks them.
So, if there's a zero day in some random library that's depended on by a tertiary dependency injection system, are you going to be aware of that? You need to check these. You need to know what's in your system for real, and importantly, at a human level, how it's being used. Because just because you pulled in a library doesn't mean you're necessarily using it in a way that a particular vulnerability actually works.
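Here is a hedged sketch of what "actually reading the SBOM" might look like, assuming a CycloneDX-style JSON file with a top-level components list; the advisory table is a hand-written placeholder standing in for a real vulnerability feed:

```python
#!/usr/bin/env python3
"""Sketch of actually reading an SBOM instead of just publishing it.

Assumes a CycloneDX-style JSON file with a top-level "components" list; the
advisory table below is a placeholder standing in for a real vulnerability feed.
"""
import json

# Placeholder advisories: package name -> versions with a known issue.
KNOWN_BAD = {
    "example-transitive-lib": {"1.2.3", "1.2.4"},
}

def check_sbom(path: str) -> list[str]:
    with open(path) as handle:
        sbom = json.load(handle)
    findings = []
    for component in sbom.get("components", []):
        name = component.get("name", "")
        version = component.get("version", "")
        if version in KNOWN_BAD.get(name, set()):
            findings.append(f"{name} {version} matches a known advisory")
    return findings

if __name__ == "__main__":
    for finding in check_sbom("sbom.json"):
        print(finding)
```

In practice you would feed this from a live advisory source and run it on every build, not once at release time.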
So, it is fairly complex to know how these different things fit together. And then there's the question of what happens if your dependency goes away. Who here has heard of the tale of left-pad? No? It is an incredibly simple Node module. All it does is add padding to the left side of a string.
Really, really simple. The author of it decided to unpublish it one day, just took it off of npm entirely, and immediately broke thousands, thousands of projects across the internet, including some very major commercial systems that had transitive dependencies on open source software. So much so that the package manager scrambled to republish it and then changed their policies about unpublishing. What this means is that software has a supply chain, with dependencies that you don't necessarily think about when you're building it. Because when things are working, everything's good.
So, my recommendation these days is, you know, if you have a system that you know works, keep cached copies of it so that you can always build even if your dependency tracking system goes down. You know what you have. You know the versions. And you know, hopefully, the hashes and everything that you're building against.
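A minimal sketch of that cache-and-verify idea, with placeholder file names and digests, might look like this:

```python
#!/usr/bin/env python3
"""Sketch of building against a local dependency cache with pinned hashes.

The cache directory, file names, and digests are placeholders; the point is
that the build can still run, and still be verified, when the upstream
registry disappears or misbehaves.
"""
import hashlib
from pathlib import Path

# Placeholder pin file: cached artifact -> expected SHA-256 digest.
PINNED = {
    "left-pad-1.3.0.tgz": "0123...replace-with-the-real-digest",
}

def verify_cache(cache_dir: str = "vendor/cache") -> bool:
    ok = True
    for filename, expected in PINNED.items():
        artifact = Path(cache_dir) / filename
        if not artifact.exists():
            print(f"Missing cached artifact: {filename}")
            ok = False
            continue
        digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
        if digest != expected:
            print(f"Hash mismatch for {filename}: got {digest}")
            ok = False
    return ok

if __name__ == "__main__":
    raise SystemExit(0 if verify_cache() else 1)
```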
Importantly, though, use all of this for more than just the code itself. Software today isn't just source code. Configuration tells software what to do. And I think it is absolutely unreal how many of the production systems we have today don't manage their configurations with the level of rigor that they manage their code. And so we really need to start doing that. We need to treat the configuration of these systems as code. Your data schemas, as you change your database tables over time: manage those in code, too.
Another system that I worked on years ago: when I showed up, any time there was a database change, the lead engineer would just change the .sql file, and then the DevOps engineers would have to go look at the diffs and apply them by hand to the database with ALTER TABLE commands. I got them to use Liquibase; it's a much better system for schema management, and there are a lot of automated systems out there. One of the best things about that, though, is that it also gave us automated rollback when something failed.
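Just to illustrate the shape of that idea (this is not Liquibase's own changelog format), here is a sketch of migrations managed as code with paired rollbacks; the table and column names are made up:

```python
"""Sketch of schema changes managed as code with paired rollbacks.

This is not Liquibase's own changelog format; it only illustrates the idea
that every forward change carries an automated way back. The table and
column names are made up.
"""
from dataclasses import dataclass

@dataclass
class Migration:
    id: str
    forward: str   # SQL applied when moving the schema forward
    rollback: str  # SQL applied when the change has to be undone

MIGRATIONS = [
    Migration(
        id="2024-06-01-add-last-login",
        forward="ALTER TABLE users ADD COLUMN last_login TIMESTAMP",
        rollback="ALTER TABLE users DROP COLUMN last_login",
    ),
]

def pending(applied_ids: set[str]) -> list[Migration]:
    """Return the migrations that still need to run, in order."""
    return [m for m in MIGRATIONS if m.id not in applied_ids]
```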
The automated systems gave us a way to say: we are at this point, we need to be at that point, something went wrong, and we can figure out how to get there. If you're doing these things by hand, with a human, you kind of lose that. Documentation also needs to be under this kind of control, because documentation changes over time, and it should change over time. So you need to track it and manage it as a similar asset. And finally, secrets management. I'm not going to get too deep into that, but just let me tell you, people do secrets management really, really, really poorly.
Every single one of these is a specific incident that I have personally dealt with over the course of my career. My personal favorite is publishing the private key to a hidden URL on the production server: the attackers don't know the URL, so they won't be able to get the private key. That was a fun one. It's not enough just to build good source code. We need to build reliable software. And that's where testing comes in. Unit testing is great.
Still not used nearly enough in our industry, but hopefully we're seeing some growth in that space to make sure that the test coverage is actually testing things that matter. And speaking of testing things that matter, it's not good enough to just do unit testing. You have to test things in context. Especially in today's world of services and libraries that connect in certain circumstances, you need to test things out in those circumstances.
And so it might work just fine through all the unit tests, but when you actually plug it into something that's giving it weird input in a particular way or the timing is off or something like that, it could fall over. So it's really important to have tests that exercise things in a realistic way.
And again, this is something that you can do in an automated fashion. This on screen is the OpenID Foundation's test suite for the OpenID family of protocols, including OAuth and others. And I know a couple of companies that use this as part of their continuous integration pipeline. They actually run this suite of full live integration tests every time they push out a new version. And as you can see, sometimes it works better than others. And that's a really important note here. Never rely on just the happy path when you're doing all of your testing.
I can tell you the number of times I've talked with a junior engineer that built out a unit test and it tested that the right inputs gave you the right answers, and then the bad inputs also gave you the right answers, which was weird. And if you think that that's kind of rare, think about a federated identity system like SAML. If I build a system and I test that every time that I have a signed assertion, the user is able to log in, that's great.
Now, I turn off the signature check. This happens way more often than it should, and it happens quite a lot. I turn off the signature check, and guess what? I come in with an assertion with a valid signature, and I'm able to log in. The happy path still works. I come in with an invalid signature, and guess what? I can also log in. That's a big problem.
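Here is a small, self-contained sketch of that kind of test, with a toy HMAC-based verifier standing in for the real signature check; all of the names are invented:

```python
"""Sketch of testing the unhappy path, not just the happy one.

The toy verifier below stands in for whatever validates a signed assertion in
your system (SAML, JWT, or otherwise); the names are made up. The important
part is that rejection of a bad signature gets a test of its own.
"""
import hashlib
import hmac

import pytest

SECRET = b"test-only-secret"

def sign(payload: bytes) -> bytes:
    return hmac.new(SECRET, payload, hashlib.sha256).digest()

def verify_assertion(payload: bytes, signature: bytes) -> bool:
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("invalid signature")
    return True

def test_valid_signature_is_accepted():
    payload = b"user=alice"
    assert verify_assertion(payload, sign(payload)) is True

def test_invalid_signature_is_rejected():
    # The test that catches a "signature check quietly turned off" regression.
    with pytest.raises(ValueError):
        verify_assertion(b"user=alice", b"not-a-real-signature")
```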
So, all of our testing needs to not focus on just the things that we know are supposed to work. Now, this next bit, I think, is really kind of fascinating: the deployment process. This is where we need to really start stretching, as an industry, what we're doing here. I remember, back in the day, in the build environment, I would compile the production binary on my workstation and then upload it to the production server and restart a service.
So, that means that the company was dependent on my workstation being in a safe, trusted, secure configuration at the time that I did the compilation. Do we really want to have that level of trust in all of our systems? Do we really want to trust every individual laptop? And trust me, an engineer has done all sorts of weird things to their environment to make it the way they want it. And if you haven't read it, please go read "Reflections on Trusting Trust," which is about adding an undetectable back door to a compiler system. That's from 1984.
So, this is a problem that has been known for a very long time. But we as an industry are just getting around to addressing this.
Now, why is that? Well, because back in the day when I had to go deploy to a server, this is what server meant.
Now, these days we go to the cloud, and this is what the cloud looks like. But the important difference is that it's somebody else's computer, and how I access it and how I integrate with it are different.
So, all of the things that I used to have to do by hand, like setting up the OS, setting up the OS dependencies, and doing the installation, I can now wrap in automation, because I can have OS images, I can have containerized microservices, and with all of these I can package a known state that then doesn't change when I need to alter other pieces of the system. So, you need to be controlling all the different parts of the production system when you're pushing things out.
But also, importantly, that last point, automate the installation. Automate that. Because if you have all of this great configuration and then it has to be applied by a human, humans are going to miss things. The instructions might be out of date by the time somebody actually goes through it. There's a lot of problems that can happen. But most importantly, it's not nearly as auditable when a human actually goes and does stuff.
So, understand all of your images. And this goes for all of the parts of the system. You should have a known good piece of software that builds your environment and that copies it to the right place. Because that copy code, if an attacker gets hold of it, well, then it doesn't matter how good your build system is, it doesn't matter how good your production system is. If they get hold of that one piece of code that copies from one to the other, they can swap out whatever they want and drop it into the right place. Each part of the chain has to be trusted and checked.
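As a hedged sketch, the copy step itself can be made to verify before it ships anything; the paths and digest handling here are placeholders, and in a real pipeline the expected digest would come from the build system's own signed output rather than a command-line argument:

```python
#!/usr/bin/env python3
"""Sketch of a copy-to-production step that refuses to ship what it cannot verify.

The artifact path, expected digest, and destination are placeholders.
"""
import hashlib
import shutil
import sys
from pathlib import Path

def deploy(artifact: Path, expected_sha256: str, destination: Path) -> None:
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    if digest != expected_sha256:
        # Never copy something whose provenance does not check out.
        raise SystemExit(f"Refusing to deploy {artifact}: digest {digest} does not match")
    shutil.copy2(artifact, destination)
    print(f"Deployed {artifact} to {destination}")

if __name__ == "__main__":
    deploy(Path(sys.argv[1]), sys.argv[2], Path(sys.argv[3]))
```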
So, everything that you're doing, wrap it up in as much automation as you can so that you can automate it, so that you can audit it and you can understand how the different pieces fit together. And then there's users.
Oh, users. They make everything more difficult. Because we now have to ask, sure, we've got all of this automated build and deployment and all of this. Can a user trigger a release? How does this get triggered? Is it only checking into the main branch? Who's allowed to check into the main branch? Does it have to be reviewed by multiple people? And importantly, can that automated system be overridden in a case of an emergency? Can somebody decide that, hey, I have a new version of software that I want to immediately deploy to every Windows system on the planet.
I'm sure nothing will go wrong with that. Too soon? No CrowdStrike folks in the audience? We need to be careful, because there will be overrides. There will be manual processes, because we need those escape valves sometimes. We need to be very judicious about how they're applied and how we deal with them. Every user should have least standing privilege.
So, for that signed check-in, that signed code check-in: can I do that all the time, or do I need to escalate? Do I need to authenticate? Do I need to do something in my process? Add a little bit of friction to make that work. Speaking of which, friction is exactly what engineers try to avoid.
So, if you make something too hard to do, an engineer is going to find a workaround. That's kind of what we pay them for: solving hard problems to make life easier. And the path that they go down is going to be the easy path, whether or not it's the right path.
So, you need to make the right path the easy path. Workarounds will exist regardless of how great your system is, and when you do find them, you need to understand why they're there and how they got put there, and then incorporate that into the system, because each one was created for a reason. And I've got a whole separate talk about that. Ultimately, at the end of the day, change is hard, and it's our job to manage how change in the system actually works. There's always a cost to automation, to taking a manual process and automating it.
And the author of the XKCD comic actually went and did the math on this: how much time you can invest in automating a task before the automation stops paying for itself. I'm not going to go into those numbers. But the point here is that there is always a cost.
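A hedged back-of-the-envelope version of that math, with every number invented purely for illustration, looks something like this:

```python
# Back-of-the-envelope version of that "is the automation worth it" math.
# Every number here is made up purely for illustration.
task_minutes = 10        # how long the manual task takes each time
runs_per_week = 5        # how often somebody performs it
horizon_weeks = 52 * 2   # how long you expect to keep doing it

minutes_freed = task_minutes * runs_per_week * horizon_weeks
print(f"Automating frees up roughly {minutes_freed / 60:.0f} hours over the horizon")
# Any automation effort that costs less than that budget pays for itself.
```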
But often, if it's repeatable, that's what actually ends up paying off. And one of my engineers had this great observation: if you're doing things manually, you're always fighting fires. You're always focused on whatever the immediate problem is. Somebody needs to go turn off the gas main and stop the fires in the first place. This is where automation can actually help. If you take the time to invest in turning off that gas main and automating key parts of the system, then you're going to have fewer fires to fight.
Then, of course, there's everything at runtime with our somehow invisible attacker. And I would be remiss to not talk about AI at this conference. Because a lot of people have been talking about this all week. I want to point out that everything that I've been talking about today has been about automation. Not about AI. Automation is about following an action based on a programmed process. We have a thing. We need to do it. We need to take an action.
AI, to me, at least AI as we apply it today, is all about decision making. It's all about sense making. I think that AI does fit as a tool. But it's a tool as input into the expertise. It's something that I can use to help make the decision about which actions to take. I do think that today, as systems are today, it is a step too far to let AI actually take the actions. And we've seen that in a few talks earlier this week.
So, sense making, just very quickly, one of the places I think AI fits a lot is being able to look at the massive amount of data that these fully logged and automated systems give us and start to make sense of it. Tell me where the anomalies are. Computers are starting to get really good at searching for things like that. And all of this needs to be commoditized so that it's easy for users to apply, right? In your systems, go make things repeatable that you're doing all the time and make your actions auditable. This is what automation fundamentally gives you.
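As a toy sketch of that sense-making idea, with invented numbers, even a crude statistical threshold can surface the hour worth looking at:

```python
"""Toy sketch of the sense-making idea: let the machine surface the outliers.

This flags hours whose event counts sit far from the mean; real pipelines use
much richer models, but the shape is the same, and the numbers below are
invented for illustration.
"""
from statistics import mean, stdev

def flag_anomalies(hourly_counts: list[int], threshold: float = 2.0) -> list[int]:
    mu, sigma = mean(hourly_counts), stdev(hourly_counts)
    return [
        hour for hour, count in enumerate(hourly_counts)
        if sigma > 0 and abs(count - mu) / sigma > threshold
    ]

# A quiet day of failed-login counts with one suspicious spike at hour 5.
print(flag_anomalies([3, 2, 4, 3, 2, 40, 3, 2]))  # -> [5]
```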
By getting the human out of the loop, we actually can build a system that is more robust, more resilient. So, the biggest takeaway today is, of course, stay hydrated. Because the water system is very, very important. But in your code and in your development systems, let the robots help. Let them do what they're good at. Thank you. Perfect. Thank you very much, Justin, for this great presentation. Any questions from the audience this time?
Otherwise, I will walk around and ask randomly. No? Really not? Okay.
Then, thanks again, Justin. Really great presentation. Appreciate it.