Software (and Python) is growing to power ever more of the world, and it’s only getting more complex. How do we handle large distributed systems? Handle cascading failure? Prepare for emergencies and downtime? Aviation has been through it all, and we’ll look at what we can learn from it.
As we move into this world, we bring ourselves deeper into the realm of distributed systems - where failure happens in new and exciting ways. We’ll take a journey through some of these problems and how to start addressing them, looking for inspiration to the aviation industry, which has been tackling issues like this for decades and has managed to turn flying into one of the safest forms of transport we know today.
We won’t just look at code problems - we’ll also look deeper at the social side of engineering, and the changes it takes to let teams work on large, distributed projects well. How should you deal with cascading errors and alerts? How do you work on a single large project without stepping on each other’s toes? How do you handle incident response when something inevitably goes wrong?
There are changes we can all make to the way we work as engineers - from solo projects to gargantuan, multi-company ones - to write better, more reliable software. This talk will show you some of the options out there that can help you, and why you should always be building for failure.
Watch 'You Have Control: Learning lessons from aviation' on PyCon AU's YouTube account
Andrew is a member of the Django core team, and has been working with Django since 2007, on projects such as South, Django Migrations, and Channels. He works at Eventbrite as a Senior Software Engineer.
When he’s not writing open source or travelling around the world speaking about it, he enjoys flying light aircraft, archery, electronics, and developing games.