The software system is struggling, and the logs won’t say why. To make the software run smoothly again, Detective Plum Bago must solve the mystery! This case study shows how to use metrics to identify performance bottlenecks in distributed systems, to ensure the right culprit is captured.
Have you ever wanted to know “Where is the bottleneck in my distributed system/microservices/job queue?” This talk will level up your detective skills for investigating performance problems in software systems.
PLUM BAGO INVESTIGATES: The development team is making steady progress on the new software release, when suddenly the staging server is found slumped on the floor. System load is sky-high and database writes are backing up in a troubling fashion. A workload that could previously be handled easily is now choking the application. Several logs witnessed the attack but none could identify a culprit. A crack team of expert detectives is assembled to solve the case - and everyone has their own pet theory. Suspects include The Network, The Messaging Queue, The Database, The Software Configuration and the well-known fugitive, SSL. Detective Plum Bago is the head sleuth charged with solving the case.
When attempted murder of a software application’s performance is the crime, and logs are not enough to identify a culprit, how can an investigator enable the victim to speak up and provide clues? If it’s no longer premature to optimise, how can metrics be extracted across a complex application? Join Plum on a journey towards understanding and justice.
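As a taste of what letting the victim speak up can look like, here is a minimal sketch of timing individual stages of an application, so that the slowest stage can be named from measurements rather than pet theories. It is an illustration only, not the approach from the talk: the `timed` helper, the `timings` store, and the stage names `queue_read` and `db_write` are all hypothetical, and the sleeps stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: stage name -> list of durations in seconds.
# A real system would more likely ship these to a metrics service.
timings = defaultdict(list)


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures still leave evidence.
        timings[stage].append(time.perf_counter() - start)


def slowest_stage(samples):
    """Return the stage name with the largest total recorded time."""
    return max(samples, key=lambda stage: sum(samples[stage]))


# Stand-in workload: the sleeps simulate a queue read and a database write.
for _ in range(3):
    with timed("queue_read"):
        time.sleep(0.001)
    with timed("db_write"):
        time.sleep(0.02)

print("Prime suspect:", slowest_stage(timings))
```

Wrapping each suspect in its own timer keeps the instrumentation cheap and local, and comparing totals per stage gives a first lead on where the time is going.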
Watch 'The Case of the Mysteriously High System Load' on PyCon AU's YouTube account
Brianna Laugher
Planet Innovation
Brianna Laugher is a lead private investigator for mysterious software behaviour at Planet Innovation, working on software for medical devices. She is a past contributor to pytest and currently organises PyLadies Melbourne.