Sweeping complexity under the infrastructure carpet

Some projects seem to get into the habit of keeping the application code relatively clean and simple by hiding complexity in the infrastructure layer. My made-up term for this is sweeping complexity under the infrastructure carpet. It feels appropriate that this is a bit of a mouthful to say.

This anti-pattern seems to creep in via an attitude that complexity in infrastructure “doesn’t count”, so it’s OK to let it grow and fester there while the application code appears healthier. Maybe complex infrastructure is even a goal to aspire to, as working with high-powered cloud architecture components seems more worthy than tinkering with classes and functions in application land.

This isn’t a complaint about good software infrastructure management. IaC (Infrastructure-as-Code) tools are great and often a good choice for managing infrastructure. We get the carpet sweeping situation when a lack of checks and balances on the complexity of infrastructure seems to attract complexity that might have been better managed elsewhere.

Large functions and classes will rightly draw pushback and refactoring, while ever-enlarging infrastructure code seems to go unnoticed until its roots run deep and it’s too difficult to untangle. Once it’s got that complicated, the temptation to keep adding ever greater complexity in the infrastructure is even stronger. Infrastructure complexity compounds: each addition makes the next one easier to justify and harder to reverse.

What could have been business logic in application code (where it could be more easily developed, tested and maintained) ends up as obscure infrastructure arrangements, such as a call stack implemented as a series of pub-sub topics, or distributed sagas across multiple servers, cloud functions and datastores. Sometimes the infrastructure seems to be elaborate just for the sake of it, and the underlying situation it’s trying to address is actually not that complicated.
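To make that concrete, here is a minimal sketch of what keeping the logic in application land might look like. Everything in it is made up for illustration: the Order type, the discount rule and the function names are assumptions, not taken from any real project. In the carpet-sweeping version, each step might be a separate cloud function subscribed to its own topic; here the "call stack" is just a call stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical order type; the field names are illustrative only."""
    sku: str
    quantity: int
    unit_price_pence: int


def validate(order: Order) -> None:
    # In a pub-sub design this could be a function listening on an
    # "orders-received" topic. Here it is an ordinary function call.
    if order.quantity <= 0:
        raise ValueError("quantity must be positive")


def price_with_bulk_discount(order: Order) -> int:
    # Invented rule: 10% off for ten or more items, in whole pence.
    total = order.quantity * order.unit_price_pence
    if order.quantity >= 10:
        total = total * 90 // 100
    return total


def process_order(order: Order) -> int:
    # The whole flow is visible in one place and fails with one stack trace.
    validate(order)
    return price_with_bulk_discount(order)
```

Calling process_order(Order("widget", 10, 250)) runs both steps in one process, so a debugger or a stack trace shows the whole path. If a step later needs to be distributed for real reasons, it can still be moved behind a queue then.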

Software complexity is a bit like the Hydra in Greek mythology: trying to chop it up just makes it grow more. Splitting complexity across pub-sub topics makes it worse. Splitting complexity across separate datastores makes it worse. Splitting complexity across separate services makes it worse. Sometimes those approaches are necessary, but they should be seen as a last resort for unavoidable complexity rather than the first thing we reach for.

Debugging complex business logic can be tricky. Distributed debugging makes this much harder still. With labyrinthine infrastructure, sometimes it’s difficult even to track down what is running, let alone how or why.

Infrastructure code is also harder to test than application code. Anything that looks vaguely like business logic is much better off in application code where the complexity can be addressed using well-established techniques, and kept in check with straightforward testing methodologies.
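As a rough illustration of how cheap that testing can be, here is a small sketch. The shipping rule is invented for the example and the names are hypothetical. The point is only that a rule living in a plain function can be checked with plain assertions, with no topics, queues or deployed environments involved.

```python
def shipping_cost_pence(weight_grams: int, express: bool = False) -> int:
    """Hypothetical shipping rule, invented for this example."""
    if weight_grams <= 0:
        raise ValueError("weight must be positive")
    # Assumed price bands: up to 1 kg costs 300p, anything heavier 700p.
    base = 300 if weight_grams <= 1000 else 700
    # Assumed express surcharge: double the base price.
    return base * 2 if express else base


def test_light_parcel():
    assert shipping_cost_pence(500) == 300


def test_heavy_express_parcel():
    assert shipping_cost_pence(1500, express=True) == 1400


def test_rejects_non_positive_weight():
    try:
        shipping_cost_pence(0)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

These test functions follow the naming convention that a runner such as pytest picks up, and they can also be called directly. Checking the same rule once it has been spread across infrastructure would typically mean standing up, or faking, each component it passes through.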

All roads lead to Rome, and all discussions of distributed computing anti-patterns lead to the first rule of distributed computing: don’t distribute your computing. Also, try not to sweep complexity under the infrastructure carpet.