For Robust Solutions
Why organizations should embrace big systems and get rid of duct tape solutions
By Jeppe Johansen
In the recent blog post Against Robust Solutions, Joachim argues that big systems[1] are problematic for three reasons: first, they introduce more unmanageable risk; second, the problems they solve are generally trivial; and finally, the issues they mitigate tend to belong to the past. I disagree with this analysis on multiple points. It assumes that systems primarily address problems already solved by an existing duct tape solution[2] (DTS), and that they don’t solve problems directly caused by DTSs. However, consider this example, where reportedly about 1/3 of genetics papers with Excel gene lists contain errors due to Excel auto-correcting gene names.
Joachim’s argument also assumes that big systems solve yesterday’s problems. However, a good system is ideally designed with extensions and the future in mind. Concretely, it uses good abstractions of the domain being modelled and exposes a method for communicating with new or separate systems. Take the design of a new tax system as an example. A good system designer will consider future use cases, such as the introduction of new tax codes, so that future extensions are easy to incorporate.
Beyond believing that we need not be as pessimistic about large systems as Joachim suggests, I see at least three good reasons for actively promoting the implementation of large systems in organizations, which I outline below. I must admit, however, that a poorly designed system usually makes for a terrible user experience, and that good systems hinge on implementation. If we as a society believe that the systems we interact with daily are bad, it is probably not because systems themselves are bad, but because we are bad at implementing them.
Systems allow for governance
The first argument for big systems is that they allow for governance if they are well built. Since they usually come with a good interface, you can inspect the system. Issues of governance come in many forms, but they all hinge on the ability to reason about the system’s internal state with very little friction. Take these two examples. First, questions like “How many customers have we had?” or “Would this person be eligible for a new loan?” can be answered easily because the system is transparent. Contrast this with a DTS, where five different employees might have been solving the same problem in five different ways. How do we ensure that everyone was treated the same way? Second, what if an employee leaves? Is the Excel sheet still on his personal drive, and can we be sure we can access it? A DTS might also be dynamic, changing over time, which makes it hard to reason about how earlier decisions were made. Both examples show how the lack of transparency in DTSs can make it almost impossible to ensure good governance practices.
More generally, the tendency of DTSs to be opaque has made me suspect that a lot of the very public catastrophes of big IT systems are a consequence of a “transparency bias”. We know about the catastrophes because we can know about them, thanks to the system’s transparency. Think of it this way: if you have 100 different Excel sheets that are used to allocate public grants, how would you effectively search through them to see if any fraud has happened? You would need to manually go through each individual spreadsheet and get comfortable with its internal logic to spot whether anything fishy was going on. The bureaucratic burden would be unmanageable. DTSs, I speculate, have a long string of undiscovered catastrophes precisely because they are opaque.
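To make the contrast concrete, here is a minimal sketch of what such a search could look like if the grants lived in one central store instead of 100 spreadsheets. Everything in it is an assumption made for illustration: the `grants` table, its columns, the demo rows, and the rule itself (flagging recipients who received more than one grant in the same year) are hypothetical and not taken from any real grant system.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `grants` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE grants ("
        "recipient TEXT, caseworker TEXT, year INTEGER, amount INTEGER)"
    )
    # Made-up rows, only here so the query has something to work on.
    conn.executemany(
        "INSERT INTO grants VALUES (?, ?, ?, ?)",
        [
            ("Org A", "cw1", 2023, 100_000),
            ("Org B", "cw2", 2023, 50_000),
            ("Org A", "cw3", 2023, 80_000),
            ("Org C", "cw1", 2022, 30_000),
        ],
    )
    return conn


def find_repeat_recipients(conn, year):
    """Return (recipient, number of grants, total amount) for every
    recipient with more than one grant in the given year."""
    query = """
        SELECT recipient, COUNT(*) AS n_grants, SUM(amount) AS total
        FROM grants
        WHERE year = ?
        GROUP BY recipient
        HAVING COUNT(*) > 1
        ORDER BY total DESC
    """
    return conn.execute(query, (year,)).fetchall()
```

Calling `find_repeat_recipients(build_demo_db(), 2023)` checks every grant in one pass. A hit is only a lead for a human to follow up on, not proof of fraud, but nobody has to open each spreadsheet by hand to get that lead.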
Systems reduce human single points of failure
Very much in the same vein as the point above, one of the most pernicious ways problems can arise is when a single unlucky or bad actor creates a mess for the larger community. These problems, I suggest, are more common in DTSs. Because DTSs have usually arisen organically, they are not designed for or tested against weird cases, strange circumstances, or small errors. Consider the case where an honest mistake[3] in an Excel sheet cost Fannie Mae north of 1 billion US dollars. This probably really was an honest mistake, and such a mistake could easily be introduced in a system as well. The contrast, however, is that a good system would allow for transparency and heavily reduce these types of single-point failures caused by humans. This is especially true because modern software allows for good test suites: any change to the software can be run against the tests, which makes it easy to detect whether the system as a whole still works as expected. Humans don’t need to have bad intentions to make mistakes, and how do you catch those in a highly distributed patchwork of DTSs? In general, I believe, users of DTSs cling to an implicit hope that many small errors do not accumulate or have systemic influence over other crucial parts of the business. But this is a hope, not a guarantee, and a stakeholder should consider whether this hope is warranted.
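As a rough illustration of the test-suite point, here is a minimal sketch using Python’s built-in `unittest` module. The `loan_eligible` function and its 40 percent debt rule are hypothetical, invented purely for this example; no real organization’s rule is implied.

```python
import unittest


def loan_eligible(income, existing_debt, max_debt_percent=40):
    """Hypothetical rule: an applicant is eligible when existing debt
    is at most `max_debt_percent` percent of income."""
    if income <= 0:
        return False
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return existing_debt * 100 <= income * max_debt_percent


class LoanEligibilityTests(unittest.TestCase):
    def test_typical_applicant_is_eligible(self):
        self.assertTrue(loan_eligible(500_000, 100_000))

    def test_heavily_indebted_applicant_is_rejected(self):
        self.assertFalse(loan_eligible(500_000, 300_000))

    def test_debt_exactly_at_the_limit_is_eligible(self):
        self.assertTrue(loan_eligible(100_000, 40_000))

    def test_zero_income_is_rejected(self):
        self.assertFalse(loan_eligible(0, 0))
```

Saved in a file, these tests can be run with `python -m unittest`. If a colleague later changes the rule by accident, say by turning `<=` into `<`, the boundary test fails right away. A formula buried in a spreadsheet cell offers no comparable safety net.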
Systems allow for integration
Most DTSs have no good interfaces. That makes it hard for other parts of the organization to integrate with them efficiently. For example, if your budgeting is done in an Excel spreadsheet and the same is true for your revenue forecasting, getting the two to communicate efficiently is hard. One thing I still cannot wrap my head around is why no good alternative to the Excel + MS Word + PowerPoint suite with an easy-to-use user interface has been invented. Concretely, when using the MS Office suite, an Excel sheet will be used for some part of the analysis, and the results (figures and tables) will need to be manually dragged and dropped into a Word document or PowerPoint presentation. Yes, there exist integrations that help with this, but in my experience, these are terrible. How many organizations around the globe use all three pieces of software and manually move graphs and tables between them? Considering the scope, the number of work hours that could be saved each day is far from trivial!
Now, a system would allow for efficient transmission of data because it would have a thought-out interface for other programs to interact with. Basically, good systems allow for streamlined information processing. This can be considered the equivalent of plumbing, just for digital infrastructure. This might sound like a small thing, but anyone who has tried to apply for a J-1 visa to the US knows it is not. You will need to interact with 6 or 7 different systems for no other apparent reason than that information flows poorly between different parts of the organization/government.[4]
More generally, every time I am in contact with some official government body (or private institution, for that matter) and I receive a Word document with forms to fill out, I know I am at the mercy of a DTS. And I know the very real consequences of being in contact with a DTS. DTSs are slow, unreliable, and error-prone. They do not give you immediate feedback on whether you have supplied the information correctly. Maybe the clerk handling your documents is overworked, sick, or on parental leave, and you have no guarantee, other than your own tenacity, that your application will be processed in due time, or even processed at all.
Jeppe Johansen is a regular writer at Unreasonable Doubt, where he writes about aliens, economics, the integrity of institutions, and everything in between – if anything really. Jeppe is a Ph.D. fellow at the Center for Social Data Science at the University of Copenhagen.
If you liked this article, you might also like our post Against Robust Solutions.
[1] Think of a system as some piece of software, with one or more dedicated interfaces (a UI and/or an API), that solves some sort of data processing task. Whether it is a new tax system or a customer relationship management program, both fit the definition.
[2] I will use Duct Tape Solution as a catch-all term for small ad hoc solutions, such as Excel sheets with macros floating around in organizations. The name is based on how seemingly everything can be held together by gaffer tape, but the construction you end up with is rarely the most robust.
[3] The formulation according to this website was: “There were honest mistakes made in a spreadsheet used in the implementation of a new accounting standard”. I have, however, not been able to find a public statement from Fannie Mae where they say this.
[4] To receive a J-1 visa you will need a physical copy sent from the US to Denmark (I assume this is the case for every country), fill in a few pieces of information, and deliver it to the American Embassy with your passport, so they can give you a stamp that allows you to enter the country.