Don’t overdesign processes: balance prevention with detection

Processes are supposed to create value. For example, the value of a software development process is in enabling users to use the developed code to create value.

Processes can also produce defects–outputs that do not meet specifications. For example, the developed code breaks under high load or does not work as specified.

There are two fundamental ways to limit defects:

  • Prevention: keeping defects from happening in the first place, through checks and other restrictions
  • Detection: noticing defects after they happen, and then correcting them

Process controls can also be added to reduce the risk of someone not following the process. Developers might put code straight into production, or not get the needed requirements. To combat the risk of a defect, people tend to add controls or “error-proofing” that forces people to follow the process correctly.

Error-proofing and controls are not entirely bad. Segregation of duties, for example, is an important concept. It’s important that people have to badge in to get physical access to servers. However, error-proofing and controls do not add value to a process–they create process waste.

When designing a process to minimize defects, please consider whether it’s more helpful to prevent the defect, or to detect the defect.

Prevention vs detection on the bus

As an example, let’s look at how people pay for public transit, contrasting Vienna with Atlanta. In Atlanta, if you’re going to ride a bus, you need to buy a ticket and then use the ticket in order to board. Everyone must queue up to prove they have paid before they can board. In Vienna, you are expected to buy or swipe a ticket, but you do not need to use the ticket to board. People just jump on the buses or streetcars at any entrance. Vienna’s model relies on detection rather than prevention: police randomly check people’s tickets, and you get a heavy fine (say $300) if you don’t have a valid ticket.

Why does Vienna’s model work? The consequences of not following the process are very high. Suppose that if you don’t buy the $2 bus ticket, you have a 1% chance of being caught and given a $300 fine. In the arithmetic of risk, that’s a $3 risk (1% × $300 = $3), which is more than the $2 the ticket would have cost. Vienna can also change the size of the fine and the amount of enforcement to change the risk equation.
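The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the example figures from this article ($2 ticket, 1% chance of being checked, $300 fine); the function names are hypothetical, and the break-even calculation is my own extension of the same arithmetic rather than anything Vienna publishes.

```python
def expected_fine(catch_probability: float, fine: float) -> float:
    """Expected cost of riding without a ticket: chance of being checked times the fine."""
    return catch_probability * fine


def break_even_check_rate(ticket_price: float, fine: float) -> float:
    """Check rate at which the expected fine exactly equals the ticket price."""
    return ticket_price / fine


# Example figures from the article (illustrative, not real fare data).
ticket_price = 2.00
fine = 300.00
catch_probability = 0.01

risk = expected_fine(catch_probability, fine)
print(f"Expected fine per ride without a ticket: ${risk:.2f}")
print(f"Buying the ticket is the cheaper choice: {ticket_price < risk}")
print(f"Break-even check rate: {break_even_check_rate(ticket_price, fine):.2%}")
```

With these figures, detection keeps working as long as riders are checked on more than roughly 0.67% of rides. Raising the fine lowers that threshold; lowering enforcement below it makes skipping the ticket the "rational" choice.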

However, Americans especially are used to the Atlanta version: you have to prove up front you are doing what’s expected.

Prevention controls and their implication

Here are a few common “prevention”-based controls–steps added to a process for “error-proofing:”

Certainly none of these controls are bad in themselves. However, it’s very easy to add steps to your process, and it’s usually much harder to remove steps.

These preventive checks are often added as a quick “solution” to reduce risk rather than to improve quality. If everyone in the department has signed off on something, you won’t get blamed if it breaks. However, there can be a subconscious negative message that accompanies these controls: people do not have good judgment. These checks erode zones of control.

If reducing risk is a high priority, there may be other systemic issues at play. If people are blamed for mistakes then they are naturally going to want to take fewer risks. See also our presentation, “Problems are Treasures,” about moving from a culture of blame to a culture of safety and trust.

Some defects are not worth the costs of prevention

Certainly no one wants defects. But if the party responsible for creating the defect is also responsible for correcting it, you have a natural closed-loop process. People tend not to want to create work for themselves.

The trick is in understanding what is a show-stopping defect that absolutely must not happen, versus a still-serious defect that must be addressed but does not warrant over-engineering the process. This is a risk management issue: don’t try to prevent low-risk defects.

For example, it is easy for systems administrators to mess up systems. One of the most common problems is an administrator issuing a command in the wrong window and, say, rebooting the production server rather than the development server.

To prevent administrators from rebooting production systems, you would need to create a large set of controls. This can be done, and for a bank or investment firm would certainly make sense. But in higher education (at least in 2013), it’s probably over-engineering your processes to require systems administrators to be under observation when operating as a superuser.

The potential defects or errors are quite serious, but the cost of prevention is higher. Too often, people jump to preventing defects before checking whether the cost of prevention is higher than the cost of detection and correction.
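One rough way to make that check explicit is to compare the cost of prevention with the expected cost of detecting and correcting the defect. The sketch below is a simplification under loud assumptions: it treats prevention as fully eliminating the defect, all dollar amounts and probabilities are made-up placeholders rather than measurements, and the function names are hypothetical.

```python
def expected_detection_cost(defect_probability: float,
                            correction_cost: float,
                            detection_cost: float = 0.0) -> float:
    """Cost of monitoring plus the expected cost of fixing the defect when it occurs."""
    return detection_cost + defect_probability * correction_cost


def cheaper_approach(prevention_cost: float,
                     defect_probability: float,
                     correction_cost: float,
                     detection_cost: float = 0.0) -> str:
    """Return which approach costs less over the same period (assumes prevention is 100% effective)."""
    detect = expected_detection_cost(defect_probability, correction_cost, detection_cost)
    return "prevent" if prevention_cost < detect else "detect and correct"


# Placeholder yearly figures for the accidental production reboot example.
# A campus system: an outage is painful but recoverable.
print(cheaper_approach(prevention_cost=50_000, defect_probability=0.5,
                       correction_cost=4_000, detection_cost=1_000))

# A bank: the same mistake is far more expensive to correct.
print(cheaper_approach(prevention_cost=50_000, defect_probability=0.5,
                       correction_cost=1_000_000, detection_cost=1_000))
```

With these placeholder numbers the first case favors detection and correction and the second favors prevention, which matches the intuition above: the same defect can warrant different controls depending on what it costs to fix.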

Aside: Quality Is Free

One struggle I had in writing this article was Philip Crosby’s famous quote, “Quality is free.” The premise behind this quote is that quality improvements pay for themselves.

How could I reconcile Crosby with the above article?

For one, many preventive controls are not rooted in quality improvement. They are instead workarounds put in place to minimize risk. By requiring a sign-off you distort zones of control. Were other ways considered to address the risk besides adding a sign-off?

Quality improvements can certainly pay for themselves, especially when you learn what your root cause issues are. I posit that addressing low-risk defects by adding more waste to your processes does not count as improvement.