This is the first in a series of articles on conducting a root cause investigation. The model applies to a Corrective Action / Preventive Action (CAPA) investigation, as well as to any other type of investigation. In this first article, we describe a model for conducting a science-based, systematic investigation. Future articles will go into more detail as the individual steps are explored, specific tools are highlighted, and example investigations are reviewed.
To us, an investigation is directly related to measuring performance. We may measure the performance of a product, machine, test, process, or transaction.
When we measure performance, we hope to see a fairly steady, predictable result (Figure A). Occasionally, however, that performance drops (Figure B).
A drop in performance doesn’t happen by itself; something changed in the product, machine, test, process or transaction, and this change caused the performance to drop. To find and eliminate that change we conduct an investigation.
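As a rough illustration of what "steady, predictable" versus "dropped" can mean in practice, the minimal sketch below compares recent measurements against a lower limit derived from a baseline period. The three-sigma limit is our assumption for this example, not a rule stated in the roadmap, and the function name and the yield readings are hypothetical.

```python
from statistics import mean, stdev

def find_performance_drops(baseline, recent, sigmas=3.0):
    """Return (index, value) pairs in `recent` that fall below the lower limit.

    The limit is the baseline mean minus `sigmas` sample standard deviations.
    The default of 3.0 is an assumption chosen for illustration.
    """
    center = mean(baseline)
    lower_limit = center - sigmas * stdev(baseline)
    return [(i, x) for i, x in enumerate(recent) if x < lower_limit]

# Hypothetical yield readings (percent), made up for illustration only.
baseline = [98.1, 97.9, 98.3, 98.0, 98.2, 97.8, 98.1, 98.0]
recent = [98.0, 97.9, 95.2, 94.8]

drops = find_performance_drops(baseline, recent)
print(drops)
```

Flagging the drop only tells us that something changed and roughly when. Finding what changed is the job of the investigation itself.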
Figure C is our model for the investigation, which we call the Investigation Roadmap. We'll introduce it today and explore it in greater detail in future articles. The investigation methodology consists of Seven Steps, represented by numbered boxes. Within each box, icons symbolize tools that can be used with that step. The icons are not intended to be limiting: there are hundreds, if not thousands, of tools that can potentially be used during an investigation. The Seven Steps alone present a very theoretical approach; it is the tools that make the methodology practical. We've organized the Seven Steps into the five phases of Define, Measure, Analyze, Improve, Control (DMAIC), the methodology used by Lean Six Sigma. In fact, we have integrated concepts from many improvement strategies into our roadmap.
We begin the investigation with an investment many organizations fail to make: defining the performance problem. Without a fundamental understanding of the issue, investigators are likely to waste a tremendous amount of time and run a higher risk of failure. To gain an understanding of the problem itself, we advocate describing it in eight dimensions using the IS / IS NOT Diagram, which immediately places boundaries on the investigation, providing focus and narrowing the search for the change. Additionally, for each process being investigated, we need to understand the process steps and their inputs. After all, we're looking for a change that caused our drop in performance, and that change may be in the process itself or in one of the inputs to a process step. We strongly recommend describing each process through a process flow diagram modified to identify the key inputs to each process step.
A second investment often overlooked is collecting data, which is vital to:
A common error with this step is relying solely on the fishbone diagram. The fishbone is a good tool for developing a list of possible causes; however, it is only a form of brainstorming. There are other very good brainstorming tools, as well as additional tools much more powerful than a fishbone. We'll explore some of these in future articles. We stress using multiple tools to develop a fairly extensive list of possible causes. If the real root cause (the change) is on our list when we finish Step 3, we will be able to find it. If it's not, we're on our way to failure.
Now we reap the rewards of our Step 1 and Step 2 investments. We take one possible cause at a time and test it against the facts in our IS / IS NOT Diagram. Roughly 85 percent of the possible causes will be quickly ruled out, leaving only a few probable causes for further investigation.
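The screening logic described here can be sketched in a few lines. This is only an illustrative model under our own assumptions: each IS / IS NOT fact is reduced to one value where the problem is seen and one where it is not, and each possible cause lists what it would be expected to affect. The class names, fields, and example causes are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    dimension: str      # e.g. "where" or "when"
    is_value: str       # where/when the problem IS observed
    is_not_value: str   # where/when it could be observed but IS NOT

@dataclass
class PossibleCause:
    name: str
    # Maps a dimension to the set of values this cause would be expected to affect.
    affects: dict = field(default_factory=dict)

def is_consistent(cause, facts):
    """A cause survives if it explains every IS and predicts no IS NOT."""
    for fact in facts:
        affected = cause.affects.get(fact.dimension)
        if affected is None:
            continue  # the cause says nothing about this dimension
        if fact.is_value not in affected or fact.is_not_value in affected:
            return False
    return True

def probable_causes(causes, facts):
    return [c.name for c in causes if is_consistent(c, facts)]

# Hypothetical facts and causes, made up for illustration only.
facts = [
    Fact("where", "Line 2", "Line 1"),
    Fact("when", "night shift", "day shift"),
]
causes = [
    PossibleCause("new resin lot", {"where": {"Line 1", "Line 2"}}),
    PossibleCause("worn seal on Line 2 filler",
                  {"where": {"Line 2"}, "when": {"day shift", "night shift"}}),
    PossibleCause("night-shift setup change on Line 2",
                  {"where": {"Line 2"}, "when": {"night shift"}}),
]

remaining = probable_causes(causes, facts)
print(remaining)
```

In this made-up case, the resin lot is ruled out because it would also affect Line 1, and the worn seal is ruled out because it would also show up on day shift. Only the setup change fits every fact.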
The technical root cause is the change we have been searching for; the technical reason for our drop in performance. To identify the technical root cause from the few remaining probable causes we need to do more.
Once the technical root cause is known, we can use the 5 Whys to identify the systemic root causes: the system failures that allowed the change to occur or failed to detect it.
We now determine the corrective or preventive actions. All root causes will fall into two categories,
For the first, mistake proofing is applied to eliminate or reduce the probability of the mistake, find the defect, or mitigate the drop in performance. For the second, variation reduction and optimization techniques are used.
Because the corrective / preventive actions are themselves changes, risk mitigation is applied to reduce the probability that new problems will occur when they are implemented. Risk mitigation tools include the use of:
Next, a control plan is developed to assure the performance problem does not return.
Finally, we need to assure that the corrective / preventive actions are actually implemented and then measure the performance to assure it returns to the level it was at before performance dropped. There is also the opportunity to capture the knowledge gained through the investigation and share it with other parts of the organization and key partners so that they may take additional preventive actions.
We have summarized a systematic, science-based methodology for conducting an investigation. Future articles will explore the steps, tools, and example investigations in greater depth.