1. Introduction
Consider a company called RMC Panels that makes exterior panels for commercial buildings. The panels are typically made from glass, steel, or concrete, and can cover one or two stories of a building. Because they can be quite heavy, they are often used with steel or concrete buildings. Glass panels are self-supporting and are hung from concrete slabs with anchors. They can give a commercial building the look of glass from top to bottom.
One of the key steps in the manufacturing process of these exterior panels is to cut and shape the panels with equipment known as a CNC Router. A CNC router station consists of a computer-controlled machine that uses a rotating cutting tool to remove and shape material. The acronym CNC stands for Computer Numerical Control, which refers to the computer software and electronics that control the machine. These machines are very expensive, so reducing downtime is critical to realize a good return on investment. Figure 1 provides an example of a CNC Router.
Figure 1. A CNC router.
In this study, we use spreadsheet software to work through a Root Cause Analysis for a downtime reduction project. We define downtime as the time during which the CNC machine is not available for production due to maintenance, breakdowns, or a lack of needed manpower. It is the author’s belief that a small set of basic quality tools goes a long way in process improvement projects (Rispoli, 2021).
Reducing downtime is an important concept in many manufacturing processes. It is often a component in the measurement of Overall Equipment Effectiveness (OEE). OEE is a lean management concept that measures the effectiveness of a production process. It identifies the percentage of manufacturing time that is productive. Operationally, OEE combines three rates: a Utilization rate (the complement of the downtime rate), a Performance percentage (the ratio of goods produced to the maximum possible), and a Quality percentage (the percentage of non-defective parts). OEE is obtained by:
OEE Percentage = (Utilization) × (Performance) × (Quality).
By tracking OEE, manufacturers gain insight into how well their equipment and process are functioning, allowing them to pinpoint areas for improvement and optimize production efficiency. OEE is an example of an often-used Key Performance Indicator. See (Sowmya & Chetan, 2016) or (Patel & Deshpande, 2016) for a more detailed discussion of OEE.
2. Root Cause Analysis
A root cause is defined as a cause of a problem such that once removed, the problem does not return. Root Cause Analysis (RCA) is a popular approach to process improvement projects, such as reducing downtime. The methodology is often data-driven and proceeds through some variation of the following sequence of steps: Define the Problem and Scope, Apply Immediate Corrective Actions, Gather Data and Evidence, Identify Causal Factors and Determine the Root Causes, Develop and Implement Permanent Corrective Actions, and Validate and Monitor Effectiveness. The methodology is well-known. For more detailed references, see (Okes, 2009), (Gano, 2003), or (Andersen & Fage, 2006). A recent overview of the current state of RCA is given in (Oliveira et al., 2023) and in (Pietsch et al., 2024). The application of RCA to downtime reduction projects has occasionally been studied; see, for example, (Kiran et al., 2013).
Defining the Problem and Scope
The problem studied here is to reduce the downtime at CNC/Routing stations over the course of an 8-hour day shift. The initial average downtime of 30.23 minutes per day is based on historical data over a period of 3 months. In the long run, the goal is to increase panel production and decrease the average downtime by over 50%. A simple description of the process at one station is given in Figure 2. The critical part of the problem is enclosed in the rectangle.
Figure 2. A level 1 process map of work at a CNC station.
The scope of the problem is to focus on the management of available manpower resources, equipment maintenance, and supply inventory at the stations. Additional manpower is often needed to help load and unload material, scan program sheets and unload skids. Other issues will be considered including inventory management of supplies, and equipment maintenance.
Most panels that RMC Panels produces are 4 by 10-foot panels that generate a profit of roughly $600 per panel. The demand for these panels is strong, so a 50% reduction in downtime could lead to two more panels being produced each day. Over the course of a year, with 250 working days per year, this yields an annual increase in gross profit of approximately $300,000.
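The estimate above is a simple product of three figures from the text. The following minimal sketch makes the arithmetic explicit; the function and variable names are illustrative and are not part of the original study.

```python
# Figures taken from the text; names are illustrative only.
PROFIT_PER_PANEL = 600       # dollars of profit per 4 by 10-foot panel
EXTRA_PANELS_PER_DAY = 2     # additional panels expected from a 50% downtime cut
WORKING_DAYS_PER_YEAR = 250


def annual_profit_gain(profit_per_panel, extra_panels_per_day, working_days):
    """Return the yearly gross-profit increase from the extra panels."""
    return profit_per_panel * extra_panels_per_day * working_days


gain = annual_profit_gain(
    PROFIT_PER_PANEL, EXTRA_PANELS_PER_DAY, WORKING_DAYS_PER_YEAR
)
print(f"Estimated annual gross-profit increase: ${gain:,}")
```

Changing any one input, such as the number of extra panels per day, shows how sensitive the annual figure is to that assumption.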
Typically, in this first RCA step, one would also include the identification of team members, and a team leader. It is also common to include a primary metric, which in our case, would be average downtime per day, plus any secondary metrics such as OEE Percentage.
Immediate Corrective Actions
For problems such as a lack of manpower, the immediate corrective action had been to pay excessive amounts to induce workers to work overtime. It was also common to have highly skilled workers do the entry-level jobs of loading and unloading when manpower was short.
For supply inventory shortages, it was common to ask supply chain management to expedite delivery of the needed material, which carries an additional charge for fast delivery. For equipment failures, it was common to stop production and recalibrate machines to restore function. All of these issues add to downtime, which lowers daily production. These actions clearly increase costs and reduce long-term profit.
Gathering Data and Evidence
To look deeper into the issue, we began by looking at baseline data for a recent 20-day period. Graphs are given in Figure 3. The horizontal green line in the bar graph indicates the average downtime per day, which is 30.55 minutes. Figure 3 also provides box plots for downtime at each station collected over the 20-day period. Both these graphs raise the question: Is there a significant difference in average downtime between the two stations? Hypothesis tests for this were eventually carried out.
Figure 3. Baseline downtime data at the two CNC stations.
Next, data was obtained to determine the most significant reasons for the downtime. The set of possible reasons was narrowed down to four categories: lack of manpower for loading and unloading skids, shortage of needed inventory, machine breakdowns, and a catch-all category for all other reasons. These categories were studied and tallied on a monthly basis for all of 2024. The percentages are illustrated in the stacked bar chart given in Figure 4.
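A tally of this kind can be sketched as follows. The log entries below are hypothetical, and the choice to weight each cause by minutes of downtime (rather than by number of incidents) is an assumption, since the text does not specify how the percentages were computed.

```python
from collections import Counter, defaultdict

# Hypothetical log entries: (month, cause category, minutes of downtime).
downtime_log = [
    ("Jan", "manpower", 120),
    ("Jan", "inventory", 60),
    ("Jan", "breakdown", 90),
    ("Jan", "other", 30),
    ("Feb", "manpower", 100),
    ("Feb", "inventory", 50),
    ("Feb", "breakdown", 40),
    ("Feb", "other", 10),
]


def monthly_percentages(log):
    """Return {month: {cause: percent of that month's downtime minutes}}."""
    minutes = defaultdict(Counter)
    for month, cause, mins in log:
        minutes[month][cause] += mins
    result = {}
    for month, by_cause in minutes.items():
        total = sum(by_cause.values())
        result[month] = {
            cause: round(100 * mins / total, 1)
            for cause, mins in by_cause.items()
        }
    return result


percentages = monthly_percentages(downtime_log)
for month, shares in percentages.items():
    print(month, shares)
```

Each month's percentages sum to 100, which is the form a 100% stacked bar chart in a spreadsheet expects.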
Identifying Causal Factors and Determining Root Causes
To go a little deeper, we developed a comprehensive list of causal factors, and then ultimately a set of potential root causes. This can often be carried out using some basic root cause analysis tools, such as a Cause-and-Effect Diagram and a Five Whys analysis. For the downtime reduction project, we utilized a Causal Tree, which is a cross between these two methods. Causal Trees are also called Logic Trees for Causes; for a reference, see (Okes, 2009). The main idea is to begin with the problem and then keep asking why until a potential root cause is reached. Unlike in a Five Whys analysis, however, a comprehensive answer is given to each why question. This leads to the construction of a tree structure, as opposed to a single linear path of reasoning.
The causal tree for downtime is given in Figure 5 with red text boxes indicating potential root causes. A causal tree may be constructed using the Smart Art Horizontal Hierarchy in Excel.
Figure 4. Percentages for downtime causes during 2024.
Figure 5. The causal tree for the downtime reduction project.
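The tree structure described above can also be represented in code, with the potential root causes appearing as the leaves. The sketch below is illustrative only: the `Cause` class is hypothetical, and the arrangement of nodes is an assumption assembled from causes named in the text, not a transcription of Figure 5.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box in a causal tree: a cause plus the answers to 'why?' below it."""
    label: str
    children: list = field(default_factory=list)

    def root_causes(self):
        """Yield the labels of leaf nodes, i.e. the potential root causes."""
        if not self.children:
            yield self.label
        for child in self.children:
            yield from child.root_causes()


# Assumed layout; the placement of each leaf under a branch is a guess.
tree = Cause("Downtime at CNC stations", [
    Cause("Lack of labor to load and unload", [
        Cause("Lack of an operations manager"),
        Cause("Lack of accountability for breaks"),
        Cause("Lack of cross-training"),
    ]),
    Cause("Inventory shortages", [Cause("Lack of space")]),
    Cause("Machine breakdowns", [Cause("Insufficient TPM program")]),
])

for cause in tree.root_causes():
    print(cause)
```

Because each "why?" can have several answers, a node may have several children, which is what distinguishes the tree from the single chain of a Five Whys analysis.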
At this point, the team used hypothesis testing, together with the Causal Tree and the Downtime Percentage Breakdown chart, to help develop a prioritized list of factors. Ideally, the list is broken down into a set of significant factors and a second set of non-significant factors. Independent two-sample t-tests were carried out for the following hypotheses.
1) There is a significant difference in the average downtime at the two CNC stations.
2) The average bathroom break time is significantly different at the two CNC stations.
3) The average weekly scrap produced is significantly different at the two CNC stations.
A significant result would indicate that the variable being tested is critical and, if managed better, could potentially lead to a significant improvement. All tests for the above issues indicated a significant difference. The details are given in Table 1. Equal variance t-tests were used for Hypotheses 1 and 3, and an unequal variance test was used for Hypothesis 2. The data sets satisfied tests for normality.
Table 1. Hypothesis test results.
| Hypothesis | Sample Size | Means | Standard Dev. | t | p-value |
|---|---|---|---|---|---|
| 1 | 20 | 27.8 vs. 33.4 (mins) | 8.81 | −2.09 | 0.043 |
| 2 | 25 | 9.8 vs. 12.6 (mins) | 4.55 | −2.05 | 0.047 |
| 3 | 25 | 124.4 vs. 137.6 (sq. ft.) | 19.67 | −2.49 | 0.016 |
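The two kinds of test named above, equal variance (pooled) and unequal variance (Welch), differ only in how the standard error and degrees of freedom are computed. The following minimal sketch uses only the Python standard library. The sample values are hypothetical, since the raw station data are not reproduced in this paper, so the output is not expected to match Table 1.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Unequal-variance (Welch) t statistic and Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical daily downtime (minutes) at two stations.
station_1 = [25, 31, 22, 35, 28, 30]
station_2 = [33, 38, 29, 41, 36, 34]

t_pooled, df_pooled = pooled_t(station_1, station_2)
t_welch, df_welch = welch_t(station_1, station_2)
print(f"pooled: t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.2f}, df = {df_welch:.1f}")
```

The p-value is then read from the t distribution with the corresponding degrees of freedom. In a spreadsheet, Excel's T.TEST function covers both cases through its type argument (2 for equal variance, 3 for unequal variance).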
The hypothesis test results led the team to conclude that absenteeism and bathroom breaks needed to be managed better and that training was inconsistent. The team then developed the following list of root causes.
1) The likelihood of an absent worker and the number of people available to work at each station each day.
2) A lack of accountability when it comes to bathroom breaks and lunch times.
3) The amount of training provided, and the lack of cross training provided.
4) The frequency of machine maintenance outside of planned maintenance.
5) A lack of space.
No single item on this list is likely to produce a significant improvement on its own. When the items are corrected together, however, the team believes that a significant improvement will be obtained.
Developing and Implementing Permanent Corrective Actions
To develop permanent corrective actions, the team must identify and evaluate potentially effective corrective actions. A Causal Tree is also quite helpful here. To reduce downtime, the three branches on the Causal Tree were considered with the goal of matching corrective actions to the root causes. The lack of labor to load and unload materials was addressed mostly through the appointment of an operations manager. The operations manager will develop a schedule for each station that takes an additional worker into account, closely monitor bathroom and lunch breaks, and work with Human Resources to make sure that cross-training takes place.
To improve inventory management, a layered inventory system was developed. Layered inventory refers to a system of organizing and managing supplies where items are categorized and stored in distinct layers based on factors such as demand, shelf life, and product type. The goal is to improve efficiency, making it easier to track, retrieve, and replenish items. The panel manufacturer created layers in its inventory by placing high-demand items at the front for easy access and storing slower-moving items further back. This helped optimize space and streamline operations, reducing the time and effort needed to locate and manage supplies.
Machine breakdowns were addressed by improving the Total Productive Maintenance (TPM) program. Human Resources was tasked with hiring an additional part-time employee whose focus is keeping equipment at the CNC stations up to date by increasing the frequency of service.
To obtain more space, the team encouraged management to conduct a space utilization study. The team noted that often 10% - 20% of floor area appears to be underutilized. So “freeing up” space through layout redesign and Lean principles would be the most efficient way to resolve this issue. This addresses the Lack of Space root cause identified in the causal tree. Item 1 below addresses the Lack of an Operations Manager and the Lack of Accountability identified in the causal tree. Item 2 addresses the Insufficient TPM Program root cause and the lack of labor and training.
1) A CNC Stations Operations Manager must be identified. Responsibilities of the Stations Operations Manager include:
Determine the optimal number of people to schedule at each station every day.
Make sure workers are accountable, monitor bathroom breaks and lunch breaks.
Communicate with Maintenance, Human Resources and Information Technology.
Closely monitor and post downtime at each station.
Make sure that inventory is maintained using a layered approach.
2) Two additional employees must be hired. Responsibilities of Additional Employees:
Help load and unload material.
Scan program sheets in and out.
Help with swapping out skids and cleaning tables.
Work with the TPM program to make sure that the equipment at CNC stations is maintained in a timely manner.
3) Training must be improved.
CNC operators must be certified and receive cross-training, so they are also fork-lift certified.
A training program will be designed by the Stations Operations Manager who will also find instructors.
Training courses will be offered quarterly, possibly by a Manufacturing Extension Partnership Center or similar training organization.
Once these changes had been in place for six months, daily downtime data were collected for a 30-day period. The results were then compared to the baseline data. The new data showed that average downtime was reduced to 16.6 minutes per day. Relative to the 20-day baseline average of 30.55 minutes, this represents a downtime reduction of roughly 46%.
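The percentage reduction can be checked directly from the two averages. This is a minimal sketch that assumes the comparison is made against the 20-day baseline average of 30.55 minutes; the function name is illustrative.

```python
def percent_reduction(before, after):
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return 100 * (before - after) / before


baseline_avg = 30.55   # average downtime (minutes/day) in the 20-day baseline
post_avg = 16.6        # average downtime (minutes/day) after the changes

reduction = percent_reduction(baseline_avg, post_avg)
print(f"Downtime reduction: {reduction:.1f}%")
```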
Validate and Monitor Effectiveness
To maintain improvements, the team has determined that the CNC Stations Operations Manager must meet periodically, perhaps quarterly, with Human Resources and the TPM team to make sure that training and TPM are carried out. It was also determined that it would be helpful to closely monitor downtime on a weekly basis. Having achieved a 46% reduction in average downtime, the team identified the following items to ensure long-term sustainability.
The CNC Stations Operations Manager will have quarterly meetings with Human Resources to discuss how the quarterly training sessions will be carried out.
The CNC Stations Operations Manager will check to make sure that the equipment is properly maintained and will communicate with TPM periodically.
The RCA team will develop a new Standard Operating Procedure.
Utilization and downtime rates will be monitored using gauge meters posted near the CNC Stations. The gauge meters will be structured as displayed in Figure 6. Gauge meters can be constructed using the doughnut charts in Excel.
Figure 6. Gauge meters used to monitor rates.
3. Implications
Next, we examine what the improvement implies in terms of the Overall Equipment Effectiveness (OEE) percentage. Recall that the OEE percentage is given by:
(Utilization Rate) × (Performance Rate) × (Quality).
Now, consider an 8-hour shift, which we treat as 480 minutes of planned production time. The original average downtime was 30.55 minutes. This yields a utilization rate, given by operating time divided by planned production time, equal to
(480 − 30.55) / 480 = 449.45 / 480 = 0.936,
or 93.6%.
The Performance metric measures how fast the equipment was running compared to its maximum possible speed. The calculation is given by:
Performance = (Actual Production) / (Maximum Possible Production).
Suppose we know that the machine can produce at most 60 units per hour, or one unit per minute, and that it actually produced 300 units during an operating time of 420 minutes. The maximum possible production over those 420 minutes is then 420 units, so dividing actual production by the maximum possible production gives 300/420, or 71.4%, for performance.
Finally, the Quality metric measures the percentage of good units produced versus total units produced. If 10 units were defective out of the 300 produced, then Quality is given by 290/300, or 96.67%. Thus, prior to the process improvement project, the OEE percentage is given by
(0.936) × (0.714) × (0.966) = 0.6465, or roughly 64.6%.
After the project was completed, the average downtime was reduced from 30.55 to 16.6 minutes per 8-hour shift. Moreover, daily production improved by 2 units, to 302. This leads to an improved utilization rate of 0.9654 and a slightly improved performance of 302/420 = 0.7190. With the quality rate held at its earlier value, the new OEE percentage is 0.6710, or roughly 67.1%. This is an increase of roughly 2.5 percentage points in the Overall Equipment Effectiveness percentage.
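The before-and-after OEE figures can be reproduced with a short calculation. The sketch below is one way to organize it, with illustrative function and parameter names. Two inputs are assumptions chosen because they reproduce the figures quoted in the text: a maximum possible production of 420 units (the denominator used for performance) and a quality rate held at 290/300 in both periods.

```python
def oee(planned_min, downtime_min, units_made, max_units, quality_rate):
    """Return OEE as utilization x performance x quality (each a fraction)."""
    utilization = (planned_min - downtime_min) / planned_min
    performance = units_made / max_units
    return utilization * performance * quality_rate


PLANNED_MIN = 480        # 8-hour shift in minutes
MAX_UNITS = 420          # assumed maximum possible production per shift
QUALITY = 290 / 300      # assumed unchanged after the project

before = oee(PLANNED_MIN, 30.55, 300, MAX_UNITS, QUALITY)
after = oee(PLANNED_MIN, 16.6, 302, MAX_UNITS, QUALITY)

print(f"OEE before: {before:.4f}")
print(f"OEE after:  {after:.4f}")
print(f"Change:     {100 * (after - before):.2f} percentage points")
```

Because OEE is a product of three rates, the gain in utilization is scaled down by the performance and quality rates, which is why a reduction in downtime of nearly half moves OEE by only a few percentage points.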
In financial terms, the additional costs of the corrective actions would amount to roughly $100,000. The roughly 14 minutes per day saved represents a 46% reduction in average daily downtime. The additional panels produced, plus the reduction in excessive overtime costs and inventory expediting costs, lead to a $300,000 increase in gross profit. So, there is a net increase in profit of $200,000. The team concluded that the original goal was essentially met.
4. Conclusion
Managing downtime is a common problem in manufacturing. Root cause analysis was conducted for the significant issues causing downtime. Here we see that reducing the average downtime per day by roughly 14 minutes makes a large difference in long-term profitability and yields an improvement of roughly 2.5 percentage points in the OEE percentage. Clearly, this was worth the investment in carrying out the root cause analysis. For the problem considered here, an additional full-time worker was needed to help with loading and unloading material, and an additional part-time worker was needed for the TPM program. There was also a need to assign a Stations Operations Manager to oversee operations and accountability.
The key tools used in the analysis include process maps, stacked bar graphs, causal trees, and gauge meters. These basic tools, available in spreadsheet software, go a long way in process improvement projects. Beyond the benefits in profit and OEE percentage, there are intangible benefits, including improved customer service, since the turnaround time on a customer order is reduced.
It should also be noted that this study was conducted on a pair of CNC Router stations, which limits the generalizability of the findings to other sites or operations with different equipment, workflows, or management practices. Additionally, the observation window was relatively short, potentially missing longer-term trends or anomalies. Factors such as seasonal variations in workload, as well as learning effects as operators become more familiar with the equipment, may have influenced the results and should be considered when interpreting the data.