Unlocking Efficiency: The Importance of Predictive Maintenance

Explore how predictive maintenance powered by AI is transforming industries like mining and manufacturing by reducing costs and minimizing downtime.
Apr 23, 2025 · Lukas Wiku, Vincentius C. Calvin

Understanding Machine Failure

Machine failure is a critical event that occurs when a machine or one of its components can no longer perform within its intended operational parameters. These parameters, known as performance thresholds, are usually defined by manufacturers based on engineering tolerances, historical assumptions, and safety margins. In real-world applications, however, these thresholds can vary widely depending on the underlying operating conditions, usage patterns, and even environmental stresses.

Traditionally, failure has been thought of as a linear process in which equipment steadily declines with age. However, a foundational study by Nowlan and Heap disrupted this notion. Their findings revealed that the majority of failures occur randomly and are not directly related to the age of the equipment itself. In fact, only a small percentage of failures followed a clear wear-out pattern. This discovery marked a turning point in maintenance philosophy and laid the groundwork for condition-based and predictive maintenance approaches.

To visualize these failure trends, the Bathtub Curve is widely referenced:

The bathtub curve

source: https://www.sciencedirect.com/topics/engineering/bathtub-curve

The curve gets its name from its shape and consists of three distinct failure regions:

Infant Mortality Phase: This early phase is characterized by a high rate of failures shortly after deployment. These failures are often the result of manufacturing defects, installation errors, or early design flaws. Quality control and burn-in testing are common practices to mitigate this risk.
Normal Life Phase: Once early defects are resolved, the equipment enters a stable operational period. In this phase, the failure rate is relatively low and consistent—but importantly, most of the failures that do occur here are random and unpredictable. They may be triggered by anomalies in usage, environmental fluctuations, or unforeseen stresses.
Wear-Out Phase: As the machine ages, components begin to degrade due to fatigue, corrosion, or material breakdown. The failure rate increases gradually, and this stage marks the traditional concept of aging-related failure that preventive maintenance programs were originally designed to address.
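
The three regions above can be sketched numerically. The snippet below is a minimal illustration, not a fitted reliability model: it assumes each phase follows a Weibull hazard rate and simply adds the three together. The function names and all shape and scale values are made up for illustration.

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (k / lam) * (t / lam) ** (k - 1), for t > 0."""
    t = np.asarray(t, dtype=float)
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    # Illustrative parameters only (assumptions, not fitted to any data).
    infant = weibull_hazard(t, shape=0.5, scale=100.0)     # shape < 1: decreasing rate
    random_ = weibull_hazard(t, shape=1.0, scale=500.0)    # shape = 1: constant rate
    wear_out = weibull_hazard(t, shape=5.0, scale=1000.0)  # shape > 1: increasing rate
    return infant + random_ + wear_out

# Hazard early in life, mid-life, and late in life
early, mid, late = bathtub_hazard([10.0, 300.0, 1500.0])
print(early > mid, late > mid)  # True True
```

The combined rate is high at the start, flattens in the middle, and climbs again at the end, which is the bathtub shape.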

While the Bathtub Curve itself is a simplification and does not represent all asset types equally (especially modern digital or software-integrated systems), it provides a useful mental model. More importantly, it provides us with a key hypothesis to begin with: only a fraction of failures are truly time-dependent. The rest require a deeper understanding of real-time operating conditions, usage variability, and anomaly detection.

This insight challenges organizations to rethink maintenance, moving from strictly time-based schedules to data-informed, condition-aware ones. As we’ll explore further, this shift is not just a technical evolution but a strategic business imperative.

Maintenance Strategies

To manage the risk of machine failure and ensure operational continuity, organizations implement various maintenance strategies. These strategies have evolved over time—from simple reactive fixes to sophisticated predictive systems driven by data and AI.

Each approach carries trade-offs in cost, risk, and efficiency. Understanding them is crucial, especially for industries where downtime directly impacts revenue, safety, and reputation.

Reactive Maintenance (Run-to-Failure)

Reactive maintenance is the most basic form—equipment is used until it breaks, at which point repairs or replacements are made accordingly.

This approach may seem cost-effective in the short term, especially for non-critical assets where failure doesn't lead to major disruption. However, for most enterprise environments, waiting for failure is a high-stakes gamble.
- Pros: No upfront planning or monitoring required.
- Cons: Unplanned downtime, higher repair costs, potential safety hazards, and production losses.
Example: In a manufacturing plant, allowing a conveyor belt motor to fail during production without any prior warning would halt an entire production line, causing cascading delays and missed delivery deadlines.

Preventive Maintenance (Scheduled Intervals)

Preventive maintenance schedules inspections, servicing, or part replacements based on fixed time or usage intervals—such as operating hours, mileage, or calendar dates.

It is widely used and often mandated by equipment manufacturers. However, it is often based on historical averages, not on the actual condition of the asset.
- Pros: Reduces catastrophic failures, follows manufacturer guidelines, easier to manage at scale.
- Cons: May result in unnecessary maintenance for lightly used assets, or insufficient intervention for heavily used ones.
Example: A fleet of delivery trucks may be serviced every 10,000 km. However, a truck operating in hilly terrain will likely degrade faster than one used on flat roads, despite having traveled the same distance. Without adapting to real-world usage, preventive maintenance can either overspend on servicing or fail to prevent failures.

Predictive Maintenance (Data-Driven, Condition-Based)

As opposed to the previous two, predictive maintenance leverages real-time data from sensors embedded in machinery—capturing metrics such as temperature, vibration, oil quality, pressure, and acoustic signals. The data streams are then analyzed using AI and machine learning algorithms to estimate the Remaining Useful Life (RUL) of components.

This approach enables "just-in-time" maintenance—intervening only when the system detects that failure is likely within a specific window.
- Pros: Reduces unplanned downtime, extends asset life, lowers operational and labor costs, improves safety and reliability.
- Cons: Requires investment in sensors, data infrastructure, and model development.
Example: In a power plant, vibration data from turbine bearings can indicate the onset of imbalance or misalignment. With predictive models, engineers can plan a controlled shutdown before failure occurs, minimizing repair costs and preventing energy service disruption.
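
The "just-in-time" idea can be reduced to a simple decision rule. The sketch below is one possible formulation, assuming a predicted RUL is already available from some model; the function name, `lead_time`, and `safety_margin` are hypothetical parameters introduced here for illustration.

```python
def should_schedule(predicted_rul, lead_time, safety_margin=0):
    """Return True when the predicted RUL falls inside the planning window.

    lead_time: cycles needed to organize parts and crew (assumed value).
    safety_margin: extra buffer to absorb prediction error (assumed value).
    """
    return predicted_rul <= lead_time + safety_margin

print(should_schedule(predicted_rul=120, lead_time=30, safety_margin=10))  # False
print(should_schedule(predicted_rul=35, lead_time=30, safety_margin=10))   # True
```

A larger safety margin trades some unused component life for a lower risk of failing before the intervention.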

Why the Shift to Predictive Matters

Modern operational environments are far more complex and fast-paced than when time-based maintenance models were developed. Factors such as increased asset complexity, rising energy and labor costs, and tighter delivery SLAs demand smarter, leaner, and more adaptive maintenance models.

Predictive maintenance offers a transformative leap by enabling enterprises to:

Reduce maintenance costs by avoiding unnecessary service.
Prevent revenue loss from unexpected downtime.
Increase the overall lifespan and availability of critical assets.
Align maintenance planning with business operations and inventory management.

In short, predictive maintenance is no longer a nice-to-have—it is a competitive advantage.

Bridging Into Real-World Application

While the strategic benefits of predictive maintenance are clear, its true value becomes even more apparent when applied to real-world scenarios. In practice, the limitations of preventive maintenance are especially visible in industries where machinery operates under unpredictable, variable conditions.

Take, for example, heavy equipment manufacturers like Komatsu. They provide scheduled service recommendations—after 500, 1,000, 2,000, and 6,000 hours of operation. These intervals are designed based on typical fatigue models and average usage assumptions. But in reality, machine performance rarely adheres to “average.”

Under intense workloads or harsh environments, a machine might fail well before its recommended service interval.
Conversely, in lighter-duty applications, machines can operate safely beyond those checkpoints—making scheduled maintenance unnecessary and costly.

This mismatch reveals the core flaw of time-based servicing: it lacks visibility into the actual condition of the equipment. It treats all use cases equally, regardless of how differently machines are stressed and utilized.

A smarter approach involves continuously assessing the real-time condition of machinery to understand how long it can safely operate before intervention is truly needed. By identifying early warning signs of degradation—through vibration patterns, temperature changes, or pressure fluctuations—maintenance can be scheduled precisely when it’s required, avoiding both premature service and catastrophic failure.

This philosophy forms the foundation of predictive maintenance, and it’s more than theory—it’s testable.

From Theory to Practice: Experimenting with Predictive Maintenance

To illustrate the difference in outcomes between preventive and predictive maintenance, we’ll turn to a practical dataset: the NASA Turbofan Engine Degradation Simulation dataset (C-MAPSS), of which we use the FD001 subset. This dataset captures sensor readings over time from simulated jet engines, tracking their performance as they gradually degrade toward failure.

By comparing predictive models trained on these sensor patterns against traditional scheduled maintenance assumptions, we can later quantify the economic and operational value of predictive strategies in action.

Importing modules

For this experiment, we’ll use Python with standard libraries for data processing and machine learning:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
 
import os
import random
import warnings
 
np.random.seed(34)
warnings.filterwarnings('ignore')

Dataset Exploration

Each row in the dataset represents a snapshot of an engine’s state at a given time cycle. The columns are structured as follows:

unit_number: Identifier for each engine (1–100)
time_cycles: The cycle count (i.e., usage time step) for the engine
setting_1 to setting_3: Operational settings, which may affect degradation
s_1 to s_21: Sensor readings that reflect internal physical conditions like temperature, pressure, and rotational speed

We start by naming the columns for better readability:

sensor_names = [f's_{i + 1}' for i in range(21)]
col_names = ['unit_number', 'time_cycles', 'setting_1', 'setting_2', 'setting_3'] + sensor_names
col_names

    ['unit_number',
     'time_cycles',
     'setting_1',
     'setting_2',
     'setting_3',
     's_1',
     's_2',
     's_3',
     's_4',
     's_5',
     's_6',
     's_7',
     's_8',
     's_9',
     's_10',
     's_11',
     's_12',
     's_13',
     's_14',
     's_15',
     's_16',
     's_17',
     's_18',
     's_19',
     's_20',
     's_21']

The objective

Our goal is to estimate how many cycles an engine can continue operating before it reaches the end of its useful life. In other words, based on its current usage (number of completed cycles) and condition (captured by sensor readings), we aim to predict the remaining useful life (RUL) of each engine.

Typically, engines degrade over time, which is reflected in factors such as increased oil temperature, decreased thrust, and so on. The acceptable threshold for degradation can vary depending on operational context and user requirements.

# Load the dataset with pandas; raw strings keep the '\s+' regex free of escape warnings
train_df = pd.read_csv('train_FD001.txt/train_FD001.txt', sep=r'\s+', header=None, index_col=False, names=col_names)
test_df = pd.read_csv('test_FD001.txt/test_FD001.txt', sep=r'\s+', header=None, index_col=False, names=col_names)
test_df_y = pd.read_csv('RUL_FD001.txt', sep=r'\s+', header=None, index_col=False, names=['RUL'])

A quick look at the training set

# Use display.max_columns None to display all the columns
with pd.option_context('display.max_columns', None):
    print(train_df)

           unit_number  time_cycles  setting_1  setting_2  setting_3     s_1  \
    0                1            1    -0.0007    -0.0004      100.0  518.67   
    1                1            2     0.0019    -0.0003      100.0  518.67   
    2                1            3    -0.0043     0.0003      100.0  518.67   
    3                1            4     0.0007     0.0000      100.0  518.67   
    4                1            5    -0.0019    -0.0002      100.0  518.67   
    ...            ...          ...        ...        ...        ...     ...   
    20626          100          196    -0.0004    -0.0003      100.0  518.67   
    20627          100          197    -0.0016    -0.0005      100.0  518.67   
    20628          100          198     0.0004     0.0000      100.0  518.67   
    20629          100          199    -0.0011     0.0003      100.0  518.67   
    20630          100          200    -0.0032    -0.0005      100.0  518.67   
    
              s_2      s_3      s_4    s_5    s_6     s_7      s_8      s_9  s_10  \
    0      641.82  1589.70  1400.60  14.62  21.61  554.36  2388.06  9046.19   1.3   
    1      642.15  1591.82  1403.14  14.62  21.61  553.75  2388.04  9044.07   1.3   
    2      642.35  1587.99  1404.20  14.62  21.61  554.26  2388.08  9052.94   1.3   
    3      642.35  1582.79  1401.87  14.62  21.61  554.45  2388.11  9049.48   1.3   
    4      642.37  1582.85  1406.22  14.62  21.61  554.00  2388.06  9055.15   1.3   
    ...       ...      ...      ...    ...    ...     ...      ...      ...   ...   
    20626  643.49  1597.98  1428.63  14.62  21.61  551.43  2388.19  9065.52   1.3   
    20627  643.54  1604.50  1433.58  14.62  21.61  550.86  2388.23  9065.11   1.3   
    20628  643.42  1602.46  1428.18  14.62  21.61  550.94  2388.24  9065.90   1.3   
    20629  643.23  1605.26  1426.53  14.62  21.61  550.68  2388.25  9073.72   1.3   
    20630  643.85  1600.38  1432.14  14.62  21.61  550.79  2388.26  9061.48   1.3   
    
            s_11    s_12     s_13     s_14    s_15  s_16  s_17  s_18   s_19  \
    0      47.47  521.66  2388.02  8138.62  8.4195  0.03   392  2388  100.0   
    1      47.49  522.28  2388.07  8131.49  8.4318  0.03   392  2388  100.0   
    2      47.27  522.42  2388.03  8133.23  8.4178  0.03   390  2388  100.0   
    3      47.13  522.86  2388.08  8133.83  8.3682  0.03   392  2388  100.0   
    4      47.28  522.19  2388.04  8133.80  8.4294  0.03   393  2388  100.0   
    ...      ...     ...      ...      ...     ...   ...   ...   ...    ...   
    20626  48.07  519.49  2388.26  8137.60  8.4956  0.03   397  2388  100.0   
    20627  48.04  519.68  2388.22  8136.50  8.5139  0.03   395  2388  100.0   
    20628  48.09  520.01  2388.24  8141.05  8.5646  0.03   398  2388  100.0   
    20629  48.39  519.67  2388.23  8139.29  8.5389  0.03   395  2388  100.0   
    20630  48.20  519.30  2388.26  8137.33  8.5036  0.03   396  2388  100.0   
    
            s_20     s_21  
    0      39.06  23.4190  
    1      39.00  23.4236  
    2      38.95  23.3442  
    3      38.88  23.3739  
    4      38.90  23.4044  
    ...      ...      ...  
    20626  38.49  22.9735  
    20627  38.30  23.1594  
    20628  38.44  22.9333  
    20629  38.29  23.0640  
    20630  38.37  23.0522  
    
    [20631 rows x 26 columns]

Looking at the dataset, it's clear that this is time series data. Each row represents a snapshot in time for a particular engine. Within each unit_number, time_cycles counts up from 1; when it restarts at 1, the rows belong to a new engine unit.

To better understand engine lifespans, we can group the data by unit_number and look at the maximum cycle each engine reached before failure:

print(train_df.groupby('unit_number').max().reset_index())

        unit_number  time_cycles  setting_1  setting_2  setting_3     s_1     s_2  \
    0             1          192     0.0047     0.0005      100.0  518.67  644.21   
    1             2          287     0.0076     0.0006      100.0  518.67  643.94   
    2             3          179     0.0058     0.0005      100.0  518.67  643.93   
    3             4          189     0.0059     0.0006      100.0  518.67  644.53   
    4             5          269     0.0055     0.0005      100.0  518.67  644.02   
    ..          ...          ...        ...        ...        ...     ...     ...   
    95           96          336     0.0049     0.0005      100.0  518.67  644.20   
    96           97          202     0.0050     0.0006      100.0  518.67  643.97   
    97           98          156     0.0077     0.0004      100.0  518.67  644.39   
    98           99          185     0.0059     0.0005      100.0  518.67  644.10   
    99          100          200     0.0056     0.0004      100.0  518.67  643.95   
    
            s_3      s_4    s_5  ...     s_13     s_14    s_15  s_16  s_17  s_18  \
    0   1605.44  1432.52  14.62  ...  2388.35  8140.58  8.5227  0.03   398  2388   
    1   1610.10  1431.17  14.62  ...  2388.26  8175.57  8.5377  0.03   398  2388   
    2   1606.50  1438.51  14.62  ...  2388.20  8255.34  8.5363  0.03   399  2388   
    3   1612.11  1434.12  14.62  ...  2388.17  8259.42  8.5462  0.03   399  2388   
    4   1609.41  1434.59  14.62  ...  2388.23  8215.19  8.5410  0.03   398  2388   
    ..      ...      ...    ...  ...      ...      ...     ...   ...   ...   ...   
    95  1608.62  1432.65  14.62  ...  2388.28  8146.04  8.5615  0.03   398  2388   
    96  1610.66  1430.66  14.62  ...  2388.17  8270.91  8.5596  0.03   400  2388   
    97  1606.24  1432.16  14.62  ...  2388.30  8156.01  8.5308  0.03   396  2388   
    98  1616.91  1436.54  14.62  ...  2388.33  8145.61  8.5592  0.03   397  2388   
    99  1610.87  1433.58  14.62  ...  2388.28  8150.68  8.5646  0.03   398  2388   
    
         s_19   s_20     s_21  
    0   100.0  39.18  23.4999  
    1   100.0  39.24  23.6005  
    2   100.0  39.23  23.5181  
    3   100.0  39.21  23.5074  
    4   100.0  39.29  23.5503  
    ..    ...    ...      ...  
    95  100.0  39.18  23.5344  
    96  100.0  39.22  23.5181  
    97  100.0  39.30  23.5461  
    98  100.0  39.20  23.4986  
    99  100.0  39.18  23.5751  
    
    [100 rows x 26 columns]
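
Since only the final cycle of each engine matters here, the same grouping can be narrowed to the time_cycles column and summarized. The sketch below uses a small made-up frame (`toy_df`) as a stand-in for train_df so that it runs on its own.

```python
import pandas as pd

# Toy stand-in for train_df: three engines with 3, 5, and 4 recorded cycles.
toy_df = pd.DataFrame({
    "unit_number": [1] * 3 + [2] * 5 + [3] * 4,
    "time_cycles": [1, 2, 3, 1, 2, 3, 4, 5, 1, 2, 3, 4],
})

# Last recorded cycle per engine, i.e. its observed lifespan
lifespans = toy_df.groupby("unit_number")["time_cycles"].max()
print(lifespans)
print(lifespans.min(), lifespans.mean(), lifespans.max())
```

On the real training set, the spread between the shortest and longest lifespan shows how much engines of the same type can differ.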

It’s not immediately obvious how each sensor reading relates to the engine’s remaining life. To explore this relationship, we’ll use unit_number == 1 as an example and visualize how sensor values evolve throughout its lifetime.

train_df_unit_1 = train_df[train_df['unit_number'] == 1]
 
rolling_windows = [10, 20, 50, 100]
 
columns_to_plot = [col for col in train_df_unit_1.columns if col != 'time_cycles']
num_plots = len(columns_to_plot)
 
fig, axes = plt.subplots(num_plots, 1, figsize=(12, 2 * num_plots), sharex=True)
 
if num_plots == 1:
    axes = [axes]
 
for i, column in enumerate(columns_to_plot):
    axes[i].plot(train_df_unit_1['time_cycles'], train_df_unit_1[column], label='Original', alpha=0.5)
    
    # Rolling average
    for rolling_window in rolling_windows:
        rolling_avg = train_df_unit_1[column].rolling(window=rolling_window, min_periods=1).mean()
        axes[i].plot(train_df_unit_1['time_cycles'], rolling_avg, label=f'Rolling Avg ({rolling_window})')
    
    axes[i].set_ylabel(column)
    axes[i].legend()
    axes[i].grid(True)
 
axes[-1].set_xlabel('Time Cycles')
plt.tight_layout()
plt.show()

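[Figure: each column for unit 1 plotted against time cycles, with rolling averages over 10, 20, 50, and 100 cycles]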

Data Insights & Observations

Looking at the plots, we can see that some sensors drift over time while others remain static. This suggests that the drifting sensors carry information about the engine's age and, in turn, its Remaining Useful Life (RUL): the number of cycles an engine has left before it fails.

Keep in mind that RUL is a key concept in predictive maintenance and will be referenced frequently throughout this article.

Because RUL is not directly provided in the dataset, we’ll need to calculate it ourselves.

Preprocessing

1. Calculating RUL

Given that column 2 (time_cycles) represents the current cycle, and the last sensor reading for each unit corresponds to the end of its life, RUL should decrease over time. We can calculate RUL by subtracting the current_cycle from the last_cycle for each engine unit. Here's how to implement this in pandas:

def add_RUL_column(df):
    train_grouped_by_unit = df.groupby(by='unit_number') 
    max_time_cycles = train_grouped_by_unit['time_cycles'].max() 
    merged = df.merge(max_time_cycles.to_frame(name='max_time_cycle'), left_on='unit_number',right_index=True)
    merged["RUL"] = merged["max_time_cycle"] - merged['time_cycles']
    merged = merged.drop("max_time_cycle", axis=1) 
    return merged
 
train_df = add_RUL_column(train_df)
print(train_df)

           unit_number  time_cycles  setting_1  setting_2  setting_3     s_1  \
    0                1            1    -0.0007    -0.0004      100.0  518.67   
    1                1            2     0.0019    -0.0003      100.0  518.67   
    2                1            3    -0.0043     0.0003      100.0  518.67   
    3                1            4     0.0007     0.0000      100.0  518.67   
    4                1            5    -0.0019    -0.0002      100.0  518.67   
    ...            ...          ...        ...        ...        ...     ...   
    20626          100          196    -0.0004    -0.0003      100.0  518.67   
    20627          100          197    -0.0016    -0.0005      100.0  518.67   
    20628          100          198     0.0004     0.0000      100.0  518.67   
    20629          100          199    -0.0011     0.0003      100.0  518.67   
    20630          100          200    -0.0032    -0.0005      100.0  518.67   
    
              s_2      s_3      s_4    s_5  ...     s_13     s_14    s_15  s_16  \
    0      641.82  1589.70  1400.60  14.62  ...  2388.02  8138.62  8.4195  0.03   
    1      642.15  1591.82  1403.14  14.62  ...  2388.07  8131.49  8.4318  0.03   
    2      642.35  1587.99  1404.20  14.62  ...  2388.03  8133.23  8.4178  0.03   
    3      642.35  1582.79  1401.87  14.62  ...  2388.08  8133.83  8.3682  0.03   
    4      642.37  1582.85  1406.22  14.62  ...  2388.04  8133.80  8.4294  0.03   
    ...       ...      ...      ...    ...  ...      ...      ...     ...   ...   
    20626  643.49  1597.98  1428.63  14.62  ...  2388.26  8137.60  8.4956  0.03   
    20627  643.54  1604.50  1433.58  14.62  ...  2388.22  8136.50  8.5139  0.03   
    20628  643.42  1602.46  1428.18  14.62  ...  2388.24  8141.05  8.5646  0.03   
    20629  643.23  1605.26  1426.53  14.62  ...  2388.23  8139.29  8.5389  0.03   
    20630  643.85  1600.38  1432.14  14.62  ...  2388.26  8137.33  8.5036  0.03   
    
           s_17  s_18   s_19   s_20     s_21  RUL  
    0       392  2388  100.0  39.06  23.4190  191  
    1       392  2388  100.0  39.00  23.4236  190  
    2       390  2388  100.0  38.95  23.3442  189  
    3       392  2388  100.0  38.88  23.3739  188  
    4       393  2388  100.0  38.90  23.4044  187  
    ...     ...   ...    ...    ...      ...  ...  
    20626   397  2388  100.0  38.49  22.9735    4  
    20627   395  2388  100.0  38.30  23.1594    3  
    20628   398  2388  100.0  38.44  22.9333    2  
    20629   395  2388  100.0  38.29  23.0640    1  
    20630   396  2388  100.0  38.37  23.0522    0  
    
    [20631 rows x 27 columns]

As expected, RUL decreases as time_cycles increases, reaching 0 at each engine's final recorded cycle. Our ultimate goal with this dataset is to build a model that can predict RUL. The test_df provides partial engine run data, including cycle counts and sensor readings taken sometime before the end of each engine's life. Our task is to use this information to accurately estimate how many cycles are left before failure occurs.
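
As a side note on the RUL calculation, the same column can be produced without a merge by using groupby().transform(). This is a minimal alternative sketch, not the function used in this article; `add_rul_with_transform` and `toy_df` are names introduced here for illustration.

```python
import pandas as pd

def add_rul_with_transform(df):
    """Same RUL definition (last cycle minus current cycle), without a merge."""
    out = df.copy()
    # transform("max") broadcasts each unit's last cycle back onto its rows
    last_cycle = out.groupby("unit_number")["time_cycles"].transform("max")
    out["RUL"] = last_cycle - out["time_cycles"]
    return out

# Toy frame: engine 1 runs 3 cycles, engine 2 runs 2 cycles.
toy_df = pd.DataFrame({
    "unit_number": [1, 1, 1, 2, 2],
    "time_cycles": [1, 2, 3, 1, 2],
})
print(add_rul_with_transform(toy_df)["RUL"].tolist())  # [2, 1, 0, 1, 0]
```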

2. Filtering

Based on the earlier plots, sensors s_1, s_5, s_6, s_10, s_16, s_18, and s_19 appear static — they don’t change over time and likely offer little predictive value. We'll drop them, along with the unit_number, which is no longer needed.

static_cols = ["s_1", "s_5", "s_6", "s_10", "s_16", "s_18", "s_19"]
train_df_x_y = train_df.drop(columns=["unit_number"] + static_cols).copy()
test_df_x = test_df.groupby('unit_number').last().reset_index().drop(columns=["unit_number"] + static_cols)
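
The static sensors were identified by eye from the plots. A programmatic check is also possible; the sketch below flags columns whose standard deviation is effectively zero. The helper name and the `tol` threshold are assumptions made for illustration, and on real data a looser tolerance may be needed for sensors that vary only slightly.

```python
import pandas as pd

def find_static_columns(df, tol=1e-8):
    """Return the names of numeric columns whose standard deviation is at most tol."""
    stds = df.std(numeric_only=True)
    return stds[stds <= tol].index.tolist()

# Toy frame with two constant sensors and one that drifts.
toy_df = pd.DataFrame({
    "s_1": [518.67, 518.67, 518.67, 518.67],
    "s_2": [641.82, 642.15, 642.35, 642.37],
    "s_5": [14.62, 14.62, 14.62, 14.62],
})
print(find_static_columns(toy_df))  # ['s_1', 's_5']
```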

Preparing the dataset

1. Splitting into Training, Validation, and Testing Sets

Now let's prepare our dataset. First, we separate the features (X) and the target (y) from the training set. Then, we split the data into a 70:30 ratio for training and validation.

# Separate features (X) and target (y)
train_df_x = train_df_x_y.drop("RUL", axis=1)  # Drop column, not row
train_df_y = train_df_x_y[["RUL"]]             # Double brackets to keep DataFrame format
 
# Split into training and validation sets
train_df_x, val_df_x, train_df_y, val_df_y = train_test_split(
    train_df_x, train_df_y, test_size=0.3, random_state=42
)
 
# Note: test_df_x and test_df_y are already prepared
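
One caveat with a row-level random split is that cycles from the same engine can land in both the training and validation sets, which tends to make validation scores look better than they would on unseen engines. A possible alternative, sketched below, holds out whole engines instead. It is not what this article does: it assumes the frame still contains unit_number, and `split_by_unit` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def split_by_unit(df, val_fraction=0.3, seed=42):
    """Hold out whole engines so no unit appears in both splits."""
    rng = np.random.default_rng(seed)
    units = df["unit_number"].unique()
    rng.shuffle(units)  # shuffle engine ids in place
    n_val = max(1, int(round(len(units) * val_fraction)))
    is_val = df["unit_number"].isin(units[:n_val])
    return df[~is_val], df[is_val]

# Toy frame: 10 engines with 3 rows each (stand-in for the real training data).
toy_df = pd.DataFrame({
    "unit_number": np.repeat(np.arange(1, 11), 3),
    "time_cycles": np.tile([1, 2, 3], 10),
})
train_part, val_part = split_by_unit(toy_df)
print(sorted(val_part["unit_number"].unique().tolist()))
print(len(train_part), len(val_part))  # 21 9
```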

2. Scaling

Scaling is an important preprocessing step. Standardizing each feature to zero mean and unit variance puts the inputs on comparable numeric ranges, so features with large raw values (such as s_9, around 9,000) do not dominate features with small ones (such as s_15, around 8.4). This matters most for distance- and kernel-based models such as SVR; tree-based models like Random Forest and Gradient Boosting are largely insensitive to feature scale.

from sklearn.preprocessing import StandardScaler
 
scaler = StandardScaler()
 
train_df_x_scaled = scaler.fit_transform(train_df_x)  # Important: fit only on the training set
val_df_x_scaled = scaler.transform(val_df_x)          # Use the same fitted scaler on validation and test sets
test_df_x_scaled = scaler.transform(test_df_x)
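
To make the transformation concrete, here is a by-hand version of standardization in NumPy: each feature becomes z = (x - mean) / std, with the mean and standard deviation taken from the training data only. The arrays are toy values chosen for illustration.

```python
import numpy as np

# Toy data: two features on very different scales.
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
val = np.array([[2.0, 400.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

train_scaled = (train - mean) / std
val_scaled = (val - mean) / std  # reuse training statistics; never refit on validation data

print(train_scaled.mean(axis=0))  # approximately [0, 0]
print(train_scaled.std(axis=0))   # approximately [1, 1]
print(val_scaled)
```

Reusing the training statistics on the validation and test sets is what keeps information from those sets out of the fitted model.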

Now that the dataset is preprocessed, we’re ready to build some prediction models!

Modeling

This is a regression problem with multiple input features. Our inputs (x1, x2, ..., xn) are the cycle count, the operational settings, and the remaining sensor readings, and the target (y) is the RUL. There are many machine learning techniques to solve this type of problem. In this article, we’ll explore and compare three popular models:

SVM (Support Vector Machine)
RF (Random Forest)
GB (Gradient Boosting)

1. Model Initialization

from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Instantiate model objects
SVR_model = SVR(kernel='rbf')
RF_model = RandomForestRegressor(n_estimators=100, random_state=42)
GBR_model = GradientBoostingRegressor(random_state=42)

2. Model Fitting

Fit each model using the exact same training set:

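# scikit-learn regressors expect a 1-D target, so flatten the single-column DataFrame
y_train = train_df_y.values.ravel()
SVR_model.fit(train_df_x_scaled, y_train)
RF_model.fit(train_df_x_scaled, y_train)
GBR_model.fit(train_df_x_scaled, y_train)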

3. Model Evaluation

datasets = {
    'Train': [train_df_x_scaled, train_df_y],
    'Validation': [val_df_x_scaled, val_df_y],
    'Test': [test_df_x_scaled, test_df_y]
}
 
models = {
    "SVR_model": SVR_model,
    "RF_model": RF_model,
    "GBR_model": GBR_model
}
 
evaluation = {f"{dataset} {metric}":[] for dataset in datasets for metric in ["RMSE", "R²"]}
 
def predict_and_evaluate(model, x_set, y_set):
    y_hat = model.predict(x_set)
    y = y_set.values
    
    rmse = np.sqrt(mean_squared_error(y, y_hat))
    r2 = r2_score(y, y_hat)
 
    return y_hat, rmse, r2
 
for model in models:
    for dataset in datasets:
        x_set, y_set = datasets[dataset]
        _, rmse, r2 = predict_and_evaluate(models[model], x_set, y_set)
 
        evaluation[f"{dataset} RMSE"].append(rmse)
        evaluation[f"{dataset} R²"].append(r2)
 
print(pd.DataFrame(evaluation, index=models))

               Train RMSE  Train R²  Validation RMSE  Validation R²  Test RMSE  \
    SVR_model   38.812309  0.687080        37.872105       0.686860  24.034788   
    RF_model    13.597595  0.961592        35.788206       0.720373  26.170727   
    GBR_model   34.800525  0.748426        35.491615       0.724989  25.533728   
    
                Test R²  
    SVR_model  0.665481  
    RF_model   0.603382  
    GBR_model  0.622455

Based on these results, SVR_model performs best on the test set (RMSE 24.03, R² 0.67), and its R² is similar across the training, validation, and test sets (0.69, 0.69, 0.67), which suggests it generalizes reasonably well. In contrast, the Random Forest model overfits the training set: its training RMSE (13.60) is far below its validation RMSE (35.79).

With this simple setup, we were able to predict the RUL with a test RMSE of 24.03 and an R² of 0.67. These results are not especially strong, but they are usable as a baseline. There are many ways to improve the model's performance; for this article, this serves as a solid starting point.
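
For reference, the two metrics reported above can be computed by hand. The sketch below uses the standard definitions, RMSE = sqrt(mean((y - y_hat)^2)) and R² = 1 - SS_res / SS_tot, on toy values that are unrelated to the results in the table.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (cycles)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy RUL values and predictions, each off by 10 cycles
y_true = [100, 50, 0]
y_pred = [90, 60, 10]
print(rmse(y_true, y_pred))  # 10.0
print(r2(y_true, y_pred))    # 0.94
```

An RMSE of about 24 therefore means the predictions are typically off by a few dozen cycles, which is worth keeping in mind when choosing a safety margin.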

Implementation and use cases

So far, we’ve built a reasonably performing model. Next, let’s compare its effectiveness against a preventive maintenance approach.

Currently, we have test_df_y, which contains the ground-truth RUL values. To make a meaningful comparison, we also need to calculate the actual time each engine would fail — i.e., the ground-truth time_cycles at the point of failure.

test_set_time_cycles = test_df.groupby('unit_number').max().reset_index()['time_cycles']
test_set_end_of_cycle = test_set_time_cycles + test_df_y['RUL']
test_set_end_of_cycle.head()

    0    143
    1    147
    2    195
    3    188
    4    189
    dtype: int64

Preventive maintenance: The naive approach

1. Average End-of-Life Cycle

In this approach, we assume that all engines are identical and should be maintained based on a fixed threshold. Normally, this threshold would be defined by manufacturer guidelines. However, since the dataset is simulated, no such policy is provided.

As a substitute, we can use the average end-of-life cycle (i.e., the average of each engine's final time cycle in the training set) as a reasonable threshold.

average_test_end_of_cycle = train_df.groupby('unit_number').max().reset_index()['time_cycles'].mean()
average_test_end_of_cycle

    206.31

def calculate_maintenance(true_end_of_cycle, estimated_end_of_cycle):
    # Positive error: maintenance happens before failure (unused life).
    # Negative error: the engine fails before its scheduled maintenance.
    error = true_end_of_cycle - estimated_end_of_cycle
    overestimation = (error < 0).astype(int)
    life_cycle_unused = error.apply(lambda x: x if x > 0 else 0)
 
    print(f"Number of overestimates: {overestimation.sum()} units")
    print(f"Sum of unused life cycles: {life_cycle_unused.sum()} cycles")

calculate_maintenance(test_set_end_of_cycle, average_test_end_of_cycle)

    Number of overestimates: 57 units
    Sum of unused life cycles: 1713.67 cycles

Using the average as a maintenance threshold leads to 57 engines being overestimated — meaning they fail before maintenance is performed. In real-world scenarios, these failures could lead to serious operational disruptions or even safety hazards.
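
The trade-off between the two numbers is easier to see on a tiny example. The sketch below applies the same error logic to three made-up engines and two fixed thresholds; `maintenance_summary` is a hypothetical helper that returns the counts instead of printing them.

```python
import pandas as pd

def maintenance_summary(true_end_of_cycle, maintenance_cycle):
    """Return (engines that fail before maintenance, total unused cycles)."""
    error = true_end_of_cycle - maintenance_cycle
    failed_first = int((error < 0).sum())
    unused = float(error.clip(lower=0).sum())
    return failed_first, unused

# Toy end-of-life cycles for three engines
toy_true_end = pd.Series([100, 200, 300])

print(maintenance_summary(toy_true_end, 150))  # (1, 200.0)
print(maintenance_summary(toy_true_end, 100))  # (0, 300.0)
```

Lowering the threshold from 150 to 100 removes the one failure but raises the unused life from 200 to 300 cycles, the same tension seen on the real data.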

To mitigate this risk, a more conservative threshold can be used: the minimum end-of-life cycle observed.

2. Minimum End-of-Life Cycle

min_test_end_of_cycle = train_df.groupby('unit_number').max().reset_index()['time_cycles'].min()
min_test_end_of_cycle

To get a better understanding of the impact, let's update our calculate_maintenance function and visualize it!

from matplotlib.patches import Patch
from matplotlib.lines import Line2D
 
def calculate_maintenance(true_end_of_cycle, estimated_end_of_cycle, plot_title=''):
    error = true_end_of_cycle - estimated_end_of_cycle
    overestimation = (error < 0).astype(int)
    life_cycle_unused = error.apply(lambda x: x if x > 0 else 0)
 
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(10, 4), gridspec_kw={'width_ratios': (40, 1, 1)})
 
    x = list(range(1, len(true_end_of_cycle) + 1))
    if not isinstance(estimated_end_of_cycle, pd.Series):
        estimated_end_of_cycle = [estimated_end_of_cycle] * len(x)
    
    for i in range(len(estimated_end_of_cycle)):
        d = true_end_of_cycle[i]
        t = estimated_end_of_cycle[i]
 
        if d < t:
            ax1.bar(x[i], d, color='orange')
        elif d > t:
            ax1.bar(x[i], t, color='skyblue')
            ax1.bar(x[i], d - t, bottom=t, color='lightgray')
        else:
            ax1.bar(x[i], d, color='skyblue', label='Data')
 
    for i, y in enumerate(estimated_end_of_cycle):
        i = i + 1
        ax1.hlines(y, i - 0.4, i + 0.4, colors='red', linestyles='--', linewidth=2)
 
    legend_items = [
        Patch(facecolor=color, label=label)
        for label, color in [('Overestimate', 'orange'), 
                             ('Used life cycle', 'skyblue'), 
                             ('Unused life cycle', 'lightgray')]
    ]
    legend_items.append(Line2D([0], [0], color='red', linestyle='--', linewidth=2, label='Time of maintenance'))
 
    ax1.set_xlabel('Unit number')
    ax1.set_ylabel('Time cycles')  # bars show end-of-life cycles, not remaining life
    ax1.set_title(plot_title)
    ax1.grid(True, axis='y')
    ax1.legend(handles=legend_items, loc='upper center', ncol=4)
    
    def plot_summary_bar(ax, value, color, ylim, ylabel):
        ax.bar(1, value, color=color)
        ax.set_xticklabels([])
        ax.tick_params(axis='x', bottom=False)
        ax.set_yticks([value])
        ax.set_yticklabels([])
        ax.set_ylim(0, ylim)
        ax.set_ylabel(ylabel)
    
    unused_sum = life_cycle_unused.sum()
    plot_summary_bar(ax=ax2, value=unused_sum, color='lightgray', ylim=7848,
        ylabel=f'Total Unused Life Cycle: {unused_sum:.2f}'
    )
    
    overestimation_sum = overestimation.sum()
    plot_summary_bar(ax=ax3, value=overestimation_sum, color='orange', ylim=100,
        ylabel=f'Number of Overestimated Units: {overestimation_sum}'
    )
            
    plt.tight_layout()
    plt.show()

calculate_maintenance(test_set_end_of_cycle, min_test_end_of_cycle, 'Maintenance at Minimum Time Cycle')

[Figure: Maintenance at Minimum Time Cycle]

This conservative strategy avoids all failures, ensuring that engines are serviced before their end-of-life. However, this comes at a cost — 7848 total life cycles are wasted due to premature maintenance.
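The average and the minimum are two points on a wider trade-off. As a rough sketch, sweeping a static threshold over a few candidate values shows unused life falling as the number of failures rises. The end-of-life values and thresholds are made up, and sweep_thresholds is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

def sweep_thresholds(true_end_of_cycle, thresholds):
    """Count failures and unused life for each candidate static threshold."""
    rows = []
    for threshold in thresholds:
        error = true_end_of_cycle - threshold
        rows.append({
            "threshold": threshold,
            "overestimated_units": int((error < 0).sum()),
            "unused_cycles": int(error.clip(lower=0).sum()),
        })
    return pd.DataFrame(rows)

# Hypothetical end-of-life cycles (illustrative values only)
toy_end_of_cycle = pd.Series([130, 160, 190, 220, 250])

trade_off = sweep_thresholds(toy_end_of_cycle, thresholds=[130, 160, 190])
print(trade_off)
```

No single static value does well on both columns, which is the limitation a per-engine threshold tries to address.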

This leads us to a crucial question:

Can we optimize the maintenance timing—extending or postponing it slightly—to make better use of the remaining life cycles and reduce costs, while still avoiding unexpected failures?

Absolutely — and that’s where predictive maintenance shines.

Instead of using a static threshold, predictive maintenance uses model-based predictions to dynamically determine the remaining life for each engine. This enables customized, per-engine thresholds, leading to smarter, safer, and more cost-efficient maintenance schedules.

Predictive Maintenance Approach

The first step is to create a function that calculates a variable end-of-life cycle using the model's predictions:

svr_RUL_prediction, _, _ = predict_and_evaluate(SVR_model, test_df_x_scaled, test_df_y)

def generate_model_EOC(prediction):
    model_end_of_cycle = prediction + test_set_time_cycles
    return model_end_of_cycle

svr_end_of_cycle = generate_model_EOC(svr_RUL_prediction)
svr_end_of_cycle.head()

    0    198.762791
    1    194.935590
    2    198.451921
    3    194.425444
    4    203.173133
    Name: time_cycles, dtype: float64
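The arithmetic behind generate_model_EOC is simply the cycle at which each engine was last observed plus its predicted remaining cycles. A toy sketch with assumed numbers (both Series below are hypothetical, not model output):

```python
import pandas as pd

# Hypothetical last observed cycle of three test engines
toy_time_cycles = pd.Series([31, 49, 126], name="time_cycles")
# Hypothetical RUL predictions from a model, in cycles
toy_rul_prediction = pd.Series([120.0, 95.0, 60.0])

# Estimated end of life = cycles already run + predicted remaining cycles
toy_end_of_cycle = toy_time_cycles + toy_rul_prediction
print(toy_end_of_cycle.tolist())
```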

Unlike previous approaches where the end-of-life cycles were the same for all engines, svr_end_of_cycle provides individualized estimates, accounting for each engine's condition. Let's visualize the result:

calculate_maintenance(test_set_end_of_cycle, svr_end_of_cycle, 'Maintenance Based on SVR')

[Figure: Maintenance Based on SVR]

This approach reduces the unused life cycles to just 551.66 cycles, a significant improvement over both static thresholds. However, 62 engines are now overestimated, which is even more than the 57 under the average threshold and remains a safety concern. The raw predictions are therefore not yet safe to schedule against.

1. SVR Adjusted by Average

To reduce the risk of overestimating engine RUL, we can introduce a correction constant. This constant adjusts predictions downward based on the average prediction error, calculated only from overestimated cases:

def generate_model_EOC_adj(end_of_cycle, prediction):
 
    model_end_of_cycle = prediction + test_set_time_cycles
    error = end_of_cycle - model_end_of_cycle
    c = error[error < 0].mean()
 
    adj_model_end_of_cycle = model_end_of_cycle + c
    
    return adj_model_end_of_cycle

svr_end_of_cycle_adj = generate_model_EOC_adj(test_set_end_of_cycle, svr_RUL_prediction)
svr_end_of_cycle_adj.head()

    0    179.172482
    1    175.345281
    2    178.861612
    3    174.835135
    4    183.582824
    Name: time_cycles, dtype: float64

calculate_maintenance(test_set_end_of_cycle, svr_end_of_cycle_adj, 'Maintenance Based on SVR Adjusted by Average')

[Figure: Maintenance Based on SVR Adjusted by Average]

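With the average correction applied, the number of overestimated engines decreases, but it generally does not reach zero, because any engine overestimated by more than the average is still scheduled too late. To be more conservative, we can adjust the predictions using the minimum observed error instead, that is, the most negative error, which corresponds to the largest overestimate.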
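A small worked example may help show what the average correction does. The numbers below are made up for illustration. The constant c is the mean of the negative errors only, so every estimate moves earlier by the same amount.

```python
import pandas as pd

# Hypothetical true and model-estimated end-of-life cycles (illustrative values only)
true_eoc = pd.Series([200.0, 180.0, 220.0, 150.0])
model_eoc = pd.Series([210.0, 170.0, 260.0, 160.0])

error = true_eoc - model_eoc    # negative where the model overestimates
c = error[error < 0].mean()     # average overestimate (a negative number)
adjusted_eoc = model_eoc + c    # shift every estimate earlier by |c|

before = int((true_eoc - model_eoc < 0).sum())
after = int((true_eoc - adjusted_eoc < 0).sum())
print(f"c = {c:.1f}, overestimated before: {before}, after: {after}")
```

Here the shift of 20 cycles fixes the two mild overestimates but not the engine that was overestimated by 40 cycles.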

2. SVR Adjusted by Minimum

Using the same adjusting concept, we slightly modify our previous function as follows:

def generate_model_EOC_adj(end_of_cycle, prediction, mode="min"):
    model_end_of_cycle = prediction + test_set_time_cycles
    error = end_of_cycle - model_end_of_cycle
    c = error[error < 0].min() if mode == "min" else error[error < 0].mean()
 
    adj_model_end_of_cycle = model_end_of_cycle + c
    return adj_model_end_of_cycle

svr_end_of_cycle_adj_min = generate_model_EOC_adj(test_set_end_of_cycle, svr_RUL_prediction, mode="min")
svr_end_of_cycle_adj_min.head()

    0    133.381686
    1    129.554486
    2    133.070816
    3    129.044339
    4    137.792028
    Name: time_cycles, dtype: float64

Some predictions drop below the min_test_end_of_cycle, so let’s clip them:

svr_end_of_cycle_adj_min = svr_end_of_cycle_adj_min.clip(lower=min_test_end_of_cycle)

calculate_maintenance(test_set_end_of_cycle, svr_end_of_cycle_adj_min, 'Maintenance Based on SVR Adjusted by Minimum')

[Figure: Maintenance Based on SVR Adjusted by Minimum]
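The sketch below, on made-up numbers, illustrates why this combination removes every overestimate on the data the constant is computed from. Shifting by the most negative error makes all errors non-negative, and clipping at a threshold that no engine falls below cannot push a maintenance time past a failure. The value of static_min is an assumption chosen to match the shortest toy lifetime.

```python
import pandas as pd

# Hypothetical true and model-estimated end-of-life cycles (illustrative values only)
true_eoc = pd.Series([200.0, 180.0, 220.0, 150.0])
model_eoc = pd.Series([210.0, 170.0, 260.0, 160.0])
static_min = 150.0  # hypothetical minimum end-of-life threshold

error = true_eoc - model_eoc
c = error[error < 0].min()      # largest overestimate (most negative error)
adjusted_eoc = model_eoc + c    # every estimate is now at or before the true failure
clipped_eoc = adjusted_eoc.clip(lower=static_min)  # never earlier than the static policy

overestimated = int((true_eoc - clipped_eoc < 0).sum())
unused_static = float((true_eoc - static_min).sum())
unused_clipped = float((true_eoc - clipped_eoc).sum())
print(overestimated, unused_static, unused_clipped)
```

Note that the guarantee only covers the data used to compute c. On unseen engines a larger overestimate is still possible, so in practice the constant would ideally come from held-out data.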

Cost-Benefit Analysis: Predictive vs. Preventive Maintenance

Now that we’ve reduced the number of overestimated engines to zero and reclaimed 2,329.22 unused life cycles, let’s quantify this in terms of real-world savings.

Maintenance Cost Assumption

Let’s assume:

The cost of a single engine maintenance is $9,000
Manufacturers typically perform preventive maintenance every 128 cycles (the minimum observed end-of-life cycle across all engines)

This implies a cost per engine cycle:

cost_per_cycle = 9000 / min_test_end_of_cycle
f"{cost_per_cycle} USD"

    70.3125 USD

So, every unused cycle costs approximately $70.31 in wasted maintenance potential.

Savings from Predictive Maintenance

From earlier, we found that:

The naive preventive maintenance strategy led to 7,848 unused life cycles
Our optimized predictive approach reduced that to 5,518.78 cycles
That’s a reduction of 2,329.22 cycles, which would otherwise have been wasted

Let’s calculate the cost savings:

saved_cost = 2329.22 * cost_per_cycle
f"{saved_cost} USD"

    163773.28125 USD

💡 By switching to predictive maintenance, we saved approximately $163,773, just from better timing alone—without requiring more maintenance or hardware changes.
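The same calculation can be wrapped in a small function so that the cost assumptions are easy to vary. This is a minimal sketch: maintenance_savings is a hypothetical helper, and it prices each reclaimed cycle at the maintenance cost divided by the maintenance interval, as assumed above.

```python
def maintenance_savings(unused_before, unused_after, cost_per_maintenance, cycles_per_maintenance):
    """Value of the reclaimed life cycles, priced at the assumed cost per cycle."""
    cost_per_cycle = cost_per_maintenance / cycles_per_maintenance
    reclaimed_cycles = unused_before - unused_after
    return reclaimed_cycles * cost_per_cycle

# Figures from the analysis above
savings = maintenance_savings(
    unused_before=7848,
    unused_after=5518.78,
    cost_per_maintenance=9000,
    cycles_per_maintenance=128,
)
print(f"{savings:,.2f} USD")
```

With the figures above this reproduces the saving of roughly $163,773. Changing cost_per_maintenance or cycles_per_maintenance shows how sensitive the result is to those two assumptions.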

Summary

Strategy	Unused Life Cycles	Estimated Failures	Total Maintenance Cost Impact
Naive (Minimum Policy)	7,848	0	Baseline ($0 saved)
Predictive (Adjusted)	5,518.78	0	$163,773 saved

This example clearly demonstrates how AI-driven predictive maintenance not only preserves equipment health but also delivers significant financial impact—a compelling case for its adoption across asset-heavy industries like aviation, manufacturing, or energy.

Closing Thoughts

In today's competitive and asset-intensive industries, efficiency, reliability, and cost optimization are non-negotiable. Traditional maintenance strategies—especially preventive maintenance based on fixed schedules—often lead to a tradeoff between unexpected breakdowns and premature servicing. Both scenarios come with significant operational and financial costs.

This is where predictive maintenance stands out as a strategic advantage. By harnessing historical sensor data and machine learning models, we can estimate an asset's remaining useful life (RUL) and make informed decisions on when to service equipment. As demonstrated in our analysis, predictive maintenance reduces unnecessary maintenance while still preventing operational failures, reclaiming roughly $163,773 of maintenance value across the test fleet under our cost assumptions.

More importantly, predictive maintenance isn't just about cost savings. It empowers businesses with:

Higher equipment uptime and availability
Extended asset lifecycle and ROI
Increased safety through proactive failure prevention
Smarter resource allocation and reduced waste

As the industrial landscape continues to evolve with Industry 4.0 and digital transformation, predictive maintenance will be a cornerstone of intelligent operations. Businesses that invest in these capabilities now are better positioned to unlock long-term value, resilience, and operational excellence.

In essence, predictive maintenance is not just a technical upgrade—it's a business imperative.
