Introduction to RFM Analysis in R

Using RFM for customer segmentation, and gaining insights into your customer base through RFM visualizations

Andina Septia, Unit 5

The case for understanding your customers

As a marketing strategist, you need to understand your customers’ behavior in order to optimize your spending on marketing ads. You probably don’t want to exhaust your marketing budget on customers who are already ‘Champions’ of your brand, a term marketers use for customers with a history of frequent, recent, or high-value purchases.

Instead, you may prefer to allocate your budget to customers deemed ‘at-risk’ of churning: perhaps e-commerce visitors who have been lurking around but haven’t made their first purchase yet, or customers with a purchase history who haven’t visited your online store (or mobile commerce app, or mobile game) for more than a few weeks.

In R, the rfm package helps marketers segment their customers by Recency, Frequency, and Monetary value, the three dimensions that give RFM its name. Each dimension is scored from every customer’s transactions, based on:

how recently a customer has purchased (Recency)
how often they purchase (Frequency)
how much the customer spends (Monetary)

With RFM scores in hand, we can recognize which customers are likely to respond to a promotion, which are satisfied with our listings and may be waiting for the next promotion, which are the top spenders, and which are low-value purchasers. This way, we can target different campaign objectives at customers according to their segment.

To calculate the RFM score for each customer, we need the raw transaction data (transaction logs with timestamp), from which we will derive the following:

a unique id (could be email address, a customer id or anything that we can identify our customers by)
number of transactions (can also be orders or in-app microtransactions)
total revenue by customer
number of days (or any unit of time that is reasonable) since the customer’s last purchase
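As a rough illustration of the list above, here is a minimal base R sketch that derives those per-customer fields from a toy transaction log. The data, the column names (customer_id, order_date, amount) and the analysis date are all made up for the example; they are not the client dataset used later.

```r
# Toy transaction log; every value and column name here is hypothetical.
tx <- data.frame(
  customer_id = c("a@x.com", "a@x.com", "b@x.com", "c@x.com", "c@x.com", "c@x.com"),
  order_date  = as.Date(c("2020-08-01", "2020-08-20", "2020-06-15",
                          "2020-07-01", "2020-07-15", "2020-08-30")),
  amount      = c(20, 35, 100, 10, 15, 5),
  stringsAsFactors = FALSE
)
analysis_date <- as.Date("2020-08-31")

# Days between each transaction and the analysis date.
days_since <- as.numeric(analysis_date - tx$order_date)

# Summarise per customer; tapply() returns the groups in the same sorted order each time.
n_tx    <- tapply(tx$amount, tx$customer_id, length)  # number of transactions
revenue <- tapply(tx$amount, tx$customer_id, sum)     # total revenue
recency <- tapply(days_since, tx$customer_id, min)    # days since the last purchase

customers <- data.frame(
  customer_id    = names(n_tx),
  n_transactions = as.vector(n_tx),
  total_revenue  = as.vector(revenue),
  recency_days   = as.vector(recency)
)
print(customers)
```

The result has one row per customer, which is the shape a per-customer RFM table needs.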

Pre-processing on raw transaction data

I will be using a sample transaction set from an e-commerce client and will include sample R code for the process I’ve taken.

datalzd <- read.csv("data/orderlist.csv")
# function to calculate missing values
pMiss <- function(x){sum(is.na(x))/length(x)*100}
apply(datalzd, 2, pMiss)

The code above reads in our dataset and then computes the percentage of missing values in each column.

Any column with more than 75% missing values will be removed for the purpose of this analysis; all ID columns that can be traced to a customer (except Customer Name) will also be removed. To help with preprocessing and cleaning, we also bring in the dplyr package along with some other helper functions from the tidyverse.
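The 75% rule can be applied programmatically. The sketch below is one way to do it in base R, on a small invented data frame (the Voucher_Code column and all the values are hypothetical), reusing the same percentage calculation as pMiss.

```r
# Toy data frame; the columns and values are invented for illustration.
orders <- data.frame(
  Order_Number = 1:8,
  Paid_Price   = c(10, NA, 30, 40, 50, 60, 70, 80),
  Voucher_Code = c(NA, NA, NA, NA, NA, NA, NA, "SALE10"),
  stringsAsFactors = FALSE
)

# Percentage of missing values in a vector (same calculation as pMiss).
pct_missing <- function(x) sum(is.na(x)) / length(x) * 100

# Percentage missing per column, then keep only columns at or below the 75% threshold.
miss <- sapply(orders, pct_missing)
orders_kept <- orders[, miss <= 75, drop = FALSE]

print(round(miss, 1))
print(names(orders_kept))
```

Here Voucher_Code is 87.5% missing and is dropped, while Paid_Price (12.5% missing) is kept.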

# load dplyr for the pipe (%>%) and the data-manipulation verbs
library(dplyr)

# select columns we want to work with
dataclean <- datalzd %>% select(c(
  Order_Number, Customer_Name, Item_Name, Unit_Price,
  Payment_Method, Created_at, Updated_at, Paid_Price,
  Shipping_City, Shipping_Postcode, Shipping_Country))

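# date of analysis
analysis_date <- lubridate::as_date("2020-08-31", tz = "UTC")

df_rfm <- dataclean %>% mutate(
  # character to factor levels
  Customer = as.factor(Customer_Name),
  # date string to date; assumes Created_at holds the order timestamp
  date = lubridate::as_date(Created_at),
  # days between the order and the analysis date, as a plain number
  Recency_days = as.numeric(analysis_date - date)
) %>%
  group_by(Customer, Order_Number, Recency_days) %>%
  summarise(Freq_order = n(),
            Total_revenue = sum(Paid_Price)) %>%
  ungroup()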

In the code above, we extract the columns we want to keep and then perform a few type conversions (character to factor levels, date strings to dates) before computing the order frequency and total revenue, grouped by customer, order, and recency.
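To make the type conversions concrete, here is a small base R sketch on invented values. The timestamp format ("YYYY-MM-DD HH:MM:SS") and the customer names are assumptions for the example; the real Created_at column may be formatted differently.

```r
# Hypothetical raw values, as they might arrive from a CSV export.
created_at    <- c("2020-08-01 10:15:00", "2020-08-20 18:02:11", "2020-06-15 09:00:00")
customer_name <- c("Ani", "Ani", "Budi")

# Date strings to dates: as.Date() parses the date part and ignores the trailing time.
order_date <- as.Date(created_at, format = "%Y-%m-%d")

# Character to factor levels.
customer <- factor(customer_name)

# Subtracting two dates gives a difftime; convert it to a plain number of days.
analysis_date <- as.Date("2020-08-31")
recency_days  <- as.numeric(analysis_date - order_date, units = "days")

print(levels(customer))
print(recency_days)
```

Converting the difference to a plain number avoids carrying a difftime column into the grouping and scoring steps.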

RFM Analysis in Practice

Once the data is cleaned, we proceed to the preparation steps required by the rfm package. This package comes with several functions for computing RFM scores, as well as some utility functions for visualizing the relationships between the RFM dimensions.

When used together, these functions give us a visual idea of what our customer segments look like, based on the RFM table.

To calculate the RFM score, we simply use the rfm_table_customer() function, passing in the following parameters:

data: the dataframe we’ve prepared above with
- unique customer id
- number of transactions
- days since the last transaction
- and total revenue
customer_id: name of the customer id column
n_transactions: name of the column holding the number of transactions
recency_days: name of the column holding the days since the last transaction
total_revenue: name of the total revenue column
analysis_date: date of analysis
recency_bins: number of rankings for recency score (default is 5)
frequency_bins: number of rankings for frequency score (default is 5)
monetary_bins: number of rankings for monetary score (default is 5)

This is what the code looks like in R:

library(rfm)

rfm_score <- rfm_table_customer(
  data = df_rfm, customer_id = Customer,
  n_transactions = Freq_order, recency_days = Recency_days,
  total_revenue = Total_revenue, analysis_date = analysis_date,
  recency_bins = 4, frequency_bins = 4, monetary_bins = 4)

Each of our customers is now annotated with an RFM score. I set the number of bins to 4 to simplify the quantiles and make the result more readable. Before we assign each RFM score to a more readable segment, let’s get some quick insights from the plotting functions that the rfm package provides.
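To build some intuition for what a 4-bin score means, here is a minimal base R sketch of quantile-based scoring on toy data. It illustrates the general idea only and is not necessarily the exact rule the rfm package applies; the helper score_quantile() and all the values are hypothetical.

```r
# Score a numeric vector into `bins` quantile groups, from 1 (worst) to `bins` (best).
# Hypothetical helper for illustration; not a function from the rfm package.
score_quantile <- function(x, bins = 4, higher_is_better = TRUE) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = bins + 1), names = FALSE)
  inner  <- breaks[-c(1, length(breaks))]              # drop the min and max
  score  <- findInterval(x, inner, left.open = TRUE) + 1
  if (!higher_is_better) score <- bins + 1 - score     # e.g. fewer days since purchase is better
  score
}

# Toy per-customer summary (eight customers, invented values).
recency_days <- c(1, 5, 10, 20, 40, 60, 90, 120)
n_orders     <- c(8, 7, 6, 5, 4, 3, 2, 1)
revenue      <- c(80, 70, 60, 50, 40, 30, 20, 10)

r_score <- score_quantile(recency_days, higher_is_better = FALSE)
f_score <- score_quantile(n_orders)
m_score <- score_quantile(revenue)

# Combine the three digits into one score, e.g. 444 for the best customers.
rfm <- r_score * 100 + f_score * 10 + m_score
print(data.frame(recency_days, n_orders, revenue, r_score, f_score, m_score, rfm))
```

With only 4 bins, each dimension splits customers into quarters, so the combined score stays easy to read.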

RFM Visualization

We observe from the plot above that close to 95% of our orders are from customers with fewer than 5 purchases in total (customers sorted by number of orders).

Our RFM heat map has 4 quantiles of Frequency, but only two of them are filled with data. That means the customers’ order frequency falls into the two extreme quantiles: the top 25% and the bottom 25%. The previous bar plot shows why that happened: there is no middle ground. The Monetary score isn’t shown directly in this plot; instead, its mean value is expressed as a color gradient.
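The empty middle quantiles are easy to reproduce. In the sketch below, on an invented frequency vector where most customers ordered once, the quantile cut points nearly coincide, so only the lowest and highest scores are ever assigned. The scoring rule is an illustrative assumption, not necessarily what the rfm package does internally.

```r
# Toy order counts for 20 customers: 15 of them ordered exactly once.
freq <- c(rep(1, 15), 2, 2, 3, 6, 9)

# Cut points for 4 bins; with this much skew the first two are identical.
cuts <- quantile(freq, probs = c(0.25, 0.5, 0.75), names = FALSE)
print(cuts)

# Score = number of cut points strictly below the value, plus one.
score <- findInterval(freq, cuts, left.open = TRUE) + 1
score_counts <- table(score)
print(score_counts)
```

Scores 2 and 3 never appear, which matches the two filled Frequency rows in the heat map.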

Customer Segmentation Visualization

This plot is similar to the one above (Frequency on one axis, Recency on the other) but with the Monetary score also visualized. We can see how the Monetary dimension relates to the other two dimensions, and perhaps assign a label to each bin (“bar”) so they are more readily understood by others in the marketing team.

How you label these different segments of customers may depend on industry conventions, or may be very specific to your customer loyalty plans (i.e., how you plan to activate or re-engage your customers), or to the features of your main products.
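As one possible way to turn scores into labels, here is a base R sketch using simple threshold rules on 4-bin scores. The segment names other than ‘Champions’ and ‘At risk’, the thresholds, and the label_segment() helper are all hypothetical; you would replace them with rules that fit your own business.

```r
# Toy scores for five customers (4 bins per dimension, invented values).
scores <- data.frame(
  customer = c("A", "B", "C", "D", "E"),
  r = c(4, 4, 1, 2, 1),
  f = c(4, 1, 4, 2, 1),
  m = c(4, 2, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical labelling rules, checked in order; the first match wins.
label_segment <- function(r, f, m) {
  ifelse(r >= 4 & f >= 4 & m >= 4, "Champions",
  ifelse(r >= 3 & f <= 2,          "New or promising",
  ifelse(r <= 2 & f >= 3,          "At risk",
  ifelse(r <= 1 & f <= 1,          "Lost",
                                   "Others"))))
}

scores$segment <- label_segment(scores$r, scores$f, scores$m)
print(scores)
```

Keeping the rules in one small function makes it easy to review them with the marketing team and adjust the thresholds later.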

Consider a company that sells high-end electronics or household appliances. You may have a distribution that skews toward high Monetary values because of the average unit price of your products. You will likely also have Recency scores that, compared to the FMCG industry (fast-moving consumer goods, like beer and dairy milk), are skewed to an extreme end, because household appliances don’t expire or have a short shelf life.

These considerations are worth pointing out when you flesh out your customer segmentation plan, and they take more deliberation than blindly applying RFM as a one-size-fits-all tool.

To finish off this introduction to RFM analysis, I’ve visualized our final customer segmentation as simple bar plots. A simple bar plot provides the clarity that allows the rest of the decision-making team to quickly identify revenue bottlenecks and growth opportunities, and to create tactical plans tailored for each customer segment.
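A bar plot of this kind takes only a few lines of base R. The sketch below counts customers per segment and plots the counts; the segment labels are invented for the example and are not the results from the client dataset.

```r
# Hypothetical segment label for each of eight customers.
segments <- c("Champions", "At risk", "Others", "At risk",
              "Lost", "Others", "Others", "Champions")

# Count customers per segment, largest segment first.
counts <- sort(table(segments), decreasing = TRUE)
print(counts)

# Simple bar plot of segment sizes.
barplot(counts,
        main = "Customers per segment",
        ylab = "Number of customers",
        las  = 2)   # rotate the axis labels so long names stay readable
```

The same pattern works for revenue per segment: swap the counts for a sum of revenue by segment.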
