Instant Customer Data Analysis using Excel: worked example
One of the most useful aspects of Microsoft Excel is its ability to quickly slice and dice customer data from live systems to identify important trends and behaviors which can inform strategy. In this article I will share with you a simple 7-step plan based on a fully worked example of Customer Data Analysis using Excel.
The example I will use is a retail banking customer dataset however the same principles and techniques apply equally well to any live customer dataset for any business sector.
To master this approach you simply need to understand the power of 3 excel functions - IF, COUNTIF (single criteria) and COUNTIFS (multiple criteria) along with their cousins SUMIF and AVERAGEIF.
For a copy of the worked example excel spreadsheet containing all the formula please email me.
My example spreadsheet, which would be extracted from your live customer system, has one row per customer containing just the following fields:
- Customer Reference
- Interest Rate
- Date A/C Opened
The objective of the analysis is to understand this data to see if we can spot any useful business insights which could inform our strategy on customer acquisition and retention.
Step 1 - Identify some questions you want answers to
Before you start any data analysis it's always a good idea to identify a number of questions you want answers to. At a minimum you need to answer these. However during the analysis the results will also suggest other questions you can answer which you won't have anticipated!
Step 2 - Add the required Categorisation Columns
The problem with numeric fields such as balances or interest rates is that all we can really do with them in their raw state is is to total them or average them. To go further we need to group them into bands and categories.
In the example I have created a new column (F) in the spreadsheet called "Balance Band" to group customers by balance by simply using the excel IF function as follows:
=IF(D2>1000000, "1M+", ( IF(D2>500000, "500K-1M",(IF(D2>250000, "250-500K",(IF(D2>100000, "100-250K",(IF(D2>50000, "50-100K",(IF(D2>10000, "10-50K", "1-10K")))))))))))
A similar problem arises with dates so I have created a new column (G) in the spreadsheet "Account Age" (in months) using the excel DATEDIF function.
Step 3 - Perform the Basic Analysis
Now we are ready to begin the basic analysis of the data - first by simply segmenting the data by a single field such as Region or Balance Band. I do this using the excel COUNTIF, SUMIF and AVERAGEIF functions. It's important to do this first and to check that the totals (by rows and values) are correct before we move on to the move advanced analysis. It's a simple job then to graph the data using the excellent excel chart facilities.
For example, we can count the number of customers in North Region in Cell B3 with the formula 'COUNTIF(LIVE!$B$2:$B$101, $A3)' which uses the COUNTIF function in excel.
In our example the basic analysis (FIGURE 1 below) indicates that South Region accounts for 17% of customers by number but 30% by value and has an average customer balance 2-3 times greater than the other regions. It looks like South is unique in the fact that it is dealing with a small number of very valuable customers?
FIGURE 1 - Basic Analysis
Step 4 - Perform the Cross-Reference Analysis
It's unlikely the basic analysis will tell us very much we do not already know but we might be surprised. However when we start to do the cross reference analysis new insights will emerge.
To achieve this we need to use an extended version of the COUNTIF, SUMIF and AVERAGEIF functions imaginatively named COUNTIFS, SUMIFS and AVERAGEIFS which allow us to supply multiple criteria. The simpler versions of the function are single criteria.
In our example the cross-reference analysis (FIGURE 2 below) shows that 56% of the total balances sit with just 7% the customers whose balances are greater above £250K but we are paying them less interest than the lower balance customers. Why is this? Does it make sense? Are we creating a problem? Again we repeat this kind of cross-reference analysis for all the fields in the data.
FIGURE 2 - Cross-Reference Analysis
Step 5 - Perform the Trend Analysis over Time
We now extend our analysis to include the time dimension to see how things change over time. For example, are the numbers and values of accounts opened growing over time or decreasing or static. How are the Regions doing against each other over time. Are a disproportionate amount of high-balance customers closing their accounts in the last 3 months. You get the idea.
For example we can calculate the average balance for each month for each region in Cell F473 with the formula 'SUMIFS(LIVE!$D$2:$D$101, LIVE!$G$2:$G$101,$A47, LIVE!$B$2:$B$101,F$46)/1000' which uses the SUMIFS function in excel to sum across multiple criteria, in this case using "age of account" and "customer region".
In our example the Trend Analysis (FIGURE 3 below) shows that at an overall level the business is recovering well from a major problem. However when you look at this regionally it is clear that South Region's business has in fact totally collapsed but this has been compensated for by strong growth in East Region.
FIGURE 3 - Trend Analysis
Step 6 - Check your assumptions about the data
It's important never to do data analysis in a vacuum. First you need to check what assumptions you have made about the underlying data and confirm these with a person (e.g. in IT) who understands in detail the behaviour of the data in the live system which has been provided to you. For example, do closed accounts stay on the system forever or are they archived after 12 months? The answer to this will affect your results on account closure trends.
Step 7 - Review the Data Insights with Operational Staff
All you really have so far is a set of potential insights some of which may have real value to the business and others may be trivial, wrong or can be explained in other ways. You need to share these insights and the data you based them on with staff who are intimate with the operational details of the processes and you need to be open to have your conclusions challenged, destroyed, revised, improved or confirmed.
Finally ...Some Performance Tips
In my experience you can easily handle large data sets (up to 100,000 records) using these techniques. Once you go above 10,000 rows you should set formula calculation to "manual" rather than automatic to maximise spreadsheet performance. Above 50,000 rows you can make a copy of your spreadsheet with all your categorisation calculations converted to values (COPY, PASTE SPECIAL (VALUES)).
Above 100,000 records, if you need to, you can either create multiple spreadsheets with a consolidation sheet or else you can use the excel RAND function to extract a random subset (say 25%) of the data to analyse.