This is the fifth in our case study series, showcasing some of the ways we have been using R at Barnett Waddingham.

It’s highly unlikely that anyone at a dinner party will want to be regaled with the woes of fitting a distribution to a set of data. “Why don’t you just use an off-the-shelf tool?” or “I don’t even know where to start” begins (and likely ends) the conversation for the poor soul trapped next to you.

Our review of off-the-shelf tools found that you would have to pay a licence fee for a tool that either would not do quite what you wanted or lacked some of the more nuanced distributions required for insurance work.

Fortunately, at Barnett Waddingham we have developed a tool that makes selecting the right distribution, and fitting it to the data, a straightforward task.

Developing the distribution fitting tool

Distributions are a powerful tool in data analysis and modelling, and so need to be chosen and calibrated with care. The tool was originally developed to support our model validation work, but its uses are not confined to validation: it is just as useful in the investigation and creation stages of a model as it is in validating results.

We chose to build the tool in R rather than Excel to capitalise on R’s ability to handle large data sets and complex, non-vanilla distributions. We appreciate that R may not be familiar to everyone, but the distribution tool’s user-friendly interface gives access to the heavy-lifting powers of R without the need to understand or edit the underlying code (unless, of course, you want to).

The result is a tool which helps alleviate some of the indecision and apprehension which might otherwise accompany the distribution fitting process.

Using the distribution fitting tool

The user inputs a data set into the tool and chooses the distributions they would like to investigate. The distributions available range from the vanilla ones, like the Normal and Student’s t-distribution, to the unfamiliar and exotic, like the exponential generalised beta type 2 and variance-gamma.

The tool produces diagnostic plots, density graphs, and test metrics and statistics to combine a visual representation of each distribution’s fit to the data with a quantitative measurement of goodness-of-fit. These plots and test statistics have been chosen to help create a structured approach to the selection process.
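To make the fit-and-compare workflow more concrete, here is a minimal sketch of the idea. It is written in Python rather than R, is not the tool’s own code, and rests on assumptions of our own: three candidate distributions with closed-form maximum likelihood estimates (Normal, lognormal and exponential), the Akaike information criterion (AIC) as the ranking metric, and the Kolmogorov-Smirnov (KS) statistic as a second goodness-of-fit measure. The function names `fit_normal`, `fit_lognormal`, `fit_exponential`, `ks_statistic` and `compare_fits` are hypothetical.

```python
import math
import random

# Illustrative sketch only: candidates, metrics and names are assumptions,
# not the Barnett Waddingham tool's implementation.


def fit_normal(xs):
    """Closed-form MLE for a Normal distribution (standard deviation divides by n)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)

    def logpdf(x):
        return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

    def cdf(x):
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

    return {"mu": mu, "sigma": sigma}, logpdf, cdf


def fit_lognormal(xs):
    """Closed-form MLE for a lognormal: a Normal fit to log(x). Requires x > 0."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)

    def logpdf(x):
        return -math.log(x * sigma * math.sqrt(2 * math.pi)) - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)

    def cdf(x):
        return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

    return {"mu": mu, "sigma": sigma}, logpdf, cdf


def fit_exponential(xs):
    """Closed-form MLE for an exponential: rate = 1 / sample mean."""
    lam = len(xs) / sum(xs)
    return {"rate": lam}, (lambda x: math.log(lam) - lam * x), (lambda x: 1 - math.exp(-lam * x))


CANDIDATES = {"normal": fit_normal, "lognormal": fit_lognormal, "exponential": fit_exponential}


def ks_statistic(xs, cdf):
    """Largest gap between the empirical CDF and the fitted CDF."""
    xs = sorted(xs)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def compare_fits(xs, candidates):
    """Fit every candidate and rank by AIC (lower is better), reporting KS alongside."""
    results = []
    for name, fit in candidates.items():
        params, logpdf, cdf = fit(xs)
        loglik = sum(logpdf(x) for x in xs)
        aic = 2 * len(params) - 2 * loglik  # penalises each extra parameter
        results.append({"name": name, "params": params, "aic": aic, "ks": ks_statistic(xs, cdf)})
    return sorted(results, key=lambda r: r["aic"])


if __name__ == "__main__":
    random.seed(1)
    data = [random.lognormvariate(0.0, 0.5) for _ in range(2000)]  # synthetic, skewed data
    for r in compare_fits(data, CANDIDATES):
        print(f"{r['name']:<12} AIC={r['aic']:.1f} KS={r['ks']:.4f}")
```

Because AIC penalises each extra parameter, it stays meaningful when richer distributions such as the exponential generalised beta type 2 join the comparison. Those generally need numerical optimisation rather than closed-form estimates.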

We recently used the tool to validate a set of distributions a client was using within their internal model. Starting from the same raw data, the tool was able to match distribution parameterisations and test metrics, and provide an extra layer of assurance that the distribution selected for the model was both optimal and correctly calibrated.
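A reconciliation of this kind can be sketched as follows: refit the parameters independently from the raw data, then compare them with the values the model reports, within a tolerance. This is an illustrative Python sketch built on our own assumptions (a Normal distribution, a 0.1% relative tolerance, and the hypothetical function names `fit_normal` and `reconcile`), not the tool’s implementation. The usage example shows one common source of mismatch: a standard deviation computed with the n - 1 divisor rather than the maximum likelihood divisor n.

```python
import math
import random
import statistics

# Illustrative sketch only: distribution, tolerance and names are assumptions.


def fit_normal(xs):
    """Closed-form maximum likelihood estimates for a Normal distribution.

    The MLE standard deviation divides by n, not n - 1.
    """
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return {"mu": mu, "sigma": sigma}


def reconcile(independent, reported, rel_tol=1e-3):
    """Compare independently refitted parameters with the model's reported ones.

    Returns {parameter: (independent value, reported value, within tolerance?)}.
    """
    report = {}
    for name, value in reported.items():
        ours = independent[name]
        report[name] = (ours, value, math.isclose(ours, value, rel_tol=rel_tol))
    return report


if __name__ == "__main__":
    random.seed(7)
    data = [random.gauss(100, 15) for _ in range(50)]  # stand-in for the raw data
    # Hypothetical "model" parameters: same mean, but the n - 1 standard deviation
    reported = {"mu": statistics.mean(data), "sigma": statistics.stdev(data)}
    for name, (ours, theirs, ok) in reconcile(fit_normal(data), reported).items():
        print(f"{name:<6} ours={ours:.4f} model={theirs:.4f} {'OK' if ok else 'CHECK'}")
```

With 50 observations the two divisor conventions differ by about 1%, which is enough to be flagged at a 0.1% tolerance. Agreeing on conventions like this is part of matching a model’s parameterisation.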

Benefits of the distribution fitting tool

Fitting a distribution is both an art and a science. The distribution fitting tool helps to codify and standardise some of the science so that the user can focus on the art.

Our distribution tool helps narrow down the wide universe of distributions available and takes some of the guesswork out of selecting the optimal distribution for the data. Time once spent researching distributions, parameterising them and choosing which test statistics and metrics should be used in the selection process can now be saved. Instead, time can be spent on making certain the distribution selected is the optimal fit for the data.

And, who knows, it may even help turn the tide on your next mealtime conversation. After all, everybody can appreciate a good bit of slick new tech.

How can we help you?

We want to help you become skilled fitters of distributions. If you or your team would like to know more about our automated fitting tool, please contact Amit Lad and Ian Turner. We will be happy to provide additional information and arrange a demonstration.

Applications of R in insurance: clustering

In our fourth case study of the series we show how we have used R to improve the implementation of clustering algorithms.

Applications of R in insurance: Tripartite Template asset data

See how we have used R to improve how we process Tripartite Template (TPT) asset data.