# 1 Model

Loss curves are a standard actuarial technique for helping insurance companies assess the amount of reserve capital they need to keep on hand to cover claims from a line of business. Claims made and reported for a given accounting period are tracked separately over time. This enables the use of historical patterns of claim development to predict expected total claims for newer policies.

In insurance, depending on the types of risks, it can take many years for an insurer to learn the amount of liability incurred on policies written during any particular year. At a given point in time after a policy is written, some claims may not yet have been reported, and others may still be working through the legal system, so the final amount due is not yet determined.

The total claim amounts for a single accounting period are laid out in one row of a table, with each column showing the cumulative claim amount after that much development time. More recent accounting periods have had less time to develop, so the data takes a triangular shape, hence the term ‘loss triangles’. Using previous development patterns, the known data in the upper part of the triangle is used to predict the unknown values in the lower part. This gives the insurer a probabilistic forecast of the ultimate claim amounts to be paid on all business written.

The ChainLadder package provides functionality to generate and use these loss triangles.
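For comparison, the basic chain-ladder projection is short enough to sketch by hand. The sketch estimates volume-weighted age-to-age factors from the upper triangle and then rolls each cohort forward to fill the lower triangle. This is a hand-written base R illustration of the textbook algorithm, not the ChainLadder package's API. The triangle is the GRCODE 43 cumulative paid-loss triangle printed in Section 3.

```r
# Cumulative paid losses for GRCODE 43: rows are accident years 1988-1997,
# columns are development lags 1-10 (NA = not yet observed)
tri <- matrix(c(
      133,   333,   431,   570,   615,   615,   615,   614,   614,   614,
      934,  1746,  2365,  2579,  2763,  2966,  2940,  2978,  2978,    NA,
     2030,  4864,  6880,  8087,  8595,  8743,  8763,  8762,    NA,    NA,
     4537, 11527, 15123, 16656, 17321, 18076, 18308,    NA,    NA,    NA,
     7564, 16061, 22465, 25204, 26517, 27124,    NA,    NA,    NA,    NA,
     8343, 19900, 26732, 30079, 31249,    NA,    NA,    NA,    NA,    NA,
    12565, 26922, 33867, 38338,    NA,    NA,    NA,    NA,    NA,    NA,
    13437, 26012, 31677,    NA,    NA,    NA,    NA,    NA,    NA,    NA,
    12604, 23446,    NA,    NA,    NA,    NA,    NA,    NA,    NA,    NA,
    12292,    NA,    NA,    NA,    NA,    NA,    NA,    NA,    NA,    NA),
  nrow = 10, byrow = TRUE,
  dimnames = list(as.character(1988:1997), as.character(1:10)))

# Volume-weighted factor for lag k -> k+1, using only the cohorts
# observed at both lags
dev_factors <- sapply(1:9, function(k) {
    both <- !is.na(tri[, k + 1])
    sum(tri[both, k + 1]) / sum(tri[both, k])
})

# Fill the lower triangle by applying the factors lag by lag
projected <- tri
for (k in 1:9) {
    missing <- is.na(projected[, k + 1])
    projected[missing, k + 1] <- projected[missing, k] * dev_factors[k]
}

round(dev_factors, 3)    # the last two factors are exactly 1
round(projected[, 10])   # projected ultimate losses by cohort
```

The chain ladder gives a single point estimate per cohort. The growth-curve approach below instead describes the whole development path with a few parameters.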

In this case study, we take a related but different approach. We model the growth of the losses in each accounting period as an increasing function of time, and use the model to estimate the parameters that determine the shape of this growth. We also use the sampler to estimate the “ultimate loss ratio” (ULR) of each accounting period, i.e. the ratio of the total claims eventually paid on that period's policies to the total premium received for writing them. We treat each accounting period as a cohort.

## 1.1 Overview

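We will work with two different functional forms for the growth behaviour of the loss curves: a ‘Weibull’ model and a ‘log-logistic’ model. Both rise from 0 towards 1 as development time $t$ increases, and both have a scale parameter $\theta$ and a shape parameter $\omega$: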

\begin{align*} g(t \, ; \, \theta, \omega) &= 1 - \exp\left(-\left(\frac{t}{\theta}\right)^\omega\right) & (\text{Weibull}) \\ g(t \, ; \, \theta, \omega) &= \frac{t^\omega}{t^\omega + \theta^\omega} & (\text{Log-logistic}) \end{align*}
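A minimal base R sketch makes the difference between the two curves concrete. The parameter values and the ULR below are hypothetical. The final line assumes that the expected loss ratio at lag $t$ is the ultimate loss ratio scaled by $g(t)$. That is one natural reading of the approach described above, not a formula stated in this section.

```r
# Log-logistic growth: g(t) = t^omega / (t^omega + theta^omega)
growth_loglogistic <- function(t, theta, omega) {
    t^omega / (t^omega + theta^omega)
}

# Weibull growth: g(t) = 1 - exp(-(t / theta)^omega)
growth_weibull <- function(t, theta, omega) {
    1 - exp(-(t / theta)^omega)
}

# Hypothetical parameter values, for illustration only
theta <- 2
omega <- 1.5
ulr   <- 0.65     # hypothetical ultimate loss ratio
t     <- 1:10     # development lags

round(cbind(t,
            loglogistic = growth_loglogistic(t, theta, omega),
            weibull     = growth_weibull(t, theta, omega)), 3)

# At t = theta the log-logistic curve is exactly halfway to 1,
# while the Weibull curve has reached 1 - exp(-1), roughly 0.632
growth_loglogistic(theta, theta, omega)
growth_weibull(theta, theta, omega)

# Assumed link to loss ratios: expected LR at lag t = ULR * g(t)
expected_lr <- ulr * growth_weibull(t, theta, omega)
```

The two curves also differ in their tails. The Weibull curve approaches 1 exponentially fast, while the log-logistic gap $1 - g(t)$ shrinks only like $(\theta / t)^\omega$. As a result, the log-logistic form allows more late development.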

# 2 Load Data

We load the Schedule P loss data from casact.org.

```r
library(tidyverse)   # readr, purrr, dplyr, tidyr, tibble, ggplot2

### File was downloaded from http://www.casact.org/research/reserve_data/ppauto_pos.csv
data_files <- dir("data/", pattern = "\\.csv", full.names = TRUE)

data_cols <- cols(GRCODE = col_character())

# read_claim_datafile() is a helper defined outside this section
rawdata_tbl <- data_files %>%
    map(read_claim_datafile, col_type = data_cols) %>%
    bind_rows

glimpse(rawdata_tbl)
## Observations: 77,900
## Variables: 14
## $ grcode          <chr> "266", "266", "266", "266", "266", "266", "266", "2...
## $ grname          <chr> "Public Underwriters Grp", "Public Underwriters Grp...
## $ accidentyear    <int> 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 198...
## $ developmentyear <int> 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 199...
## $ developmentlag  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7,...
## $ incurloss       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 24, 21, 24, 25, 2...
## $ cumpaidloss     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 20, 21, 23, 24, 24...
## $ bulkloss        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 1, 0, 0, 0, 0, 0, ...
## $ earnedpremdir   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 2...
## $ earnedpremceded <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ earnedpremnet   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 2...
## $ single          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ postedreserve97 <int> 932, 932, 932, 932, 932, 932, 932, 932, 932, 932, 9...
## $ lob             <chr> "comauto", "comauto", "comauto", "comauto", "comaut...
```
```r
claimdata_tbl <- rawdata_tbl %>%
    mutate(acc_year   = as.character(accidentyear)
          ,dev_year   = developmentyear
          ,dev_lag    = developmentlag
          ,premium    = earnedpremdir      # direct earned premium
          ,cum_loss   = cumpaidloss
          ,loss_ratio = cum_loss / premium) %>%
    select(grcode, grname, lob, acc_year, dev_year, dev_lag, premium, cum_loss, loss_ratio)
```
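One thing to watch in the `loss_ratio` column: R does not raise an error when dividing by a zero premium. The first rows in the `glimpse()` output (GRCODE 266, accident year 1988) have zero premium and zero losses, so their loss ratio is `NaN`. A row with losses but no premium would give `Inf`. The helper below is hypothetical, a sketch of one way to turn such cases into explicit `NA` values. In the dplyr pipeline, an alternative is to drop those rows with `filter(premium > 0)`.

```r
# Division by zero in R yields special values rather than an error
0 / 0     # NaN (no premium, no losses)
25 / 0    # Inf (losses recorded against zero premium)

# Hypothetical helper: NA instead of NaN/Inf when premium is not positive
safe_loss_ratio <- function(cum_loss, premium) {
    ifelse(premium > 0, cum_loss / premium, NA_real_)
}

safe_loss_ratio(c(0, 133, 25), c(0, 957, 0))   # NA, ~0.139, NA
```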

With the data in the format we will use in this analysis, we take a look at it in tabular form:

```r
print(claimdata_tbl)
## # A tibble: 77,900 x 9
##    grcode                  grname     lob acc_year dev_year dev_lag premium
##     <chr>                   <chr>   <chr>    <chr>    <int>   <int>   <int>
##  1    266 Public Underwriters Grp comauto     1988     1988       1       0
##  2    266 Public Underwriters Grp comauto     1988     1989       2       0
##  3    266 Public Underwriters Grp comauto     1988     1990       3       0
##  4    266 Public Underwriters Grp comauto     1988     1991       4       0
##  5    266 Public Underwriters Grp comauto     1988     1992       5       0
##  6    266 Public Underwriters Grp comauto     1988     1993       6       0
##  7    266 Public Underwriters Grp comauto     1988     1994       7       0
##  8    266 Public Underwriters Grp comauto     1988     1995       8       0
##  9    266 Public Underwriters Grp comauto     1988     1996       9       0
## 10    266 Public Underwriters Grp comauto     1988     1997      10       0
## # ... with 77,890 more rows, and 2 more variables: cum_loss <int>,
## #   loss_ratio <dbl>
```

# 3 Data Exploration

For modelling, we first confine ourselves to a single line of business, private passenger auto (‘ppauto’), and make sure the data we work with is a snapshot in time. To do this, we remove all data timestamped after 1997 and use the remaining data as our modelling dataset.

Once we have fits and predictions, we use the later timestamped data to validate the model.
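The snapshot cut is easiest to see on the grid of accident years and lags. The sketch below assumes each company contributes a full grid of accident years 1988 to 1997 and lags 1 to 10. It also uses `dev_year = acc_year + dev_lag - 1`, which matches the rows printed above. Under those assumptions, `dev_year < 1998` keeps exactly the upper triangle, and the held-out cells form the lower triangle used for validation. The `grid`, `train` and `valid` objects are hypothetical illustrations, not objects from the analysis.

```r
# Hypothetical full grid for one carrier: accident years 1988-1997 x lags 1-10
grid <- expand.grid(acc_year = 1988:1997, dev_lag = 1:10)

# Calendar year in which each cell is observed (lag 1 = the accident year itself)
grid$dev_year <- grid$acc_year + grid$dev_lag - 1

# Modelling data: cells known by the end of 1997 (the upper triangle)
train <- grid[grid$dev_year < 1998, ]

# Validation data: later development of the same cohorts (the lower triangle)
valid <- grid[grid$dev_year >= 1998, ]

nrow(train)              # 55
nrow(valid)              # 45
table(train$acc_year)    # 10 lags for 1988, down to 1 lag for 1997
```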

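```r
# GRCODEs of the four carriers we study
use_grcode <- c(43,353,388,620)

carrier_full_tbl <- claimdata_tbl %>%
    filter(lob == 'ppauto')

# Keep only the development known by the end of 1997
carrier_snapshot_tbl <- carrier_full_tbl %>%
    filter(grcode %in% use_grcode
          ,dev_year < 1998)
```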

We are looking at four insurers with the GRCODEs above. Before we proceed with any analysis, we first plot the data, grouping the loss curves by accounting year and faceting by carrier.

```r
ggplot(carrier_snapshot_tbl) +
    geom_line(aes(x = dev_lag, y = loss_ratio, colour = as.character(acc_year))
             ,size = 0.3) +
    expand_limits(y = c(0,1)) +
    facet_wrap(~grcode) +
    xlab('Development Time') +
    ylab('Loss Ratio') +
    ggtitle('Snapshot of Loss Curves for 10 Years of Loss Development'
           ,subtitle = 'Private Passenger Auto Insurance for Four Organisations') +
    guides(colour = guide_legend(title = 'Cohort Year'))
```

Next, we lay out the data for the first of these carriers (GRCODE 43) in chain-ladder form. Rather than loss ratios, we first look at the dollar amounts of the losses.

```r
# First carrier only (GRCODE 43)
snapshot_tbl <- carrier_snapshot_tbl %>%
    filter(grcode %in% use_grcode[1])

# One row per cohort, one column per development lag
snapshot_tbl %>%
    select(acc_year, dev_lag, premium, cum_loss) %>%
    spread(dev_lag, cum_loss) %>%
    print
## # A tibble: 10 x 12
##    acc_year premium   1   2   3   4   5   6   7   8   9  10
##  *    <chr>   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1988     957   133   333   431   570   615   615   615   614   614   614
##  2     1989    3695   934  1746  2365  2579  2763  2966  2940  2978  2978    NA
##  3     1990    6138  2030  4864  6880  8087  8595  8743  8763  8762    NA    NA
##  4     1991   17533  4537 11527 15123 16656 17321 18076 18308    NA    NA    NA
##  5     1992   29341  7564 16061 22465 25204 26517 27124    NA    NA    NA    NA
##  6     1993   37194  8343 19900 26732 30079 31249    NA    NA    NA    NA    NA
##  7     1994   46095 12565 26922 33867 38338    NA    NA    NA    NA    NA    NA
##  8     1995   51512 13437 26012 31677    NA    NA    NA    NA    NA    NA    NA
##  9     1996   52481 12604 23446    NA    NA    NA    NA    NA    NA    NA    NA
## 10     1997   56978 12292    NA    NA    NA    NA    NA    NA    NA    NA    NA
```

In the above ‘triangle’, we see the cumulative paid losses for each accounting year. 1988 is the earliest year and so has ten years of claims development by the end of 1997. Similarly, 1989 has nine years of development, and so on. Note that these are paid amounts. Closed claims have been paid out and will see no further changes. The insurer's incurred losses also include reserves for open claims, which are known to the insurer but not yet fully settled and paid out.

As claims develop, the total claim amount is an approximately monotonically increasing function of time. The exceptions are a few small dips, such as the 1988 cohort falling from 615 to 614. This pattern motivates modelling development as a growth curve.
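To check how close each cohort is to monotone, we can look for negative increments along each row of the triangle. In the GRCODE 43 triangle only the three oldest cohorts show a dip, one each, and every dip is under 1% of the previous cumulative amount. The sketch below copies those three rows from the triangle and uses only base R.

```r
# Cumulative paid losses for the three oldest GRCODE 43 cohorts
cohorts <- list(
    `1988` = c(133, 333, 431, 570, 615, 615, 615, 614, 614, 614),
    `1989` = c(934, 1746, 2365, 2579, 2763, 2966, 2940, 2978, 2978),
    `1990` = c(2030, 4864, 6880, 8087, 8595, 8743, 8763, 8762)
)

# Lags at which the cumulative loss fell relative to the previous lag
dips <- lapply(cohorts, function(x) which(diff(x) < 0) + 1)
dips    # 1988: lag 8, 1989: lag 7, 1990: lag 8

# Largest relative drop per cohort (negative = decrease)
sapply(cohorts, function(x) min(diff(x) / head(x, -1)))
```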

The premium column details the total premium received by the insurer for the policies written in that accounting year. Recall that the ratio of total claims paid to total premium received is the ‘Loss Ratio’ (LR).

For this insurer, the premium collected in each accounting year grows substantially over time, from 957 in 1988 to 56,978 in 1997. This suggests that the line of business expanded considerably over the period.

Next, we look at loss ratios in a similar fashion:

```r
snapshot_tbl %>%
    select(acc_year, dev_lag, premium, loss_ratio) %>%
    spread(dev_lag, loss_ratio) %>%
    print.data.frame(digits = 2)
##    acc_year premium    1    2    3    4    5    6    7    8    9   10
## 1      1988     957 0.14 0.35 0.45 0.60 0.64 0.64 0.64 0.64 0.64 0.64
## 2      1989    3695 0.25 0.47 0.64 0.70 0.75 0.80 0.80 0.81 0.81   NA
## 3      1990    6138 0.33 0.79 1.12 1.32 1.40 1.42 1.43 1.43   NA   NA
## 4      1991   17533 0.26 0.66 0.86 0.95 0.99 1.03 1.04   NA   NA   NA
## 5      1992   29341 0.26 0.55 0.77 0.86 0.90 0.92   NA   NA   NA   NA
## 6      1993   37194 0.22 0.54 0.72 0.81 0.84   NA   NA   NA   NA   NA
## 7      1994   46095 0.27 0.58 0.73 0.83   NA   NA   NA   NA   NA   NA
## 8      1995   51512 0.26 0.50 0.61   NA   NA   NA   NA   NA   NA   NA
## 9      1996   52481 0.24 0.45   NA   NA   NA   NA   NA   NA   NA   NA
## 10     1997   56978 0.22   NA   NA   NA   NA   NA   NA   NA   NA   NA
```
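The loss-ratio table follows directly from the dollar triangle by dividing each cohort's row by that cohort's premium. The base R sketch below does this with `sweep()` for the first three GRCODE 43 cohorts and their first eight lags, copied from the triangles above. Rounded to two decimals, it reproduces the corresponding entries in the table.

```r
# First three GRCODE 43 cohorts (lags 1-8) and their premiums
cum_loss <- rbind(
    `1988` = c( 133,  333,  431,  570,  615,  615,  615,  614),
    `1989` = c( 934, 1746, 2365, 2579, 2763, 2966, 2940, 2978),
    `1990` = c(2030, 4864, 6880, 8087, 8595, 8743, 8763, 8762)
)
premium <- c(957, 3695, 6138)

# Divide row i by premium[i]: MARGIN = 1 sweeps across rows
loss_ratio <- sweep(cum_loss, 1, premium, "/")
round(loss_ratio, 2)
```

The 1990 cohort's loss ratio passes 1 by lag 3. For this cohort the insurer had already paid out more in claims than it collected in premium.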

## 3.1 Loss Ratio Ladders

Since we model loss ratios rather than dollar losses, we also plot the loss-ratio development curve of each cohort year for this carrier.

```r
ggplot(snapshot_tbl) +
    geom_line(aes(x = dev_lag, y = loss_ratio, colour = acc_year)
             ,size = 0.3) +
    expand_limits(y = 0) +
    xlab('Development Time') +
    ylab('Loss Ratio') +
    ggtitle("Loss Ratio Curves by Development Time") +
    guides(colour = guide_legend(title = 'Cohort Year'))
```
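As a quick sanity check that a growth curve describes these shapes, we can fit one curve to a single fully developed cohort. This sketch swaps in a different technique from the case study's sampler-based approach: plain non-linear least squares with base R's `nls()`. It fits only the 1988 GRCODE 43 cohort, assumes the form loss ratio = ULR × Weibull $g(t)$, and uses hand-picked starting values.

```r
# 1988 cohort of GRCODE 43 (fully developed), from the triangle above
lr_1988 <- data.frame(
    dev_lag  = 1:10,
    cum_loss = c(133, 333, 431, 570, 615, 615, 615, 614, 614, 614)
)
lr_1988$loss_ratio <- lr_1988$cum_loss / 957   # 1988 premium

# Non-linear least squares: loss_ratio = ulr * (1 - exp(-(t / theta)^omega))
fit <- nls(loss_ratio ~ ulr * (1 - exp(-(dev_lag / theta)^omega)),
           data  = lr_1988,
           start = list(ulr = 0.65, theta = 2.5, omega = 1.5))

round(coef(fit), 3)
```

Because the 1988 curve has flattened out by lag 5, the fitted `ulr` lands close to the observed plateau of about 0.64. For younger cohorts with only a few lags, a single-cohort least-squares fit is far less stable. Pooling information across cohorts, as the case study does, helps in that situation.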