A closed-form approximation for pricing temperature-based weather derivatives

This paper develops analytical distributions of temperature indices on which temperature derivatives are written. If the deviations of daily temperatures from their expected values are modelled as an Ornstein-Uhlenbeck process with time-varying variance, then the distributions of the temperature index on which the derivative is written is the sum of truncated, correlated Gaussian deviates. The key result of this paper is to provide an analytical approximation to the distribution of this sum, thus allowing the accurate computation of payoffs without the need for any simulation. A data set comprising average daily temperature spanning over a hundred years for four Australian cities is used to demonstrate the efﬁcacy of this approach for estimating the payoffs to temperature derivatives. It is demonstrated that expected payoffs computed directly from historical records is a particularly poor approach to the problem when there are trends in underlying average daily temperature. It is shown that the proposed analytical approach is superior to historical pricing.


Introduction
A weather derivative takes its value from an underlying measure of weather, such as temperature, rainfall or snowfall over a particular period of time, and permits the financial risk associated with climatic conditions to be managed. Major participants in this market include utilities and insurance companies along with other firms with costs or revenues that are dependent upon the weather. For example, an electricity supplier normally provides its customers with electricity at a fixed price irrespective of the wholesale price. On the other hand the wholesale price of electricity can fluctuate wildly with extreme temperatures, and so temperature-based derivatives can provide a hedging tool for fluctuations in wholesale electricity prices. The first weather derivative was transacted in the US in 1996 and the size of the market now exceeds US$ 8 billion. Almost all weather derivatives are based on temperature indices such as heating degree days and cooling degree days and consequently the focus of this paper will be exclusively on developing closed-form approximations to the distribution of the temperature indices on which temperature-based derivatives are written which in turn affects their valuation. * Traditionally, the valuation of options discounts the expected payoff at the risk-free force of interest based on a zero-arbitrage argument involving the formation of a portfolio consisting of a risk-free combination of an option and the underlying asset [3]. Because temperature cannot be traded, there is no arbitrage-free pricing framework available to price this kind of option. The generally accepted way to value temperature derivatives is the actuarial method in which the fair price is taken to be the expected value of the payoff ignoring discounting and any volatility premium. The crucial element of this valuation strategy is the accurate calculation of the distribution of the relevant temperature index on which the weather derivative is written.
The most direct way to compute the distribution of temperature indices is from historical records [4,5]. A more elaborate method is to fit a model to the time-series of average daily temperature so as to capture seasonal variations in both temperature and its volatility [5,6]. The model is then used to simulate temperature outcomes over the period of the contract in order to construct the distribution of the temperature-based index on which the derivative is written. Note that widely-available meteorological forecasts are not suitable for this purpose because these forecasts are made over relatively short horizons, such as 7 days, whereas temperature derivatives are often traded well before the contracts generate any payoffs [7,8,6].
This paper makes two contributions to the existing literature on pricing temperature derivatives. First, it builds on the early work of Benth andŠaltynė-Benth [9] by developing closed-form approximations to the distribution of the indices on which temperature-based derivatives are writ-ten with particular emphasis on obtaining good estimates of the variance of relevant index. Second, two methods are provided for estimating the parameters of the model underpinning the behaviour of temperature that are required to implement the pricing strategy. There are respectively a two-step least-squares based approach and a more comprehensive maximum-likelihood procedure.
The ideas developed in this paper are applied to data comprising average daily temperatures for over a century in four Australian cities, namely, Brisbane (BNE), Melbourne (MEL), Perth (PER) and Sydney (SYD), where accurate temperature records of long-duration are available at single weather stations. This is a quality data set which represents a substantial improvement on what appears to be the current standard used in the literature. The empirical results based on this data set, demonstrate that the closed-form pricing strategy performs substantially better that using historical pricing.

A Model of Daily Temperature
The first step in pricing any temperature-based option must be a model of the underlying index from which the option derives its value, which in the case of temperature derivatives is average daily temperature. Let average daily temperature be expressed as the sum of the seasonal mean temperature T ptq at time t and the deviation θptq of the average daily temperature from its seasonal mean. Suppose that θptq is modelled by the Ornstein-Uhlenbeck process * dθ "´αθ dt`σptq dW , α ą 0 , where dW is the increment in the Wiener process. The parameter α and the volatility σptq are to be determined from observations of average daily temperature. Equation with autocorrelation function at lag u given by where Sptq is the variance of daily average temperature. It is straightforward to show that σ 2 ptq and Sptq satisfy The joint distribution of the average daily temperatures T t and T t`s at the respective calender times t and pt`sq (s ą 0) is given by the product where f pT t , tq is the marginal distribution of T t , namely is the transitional probability density function from T t to T t`s . Consequently the joint probability density function, and β " e´α s . Thus the joint probability density function of pT t , T t`s q is multivariate Gaussian with mean value µ " pT t , T t`s q and covariance matrix This model of average daily temperature is now used to develop a closed form approximation to the distributions of the underlying temperature indices on which vanilla European options † are written, namely cumulative heating degree days (HDDs) and cumulative cooling degree days (CDDs).

Distribution of Temperature Indices
Let T ave denote the average temperatures in degrees Celsius measured on a particular day at a specific weather station. The HDD and CDD indices at that station on that day are defined respectively by HDD " max`T´T ave , 0˘, where T˝C is a threshold temperature. The choice of threshold, in this instance 18˝C, is set by market convention and is the standard used in the US. In the southern (northern) hemisphere the HDD (CDD) season would be from May to September, while the CDD (HDD) season would be from November to March. Without loss of generality, the analysis of this paper will be limited to considering European call options written on cumulative CDDs.

2021
The CDD index over a period of N consecutive days is defined by where T k is the average daily temperature on the k th day of the derivative.
Let D be the strike of a call option defined as a particular value of the CDD index. The buyer of this option pays an up-front premium and receives a payout if the value of the CDD index exceeds D at the maturity of the option. The tick value of a cumulative CDD call option with strike D and duration N days is therefore (1.7) The per-unit monetary payoff from the contract is its expected tick value 8) where f N pxq is the probability density function of C N and therefore the efficacy of this pricing strategy relies upon the accurate estimation of f N pxq. The idea pursued here is that although the daily contributions to C N are truncated correlated random variables in which the degree of truncation is nontrivial, nevertheless C N will behave as a Gaussian random variable provided N is suitably large. The central theoretical result of the paper is summarized in Proposition 1.

Proposition 1
The tick value C N of a European option defined on cumulative cooling degree days is approximately Gaussian distributed with mean value and variance Var rC N s with expression where z k " pT k´T q{ ? S k , β j,k " e´α pj´kq and the constants η j,k , χ j,k , p and q are defined respectively by q " p 2´1´η j,k Φp´η j,k q φpη j,k q¯.
(1.9) Proposition 1 establishes that accurate closed form expressions for the mean and the variance of C N are available in terms of the density function and distribution function of the standard normal distribution alone. Given these results, the per-unit monetary payoff of a CDD call option is stated in Proposition 2.

Proposition 2
The per-unit monetary payoff of a European call option with strike D written on C N , where the distribution of C N is Gaussian with mean and variance established in Proposition 1, is given by The focus of subsequent subsections is to develop and prove the results stated in Proposition 1.

Mean of C N
It follows directly from equation (1.6) that where φpzq and Φpzq are respectively the probability density function and cumulative distribution function of the standard normal. The quoted expression for E rC N s follows immediately from result (1.11). Moreover, it should be noted in passing that the proof of Proposition 2 is analogous to the derivation of equation (1.11).

Variance of C N
The computation of the variance of C N is less straightforward. The key steps in this calculation are outlined here with the detail being relegated to Appendices 1 and 2. The analysis begins by noting that Var rC N s can be expressed Cov rT k , T j s .

(1.12) Straightforward calculation indicates that
Var rT k s " (1.14) It is demonstrated in Appendix 1 that thereby completing the computation of the first item on the right hand side of equation (1.12).
The second item on the right hand side of equation (1.12) is a sum of covariances of generic form in which t and s (ą 0) are to be given appropriate values. First, the integral on the right hand side of equation (1.16) is simplified using the change of variables T t " T t´? S t z and T t`s " T t`s´a S t`s w to get S t and z t`s " pT t`s´T q{ a S t`s and r f pz t`s , z t q is the joint probability density of z and w, namely 1 2π where β " e´α s and The integral in equation (1.17) is expressed as a repeated integral in which integration is first performed with respect to w and then again with respect to z. The detailed calculations can be found in Appendix 2, but the outcome of these operations is that (1.20) In particular each component of Cov rT t , T t`s s, with the exception of the integral, may be evaluated from the probability density function φpzq and cumulative distribution function Φpzq of the standard normal with appropriately chosen arguments. The usefulness of expression (1.19) for Cov rT t , T t`s s can be improved if the value of the integral appearing in this formula can be expressed, albeit approximately, in terms of φp q and Φp q with appropriately chosen arguments.
For positive values of the parameter q, this objective can be achieved by making the approximation noting, in particular, that the approximation agrees with the interpolated function at z " z t and as z Ñ´8 independently of the values of the parameters p and q. The quality of the approximation is improved by choosing the values of p and q to ensure that the first and second derivatives of the interpolating function match those of the interpolated function when z " z t . The outcome of this matching procedure is that (1.22) In particular, it is easy to show that q ą 0, as required. The use of the interpolating formula (1.21) to evaluate the integral in expression (1.19) leads to the conclusion that ż zt (1.24) Expressions (1.15) and (1.24) (with t replaced by k and t`s replaced by j) when substituted into expression (1.12) provide a closed-form approximation for the variance of the cumulative temperature index which is then treated as a Gaussian random variable with the computed variance and mean value given by expression (1.11).

Approximating the Variance
A closed form expression for the variance of the cumulative temperature index was derived in the previous subsection. Curiously a heuristic argument based on interpolation can be used to generate a simpler expression for this variance, one that exhibits good accuracy despite the empirical nature of the derivation. The argument begins by noting that the k-th day in the lifetime of a CDD option will contribute to the cumulative temperature index driving the value of the option with probability where Φpzq is the cumulative distribution function of the standard normal and T is the temperature above which CDDs are accumulated. If the k-th day always contributes to the cumulative temperature index then the variance of that contribution would be S k . On the other hand if the k-th day never contributes to the cumulative temperature index then the variance of that contribution would be zero. Since in reality the k-th day contributes fraction p k of the time then linear interpolation suggests that the variance of this contribution may be reasonably approximated by S k p k . Based on this idea, the first summation on the right hand side of equation ( The second summation on the right hand side of equation (1.12) is a correction to expression (1.26) to take account of the fact that contributions to the value of the temperature index from different days are not independent. The contribution made by the quantity Cov rT k , T j s to the variance of the temperature index is argued in a similar way. In the absence of clipping, the variance of this product is equal to Cov rθ k , θ j s with value S k e´α pj´kq assuming that j ą k.
However, the product T k T j is nonzero with probability p k p j and therefore the same linear interpolation argument suggests that Cov rT k , T j s is reasonably approximated by p k p j S k e´α pj´kq . Based on this idea, the second summation on the right hand side of equation (1.12) has approximate value (1.27) In conclusion, linear interpolation suggests that the variance of T is well approximated by the formula (1.28) In fact equation (1.28) is the first-order approximation to the closed-from expression of the variance in Proposition 1. Consequently, it is expected that this approximation will perform particularly well when the level of truncation is low and also when the persistence in temperature is low which means that deviations in temperature, θptq, are restored to their mean value relatively quickly.
To test the accuracy of the approximate closed-form expression for Var rC N s stated in Proposition 1, tranches of one million realizations of equation (1.1), each of duration 90 days, were constructed for fixed values of α and σ. Specifically, each realization pθ 0 ,¨¨¨, θ 90 q was obtained by drawing θ 0 from the marginal density of θ expressed in the form N p0, S 2 q, and subsequent values of θ were determined exactly using the iteration where ξ k " N p0, 1q. Realizations of θptq generated in this way had mean value zero and stationary standard deviation S which was set at 4C˝for all simulation experiments. A threshold value of θ was chosen, say Θ, and a cumulative CDD for the 90 day period was constructed from a realization pθ 0 ,¨¨¨, θ 90 q using the formula C "  For α " 0.5 the column headed "Θ" gives threshold temperature relative to zero for contributions to cumulative CDD. Columns headed "Mean" and "Std Dev" give the mean cumulative CDD and its standard deviation based on one million simulations. Estimates of this standard deviation based on Proposition 1 (Exact) and the heuristic argument of Section 4. (Approx) are shown.
It is clear from these results that the variance of cumulative CDDs predicted by the closed-form approximation of Proposition 1 is achieved in practice. Minor differences between the approximate variance in Proposition 1 and that achieved by simulation become evident only when the threshold temperature lies two standard deviations or more above the mean temperature largely due to the fact that under these circumstances realizations of CDDs will be dominated by zero values. However this is not a scenario that will be occur in practice.
The most interesting observation in Tables 1.1 and 1.2 lies in the unexpected accuracy of the heuristic estimate of variance. In the region of most interest, that is when the threshold temperature lies on or below the average daily temperature taken to be zero in this analysis, the heuristic approach delivers parsimonious estimates of variance that, although marginally inferior to the estimates of true variance provided by Proposition 1, are negligibly different from it for all practical purposes.

Parameter estimation
To use this model for predicting the payoffs from temperature-based derivatives an estimate of the parameter α in equation (1.1) is required. This parameter measures the rate at which deviations of temperature from the seasonal are restored to this mean. In order to do so, it is first necessary to obtain estimates of T ptq and σptq. Following Campbell and Diebold [6], T ptq and σptq are approximated by the Fourier series T psq " a 0`b0 s`n ÿ k"1 a k cospω k sq`b k sinpω k sq , where ω k " 2kπ{365 and s " 0 is assumed to be the calender date of the first observation of average daily temperature. The contribution b 0 s in the expression for T psq is present to take account of any annual trend in daily average temperature. Otherwise expressions (1.31) assume that seasonal variations in daily average temperature follow an annual cycle which is independent of calendar year. Consequently, the expression for Sptq corresponding to the expression (1.31) for σ 2 psq is Spsq " p 0`n ÿ k"1 p k cospω k sq`q k sinpω k sq , (1.32) where the Fourier coefficients c 0 , c 1 ,¨¨¨, c n , d 1 ,¨¨¨, d n are related to the Fourier coefficients p 0 , p 1 ,¨¨¨, p n , q 1 ,¨¨¨, q n by the formulae where k takes all integer values from k " 1 to k " n inclusive. Two strategies to estimate the value of α and the coefficients in the Fourier series (1.31) are now described.

Two-step estimator
Suppose that the data consists of observations of daily average temperatures T 1 , T 2 ,¨¨¨, T N at times t 1 , t 2 ,¨¨¨, t N . The Fourier coefficients of T psq can be estimated in a straightforward way by minimizing the objective function Ψpa 0 , b 0 , a 1 ,¨¨¨, a n , b 1 ,¨¨¨, b n q " N ÿ j"1`T j´T pt j qq 2 .
Once these coefficients are known, then the deviations from the seasonal means θ 1 , θ 2 ,¨¨¨, θ n can be computed directly from the formula θ j " T j´T pt j q. The problem is now to find the values of α and the coefficients c 0 , c 1 ,¨¨¨, c n , d 1 ,¨¨¨, d n which best fit the residuals θ 1 , θ 2 ,¨¨¨, θ n . Using a result established by Bibby and Sorensen [12], (1.34) The difficulty, however, in using this expression is that σ 2 k is unknown whereas what is known is the seasonal variance of the residuals. The strategy for finding the values of α and the coefficients c 0 , c 1 ,¨¨¨, c n , d 1 ,¨¨¨, d n is therefore the following.
Step 2: Choose an arbitrary value for α, say α 0 , and compute the Fourier coefficients c 0 , c 1 ,¨¨¨, c n , d 1 ,¨¨¨d n from expression (1.33) with α " α 0 . Knowing the Fourier coefficients of σ 2 psq enables σ 2 0 ,¨¨¨, σ 2 n to be computed from equation (1.31). Expression (1.34) is now used to update the estimate of α 0 . This procedure may then be iterated by recomputing in turn c 0 , c 1 ,¨¨¨, c n , d 1 ,¨¨¨d n and σ 2 0 ,¨¨¨, σ 2 n . This procedure is repeated until consecutive estimates of α are not deemed to be significantly different.
The estimate of α and the Fourier coefficients a 0 , b 0 , a 1 ,¨¨¨, a n , b 1 ,¨¨¨, b n and c 0 , c 1¨¨¨, c n , d 1 ,¨¨¨d n can either be used as they stand or can be used as an initial guess for the parameters of the maximum likelihood estimation procedure outlined in the next subsection.

Maximum-likelihood estimation
The feasibility of parameter estimation by maximum likelihood (ML) in this instance relies on the fact that the transitional probability density function of average daily temperature can be computed under the assumption that the deviations of average daily temperature from its mean value satisfies the stochastic differential equation (1.1). Ito's lemma applied to the stochastic differential equation (1.1) may be shown to lead to the formal solution θptq " θ j e´α pt´tj q`ż t tj e´α pt´sq σpsq dW s , t ą t j .
(1.35) with θ j " θpt j q. The important observation from this solution is that θptq is a Gaussian random variable with mean value E rθptqs " θ j e´α pt´tj q and variance χpt, t j q " ż t tj e´2 αpt´sq σ 2 psq ds " Sptq´e´2 αpt´tj q Spt j q , (1.36) where the latter expression for χpt, t j qt is derived directly from the definition of Sptq given in equation (1.3). Because T " T ptq`θptq, then the average daily temperature T is itself Gaussian distributed with mean value T ptq``T j´T j˘e´α pt´tj q and variance χpt, t j q " Sptqé´2 αpt´tj q Spt j q in which T ptq " a 0`b0 t`n ÿ k"1 a k cospω k tq`b k sinpω k tq . (1.37) Thus the average daily temperature T ptq has transitional probability density function where ψpT, tq "`T´T ptq´pT j´T j q e´α pt´tj q˘2 2χpt, t j q .
The likelihood of observing the sequence T 1 , T 2 ,¨¨¨, T N of average daily temperatures at calendar times t 1 , t 2 ,¨¨¨, t N is therefore Lpα; a 0 , a 1 ,¨¨¨, a n , b 1 ,¨¨¨b n ; c 0 , c 1 ,¨¨¨, c n , d 1 ,¨¨¨, d n q (1.39) In practice, the parameters are estimated by minimizing the negative log-likelihood functioń (1.40) where the notation S j " Spt j q has been used. The optimal values for the parameters of this model are taken to be those which minimize expression (1.39). Although model (1.1) is specified in terms of the intrinsic function σptq, from a purely technical point of view it is easier to treat the Fourier coefficients of Sptq as the parameters to be determined by the ML procedure.

Empirical Illustration
The task is now to provide a means of gauging the efficacy of the analytical expressions for the mean and variance of C N given derived previously in terms of the the expected payoffs to options contracts. Payoffs based on the analytical results of the paper are compared to historical pricing as outlined in [4,5]. The metric for comparison is taken to be the mean 'profit' of a 90-day call option contract. Profit is defined from the point of view of the buyer of the call option as the difference between the actual tick value of the contract and the expected tick value or 'price' of the option. Of course, this is not meant to represent a true price for the option, as this notional pricing strategy takes no account of discounting or overhead expenses. But of course, any pricing scheme will stand or fall by its ability to estimate the expected tick value accurately.

Data
The data set comprises daily maximum and minimum temperature records in degrees Celsius for Brisbane that the behaviour of the mean and standard deviation is amenable to modelling by a low-order Fourier series approximation. In this exercise the order of the series is taken to be 3. The Fourier approximation is applied only over the period over which the option is to be written, namely, 1 January to 31 March, inclusive.  The distributions of cumulative CDDs for each city is illustrated in Figure 1.2 which plots both the distribution of historical cumulative CDDs (shaded region) and the predicted distributions for 1950 (dashed line) and 2007 (solid line) generated by closed-form approximations to the distributions of CDDs derived in the paper. To the uniformed eye, the distribution of historical cumulative CDDs may appear well behaved and taken as reasonable evidence in favour of using historical records to price temperaturebased derivatives. When compared to the distributions for 1950 and 2007 generated by the analytical approach, however, the potential for error inherent in the historical approach becomes evident. Not only does the mean of the predicted distribution change noticeably over time, but the distribution also has lower volatility.

Payoffs
The profits generated by two call-option contracts with different strike prices, written on the period 1 January to 31 March are now reported in Tables 1.4  The experiments begin by pricing these options for the year 1950 using data up to and including 1949. The actual payoff for 1950 is recorded, the profit or loss stored and the data set updated to include the latest observation on cumulative CDDs. These steps are repeated up to and including 2007 giving a total of 58 separate profits for each option. The means and standard deviations of the profits are regarded as measures of the performance of each of the methods used to determine expected tick values. The historical pricing reported in Tables 1.4 and 1.5 is self-explanatory, but the implementation of the closedform approximations needs further elucidation. Two variations of this method are implemented, namely an annual version and a quarterly version. The annual approach fits the mean and seasonal variance of average daily temperature using data for the entire year and the best estimates of the parameters are used in computing the closed-from approximations of the distribution of cumulative CDDs. By contrast, the quarterly version focusses on the period from 1 January to the 31 March in each year and fits the mean and seasonal variance of average daily temperature for this 90-day period alone. In other words, the fitting procedure is implemented only on the period over which the contract is written. The main reason for adopting this approach is that the behaviour of temperature in parts of the year unrelated to the period of the option are not being allowed to influence parameter estimates for the mean and variance processes. Another benefit of this approach is that better resolution of the mean and variance processes with the same number of parameters. The first striking conclusion to be drawn from these results is just how bad historical pricing performs for the Australian temperature data. Interestingly enough, it appears that historical pricing in three of the cities has substantially over-priced the call options. This result is counter-intuitive as the conventional view is that there is an upward trend in temperature which would result in the under-pricing of call options priced on the history of cumulative CDDs.

BNE
The resolution of this conundrum is to be found in the behaviour of temperature between the years 1890 and 1920. During this period, Brisbane, Melbourne and Perth recorded substantial outliers in cumulative CDDs, the likes of which were not seen again until late in the sample period. These outliers will have had a disproportionate affect on the pricing of temperature derivatives in the 1960s, 1970s and 1980s. Their existence also explains the deterioration of profits based on historical pricing when moving from lower to higher exercise prices. The weather station in Sydney where the temperature data were recorded did not show these extreme temperature events and consequently historical pricing for Sydney performs significantly better.
Taken as a whole, the closed-approximations used to price the call options generate mean profits closer to zero and with lower standard deviations than historical pricing. Nevertheless, this method appears to underprice somewhat, even though these pricing errors are smaller in magnitude than those generated by the historical method. This underpricing is again a manifestation of the outliers in cumulative CDDs but in this case, not enough weight is given to them. There is little difference in terms of performance of quarterly and annual models, with the exception of Perth where the quarterly model performs better. It is conjectured that this is due to the ability of the quarterly model to better resolve the extreme temperature variations that are prone to take place in Perth. Unlike the case documented for historical pricing, there seems little difference in performance when moving from the lower to the higher exercise price for the the closed-form approach.

Conclusion
This paper has derived closed-form expressions for approximating the distribution of temperature indices. The major practical use for these approximations is in estimating the payoffs to temperature-based weather derivatives. Although the cumulative cooling degree day index is the focus of this research, the methods used are equally applicable to derivatives based on cumulative heating degree days. Common practice when modelling average daily temperature is to regard the deviations of temperature from its expected value as an Ornstein-Uhlenbeck process. The key result derived in this paper, is that if this model of temperature is adopted, then the distribution of cumulative cooling degree days may be constructed as the sum of truncated, correlated Gaussian deviates. The mean and variance of the resultant Gaussian distribution depend on the parameters of the underlying temperature process and its autocorrelation structure.
The efficacy of these approximate distributions is tested by estimating the payoffs to temperature-based derivatives. Time series data spanning over a hundred years of average daily temperatures in four major Australian cities are used to estimate the payoffs to European call options written on cooling degree days. The robust conclusion to emerge from this line of research is that the closedform distributions perform more reliably than the historical pricing method that is commonly advocated in the literature.