The purpose of this paper is to make the general public aware that certain conditions recorded at a buoy in the Atlantic Basin, such as wind speed, wind direction, pressure, atmospheric temperature and water temperature, may be useful in helping to predict when a hurricane could hit the state of Florida. One goal of this paper is to apply statistical methods to investigate and analyze the data, in order to create better predictive measures for determining when a hurricane may hit Florida. We discuss binary logistic regression and multinomial logistic regression modeling in terms of their outcomes, the odds ratio and the relative risk ratio, respectively. The coefficients from these models show which buoy conditions are more likely to indicate that a storm is present in the Atlantic Basin. The data were compiled into a single larger data set from two sources: the hurricane data for the years 1992 - 2013 came from the Unisys Weather site (Atlantic Basin Hurricanes data), and the buoy data came from the National Buoy Center. The variables of interest are: storm present, buoy wind speed, buoy wind direction, buoy pressure, buoy atmospheric temperature and buoy water temperature. The last five of these are the buoy conditions.
The research statements/questions to be addressed in this chapter are:
1) Determining the odds of a storm being present in the Atlantic Basin, given the buoy conditions;
2) Determining the odds of a storm being present categorically, given what the conditions are at the buoy.
In order to address the first research statement, we will use logistic regression. We will let the buoy wind speed be denoted as x_{1}, buoy wind direction as x_{2}, buoy pressure as x_{3}, buoy atmospheric temperature as x_{4}, and buoy water temperature as x_{5}. With k predictor variables, the logistic model is:
$$y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$
Applying such a model to our data set, each estimated coefficient is the expected change in the log odds of a storm being present for a unit increase in the corresponding predictor variable, holding the other predictor variables constant (please refer to the methodology chapter for further explanation). The binary logistic regression model has multiple predictor variables and no interaction terms. The categorical dependent variable in our model is storm present (y), and the predictor variables are the buoy wind speed (x_{1}), buoy wind direction (x_{2}), buoy pressure (x_{3}), buoy atmospheric temperature (x_{4}), and buoy water temperature (x_{5}). The dependent variable is coded as:
$$y = \begin{cases} 1, & \text{if a storm is present} \\ 0, & \text{otherwise} \end{cases}$$
The general logistic model is denoted as:
$$y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5)}}$$
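As a brief worked step (standard algebra for the logistic function, not specific to this data set), reading y as the predicted probability of a storm and solving the model above for the linear predictor shows why each coefficient acts on the log odds:

```latex
\begin{aligned}
y &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_5 x_5)}} \\
\frac{y}{1 - y} &= e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_5 x_5} \\
\ln\!\left(\frac{y}{1 - y}\right) &= \beta_0 + \beta_1 x_1 + \cdots + \beta_5 x_5
\end{aligned}
```

The middle line is the odds of a storm being present, so a one-unit increase in x_{j}, with the other predictors held fixed, multiplies the odds by $e^{\beta_j}$.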
Using binomial logistic regression methods to compute the coefficient estimates for the predictor variables, we found that each predictor variable was significantly contributing to the model above. The estimated coefficients and their p-values are:
Predictor Variables | Coefficients | P-values | Significance Ranking p-values |
---|---|---|---|
B Wind | 0.72 | 0.061 | 3 |
B Wind Direction | 0.86 | 0.122 | 5 |
B Pressure | −0.09 | 0.080 | 4 |
B Atmospheric Temp | 0.93 | 0.032 | 2 |
B Water Temp | 1.21 | 0.011 | 1 |
The developed model is as follows:
$$y = \frac{1}{1 + e^{-(102.14 + 0.72 x_1 + 0.86 x_2 - 0.09 x_3 + 0.93 x_4 + 1.21 x_5)}}$$
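To make the fitted model concrete, the following is a minimal Python sketch that evaluates it for one set of buoy conditions. The coefficients are copied from the model above; the dictionary keys, function names and the example reading are illustrative assumptions and do not come from the original data set.

```python
import math

# Intercept and coefficients copied from the fitted binary logistic model.
# The key names are illustrative labels, not identifiers from the study.
INTERCEPT = 102.14
COEFFICIENTS = {
    "wind_speed": 0.72,      # x1
    "wind_direction": 0.86,  # x2
    "pressure": -0.09,       # x3
    "air_temp": 0.93,        # x4
    "water_temp": 1.21,      # x5
}


def linear_predictor(conditions):
    """Return the log odds: intercept plus the sum of coefficient * value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * conditions[name] for name in COEFFICIENTS
    )


def storm_probability(conditions):
    """Apply the logistic function to the linear predictor."""
    eta = linear_predictor(conditions)
    # Branch on the sign of eta so that math.exp never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical buoy reading, made up purely to show the call.
example_reading = {
    "wind_speed": 20.0,
    "wind_direction": 24.0,
    "pressure": 980.0,
    "air_temp": 27.0,
    "water_temp": 29.0,
}
print(f"log odds    = {linear_predictor(example_reading):.2f}")
print(f"probability = {storm_probability(example_reading):.4f}")
```

The log odds and the probability are kept as separate functions because the interpretation of the coefficients is stated on the log-odds scale, while the model output is a probability.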
Our developed model was found to be 83% accurate, based on the calculation of the probability of accuracy. We will now interpret the odds, where the odds are obtained by exponentiating the predicted logit. We are interested in determining the odds of a storm being present in the Atlantic Basin, given the buoy conditions. First, we will look at the buoy wind speed and wind direction and interpret the odds result. In our data set, the range of the buoy wind speed is 40 mph, whereas the mean is 5.9. The range of the buoy wind direction is 26 (the minimum value is 2.60 and the maximum value is 28.60).
In the following analysis of the odds for the buoy wind speed and wind direction, we look at the wind speed and wind direction conditions under which the odds are greater than 3, to see whether these conditions indicate that a storm is present. When the odds are greater than 3, the buoy wind speed is between 15 mph and 25 mph, and the wind direction is between 22 and 26.
Thus, we can say that when the buoy wind speed is between 15 mph and 25 mph and the buoy wind direction is between 22 and 26, the odds are greater than 3, indicating that there will most likely be a storm present in the Atlantic Basin.
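The odds thresholds used in this analysis can be translated into probabilities with the standard relation odds = p / (1 − p). The sketch below is illustrative only (the function names are assumptions); it shows that odds of 3 correspond to a probability of 0.75, and odds of 20 to a probability of about 0.952.

```python
def odds_to_probability(odds):
    """Convert odds in favour of an event to the probability of the event."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def probability_to_odds(probability):
    """Convert a probability in [0, 1) to odds in favour of the event."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1.0 - probability)


for threshold in (3, 20):
    print(f"odds > {threshold}  <=>  probability > {odds_to_probability(threshold):.3f}")
```

Read this way, "odds greater than 3" means a storm is at least three times as likely to be present as absent under the stated buoy conditions.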
Next, we will look at the buoy wind speed and buoy pressure and interpret the odds result. In our data set, the range of the buoy pressure is 101.5 mb (the minimum value is 935.5 mb and the maximum value is 1037 mb), measured in millibars.
Thus, we can say that when the buoy wind speed is between 15 mph and 40 mph and the buoy pressure is between 935 mb and 990 mb, the odds are greater than 20, indicating that there will most likely be a storm present in the Atlantic Basin.
Next, we will look at the buoy atmospheric temperature and buoy water temperature and interpret the odds result. In our data set, the range of the buoy atmospheric temperature is 23.7 degrees Celsius (the minimum value is 9.60 and the maximum value is 33.30), and the mean is 25. The range of the buoy water temperature is 13.7 degrees Celsius (the minimum value is 20.10 and the maximum value is 33.80), and the mean is 26.42.
In the following analysis of the odds for the buoy atmospheric temperature and the buoy water temperature, we look at the temperature conditions under which the odds are greater than 3, to see whether these conditions indicate that a storm is present.
Thus, we can say that when the atmospheric temperature is between 24 and 30 and the water temperature is between 28 and 34, the odds are greater than 3, indicating that there will most likely be a storm present in the Atlantic Basin.
This makes sense since the mean of the buoy atmospheric temperature is 25, and the third quartile of the buoy water temperature is 29. The interquartile range is 29 − 24.45 = 4.55 (which explains the temperature interval of 28 - 34). When the odds are greater than 3, the buoy wind speed is between 15 mph and 25 mph, the wind direction is between 22 and 26, the buoy atmospheric temperature is between 24 and 30, and the buoy water temperature is between 28 and 34, and there will most likely be a storm present in the Atlantic Basin. Likewise, when the odds are greater than 20, the buoy wind speed is between 15 mph and 40 mph and the buoy pressure is between 935 mb and 990 mb, and there will most likely be a storm present in the Atlantic Basin.
Since our model is shown to be 83% accurate, we can now find the odds ratio (OR) of the predictor variables within the model. We will use the previous analysis of the buoy conditions for which the odds are greater than 3 and greater than 20 (buoy wind and buoy pressure) to find the odds ratio of the predictor variables in our model. To find the exponentiated coefficient, or OR, of each variable, we substitute the coefficient estimates for buoy wind, buoy wind direction, buoy pressure, buoy atmospheric temperature, and buoy water temperature into the exponential function (i.e., $e^{\beta_1}, e^{\beta_2}, \ldots, e^{\beta_n}$).
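The exponentiation step can be checked directly. The sketch below, with illustrative variable names, exponentiates the five coefficient estimates reported for the binary model; because the published coefficients are rounded to two decimals, the results may differ from the tabulated odds ratios in the last digit.

```python
import math

# Coefficient estimates reported for the binary logistic model.
# The key names are illustrative labels only.
coefficients = {
    "buoy wind": 0.72,
    "buoy wind direction": 0.86,
    "buoy pressure": -0.09,
    "buoy atmospheric temperature": 0.93,
    "buoy water temperature": 1.21,
}


def odds_ratios(coefs):
    """Exponentiate each logit coefficient to obtain its odds ratio."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


for name, ratio in odds_ratios(coefficients).items():
    print(f"{name}: OR = {ratio:.2f}")
```

An odds ratio above 1 corresponds to a positive coefficient and an odds ratio below 1 to a negative coefficient, which is how the sign of each estimate carries over to the interpretation.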
Since we have five predictor variables, we will have five cases, in each of which all but one predictor variable is held at a fixed value so that we can estimate the effect of the remaining variable on the odds of a storm being present. That is, we hold all but one variable fixed and test the effect of a change of one or more units in that variable. The coefficients and exponentiated coefficients are:
Predictor Variables | Coefficients | Exponentiated Coefficients |
---|---|---|
Buoy Wind | 0.72 | 2.06 |
Buoy Wind Direction | 0.86 | 2.36 |
Buoy Pressure | −0.09 | 0.91 |
Buoy Atmospheric Temperature | 0.93 | 2.53 |
Buoy Water Temperature | 1.21 | 3.35 |
First, we hold the buoy wind direction, buoy pressure, buoy atmospheric temperature and buoy water temperature at a fixed value. The odds of a storm being present (1) when the buoy wind speed is between 15 - 40 mph, relative to the odds of a storm being present when the buoy wind speed is not between 15 - 40 mph, is exp(0.72) = 2.06. In other words, the odds of a storm being present when the buoy wind speed is between 15 - 40 mph are 2.06 times the odds when it is not, that is, about 106% higher. Since the OR for buoy wind is greater than 1 with the other predictors held constant, the probability of a storm being present increases when the buoy wind speed is between 15 - 40 mph. In other words, when the wind speed at the buoy ranges from 15 - 40 mph, there is a higher chance that a storm is present.
Next, we will hold the buoy wind speed, buoy pressure, buoy atmospheric temperature and buoy water temperature at a fixed value. With these predictor variables fixed, the odds of a storm being present (1) when the buoy wind direction is between 22 and 26, relative to the odds of a storm being present when the wind direction is not between 22 and 26, is exp(0.86) = 2.36. That is, the odds of a storm being present when the wind direction is between 22 and 26 are 2.36 times the odds when it is not, or about 136% higher. Since the OR for the wind direction is greater than 1, the probability of a storm being present increases when the buoy wind direction is between 22 and 26.
Holding the buoy wind speed, buoy wind direction, buoy atmospheric temperature and buoy water temperature at a fixed value, the odds ratio for the buoy pressure is exp(−0.09) = 0.91. That is, the odds of a storm being present when the buoy pressure is higher than the 935 mb to 990 mb range are 0.91 times the odds when the pressure is within that range. Equivalently, the coefficient for the buoy pressure says that, with the other predictors held fixed, we will see a decrease of about 9% in the odds of a storm being present when the buoy pressure is higher than the values between 935 mb and 990 mb. Since the OR for buoy pressure is less than 1, the probability of a storm being present decreases when the pressure at the buoy is not between 935 mb and 990 mb. Next, we will hold the buoy wind speed, buoy wind direction, buoy pressure and buoy water temperature at a fixed value.
Holding the buoy wind speed, buoy wind direction, buoy pressure and buoy water temperature at a fixed value, the odds of a storm being present (1) when the buoy atmospheric temperature is between 24 and 30, relative to the odds of a storm being present when the buoy atmospheric temperature is below that range, is exp(0.93) = 2.53. In terms of percent change, the odds of a storm being present when the buoy atmospheric temperature is between 24 and 30 are 2.53 times the odds when it is not, or about 153% higher. Since the OR for buoy atmospheric temperature is greater than 1, the probability of a storm being present increases when the atmospheric temperature at the buoy is between 24 and 30.
Now, holding the buoy wind speed, buoy wind direction, buoy pressure, and buoy atmospheric temperature at a fixed value, the odds of a storm being present (1) when the buoy water temperature is between 28 and 34, relative to the odds of a storm being present when the buoy water temperature is not between 28 and 34, is exp(1.21) = 3.35. That is, the odds of a storm being present when the buoy water temperature is between 28 and 34 are 3.35 times the odds when the water temperature is below these values. The coefficient for the buoy water temperature says that, with the other predictors held fixed, we will see an increase of 1.21 in the log odds, or about a 235% increase in the odds, of a storm being present when the buoy water temperature is higher (between 28 and 34) than the average water temperature of 26.
Since the OR for buoy water temperature is greater than 1, the probability of a storm being present increases when the water temperature at the buoy is between 28 and 34. In other words, the higher the water temperature is at the buoy, the higher the chance that a storm is present. Since the OR is less than 1 for the buoy pressure, we will eliminate the buoy pressure predictor variable from our current binomial logistic regression model.
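The relation between a logit coefficient, its odds ratio and the percent change in the odds can be sketched as follows. This is the standard conversion, (OR − 1) × 100, and the function name and the `units` parameter are illustrative assumptions; a change of several units multiplies the odds by exp(units × β).

```python
import math


def percent_change_in_odds(beta, units=1.0):
    """Percent change in the odds for a `units` increase in the predictor."""
    return (math.exp(beta * units) - 1.0) * 100.0


# Coefficients reported for the binary logistic model.
print(f"water temperature, +1 unit: {percent_change_in_odds(1.21):+.1f}%")
print(f"pressure, +1 unit:          {percent_change_in_odds(-0.09):+.1f}%")
print(f"pressure, +10 units:        {percent_change_in_odds(-0.09, units=10):+.1f}%")
```

On this scale an odds ratio of 3.35 is an increase of roughly 235% in the odds, and an odds ratio of 0.91 is a decrease of roughly 9%.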
Next, we will address our second research statement: determining the odds of a storm being present categorically, given the conditions at the buoy. We will use multinomial logistic regression to address this research statement. The categorical dependent variable in our model is the hurricane category (1 - 5) (y), and the predictor variables are the buoy wind (x_{1}), buoy pressure (x_{2}), buoy atmospheric temperature (x_{3}), buoy water temperature (x_{4}), and the interaction buoy atmospheric temperature*buoy water temperature (x_{5}). The predictor variables are the conditions at the buoy.
To address this research statement, the following analytic model form
$$y = \alpha_{ij} + x_1 \beta_1 + x_2 \beta_2 + x_3 \beta_3 + x_4 \beta_4 + x_5 \beta_5$$
will be used to gather information regarding the coefficients and their associated relative risk ratios. Relative risk is used frequently in the statistical analysis of binary or multinomial outcomes where the outcome of interest has relatively low probability. The binary logistic model had an odds ratio (OR) interpretation for its coefficients, and the multinomial logistic regression model has a similar interpretation. Instead of the OR, we will use the relative risk (RR) ratio to interpret the coefficients of our multinomial logistic regression model.
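For readers unfamiliar with the relative risk ratio in a multinomial setting, the sketch below uses the standard baseline-category formulation, in which each non-baseline category has its own linear predictor and the baseline is fixed at zero. This formulation, the function names and the example values are assumptions for illustration; the text above gives only a single model equation and does not state which category serves as the baseline.

```python
import math

BASELINE = "baseline"  # hypothetical label for the reference category


def category_probabilities(linear_predictors):
    """Map {category: linear predictor} to probabilities.

    The baseline category is given a linear predictor of 0, so each
    exp(linear predictor) equals P(category) / P(baseline).
    """
    etas = {BASELINE: 0.0, **linear_predictors}
    largest = max(etas.values())  # subtract the maximum for numerical stability
    weights = {cat: math.exp(eta - largest) for cat, eta in etas.items()}
    total = sum(weights.values())
    return {cat: weight / total for cat, weight in weights.items()}


def relative_risk_ratio(beta):
    """Factor by which P(category) / P(baseline) changes per unit increase."""
    return math.exp(beta)


# Made-up linear predictors for two non-baseline categories.
probabilities = category_probabilities({"category A": 0.5, "category B": -1.0})
print({cat: round(p, 3) for cat, p in probabilities.items()})
print(f"RR for a coefficient of -0.06: {relative_risk_ratio(-0.06):.2f}")
```

Under this formulation, an RR below 1 for a predictor means that increasing the predictor lowers the probability of the category relative to the baseline, and an RR above 1 raises it.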
Using multinomial logistic regression methods to compute the coefficient estimates for the predictor variables, we found that each predictor variable was significantly contributing to the model above.
Our developed model with the coefficient estimates is as follows:
$$y = 24.5074585 + 0.007 x_1 - 0.06 x_2 - 0.09 x_3 - 0.06 x_4 + 0.03 x_5$$
Our developed model was found to be 87% accurate, based on the calculation of the probability of accuracy. To find the exponentiated coefficient, or relative risk (RR), of each variable, we substitute the coefficient estimates for buoy wind (x_{1}), buoy pressure (x_{2}), buoy atmospheric temperature (x_{3}), buoy water temperature (x_{4}), and buoy atmospheric temperature*buoy water temperature (x_{5}) into the exponential function (i.e., $e^{\beta_1}, e^{\beta_2}, \ldots, e^{\beta_n}$).
Since the RR for buoy wind speed is equal to 1, wind conditions at the buoy between 15 - 40 mph make no difference to the probability that a hurricane of category (1 - 5) will occur.
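Upon applying backward elimination and subset analysis to the variables in the data set, we obtained the following reduced model, in which x_{1}, x_{2} and x_{3} now denote the buoy pressure, buoy atmospheric temperature and buoy water temperature, respectively:
$$y = 7.90 - 0.008 x_1 - 0.05 x_2 + 0.02 x_3$$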
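Backward elimination can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the threshold `alpha=0.10` is assumed (the text does not state one), and `lookup_p_values` is a hypothetical stand-in that returns the fixed p-values reported for the five-predictor multinomial model, whereas a real analysis would refit the model and recompute the p-values after each removal.

```python
def backward_eliminate(predictors, fit_p_values, alpha=0.10):
    """Repeatedly drop the predictor with the largest p-value above alpha.

    fit_p_values(names) must return {name: p-value} for a model fitted
    with exactly those predictors.
    """
    remaining = list(predictors)
    while remaining:
        p_values = fit_p_values(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # every remaining predictor meets the threshold
        remaining.remove(worst)
    return remaining


# P-values reported for the five-predictor model; key names are illustrative.
TABLE_P_VALUES = {
    "wind": 0.31,
    "pressure": 0.052,
    "air_temp": 0.09,
    "water_temp": 0.05,
    "air_temp_x_water_temp": 0.14,
}


def lookup_p_values(names):
    """Hypothetical stub: returns fixed p-values instead of refitting."""
    return {name: TABLE_P_VALUES[name] for name in names}


print(backward_eliminate(list(TABLE_P_VALUES), lookup_p_values, alpha=0.10))
```

With the assumed threshold, the stub drops the wind and interaction terms and keeps pressure, atmospheric temperature and water temperature, which are the three predictors retained in the reduced model.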
Predictor Variables | Coefficients | P-Values | Significance Ranking P-Values |
---|---|---|---|
B Wind | 0.007 | 0.31 | 1 |
B Pressure | −0.06 | 0.052 | 4 |
B Atmospheric Temp | −0.09 | 0.09 | 3 |
B Water Temp | −0.06 | 0.05 | 5 |
B Atmp Temp*B Water Temp | 0.03 | 0.14 | 2 |
Predictor Variables | Coefficients | Exponentiated Coefficients |
---|---|---|
B Wind | 0.007 | 1 |
B Pressure | −0.06 | 0.94 |
B Atmospheric Temp | −0.09 | 0.91 |
B Water Temp | −0.06 | 0.94 |
B Atmp Temp*B Water Temp | 0.03 | 1.03 |
Predictor Variables | Coefficients | Exponentiated Coefficients |
---|---|---|
Buoy Pressure | −0.008 | 0.99 |
Buoy Atmospheric Temperature | −0.05 | 0.95 |
Buoy Water Temperature | −0.02 | 0.98 |
Our newly developed model was found to be 93% accurate, based on the calculation of the probability of accuracy (please refer to the methodology chapter for an extended description). Now, we will find the RR of each of the predictor variables in this model. The coefficients and exponentiated coefficients of its predictor variables are displayed in the preceding table.
Since the RR for buoy pressure is less than 1, a hurricane of category (1 - 5) is more likely to occur when the pressure at the buoy is between 935 mb and 990 mb. In other words, when the pressure at the buoy is low, there is a high chance that a hurricane of category (1 - 5) will occur. Since the RR for buoy atmospheric temperature is less than 1, a hurricane of category (1 - 5) is more likely to occur when the atmospheric temperature is between 24 and 30. In other words, when the atmospheric temperature at the buoy is high (between 24 and 30), there is a high chance that a hurricane of category (1 - 5) will occur. Since the RR for buoy water temperature is less than 1, a hurricane of category (1 - 5) is more likely to occur when the water temperature is between 28 and 34. In other words, when the buoy water temperature is high (between 28 and 34), there is a high chance that a hurricane of category (1 - 5) will occur. Once again, we can conclude that when the pressure at the buoy is low (between 935 mb and 990 mb), the atmospheric temperature at the buoy is high (between 24 and 30), and the water temperature at the buoy is high (between 28 and 34), there is a high chance that a hurricane of category (1 - 5) will occur.
Based on our examination of the odds ratios from the binary logistic regression coefficients and the relative risk ratios from the multinomial regression coefficients, the higher the water temperature is at the buoy, the higher the chance that a storm will be present. The higher the atmospheric temperature is at the buoy and the lower the pressure is at the buoy, the higher the chance that a storm will be present. On the other hand, when the pressure at the buoy is high and the atmospheric temperature at the buoy is low, there is a low probability that a hurricane of category (1 - 5) will occur. When the buoy wind speed is high (between 15 mph and 40 mph) and the water temperature at the buoy is high (between 28 and 34), there is a high probability that a hurricane of category (1 - 5) will occur. When the odds are greater than 3, with the buoy wind speed between 15 mph and 25 mph, the buoy wind direction between 22 and 26, the buoy atmospheric temperature between 24 and 30, and the buoy water temperature between 28 and 34, there will most likely be a storm present in the Atlantic Basin. When the odds are greater than 20, with the buoy wind speed between 15 mph and 40 mph and the buoy pressure between 935 mb and 990 mb, there is most likely a storm present in the Atlantic Basin.
We have found that the first model, the binomial logistic regression model
$$y = \frac{1}{1 + e^{-(102.14 + 0.72 x_1 + 0.86 x_2 - 0.09 x_3 + 0.93 x_4 + 1.21 x_5)}},$$
was 83% accurate, while our second model, the multinomial logistic regression model
$$y = 24.5074585 + 0.007 x_1 - 0.06 x_2 - 0.09 x_3 - 0.06 x_4 + 0.03 x_5,$$
was found to be 87% accurate. Once we used backward elimination and subset analysis on the second model, we found that a new third model, $y = 7.90 - 0.008 x_1 - 0.05 x_2 + 0.02 x_3$, was 93% accurate. In developing these logistic regression models, we have found that calculating the odds ratio and relative risk ratio for each of the variables helps to explain which conditions at a buoy in the Atlantic Basin indicate when a storm may be present.
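The accuracy figures quoted for the three models are defined in the methodology chapter, which is not reproduced here. As one common reading, offered only as an assumption, accuracy can be taken as the share of observations whose predicted class matches the observed class. The sketch below uses a 0.5 probability cut-off and made-up values; neither is taken from the study.

```python
def classification_accuracy(predicted_probabilities, observed, threshold=0.5):
    """Share of cases where (probability >= threshold) matches the 0/1 outcome."""
    if len(predicted_probabilities) != len(observed):
        raise ValueError("inputs must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    correct = sum(
        1
        for probability, outcome in zip(predicted_probabilities, observed)
        if (probability >= threshold) == bool(outcome)
    )
    return correct / len(observed)


# Made-up predictions and outcomes, purely to show the call.
example_probabilities = [0.9, 0.2, 0.6, 0.4]
example_outcomes = [1, 0, 0, 0]
print(classification_accuracy(example_probabilities, example_outcomes))
```

Under this definition, a model described as 83% accurate would classify 83 out of every 100 observations correctly at the chosen cut-off.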
Using a binary logistic regression model, we were able to determine the relationship between a storm being present or not present in the Atlantic Basin and the conditions at the buoy: buoy wind, buoy wind direction, buoy pressure, buoy atmospheric temperature and buoy water temperature. One contribution to the field of applied statistics is that our first logistic regression model was over 82% accurate in detecting when a storm will be present in the Atlantic Basin.
Another contribution to the field of applied statistics, using a multinomial logistic regression model, was to determine whether the chances of a hurricane of category (1 - 5) occurring are low or high, given the buoy pressure, buoy atmospheric temperature and buoy water temperature. This evaluation using the multinomial logistic regression model produced outcomes that may clarify which predictor variables determine whether a storm is present (categorically) in the Atlantic Basin.
This paper is dedicated to Angeline Chromiak-Sears. Thank you for all the patience and understanding that you have shown me throughout the years. I couldn’t have gotten through the last few years without you. Thank you!
The authors declare no conflicts of interest regarding the publication of this paper.
D’Andrea, J.M., Wooten, R.D. and Pogoda, W.A. (2018) Odds Ratio & Relative Risk Ratio of Buoy Conditions for Storms in the Atlantic Basin. Open Journal of Statistics, 8, 747-759. https://doi.org/10.4236/ojs.2018.85049