Odds Ratio & Relative Risk Ratio of Buoy Conditions for Storms in the Atlantic Basin

The purpose of this paper is to make the general public aware that certain conditions recorded at a buoy in the Atlantic Basin, such as wind speed, wind direction, pressure, atmospheric temperature, and water temperature, may be useful in helping to predict when a hurricane could hit the state of Florida. One goal of this paper is to apply statistical methods to investigate and analyze the data, which may yield better predictive measures for determining when a hurricane could hit the state of Florida. The paper discusses binary logistic regression and multinomial logistic regression modeling in terms of their respective outcomes, the odds ratio and the relative risk ratio. The coefficients from these models show which buoy conditions are more likely to indicate that a storm is present in the Atlantic Basin. The data were compiled into a larger data set from two different sources: the hurricane data for the years 1992-2013 came from the Unisys Weather site (Atlantic Basin Hurricanes data), and the buoy data came from the National Buoy Center. The variables of interest are storm present and the buoy conditions: buoy wind speed, buoy wind direction, buoy pressure, buoy atmospheric temperature, and buoy water temperature.


Introduction
The research statements/questions to be addressed in this chapter are: 1) Determining the odds of a storm being present in the Atlantic Basin, given the buoy conditions; 2) Determining the odds of a storm being present categorically, given what the conditions are at the buoy.
In order to address the first research statement, we will use logistic regression. We denote the buoy wind by $x_1$, the buoy wind direction by $x_2$, the buoy pressure by $x_3$, the buoy atmospheric temperature by $x_4$, and the buoy water temperature by $x_5$ [1]. In general, a logistic regression model can have multiple predictor variables.

Methods
Applying such a model to our dataset, each estimated coefficient is the expected change in the log odds of a storm being present for a unit increase in the corresponding predictor variable, holding the other predictor variables constant (please refer to the methodology chapter for further explanation). The binary logistic regression model has multiple predictor variables and no interaction terms.
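The interpretation of a coefficient as a change in log odds can be checked numerically. The following is a minimal sketch, assuming the coefficient values quoted later in this section (0.72, 0.86, −0.09, 0.93, 1.21); the intercept and the buoy reading are hypothetical placeholders chosen only for illustration, since the paper does not report them here.

```python
import math

def log_odds(intercept, coefs, x):
    """Linear predictor (log odds) of a logistic model with no interaction terms."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

# Coefficients for x1..x5 as quoted in this section (rounded values).
coefs = [0.72, 0.86, -0.09, 0.93, 1.21]
intercept = -50.0                      # hypothetical, not from the paper
x = [20.0, 24.0, 1000.0, 27.0, 29.0]   # hypothetical buoy reading

base = log_odds(intercept, coefs, x)

# Raise the water temperature (x5) by one unit, holding x1..x4 fixed.
x_plus = list(x)
x_plus[4] += 1.0
bumped = log_odds(intercept, coefs, x_plus)

delta_log_odds = bumped - base              # equals the coefficient of x5
odds_multiplier = math.exp(delta_log_odds)  # equals exp(coefficient of x5)

print(f"change in log odds: {delta_log_odds:.2f}")
print(f"odds multiplied by: {odds_multiplier:.2f}")
```

The change in log odds does not depend on the hypothetical intercept or on the values at which the other predictors are held, which is what "holding the other predictor variables constant" means in a model without interaction terms.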
The categorical dependent variable in our model is storm present ($y$), and the predictor variables are the buoy wind speed ($x_1$), buoy wind direction ($x_2$), buoy pressure ($x_3$), buoy atmospheric temperature ($x_4$), and buoy water temperature ($x_5$) [2]. Our response variable $y$ (based on the buoy conditions) has two outcomes, a storm was present or a storm was not present:
$$y = \begin{cases} 1, & \text{if a storm is present} \\ 0, & \text{otherwise} \end{cases}$$
The general logistic model is denoted as:
$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$$
where $p = P(y = 1)$ is the probability that a storm is present.
Using the binomial logistic regression methods for computing the coefficient estimates for the predictor variables, we found that each predictor variable was significantly contributing to the model above [3]. After using a t-test and finding the associated p-values, we found that the order of the most significantly contributing predictor variables was: $x_5$, $x_4$, $x_1$, $x_3$, $x_2$ [3]. The most significant contributing predictor variable was the buoy water temperature ($x_5$) with a p-value of 0.011, the second was the buoy atmospheric temperature ($x_4$) with a p-value of 0.032, the third was the buoy wind ($x_1$) with a p-value of 0.061, the fourth was the buoy pressure ($x_3$) with a p-value of 0.080, and the fifth was the buoy wind direction ($x_2$) with a p-value of 0.122. The coefficients of the predictor variables and their associated p-value ranking of significance can be seen in Table 1.
Our developed model was found to be 83% accurate, based on the calculation of the probability of accuracy. We will now interpret the odds, which are obtained by exponentiating the predicted logit. We are interested in determining the odds of a storm being present in the Atlantic Basin, given the buoy conditions. First, we will look at the buoy conditions of wind speed and wind direction and interpret the odds result. In our data set, the range of the buoy wind speed is 40 mph, and the mean is 5.9. The range of the buoy wind direction is 26 (the minimum value is 2.60 and the maximum value is 28.60). In Figure 1, the median and the mean are both 20, which suggests that the distribution of the buoy wind direction is approximately symmetric.
In the following analysis of the odds for the buoy wind speed and wind direction, we look at the wind speed and wind direction conditions under which the odds are greater than 3, to see whether these conditions indicate that a storm is present. The odds are greater than 3 when the buoy wind speed is between 15 mph and 25 mph and the wind direction is between 22 and 26 [1].
In Figure 2, the red circles indicate the chances of a storm being present when the odds are greater than 3, with the buoy wind speed between 15 mph and 25 mph and the wind direction between 22 and 26.
This indicates that the chance of a storm being present is greater than 50%, so a storm is more likely than not; odds of 3 correspond to a probability of 3/(1 + 3) = 0.75.
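The link between odds and probability used in this interpretation is the standard identity p = odds / (1 + odds). A small sketch of the conversion in both directions follows; the function names are illustrative and do not come from the paper.

```python
def odds_to_probability(odds):
    """Convert odds in favour of an event to the probability of the event."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)

def probability_to_odds(p):
    """Convert a probability (0 <= p < 1) to odds in favour of the event."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return p / (1.0 - p)

p_even = odds_to_probability(1.0)    # odds of 1 are an even chance
p_three = odds_to_probability(3.0)   # the threshold used for Figure 2

print(p_even, p_three)
```

Odds of 1 mark the 50% boundary, so any odds above 3 imply a probability above 0.75.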
Thus, we can say that when the buoy wind speed is between 15 mph and 25 mph and the buoy wind direction is between 22 and 26, the odds are greater than 3, indicating that there will most likely be a storm present in the Atlantic Basin.
Open Journal of Statistics
Next, we will look at the buoy wind speed and buoy pressure and interpret the odds result. In our data set, the range of the buoy pressure is 101.5 mb (the minimum value is 935.5 mb and the maximum value is 1037 mb), where pressure is measured in millibars. In Figure 3, the median and the mean are both 1016 mb, which suggests that the distribution of the buoy pressure is approximately symmetric.
In the following Figure 4, the purple circles indicate the chances of a storm being present, when the odds are greater than 20 with the buoy conditions of the wind speed between 15 mph and 40 mph and the pressure between 935 mb and 990 mb.
Thus, we can say that when the buoy wind speed is between 15 mph and 40 mph and the buoy pressure is between 935 mb and 990 mb, the odds are greater than 20, indicating that there will most likely be a storm present in the Atlantic Basin [3].
Next, we will look at the buoy atmospheric temperature and buoy water temperature and interpret the odds result. In our data set, the buoy atmospheric temperature has a range of 23.7 degrees Celsius (the minimum value is 9.60 and the maximum value is 33.30) and a mean of 25 degrees Celsius. The buoy water temperature has a range of 13.7 degrees Celsius (the minimum value is 20.10 and the maximum value is 33.80) and a mean of 26.42 degrees Celsius.
In the following analysis of the odds for the buoy atmospheric temperature and the buoy water temperature, we look at the temperature conditions under which the odds are greater than 3, to see whether these conditions indicate that a storm is present.
In Figure 5, the green circles indicate the chances of a storm being present when the odds are greater than 3, with the atmospheric temperature between 24 and 30 degrees Celsius and the water temperature between 28 and 34 degrees Celsius. Thus, we can say that when the atmospheric temperature is between 24 and 30 degrees Celsius and the water temperature is between 28 and 34 degrees Celsius, the odds are greater than 3, indicating that there will most likely be a storm present in the Atlantic Basin.
This makes sense since the mean of the buoy atmospheric temperature is 25, and the third quartile of the buoy water temperature is 29. The interquartile range is 29 − 24.45 = 4.55 (which explains the temperature interval of 28 - 34).
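The quartiles and interquartile range referred to here can be computed as in the sketch below. The sample values are hypothetical and serve only to illustrate the calculation; they are not the paper's buoy data, and quartile conventions differ between software packages (this sketch uses the default "exclusive" method of Python's `statistics.quantiles`).

```python
import statistics

# Hypothetical water temperatures in degrees Celsius (illustrative only).
water_temps = [22.0, 24.0, 25.0, 26.0, 27.0, 29.0, 30.0]

# n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(water_temps, n=4)

# Interquartile range: the spread of the middle half of the data.
iqr = q3 - q1

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```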
In summary, the odds are greater than 3 when the buoy wind speed is between 15 mph and 25 mph, the wind direction is between 22 and 26, the buoy atmospheric temperature is between 24 and 30, and the buoy water temperature is between 28 and 34; under these conditions there will most likely be a storm present in the Atlantic Basin. Likewise, the odds are greater than 20 when the buoy wind speed is between 15 mph and 40 mph and the buoy pressure is between 935 mb and 990 mb; under these conditions there will also most likely be a storm present in the Atlantic Basin [2].
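The two sets of conditions summarized above can be written as simple range checks. This is only a sketch: the class and function names are hypothetical, the intervals are taken from the text, and treating the endpoints as inclusive is an assumption, since the paper says only "between".

```python
from dataclasses import dataclass

@dataclass
class BuoyReading:
    wind_speed_mph: float
    wind_direction: float   # in the units used by the paper's data set
    pressure_mb: float
    air_temp_c: float
    water_temp_c: float

def in_range(value, low, high):
    """Inclusive interval check (inclusive endpoints are an assumption)."""
    return low <= value <= high

def meets_odds_above_3_conditions(r):
    """Wind speed, wind direction and both temperatures in the 'odds > 3' ranges."""
    return (in_range(r.wind_speed_mph, 15, 25)
            and in_range(r.wind_direction, 22, 26)
            and in_range(r.air_temp_c, 24, 30)
            and in_range(r.water_temp_c, 28, 34))

def meets_odds_above_20_conditions(r):
    """Wind speed and pressure in the 'odds > 20' ranges."""
    return (in_range(r.wind_speed_mph, 15, 40)
            and in_range(r.pressure_mb, 935, 990))

# Two hypothetical readings for illustration.
warm_breezy = BuoyReading(20, 24, 1000, 27, 29)
low_pressure = BuoyReading(35, 10, 960, 27, 29)

print(meets_odds_above_3_conditions(warm_breezy),
      meets_odds_above_20_conditions(warm_breezy))
print(meets_odds_above_3_conditions(low_pressure),
      meets_odds_above_20_conditions(low_pressure))
```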
Since our model was found to be 83% accurate, we can now find the odds ratio (OR) of the predictor variables within the model. We will use the previous analysis of the buoy conditions under which the odds are greater than 3 and greater than 20 (buoy wind and buoy pressure) to find the odds ratio of each predictor variable in our model. To find the exponentiated coefficient OR (odds ratio) of each variable, we substitute the coefficient estimates for buoy wind, buoy wind direction, buoy pressure, buoy atmospheric temperature, and buoy water temperature into the exponential function (i.e., $e^{\beta_1}, e^{\beta_2}, \ldots, e^{\beta_n}$). Table 2 displays the coefficients for our predictor variables and their exponentiated coefficients.
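The exponentiation step can be reproduced with a few lines of code. The sketch below uses the coefficient values quoted in the text of this section (0.72, 0.86, −0.09, 0.93, 1.21); because those values are rounded, the computed odds ratios may differ from the reported ones in the last digit.

```python
import math

# Coefficient estimates as quoted in this section (rounded to two decimals).
coefficients = {
    "buoy wind speed": 0.72,
    "buoy wind direction": 0.86,
    "buoy pressure": -0.09,
    "buoy atmospheric temperature": 0.93,
    "buoy water temperature": 1.21,
}

# The odds ratio of each predictor is the exponential of its coefficient.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, value in odds_ratios.items():
    direction = "raises" if value > 1 else "lowers"
    print(f"{name}: OR = {value:.3f} ({direction} the odds)")
```

A positive coefficient always gives an OR above 1 and a negative coefficient an OR below 1, so the sign of the coefficient already determines the direction of the effect.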
Since we have five predictor variables, we consider five cases; in each case, all but one predictor variable are held at a fixed value so that we can estimate the effect of the remaining predictor variable on the odds of a storm being present. First, we hold the buoy wind direction, buoy pressure, buoy atmospheric temperature and buoy water temperature at a fixed value. Then the odds of a storm being present (1) when the buoy wind speed is between 15 and 40 mph, relative to the odds when the buoy wind speed is not between 15 and 40 mph, are exp (0.72) = 2.06. That is, the odds of a storm being present are 2.06 times as high (106% higher) when the buoy wind speed is between 15 and 40 mph than when it is not. Since the OR for buoy wind is greater than 1 with the other predictor variables held constant, the probability of a storm being present increases when the buoy wind speed is between 15 and 40 mph. In other words, when the wind speed at the buoy ranges from 15 to 40 mph, the chance that a storm is present is higher.
Next, we will hold the buoy wind speed, buoy pressure, buoy atmospheric temperature and buoy water temperature at a fixed value. With these predictor variables held fixed, the odds of a storm being present (1) when the buoy wind direction is between 22 and 26, relative to the odds when the wind direction is not between 22 and 26, are exp (0.86) = 2.36. That is, the odds of a storm being present are 2.36 times as high (136% higher) when the buoy wind direction is between 22 and 26 than when it is not. Since the OR for the wind direction is greater than 1, the probability of a storm being present increases when the buoy wind direction is between 22 and 26 [4].
Holding the buoy wind speed, buoy wind direction, buoy atmospheric temperature and the buoy water temperature at a fixed value, the odds of a storm being present (1) when the buoy pressure is higher than the values between 935 mb and 990 mb, relative to the odds when the buoy pressure is between 935 mb and 990 mb, are exp (−0.09) = 0.91. That is, the odds of a storm being present are 0.91 times as high (about 9% lower) when the buoy pressure is higher than the values between 935 mb and 990 mb. The coefficient for the buoy pressure says that, holding the buoy wind, buoy atmospheric temperature and the buoy water temperature at a fixed value, we will see a 9% decrease in the odds of a storm being present when the buoy pressure is higher than the values between 935 mb and 990 mb. The OR for buoy pressure is less than 1, which means that the odds of a storm being present are lower when the buoy pressure is higher than the values between 935 mb and 990 mb. This implies that the probability of a storm being present decreases when the buoy pressure (pressure located at the buoy) values are not between 935 mb and 990 mb. Next, we will hold the buoy wind speed, buoy wind direction, buoy pressure and buoy water temperature at a fixed value [3].
DOI: 10.4236/ojs.2018.85049
Holding the buoy wind speed, buoy wind direction, buoy pressure and the buoy water temperature at a fixed value, the odds of a storm being present (1) when the buoy atmospheric temperature is high (between the values of 24 and 30), relative to the odds when the buoy atmospheric temperature is lower than the values between 24 and 30, are exp (0.93) = 2.53. In terms of percent change, the odds of a storm being present are 2.53 times as high (153% higher) when the buoy atmospheric temperature is between the values of 24 and 30 than when it is not. Since the OR for buoy atmospheric temperature is greater than 1, the probability of a storm being present increases when the atmospheric temperature values at the buoy are between 24 and 30. Now, holding the buoy wind speed, buoy wind direction, buoy pressure, and buoy atmospheric temperature at a fixed value, the odds of a storm being present (1) when the buoy water temperature is between the values of 28 and 34, relative to the odds when the buoy water temperature is not between the values of 28 and 34, are exp (1.21) = 3.35. That is, the odds of a storm being present are 3.35 times as high (235% higher) when the buoy water temperature is between the values of 28 and 34 than when it is lower than these values.
The coefficient for the buoy water temperature says that, holding the buoy wind speed, buoy wind direction, buoy pressure, and buoy atmospheric temperature at a fixed value, we will see a 235% increase in the odds of a storm being present when the buoy water temperature is higher (between the values of 28 and 34) than the average water temperature of 26. Since the OR for buoy water temperature is greater than 1, the probability of a storm being present increases when the water temperature (the water temperature located at the buoy) is between the values of 28 and 34. In other words, the higher the water temperature is at the buoy, the higher the chance that a storm is present. Since the OR for the buoy pressure is less than 1, we will eliminate the buoy pressure predictor variable from our current binomial logistic regression model [3].
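The percent change in the odds implied by an odds ratio is (OR − 1) × 100; an OR of 2 therefore means the odds are 100% higher, not 200% higher. A minimal sketch applying this to the odds ratios quoted in this section follows (the helper name is illustrative, not from the paper).

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

# Odds ratios as quoted in this section.
reported_odds_ratios = {
    "buoy wind speed": 2.06,
    "buoy wind direction": 2.36,
    "buoy pressure": 0.91,
    "buoy atmospheric temperature": 2.53,
    "buoy water temperature": 3.35,
}

percent_changes = {
    name: round(percent_change_in_odds(value))
    for name, value in reported_odds_ratios.items()
}

for name, change in percent_changes.items():
    print(f"{name}: {change:+d}% change in the odds")
```

A negative result, as for buoy pressure, is a decrease in the odds.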
Next, we will address our second research statement: determining the odds of a storm being present categorically, given the conditions at the buoy. We will use multinomial logistic regression to address this research statement. The categorical dependent variable in our model is the Hurricane Category (1 - 5) ($y$), and the predictor variables are the buoy wind ($x_1$), buoy pressure ($x_2$), buoy atmospheric temperature ($x_3$), buoy water temperature ($x_4$), and the interaction term buoy atmospheric temperature*buoy water temperature ($x_5$). The predictor variables are the conditions at the buoy.
To address our research statement, the following analytic model form will be used to obtain the coefficients with their associated relative risk ratios. Relative risk is used frequently in the statistical analysis of binary or multinomial outcomes where the outcome of interest has relatively low probability. The coefficients of the binary logistic model had an OR (odds ratio) interpretation, and the coefficients of the multinomial logistic regression model that we will be using have a similar interpretation. Instead of the OR, we will use the relative risk (RR) ratio to interpret the coefficients of our multinomial logistic regression model [3].
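For reference, one common way to write a multinomial logistic model and its relative risk ratios is sketched here. This is the standard textbook form, not necessarily the exact form used by the authors; the choice of a reference category $K$ and the category-specific coefficients $\beta_{kj}$ are assumptions of this sketch.

```latex
% Log of the probability of category k relative to the reference category K
\ln\frac{P(y = k \mid x)}{P(y = K \mid x)}
  = \beta_{k0} + \beta_{k1} x_1 + \beta_{k2} x_2 + \cdots + \beta_{k5} x_5,
\qquad k = 1, \ldots, K - 1

% Relative risk ratio for predictor j in category k
\mathrm{RR}_{kj} = e^{\beta_{kj}}
```

In this form, an RR greater than 1 means that a unit increase in $x_j$ raises the probability of category $k$ relative to the reference category, an RR less than 1 means that it lowers it, and an RR equal to 1 means no change.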
Using the multinomial logistic regression methods for computing the coefficient estimates for the predictor variables, we assessed the contribution of each predictor variable to the model above [4]. After using a t-test and finding the associated p-values, we obtained a p-value of 0.31 for the buoy wind ($x_1$), 0.14 for the buoy atmospheric temperature*buoy water temperature interaction ($x_5$), 0.09 for the buoy atmospheric temperature ($x_3$), 0.052 for the buoy pressure ($x_2$), and 0.05 for the buoy water temperature ($x_4$) [4]. Since a smaller p-value indicates a more significant contribution, the order from most to least significantly contributing predictor variable is $x_4$, $x_2$, $x_3$, $x_5$, $x_1$. The coefficients of the predictor variables and their associated p-value ranking of significance can be seen in Table 3. Note that in Table 3, buoy wind speed is denoted as B Wind, buoy pressure as B Pressure, buoy atmospheric temperature as B Atmospheric Temp, buoy water temperature as B Water Temp, and the interaction term of buoy atmospheric temperature and buoy water temperature as B Atmp*B Wtmp [3].
Our developed model with the coefficient estimates is as follows:
$$y = 24.5074585 + 0.007x_1 - 0.06x_2 - 0.09x_3 - 0.06x_4 + 0.03x_5$$
Our developed model was found to be 87% accurate, based on the calculation of the probability of accuracy. To find the exponentiated coefficient RR (relative risk) of each variable, we substitute the coefficient estimates for buoy wind ($x_1$), buoy pressure ($x_2$), buoy atmospheric temperature ($x_3$), buoy water temperature ($x_4$), and buoy atmospheric temperature*buoy water temperature ($x_5$) into the exponential function (i.e., $e^{\beta_1}, e^{\beta_2}, \ldots, e^{\beta_n}$).
Table 4 displays the coefficients for our predictor variables and their exponentiated coefficients.
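The exponentiated coefficients can be recomputed from the model equation given above. In the sketch below, the variable labels follow the naming used for Table 3; comparing each RR to 1 after rounding to one decimal place is an assumption about how "equal to 1" is read in this section, not a rule stated in the paper.

```python
import math

# Coefficients of the developed model for x1..x5, as given in the text.
coefficients = {
    "B Wind": 0.007,
    "B Pressure": -0.06,
    "B Atmospheric Temp": -0.09,
    "B Water Temp": -0.06,
    "B Atmp*B Wtmp": 0.03,
}

# Relative risk of each predictor: the exponential of its coefficient.
relative_risks = {name: math.exp(beta) for name, beta in coefficients.items()}

def compare_to_one(rr, decimals=1):
    """Classify an RR against 1 after rounding (the rounding is an assumption)."""
    rounded = round(rr, decimals)
    if rounded > 1:
        return "greater than 1"
    if rounded < 1:
        return "less than 1"
    return "equal to 1"

classification = {name: compare_to_one(rr) for name, rr in relative_risks.items()}

for name, rr in relative_risks.items():
    print(f"{name}: RR = {rr:.3f} ({classification[name]})")
```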
Since the RR for buoy wind speed is equal to 1, there is no difference in the probability that a hurricane of category (1 - 5) will occur when the wind conditions at the buoy are between the values of 15 and 40 mph [2].
In other words, when the wind speed at the buoy ranges from 15 to 40 mph, it makes no difference to whether a hurricane will or will not occur. Since the RR for buoy pressure is less than 1, a hurricane of category (1 - 5) is more likely to occur when the pressure at the buoy is between 935 mb and 990 mb [2]. In other words, when the pressure located at the buoy is low, there is a high chance that a hurricane of category (1 - 5) will occur. Since the RR for buoy atmospheric temperature is less than 1, a hurricane of category (1 - 5) is more likely to occur when the atmospheric temperature is between the values of 24 and 30. In other words, when the atmospheric temperature located at the buoy is high (between the values of 24 and 30), there is a high chance that a hurricane of category (1 - 5) will occur. Since the RR for buoy water temperature is less than 1, a hurricane of category (1 - 5) is more likely to occur when the water temperature is between the values of 28 and 34. In other words, when the buoy water temperature is high (between the values of 28 and 34), there is a high chance that a hurricane of category (1 - 5) will occur.
The RR for the interaction term buoy atmospheric temperature*buoy water temperature is equal to 1, which means that there is no difference in the probability that a hurricane of category (1 - 5) will occur [3]. In other words, there is no significance or correlation among the other variables when the interaction term is included. Thus, we can conclude that when the pressure at the buoy is low (between the values of 935 mb and 990 mb), the atmospheric temperature at the buoy is high (between the values of 24 and 30), and the water temperature at the buoy is high (between the values of 28 and 34), there is a high chance that a hurricane of category (1 - 5) will occur.
Using backward elimination and subset analysis on the variables in the data set (Table 5), the following variables were found to contribute most significantly to our multinomial logistic regression model [3]: buoy pressure ($x_1$), buoy atmospheric temperature ($x_2$), and buoy water temperature ($x_3$). Our new developed model is the following:
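The idea of backward elimination can be sketched as follows. This is a simplified illustration, not the authors' procedure: a real backward elimination refits the model after every removal, whereas the stand-in function here returns the fixed p-values quoted earlier in this section, and the retention threshold of 0.10 is an assumption chosen for the example. Subset analysis is not covered by the sketch.

```python
def backward_eliminate(variables, p_values_for, alpha):
    """Repeatedly drop the variable with the largest p-value above alpha.

    p_values_for(kept) must return a dict mapping each kept variable to its
    p-value in the model fitted on those variables.
    """
    kept = list(variables)
    while kept:
        p_values = p_values_for(kept)
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # every remaining variable passes the threshold
        kept.remove(worst)
    return kept

# p-values quoted in this section for the five-variable model.
QUOTED_P_VALUES = {
    "B Wind": 0.31,
    "B Pressure": 0.052,
    "B Atmospheric Temp": 0.09,
    "B Water Temp": 0.05,
    "B Atmp*B Wtmp": 0.14,
}

def static_p_values(kept):
    """Stand-in for refitting: returns the quoted p-values unchanged."""
    return {name: QUOTED_P_VALUES[name] for name in kept}

selected = backward_eliminate(list(QUOTED_P_VALUES), static_p_values, alpha=0.10)
print(selected)
```

Under these assumptions the wind and interaction terms are dropped, leaving the same three variables named above. With a stricter threshold, or with p-values recomputed after each refit, the outcome could differ.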

Figure 1. Histogram for the buoy wind direction conditions.

Figure 2. Odds greater than 3 of the buoy wind speed and buoy wind direction.

Figure 3. Histogram for the buoy pressure conditions.

Figure 4. Odds greater than 20 of the buoy wind speed and buoy pressure.

Figure 5. Odds greater than 3 of the buoy atmospheric temperature and buoy water temperature.

Table 1. Coefficients of the predictor variables and p-value ranking.

Table 2. Coefficients and exponentiated coefficients of the predictor variables.

Table 3. Coefficients of the predictor variables, p-values and their significance ranking.