Standardization of Winning Streaks in Sports

This research provides metrics to better summarize streaks in sporting events with binary outcomes. In sporting events, information is often lost when “statistics” are presented regarding “streaks,” and whether certain teams or players have recently been successful or unsuccessful. This usually leads to the presentation of metrics with no common baseline. This research effort provides statistics that capture the information regarding recent success, or lack thereof, in a more standardized manner. To illustrate the presented metrics, data from the 2016 seasons of two American sports leagues, the National Basketball Association and Major League Baseball, are used in an attempt to standardize streaks.


Introduction
Sports are rife with statistics; some of them are useful, while others are not. In baseball, a player's batting average is an important statistic. It tells us the number of base hits per at-bat. A batting average of 0.300 (or 30%) is considered very good, while a batting average of about 0.200 or less is not considered good. The Earned Run Average (ERA), which is the number of earned runs allowed by a pitcher per nine innings, is another important statistic; it gives us an idea of the pitcher's ability to prevent the opponent from scoring. An ERA of 3.00 is considered good, whereas an ERA of 2.00 or less is considered exceptional. Typically, an ERA of 4.50 or above is considered ineffective. In practice, an extremely high value will not exist, as a pitcher ineffective to that degree will not be permitted to continue pitching. In basketball, a shooting percentage of 50% or better is typically considered good.
Unfortunately, not all sports statistics fall into the "tidy" category of having firmly established baselines. Streaks fall into this unfortunate category. Articulation of a streak, such as a winning streak or losing streak, is intended to inform the fan of recent success or lack thereof. While this can be informative, it can also be inconsistent. Consider, for example, a baseball team that has won its last nine consecutive games. Clearly, this is an impressive run. Also consider that this same team has won 12 of its last 15 games, an 80% winning percentage. Which measure is more important or informative? There is no clear answer to this question. Let us further assume that before this team won 12 of 15 games, it lost five consecutive games. Therefore, this team has won 60% of its last 20 games, which is only somewhat better than average. When we "stretch" the chronology of the measurement, the success becomes less impressive.
When sports media talk about streaks, or recent runs of success or lack of success, there is a tendency to "package" the information in a way that maximizes or enhances the success or lack of success. While the general point of this is understandable, such statistics are usually biased due to a small sample size.
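The arithmetic in the example above can be checked with a short sketch. The 20-game sequence below is hypothetical: it is one of many orderings consistent with the description (five losses, then 12 wins in 15 games ending with nine straight wins), and the function name `recent_pct` is an illustrative choice, not notation from this paper.

```python
# Hypothetical 20-game sequence consistent with the example in the text
# (1 = win, 0 = loss), oldest game first: five losses, then 3 wins in
# 6 games, then nine consecutive wins.
games = [0] * 5 + [1, 0, 1, 0, 1, 0] + [1] * 9


def recent_pct(results, window):
    """Winning percentage over the most recent `window` games."""
    recent = results[-window:]
    return sum(recent) / len(recent)


# The same team looks very different depending on the window chosen.
for window in (9, 15, 20):
    print(f"last {window:2d} games: {recent_pct(games, window):.0%}")
```

The three windows give 100%, 80%, and 60%, which is the "packaging" effect described above: the reported success depends on where the window starts.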
The above is not intended to be critical of studying streaks. Streaks are important because they show that binary outcomes in sports can, on occasion, defy expectation, and this is worth study. The intent here is to standardize streaks across a larger time frame. In short, we study streaks across an entire season and isolate the teams that tend to show more "streakiness" than other teams. This standardization consists of a few new metrics for studying the consistency of winning and losing across an entire season. These metrics also consider the winning percentage of teams across the season. In other words, these analyses of streaks are adjusted for the team's success. After the metrics are presented, they are used to assess the performance of all teams in American Major League Baseball and all teams in the American National Basketball Association for their respective 2016 seasons.

Literature Review
Much work has been done to study streaks in sports. Perhaps one reason for this is Joe DiMaggio's 56-game hitting streak in 1941, which many consider the most impressive streak in sports history. Much effort has been put forth to better understand the forces at work during that streak, which has subsequently led to a deeper understanding of streaks in general and of the entities related to streaks that could be considered possible "causes" or contributing factors [1]. Effort has gone into deciding whether a resultant set of data qualifies as an actual "streak" [2], and much work has been done on trying to predict streaks [3] [4] [5]. Vallone and Tversky [6] demonstrated that single outcomes in sports are not related to prior outcomes. Streaks have even been studied so that gamblers can improve their chances of successful sports betting [7].
Streaks are difficult to measure, because there are no rules defining what constitutes a meaningful streak. Because of this, there are many opportunities to research what is considered a streak, and how meaningful a streak is [8]. In certain ways, a streak can be related to the degree of variation that exists in the data: the "streakier" the data, the more variation the data will show. Conversely, the less streaky the data, the smaller the variation. This problem has been addressed [9].
The work in this paper attempts to extend this work by first studying descriptive statistics associated with the win/loss streak performance of two sports leagues, and secondly understanding the relationship between expected wins and actual wins throughout a single season. This second motivation draws on previous work associated with production scheduling [10] to compare actual win/loss performance with expected win/loss performance.

Methodology
Here, we first describe descriptive statistics associated with the win/loss streaks of sports teams. Next, we describe a "Gap" measure, which compares a team's actual win/loss performance to its expected performance through each game of the season. Finally, a "runs" test is described. The section concludes by illustrating the methodology via a simple, simulated data set.

Streak Analysis
The first part of our methodology pertains to actual streaks: winning streaks and losing streaks. Here, we compute all descriptive statistics relevant to winning and losing streaks. Prior to delving into the mathematics of these metrics, a table of definitions is provided (see Table 1).
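Let us assume that there are $n$ games in a season and $m$ teams, and that $w_{ij}$ indicates whether team $j$ wins game $i$, shown via the following:

$$w_{ij} = \begin{cases} 1, & \text{if team } j \text{ wins game } i \\ 0, & \text{otherwise} \end{cases}$$

The total number of wins for team $j$ ($\text{Wins}_j$) is computed as follows:

$$\text{Wins}_j = \sum_{i=1}^{n} w_{ij}$$

Similarly, the total number of losses for team $j$ ($\text{Losses}_j$) is computed as follows:

$$\text{Losses}_j = \sum_{i=1}^{n} (1 - w_{ij})$$

The winning percentage for team $j$ ($\text{Pct}_j$) is computed as follows:

$$\text{Pct}_j = \frac{\text{Wins}_j}{\text{Wins}_j + \text{Losses}_j}$$

The above is trivial: we are simply comparing wins and losses for each team. More importantly, we wish to glean information from winning and losing streaks.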
To do this, we use the $w_{ij}$ values to construct a list of winning and losing streaks for each team. We define the $I$th winning streak as follows:

$$w_{ij} = 1 \quad \text{for all } i = a, a+1, \ldots, b$$

In other words, team $j$ wins all games, starting with game $a$ and ending with game $b$. The values of both $w_{a-1,j}$ and $w_{b+1,j}$ are zero. This results in the $I$th winning streak having the following length:

$$b - a + 1$$

The count of winning streaks for team $j$ is incremented by one via the following:

$$\text{WSC}_j = \text{WSC}_j + 1$$

It should be noted that, for all $j$, $\text{WSC}_j$ is initialized to zero prior to analysis. The longest winning streak for team $j$ is determined as follows, where the maximum is taken over all winning streaks $I$ of team $j$:

$$\text{MaxW}_j = \max_{I} \, (b - a + 1)$$

Losing streak characteristics are calculated in similar fashion. The $J$th losing streak is defined as follows:

$$w_{ij} = 0 \quad \text{for all } i = a, a+1, \ldots, b$$

Analogous to the case for winning streaks, the losing streak above has $w_{a-1,j}$ and $w_{b+1,j}$ values equal to 1. The length of this $J$th losing streak for team $j$ is computed as follows:

$$b - a + 1$$

For team $j$, the count of losing streaks is incremented as follows:

$$\text{LSC}_j = \text{LSC}_j + 1$$

As was the case with the winning streak counts, all teams have their $\text{LSC}_j$ values initialized to zero prior to analysis. The longest losing streak length for team $j$ is as follows, where the maximum is taken over all losing streaks $J$ of team $j$:

$$\text{MaxL}_j = \max_{J} \, (b - a + 1)$$
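The streak bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `streak_lengths` and `streak_summary` and the example season are hypothetical, and the sample standard deviation is assumed for the spread of streak lengths.

```python
from statistics import mean, stdev


def streak_lengths(results, outcome):
    """Lengths of maximal runs of `outcome` (1 = win, 0 = loss) in `results`."""
    lengths = []
    current = 0
    for w in results:
        if w == outcome:
            current += 1             # streak continues through this game
        elif current:
            lengths.append(current)  # streak just ended; its length is b - a + 1
            current = 0
    if current:                      # streak still running when the season ends
        lengths.append(current)
    return lengths


def streak_summary(results, outcome):
    """Count, mean, sample SD, and maximum of the streak lengths."""
    lengths = streak_lengths(results, outcome)
    return {
        "count": len(lengths),                              # WSC_j or LSC_j
        "mean": mean(lengths) if lengths else 0.0,
        "sd": stdev(lengths) if len(lengths) > 1 else 0.0,  # assumption: sample SD
        "max": max(lengths, default=0),                     # MaxW_j or MaxL_j
    }


# Hypothetical 10-game season for one team (1 = win, 0 = loss).
season = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1]
wins = streak_summary(season, 1)
losses = streak_summary(season, 0)
print(wins)
print(losses)
```

For this hypothetical season the winning streaks have lengths 2, 1, and 3, and the losing streaks have lengths 1 and 3.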

Gap Analysis
There is another measure of importance that is not directly related to streaks: the "smoothness" of a team's success throughout the season. For example, if a team wins 66.67% of its games by winning two games and then losing one, with this pattern repeating throughout the season, its winning pattern would track its winning percentage as closely as possible, and the "smoothness" of its winning would be optimal. In reality, of course, this does not happen, so it is important to quantify the smoothness of teams' winning patterns. We can quantify this via a variation of the "smoothness index" that has been used to study many scheduling algorithms [10]. Given the above definitions, we call this smoothness index the "Gap" measure, and each team has such a measure. It is calculated as follows:

$$\text{Gap}_j = \sqrt{\sum_{i=1}^{n} \left( \sum_{k=1}^{i} w_{kj} - i \cdot \text{Pct}_j \right)^{2}}$$

This metric compares how many games team $j$ has won through game $i$ with how many games it is expected to win through game $i$. This difference is squared and summed over all $n$ games, and the square root of this quantity is then taken for standardization purposes. In layman's terms, this metric tells us the smoothness of a team's winning pattern. Lower quantities suggest more consistency in the winning pattern, while higher quantities suggest less consistency.
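The verbal description of the "Gap" measure can be sketched as follows. This is a minimal sketch, assuming that the expected number of wins through game i is i multiplied by the team's season winning percentage; the function name `gap_measure` and both example sequences are hypothetical.

```python
import math


def gap_measure(results):
    """Root of the summed squared gaps between actual and expected cumulative wins.

    `results` is one team's season as a list of 1 (win) and 0 (loss).
    Assumption: expected wins through game i equal i * season winning percentage.
    """
    n = len(results)
    pct = sum(results) / n
    cumulative_wins = 0
    squared_gaps = 0.0
    for i, w in enumerate(results, start=1):
        cumulative_wins += w
        expected_wins = i * pct
        squared_gaps += (cumulative_wins - expected_wins) ** 2
    return math.sqrt(squared_gaps)


# Two hypothetical 20-game seasons with the same 50% winning percentage.
smooth = [1, 0] * 10          # wins and losses alternate
streaky = [1] * 10 + [0] * 10  # ten straight wins, then ten straight losses
print(round(gap_measure(smooth), 2), round(gap_measure(streaky), 2))
```

Both hypothetical teams finish at 50%, yet the team with the long streaks has a much larger value, which is the behavior the measure is meant to capture.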

Runs Test
A "runs test" is a popular way to determine whether a sequence of binary outcomes is random [11]. The runs test uses three quantities that can be gathered from the sequence of binary outcomes: $n_1$ is the total number of one type of binary outcome ($\text{Wins}_j$ in this effort), $n_2$ is the total number of the other type of binary outcome ($\text{Losses}_j$ in this effort), and $r$ is the number of streaks (or "runs"), analogous to $\text{WSC}_j + \text{LSC}_j$ in this effort.
We compute the mean, standard deviation, and associated z-score according to the following:

$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

$$\sigma_r = \sqrt{\frac{2 n_1 n_2 \, (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^{2} \, (n_1 + n_2 - 1)}}$$

$$z = \frac{r - \mu_r}{\sigma_r}$$

Given the standardized normal deviate, we can determine the two-tailed p-value as follows, where $\Phi$ is the standard normal cumulative distribution function:

$$p = 2 \left[ 1 - \Phi(|z|) \right]$$

If the p-value associated with the test is less than a pre-specified significance level, we reject the null hypothesis and conclude that the sequence is not random. Otherwise, we fail to reject the null hypothesis and treat the sequence as consistent with randomness.
This research effort employs the runs test to see if the win-loss distribution is random or not.
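The runs test described above can be sketched with the Python standard library only. This is a minimal sketch using the usual large-sample normal approximation; the function name `runs_test` and the example sequences are hypothetical, and the input is assumed to be non-empty and to contain at least one win and one loss (otherwise the standard deviation is zero).

```python
import math


def runs_test(results):
    """Runs test on a list of 1s and 0s; returns (r, mean, z, two-tailed p).

    Assumption: `results` contains at least one 1 and at least one 0.
    """
    n1 = sum(results)              # number of wins
    n2 = len(results) - n1         # number of losses
    n = n1 + n2
    # A new run starts at the first game and at every change of outcome.
    r = 1 + sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - mu) / math.sqrt(var)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF.
    p = math.erfc(abs(z) / math.sqrt(2))
    return r, mu, z, p


alternating = [1, 0] * 10        # far too many runs to look random
blocky = [1] * 10 + [0] * 10      # far too few runs to look random
print(runs_test(alternating))
print(runs_test(blocky))
```

Both hypothetical sequences have ten wins and ten losses, so the expected number of runs is 11 in each case; one has 20 runs and the other has 2, and both yield small p-values because the test is two-tailed.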

Example Problem
A "toy" data set is presented to illustrate how the presented metrics work. The data set is binary data on the passing success of an American football quarterback. A "1" means an attempted pass was completed; a "0" means the attempted pass was not completed. The data set is simulated such that the percentage of completed passes is 57%. One hundred simulated passes were generated. The simulated data is as follows:

This data set was used for the presented formulae, and the statistics are summarized accordingly. For the winning streak statistics, "n" is used to represent $\text{WSC}_j$, "$\bar{x}$" is used to represent $\bar{x}_{W_j}$, "s" is used to represent $s_{W_j}$, and "max" is used to represent $\text{MaxW}_j$. Similarly, for losing streaks, "n" is used to represent $\text{LSC}_j$, "$\bar{x}$" is used to represent $\bar{x}_{L_j}$, "s" is used to represent $s_{L_j}$, and "max" is used to represent $\text{MaxL}_j$. Since there is only a single entity (one "team," so to speak) in this example, the subscript $j$ is omitted for convenience. The winning streak data and the losing streak data are segregated for the presentation (see Table 2). In the context of this example, a completion is considered a success (analogous to a "win") while an incompletion is considered a failure (analogous to a "loss").
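A comparable toy data set can be generated as sketched below. This will not reproduce the authors' simulated passes or the values in Table 2: the seed is arbitrary, the variable names are hypothetical, and each pass is assumed to be an independent draw with a 57% completion probability, so the realized completion rate will only be close to 57%.

```python
import random
from itertools import groupby
from statistics import mean

random.seed(2016)  # arbitrary seed, chosen only so the sketch is repeatable

# 100 simulated pass attempts: 1 = completion, 0 = incompletion.
# Assumption: independent attempts, each completed with probability 0.57.
passes = [1 if random.random() < 0.57 else 0 for _ in range(100)]

# groupby collapses consecutive equal outcomes into (outcome, run length) pairs.
runs = [(outcome, len(list(group))) for outcome, group in groupby(passes)]
completion_streaks = [length for outcome, length in runs if outcome == 1]
incompletion_streaks = [length for outcome, length in runs if outcome == 0]

for label, streaks in (("completions", completion_streaks),
                       ("incompletions", incompletion_streaks)):
    print(label,
          "n =", len(streaks),
          "mean =", round(mean(streaks), 2) if streaks else 0.0,
          "max =", max(streaks, default=0))
```

Here `itertools.groupby` is used instead of an explicit loop because consecutive equal outcomes are exactly the streaks of interest; the total number of runs for a runs test is simply `len(runs)`.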

Experimentation
The simple example above is used merely to illustrate the use of the presented statistics. It is important to apply the presented methodology to real data to better understand the "streakiness," consistency, and winning patterns of real teams. It is of particular interest to understand which teams are most "streaky," or inconsistent. The presented methodology is intended to shed some light on this fundamental question.

This section presents the results for the aforementioned leagues, Major League Baseball and the National Basketball Association. The results are then discussed in some detail.

Major League Baseball
MLB has 162 games on its regular season schedule. However, not all of the thirty teams play this number of games, due to rain postponements and the like.
Clearly, better teams will have longer average winning streaks, while lesser teams will have longer average losing streaks. In fact, the correlation between these two averages is −0.5235, which is statistically significant (p = 0.0030).
In terms of consistency, or lack of "streakiness," the following teams did well: St. Louis, San Diego, Milwaukee, Oakland, Arizona, Washington, Cleveland, Los Angeles Dodgers and Detroit, all having "Gap" measures of less than 20. These teams, in effect, consistently won and lost games at a rate in accordance with their overall winning percentage. They had relatively short maximum winning streaks and relatively short maximum losing streaks. The exception was Cleveland, whose long 14-game winning streak was essentially offset by a very short maximum losing streak of 3 games.
In terms of inconsistency, or "streakiness," the Atlanta Braves are the most salient, with a "Gap" measure of 76.3. This high number can be understood by noting their maximum winning streak of 7 games and their maximum losing streak of 9 games; both are long streaks, which detract from their consistency. San Francisco was also "streaky," with a maximum winning streak of 8 games and a maximum losing streak of 6 games.
It should be noted that there is no significant correlation between a team's winning percentage and its "Gap" measure: the correlation is −0.2175, with a p-value of 0.2483. As such, the "Gap" measure does provide information beyond a team's winning percentage.
The runs test for this data set is not informative: in every instance, the test fails to reject the null hypothesis, so each team's sequence of wins and losses is treated as random.

National Basketball Association
In the NBA, there are 82 regular-season games scheduled. Unlike MLB, these games are not postponed due to weather. As such, all teams play 82 games in a regular season.
For the 2016 regular NBA season, the Golden State Warriors had the best winning percentage (0.8902), which was the best record in league history; no team had ever won 73 games in a season before Golden State accomplished this. Conversely, the Philadelphia 76ers had the worst winning percentage in the league (0.1220). All findings are shown in Table 4.
In terms of winning streaks, Golden State had the longest average winning streak at 7.3 wins/streak, followed by San Antonio with an average of 5.15 wins/streak. It is also worth noting that Golden State started the season with a record-breaking 24-game win streak. For losing streaks, Philadelphia had the longest.
The most consistent, or least "streaky," teams were the Milwaukee Bucks, the Los Angeles Lakers and the Denver Nuggets, all with "Gap" measures under 9. None of these teams had winning streaks in excess of 4 wins, and their maximum losing streaks, while as high as 10 losses for Los Angeles, were nevertheless consistent with their dismal winning percentages. In short, these three teams won their games fairly proportionally to their winning percentages.
There are three teams that were very streaky, or inconsistent, throughout the season: the Portland Trail Blazers, the Memphis Grizzlies, and the Charlotte Hornets. Although all of these teams were good, with winning percentages above 0.500 and playoff appearances, inconsistency throughout the regular season was prevalent. Portland had a maximum winning streak of 6 games and a maximum losing streak of 7 games. Memphis had a maximum winning streak of 5 games and a maximum losing streak of 6 games. Charlotte had maximum winning and losing streaks of 7 games each. These lengthy winning and losing streaks push the "Gap" measure to the "top of the list."
As is the case with MLB, there is no significant correlation between a team's winning percentage and its "Gap" measure. The correlation is 0.0197, with an associated p-value of 0.9178. Given this lack of relationship, the "Gap" measure does provide information beyond winning percentage. As was the case for MLB, the runs test is not informative, as it fails to reject randomness for every team's sequence of wins and losses.

Comparison of MLB and NBA
There is a vast difference in the win/loss dynamic between the NBA and MLB. There is much more disparity in the NBA as compared to MLB. Table 5 shows a general breakdown of winning percentage statistics.
The average winning percentage for any league will always be 50%, because every team's win is offset by another team's loss. The standard deviation in winning percentage (Albert, 2012) across the league is a different story, however. In MLB this value is 6.62%, but in the NBA it is 16.92%; the NBA has much more performance disparity as compared to MLB. The same applies to the winning percentage gap between the best and worst teams in each league. In MLB, the difference in winning percentage between the Chicago Cubs and the Minnesota Twins (the best and worst teams in the league) is 27.56 percentage points. In the NBA, by contrast, the difference in winning percentage between the Golden State Warriors and the Philadelphia 76ers is 76.82 percentage points, an immense difference in success.
Figure 1 and Figure 2 also show a difference in the win/loss dynamic when MLB is compared against the NBA. These two plots are organized such that the horizontal axis represents the winning percentage, while the vertical axis is the sum of the number of winning streaks and losing streaks ($\text{WSC}_j + \text{LSC}_j$) for each team. Figure 1 shows no relationship between winning percentage and number of streaks; there is seemingly nothing interesting to report. Figure 2, however, shows a nonlinear relationship between winning percentage and number of streaks.
The mediocre teams have more streaks than the teams that perform very poorly or very well. Figure 3 shows the same as Figure 2, but restricted to teams whose winning percentages fall within the range of MLB winning percentages. In other words, this filtered data set omits teams whose winning percentages are more extreme than those in the MLB data set.
The high disparity in success for the NBA as compared to MLB is a surprise finding. Nevertheless, this enables us to conclude that NBA team performance is more "streaky," or less consistent, as compared to MLB. Our streak statistics and "Gap" measure have demonstrated that.
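One possible reading of the curvature in the NBA plot, offered as an aside rather than as the authors' analysis, follows from the runs test itself: under randomness, the expected number of runs is 2·n1·n2/(n1 + n2) + 1, which is largest when wins and losses are balanced. The sketch below evaluates this expectation for an 82-game season; the function name `expected_runs` and the intermediate winning percentages are illustrative choices, while 0.1220 and 0.8902 are the extremes reported above.

```python
def expected_runs(n_games, pct):
    """Expected number of runs for a random sequence with the given win share."""
    wins = n_games * pct
    losses = n_games - wins
    return 2 * wins * losses / n_games + 1


# 82-game season: the two reported extremes plus illustrative values in between.
for pct in (0.1220, 0.3000, 0.5000, 0.7000, 0.8902):
    print(f"pct = {pct:.4f}  expected streaks = {expected_runs(82, pct):.1f}")
```

Because the expectation is a parabola in the winning percentage that peaks at 0.500, a league with a wide spread of winning percentages would be expected to show this kind of curve even if every team's sequence were random, while a league whose teams cluster near 0.500 would show little visible pattern.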

Concluding Comments
A methodology has been presented in an attempt to standardize streaks in sports. We have presented these metrics in two different forms: studying streaks via descriptive statistics associated with the streaks themselves, and studying how a team performs throughout the season as compared to how it should perform according to its season winning percentage. The methodology was applied to the 2016 NBA and 2016 MLB seasons. Our findings show that NBA performance involves much more disparity as compared to MLB. The reason for this lies beyond statistical analysis and beyond the scope of the paper.
This type of binary analysis involves winning or losing. Our "toy" problem data set used completions vs. incompletions for an American football quarterback. The binary nature of our data can be used for many other sporting applications: a baseball player's batting average for all at-bats (hit vs. no hit), a soccer player's success regarding penalty kicks (goal vs. no goal), a hockey player's success regarding penalty shots (goal vs. no goal), etc.
This type of analysis can also be used for other binary outcomes outside the world of sports. For example, we can study market streaks at the close of some stock exchange (market increase vs. market decrease). We can study streaks regarding the success of salespeople: a successful sales call (customer places an order) vs. an unsuccessful sales call. In short, the applications for studying streaks