
Causal explanations are often favored to explain club performance in soccer tournaments, and the role that luck plays is usually neglected. Here, we consider three recent seasons of the first-division Brazilian soccer league to examine the relative importance to club success of loss aversion (a causal explanation) and regression to the mean (luck). We find that club performance depends on both, and quantify this finding.

Fluctuations in club performance are common over the course of a soccer tournament. Pundits and fans alike usually explain them with purely causal stories, such as: “injuries impaired team A;” “refereeing was biased toward team B;” “team C chickened out in the final rounds;” “a change in management negatively affected the environment in club D;” and so on. A narrative is seemingly always available to account for success or failure. However, this is at odds with the story of success in sports and elsewhere [

Causal explanations often ignore a key aspect of reality: random fluctuations in performance, that is, plain luck [

System 1’s insistence on causal explanations has two sources: 1) the difficulty that the analytical mind (a.k.a. System 2) has in grasping the nuances of randomness, and 2) the cognitive ease brought about by simple causal explanations that give an air of inevitability to past events [

Here, we consider data from the three most recent seasons of the first-division Brazilian soccer league, called Serie A. Section 2 considers the issue of regression to the mean. First, we evaluate whether club performance in the first half-season of 2016, 2017 and 2018 is predictive of performance in the second half-season. We ignore earlier seasons because two extra clubs could join the Copa Libertadores qualifiers from 2016 onward, thus altering the structure of incentives the teams faced. Regression to the mean is detected if the clubs that do well in the first half-season end up performing relatively poorly in the second half-season.

Section 3 investigates whether a causal explanation, namely loss aversion, is also at play. In this case, part of the regression to the mean could genuinely be attributed to a causal explanation. In sports, loss aversion has been shown to occur in golf tournaments, where professional golfers putt more accurately for par than for a birdie [

Section 4 discusses which of the two, regression to the mean or loss aversion, is more important, and Section 5 concludes this report.

First, we test whether clubs with points won above the median in the first half-season tend to score relatively fewer points in the second, whereas clubs that score points below the median in the first half-season tend to end up with relatively more points in the second. We consider the median rather than the mean because our data have outliers. When a sample is large and free of outliers, the mean usually provides a better measure of central tendency; the median better describes the middle of a data set that does contain outliers.

Using ordinary least squares regressions for each of the three seasons, we take as an independent variable the deviations from the median of the points won by a club in the first half-season. The dependent variable is the deviations from the median of the points won by the same club in the second half-season relative to those won in the first half-season. Here, we cannot dismiss the hypothesis of regression to the median if the angular coefficient of the estimated regression line is negative.
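As a minimal sketch of this setup, the Python snippet below builds both variables from the half-season points of the season whose first-half median is 24.5 (2016, as tabulated in this report) and fits the line by ordinary least squares. It takes P2 − P1 as the dependent variable, which is how we read the season tables; the helper name `fit_ols` and all variable names are illustrative assumptions rather than the code actually used for the estimates.

```python
from math import sqrt
from statistics import median

# (P1, P2): points won in the first and second half-season, copied from
# the 2016 table in this report (clubs in alphabetical order).
POINTS_2016 = [
    (13, 15), (35, 27), (30, 27), (20, 36), (24, 28),
    (34, 21), (21, 25), (19, 32), (21, 16), (34, 37),
    (25, 22), (32, 21), (22, 21), (36, 44), (27, 26),
    (18, 13), (33, 38), (26, 26), (23, 24), (22, 23),
]


def fit_ols(x, y):
    """Single-regressor OLS with an intercept (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx                       # angular coefficient
    intercept = mean_y - slope * mean_x     # linear coefficient
    sse = syy - slope * sxy                 # residual sum of squares
    r_squared = 1 - sse / syy
    se_slope = sqrt(sse / (n - 2) / sxx)    # standard error of the slope
    return slope, intercept, r_squared, slope / se_slope


med = median(p1 for p1, _ in POINTS_2016)
x = [p1 - med for p1, _ in POINTS_2016]     # deviation from the median
y = [p2 - p1 for p1, p2 in POINTS_2016]     # change between half-seasons

slope, intercept, r_squared, t_slope = fit_ols(x, y)
print(f"median={med}, slope={slope:.2f}, intercept={intercept:.2f}, "
      f"R2={r_squared:.3f}, t={t_slope:.2f}")
```

On these data the sketch gives a slope of about −0.35, an intercept of about 0.79 and an R squared of about 0.104, in line with the figures reported for this season. The negative slope is the sign that regression to the median predicts.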

We first consider a sample of the 20 clubs that take part in Serie A in each of the three seasons. Then, we drop outliers, that is, the club that lost the most points in the second half-season relative to its performance in the first, as well as the club that gained the most. In doing so, we assess whether the results are driven by extreme data points. We also check the robustness of the results by running the regressions without the linear coefficient (the intercept). Lastly, we repeat our analysis on the data pooled over the three seasons.
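The two robustness checks can be sketched as follows, using the season whose first-half median is 22.5 (2018, as tabulated in this report). The outlier rule coded here is the one just stated (drop the club with the largest loss and the club with the largest gain in P2 − P1). Keeping the full-sample median after trimming is our assumption, and the helper names are hypothetical.

```python
from statistics import median

# club: (P1, P2), copied from the 2018 table in this report.
POINTS_2018 = {
    "América MG": (22, 18), "Atlético MG": (33, 26),
    "Atlético PR": (21, 36), "Bahia": (25, 23),
    "Botafogo": (22, 29), "Ceará": (16, 28),
    "Chapecoense": (21, 23), "Corinthians": (26, 18),
    "Cruzeiro": (26, 27), "Flamengo": (37, 35),
    "Fluminense": (23, 22), "Grêmio": (36, 30),
    "Internacional": (38, 31), "Palmeiras": (33, 47),
    "Paraná": (14, 9), "Santos": (21, 29),
    "São Paulo": (41, 22), "Sport": (20, 22),
    "Vasco da Gama": (21, 22), "Vitória": (19, 18),
}


def slope_with_intercept(x, y):
    """OLS slope when an intercept is estimated."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def slope_through_origin(x, y):
    """OLS slope when the line is forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


med = median(p1 for p1, _ in POINTS_2018.values())
deviation = {club: p1 - med for club, (p1, _) in POINTS_2018.items()}
change = {club: p2 - p1 for club, (p1, p2) in POINTS_2018.items()}

# Outliers under the stated rule: largest loss and largest gain.
biggest_loser = min(change, key=change.get)
biggest_gainer = max(change, key=change.get)
kept = [c for c in POINTS_2018 if c not in (biggest_loser, biggest_gainer)]

x_all = [deviation[c] for c in POINTS_2018]
y_all = [change[c] for c in POINTS_2018]
x_trim = [deviation[c] for c in kept]
y_trim = [change[c] for c in kept]

slope_full = slope_with_intercept(x_all, y_all)
slope_no_intercept = slope_through_origin(x_all, y_all)
slope_trimmed = slope_with_intercept(x_trim, y_trim)

print(biggest_loser, biggest_gainer)
print(f"{slope_full:.2f} {slope_no_intercept:.2f} {slope_trimmed:.2f}")
```

Under these assumptions the slope stays negative in all three variants (about −0.52 for the full sample, −0.44 without the intercept and −0.27 without the two extreme clubs), which is close to the coefficients reported for this season.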

The 2016 season was the most idiosyncratic of all. Its beginning (until the eighth matchday) was marked by the leadership of two teams that ended up relegated (Santa Cruz and Internacional). Besides, overall club performance was better in the second half-season (522 points versus 515), which leads us to expect a positive intercept for the regression.

| Club | Points won in the first half-season (P1) | Deviation from the median (24.5) | Points won in the second half-season (P2) | P2 − P1 |
|---|---|---|---|---|
| América MG | 13 | −11.5 | 15 | 2 |
| Atlético MG | 35 | 10.5 | 27 | −8 |
| Atlético PR | 30 | 5.5 | 27 | −3 |
| Botafogo | 20 | −4.5 | 36 | 16 |
| Chapecoense | 24 | −0.5 | 28 | 4 |
| Corinthians | 34 | 9.5 | 21 | −13 |
| Coritiba | 21 | −3.5 | 25 | 4 |
| Cruzeiro | 19 | −5.5 | 32 | 13 |
| Figueirense | 21 | −3.5 | 16 | −5 |
| Flamengo | 34 | 9.5 | 37 | 3 |
| Fluminense | 25 | 0.5 | 22 | −3 |
| Grêmio | 32 | 7.5 | 21 | −11 |
| Internacional | 22 | −2.5 | 21 | −1 |
| Palmeiras | 36 | 11.5 | 44 | 8 |
| Ponte Preta | 27 | 2.5 | 26 | −1 |
| Santa Cruz | 18 | −6.5 | 13 | −5 |
| Santos | 33 | 8.5 | 38 | 5 |
| São Paulo | 26 | 1.5 | 26 | 0 |
| Sport | 23 | −1.5 | 24 | 1 |
| Vitória | 22 | −2.5 | 23 | 1 |

| | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | 0.79 | 1.59 | 0.49 | 0.62 |
| Deviation from the median | −0.35 | 0.24 | −1.44 | 0.16 |

Moreover, an R squared of 0.104 was the lowest of the three seasons. Ignoring the intercept does not change the results a great deal, either. However, the results do improve after dropping the two outliers Santa Cruz and Internacional from the analysis, because the angular coefficient becomes significant at 10 percent (

This season was characterized by an outstanding performance by champions Corinthians. However, this club scored 47 points in the first half-season, but only 25 in the second (^{th} round and was anticipated by Grêmio coach Renato Portaluppi, who seems to have had a glimpse of the phenomenon. In turn, Atlético GO, which ended up at the bottom of the table, also experienced strong regression to the median.

The results in

Regression to the median also seems to have occurred in this season. São Paulo’s excellent performance in the first half-season (41 points won) was followed by a very disappointing second half-season (22 points) (

A large p-value for the intercept in

| | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | 1.59 | 1.72 | 0.91 | 0.37 |
| Deviation from the median | −0.46 | 0.25 | −1.77 | 0.094 |

| Club | Points won in the first half-season (P1) | Deviation from the median (25) | Points won in the second half-season (P2) | P2 − P1 |
|---|---|---|---|---|
| Atlético GO | 12 | −13 | 24 | 12 |
| Atlético MG | 23 | −2 | 31 | 8 |
| Atlético PR | 26 | 1 | 25 | −1 |
| Avaí | 18 | −7 | 25 | 7 |
| Bahia | 23 | −2 | 27 | 4 |
| Botafogo | 25 | 0 | 28 | 3 |
| Chapecoense | 22 | −3 | 32 | 10 |
| Corinthians | 47 | 22 | 25 | −22 |
| Coritiba | 25 | 0 | 18 | −7 |
| Cruzeiro | 27 | 2 | 30 | 3 |
| Flamengo | 29 | 4 | 27 | −2 |
| Fluminense | 25 | 0 | 21 | −4 |
| Grêmio | 39 | 14 | 23 | −16 |
| Palmeiras | 32 | 7 | 31 | −1 |
| Ponte Preta | 22 | −3 | 16 | −6 |
| Santos | 35 | 10 | 28 | −7 |
| São Paulo | 19 | −6 | 31 | 12 |
| Sport | 28 | 3 | 17 | −11 |
| Vasco | 24 | −1 | 32 | 8 |
| Vitória | 19 | −6 | 24 | 5 |

| | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | 0.75 | 1.14 | 0.66 | 0.51 |
| Deviation from the median | −1.00 | 0.14 | −6.77 | 2.38E−06 |

| | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | 0.91 | 1.26 | 0.71 | 0.48 |
| Deviation from the median | −1.03 | 0.23 | −4.40 | 0.000441 |

| Club | Points won in the first half-season (P1) | Deviation from the median (22.5) | Points won in the second half-season (P2) | P2 − P1 |
|---|---|---|---|---|
| América MG | 22 | −0.5 | 18 | −4 |
| Atlético MG | 33 | 10.5 | 26 | −7 |
| Atlético PR | 21 | −1.5 | 36 | 15 |
| Bahia | 25 | 2.5 | 23 | −2 |
| Botafogo | 22 | −0.5 | 29 | 7 |
| Ceará | 16 | −6.5 | 28 | 12 |
| Chapecoense | 21 | −1.5 | 23 | 2 |
| Corinthians | 26 | 3.5 | 18 | −8 |
| Cruzeiro | 26 | 3.5 | 27 | 1 |
| Flamengo | 37 | 14.5 | 35 | −2 |
| Fluminense | 23 | 0.5 | 22 | −1 |
| Grêmio | 36 | 13.5 | 30 | −6 |
| Internacional | 38 | 15.5 | 31 | −7 |
| Palmeiras | 33 | 10.5 | 47 | 14 |
| Paraná | 14 | −8.5 | 9 | −5 |
| Santos | 21 | −1.5 | 29 | 8 |
| São Paulo | 41 | 18.5 | 22 | −19 |
| Sport | 20 | −2.5 | 22 | 2 |
| Vasco da Gama | 21 | −1.5 | 22 | 1 |
| Vitória | 19 | −3.5 | 18 | −1 |

| | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | 1.69 | 1.79 | 0.94 | 0.35 |
| Deviation from the median | −0.52 | 0.21 | −2.40 | 0.0273 |

These results seem to depend on outliers, however.

| | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Deviation from the median | −0.44 | 0.19 | −2.21 | 0.0391 |

| | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | 0.93 | 1.60 | 0.58 | 0.56 |
| Deviation from the median | −0.26 | 0.21 | −1.26 | 0.22 |

Regression to the median could not be discarded after pooling the three seasons, either. The angular coefficient was negative and significant at 1 percent. However, the intercept became non-significant and the R squared was 0.33. Moreover, running regressions without either the intercept or outliers did not change the results a great deal (

In sum, we cannot dismiss the hypothesis of regression to the median in the 2016, 2017 and 2018 seasons of the Brazilian soccer league Serie A. Luck played a role in club success or failure. However, although regression to the mean requires no causal explanation, one can still coexist with this statistical fact. For this reason, we turn next to evaluate how a causal explanation might also matter.
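The pooled estimate can be retraced from the three season tables with the short sketch below. It assumes that each club's deviation is measured from its own season's median before the 60 observations are stacked, which is our reading of the pooling step; the names are illustrative.

```python
from statistics import median

# (P1, P2) pairs per season, copied from the season tables in this report.
SEASONS = {
    2016: [(13, 15), (35, 27), (30, 27), (20, 36), (24, 28),
           (34, 21), (21, 25), (19, 32), (21, 16), (34, 37),
           (25, 22), (32, 21), (22, 21), (36, 44), (27, 26),
           (18, 13), (33, 38), (26, 26), (23, 24), (22, 23)],
    2017: [(12, 24), (23, 31), (26, 25), (18, 25), (23, 27),
           (25, 28), (22, 32), (47, 25), (25, 18), (27, 30),
           (29, 27), (25, 21), (39, 23), (32, 31), (22, 16),
           (35, 28), (19, 31), (28, 17), (24, 32), (19, 24)],
    2018: [(22, 18), (33, 26), (21, 36), (25, 23), (22, 29),
           (16, 28), (21, 23), (26, 18), (26, 27), (37, 35),
           (23, 22), (36, 30), (38, 31), (33, 47), (14, 9),
           (21, 29), (41, 22), (20, 22), (21, 22), (19, 18)],
}

x, y = [], []
for pairs in SEASONS.values():
    season_median = median(p1 for p1, _ in pairs)
    for p1, p2 in pairs:
        x.append(p1 - season_median)   # deviation from that season's median
        y.append(p2 - p1)              # change between half-seasons

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)

print(f"n={n}, slope={slope:.2f}, intercept={intercept:.2f}, "
      f"R2={r_squared:.2f}")
```

With these inputs the pooled slope comes out at about −0.64 and the intercept at about 1.21, matching the pooled regression table; the R squared is about 0.34, against the 0.33 quoted in the text.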

One credible narrative is that club performance can also depend on loss aversion, that is, clubs fighting relegation have more incentive to win matches than clubs targeting promotion to Copa Libertadores. To test this hypothesis, we consider the eight teams on top and the eight at the bottom in Matchday 28 of 38 in each of the three seasons, as in

To evaluate the statistical significance of the mean differences in

Like regression to the mean, loss aversion can explain club performance. Puzzlingly enough, what was identified as regression to the mean in Section 2 could have been explained by loss aversion as well. Likewise, what was identified as loss aversion in Section 3 could have been explained by regression to the mean. How, then, can we disentangle the role of each in club performance? Here, some arithmetic might be useful.

| | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | 1.21 | 0.89 | 1.36 | 0.17 |
| Deviation from the median | −0.64 | 0.11 | −5.44 | 1.08E−06 |

| 2016: Relegation | 2016: Promotion | 2017: Relegation | 2017: Promotion | 2018: Relegation | 2018: Promotion |
|---|---|---|---|---|---|
| América MG (35) | Atlético MG | Atlético GO (36) | Botafogo | América MG (38) | Atlético MG |
| Cruzeiro | Atlético PR | Avaí (38) | Corinthians | Bahia | Flamengo |
| Figueirense (36) | Botafogo | Chapecoense | Cruzeiro | Ceará | Fluminense |
| Internacional (38) | Corinthians | Coritiba (38) | Flamengo | Chapecoense | Grêmio |
| Santa Cruz (35) | Flamengo | Ponte Preta (38) | Grêmio | Paraná (32) | Internacional |
| São Paulo | Fluminense | São Paulo | Palmeiras | Sport (38) | Palmeiras |
| Sport | Palmeiras | Sport | Santos | Vasco | Santos |
| Vitória | Santos | Vitória | Vasco | Vitória (37) | São Paulo |

Note: Figures in parentheses show the matchday where relegation actually occurred, and thereafter a relegated club is left out from the sample.

| | 2016 | 2017 | 2018 | Mean |
|---|---|---|---|---|
| Number of direct confrontations over the first 10 matchdays | 31 | 42 | 31 | 34.67 |
| Percent of points won by the clubs fighting relegation over the first 10 matchdays | 37.8 | 34.12 | 24.72 | 32.21 |
| Number of direct confrontations over the last 10 matchdays | 39 | 32 | 29 | 33.33 |
| Percent of points won by the clubs fighting relegation over the last 10 matchdays | 33.33 | 45.82 | 44.82 | 41.32 |

First, assume there is no role for luck in the points won by the clubs, that is, only loss aversion explains performance. Note that a club fighting relegation has eight confrontation matches against the clubs aiming at promotion in the second half-season. This means there is an expected total average advantage for the club of 2.19 points (that is, 8 × 0.2733). However, we also computed the average performance for those clubs below the median in the three seasons as −4.03. Considering the regression in

Now hypothesize loss aversion plays no role. In this case, we have to assume the linear coefficient of the estimated regression line in

Thus, the two exercises above demonstrate that club performance should be explained by both regression to the mean and loss aversion.
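The 2.19-point figure used in the first exercise can be retraced as below. That the per-match edge of 0.2733 equals three points times the change in the relegation fighters' share of points between the first and the last ten matchdays is our reading of the numbers, not something stated explicitly in the text. The variable names are illustrative.

```python
from statistics import mean

POINTS_PER_MATCH = 3     # points at stake for the winner of one match
CONFRONTATIONS = 8       # direct matches in the second half-season

# Percent of points won by clubs fighting relegation (2016, 2017, 2018),
# copied from the table of direct confrontations.
share_first_10 = [37.8, 34.12, 24.72]
share_last_10 = [33.33, 45.82, 44.82]

# Assumed reading: the gain in share, converted to points per match.
gain_in_share = (mean(share_last_10) - mean(share_first_10)) / 100
edge_per_match = POINTS_PER_MATCH * gain_in_share
expected_advantage = CONFRONTATIONS * edge_per_match

print(f"edge per match: {edge_per_match:.4f}")
print(f"expected advantage over 8 matches: {expected_advantage:.2f}")
```

If this reading is right, the loss-aversion edge amounts to roughly a quarter of a point per direct confrontation, which accumulates to about two points over the eight matches.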

Pundits and fans alike fall prey to the narrative fallacy when explaining soccer club performance by talent alone. This tendency is ingrained in our automatic mind. However, success is also a matter of luck. Here, we consider loss aversion to explain club success and failure, but also the role luck plays through the commonly neglected phenomenon of regression to the mean.

Our data refer to recent seasons of the first-division Brazilian soccer league, called Serie A. To test regression to the mean, we examine whether the clubs scoring above the median in the first half-season tend to score relatively fewer points in the second, whereas the clubs that score points below the median in the first half-season tend to end up with relatively more points in the second. And to test loss aversion, we investigate whether the clubs struggling to escape the relegation zone perform relatively better than the clubs aiming at promotion to Copa Libertadores. Here, loss aversion means the underdogs have more to lose than the favorites.

In the end, we find that club performance should be explained by both regression to the mean and loss aversion, and provide an exercise to quantify the role each plays.

Financial support from CNPq and Capes is acknowledged.

The authors declare no conflicts of interest regarding the publication of this paper.

Esquierro, L. and Da Silva, S. (2019) Role of Regression to the Mean and Loss Aversion in Brazilian Soccer Club Performance. Open Access Library Journal, 6: e5603. https://doi.org/10.4236/oalib.1105603