Pythagorean expectation: What is it, and why should I care?

One of the major difficulties in sports analysis is working within the limits of a small sample size. This is especially true of hurling: between the league and the All-Ireland, a senior county team might play as few as ten matches in a year under the current format of both competitions. Though the best teams usually rise to the top, and the worst sink to the bottom, this still allows for a huge amount of variation and ‘luck’. It can be extremely difficult to tell a team that is merely on a hot streak, and will soon revert to the mean, from one that has made genuine improvement. One tool to assist with this is Pythagorean expectation.

What is Pythagorean expectation, and why is it useful?

Pythagorean expectation is a formula originally created for baseball by sports analytics pioneer Bill James. Since then, the idea has spread to many other sports. The idea is simple: if a team scores more than it concedes, it should win more games. Though a team can have a positive scoring margin while losing most of its games, or vice versa, over enough games the total winning ratio should even out to a function of the scoring margin and the total amount scored and conceded. This equation can be written as...

$$W\% = \frac{\mathrm{Ps}^{X}}{\mathrm{Ps}^{X} + \mathrm{Pc}^{X}}$$

...where W% is the winning percentage, Ps is the points scored and Pc is the points conceded. X is a constant exponent, determined empirically for a given sport. Its value changes from sport to sport, as some sports tend to be higher or lower scoring. For example, in a lower scoring game like baseball, where the formula was first developed, a value of 1.83 is commonly used. A very high scoring game like basketball often uses a value of 13.91. To find the value for hurling, the formula was run through the site’s database of hurling matches, and X was chosen as the value whose predicted winning percentages best matched the actual winning percentages. The most effective value for hurling was found to be 3.927.
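As a rough illustration of the fitting step described above, the following Python sketch computes the expectation and picks an exponent by a simple grid search. The function names, the use of mean absolute error as the measure of "best matched", the candidate range, and the season totals are all assumptions made for this example; the article does not state how the site's own fit was carried out.

```python
def pythagorean_expectation(scored, conceded, exponent=3.927):
    """Expected winning percentage (as a fraction from 0 to 1) from total scores."""
    s = scored ** exponent
    c = conceded ** exponent
    return s / (s + c)


def fit_exponent(records, candidates):
    """Return the candidate exponent with the smallest mean absolute error.

    records: list of (scored, conceded, actual_win_pct) tuples.
    Mean absolute error is an assumed error measure, not the article's.
    """
    def mean_abs_error(x):
        errors = [abs(pythagorean_expectation(s, c, x) - w) for s, c, w in records]
        return sum(errors) / len(errors)

    return min(candidates, key=mean_abs_error)


# Hypothetical season totals, invented purely for illustration.
records = [(250, 200, 0.70), (220, 230, 0.45), (180, 240, 0.25)]

# Candidate exponents from 1.00 to 8.00 in steps of 0.01 (an arbitrary range).
candidates = [x / 100 for x in range(100, 801)]

best = fit_exponent(records, candidates)
print(f"Best-fitting exponent for the toy data: {best}")
```

With a real match database in place of the toy records, the same search would return whichever exponent minimises the chosen error measure.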

One interpretation of this formula is that it measures how ‘lucky’ a team was, by comparing how many games the team actually won over a series of matches with how many it should have won given the combined scores of those matches. If a team has a considerably higher Pythagorean expectation than actual winning percentage, it could indicate that they’re better than their record indicates, and that they could be due for a better run in the future. Similarly, a team with a lower expectation than their actual results might be expected to soon revert to the mean, struggling to replicate their positive record as their actual ability becomes exposed.
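The ‘luck’ comparison can be sketched in a few lines of Python. The function name and the example team are hypothetical, and draws are simply ignored here, since the article does not say how they were counted.

```python
def luck(wins, games, scored, conceded, exponent=3.927):
    """Actual winning percentage minus Pythagorean expectation.

    Positive: the team won more often than its scores suggest ('lucky').
    Negative: the team won less often than its scores suggest ('unlucky').
    Draws are not handled; that is a simplification for this sketch.
    """
    actual = wins / games
    s = scored ** exponent
    c = conceded ** exponent
    expected = s / (s + c)
    return actual - expected


# Hypothetical team: outscored its opponents overall but won only 2 of 6 matches.
difference = luck(wins=2, games=6, scored=130, conceded=120)
print(f"Actual minus expected winning percentage: {difference:+.3f}")
```

A negative result, as for this invented team, is the case described above: a side that may be better than its record and could be due a better run.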

Because this factors in how a team performed within the game, rather than just looking at the bottom line, it has a reputation for often giving a stronger indication of future performance than simply looking at the previous year’s results. In this article, we will apply the formula to hurling.

Year on year performance

A new season always brings uncertainty with it. New players are brought in, old players retire, managers change, systems change and, for better or worse, the activities of the off-season make their impact. Despite this, predictions still need to be made on what the new season will bring, and typically, most of what we have to go on is last year’s performance. However, Pythagorean expectation gives us an extra tool.

The winning records of each of the 2018 Munster and Leinster teams for the years 2007-2017 were investigated, and the change from the previous year’s record was recorded. This was then compared to the previous year’s Pythagorean expectation. The results are displayed below:

As can be seen above, on average, the Pythagorean expectation formula provided a slightly better prediction than the previous year’s record. Both predictions were better for teams with average winning percentages, though the Pythagorean expectation did not decline as much at more extreme winning percentages. Though this method was only 3% better on average than using the previous year’s league performance, it performed well at simply predicting whether a team would be better or worse the following year, with a correct prediction 65% of the time. Considering that the formula has no information about changes in the panel, injuries, new managers, or any factors beyond the previous year’s scorelines, this is a very impressive result.
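One way such a comparison could be run is sketched below in Python. It is a guess at the method, not a record of it: the error measure (mean absolute error), the rule that the formula "predicts improvement" when the expectation exceeds the previous record, the function name, and the sample rows are all assumptions for illustration.

```python
def evaluate(rows):
    """Compare two predictors of next year's winning percentage.

    rows: list of (prev_win_pct, prev_pythag, next_win_pct) tuples.
    Returns (error using last year's record,
             error using last year's Pythagorean expectation,
             share of correct better/worse calls).
    """
    n = len(rows)
    mae_record = sum(abs(nxt - prev) for prev, _, nxt in rows) / n
    mae_pythag = sum(abs(nxt - pyth) for _, pyth, nxt in rows) / n

    # Assumed rule: expectation above last year's record means "will improve".
    # Rows with no predicted or no actual change are left out of the hit rate.
    calls = [
        (pyth > prev) == (nxt > prev)
        for prev, pyth, nxt in rows
        if pyth != prev and nxt != prev
    ]
    hit_rate = sum(calls) / len(calls) if calls else float("nan")
    return mae_record, mae_pythag, hit_rate


# Invented rows, purely to show the shape of the input.
rows = [(0.5, 0.6, 0.7), (0.5, 0.4, 0.6), (0.8, 0.7, 0.6)]
mae_record, mae_pythag, hit_rate = evaluate(rows)
print(f"Record error: {mae_record:.3f}, Pythagorean error: {mae_pythag:.3f}")
print(f"Better/worse calls correct: {hit_rate:.0%}")
```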

League versus championship

Whether or not the league is a reliable predictor of championship performance is a persistent question. The formula was applied to each team that appeared in both the All-Ireland and division one of the league in each year from 2008 to 2017, inclusive. It was expected that it would not perform as well here, given the smaller sample of matches and the extreme winning percentages that can occur over only a handful of championship games. Even so, the Pythagorean expectation still gave a better result than simply looking at the league winning percentages.

Again, the accuracy was only slightly better than assuming the championship winning percentage would match the league winning percentage: it was roughly 2% closer in its prediction, on average. Again, however, it performed well when predicting if a team would perform better or worse than they did in the league, making the correct prediction 62% of the time. Despite the much smaller sample size, there was not a huge drop-off in accuracy compared to the year on year predictions.

Predicting the upcoming championship

Taken together, the above allows us to make some broad predictions about the upcoming championship, based on league performance. Though many factors prevent an accurate prediction, such as lineup changes and differences in the strength of opposition, we can nevertheless make a rough guess at how teams will perform. The following table displays, for each team in Munster and Leinster, the expected winning percentage based on Pythagorean expectation, how much better or worse this is than their 2018 league winning percentage, and their expected number of wins in the group stage. A larger positive difference indicates that a team was more unlucky in the league; a larger negative difference indicates that they got lucky and should perform less well in the championship. Only time will tell just how futile a task predicting the future is: