Analyzing America’s Pastime: Do Fans Really Dig the Long Ball?

Everyone has their own reason to attend baseball games. Some like the food; some like the atmosphere; others hope to catch a foul ball. Many, as Greg Maddux and Tom Glavine suggested in their iconic 1998 commercial, come to baseball games for one thing: to see home runs. Equipped with one of my favorite data tools, Power BI, and historical attendance and home run records, I set out to determine whether this was true, that is, whether attendance is directly related to the number of home runs hit in a given season.

For the dataset, I started with historical records dating back to 1876. Because seasons have varied in length over that span, I standardized the data across years by calculating the average attendance per game and the average home runs per game for each season in this period.
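Converting season totals to per-game rates is a simple division, but it is what makes seasons of different lengths comparable. The analysis itself was done in Power BI and its exact input columns are not given, so the following is only a minimal Python sketch, assuming a hypothetical `SeasonTotals` record that holds league-wide totals for each season:

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """League-wide totals for one season (hypothetical input format)."""
    year: int
    attendance: int  # total attendance for the season
    games: int       # games played in the season, each game counted once
    home_runs: int   # total home runs hit by all teams


def per_game_rates(seasons):
    """Return {year: (attendance_per_game, home_runs_per_game)}."""
    rates = {}
    for s in seasons:
        if s.games <= 0:
            raise ValueError(f"season {s.year} has no games recorded")
        # Dividing by games played normalizes for differing season lengths.
        rates[s.year] = (s.attendance / s.games, s.home_runs / s.games)
    return rates
```

Each season then contributes one (attendance per game, home runs per game) pair, which is exactly one point on the scatter plot.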

To convey this data, I chose a scatter plot, since it is a good way to show the relationship between two variables. The graph below is the result of plotting “Home Runs per Game” against “Attendance per Game” for each year in the full period. I added a trend line to show the relationship between the variables and calculated the correlation coefficient.

At this point, it is worth stepping back to evaluate whether these results make sense. Although this looks great for my hypothesis, one needs to consider whether additional factors could have contributed to such a strong relationship between attendance and the number of home runs per game during this time period. As many who follow baseball know, the popularity of the sport grew tremendously during the 20th century, which could skew the relationship if home runs also increased over the same period. To test for this factor, I attempted to remove the long-term growth in attendance and see whether the relationship changed. Looking into the data, attendance per game peaked around 1991, so I limited the analysis to data from 1991 to the present day, as shown below.

Removing these data points significantly alters the analysis, with the correlation going from very strong to very weak. If the positive correlation were indeed as strong as the initial analysis suggested, we would have expected home runs to show a strong positive correlation with average attendance during this period as well. This suggests that factors other than the number of home runs hit likely drove attendance growth over the original period, such as overall population growth, the addition of new teams, or even changes in ballpark size. So, the question remains: do fans really dig the long ball?

For more information on Schneider Downs’ data analytic service capabilities, contact Joel Rosenthal at 412.697.5387 or Andrew Trettel at 412.697.5436.

