Sports Data Analytics on World Powerlifters' Performance
The skills the author demonstrates here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
For readers who only have a vague idea of "powerlifting"
Powerlifting is a strength sport in which each lifter gets 3 attempts at maximal weight on each of 3 lifts, with every attempt recorded as data:
- Bench presses (upper body strength assessment)
- Squats (lower body strength assessment)
- Deadlifts (whole body strength assessment)
Unlike Olympic Weightlifting, which also requires reflexes and technique, "Powerlifting" is primarily about "Pure Core Strength."
Powerlifters compete in 4 equipment categories, ranging from no equipment to full equipment:
- Raw: lifting with no or little additional equipment (belts allowed)
- Wraps: lifting with knee/wrist wraps on
- Single-ply: lifting in a supportive suit made of 1 layer
- Multi-ply: lifting in supportive suits made of 2 or more layers
The data used for this project was obtained from the OpenPowerlifting database (link)
This dataset aggregates the performance records of powerlifting competition contestants over time, together with their characteristics, which makes it possible to analyze performance by group and by period. For robustness, records from years with few observations were dropped, leaving a time horizon from 1988 to 2019.
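The robustness filter described above can be sketched in a few lines of Python. This is only an illustration, not the author's actual code: the field name `year`, the helper `drop_sparse_years`, and the `min_count` threshold are assumptions, and the records are made-up toy values.

```python
from collections import Counter

def drop_sparse_years(records, min_count):
    """Keep only records whose year has at least `min_count` observations."""
    counts = Counter(r["year"] for r in records)
    return [r for r in records if counts[r["year"]] >= min_count]

# Toy records (invented); the real data has one row per lifter per meet.
records = [
    {"year": 1987, "total_kg": 500.0},
    {"year": 1988, "total_kg": 520.0},
    {"year": 1988, "total_kg": 610.0},
    {"year": 1988, "total_kg": 480.0},
    {"year": 1989, "total_kg": 555.0},
    {"year": 1989, "total_kg": 590.0},
]

# Hypothetical threshold: a year needs at least 2 records to be kept.
kept = drop_sparse_years(records, min_count=2)
kept_years = sorted({r["year"] for r in kept})
print(kept_years)
```

The threshold that counts as a "small number of observations" is a judgment call; the article does not state the value that was used.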
Data Research Questions (EDA)
- How did the characteristics of the contestants change over time?
- Does a larger number of contestants from a country mean better performance?
- How did equipment affect the performance of male and female powerlifters?
- Has there been a trend of consistent improvement in lifting performance since 1988 as a result of advancement in sports science overall?
1. How did the characteristics of the contestants change over time?
The majority of powerlifting contestants have been, and still are, men. However, female participation has grown: starting from around 10% of all contestants in 1988, it fluctuated between 20% and 25% from the early 1990s to 2015. In 2016, the female participation rate exceeded 30%, and it has been increasing since.
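A participation rate like the one described above is simply the number of female contestants in a year divided by all contestants in that year. A minimal sketch, assuming each record carries a `year` and a `sex` field coded "M"/"F" (these field names and the toy records are assumptions, not taken from the article):

```python
from collections import defaultdict

def female_share_by_year(records):
    """Return {year: fraction of that year's contestants who are female}."""
    totals = defaultdict(int)
    females = defaultdict(int)
    for r in records:
        totals[r["year"]] += 1
        if r["sex"] == "F":
            females[r["year"]] += 1
    return {year: females[year] / totals[year] for year in sorted(totals)}

# Toy records (invented): 1 of 10 contestants is female in the first year,
# 1 of 3 in the second.
records = (
    [{"year": 1988, "sex": "M"}] * 9
    + [{"year": 1988, "sex": "F"}]
    + [{"year": 2016, "sex": "M"}] * 2
    + [{"year": 2016, "sex": "F"}]
)

shares = female_share_by_year(records)
for year, share in shares.items():
    print(year, f"{share:.0%}")
```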
- Age Categories
Throughout the period covered by the data, the "24-34" age category has had the most contestants, followed by the "20-23" category. This seems natural given that the "24-34" category has the widest age range among all age categories in the data. The result is also consistent with the conventional belief that human physical strength peaks around age 30 (especially among men).
- Competitions by Equipment
Until 1992, Single-Ply (one layer of supportive suit) was virtually the only form of powerlifting competition. From 1993, Multi-Ply (two or more layers of supportive equipment) and even Raw (no equipment) competitions began to be held, but Single-Ply remained the dominant form of equipment in powerlifting competitions until the mid-2000s. Since 2014, however, competitions with light or no equipment have become popular, as can be seen below. (plot for 2017)
2. Does a larger number of contestants from a country mean better performance?
By comparing the country share (percentage of each country's participants out of the total) and the performance of the country, by year, we can check if there is a correlation between the number of a country's contestants and the performance of that country.
The performance measures are the "Average" and the "Maximum" of the weights lifted (Best 3 Bench Press, Best 3 Squat, Best 3 Deadlift, and Total).
It turns out that while "maximum powerlifting weights" do have some correlation with the number of contestants for a country, "average powerlifting weights" do not.
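The comparison described above, country share against a country's average and maximum lifted weight, could be sketched as follows. This is a hedged illustration only: the helpers `country_summary` and `pearson`, the field names `country` and `total_kg`, and the records themselves are assumptions, and the toy numbers are invented rather than taken from the OpenPowerlifting data.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def country_summary(records):
    """Return {country: (share, mean_total, max_total)} for one year of records."""
    totals = defaultdict(list)
    for r in records:
        totals[r["country"]].append(r["total_kg"])
    n_all = sum(len(v) for v in totals.values())
    return {c: (len(v) / n_all, sum(v) / len(v), max(v)) for c, v in totals.items()}

# Invented toy records for a single year (not real results).
records = [
    {"country": "A", "total_kg": 400.0},
    {"country": "A", "total_kg": 500.0},
    {"country": "A", "total_kg": 600.0},
    {"country": "A", "total_kg": 700.0},
    {"country": "B", "total_kg": 500.0},
    {"country": "B", "total_kg": 600.0},
    {"country": "C", "total_kg": 560.0},
]

summary = country_summary(records)
shares = [share for share, _, _ in summary.values()]
means = [mean_total for _, mean_total, _ in summary.values()]
maxes = [max_total for _, _, max_total in summary.values()]

corr_share_mean = pearson(shares, means)
corr_share_max = pearson(shares, maxes)
print(f"share vs. mean: {corr_share_mean:.2f}")
print(f"share vs. max:  {corr_share_max:.2f}")
```

With real data the same summary would be computed separately for each year and each lift, since the article compares share and performance by year.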
As shown in the "Change in Country share" plot above, the four countries with the most contestants in 2011 were Ukraine, the USA, Russia, and the Czech Republic. Accordingly, those four countries occupied four of the top spots in terms of "maximum powerlifting weights," but this was not the case for "average powerlifting weights." The same tendency can be seen across the time horizon.
This is likely an effect of outliers. A country's average lifted weight is affected by both upper and lower outliers, while its maximum is determined only by the uppermost one. Because a larger sample is more likely to contain extreme values, a country with more contestants will tend to have more upper and lower outliers in performance. Its maximum will therefore tend to improve, as seen in the plots above, while its average will not necessarily do so.
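The sample-size argument above can be checked with a small simulation. The sketch below assumes, purely for illustration, that totals are drawn from a normal distribution with mean 500 kg and standard deviation 50 kg; neither the distribution nor the parameters come from the article, and `simulate` is a hypothetical helper.

```python
import random

def simulate(n_lifters, n_trials=200, mu=500.0, sigma=50.0, seed=0):
    """Average the sample mean and sample max over repeated draws of size n_lifters."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, maxes = [], []
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n_lifters)]
        means.append(sum(sample) / len(sample))
        maxes.append(max(sample))
    return sum(means) / n_trials, sum(maxes) / n_trials

# A "small" country with 10 lifters versus a "large" one with 1000 lifters,
# both drawn from the same assumed distribution.
small_mean, small_max = simulate(n_lifters=10)
large_mean, large_max = simulate(n_lifters=1000)

print(f"n=10:   mean={small_mean:.1f}  max={small_max:.1f}")
print(f"n=1000: mean={large_mean:.1f}  max={large_max:.1f}")
```

Under these assumptions the typical maximum grows with the number of lifters while the typical mean stays close to the same value, which is the pattern the outlier explanation predicts.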
3. How did equipment affect the performance of male and female powerlifters?
To account for the body weight difference between men and women, performance is measured as the weight lifted divided by the lifter's body weight, and the averages of these ratios are compared instead of comparing average lifted weights by sex directly.
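The body-weight normalization described above could look like the following sketch. The field names (`sex`, `equipment`, `bodyweight_kg`, `squat_kg`), the helper `mean_relative_lift`, and the toy records are assumptions for illustration; the article does not show how the ratio was actually computed.

```python
from collections import defaultdict

def mean_relative_lift(records, lift_key):
    """Mean of (lift / body weight), grouped by (sex, equipment)."""
    ratios = defaultdict(list)
    for r in records:
        group = (r["sex"], r["equipment"])
        ratios[group].append(r[lift_key] / r["bodyweight_kg"])
    return {group: sum(v) / len(v) for group, v in ratios.items()}

# Toy records (invented values).
records = [
    {"sex": "M", "equipment": "Raw", "bodyweight_kg": 100.0, "squat_kg": 200.0},
    {"sex": "M", "equipment": "Raw", "bodyweight_kg": 80.0, "squat_kg": 200.0},
    {"sex": "F", "equipment": "Raw", "bodyweight_kg": 60.0, "squat_kg": 90.0},
    {"sex": "F", "equipment": "Single-ply", "bodyweight_kg": 50.0, "squat_kg": 100.0},
]

relative = mean_relative_lift(records, "squat_kg")
for group, value in sorted(relative.items()):
    print(group, round(value, 2))
```

Note that this averages each lifter's own ratio; dividing the average lift by the average body weight would generally give a different number.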
As shown in the three box plots above, powerlifting performance generally improves as the powerlifter becomes "more equipped," for both men and women. In the deadlift, however, both male and female powerlifters performed worse on average with Multi-Ply than with Single-Ply. This suggests that the benefit of adding more layers to a deadlifter's suit should be reconsidered in terms of performance.
Another point to notice is how the performance gap between men and women changes as equipment increases, across the three lifts (Bench Press, Squat, and Deadlift). Without equipment ("Raw"), the performance gap is distinct. Even with "Wraps," there is no significant decrease in the gap between the sexes for any of the three lifts.
However, "Single-Ply" brings a notable decrease in that gap. In both the Squat and the Deadlift, the upper quartile of female performance relative to body weight becomes similar to the male average. The gap also narrows noticeably in the Bench Press, although not as much as in the other two lifts. With "Multi-Ply," the gap is similar to that of "Single-Ply." Thus, in relative terms, lifting suits (Single-Ply and Multi-Ply) appear to assist female powerlifters more than they do male powerlifters.
4. Has there been a trend of consistent improvement in lifting performance since 1988 as a result of advancement in sports science overall?
Sports science for powerlifting covers a variety of factors: not only equipment but also areas such as athlete nutrition and training methods. Sports science has made significant progress over time, leading to better performances in competitions.
To identify the effect of advances in sports science on powerlifters' performance, the US and Russia, the two countries with the most records, were selected. The country selection also took into account the sporting rivalry between the US and Russia, which accelerated the development of each country's sports science.
The "24-34" age category was selected because it has the largest sample size. The records of male powerlifters were selected because the female participation rate was very low in the early years. Furthermore, "Total KG," the sum of the weight records across all three lifts, was used as the criterion in order to view the overall performance level at once.
As shown in the graphs above, powerlifting performance in terms of maximum weights ("max-performance") shows an increasing trend for both countries. Performance in terms of average weights ("average-performance"), however, fluctuated more and did not show a distinct trend for either country.
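The subgroup selection and the two yearly series described above ("max-performance" and "average-performance") could be computed along these lines. This is a sketch under assumptions: the field names, the helper `yearly_max_and_mean`, and the toy records are hypothetical and are not the author's code or data.

```python
from collections import defaultdict

def yearly_max_and_mean(records, country, sex="M", age_class="24-34"):
    """Per-year (max, mean) of total_kg for one country / sex / age-class subgroup."""
    by_year = defaultdict(list)
    for r in records:
        if (r["country"], r["sex"], r["age_class"]) == (country, sex, age_class):
            by_year[r["year"]].append(r["total_kg"])
    return {year: (max(v), sum(v) / len(v)) for year, v in sorted(by_year.items())}

# Toy records (invented values).
records = [
    {"country": "USA", "sex": "M", "age_class": "24-34", "year": 1988, "total_kg": 600.0},
    {"country": "USA", "sex": "M", "age_class": "24-34", "year": 1988, "total_kg": 800.0},
    {"country": "USA", "sex": "M", "age_class": "24-34", "year": 1989, "total_kg": 900.0},
    {"country": "USA", "sex": "F", "age_class": "24-34", "year": 1988, "total_kg": 400.0},
    {"country": "USA", "sex": "M", "age_class": "20-23", "year": 1988, "total_kg": 650.0},
    {"country": "Russia", "sex": "M", "age_class": "24-34", "year": 1988, "total_kg": 750.0},
]

usa = yearly_max_and_mean(records, "USA")
russia = yearly_max_and_mean(records, "Russia")
print(usa)
print(russia)
```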
Powerlifting is about setting new maximum lifts. The increasing trend in "max-performance" for both countries means that the value of the uppermost outlier has kept rising over time. A sustained positive trend in the upper outliers is hard to attribute to randomness, which implies that some driving forces are at work. Intuitively, the development of sports science in powerlifting is likely to be among the main ones.
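One simple way to put a number on such a trend is the least-squares slope of the yearly maximum against the year. The article reads the trend off the plots, so the following is an added illustration rather than the author's method; `ols_slope` is a hypothetical helper and the yearly values are invented.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (change in y per unit of x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Invented yearly maxima of Total KG, for illustration only.
years = [1988, 1989, 1990, 1991]
yearly_max = [900.0, 910.0, 905.0, 930.0]

slope = ols_slope(years, yearly_max)
print(f"estimated change in yearly max: {slope:.1f} kg per year")
```

A positive slope alone does not rule out chance or the sample-size effect discussed under question 2; it only summarizes the direction and size of the trend.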
While "max-performance" is determined by the uppermost outlier, "average-performance" is determined by all observed results, including upper and lower outliers, so much greater randomness is involved. It is therefore not surprising that no distinct trend can be found in the "average-performance" of either country.