Sports Data Analytics on World Powerlifter's performance

Posted on Jul 1, 2019

ShinyApp | Github 


For readers who only have a vague idea of "powerlifting"

    Powerlifting is a strength sport in which each contestant gets 3 attempts at maximal weight on each of 3 lifts, and the results are recorded as data:

  •  Bench presses (upper body strength assessment)
  •  Squats (lower body strength assessment)
  •  Deadlifts (whole body strength assessment)

    Unlike Olympic Weightlifting, which also requires reflexes and technique, "Powerlifting" is primarily about "Pure Core Strength."

    Ranging from no equipment to full equipment, powerlifters compete in 4 equipment categories:

  •  Raw:  lifting with no or little additional equipment (belts allowed)
  •  Wraps: lifting with knee/wrist wraps on
  •  Single-ply:  lifting wearing 1 layer of supportive suits
  •  Multi-ply: lifting wearing supportive suits with 2 or more layers

Data Source

    The data used for this project was obtained from the OpenPowerlifting database.

    This dataset aggregates the performance records of powerlifting competition contestants over time, together with the contestants' characteristics, which makes it possible to analyze powerlifting performance by group and by period. To make the analysis more robust, records from years with small numbers of observations were dropped, so the resulting time horizon runs from 1988 to 2019.
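    The year filter described above can be sketched as follows. This is only an illustration, not the author's code: the `Date` column name is an assumption modeled on the OpenPowerlifting CSV export, and the `min_rows` threshold is hypothetical because the post does not state the cutoff that was used.

```python
import pandas as pd


def drop_sparse_years(df, date_col="Date", min_rows=1000):
    """Keep only rows whose calendar year has at least `min_rows` records.

    `date_col` and `min_rows` are assumptions; the post does not give
    the actual column name or cutoff.
    """
    out = df.copy()
    out["Year"] = pd.to_datetime(out[date_col]).dt.year
    # Number of records in each row's year, aligned to the rows.
    rows_per_year = out["Year"].map(out["Year"].value_counts())
    return out[rows_per_year >= min_rows].reset_index(drop=True)


# Tiny made-up sample: 1987 has a single record and is dropped.
sample = pd.DataFrame({
    "Date": ["1987-05-01", "1988-03-02", "1988-06-10",
             "1988-09-21", "1989-01-15", "1989-04-04"],
    "TotalKg": [500.0, 520.0, 610.0, 480.0, 700.0, 650.0],
})
filtered = drop_sparse_years(sample, min_rows=2)
years_kept = sorted(filtered["Year"].unique().tolist())
```

    On the real dataset the threshold would be chosen by looking at the record counts per year first.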

Data Research Questions (EDA)

  1.  How did the characteristics of the contestants change over time?
  2.  Do countries with more contestants perform better?
  3.  How did equipment affect the performance of male and female powerlifters?
  4.  Has lifting performance improved consistently since 1988 as a result of overall advances in sports science?


Data Results

1.  How did the characteristics of the contestants change over time?

  • Sex 

         The majority of powerlifting contestants have been, and still are, men. However, female participation has grown: starting from around 10% of all contestants in 1988, it fluctuated between 20% and 25% from the early 1990s to 2015. In 2016 it exceeded 30%, and it has been rising since.

  • Age Categories

         Throughout the period covered by the data, the "24-34" age category has had the most contestants, followed by the "20-23" category. This seems natural given that "24-34" spans the widest age range of all age categories in the data. The result is also consistent with the conventional belief that human physical strength peaks around age 30 (especially among men).


  • Competitions by Equipment 

          Until 1992, Single-Ply (one layer of supportive suit) was virtually the only form of powerlifting competition. From 1993, Multi-Ply (two or more layers of supportive equipment) and even Raw (no equipment) competitions began to be held, but Single-Ply remained the dominant form of equipment in powerlifting competitions until the mid-2000s. Since 2014, however, competitions with light or no equipment have become popular, as the plot for 2017 shows.
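          The yearly breakdowns by sex and by equipment discussed above are percentage shares within each year. A minimal sketch of that calculation is given below. The column names (`Year`, `Sex`, `Equipment`) are assumptions modeled on the OpenPowerlifting export, and the sample rows are made up for illustration; they are not the project's data.

```python
import pandas as pd


def yearly_share(df, category_col, year_col="Year"):
    """Return a year-by-category table of percentage shares.

    Each row (year) sums to 100.
    """
    return pd.crosstab(df[year_col], df[category_col], normalize="index") * 100


# Made-up sample: 10 contestants in each of two years.
sample = pd.DataFrame({
    "Year": [1988] * 10 + [2017] * 10,
    "Sex": ["M"] * 9 + ["F"] * 1 + ["M"] * 7 + ["F"] * 3,
    "Equipment": (["Single-ply"] * 10
                  + ["Raw"] * 6 + ["Wraps"] * 2
                  + ["Single-ply"] * 1 + ["Multi-ply"] * 1),
})

sex_share = yearly_share(sample, "Sex")
equipment_share = yearly_share(sample, "Equipment")
```

          The same helper would work for the age categories by passing the corresponding column name.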


2.  Do countries with more contestants perform better?

      By comparing the country share (percentage of each country's participants out of the total) and the performance of the country, by year, we can check if there is a correlation between the number of a country's contestants and the performance of that country. 

      Performance is measured by the "Average" and "Maximum" of the weights lifted (the best of 3 attempts in the bench press, squat, and deadlift, and the total).

      It turns out that while "maximum powerlifting weights" do have some correlation with the number of contestants for a country, "average powerlifting weights" do not.   
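      One way to run this comparison for a single year is sketched below. It is an illustration under assumptions, not the author's code: the `Country` and `TotalKg` column names are modeled on the OpenPowerlifting export, the Pearson correlation is one possible choice of measure, and the sample rows are made up.

```python
import pandas as pd


def country_summary(df, value_col="TotalKg"):
    """Per-country share of contestants (%), average and maximum lift."""
    grouped = df.groupby("Country")[value_col]
    return pd.DataFrame({
        "share_pct": grouped.size() / len(df) * 100,
        "avg_kg": grouped.mean(),
        "max_kg": grouped.max(),
    })


# Made-up records for one year; countries A, B, C are placeholders.
sample = pd.DataFrame({
    "Country": ["A"] * 5 + ["B"] * 3 + ["C"] * 2,
    "TotalKg": [400, 500, 600, 700, 800,   # A: many lifters, wide spread
                570, 620, 670,             # B
                600, 610],                 # C: few lifters, narrow spread
})

summary = country_summary(sample)
# Pearson correlation of each performance measure with the country share.
corr_with_share = summary[["avg_kg", "max_kg"]].corrwith(summary["share_pct"])
```

      Repeating this per year would show whether the pattern holds across the time horizon.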

      As shown in the "Change in Country share" plot above, the four countries with the most contestants in 2011 were Ukraine, the USA, Russia, and the Czech Republic. Accordingly, those four countries also ranked among the top countries in terms of "maximum powerlifting weights," but this was not the case for "average powerlifting weights." The same tendency can be seen across the time horizon.

      This is likely an effect of outliers. The "country average" of powerlifting weights is affected by both upper and lower outliers, while the country maximum is determined only by the uppermost outlier. Because a larger sample is more likely to contain extreme values, a country with more contestants will tend to produce more upper and lower outliers in terms of performance. Thus, that country's maximum weight performance will likely improve, as we see in the above plots, while the same does not necessarily hold for the average weight performance.
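      The outlier argument can be checked with a small simulation. The sketch below assumes, purely for illustration, that every contestant's total is drawn from the same normal distribution (the mean and standard deviation are made-up values); under that assumption the expected maximum rises with the number of contestants while the expected average does not.

```python
import numpy as np


def simulate_mean_and_max(sample_sizes, n_trials=2000,
                          mean_kg=500.0, sd_kg=100.0, seed=0):
    """For each sample size, return (expected mean, expected max).

    Each trial draws `n` totals from Normal(mean_kg, sd_kg); the results
    are averaged over `n_trials` trials. The distribution is an assumption.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for n in sample_sizes:
        draws = rng.normal(mean_kg, sd_kg, size=(n_trials, n))
        results[n] = (float(draws.mean(axis=1).mean()),
                      float(draws.max(axis=1).mean()))
    return results


results = simulate_mean_and_max([10, 100, 1000])
for n, (avg, top) in results.items():
    print(f"contestants={n:5d}  average={avg:7.1f}  maximum={top:7.1f}")
```

      Real lifting totals are not exactly normal, so this only illustrates the mechanism rather than reproducing the plots.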


3.  How did equipment affect the performance of male and female powerlifters?

     Because men and women differ in body weight, each lifted weight is divided by the lifter's body weight before averaging, instead of comparing the average lifted weights by sex directly.
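     A minimal sketch of this normalization follows. The column names (`BodyweightKg`, `Best3SquatKg`, `Best3BenchKg`, `Best3DeadliftKg`, `Sex`, `Equipment`) are assumptions modeled on the OpenPowerlifting export, the `PerBw` suffix is a name introduced here, and the sample rows are made up.

```python
import pandas as pd

LIFT_COLS = ["Best3SquatKg", "Best3BenchKg", "Best3DeadliftKg"]


def relative_strength(df):
    """Divide each lift by the lifter's body weight (kg lifted per kg)."""
    out = df[["Sex", "Equipment"]].copy()
    for col in LIFT_COLS:
        out[col.replace("Kg", "PerBw")] = df[col] / df["BodyweightKg"]
    return out


# Made-up sample rows.
sample = pd.DataFrame({
    "Sex": ["M", "M", "F", "F", "M"],
    "Equipment": ["Raw", "Raw", "Raw", "Single-ply", "Single-ply"],
    "BodyweightKg": [100.0, 80.0, 60.0, 60.0, 100.0],
    "Best3SquatKg": [200.0, 160.0, 90.0, 120.0, 250.0],
    "Best3BenchKg": [150.0, 120.0, 45.0, 60.0, 200.0],
    "Best3DeadliftKg": [250.0, 200.0, 120.0, 126.0, 270.0],
})

rel = relative_strength(sample)
# Average body-weight-relative lift per equipment category and sex.
avg_rel = rel.groupby(["Equipment", "Sex"]).mean()
```

     The per-row ratios in `rel` are what a box plot by equipment and sex would be drawn from.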

     As shown in the three box plots above, powerlifting performance generally improves as the powerlifter gets "more equipped," for both men and women. In the deadlift, however, both male and female powerlifters performed worse on average with Multi-Ply than with Single-Ply. This suggests that adding more layers to a deadlifter's suit does not necessarily help performance and may be worth reconsidering.

     Another point to notice is how the performance gap between men and women changes as equipment increases, across the three lifts (Bench Press, Squat, and Deadlift). Without equipment ("Raw"), the performance gap is distinct. With "Wraps," the gap between the sexes still does not shrink noticeably in any of the three lifts.

     "Single-Ply," however, brings a notable decrease in the gap. In both the Squat and the Deadlift, the upper quartile of female performance by body weight becomes similar to the male average. The gap in the Bench Press also narrows noticeably, although not as much as in the other two lifts. With "Multi-Ply," the gap between the sexes is similar to that with "Single-Ply." Thus, lifting suits (Single-Ply and Multi-Ply) appear to help female powerlifters relatively more than male powerlifters in terms of performance.


4.  Has lifting performance improved consistently since 1988 as a result of overall advances in sports science?

     Sports science for powerlifting covers a variety of factors: not only equipment but also athlete nutrition and training methods. Sports science has made significant progress over time, which would be expected to lead to better performances in competitions.

Data Findings   

     To identify the effect of improved sports technology on the performance of powerlifters, the US and Russia, the two countries with the most records, were selected. The country selection also took into account the sporting rivalry between the US and Russia, which accelerated the development of each country's sports science.

     The "24-34" age category was selected because it has the largest sample size. The records of "Male" powerlifters were selected because the female participation rate was very low in the past. Furthermore, "Total KG," the sum of the weight records in all three lifts, was used as the criterion in order to look at the overall performance level at once.

     As shown in the above graphs, powerlifting performance in terms of maximum weights ("max-performance") shows an increasing trend for both countries. However, performance in terms of average weights ("average-performance") fluctuated more and did not show a distinct trend for either country.
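     The yearly "max-performance" and "average-performance" series can be computed as sketched below. The column names (`Country`, `Sex`, `AgeClass`, `Year`, `TotalKg`) are assumptions modeled on the OpenPowerlifting export, and the sample rows are made up. The post reads the trends off graphs; the least-squares slope used here is a technique added only as one possible way to summarize a trend in a single number.

```python
import numpy as np
import pandas as pd


def yearly_max_and_mean(df, country, sex="M", age_class="24-34",
                        value_col="TotalKg"):
    """Yearly maximum and mean of `value_col` for one country and group."""
    mask = ((df["Country"] == country)
            & (df["Sex"] == sex)
            & (df["AgeClass"] == age_class))
    return df[mask].groupby("Year")[value_col].agg(["max", "mean"])


def linear_slope(series):
    """Least-squares slope (kg per year) of a year-indexed series."""
    years = series.index.to_numpy(dtype=float)
    # Center the years to keep the fit numerically well conditioned.
    slope, _intercept = np.polyfit(years - years.mean(),
                                   series.to_numpy(dtype=float), deg=1)
    return float(slope)


# Made-up sample: the maximum rises each year, the mean stays flat.
sample = pd.DataFrame({
    "Country": ["USA"] * 9 + ["Russia"],
    "Sex": ["M"] * 8 + ["F", "M"],
    "AgeClass": ["24-34"] * 10,
    "Year": [2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 2000, 2000],
    "TotalKg": [600.0, 800.0, 500.0, 900.0, 700.0,
                400.0, 1000.0, 700.0, 450.0, 850.0],
})

usa = yearly_max_and_mean(sample, "USA")
slope_max = linear_slope(usa["max"])
slope_mean = linear_slope(usa["mean"])
```

     A positive slope for the maximum alongside a near-zero slope for the mean would correspond to the pattern described above.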


     Powerlifting is about pushing the maximum weight lifted ever higher. The increasing trend in "max-performance" for both countries means that the value of the uppermost outlier has kept increasing over time. A sustained positive trend in the upper outliers is hard to attribute to randomness, which implies that some driving forces are at work. Intuitively, the development of sports science in powerlifting is likely to be among the main ones.

     While the "max-performance" is determined by the uppermost outlier, the "average-performance" is determined by all observed results, including both upper and lower outliers, so much greater randomness is involved. It is therefore not surprising that a distinct trend is hard to find in the "average-performance" of either country.

