Data Analysis on Tennis Player Performance

Posted on Feb 10, 2020
The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

In every sport, people want to know who is the best, usually to confirm that their favorite players and teams are among them. The question is how to determine that. My ShinyTennis application examines many statistics over decades of recorded tennis matches to decide who the dominant players are. It uses data from the Tennis Match Charting Project.

Application Explanation

The application has five sections, each examining a different aspect of player performance. They are ordered from short-term to long-term analysis.

Point Data

This section measures the performance of two players by displaying the average number of points per game and the average number of volleys in each point. Better players are expected to defeat their opponents more quickly, so lower values are considered better. Andre Agassi had the lowest points per match and volley length per match.
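
As a rough illustration of these two averages, here is a minimal Python sketch. It is not the application's own code: the record layout, the `point_summary` name, and the sample values are all hypothetical, and each record is assumed to carry the number of shots played in that point.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical point-level records: (player, match_id, game_id, shots_in_point).
POINTS = [
    ("Player A", "m1", 1, 3),
    ("Player A", "m1", 1, 5),
    ("Player A", "m1", 1, 4),
    ("Player A", "m1", 1, 2),
    ("Player A", "m1", 2, 6),
    ("Player A", "m1", 2, 2),
    ("Player A", "m1", 2, 4),
    ("Player A", "m1", 2, 4),
    ("Player A", "m1", 2, 1),
    ("Player A", "m1", 2, 9),
]


def point_summary(records):
    """Return average points per game and average shots per point, per player."""
    points_in_game = defaultdict(int)   # (player, match, game) -> number of points
    shots = defaultdict(list)           # player -> shots in each point
    for player, match_id, game_id, n_shots in records:
        points_in_game[(player, match_id, game_id)] += 1
        shots[player].append(n_shots)

    games = defaultdict(list)           # player -> points in each game
    for (player, _match, _game), count in points_in_game.items():
        games[player].append(count)

    return {
        player: {
            "points_per_game": mean(games[player]),
            "shots_per_point": mean(shots[player]),
        }
        for player in shots
    }


summary = point_summary(POINTS)
print(summary["Player A"])
```

In this sample, the two games contain 4 and 6 points, so the per-game average is 5, and the 10 points contain 40 shots in total, so the per-point average is 4.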


Match Data

This section measures the performance of two players in terms of the matches they win. It displays win ratios, tournament wins, matches played per year, and a histogram of tournament wins per year. The best players are those with higher win ratios, match participation, and tournament wins. The histogram makes the distribution of tournament wins easier to understand. Roger Federer and Novak Djokovic had the most tournament wins.
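
The sketch below shows one way these match statistics could be computed in Python. It is an illustration under stated assumptions, not the application's implementation: the record layout and the `match_summary` name are hypothetical, and a tournament win is assumed to mean winning a match whose round is marked `"F"` (the final).

```python
from collections import Counter

# Hypothetical match records: (year, tournament, round, winner, loser).
# "F" marks a final; winning a final is counted as a tournament win.
MATCHES = [
    (2019, "Open X", "SF", "Player A", "Player C"),
    (2019, "Open X", "F", "Player A", "Player B"),
    (2019, "Open Y", "F", "Player B", "Player A"),
    (2020, "Open X", "F", "Player A", "Player B"),
]


def match_summary(records, player):
    """Win ratio, matches per year, and tournament wins per year for one player."""
    played = [r for r in records if player in (r[3], r[4])]
    won = [r for r in played if r[3] == player]
    return {
        "win_ratio": len(won) / len(played) if played else 0.0,
        "matches_per_year": dict(Counter(r[0] for r in played)),
        "titles_per_year": dict(Counter(r[0] for r in won if r[2] == "F")),
    }


summary_a = match_summary(MATCHES, "Player A")
print(summary_a)
```

The `titles_per_year` counts are the values a histogram of tournament wins per year would be drawn from.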


Tournament Performance Data

This section shows which players perform best in the selected tournament. It displays the number of tournament wins and matches played for the top 10 players in that tournament, which lets the user determine which players dominate each individual tournament. Pete Sampras, Roger Federer, and Steffi Graf have each won the US Open 5 times, but Roger Federer has the most matches played.
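
A per-tournament ranking of this kind might look like the following Python sketch. The record layout, the `top_players` name, and the tie-breaking rule (titles first, then matches played, then name) are assumptions made for illustration; the application may rank players differently.

```python
from collections import Counter

# Hypothetical match records: (tournament, year, round, winner, loser).
MATCHES = [
    ("Tournament X", 2001, "SF", "Player A", "Player C"),
    ("Tournament X", 2001, "F", "Player A", "Player B"),
    ("Tournament X", 2002, "F", "Player B", "Player A"),
    ("Tournament X", 2003, "F", "Player A", "Player C"),
    ("Tournament Y", 2001, "F", "Player C", "Player A"),
]


def top_players(records, tournament, n=10):
    """Return up to n (player, titles, matches_played) tuples for one tournament."""
    titles = Counter()
    played = Counter()
    for name, _year, rnd, winner, loser in records:
        if name != tournament:
            continue
        played[winner] += 1
        played[loser] += 1
        if rnd == "F":          # assumed: winning the final wins the tournament
            titles[winner] += 1
    # Sort by titles, then matches played (both descending), then name.
    ranked = sorted(played, key=lambda p: (-titles[p], -played[p], p))
    return [(p, titles[p], played[p]) for p in ranked[:n]]


ranking = top_players(MATCHES, "Tournament X", n=2)
print(ranking)
```

Using matches played as the second sort key mirrors the comparison above, where players tied on titles are separated by how many matches they have played.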


Newbs vs. Veterans Performance

This section is similar to Match Data, except that it normalizes the years relative to each player's start year, which makes it possible to compare players at the same point in their careers. The best players are those who maintain high win ratios and win rates throughout their careers. Serena Williams has the longest tennis career at 20 years, and her tournament wins have been at their highest over the last 10 years.
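
To make the normalization concrete, here is a small Python sketch that re-indexes each player's seasons by career year. It is only an illustration: the season records and the `career_win_ratios` name are hypothetical, and the first recorded season is assumed to be career year 0.

```python
# Hypothetical season records: (player, calendar_year, wins, losses).
SEASONS = [
    ("Player A", 1999, 10, 10),
    ("Player A", 2000, 30, 10),
    ("Player B", 2010, 5, 15),
    ("Player B", 2011, 20, 20),
]


def career_win_ratios(seasons):
    """Map each player to {career_year: win_ratio}, with the first season as year 0."""
    # Find each player's start year (their earliest recorded season).
    start = {}
    for player, year, _wins, _losses in seasons:
        start[player] = min(year, start.get(player, year))

    result = {}
    for player, year, wins, losses in seasons:
        total = wins + losses
        if total == 0:
            continue  # skip seasons with no matches to avoid dividing by zero
        result.setdefault(player, {})[year - start[player]] = wins / total
    return result


ratios = career_win_ratios(SEASONS)
print(ratios)
```

After this step, the two players can be compared at career year 0 and career year 1 even though their careers began 11 calendar years apart.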


Biggest Rises/Falls Over Time

This section measures the derivatives (rates of change over time) of win ratios and tournament wins to show at what point in the chosen players' careers their performance changed the most. This makes it easy to see when players changed the most as their careers progressed. In 2015, Serena Williams had the biggest drop in win ratio of all players.
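
With yearly data, a derivative can be approximated by the difference between consecutive seasons. The Python sketch below assumes that approach, which may differ from what the application does. The function names and the sample win ratios are hypothetical and do not come from the real data.

```python
# Hypothetical win ratios by year for one player (not real data).
WIN_RATIO_BY_YEAR = {2013: 0.75, 2014: 0.875, 2015: 0.5, 2016: 0.75}


def yearly_changes(ratio_by_year):
    """Approximate the derivative as the change per year between recorded seasons."""
    years = sorted(ratio_by_year)
    return {
        cur: (ratio_by_year[cur] - ratio_by_year[prev]) / (cur - prev)
        for prev, cur in zip(years, years[1:])
    }


def biggest_drop(changes):
    """Return (year, change) for the most negative change."""
    return min(changes.items(), key=lambda item: item[1])


def biggest_rise(changes):
    """Return (year, change) for the most positive change."""
    return max(changes.items(), key=lambda item: item[1])


changes = yearly_changes(WIN_RATIO_BY_YEAR)
print(changes)
print(biggest_drop(changes), biggest_rise(changes))
```

Dividing by the gap between years keeps the estimate comparable when a player has a season missing from the data.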


Future Work

If I had more time, I would give the user more control over how the data is presented, for example by allowing them to sort the player combobox in different ways to make it clear which players have the highest values in each plot. This would make it much easier to see who the dominant players are. I could also draw on additional data sets to examine who is winning the most prize money, along with the nationalities of tournaments and players, to see how player performance affects these variables.

About Author

Seth Jackson

Seth Jackson is an expert in logic, economics, and philosophy with over 10 years of experience writing software. After getting a BA in Computer Science, he completed NYCDSA's Data Science program in order to obtain insights from data...