NYC Data Science Academy| Blog

Talent Evaluation in Pro Sports, the Critique Of Ability

Matt Savoca
Posted on May 14, 2019
Has teams' ability to evaluate talent in the four major U.S. sports improved over the last forty years? I take a look at the last 77,000 draft picks to find out.

Shiny App|GitHub


The critique of teams' ability to draft talented players in professional sports is far from a new endeavor. For as long as the games have existed, pundits and fans alike have nitpicked and needled the draft-day decisions of General Managers and their scouting departments, wondering how they could have chosen whomever they decided upon, while another obviously superior talent remained available to be swiftly scooped up by rival squads.

There's also no shortage of research on the best tactics for player evaluation and player selection in each of the four major U.S. sports leagues. And yet, little effort has been made to compare evaluation and selection abilities across sports over an extended period of time, despite an intriguing influx of cross-sport hirings. These Front Office decisions indicate there may be an overlap, possibly a significant one, in the knowledge required to evaluate and draft players across various sports.

Evaluating Talent

In its simplest form, when on the clock, the goal for every team is to pick the best player available with every pick, every season. Accomplishing this requires weighing many factors simultaneously in order to mitigate risk while maximizing the expected value of each given pick (and so minimizing its opportunity cost).[1] But at the end of that process, only one thing matters for teams and fans alike: Was that player any good?

Bearing all of this in mind, let's attempt to:

  1. Research and employ best practices for valuing draft pick outcomes across sports
  2. Gather the requisite data needed to determine pick outcome values (based on step one)
  3. Rescale the pick outcome values to allow for cross-sport analysis
  4. Adjust for different draft lengths (different numbers of rounds for each sport)
  5. Explore and evaluate the results

Creating a Cross-Sport Analysis

With assistance from previous research in cross-sport analysis, I chose four pick outcome value metrics (labelled simply "value" within the app), one for each sport, that would be most appropriate for evaluating the long-term success of each individual player.[2] This information was gathered in addition to general pick information, such as Draft Year, Draft Round, Pick Number, and Team:

  • For NBA (Basketball) Players, Value Over Replacement Player

  • For NFL (Football) Players, Career Approximate Value

  • For NHL (Hockey) Players, Point Shares

  • For MLB (Baseball) Players, Wins Above Replacement[3]

To gather the data, I used Python and the Scrapy package to write five scrapers (one per sport, plus a second one for the NBA) that collected Draft Pick and Player Outcome Value information for the past forty years of drafts (1979-2018) from four sites in the SportsReference.com family. Code for all five scrapers is available in the project's GitHub repository. In total, I gathered information for nearly 78,000 player picks from:

  • Basketball-Reference.com (for NBA data)
  • Pro-Football-Reference.com (for NFL data)
  • Hockey-Reference.com (for NHL data)
  • Baseball-Reference.com (for MLB data)

The next step was rescaling each of the four value metrics above so that the highest-rated player in each sport had a value of 1 and the lowest-rated player had a value of 0, using min-max scaling.[4]
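A minimal sketch of per-sport min-max rescaling, assuming each pick is stored as a Python dict with hypothetical `sport` and `raw_value` keys (the original scripts' data layout is not shown in this post). Within each sport, a raw metric is mapped to (raw - min) / (max - min):

```python
def minmax_scale_by_sport(picks):
    """Rescale each pick's raw metric to [0, 1] within its own sport.

    `picks` is a list of dicts with hypothetical keys "sport" and "raw_value".
    Returns new dicts with an added "value" key; inputs are not modified.
    """
    # First pass: track the (min, max) raw value seen for each sport.
    bounds = {}
    for p in picks:
        raw = p["raw_value"]
        lo, hi = bounds.get(p["sport"], (raw, raw))
        bounds[p["sport"]] = (min(lo, raw), max(hi, raw))

    # Second pass: apply (raw - min) / (max - min) within each sport.
    scaled = []
    for p in picks:
        lo, hi = bounds[p["sport"]]
        span = hi - lo
        # A sport where every raw value is identical has no spread; map it to 0.
        value = (p["raw_value"] - lo) / span if span else 0.0
        scaled.append({**p, "value": value})
    return scaled


nba = [{"sport": "NBA", "raw_value": v} for v in (-1.0, 0.0, 9.0)]
print([p["value"] for p in minmax_scale_by_sport(nba)])  # [0.0, 0.1, 1.0]
```

With these sample NBA values, a player who never played (raw 0.0) lands at 0.1, above the below-replacement player at 0.0, which is the effect described in footnote 4.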

Finally, I converted each Draft Pick to a Draft Pick Percentile so that we can, again, more effectively compare sports with wildly different draft lengths.[5]
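One plausible way to compute the percentile, consistent with the convention that the #1 overall pick is the 100th Percentile Pick (a value of 1), is sketched below. The exact formula used in the analysis isn't stated, so `pick_percentile` and its arithmetic are assumptions:

```python
def pick_percentile(pick_number: int, picks_in_draft: int) -> float:
    """Map an overall pick number to a draft pick percentile in (0, 1].

    Assumed convention: pick #1 -> 1.0 (the 100th percentile) and the
    final pick -> 1 / picks_in_draft, spaced evenly in between.
    """
    if not 1 <= pick_number <= picks_in_draft:
        raise ValueError("pick_number must be between 1 and picks_in_draft")
    return (picks_in_draft - pick_number + 1) / picks_in_draft


print(pick_percentile(1, 60))   # 1.0
print(pick_percentile(31, 60))  # 0.5
```

Because `picks_in_draft` is passed per draft, a 60-pick NBA draft and a much longer MLB draft both land on the same 0-to-1 scale, and years in which a league's draft length changed are handled the same way.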

The Data

There's a ton of information to unpack here. After scraping, cleaning, processing and scaling the data, I combined the information into an easy-to-navigate R Shiny app to help myself and others perform exploratory data analysis.

Multidraft Explorer

Initial Research Questions

Now that we have the necessary data, we can begin to answer questions about the general state of player evaluation across the four major sports. Some questions this research can begin to answer:

  • Are General Managers and Talent Scouts generally good at their job?
  • Has the ability to evaluate talent improved over time?
  • Are certain sports ahead or behind others in talent evaluation?
  • How should franchises best use early-round draft capital?
  • Are there particular positions that appear harder or easier to evaluate?
  • Are any franchises particularly good or bad at evaluating talent?

Let's dig into the data and see what we discover for each of these questions.

Are General Managers and Talent Scouts generally good at their job?

We'll first look at player value compared to the percentile in which the draft pick occurred, grouped by sport. Note that the actual Pick Numbers corresponding to each sport's Draft Pick Percentiles are given in the plot's subtitle.

 

The #1 overall pick in each year's draft is considered the 100th Percentile Pick, or 1 on the graph below.

The NHL sees the steepest curve in pick outcome value through the first 25% of its draft, followed by the NFL, NBA, and MLB. Compared to the other sports, MLB (baseball) player value is completely flat from roughly the 70th Percentile Pick to the end of the draft.
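Curves like these are averages of scaled value over narrow percentile windows. A minimal sketch of that aggregation, assuming hypothetical `sport`, `percentile` and `value` fields and an arbitrary choice of 20 equal-width bins (the app's actual smoothing may differ):

```python
from collections import defaultdict
from statistics import mean


def value_curve(picks, n_bins=20):
    """Average scaled value per pick-percentile bin, grouped by sport.

    Each pick is a dict with hypothetical keys "sport", "percentile" in (0, 1]
    and "value" in [0, 1]. Bin 0 holds the lowest percentiles (the end of the
    draft); bin n_bins - 1 holds the top of the draft.
    """
    buckets = defaultdict(list)
    for p in picks:
        # min() keeps the #1 pick (percentile 1.0) in the top bin.
        b = min(int(p["percentile"] * n_bins), n_bins - 1)
        buckets[(p["sport"], b)].append(p["value"])
    # Return {(sport, bin): mean value}, ordered by sport, then bin.
    return {key: mean(vals) for key, vals in sorted(buckets.items())}
```

Plotting each sport's bin means against bin index reproduces the general shape of a per-sport value curve; a flat run of bins corresponds to the flat stretch seen in MLB.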

Now we can look at only the last decade to see if there are any major discrepancies. There's not much improvement at first glance:

Another way we can visualize this is by looking at average pick value by draft year, again grouped by sport:

Even taking into account that some current players who will eventually become superstars have not yet done so, one could argue that, with the exception of the NHL, player outcome values on a per-pick basis are falling, not rising, over the years.

Do things look any better when we assess cumulative values for each draft? Not really.

Unsurprisingly, the NBA, with only a two-round draft, offers the least cumulative value per season.
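Both the per-pick average by draft year and the cumulative value of each draft come from the same grouping by sport and year. A sketch under an assumed dict layout with hypothetical `sport`, `year` and `value` keys:

```python
from collections import defaultdict


def draft_summary(picks):
    """Pick count, cumulative value, and per-pick mean for each (sport, year).

    Picks are dicts with hypothetical keys "sport", "year" and "value"
    (value already min-max scaled).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in picks:
        key = (p["sport"], p["year"])
        totals[key] += p["value"]
        counts[key] += 1
    return {
        key: {"picks": counts[key], "total": totals[key],
              "mean": totals[key] / counts[key]}
        for key in sorted(totals)
    }
```

The `mean` field corresponds to average pick value by draft year and `total` to a draft's cumulative value; a short draft such as the NBA's has fewer picks contributing to its `total`, whatever their individual quality.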

Are certain sports ahead or behind others in talent evaluation?

Let's zoom in on the cumulative values, grouped by sport, for only the top 25% of draft picks.

Fascinatingly, NFL teams get roughly the same value from the first 64 picks of the draft as MLB teams get from the first 300. That's 10 full rounds of MLB drafting equaling two rounds of NFL drafting in terms of pick value.
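The 64-versus-300 comparison amounts to asking how many picks one sport's cumulative value curve needs to reach a given point on another sport's curve. A hedged sketch, assuming each draft is a list of scaled values ordered by overall pick (index 0 is pick #1); `cumulative_curve` and `picks_to_match` are illustrative names:

```python
from itertools import accumulate


def cumulative_curve(values_by_pick):
    """Running total of scaled value; element i covers overall picks 1..i+1."""
    return list(accumulate(values_by_pick))


def picks_to_match(target, curve):
    """Smallest number of picks whose cumulative value reaches `target`.

    Returns None if the whole curve stays below the target.
    """
    for n, total in enumerate(curve, start=1):
        if total >= target:
            return n
    return None
```

With real data, `picks_to_match(nfl_curve[63], mlb_curve)` would give the number of MLB picks needed to match the NFL's first 64. Dividing a pick count by the number of teams converts it to rounds, e.g. 64 / 32 = 2 NFL rounds and 300 / 30 = 10 MLB rounds.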

To check this further, let's look at only the top 60 picks, since the NBA has just sixty picks per year. For this graph we use the actual pick number, rather than the draft pick percentile, on the x-axis:

NFL pick values fall in a nearly linear fashion from pick 1 to pick 60. The other three sports see a steep drop-off before that point.

Per these visuals, it's starting to appear that evaluating pro baseball talent might be the most difficult of the four major sports, particularly when attempting to distinguish first-rounders from, say, fifth-rounders. In general, there's very little separating the outcomes of those picks, despite their multi-round difference in price.

Are particular positions harder or easier to evaluate?

For this analysis, positions that perform similar in-game functions were grouped into position categories (e.g., NFL Safeties and Cornerbacks were grouped into a single Defensive Backs (DB) category) to determine whether any positions had vastly different outcomes than others.
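The grouping itself can be as simple as a lookup table. In this sketch only the NFL Safety/Cornerback to DB entry comes from the post; the MLB infield/outfield entries are an assumed, conventional grouping, and the raw position codes are hypothetical:

```python
# Only the NFL S/CB -> DB grouping is stated in the post; the MLB
# infield/outfield entries below are an assumed, conventional grouping.
POSITION_GROUPS = {
    ("NFL", "S"): "DB",
    ("NFL", "CB"): "DB",
    ("MLB", "1B"): "INF",
    ("MLB", "2B"): "INF",
    ("MLB", "3B"): "INF",
    ("MLB", "SS"): "INF",
    ("MLB", "LF"): "OF",
    ("MLB", "CF"): "OF",
    ("MLB", "RF"): "OF",
}


def position_group(sport, raw_position):
    """Collapse a raw listed position into its group; unmapped ones pass through."""
    return POSITION_GROUPS.get((sport, raw_position), raw_position)
```

Keying on (sport, position) avoids collisions between sports that reuse the same abbreviation, such as "C" for a hockey or basketball Center and a baseball Catcher.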

There appears to be some value in picking a Center (C) around the 50th Percentile of drafts, as those picks have carried more value per pick than picks around the 75th Percentile. Still, the best chance of finding a superstar at any position appears to be in the top 25% of drafts.

The Tight End (TE) position appears to have a much flatter curve than the other positions, indicating that an early-round TE pick might carry extra inherent risk.

There appear to be fluctuations in the value of Wings (W) from about the 75th to the 50th Percentile, indicating there may be some value in picking that position in the middle of the draft if you miss out on selecting one at the very top.

Baseball players show a nearly equal per-pick value after the first 15% of drafts. We can barely see the positional curves, despite this graph already being zoomed-in compared to the other three sports. Let's look at the top 10% just to see if we can find any trends at all:

Slight fluctuations in the Outfielder (OF) position from the 97.5th Percentile to 95th Percentile show possible value in not picking OF at the very top of drafts. Infield (INF) positions carry the most value in the first 3.5% of drafts.

Let's continue looking at player values, this time in the aggregate, along with the underlying value distributions:

Amongst all the sports, the NHL Defenseman appears to have the most expected value of any position. The NFL Tight End (TE) is on par with MLB picks (the riskiest picks on a per-pick level) in terms of value.

We can see that certain positions carry significantly more value on average than others when selected in the top-10% of drafts:

NFL Quarterbacks (QB) overtake Offensive Linemen (OL) in terms of value in the top 10% of drafts, which begins to explain why many Quarterback-needy NFL teams trade up to acquire one. In MLB, the Catcher (C) position stands out as the highest-upside pick early in drafts.

Are any franchises particularly good or bad at evaluating talent?

While one could devote hundreds of thousands of words to fully answering this question, we can begin to evaluate differences in teams' ability to draft certain positions by exploring the Shiny app.

Below is a comparison of the NFL's AFC North Division teams' pick outcome values, in the top 25% of drafts, grouped by position:

We can see major differences in how well these teams have drafted Quarterbacks (QB) and Running Backs (RB) early in drafts.

A similar analysis of the NBA's Southeast Division:

The Magic have been excellent at selecting Centers (C), while the Hornets have been more successful than others in selecting Guards (G).

And now the MLB's American League East Division:

Despite razor-thin value differences (notice the x-axis maxes out at a value of 0.04 out of 1), it appears the Red Sox and Yankees have a noticeable advantage in drafting Infielders (INF) compared to their division rivals.
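The positional and divisional comparisons above reduce to one operation: filter picks by draft percentile (and optionally by team), then average value per group. A sketch with a hypothetical `mean_value_by` helper and assumed field names `team`, `position_group`, `percentile` and `value`; whether the app averages or sums within groups is not stated, so the mean here is an assumption:

```python
from collections import defaultdict
from statistics import mean


def mean_value_by(picks, keys, min_percentile=0.0, teams=None):
    """Average scaled value of picks at or above `min_percentile`, grouped by `keys`.

    picks: dicts with hypothetical fields such as "sport", "team",
           "position_group", "percentile" and "value".
    keys: tuple of field names to group on, e.g. ("team", "position_group").
    teams: optional collection of team names to keep (e.g. one division).
    """
    groups = defaultdict(list)
    for p in picks:
        if p["percentile"] < min_percentile:
            continue  # outside the top slice of the draft
        if teams is not None and p["team"] not in teams:
            continue  # not one of the teams being compared
        groups[tuple(p[k] for k in keys)].append(p["value"])
    return {k: mean(v) for k, v in sorted(groups.items())}
```

For example, `mean_value_by(picks, ("sport", "position_group"), min_percentile=0.9)` would give top-10% averages by position, while `teams={"BAL", "CIN", "CLE", "PIT"}` with `min_percentile=0.75` restricts the comparison to the AFC North's top-25% picks (the team codes are illustrative).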

Conclusion

While significantly more research can be done on each of the above-mentioned questions, there are two general conclusions I am quite comfortable making:

  • Evaluating talent in pro sports is difficult. It hasn't noticeably improved over the last forty years, and an earlier pick is nowhere near a guarantee of a more valuable player.
  • The best way to acquire more value than your opponent is to have more picks than your opponent. That means: trade and acquire more total picks.

In addition, exploratory data analysis revealed:

  • MLB picks are worth significantly less, per pick, than those in the other major sports, but the number of picks per draft (40 per team) makes up for this difference. The margin for error is razor-thin despite so many chances to find talent each year.
  • The Tight End (NFL) and Center (NBA & NHL) are the riskiest early picks.
  • Quarterback (NFL), Catcher (MLB), Defensemen (NHL), and Guard (NBA) seem to be the safest early picks.
  • It appears that one should wait to select skill-position players (running back, tight end, and wide receiver) in the NFL.
  • The NHL and NFL have the most variable "hit rates" per year.

Further Research

  • I'd love to expand on previous research to create draft curves to determine the expected value of each pick of each round in each sport
  • Additionally, it would be rewarding to create a machine learning model to predict future draft pick outcome values

Explore for Yourself

There is much left to discover from this data. Feel free to explore the app below:

Multidraft Explorer

Footnotes

1. The full list of these factors, again for each sport, would be quite exhaustive, and determining expected values and "draft curves" for different sports may be the subject of future work.

2. Michael Lopez, the Director of Data and Analytics at the NFL and author of the previously linked article, has posted a slew of cross-sport analyses that were highly influential on this article's author.

3. Each of these metrics was selected because it is considered an optimal metric for inter-positional analysis within its sport. For more information on the derivation of each of these value metrics, please visit the following links: (NBA) Value Over Replacement Player, (NFL) Approximate Value, (NHL) Point Shares, (MLB) Wins Above Replacement

4. Note that in some sports (NBA, NFL, and MLB) a player's value can dip into the negative if they are measured to be "below replacement," but players who never play receive a value of 0 prior to rescaling, technically a greater value than the aforementioned below-replacement players. In effect, teams were rewarded for choosing not to play negative-value players in this analysis.

5. There was one additional hurdle: the baseball eligibility problem. MLB players can be (and often are) drafted more than once. Following previous research, this analysis sets a baseball player's pick outcome value to 0, prior to rescaling, if that player was drafted again.
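A sketch of that rule, assuming hypothetical `player_id`, `year` and `raw_value` fields and at most one selection per player per year. It reads "drafted again" as "followed by a later selection of the same player", so only each player's most recent selection keeps its raw value:

```python
def zero_out_redrafted(picks):
    """Zero the raw value of every selection followed by a later one of the same player.

    Picks are dicts with hypothetical keys "player_id", "year" and
    "raw_value". Assumes at most one selection per player per year, so the
    latest year identifies the final selection. Run this before rescaling.
    """
    # Find the year of each player's most recent selection.
    latest = {}
    for p in picks:
        pid = p["player_id"]
        latest[pid] = max(latest.get(pid, p["year"]), p["year"])
    # Keep the value only on that final selection; earlier ones become 0.
    return [
        {**p, "raw_value": p["raw_value"] if p["year"] == latest[p["player_id"]] else 0.0}
        for p in picks
    ]
```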


About Author

Matt Savoca

Matt Savoca is a sports-obsessed researcher and content producer who lives in New York City. After completing foundational coursework in statistics and data science in both R and Python, he spends his days parsing, scraping, visualizing, and...