Data Scraping 35 years of College Football Player Statistics

Posted on Feb 21, 2017


This post is the next phase in building my NFL Draft Outcome Prediction Tool. Previously, I collected 30 years of NFL Draft history and the resulting player outcomes. Scraping college football statistics for those players provides more potential predictor variables for NFL Draft outcomes. The difficult part is that data is not available for all positions, and some positions have fewer years of historical statistics available (Source of statistics: ). The scope of the scraping effort was therefore QB, RB, FB, WR, and TE (1980-2017) and K and P (1990-2017). Here are some of the key findings from my tool:


CFB QB Findings:

High, but not record-high, College QB Ratings led to the most successful NFL QBs. Every QB drafted from 1985 to 2007 with a QB Rating of 150 or higher started 5 or more years in the NFL.

No strong correlation between College Passing Yards and NFL Success

Most Successful QBs averaged between 7 and 9.5 Yards per Attempt

Most Successful College QBs threw 70-90 TDs in college

No strong correlation between College Interceptions and NFL Success

No correlation between College Rushing Yards and NFL Success, but a poor Avg. Yards per Rush does correlate with poor NFL success
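For readers who want to check a QB against the thresholds above, here is a minimal sketch. It assumes the "QB Rating" in these findings is the standard NCAA passer efficiency rating; the post does not say which formula was used. The function names and the sample stat line are hypothetical and are not taken from the scraped data.

```python
def ncaa_passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NCAA passer efficiency rating.

    Assumption: this is the 'QB Rating' the findings refer to.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return (
        8.4 * yards
        + 330 * touchdowns
        + 100 * completions
        - 200 * interceptions
    ) / attempts


def yards_per_attempt(yards, attempts):
    """Passing yards divided by pass attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return yards / attempts


# Hypothetical college career totals, made up for illustration only.
sample = {
    "completions": 650,
    "attempts": 1000,
    "yards": 8500,
    "touchdowns": 80,
    "interceptions": 25,
}

rating = ncaa_passer_rating(**sample)
ypa = yards_per_attempt(sample["yards"], sample["attempts"])

# Compare against the ranges quoted in the findings.
print("rating of 150 or higher:", rating >= 150)
print("7 to 9.5 yards per attempt:", 7 <= ypa <= 9.5)
print("70-90 TDs:", 70 <= sample["touchdowns"] <= 90)
```

Whether the range endpoints are inclusive is not stated in the findings, so the comparisons here are only one reasonable reading.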


CFB RB/FB Findings:

College RBs/FBs ended up at 8 different positions when they got to the NFL

More than 750 Rushing Attempts, or more than 4,000 Rushing Yards, in college correlates with poor NFL careers

Most Successful RBs/FBs average approximately 5 yards per carry

Rushing TDs do not correlate with NFL success

No correlation between receptions and NFL success

Negative correlation between NFL Success and Receiving TDs
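Statements like "no correlation" or "negative correlation" in the findings above can be checked with a correlation coefficient. The sketch below computes Pearson's r in plain Python. It assumes NFL success is measured as years started, which the post only implies, and the two lists are made-up values used purely to exercise the function; they are not results from the scraped data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration values, NOT taken from the scraped data set.
college_receiving_tds = [0, 2, 4, 6, 8]
nfl_years_started = [6, 5, 4, 2, 1]  # hypothetical success measure

r = pearson_r(college_receiving_tds, nfl_years_started)
print(f"Pearson r = {r:.3f}")
```

A value near 0 would support a "no correlation" reading, while a clearly negative value would match a finding such as the one on receiving TDs. Pearson's r only captures linear relationships, so a chart is still worth a look.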


CFB WR/TE Findings:

College WRs/TEs ended up at 9 different positions when they got to the NFL

Most successful WRs/TEs played 20-40 games in college

Most successful WRs/TEs in the NFL had fewer than 100 total receptions in college

Majority of successful WRs/TEs had <1250 total receiving yards in college

Most successful WRs/TEs avg. 10-20 yards per catch

Most successful WRs/TEs had <10 receiving TDs in college

No correlation between scrimmage yards/plays and NFL success
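As a rough illustration, the WR/TE ranges above can be turned into a per-player checklist. This is a sketch under stated assumptions: the class name, field names, and the example player are hypothetical, the reception figure is read as a college total, and treating range endpoints as inclusive is a guess because the findings do not specify it.

```python
from dataclasses import dataclass


@dataclass
class CollegeReceivingLine:
    """Hypothetical container for a WR/TE's college career totals."""
    games: int
    receptions: int
    receiving_yards: int
    receiving_tds: int


def profile_checks(line):
    """Return one boolean per WR/TE finding quoted in the post."""
    # Guard against division by zero for players with no catches.
    yards_per_catch = (
        line.receiving_yards / line.receptions if line.receptions else 0.0
    )
    return {
        "games_20_to_40": 20 <= line.games <= 40,
        "receptions_under_100": line.receptions < 100,
        "yards_under_1250": line.receiving_yards < 1250,
        "yards_per_catch_10_to_20": 10 <= yards_per_catch <= 20,
        "tds_under_10": line.receiving_tds < 10,
    }


# Made-up player used only to show the call; not from the scraped data.
example = CollegeReceivingLine(
    games=30, receptions=80, receiving_yards=1100, receiving_tds=7
)
checks = profile_checks(example)
for name, passed in checks.items():
    print(name, passed)
```

Returning one flag per finding, rather than a single yes/no, keeps it visible which ranges a player falls outside of.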


About Author

Marc Fridson

In addition to my current participation in the Data Science Academy, I am a Course Designer/Facilitator for Columbia University's Applied Analytics Program and the CEO/Founder of Instant Analytics, an analytical technology start-up. Prior to this I was the...