Scraping NBA Play-by-Play Data with Scrapy & MongoDB

Tom Walsh
Posted on Feb 29, 2016

In my previous projects I worked with data on NBA lineups from stats.nba.com, first exploring some of the relationships between player performance and lineup performance, and then building an interactive tool to allow for further exploration. For this project, I wanted to get more granular and work with NBA play-by-play data. Scraping the data turned out to be fairly trivial (although it did take about half a week to scrape one season), but transforming it into a useful state was a challenge.

Schedules

The third tab of each NBA Game Recap page contains the play-by-play for the game. However, an NBA season consists of 1,230 regular-season games, so we need an automated way of finding the game pages. Ideally, we'd like to scrape a single day at a time, since that lends itself to regular daily updates. The NBA has a daily schedule page with links to that day's game recaps, and the URL pattern for a given date is easy to determine. So our workflow will be to find the schedule page for a given date, extract the game recap link for each game, and then follow those links to scrape the play-by-play for each game.
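For example, the schedule page for a given date is just that date stamped into a fixed URL pattern (the same pattern the spider's start_urls will use below):

# The gameline page listing all games played on February 26, 2016:
schedule_url = 'http://www.nba.com/gameline/%s/' % '20160226'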

Scrapy

Initially, I chose scrapy mostly because it supports proper selectors (both CSS and XPath) for navigating HTML documents. The design of the framework also lends itself to our intended workflow: scrapy lets us queue up additional pages for scraping and fetches them in parallel, so as we parse the schedule page we can queue up each game recap page.

Our initial parse method is quite simple:

def parse(self, response):
    for href in response.css("a.recapAnc::attr('href')"):
        url = response.urljoin(href.extract())
        yield scrapy.Request(url, callback=self.parse_game_recap)

Scrapy uses Python generators to yield objects to the framework for further processing. In this case, we're finding each game recap link within the response and yielding a scrapy.Request, telling scrapy to scrape that link using the specified callback.

Parsing the game recap is a bit more complicated:

def parse_game_recap(self, response):
    away = None
    home = None
    quarter = None
    # There's some useful information in the url, so we extract it.
    # This probably should have been a single regex, but it doesn't matter much.
    game_id = re.search('([A-Z]+)', response.url).group(1)
    pbp_item = PlayByPlay() # We'll see scrapy Items shortly.
        
    # Find the play by play table and iterate its rows
    for index, row in enumerate(response.xpath('//div[@id="nbaGIPBP"]//tr')):
        # If we get a row with team names, extract them.
        if int(row.xpath('@class="nbaGIPBPTeams"').extract_first()) == 1:
            (away, home) = [x.strip() for x in row.xpath('td/text()').extract()]
        else:
            # otherwise, build up the PlayByPlay item with the data in the row.
            pbp_item['quarter'] = quarter
            pbp_item['game_id'] = game_id
            pbp_item['index'] = index
            for field in row.xpath('td'):
                field_class = str(field.xpath('@class').extract_first())
                if field_class == 'nbaGIPbPTblHdr':
                    name = row.xpath('td/a/@name')
                    if len(name) > 0:
                        quarter = row.xpath('td/a/@name').extract_first()
                        pbp_item['quarter'] = quarter
                elif len(field.xpath('@id')) > 0:
                    # Sometimes we'll have rows that don't fit the structure of the
                    # PlayByPlay item.  We store them in a GameEvent item.
                    event_item = GameEvent()
                    event_item['type'] = field.xpath('@id').extract_first()
                    event_item['text'] = field.xpath('div/text()').extract_first()
                    event_item['quarter'] = quarter
                    event_item['game_id'] = game_id
                    event_item['index'] = index
                    # We can yield items too, for processing by scrapy's pipelines,
                    # which we'll learn about later.
                    yield event_item
                else:
                    text = field.xpath('text()').extract_first().strip()
                    if len(text) == 0:
                        continue
                    else:
                        if field_class == 'nbaGIPbPLft' or field_class == 'nbaGIPbPLftScore':
                            pbp_item['team'] = away
                            pbp_item['text'] = text
                        elif field_class == 'nbaGIPbPRgt' or field_class == 'nbaGIPbPRgtScore':
                            pbp_item['team'] = home
                            pbp_item['text'] = text
                        elif field_class == 'nbaGIPbPMid':
                            pbp_item['clock'] = text
                        elif field_class == 'nbaGIPbPMidScore':
                            pbp_item['clock'] = text
                            pbp_item['score'] = field.xpath('text()').extract()[1].strip()
                        else:
                            raise ValueError("Unknown class: %s" % field_class)
            if 'clock' in pbp_item:
                # Yield the PlayByPlay item we've been working on and create a new one.
                yield pbp_item
                pbp_item = PlayByPlay()

We see here how a scrapy parse method can return not just scrapy Request objects, but also Item objects.

Here is one of our basic scrapy items at this stage:

class PlayByPlay(scrapy.Item):
    game_id = scrapy.Field()
    quarter = scrapy.Field()
    period = scrapy.Field()
    clock = scrapy.Field()
    score = scrapy.Field()
    team = scrapy.Field()
    text = scrapy.Field()
    index = scrapy.Field()

Dates

We still haven't told scrapy which page to parse. Let's do that now. Here's how we initialize our Spider:

import scrapy
import re
import time

from scraping.items import PlayByPlay, GameEvent

class NbaSpider(scrapy.Spider):
    name = "nba"
    allowed_domains = ["nba.com"]

    # __init__ allows us to specify custom arguments that can be passed to scrapy with the -a option
    # in this case, 'scrape_date'
    def __init__(self, scrape_date=None, *args, **kwargs):
        super(NbaSpider, self).__init__(*args, **kwargs)

        # if no scrape_date is specified, default to yesterday
        if scrape_date is None:
            scrape_date = str(int(time.strftime('%Y%m%d')) - 1)

        # Here's where we define the starting URL
        self.start_urls = ['http://www.nba.com/gameline/%s/' % scrape_date]

    def parse(self, response):
       ...

Now we can scrape a day of data like this: scrapy crawl nba -a scrape_date=20160226

Pipelines

Our basic scraper/crawler can now pull down the play-by-play for a given date, but we can't yet do anything with it. Scrapy's item pipelines let us work with our data. First, we'll store it somewhere.

MongoDB

MongoDB is a schema-less NoSQL database with an easy-to-use, JavaScript-based query syntax. It lends itself to situations where we want to explore the data in an open-ended way. It also saved me all the work of creating schemas for my database.

My MongoDB pipeline is very similar to the example in the Scrapy documentation, except that since our application has multiple Item types, we select the MongoDB collection based on the class name. I've also elected to replace existing documents in the case of duplicates. To identify duplicates, we've added an index_fields method to each of our Item types.

class MongoPipeline(object):

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.db[item.__class__.__name__].replace_one(item.index_fields(), dict(item), True)
        return item

All of our Item types have an index_fields method. This one is from PlayByPlay:

def index_fields(self):
    return {
        'game_id': self['game_id'],
        'index': self['index'],
        'quarter': self['quarter'],
        'date': self['date']
     }

We need to configure our MongoPipeline to be invoked on each Item:

ITEM_PIPELINES = {
    'scraping.pipelines.MongoPipeline': 300
}
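Once the pipeline is wired up, each scraped Item lands in a MongoDB collection named after its class (PlayByPlay, GameEvent, and so on). As a quick sanity check, here is a minimal sketch of querying that data with pymongo, assuming the database is named 'nba' (as in the settings.py shown later); the game_id value is just an illustrative placeholder:

import pymongo

client = pymongo.MongoClient('localhost:27017')
db = client['nba']  # MONGO_DATABASE from settings.py

# Replay one game's play-by-play in order; the game_id value is a placeholder.
for play in db['PlayByPlay'].find({'game_id': 'DALOKC'}).sort('index', pymongo.ASCENDING):
    print('%s %s %s' % (play.get('clock'), play.get('team'), play['text']))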

Parsing Play-by-Play Data

Now comes the tough part. We need to parse the play-by-play strings to extract the underlying data. Here are some sample strings:

   "Harden Driving Layup Shot: Missed Block: Faried (2 BLK)",
    "Ellis Running Layup Shot: Made (19 PTS)",
    "Vucevic Layup Shot: Missed Block: Withey (2 BLK)",
    "Holiday 3pt Shot: Made (10 PTS) Assist: Gordon (1 AST)",
    "Kaman Foul: Offensive (2 PF) (S Foster)",
    "Parsons 3pt Shot: Made (7 PTS) Assist: Nowitzki (1 AST)",
    "McLemore Turnover : Out of Bounds - Bad Pass Turnover (1 TO)",
    "Okafor Turnaround Jump Shot: Missed Block: Adams (3 BLK)",
    "Carroll Driving Floating Bank Jump Shot: Made (7 PTS)",
    "Kaman Turnover : Foul (3 TO)",
    "Millsap Turnover : Lost Ball (4 TO) Steal:Johnson (2 ST)",
    "Williams Foul: Personal (1 PF) (B Adams)",
    "Faried Dunk Shot: Made (10 PTS) Assist: Nelson (2 AST)",
    "Young Layup Shot: Made (12 PTS) Assist: Jack (8 AST)",
    "Withey Dunk Shot: Made (8 PTS) Assist: Neto (1 AST)",
    "Holiday Pullup Jump shot: Made (12 PTS)",
    "Mozgov Turnover : Lost Ball (1 TO) Steal:Calderon (2 ST)",
    "Clarkson 3pt Shot: Made (13 PTS) Assist: Russell (4 AST)",
    "Harden Step Back Jump shot: Made (15 PTS)",
    "McConnell Driving Reverse Layup Shot: Made (6 PTS)",
    "DeRozan Driving Reverse Layup Shot: Made (5 PTS) Assist: Lowry (4 AST)",
    "Afflalo Pullup Jump shot: Made (10 PTS) Assist: Calderon (2 AST)",
    "Hibbert Foul: Defense 3 Second (3 PF) (S Twardoski)",
    "Johnson Turnover : Bad Pass (1 TO) Steal:Butler (1 ST)",
    "Asik Turnover : Lost Ball (2 TO) Steal:Lowry (2 ST)",
    "Jump Ball Crowder vs Bazemore (Sullinger gains possession)"

To make sense of this, I used a disgusting mess of regular expressions:

class TextProcessor(object):
    SHOT_RE = re.compile('(.+?) (((Tip|Alley Oop|Cutting|Dunk|Pullup|Turnaround|Running|Driving|Hook|Jump|3pt|Layup|Fadeaway|Bank|No) ?)+) [Ss]hot: (Made|Missed)( )?')
    REBOUND_RE = re.compile('(.+?) Rebound ')
    TEAM_REBOUND_RE = re.compile('Team Rebound')
    DEFENSE_RE = re.compile('(Block|Steal): ?(.+?) ')
    ASSIST_RE = re.compile('Assist: (.+?) ')
    TIMEOUT_RE = re.compile('Team Timeout : (Short|Regular|No Timeout|Official)')
    TURNOVER_RE = re.compile('(.+?) Turnover : ((Out of Bounds|Poss)? ?(- )?(Punched Ball|5 Second|Out Of Bounds|Basket from Below|Illegal Screen|No|Swinging Elbows|Double Dribble|Illegal Assist|Inbound|Palming|Kicked Ball|Jump Ball|Lane|Backcourt|Offensive Goaltending|Discontinue Dribble|Lost Ball|Foul|Bad Pass|Traveling|Step Out of Bounds|3 Second|Offensive Foul|Player Out of Bounds)( Violation)?( Turnover)?) ')
    TEAM_TURNOVER_RE = re.compile('Team Turnover : ((8 Second Violation|5 Sec Inbound|Backcourt|Shot Clock|Offensive Goaltending|3 Second)( Violation)?( Turnover)?)')
    FOUL_RE = re.compile('(.+?) Foul: (Clear Path|Flagrant|Away From Play|Personal Take|Inbound|Loose Ball|Offensive|Offensive Charge|Personal|Shooting|Personal Block|Shooting Block|Defense 3 Second)( Type (\d+))? ( )? ')
    JUMP_RE = re.compile('Jump Ball (.+?) vs (.+)( )?')
    VIOLATION_RE = re.compile('(.+?) Violation:(Defensive Goaltending|Kicked Ball|Lane|Jump Ball|Double Lane)( )?')
    FREE_THROW_RE = re.compile('(.+?) Free Throw (Flagrant|Clear Path)? ?(\d) of (\d) (Missed)? ?()?')
    TECHNICAL_FT_RE = re.compile('(.+?) Free Throw Technical (Missed)? ?()?')
    SUB_RE = re.compile('(.+?) Substitution replaced by (.+?)$')
    TEAM_VIOLATION_RE = re.compile('Team Violation : (Delay Of Game) ')
    CLOCK_RE = re.compile('')
    TEAM_RE = re.compile('')
    TECHNICAL_RE = re.compile('(.+?) Technical (- )?([A-Z]+)? ?')
    DOUBLE_TECH_RE = re.compile('Double Technical - (.+?), (.+?) ')
    DOUBLE_FOUL_RE = re.compile('Foul : (Double Personal) - (.+?) , (.+?) ')
    EJECTION_RE = re.compile('(.+?) Ejection:(First Flagrant Type 2|Second Technical|Other)')

    # pts, tov, fta, pf, blk, reb, blka, ftm, fg3a, pfd, ast, fg3m, fgm, dreb, fga, stl, oreb
    def process_item(self, item, spider):
        text = item.get('text', None)
        if text:
            item['events'] = []
        while text:
            l = len(text)
            m = self.SHOT_RE.match(text)
            if m:
                event = {'player': m.group(1), 'fga': 1, 'type': m.group(2)}
                if '3pt' in m.group(2):
                    event['fg3a'] = 1
                    if m.group(5) == 'Made':
                        event['fg3m'] = 1
                        event['fgm'] = 1
                        event['pts'] = 3
                else:
                    if m.group(5) == 'Made':
                        event['fgm'] = 1
                        event['pts'] = 2
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.REBOUND_RE.match(text)
            if m:
                event = {'player': m.group(1), 'reb': 1}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.DEFENSE_RE.match(text)
            if m:
                event = {'player': m.group(2)}
                if m.group(1) == 'Block':
                    item['events'][-1]['blka'] = 1
                    event['blk'] = 1
                else:
                    event['stl'] = 1
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.ASSIST_RE.match(text)
            if m:
                event = {'player': m.group(1), 'ast': 1}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TIMEOUT_RE.match(text)
            if m:
                event = {'timeout': m.group(1)}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TURNOVER_RE.match(text)
            if m:
                event = {'player': m.group(1), 'tov': 1, 'note': m.group(2)}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TEAM_TURNOVER_RE.match(text)
            if m:
                event = {'turnover': m.group(1)}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TEAM_REBOUND_RE.match(text)
            if m:
                item['events'].append({'rebound': 'team'})
                text = text[m.end():].strip()
            m = self.FOUL_RE.match(text)  # TODO: Are all of these actual personal fouls?
            if m:
                event = {'player': m.group(1), 'pf': 1, 'note': m.group(2)}
                if m.group(4):
                    event['type'] = m.group(4)
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.DOUBLE_FOUL_RE.match(text)
            if m:
                item['events'].append({'player': m.group(2), 'pf': 1, 'note': m.group(1), 'against': m.group(3)})
                item['events'].append({'player': m.group(3), 'pf': 1, 'note': m.group(1), 'against': m.group(2)})
                text = text[m.end():].strip()
            m = self.JUMP_RE.match(text)
            if m:
                item['events'].append({'player': m.group(1), 'jump': 'home'})
                item['events'].append({'player': m.group(2), 'jump': 'away'})
                if m.group(3):
                    item['events'].append({'player': m.group(4), 'jump': 'possession'})
                text = text[m.end():].strip()
            m = self.VIOLATION_RE.match(text)
            if m:
                event = {'player': m.group(1), 'violation': m.group(2)}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.FREE_THROW_RE.match(text)
            if m:
                event = {'player': m.group(1), 'fta': 1, 'attempt': m.group(3), 'of': m.group(4)}
                if m.group(5) is None:
                    event['pts'] = 1
                    event['ftm'] = 1
                if m.group(2):
                    event['special'] = m.group(2)
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TECHNICAL_FT_RE.match(text)
            if m:
                event = {'player': m.group(1), 'fta': 1, 'ftm': 1, 'special': 'Technical'}
                if m.group(2) is None:
                    event['pts'] = 1
                    event['ftm'] = 1
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.SUB_RE.match(text)
            if m:
                item['events'].append({'player': m.group(1), 'sub': 'out'})
                item['events'].append({'player': m.group(2), 'sub': 'in'})
                text = text[m.end():].strip()
            m = self.TEAM_VIOLATION_RE.match(text)
            if m:
                item['events'].append({'violation': m.group(1)})
                text = text[m.end():].strip()
            m = self.CLOCK_RE.match(text)
            if m:
                item['clock'] = m.group(1)
                text = text[m.end():].strip()
            m = self.TEAM_RE.match(text)
            if m:
                item['team_abbreviation'] = m.group(1)
                text = text[m.end():].strip()
            m = self.TECHNICAL_RE.match(text)
            if m:
                if m.group(3):
                    item['events'].append({'team': m.group(3), 'technical': m.group(1)})
                else:
                    item['events'].append({'player': m.group(1), 'technical': True})
                text = text[m.end():].strip()
            m = self.DOUBLE_TECH_RE.match(text)
            if m:
                item['events'].append({'player': m.group(1), 'technical': True})
                item['events'].append({'player': m.group(2), 'technical': True})
                text = text[m.end():].strip()
            m = self.EJECTION_RE.match(text)
            if m:
                item['events'].append({'player': m.group(1), 'ejection': True, 'note': m.group(2)})
                text = text[m.end():].strip()
            if len(text) == l:
                raise ValueError('Could not parse text: %s' % text)
            if len(text) == 0:
                text = None
        return item

Problem: Who is Playing?

While the play-by-play data includes substitutions, it doesn't tell us who started each quarter. This means we don't know who was on the floor at any given point in time. However, by cross-referencing against the per-day, per-quarter lineup data, we should be able to figure this out.

First, we need to modify our Spider to fetch the lineup data:

def parse(self, response):
    for href in response.css("a.recapAnc::attr('href')") + response.css("div.nbaFnlMnRecapDiv > a::attr('href')"):
        url = response.urljoin(href.extract())
        yield scrapy.Request(url, callback=self.parse_game_recap)
    # Create Requests for lineup data for 4 quarters, plus 10 possible overtimes
    for period in range(1,15):
        url = self.lineup_pattern % (self.date, self.date, period, self.season)
        yield scrapy.Request(url, callback=self.parse_lineups)

# Although the lineup data comes from a JSON API, we can still integrate it into our crawler
def parse_lineups(self, response):
    jsonresponse = json.loads(response.body_as_unicode())
    headers = dict([(i, str(j.lower())) for i, j in enumerate(jsonresponse['resultSets'][0]['headers'])])
    for row in jsonresponse['resultSets'][0]['rowSet']:
        item = Lineup()
        item['date'] = self.scrape_date
        item['period'] = int(re.search('Period=(\d+)', response.url).group(1))
        for index, value in enumerate(row):
            field = headers[index]
            item[field] = value
        yield item

Within the timeframe of this project, I didn't get as far as putting the lineup data together with the play-by-play data, but the basic idea would be to simulate each quarter starting with each of the lineups used in that quarter, and find the starting lineup that results in no inconsistencies in the data.
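Here is a minimal sketch of that consistency check, assuming we have already pulled one team's 'sub' events for a quarter (in chronological order) out of the events lists produced by the TextProcessor; the function names are hypothetical, not part of the project code:

def is_consistent(starting_five, sub_events):
    # Return False as soon as a candidate starting five contradicts the quarter's
    # substitutions (a player subs out who was never on the floor, or subs in
    # while already playing).
    on_floor = set(starting_five)
    for ev in sub_events:
        if ev['sub'] == 'out':
            if ev['player'] not in on_floor:
                return False
            on_floor.remove(ev['player'])
        else:  # 'in'
            if ev['player'] in on_floor:
                return False
            on_floor.add(ev['player'])
    return len(on_floor) == 5

def candidate_starters(lineups_used, sub_events):
    # Of the five-man units stats.nba.com reports for the quarter, keep only
    # those that could actually have started it.
    return [five for five in lineups_used if is_consistent(five, sub_events)]

If exactly one candidate survives, we have found the quarter's starting lineup.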

Putting it all Together

spiders/nba_spider.py

import scrapy
import re
import time
import json

from scraping.items import PlayByPlay, GameEvent, Lineup

# This is the API for play-by-play...
# http://stats.nba.com/stats/playbyplayv2?EndPeriod=10&EndRange=55800&GameID=0021500513&RangeType=2&Season=2015-16&SeasonType=Regular+Season&StartPeriod=1&StartRange=0

class NbaSpider(scrapy.Spider):
    name = "nba"
    allowed_domains = ["nba.com"]

    lineup_pattern = 'http://stats.nba.com/stats/leaguedashlineups?Conference=&DateFrom=%s&DateTo=%s&Division=&GameID=&GameSegment=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=%d&PlusMinus=N&Rank=N&Season=%s&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&TeamID=0&VsConference=&VsDivision='

    def __init__(self, scrape_date=None, *args, **kwargs):
        super(NbaSpider, self).__init__(*args, **kwargs)
        if scrape_date is None:
            scrape_date = str(int(time.strftime('%Y%m%d')) - 1)
        match = re.search('(\d{4})(\d{2})(\d{2})', scrape_date)
        year = int(match.group(1))
        month = int(match.group(2))
        day = int(match.group(3))
        self.date = '%02d%%2F%02d%%2F%04d' % (month, day, year)
        self.season = '%04d-%02d' % ((year, (year+1) % 100) if month > 7 else (year-1, year % 100))
        self.scrape_date = scrape_date
        self.start_urls = ['http://www.nba.com/gameline/%s/' % scrape_date]

    def parse(self, response):
        for href in response.css("a.recapAnc::attr('href')") + response.css("div.nbaFnlMnRecapDiv > a::attr('href')"):
            url = response.urljoin(href.extract())
            yield scrapy.Request(url, callback=self.parse_game_recap)
        for period in range(1,15):
            url = self.lineup_pattern % (self.date, self.date, period, self.season)
            yield scrapy.Request(url, callback=self.parse_lineups)


    def parse_game_recap(self, response):
        away = None
        home = None
        quarter = None
        date = re.search('(\d+)', response.url).group(1)
        game_id = re.search('([A-Z]+)', response.url).group(1)
        pbp_item = PlayByPlay()
        for index, row in enumerate(response.xpath('//div[@id="nbaGIPBP"]//tr')):
            if int(row.xpath('@class="nbaGIPBPTeams"').extract_first()) == 1:
                (away, home) = [x.strip() for x in row.xpath('td/text()').extract()]
            else:
                pbp_item['quarter'] = quarter
                pbp_item['game_id'] = game_id
                pbp_item['index'] = index
                pbp_item['date'] = date
                for field in row.xpath('td'):
                    field_class = str(field.xpath('@class').extract_first())
                    if field_class == 'nbaGIPbPTblHdr':
                        name = row.xpath('td/a/@name')
                        if len(name) > 0:
                            quarter = row.xpath('td/a/@name').extract_first()
                            pbp_item['quarter'] = quarter
                    elif len(field.xpath('@id')) > 0:
                        event_item = GameEvent()
                        event_item['type'] = field.xpath('@id').extract_first()
                        event_item['text'] = field.xpath('div/text()').extract_first()
                        event_item['quarter'] = quarter
                        event_item['game_id'] = game_id
                        event_item['date'] = date
                        event_item['index'] = index
                        yield event_item
                    else:
                        text = field.xpath('text()').extract_first().strip()
                        if len(text) == 0:
                            continue
                        else:
                            if field_class == 'nbaGIPbPLft' or field_class == 'nbaGIPbPLftScore':
                                pbp_item['team'] = away
                                pbp_item['text'] = text
                            elif field_class == 'nbaGIPbPRgt' or field_class == 'nbaGIPbPRgtScore':
                                pbp_item['team'] = home
                                pbp_item['text'] = text
                            elif field_class == 'nbaGIPbPMid':
                                pbp_item['clock'] = text
                            elif field_class == 'nbaGIPbPMidScore':
                                pbp_item['clock'] = text
                                pbp_item['score'] = field.xpath('text()').extract()[1].strip()
                            else:
                                raise ValueError("Unknown class: %s" % field_class)
                if 'clock' in pbp_item:
                    yield pbp_item
                    pbp_item = PlayByPlay()

    def parse_lineups(self, response):
        jsonresponse = json.loads(response.body_as_unicode())
        headers = dict([(i, str(j.lower())) for i, j in enumerate(jsonresponse['resultSets'][0]['headers'])])
        for row in jsonresponse['resultSets'][0]['rowSet']:
            item = Lineup()
            item['date'] = self.scrape_date
            item['period'] = int(re.search('Period=(\d+)', response.url).group(1))
            for index, value in enumerate(row):
                field = headers[index]
                item[field] = value
            yield item

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy

class PlayByPlay(scrapy.Item):
    game_id = scrapy.Field()
    quarter = scrapy.Field()
    period = scrapy.Field()
    clock = scrapy.Field()
    score = scrapy.Field()
    team = scrapy.Field()
    text = scrapy.Field()
    index = scrapy.Field()
    date = scrapy.Field()
    events = scrapy.Field()
    seconds = scrapy.Field()
    team_abbreviation = scrapy.Field()

    def index_fields(self):
        return {
            'game_id': self['game_id'],
            'index': self['index'],
            'quarter': self['quarter'],
            'date': self['date']
         }


class GameEvent(scrapy.Item):
    type = scrapy.Field()
    text = scrapy.Field()
    quarter = scrapy.Field()
    period = scrapy.Field()
    game_id = scrapy.Field()
    index = scrapy.Field()
    date = scrapy.Field()
    events = scrapy.Field()
    clock = scrapy.Field()
    seconds = scrapy.Field()
    team_abbreviation = scrapy.Field()

    def index_fields(self):
        return {
            'game_id': self['game_id'],
            'index': self['index'],
            'quarter': self['quarter'],
            'date': self['date']
         }


class Lineup(scrapy.Item):
    group_set = scrapy.Field()
    group_id = scrapy.Field()
    group_name = scrapy.Field()
    team_id = scrapy.Field()
    team_abbreviation = scrapy.Field()
    gp = scrapy.Field()
    w = scrapy.Field()
    l = scrapy.Field()
    w_pct = scrapy.Field()
    min = scrapy.Field()
    fgm = scrapy.Field()
    fga = scrapy.Field()
    fg_pct = scrapy.Field()
    fg3m = scrapy.Field()
    fg3a = scrapy.Field()
    fg3_pct = scrapy.Field()
    ftm = scrapy.Field()
    fta = scrapy.Field()
    ft_pct = scrapy.Field()
    oreb = scrapy.Field()
    dreb = scrapy.Field()
    reb = scrapy.Field()
    ast = scrapy.Field()
    tov = scrapy.Field()
    stl = scrapy.Field()
    blk = scrapy.Field()
    blka = scrapy.Field()
    pf = scrapy.Field()
    pfd = scrapy.Field()
    pts = scrapy.Field()
    plus_minus = scrapy.Field()
    period = scrapy.Field()
    date = scrapy.Field()

    def index_fields(self):
        return {
            'group_id': self['group_id'],
            'team_id': self['team_id'],
            'date': self['date'],
            'period': self['period']
         }

pipelines.py

# -*- coding: utf-8 -*-

import pymongo
import re
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

class ScrapingPipeline(object):
    def process_item(self, item, spider):
        return item

class QuarterProcessor(object):
    def process_item(self, item, spider):
        if 'quarter' in item:
            m = re.match('(Q|OT|H)(\d+)', item['quarter'])
            if m.group(1) in ('Q', 'H'):
                item['period'] = int(m.group(2))
            elif m.group(1) == 'OT':
                item['period'] = int(m.group(2)) + 4
            else:
                raise ValueError("Can't process quarter: %s" % item['quarter'])
        return item

class ClockProcessor(object):
    def process_item(self, item, spider):
        if 'clock' in item:
            (minutes, seconds) = item['clock'].split(':')
            item['seconds'] = float(minutes) * 60.0 + float(seconds)
        return item

class TextProcessor(object):
    SHOT_RE = re.compile('(.+?) (((Tip|Alley Oop|Cutting|Dunk|Pullup|Turnaround|Running|Driving|Hook|Jump|3pt|Layup|Fadeaway|Bank|No) ?)+) [Ss]hot: (Made|Missed)( )?')
    REBOUND_RE = re.compile('(.+?) Rebound ')
    TEAM_REBOUND_RE = re.compile('Team Rebound')
    DEFENSE_RE = re.compile('(Block|Steal): ?(.+?) ')
    ASSIST_RE = re.compile('Assist: (.+?) ')
    TIMEOUT_RE = re.compile('Team Timeout : (Short|Regular|No Timeout|Official)')
    TURNOVER_RE = re.compile('(.+?) Turnover : ((Out of Bounds|Poss)? ?(- )?(Punched Ball|5 Second|Out Of Bounds|Basket from Below|Illegal Screen|No|Swinging Elbows|Double Dribble|Illegal Assist|Inbound|Palming|Kicked Ball|Jump Ball|Lane|Backcourt|Offensive Goaltending|Discontinue Dribble|Lost Ball|Foul|Bad Pass|Traveling|Step Out of Bounds|3 Second|Offensive Foul|Player Out of Bounds)( Violation)?( Turnover)?) ')
    TEAM_TURNOVER_RE = re.compile('Team Turnover : ((8 Second Violation|5 Sec Inbound|Backcourt|Shot Clock|Offensive Goaltending|3 Second)( Violation)?( Turnover)?)')
    FOUL_RE = re.compile('(.+?) Foul: (Clear Path|Flagrant|Away From Play|Personal Take|Inbound|Loose Ball|Offensive|Offensive Charge|Personal|Shooting|Personal Block|Shooting Block|Defense 3 Second)( Type (\d+))? ( )? ')
    JUMP_RE = re.compile('Jump Ball (.+?) vs (.+)( )?')
    VIOLATION_RE = re.compile('(.+?) Violation:(Defensive Goaltending|Kicked Ball|Lane|Jump Ball|Double Lane)( )?')
    FREE_THROW_RE = re.compile('(.+?) Free Throw (Flagrant|Clear Path)? ?(\d) of (\d) (Missed)? ?()?')
    TECHNICAL_FT_RE = re.compile('(.+?) Free Throw Technical (Missed)? ?()?')
    SUB_RE = re.compile('(.+?) Substitution replaced by (.+?)$')
    TEAM_VIOLATION_RE = re.compile('Team Violation : (Delay Of Game) ')
    CLOCK_RE = re.compile('')
    TEAM_RE = re.compile('')
    TECHNICAL_RE = re.compile('(.+?) Technical (- )?([A-Z]+)? ?')
    DOUBLE_TECH_RE = re.compile('Double Technical - (.+?), (.+?) ')
    DOUBLE_FOUL_RE = re.compile('Foul : (Double Personal) - (.+?) , (.+?) ')
    EJECTION_RE = re.compile('(.+?) Ejection:(First Flagrant Type 2|Second Technical|Other)')

    # pts, tov, fta, pf, blk, reb, blka, ftm, fg3a, pfd, ast, fg3m, fgm, dreb, fga, stl, oreb
    def process_item(self, item, spider):
        text = item.get('text', None)
        if text:
            item['events'] = []
        while text:
            l = len(text)
            m = self.SHOT_RE.match(text)
            if m:
                event = {'player': m.group(1), 'fga': 1, 'type': m.group(2)}
                if '3pt' in m.group(2):
                    event['fg3a'] = 1
                    if m.group(5) == 'Made':
                        event['fg3m'] = 1
                        event['fgm'] = 1
                        event['pts'] = 3
                else:
                    if m.group(5) == 'Made':
                        event['fgm'] = 1
                        event['pts'] = 2
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.REBOUND_RE.match(text)
            if m:
                event = {'player': m.group(1), 'reb': 1}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.DEFENSE_RE.match(text)
            if m:
                event = {'player': m.group(2)}
                if m.group(1) == 'Block':
                    item['events'][-1]['blka'] = 1
                    event['blk'] = 1
                else:
                    event['stl'] = 1
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.ASSIST_RE.match(text)
            if m:
                event = {'player': m.group(1), 'ast': 1}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TIMEOUT_RE.match(text)
            if m:
                event = {'timeout': m.group(1)}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TURNOVER_RE.match(text)
            if m:
                event = {'player': m.group(1), 'tov': 1, 'note': m.group(2)}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TEAM_TURNOVER_RE.match(text)
            if m:
                event = {'turnover': m.group(1)}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TEAM_REBOUND_RE.match(text)
            if m:
                item['events'].append({'rebound': 'team'})
                text = text[m.end():].strip()
            m = self.FOUL_RE.match(text)  # TODO: Are all of these actual personal fouls?
            if m:
                event = {'player': m.group(1), 'pf': 1, 'note': m.group(2)}
                if m.group(4):
                    event['type'] = m.group(4)
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.DOUBLE_FOUL_RE.match(text)
            if m:
                item['events'].append({'player': m.group(2), 'pf': 1, 'note': m.group(1), 'against': m.group(3)})
                item['events'].append({'player': m.group(3), 'pf': 1, 'note': m.group(1), 'against': m.group(2)})
                text = text[m.end():].strip()
            m = self.JUMP_RE.match(text)
            if m:
                item['events'].append({'player': m.group(1), 'jump': 'home'})
                item['events'].append({'player': m.group(2), 'jump': 'away'})
                if m.group(3):
                    item['events'].append({'player': m.group(4), 'jump': 'possession'})
                text = text[m.end():].strip()
            m = self.VIOLATION_RE.match(text)
            if m:
                event = {'player': m.group(1), 'violation': m.group(2)}
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.FREE_THROW_RE.match(text)
            if m:
                event = {'player': m.group(1), 'fta': 1, 'attempt': m.group(3), 'of': m.group(4)}
                if m.group(5) is None:
                    event['pts'] = 1
                    event['ftm'] = 1
                if m.group(2):
                    event['special'] = m.group(2)
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.TECHNICAL_FT_RE.match(text)
            if m:
                event = {'player': m.group(1), 'fta': 1, 'ftm': 1, 'special': 'Technical'}
                if m.group(2) is None:
                    event['pts'] = 1
                    event['ftm'] = 1
                item['events'].append(event)
                text = text[m.end():].strip()
            m = self.SUB_RE.match(text)
            if m:
                item['events'].append({'player': m.group(1), 'sub': 'out'})
                item['events'].append({'player': m.group(2), 'sub': 'in'})
                text = text[m.end():].strip()
            m = self.TEAM_VIOLATION_RE.match(text)
            if m:
                item['events'].append({'violation': m.group(1)})
                text = text[m.end():].strip()
            m = self.CLOCK_RE.match(text)
            if m:
                item['clock'] = m.group(1)
                text = text[m.end():].strip()
            m = self.TEAM_RE.match(text)
            if m:
                item['team_abbreviation'] = m.group(1)
                text = text[m.end():].strip()
            m = self.TECHNICAL_RE.match(text)
            if m:
                if m.group(3):
                    item['events'].append({'team': m.group(3), 'technical': m.group(1)})
                else:
                    item['events'].append({'player': m.group(1), 'technical': True})
                text = text[m.end():].strip()
            m = self.DOUBLE_TECH_RE.match(text)
            if m:
                item['events'].append({'player': m.group(1), 'technical': True})
                item['events'].append({'player': m.group(2), 'technical': True})
                text = text[m.end():].strip()
            m = self.EJECTION_RE.match(text)
            if m:
                item['events'].append({'player': m.group(1), 'ejection': True, 'note': m.group(2)})
                text = text[m.end():].strip()
            if len(text) == l:
                raise ValueError('Could not parse text: %s' % text)
            if len(text) == 0:
                text = None
        return item

# TODO: figure out offensive/defensive rebounds... we need to know teams for that

class MongoPipeline(object):

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.db[item.__class__.__name__].replace_one(item.index_fields(), dict(item), True)
        return item

settings.py

BOT_NAME = 'scraping'

SPIDER_MODULES = ['scraping.spiders']
NEWSPIDER_MODULE = 'scraping.spiders'

MONGO_URI = 'localhost:27017'
MONGO_DATABASE = 'nba'

ITEM_PIPELINES = {
    'scraping.pipelines.QuarterProcessor': 100,
    'scraping.pipelines.ClockProcessor': 102,
    'scraping.pipelines.TextProcessor': 101,
    'scraping.pipelines.MongoPipeline': 300
}

scrape_season.py

#!/usr/bin/env python

import sys
import os

season = int(sys.argv[1])

for year in (season, season+1):
    months = range(9, 13) if season == year else range(1, 8)
    for month in months:
        for day in range(1, 32):
            os.system('scrapy crawl nba -a scrape_date=%04d%02d%02d' % (year, month, day))
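Running it as, for example, "python scrape_season.py 2015" walks every day from September 2015 through July 2016 and kicks off the daily crawl for each date.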

Next Steps

Moving forward, I'll probably switch from scraping the play-by-play pages to using the API. I'm optimistic that much of the text-parsing code will still apply, although I have observed some differences between the API text and the text on the recap pages.

Once that switch is made, I'll need to integrate the play-by-play and lineup data. This will provide me with a data set where for every play I have both what happened and who was on the floor (offense and defense). This opens up a lot of possibilities.

The ultimate goal is to predict the probabilities of various outcomes for a given lineup. However, this data can also be used to answer plenty of other questions. For example, a recent ESPN article looked at the impact of exhaustion on team performance. With this data set, we can investigate that at the lineup level, seeing how lineup performance is affected by minutes played.

About Author

Tom Walsh

Tom Walsh (M.Sc. Computer Science, University of Toronto) developed a desire to get deeper into the data while leading a team of developers at BSports building Scouting Information Systems for Major League Baseball teams. A course on Basketball...
