NYC Data Science Academy| Blog

The Convenience Factor: How Grocery Stores Impact Property Values

dimmediato
Posted on Jan 3, 2025

Residential Valuation Using Machine Learning

What determines a property’s value? Conventional wisdom often points to market dynamics: the ever-changing conditions driven by supply and demand, interest rates, and economic stability. Still, property valuation, or the assessment of a property's value, is usually more nuanced. It takes many factors into account, each with a different level of impact. Internal factors like square footage, age, condition, and the number of bedrooms and bathrooms are key components in determining a property’s price. But what of external factors, the surrounding amenities that buyers weigh when deciding whether an area is where they want to live?

Parents might want schools to be closer so that their children can walk there, individuals without a personal vehicle might want a home closer to a transportation hub (buses, trains, subways, etc.), and others might be more interested in parks or employment hubs. With the advent of big data and machine learning, there is an opportunity to quantify the impact of these Points of Interest (POIs) on residential valuation more precisely. Using methods like Gradient Boosting or Random Forest, one can predict property prices with higher accuracy than traditional hedonic pricing models. However, is there a group of external POIs that stands out from the rest?

Our GitHub repository includes the full write-up and code. This post is a condensed version that does not cover every aspect of the project; differences from the full write-up are noted where relevant. (Image by Clem Onojeghuo.)

What is the Convenience Factor?

Whatever amenities a buyer prioritizes, one is always recognized as an essential service: the convenience factor, meaning grocery stores and hybrid shopping centers that carry the essential goods and services required for modern-day living. Without modern grocery shopping centers, people would lose this convenience, and that would severely hamper their everyday functioning. At best, a homeowner would have to shop at a local farm; at worst, they would have to travel for miles to get essential supplies. We wanted to see how grocery stores, as an external influence, affect current listing prices.

Research Design

It can generally be asserted that sale price is affected in some way by distance from grocery stores. This is based on findings for other external amenities, as well as prior research. Therefore, we decided to measure the factor's influence in two ways: distance, the residence's proximity to the closest POI, and density, the number of POIs within a given radius. We created custom categories for the grocery stores, split between traditional and non-traditional stores:

  1. Supermarkets: Full-service grocery stores that often sell a variety of non-food products as well. These are almost always part of a chain (e.g., Publix, Harris Teeter, Piggly Wiggly, BiLo, Ingles, Bells, Earthfare).
  2. Variety Stores: Retailers that sell inexpensive items, typically with a single price point for all products (e.g., Dollar Tree, Family Dollar, Dollar General).
  3. Supercenter: Non-traditional, large food-and-drug store combinations that also sell mass merchandise. At least 40% of the products are devoted to groceries (e.g., Target, Walmart).
  4. Convenience Store: Non-traditional, limited stores that sell a variety of general merchandise, including packaged food products (e.g., 7-Eleven, Quick Trip).
  5. Warehouse Club: Non-traditional membership-retailer hybrids that sell bulk products in a warehouse environment, with at least 40% of products devoted to groceries (e.g., Costco, Sam’s Club, BJ’s).

 

Leaflet: Making an App for Data Collection

The problem with collecting distance and density data was that it wasn't readily available to us. Therefore, we created an app to collect it. Leaflet is an open-source JavaScript library for interactive maps, and it can be used from an HTML page once the data is in GeoJSON format. The map data came from OpenStreetMap, a geographic database that is available to anyone and updated by the community. The HTML code had to be crafted around our dataset and around the API.

First, the map itself had to be created using a script that set the view to the Atlanta, Georgia metropolitan region, the location of our price valuation case study. Then, the POIs and the "query" options had to be added. OpenStreetMap data is queried through the Overpass API, which identifies map features by key-value pairs called "tags", such as "brand=Publix" or "amenity=restaurant". Our categories were designed to align with these tags: 'shop=supermarket', 'shop=variety_store', 'shop=wholesale', and 'shop=convenience', as well as brand names for stores that could be classified as supercenters, such as Target, Walmart, and, technically, Amazon and Aldi. However, there weren’t enough of these stores in the region to be considered relevant.
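As a rough illustration of how the tags above translate into a query, the sketch below builds an Overpass QL string for one category. The app itself was written in JavaScript with Leaflet; this Python version is only a sketch, and the bounding box, the `CATEGORY_TAGS` mapping, and the function name are assumptions made for the example rather than values taken from the project.

```python
# Hypothetical bounding box roughly covering metro Atlanta, in the
# (south, west, north, east) order that Overpass QL expects.
ATLANTA_BBOX = (33.4, -84.9, 34.1, -83.9)

# Category -> list of (key, value) tag filters. The shop=* values are the
# tags named in the article; matching supercenters by brand is an assumption.
CATEGORY_TAGS = {
    "supermarket": [("shop", "supermarket")],
    "variety_store": [("shop", "variety_store")],
    "wholesale": [("shop", "wholesale")],
    "convenience": [("shop", "convenience")],
    "supercenter": [("brand", "Target"), ("brand", "Walmart")],
}


def build_overpass_query(category, bbox=ATLANTA_BBOX):
    """Return an Overpass QL query for every node/way matching a category."""
    south, west, north, east = bbox
    box = f"({south},{west},{north},{east})"
    clauses = []
    for key, value in CATEGORY_TAGS[category]:
        # Stores are mapped either as points (nodes) or building outlines (ways).
        for element in ("node", "way"):
            clauses.append(f'  {element}["{key}"="{value}"]{box};')
    body = "\n".join(clauses)
    # "out center" returns a single coordinate for ways as well as nodes.
    return f"[out:json][timeout:60];\n(\n{body}\n);\nout center;"


supermarket_query = build_overpass_query("supermarket")
print(supermarket_query)
```

The resulting string can be sent to any public Overpass endpoint; the response coordinates are what the distance and density calculations need.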

(Fig. 1 shows the distance-based map for a specific address near Cleveland Ave SW, Atlanta, GA. When a user clicks on a blue location marker, the app displays the address, the name of the nearest POI (in this case, a Kroger supermarket), and the distance to that POI, which would be downloaded into a CSV file and used as data. Note that there is a 'shopping cart' icon at East Point that does not have the grocery store marker. This is because we were searching for supermarkets, and the store at that location is a Walmart, classified as a supercenter. Certain brands can be excluded from Overpass searches, or entirely excluded in the code.)

 

Acquiring the density required a modification from the distance code, as it was now dealing with three different radii: a one-mile, a three-mile, and a five-mile radius. Rather than finding the "nearest POI", the app would now find the number of queried POIs within the chosen radius.

 

(Fig. 2 shows the density-based Overpass search and what would be downloaded to a CSV and used as data. One may note that a Kroger is closest, but only the number of POIs is shown and reported.)
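The two measurements described above, distance to the nearest POI and the number of POIs within one, three, and five miles, can be sketched in a few lines. This is a minimal illustration that assumes straight-line (great-circle) distance computed with the haversine formula; the app may have measured distance differently, and the coordinates below are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_poi_distance(home, pois):
    """Distance feature: miles from the home to the closest POI (None if no POIs)."""
    if not pois:
        return None
    return min(haversine_miles(*home, *poi) for poi in pois)


def poi_density(home, pois, radii=(1, 3, 5)):
    """Density features: number of POIs within each radius (in miles)."""
    distances = [haversine_miles(*home, *poi) for poi in pois]
    return {r: sum(d <= r for d in distances) for r in radii}


# Made-up coordinates: one home and four stores directly north of it.
home = (33.70, -84.40)
stores = [(33.71, -84.40), (33.73, -84.40), (33.76, -84.40), (33.80, -84.40)]

nearest = nearest_poi_distance(home, stores)
density = poi_density(home, stores)
print(round(nearest, 2), density)
```

A home with no stores inside a radius simply gets a count of zero, which is the situation the one-mile radius runs into most often.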

 

Once this process was completed for the distance and the three radii, the data was combined with the original dataset based on the type of POI that was searched. This created twenty features from the five categories we outlined (one distance feature and three density features per category), grouped by whether they are distance-based or density-based.
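To make the count of twenty concrete, the sketch below enumerates the feature set as five categories times four measurements and groups the columns by the model that would use them. The column names are hypothetical; the project's actual names may differ.

```python
from itertools import product

# The five store categories from the research design.
CATEGORIES = ["supermarket", "variety_store", "supercenter", "convenience", "wholesale"]
# One distance measurement plus three density radii.
MEASURES = ["distance", "density_1mi", "density_3mi", "density_5mi"]

# Hypothetical column names such as "supermarket_density_3mi".
feature_names = [f"{category}_{measure}" for category, measure in product(CATEGORIES, MEASURES)]

# Each model (distance, one-mile, three-mile, five-mile) adds only its own
# five columns to the baseline features.
feature_groups = {
    measure: [f"{category}_{measure}" for category in CATEGORIES]
    for measure in MEASURES
}

print(len(feature_names))
print(feature_groups["density_5mi"])
```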

 

Data Analysis

There were two goals for the data analysis. The first was to see if the scores for the distance and the density results were better than the "baseline". The baseline is defined as the model trained without any of the features being tested, namely the grocery store features. The second was to see the relative impact of distance and density. In other words, does density matter more than distance, or vice versa?

Distance-Based Scores

Only one set of results had to be reported for the distance-based models. In our full code, which is available on GitHub, we used five different methods and scored each with four different metrics. The methods were XGBoost, Random Forest, Multiple Linear Regression (MLR), Lasso Regression, and Ridge Regression. Of these, XGBoost had the highest scores.

XGBoost models are non-linear and capable of capturing complex relationships between the features and the target variable. MLR, being a linear statistical model, assumes linearity and so struggles to capture those relationships. It had the lowest scores of the methods used. The Random Forest models were computationally expensive, with certain parameters significantly affecting the results and requiring considerable time to tune. Therefore, for simplicity's sake, only the XGBoost results are shown here.

The baseline serves as the reference point for testing our null hypothesis, which is that "neither proximity to nor the number of grocery stores affects housing prices". If the scores are not better than the baseline, then the features aren't contributing enough to the model to be relevant.

Table 1 shows the scores for the baseline. The scores were retrieved through a three-way split, which produces a test set, a validation set, and a train-val set. The test set is the held-out portion of the data that the model never sees during training or tuning, so it gives the best measure of the model's performance. The validation set, 20% of the data, is used for tuning. The train-val set is used for final training once the best model is selected.
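A three-way split of this kind can be sketched as follows. The 20% validation share comes from the description above; the 20% test share, the random seed, and the reading of "train-val" as the training and validation rows combined are assumptions made for the example.

```python
import random


def three_way_split(rows, test_frac=0.2, val_frac=0.2, seed=42):
    """Shuffle rows and split them into train, validation, train-val, and test sets.

    test_frac and seed are assumptions; only the 20% validation share is
    stated in the article.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                        # held out until the very end
    validation = rows[n_test:n_test + n_val]    # used to compare tuning candidates
    train = rows[n_test + n_val:]               # used to fit each candidate
    train_val = train + validation              # used to refit the chosen model
    return train, validation, train_val, test


train, validation, train_val, test = three_way_split(range(100))
print(len(train), len(validation), len(train_val), len(test))
```

Because the test rows never appear in `train_val`, a gap between the train-val score and the test score is a direct signal of overfitting.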

We can see here that there is some overfitting because the train-val score is higher than the test score. However, we accepted this intentionally, as trying to bring the two scores closer together would only lower the test score further. If any model fails to outperform the baseline, it indicates that the new features did not improve predictive power. Worse performance would suggest that the added complexity introduced noise rather than enhancing predictions.

Table 1: Score Comparison for XGBoost Baseline

Metric | Test Set | Train-Val Set | Validation Set
R² | 0.4764 | 0.5805 | 0.5875
MSE | 441042.7435 | 344791.1960 | 385699.1357
RMSE | 664.1105 | 587.1892 | 621.0468
MAPE | 0.1686 | 0.1561 | 0.1585

 

Table 2 is our distance-based model. Immediately, we notice that every single metric is superior to the baseline. Typically, a higher R² is desired, but it’s important to note that R² alone does not tell us how large the prediction errors are. The MAPE of 14.93% on the test set indicates that the model's predictions deviate from the actual values by an average of 14.93%, which is more informative. The RMSE alone is not particularly insightful, as its value depends on the scale of the target data. Therefore, it is more useful when compared to the RMSE values of other models.

Table 2: Score Comparison for XGBoost Distance

Metric | Test Set | Train-Val Set | Validation Set
R² | 0.6042 | 0.7165 | 0.7170
MSE | 333416.2553 | 232965.2795 | 264625.4240
RMSE | 577.4221 | 482.6648 | 514.4176
MAPE | 0.1493 | 0.1319 | 0.1334
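For reference, the four metrics reported in these tables can be computed from a set of actual and predicted values as in the sketch below. The prices are made-up numbers chosen so the arithmetic is easy to follow, not data from the project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MSE, RMSE, and MAPE for paired actual/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    mse = ss_res / n
    mape = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),  # same units as the target
        "MAPE": mape,            # 0.10 means predictions are off by 10% on average
    }


# Made-up actual and predicted prices; every prediction is off by exactly 10%.
actual = [2000, 3000, 4000, 5000]
predicted = [2200, 2700, 4400, 4500]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Since RMSE is the square root of MSE, the two rows in each table carry the same information; RMSE is simply easier to read because it is on the scale of the target.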

Density-Based Scores

Unlike distance, density requires three XGBoost models, one for each radius. The one-mile radius had the fewest POIs, as expected, and it also had the highest number of locations with zero POIs within the selected radius, which can significantly impact the results. Table 3 is the first of the density models, displaying the one-mile results. Its scores are lower than the distance-based results, yet higher than the baseline.

Table 3: Score Comparison for XGBoost One Mile

Metric | Test Set | Train-Val Set | Validation Set
R² | 0.5394 | 0.6371 | 0.6300
MSE | 387973.6284 | 298231.6637 | 345940.0183
RMSE | 622.8753 | 546.1059 | 588.1667
MAPE | 0.1613 | 0.1478 | 0.1507

 

Table 4 displays the three-mile density scores. The expectation is that the scores will increase as the radius increases. In our full write-up, the three-mile models were our cut-off: they sometimes surpassed the distance-based model, depending on which algorithm was being used. For XGBoost, the difference is small. On the test set, the three-mile model narrowly surpasses the distance-based model on R² (0.6065 vs. 0.6042) as well as MAPE (14.51% vs. 14.93%). As expected, the three-mile model also improves on the one-mile model, and it is therefore also superior to the baseline model.

Table 4: Score Comparison for XGBoost Three Miles

Metric | Test Set | Train-Val Set | Validation Set
R² | 0.6065 | 0.7330 | 0.7355
MSE | 331476.7945 | 219402.3056 | 247331.0408
RMSE | 575.7402 | 468.4040 | 497.3239
MAPE | 0.1451 | 0.1281 | 0.1292

 

The last of the density models, the five-mile model, shows a smaller gain over the three-mile model than the three-mile model showed over the one-mile model. However, according to the full write-up, it is by far the best-scoring model of all the ones tested. It notably surpasses the distance-based model, the baseline, and all other density-based models.

Table 5: Score Comparison for XGBoost Five Miles

Metric | Test Set | Train-Val Set | Validation Set
R² | 0.6307 | 0.7382 | 0.7497
MSE | 311078.9243 | 215139.6725 | 234024.0497
RMSE | 557.7445 | 463.8315 | 483.7603
MAPE | 0.1417 | 0.1265 | 0.1287

 

Score Evaluation

Model Comparison

The distance-based model was only better than the baseline and the one-mile model. As the radius increased, so did the score, but this increase started to plateau from the three-mile to the five-mile model. Because the distance-based model outperformed the one-mile model, we can attempt to draw a few conclusions. One possibility is that distance plays a more significant role, initially, when fewer grocery stores are considered. It is important to note that many houses had no grocery stores within one mile (for certain categories), which inherently gives the distance-based model an advantage.

Tables 6-9 demonstrate the statistical significance of the differences between the distance and density models and the baseline. For every model, there is a statistically significant difference, with, as expected, the five-mile model showing the strongest difference. Also as expected, the one-mile model shows the weakest difference, although it still counts as statistically significant. All metrics for all models are at least "weak" in terms of substantive significance, becoming medium-to-strong for the three-mile and five-mile models.

Table 6: Significance Between Distance and the Baseline XGBoost Scores

Metric | T-Stat | P-Value | Cohen's d
R² | 12.98040 | 0.0 | 4.10476
MSE | 12.91056 | 0.0 | 4.08268
MAPE | 24.89549 | 0.0 | 7.87265
RMSE | 14.50156 | 0.0 | 4.58580

Table 7: Significance Between Density One Mile and the Baseline XGBoost Scores

Metric | T-Stat | P-Value | Cohen's d
R² | 10.14706 | 0.0 | 3.20878
MSE | 10.63525 | 0.0 | 3.36316
MAPE | 15.71720 | 0.0 | 4.97021

Table 8: Significance Between Density Three Miles and the Baseline XGBoost Scores

Metric | T-Stat | P-Value | Cohen's d
R² | 14.88335 | 0.0 | 4.70653
MSE | 12.73996 | 0.0 | 4.02873
MAPE | 32.45928 | 0.0 | 10.26453
RMSE | 15.07872 | 0.0 | 4.76831

Table 9: Significance Between Density Five Miles and the Baseline XGBoost Scores

Metric | T-Stat | P-Value | Cohen's d
R² | 20.24031 | 0.0 | 6.40055
MSE | 12.79712 | 0.0 | 4.04680
MAPE | 38.37219 | 0.0 | 12.13435
RMSE | 17.27271 | 0.0 | 5.46211

 

Finally, we want to check our assumption that the three-mile model is superior to the distance-based model. Table 10 shows us that, despite the small differences in score, the result is still statistically significant. However, it is the weakest effect presented so far.

Table 10: Significance Between Distance and Density Three Miles XGBoost Scores

Metric | T-Stat | P-Value | Cohen's d
R² | -4.90789 | 0.00084 | -1.55201
MSE | -4.08738 | 0.00273 | -1.29254
MAPE | -5.84076 | 0.00025 | -1.84701
RMSE | -4.44782 | 0.00161 | -1.40652
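The post does not spell out how these statistics were computed, so the sketch below is an inference rather than the project's method. In Tables 6-10, every Cohen's d equals the t-statistic divided by √10 (for example, 12.98040 / √10 ≈ 4.10476), which is what a paired t-test over ten paired scores produces when d is computed on the score differences. The scores used here are made up for illustration.

```python
import math
import statistics


def paired_t_and_cohens_d(scores_a, scores_b):
    """Paired t-statistic and Cohen's d (computed on the paired differences)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)            # sample standard deviation
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff               # equals t_stat / sqrt(n)
    return t_stat, cohens_d


# Made-up R² scores from ten paired runs (e.g. folds or repeats) of two models.
model_scores = [0.60, 0.61, 0.59, 0.62, 0.60, 0.61, 0.60, 0.59, 0.62, 0.61]
baseline_scores = [0.47, 0.49, 0.46, 0.48, 0.47, 0.50, 0.46, 0.47, 0.48, 0.49]

t_stat, cohens_d = paired_t_and_cohens_d(model_scores, baseline_scores)
print(round(t_stat, 3), round(cohens_d, 3))
```

A positive t-statistic means the first model scored higher on average. For error metrics such as MSE, where lower is better, the order of the two arguments decides the sign.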

Data Analysis: Feature Impact

Feature Importance

We now know whether the models are significant, which models are superior to the others, and whether they improve over the baseline. However, one of the goals was to see which features impacted housing prices. Feature importance is how much each feature contributes to the model's predictions. XGBoost bases importance on how often a feature is used in tree splits or on how much those splits improve the model, while Random Forest bases it on the reduction in impurity across splits. Regardless, both can be used for feature ranking, and a high importance means that the feature significantly affects performance.
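XGBoost exposes several importance types, including `weight` (how many splits use a feature), `gain` (the average improvement from those splits), and `total_gain`. The sketch below shows how these definitions can rank the same features differently. The split records are hypothetical; in practice they come from the trained booster, not from a hand-written list.

```python
from collections import defaultdict

# Hypothetical (feature, gain) records, one per split across all trees.
splits = [
    ("living_area", 120.0),
    ("living_area", 80.0),
    ("supermarket_density_5mi", 150.0),
    ("year_built", 30.0),
    ("living_area", 40.0),
    ("supermarket_density_5mi", 90.0),
]


def importance_by_type(split_records):
    """Summarize splits as weight (count), total gain, and average gain."""
    weight = defaultdict(int)
    total_gain = defaultdict(float)
    for feature, gain in split_records:
        weight[feature] += 1
        total_gain[feature] += gain
    avg_gain = {feature: total_gain[feature] / weight[feature] for feature in weight}
    return dict(weight), dict(total_gain), avg_gain


weight, total_gain, avg_gain = importance_by_type(splits)
top_by_weight = max(weight, key=weight.get)
top_by_gain = max(avg_gain, key=avg_gain.get)
print(top_by_weight, top_by_gain)
```

Here the feature that is split on most often is not the one whose splits help the most, which is why it matters to know the importance type behind a plot before comparing rankings.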

Figure 3 shows the feature importance for the baseline model. Living Area is the most important feature, followed by Total Bathrooms, Year Built, and Total Bedrooms. Figure 4 then shows the XGBoost distance model. The baseline features outrank the distance-based features, with Wholesale being the first distance feature to appear. 'Variety Store' exhibits low feature importance, indicating that it contributes minimally to the model's predictive power. We also note that the maximum importance in the distance model is lower than in the baseline model.

(Fig. 3: Baseline XGBoost Model Feature Importance)

(Fig. 4: Distance XGBoost Model Feature Importance)

 

The density models give us clearer insights into the makeup of the data itself. Figure 5 tells us that supercenters and wholesale stores have the least predictive power, and according to the Random Forest version (see the write-up), wholesale has no influence at a one-mile radius. The simple explanation is that there are not enough wholesale stores to affect predictive power. More importantly, we now see that supermarkets and convenience stores have managed to surpass baseline features even at one mile. In Figure 6, this effect becomes even more pronounced, with the supermarket feature rapidly gaining importance as the radius increases.

(Fig. 5: XGBoost Density One Mile Feature Importance)

(Fig. 6: XGBoost Density Five Miles Feature Importance)

Partial Dependency

While feature importance gives us a rough assessment of predictive power, it gives us little insight into each feature’s impact on the actual target. We want to know how predicted outcomes change as the feature values change, which would help explain why the wholesale feature ranked so low. A partial dependence plot (PDP) isolates a feature to show how it affects a model's output. If a value increases or decreases, does the predicted outcome also increase or decrease in turn? Not only will we know how much a feature contributes to predictions, but the PDP will also show us how it contributes, such as whether the relationship is linear, exponential, or varies at certain thresholds.

Figure 7 presents the PDP for both the baseline and distance features, using the XGBoost model and the train-validation set. In these plots, the Y-axis represents the predicted outcome (specifically, the average predicted housing price) as a function of the feature shown on the X-axis. This visualization allows us to observe how predictions change as the values of the features vary. For instance, an increase in living area corresponds to a rising trend in predicted housing prices.

(Fig. 7: XGBoost PDP of Distance Features)
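The calculation behind a partial dependence curve is simple enough to sketch by hand: for each grid value, overwrite the feature with that value in every row, predict, and average. The model below is a made-up linear stand-in for a trained XGBoost model, and the feature names and rows are hypothetical.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Keep every other feature as observed; only the target feature changes.
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


def toy_model(row):
    """Made-up stand-in for a trained model: price rises with area, falls with distance."""
    return 1000 + 1.0 * row["living_area"] - 50 * row["supermarket_distance"]


# Hypothetical observations.
rows = [
    {"living_area": 900, "supermarket_distance": 0.5},
    {"living_area": 1500, "supermarket_distance": 2.0},
    {"living_area": 2100, "supermarket_distance": 4.0},
]

curve = partial_dependence(toy_model, rows, "supermarket_distance", [0.5, 1, 2, 4])
for distance, average_price in curve:
    print(distance, average_price)
```

With this toy model the curve falls as distance grows, which is the same shape as a distance feature whose proximity raises predicted prices.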

 

The density PDPs yield similar results to the distance PDPs, but with the opposite trend, as a higher number of stores within a given radius corresponds to higher values. For example, in Figure 8, we observe that predicted housing prices rise steadily with the number of supermarkets. In other words, the greater the number of supermarkets within one mile, the higher housing prices are likely to be.

(Fig. 8: XGBoost PDP of Density One Mile Features)

 

Given that the distance model surpassed the one-mile density model in the score comparison, the one-mile model may be a less reliable predictor of housing prices. Fortunately, Figure 9 presents a clearer version of the pattern in Figure 8, showing an increase in predicted housing prices as the number of POIs increases. The wholesale feature decreases at one, which is explained by Figure 8, as no property had more than one wholesale store within one mile. Interestingly, variety stores also continue their downward trend, potentially indicating a negative externality.

(Fig. 9: XGBoost PDP of Density Five Miles Features)

SHAP Impact

Now that we know how each of the features affects the predicted outcome, we can also show the overall impact of the features using a SHAP (SHapley Additive exPlanations) plot. Essentially, we want to see whether a feature pushes the target value in a certain direction for every observation reported. A positive SHAP value means the feature increased the prediction, and a negative value means it decreased the prediction.

SHAP measures feature importance differently, and therefore ranks the features differently. The ranking reflects the average impact of each feature on predictions, rather than how frequently the feature is used across all tree splits, as with XGBoost. In the write-up, the SHAP results for the baseline did not differ enough to require a separate graph, suggesting that the baseline wasn't affected by multicollinearity.
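To show what a SHAP value represents, the sketch below computes exact Shapley values for one observation by enumerating every subset of features, with "missing" features taken from a single reference row. This is a simplified illustration: SHAP libraries use faster tree-specific algorithms and a background dataset, and the model, feature names, and values here are made up.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for observation x relative to a reference row."""
    features = list(x)
    n = len(features)

    def value(subset):
        # Features in the subset take the observed value; the rest take the reference.
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in features}
        return predict(mixed)

    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {feature})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[feature] = total
    return phi


def toy_model(row):
    """Made-up stand-in for a trained model."""
    return (1000 + 1.0 * row["living_area"]
            + 40 * row["supermarket_density"] - 30 * row["variety_density"])


observation = {"living_area": 1800, "supermarket_density": 5, "variety_density": 3}
reference = {"living_area": 1500, "supermarket_density": 2, "variety_density": 1}

phi = shapley_values(toy_model, observation, reference)
print(phi)
```

The values sum to the difference between the prediction for the observation and the prediction for the reference row. A summary plot ranks features by the average absolute value of these contributions across all observations.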

Figure 10 shows a continuation of the trend shown by the PDPs. For at least supermarkets and convenience stores, the closer the store is to a property, the greater the increase in housing value. Conversely, the opposite trend seems to hold true for supercenters, where nearby stores appear to negatively impact predicted values.

(Fig. 10: SHAP Summary Plot of Distance Features)

 

Fig. 11 once again shows a reversal in trend for the features, but only because a “red” (high) value now means something positive for the impact. We can see that, even at one mile, more supermarkets and convenience stores lead to greater predicted housing value. However, the results are inconclusive for the rest of the stores, even though there are a number of variety stores within one mile.

(Fig. 11: SHAP Summary Plot of Density One Mile Features)

 

If we look at the five-mile plot, the impact is more apparent. Fig. 12 shows that supermarkets have become the most important feature, surpassing even living area. Convenience stores keep to the same trend, and supercenters start to show a very slight positive impact on housing values. However, as Fig. 9 also demonstrated, the number of variety stores seems to have a negative impact on housing values. Although a concrete determination cannot be made, preliminary results indicate that the number of wholesale stores also has a negative impact on the target.

(Fig. 12: SHAP Summary Plot of Density Five Miles Features)

Conclusion

The null hypothesis posited that grocery stores do not affect housing prices. Table 6 demonstrated a significant difference between the baseline model and the model with features representing the distance between the POIs and house locations. Table 2 revealed that the distance-based model outperformed the baseline scores (Table 1). Furthermore, Figures 7 and 10 illustrated the importance and impact of each distance feature on housing prices, showing that, for most features, proximity to the store is associated with an increase in predicted housing values.

The alternative hypothesis was that both proximity and density affect property values. Tables 7-9 confirmed the latter part of the hypothesis, demonstrating a significant difference between the density models and the baseline model. Figures 9 and 12 indicated a general increase in predicted housing values as the values of the features increased. When importance is measured by average impact rather than frequency or gain, supermarkets even emerged as the most significant factor.

Overall, it appears that the null hypothesis can be rejected based on our findings. The general trend indicates that the closer a grocery store is to a house, the higher the predicted housing value. Furthermore, a greater number of grocery stores near the house may also lead to higher predicted values. Supermarkets and convenience stores tend to be stronger predictive factors than the other features. The only exceptions to this trend are variety stores and potentially wholesale stores, which may actually decrease housing values as their presence increases. Supercenters seem to give mixed results, showing a negative impact for distance, but not for density.

Therefore, an ideal housing location that would maximize price would be one situated in what is commonly referred to as a "convenient" area: one that is close to supermarkets and convenience stores, with a variety of these establishments nearby. The distinction between "traditional" and "non-traditional" categories does not appear to significantly impact the results, as the strongest features were drawn from both categories. Additionally, it seems that being too close to a supercenter, and especially a variety store, may actually decrease property value; however, this could be an indirect effect, as literature suggests that lower-income brackets tend to prefer these types of locations.

About Author

dimmediato

Hello, my name is Daniel. I'm a data enthusiast with a passion for uncovering insights through analytics and machine learning. My projects have ranged from tackling Kaggle challenges—like my Ames, Iowa housing analysis—to building interactive R-Shiny applications that...
NYC Data Science Academy

NYC Data Science Academy teaches data science, trains companies and their employees to better profit from data, excels at big data project consulting, and connects trained Data Scientists to our industry.

NYC Data Science Academy is licensed by New York State Education Department.
