NYC Data Science Academy | Blog

Watt's the Point: Data Tracking Building Energy in the US

Andrew Feierman
Posted on Feb 4, 2017

We don't know enough about how buildings use energy. Granted, we know some things: on a large scale, the US Department of Energy's Energy Information Administration releases estimates of energy consumption at both the federal and state level, while at a smaller scale, energy modeling can estimate the energy consumption of any individual building. And yet, given that buildings consume almost 75% of all electricity generated in the US and account for nearly 40% of all greenhouse gas (GHG) emissions in the country, we know far too little about how buildings consume energy.

About a decade ago, progressive cities and states in the US began passing legislation that requires the owners of large buildings to measure and report building energy consumption on an annual basis. In most cases, cities then compile and release these data on open data portals. My project analyzed some of this publicly available data, exploring trends in building energy consumption and building a proof-of-concept to see if building energy data from across the country could be easily compiled into a central, easy-to-use repository.

Enter the Building Energy Dashboard:

Figure: The Building Energy Dashboard

Ideally, buildings in the United States would be very energy efficient. Wasted energy is, in essence, wasted money, and efficient buildings don't waste energy. Energy-efficient buildings would save money for businesses and households alike, and reduce GHG emissions. And yet, buildings in the United States are not very efficient! Why are our buildings inefficient? It's complicated! Buildings vary in size, climate, use, and occupancy, so it's really, really difficult to create a common standard of energy consumption for buildings to meet.

About 20 cities currently have regulations requiring building energy data transparency on the books, and roughly half of these cities have reached a point where they are releasing usable open data sets. Some cities have already used their energy transparency data to create awesome visualization tools (shoutout to Philadelphia's Mayor's Office of Sustainability for their excellent Building Benchmarking tool). However, most cities do not take the extra step to visualize their results, and instead release CSV files on their open data portals.

For my project, I attempted a proof-of-concept to see if data from multiple cities could be brought together to create a centralized tool for energy transparency data. My project involved downloading, cleaning, and combining data from San Francisco, New York City, and Washington, DC.
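As a rough illustration of the combining step, the sketch below renames city-specific columns to one shared schema and stacks the rows. The column names and sample values are hypothetical placeholders, not the actual headers or figures from any city's portal, and the project itself may have handled this differently.

```python
# Hypothetical per-city column names mapped onto one shared schema.
# Real portal headers differ; these are placeholders for illustration.
COLUMN_MAPS = {
    "San Francisco": {"Postal Code": "zip", "Floor Area": "sqft", "Site EUI": "site_eui"},
    "Washington, DC": {"ZIP": "zip", "Gross Floor Area": "sqft", "EUI": "site_eui"},
}


def harmonize(city, rows):
    """Rename one city's columns to the shared schema and tag each row with its city."""
    mapping = COLUMN_MAPS[city]
    tidy = []
    for row in rows:
        record = {"city": city}
        for source_name, shared_name in mapping.items():
            record[shared_name] = row.get(source_name)
        tidy.append(record)
    return tidy


# Made-up sample rows, one per city.
sf_rows = [{"Postal Code": "94103", "Floor Area": 52000, "Site EUI": 61.5}]
dc_rows = [{"ZIP": "20001", "Gross Floor Area": 130000, "EUI": 78.2}]

combined = harmonize("San Francisco", sf_rows) + harmonize("Washington, DC", dc_rows)
for record in combined:
    print(record)
```

Once every city shares the same column names, the rows can be filtered, grouped, and plotted together without city-specific handling.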

Does Location Influence Building Efficiency? Is Location a Proxy for Other Building Characteristics?

After combining energy data from each city into a tidy format, I used leaflet to plot building energy data onto a map of each city. Building location was not always available, so I aggregated buildings by ZIP Code. Lacking building-level locations limited the analysis somewhat, but still allowed me to see whether different neighborhoods had noticeable differences in energy consumption. Each ZIP Code is represented on the map by a circle whose color represents how much energy an average building in the area used, and whose size represents the number of buildings that reported in that ZIP Code during a given year.
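The ZIP Code aggregation described above can be sketched as follows. This is a minimal illustration with made-up records, not the project's own code: for one reporting year, it computes the average energy use and the number of reporting buildings in each ZIP Code, the two values that would drive a circle's color and size.

```python
from collections import defaultdict
from statistics import mean

# Made-up building records: (ZIP Code, reporting year, energy use in kBtu per square foot).
buildings = [
    ("94103", 2015, 60.0),
    ("94103", 2015, 80.0),
    ("94105", 2015, 95.0),
    ("94103", 2014, 72.0),
]


def summarize_by_zip(records, year):
    """Return {zip: (average energy use, building count)} for one reporting year."""
    groups = defaultdict(list)
    for zip_code, record_year, eui in records:
        if record_year == year:
            groups[zip_code].append(eui)
    return {z: (mean(values), len(values)) for z, values in groups.items()}


summary = summarize_by_zip(buildings, 2015)
for zip_code, (avg_eui, count) in sorted(summary.items()):
    # avg_eui would set the circle's color; count would set its size.
    print(zip_code, avg_eui, count)
```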

Location itself probably does not make a building more or less energy efficient. More likely, buildings in a given neighborhood tend to be similar in age, size, and use, all of which influence building energy consumption. So, while the mapped portion of my tool was interesting to look through, it was most useful as a starting point to identify possible trends, which I would explore in another area of my app.

Figure: ZIP Codes Appearing to Cluster by Energy Consumption in San Francisco

Can we identify building efficiency trends on a city-wide level?

Each year, more cities release data on building energy consumption. This growth of open data is generally positive. However, it also creates a need for user-friendly tools to help people explore, understand, and identify trends within large data sets.

Using the second part of my app, the simply named Data Explorer, users can visualize data themselves using custom parameters.

Figure: A scatterplot of building size vs energy intensity. Each point is a building from either DC or San Francisco.

One issue I encountered when plotting these data (about 100,000 observations in total, across the 5 years of data collected from 3 cities) was that the data appeared "clumped." That is, while the data spanned a large set of values, the majority of the observations fell within a much smaller range. When thinking about buildings, this is intuitive: while there are some very large buildings (airports and skyscrapers) and big energy users (manufacturing facilities and data centers), most buildings fall into a small range of sizes and uses. To help identify trends among these clumped data, I added the ability to overlay a trendline for each city displayed in the data explorer:

Figure: Trendlines aggregating data from DC and San Francisco
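To show what a per-city trendline involves, here is a minimal sketch that fits a separate line to each city's points. It assumes a straight ordinary least-squares fit as a stand-in, since the post does not say which smoothing method the app uses, and the records are made up for illustration.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up records: (city, floor area in thousands of square feet,
# energy use in kBtu per square foot).
records = [
    ("DC", 50, 70.0), ("DC", 100, 80.0), ("DC", 150, 90.0),
    ("SF", 50, 60.0), ("SF", 100, 55.0), ("SF", 150, 50.0),
]

# Group the points by city, then fit one line per group.
by_city = defaultdict(list)
for city, size, eui in records:
    by_city[city].append((size, eui))

trendlines = {city: fit_line(points) for city, points in by_city.items()}
for city, (slope, intercept) in trendlines.items():
    print(f"{city}: slope={slope:.2f}, intercept={intercept:.1f}")
```

Fitting each city separately keeps one city's large outliers from bending another city's line, which matters when most points are clumped in a narrow range.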

What can we tell from this? I have my ideas, but more importantly, the tool is set up for any user to come in and draw their own conclusions from the data.

Some Other Features

Not all users will be interested in energy consumption data on a national scale. In fact, I anticipate that most users of a tool like this would want information about one particular city (as a resident, prospective investor, or policymaker). To this end, a user can use the "City Explorer" to get a high-level overview of energy disclosure and use within a given city:

Figure: A City-Level View of Washington, DC

This city-specific view can provide a variety of insights, but for this blog, I'm going to focus on the violin plot in the bottom right-hand corner.

Figure: A violin plot for Washington, DC. Hospitals have the greatest variety in energy intensity, while offices and educational buildings have more uniform energy consumption.

How do different property types use energy?

Using a violin plot, we can quickly see how buildings that have different uses (essentially, different property types) consume energy differently. A violin plot is similar to the more common box-and-whisker plot, but its shape also shows where the data are concentrated. On this plot, each building type is represented by a different color. The y-axis is energy use per square foot, which allows us to compare buildings of different sizes.

Let's focus on the red Education property type on the left. In a violin plot, the width of a shape at a given height represents how much of the data falls near that value. In this case, the red Education plot is very wide towards the bottom of the shape, at an energy use per square foot value (the y-axis) of about 125 kBtu/ft². This width means that most educational buildings have similar energy use per square foot, so they are grouped together in the wide part of the plot. By contrast, the yellow Hospital shape is long and narrow, which shows that hospitals vary much more in how much energy they use.
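The two ideas behind this plot, energy use per square foot and the spread of that value within each property type, can be sketched with a few lines of arithmetic. The buildings below are made up for illustration, and the interquartile range is used here only as a rough numeric stand-in for how stretched out a violin looks.

```python
from statistics import quantiles

# Made-up buildings: (property type, annual energy use in kBtu, floor area in square feet).
buildings = [
    ("Education", 5_000_000, 40_000),
    ("Education", 6_500_000, 50_000),
    ("Education", 3_600_000, 30_000),
    ("Education", 7_800_000, 60_000),
    ("Hospital", 20_000_000, 100_000),
    ("Hospital", 9_000_000, 75_000),
    ("Hospital", 45_000_000, 150_000),
    ("Hospital", 16_000_000, 64_000),
]


def energy_use_intensity(energy_kbtu, floor_area_sqft):
    """Energy use per square foot, so buildings of different sizes can be compared."""
    return energy_kbtu / floor_area_sqft


def interquartile_range(values):
    """Distance between the first and third quartiles of the values."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Collect the per-square-foot values for each property type.
eui_by_type = {}
for property_type, energy, area in buildings:
    eui_by_type.setdefault(property_type, []).append(energy_use_intensity(energy, area))

spread_by_type = {ptype: interquartile_range(values) for ptype, values in eui_by_type.items()}
for ptype, spread in spread_by_type.items():
    print(ptype, sorted(eui_by_type[ptype]), spread)
```

In this made-up sample the Education values sit close together while the Hospital values are spread widely, which is the pattern a short, wide violin and a long, narrow violin would show.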

What's Next?

Moving forward, this tool could be a starting point for an aggregated visualization platform for all 20+ cities across the country that have passed energy benchmarking regulations. Quite simply, buildings are complicated. Improving building efficiency is going to be a long process, requiring many actors and localized knowledge to drive actionable insights. In some cases, great research on building efficiency data has already been done. However, as more cities release energy data in the coming years, it will be increasingly important for end users to have an accessible platform to view building energy data. By enabling users to see when, where, and how our buildings use energy, this tool can help people understand whether their buildings are energy efficient, and why.

Figure: Cities that have passed building benchmarking regulations as of 12/2016. Image from the Institute for Market Transformation, a non-profit based in Washington, DC.

About Author

Andrew Feierman

I like to have fun with data, and believe the big picture is often as important as the underlying math. Professionally, I am passionate about sustainability, renewable energy, and energy efficiency. In my spare time you can find...

Comments

Doug August 18, 2017
Thanks Andrew! Kyle's work seems like it could give me some really solid direction. Maybe after plowing through that, I can think of some contributions here I could make. I will be engaged in the near future. Thanks much!
Andrew Feierman August 17, 2017
Hey Doug, Glad you enjoyed this blog post. Have you seen Kyle Bradbury's blog? He heads a large energy + data initiative at Duke University, and posts some great stuff related to energy + data science. Beyond that, I encourage you to dig directly into the source code that made this project possible. It's on github here: https://github.com/afeierman/benchmarking-data-explorer. Contributions are always welcome!
Doug L August 17, 2017
Hey Andrew, Just looked this up, as I was trying to consider a first project using data science and energy efficiency. I have toyed around a bit with R, but was wondering if you might know of some good starting projects specifically related to energy efficiency and data science. I have found other possible projects (like this https://www.analyticsvidhya.com/blog/2016/10/17-ultimate-data-science-projects-to-boost-your-knowledge-and-skills/) but I specifically wanted to focus on energy and energy efficiency if at all possible. Any recommendation would be appreciated!

