The Power sector and its Environmental impact in the NorthEastern US

Posted on Jul 21, 2015

Project SuperPower


The goal of this project was to map the power sector and see how emissions (CO2 ,NOx , SO2) have been evolving in the Northeastern US.

The reason why I chose to focus on this region is that these states are involved in the RGGI or Regional Greenhouse Gas Initiative. This initiative is the first market-based regulatory program in the United States to reduce greenhouse gas emissions. Part of this program makes mandatory to all power plants operator to communicate their emissions. More info on:

A big part of my time on this project was spent on finding the right data. The data sources I used can be found on the EPA (for greenhouse emissions data) and EIA (for power production and geolocation) websites.


And then comes the coding...The first module focuses on cleaning and organizing the data.That is the code for the EPA website data:

#First we import the 2 files
emidata = read.csv("Data/emission_07-03-2015.csv")
facildata = read.csv("Data/facility_07-03-2015.csv")
emidata = tbl_df(emidata)
facildata = tbl_df(facildata)
#remove all doubles by using distinct ORIS
facildata = facildata %>% distinct(Facility.ID..ORISPL.)
#Then we agregate them in 1 table
sumdata=left_join(emidata,facildata,by = c("Facility.ID..ORISPL."))
#Lets clean this table
sumdata=mutate(sumdata, Period = as.Date(paste(sumdata$Year.x, sumdata$Month, 1, sep = "-")))
 SO2.Tons = SO2..tons.,NOx.Tons = NOx..tons.,CO2.Short.Tons = CO2..short.tons.,
#Lets add an effciency metric of the plant
sumdata$Efficiency.CO2sTons.MWh = sumdata$CO2.Short.Tons/sumdata$GrossLoad.MWh

sumdata=filter(sumdata,Period< "2015-04-01")
sumdata= filter(sumdata,is.finite(Efficiency.CO2sTons.MWh))
sumdata= filter(sumdata,Efficiency.CO2sTons.MWh<10*mean(Efficiency.CO2sTons.MWh))

sumdata =within(sumdata,{

write.csv(sumdata, file="sumdata.csv")


write.csv(sumdatasimpl, file="sumdatasimpl.csv")

Once organised and cleaned, Shiny entered into action! Shiny is a really flexible visualization tool which allowed me to map all the power plants of the 9 RGGI States. For this, I used  a map template from

Once this was done, I also added a side Panel in which I would be able to select different metrics to visualize on the map but also on a historical and scatter plot. That again involved some coding:

shinyServer(function(input, output, session) {
  ## Interactive Map ###########################################
  # Create the map
  output$map <- renderLeaflet({
    leaflet() %>%
        urlTemplate = "//{s}{z}/{x}/{y}.png",
        attribution = 'Maps by <a href="">Mapbox</a>'
      ) %>%
      setView(lng = -70, lat = 37.45, zoom = 5)
  output$lineHisto <- renderPlot({
    colorBy <- input$color
    sizeBy <- input$size
    seasonemi = sumdata %>%
    s_group_by("Period")  %>%  
    s_summarise(paste("TotalColor = sum(",input$color,",na.rm=T)",sep=''))
  PlotLine <- function(DS, x, y) {    
    ggplot(DS, aes_string(x = x, y = y)) + 
      geom_line() +
      ylab(input$color) +
      xlab('') +
      labs(title="History") +
      theme(legend.position="bottom") +
      stat_smooth(method="lm") +
      scale_y_continuous(labels = comma)

  output$scatterPower <- renderPlot({
    colorBy <- input$color
    sizeBy <- input$size
    #print(xyplot(as.formula(paste(colorBy ,"~", sizeBy)), data = sumdata, ))  
    qplot(x=sumdata[[sizeBy]],y=sumdata[[colorBy]]) +
      ylab(colorBy) +
  # This observer is responsible for maintaining the circles and legend,
  # according to the variables the user has chosen to map to color and size.

    colorBy <- input$color
    sizeBy <- input$size

    colorData <- sumdatasimpl[[colorBy]]
    if (colorBy == "FuelType.Primary"){
      pal <- colorFactor("Spectral", colorData)
      pal <- colorBin("Spectral", colorData)
    radius <- sumdatasimpl[[sizeBy]] / max(sumdatasimpl[[sizeBy]],na.rm=TRUE) * 100000
    leafletProxy("map", data = sumdatasimpl) %>%
      clearShapes() %>%
      addCircles(~Facility.Longitude, ~Facility.Latitude, radius=radius, layerId=~Facility.ORIS.ID,
                 stroke=FALSE, fillOpacity=0.4, fillColor=pal(colorData)) %>%
      addLegend("bottomleft",pal =pal, values=colorData, title=colorBy,
  # Show a popup at the given location
  showPlantPopup <- function(plantname, lat, lng) {
    selectedPlant <- sumdata[sumdata$Facility.ORIS.ID == plantname,]
    content <- as.character(tagList(
      tags$strong(sprintf("Type: %s",unique(selectedPlant$FuelType.Primary))),tags$br(),
      sprintf("Period Jan2009-Mar2015"), tags$br(),
      sprintf("CO2 Emissions: %s s Tons", format(sum(selectedPlant$CO2.Short.Tons,na.rm=T),scientific=F,big.mark=",")), tags$br(),
      sprintf("SO2 Emissions: %s Tons", format(sum(selectedPlant$SO2.Tons,na.rm=T),scientific=F,big.mark=",")), tags$br(),
      sprintf("NOx Emissions: %s Tons", format(sum(selectedPlant$NOx.Tons,na.rm=T),scientific=F,big.mark=","))
    leafletProxy("map") %>% addPopups(lng, lat, content, layerId = selectedPlant$Facility.Name)
    #Update the historical line graph
    output$lineHisto <- renderPlot({
      colorBy <- input$color
      sizeBy <- input$size
      seasonemi = sumdata[sumdata$Facility.ORIS.ID == plantname,] %>%
        s_group_by("Period")  %>%  
        s_summarise(paste("TotalColor = sum(",input$color,",na.rm=T)",sep=''))
  # When map is clicked, show a popup with city info
    leafletProxy("map") %>% clearPopups()
    event <- input$map_shape_click
    if (is.null(event))
      showPlantPopup(event$id, event$lat, event$lng)
  # When reset is clicked well reset
    output$lineHisto <- renderPlot({
    colorBy <- input$color
    sizeBy <- input$size
    seasonemi = sumdata %>%
      s_group_by("Period")  %>%  
      s_summarise(paste("TotalColor = sum(",input$color,",na.rm=T)",sep=''))

All this allowed me to get this result: A nice shiny App!



That now takes us to a more interesting topic. Does this tool allow us to make conclusions on emissions in the RGGI area?

Luckily it does...Some of the conclusions of this project are:

- Emissions and especially CO2 have indeed reduced in the 2009-2015 Period by prox. 30%:


- That was mainly driven by a reduction in the production (ie. the demand) thanks to energy efficiency:


- Some of the Coal power production was displaced by Gas which is positive. Gas emits half CO2 emissions compared to Coal. This however isnt as impressive as it could be and Renewable production (Wind-Solar) doesnt seem to have increased much:


About Author

Leave a Comment

No comments found.

View Posts by Categories

Our Recent Popular Posts

View Posts by Tags

#python #trainwithnycdsa 2019 airbnb Alex Baransky alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus API Application artist aws beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep Bundles California Cancer Research capstone Career Career Day citibike clustering Coding Course Demo Course Report D3.js data Data Analyst data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization Deep Learning Demo Day Discount dplyr employer networking feature engineering Finance Financial Data Science Flask gbm Get Hired ggplot2 googleVis Hadoop higgs boson Hiring hiring partner events Hiring Partners Industry Experts Instructor Blog Instructor Interview Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter lasso regression Lead Data Scienctist Lead Data Scientist leaflet linear regression Logistic Regression machine learning Maps matplotlib Medical Research Meet the team meetup Networking neural network Neural networks New Courses nlp NYC NYC Data Science nyc data science academy NYC Open Data NYCDSA NYCDSA Alumni Online Online Bootcamp Online Training Open Data painter pandas Part-time Portfolio Development prediction Prework Programming PwC python Python Data Analysis python machine learning python scrapy python web scraping python webscraping Python Workshop R R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn Selenium sentiment analysis Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau team TensorFlow Testimonial tf-idf Top Data Science Bootcamp twitter visualization web scraping Weekend Course What to expect word cloud word2vec XGBoost yelp