R Shiny Global Power Plant Tracker

Posted on Oct 20, 2022


The question of how energy should be produced has been hotly contested by different administrations throughout US history. Some groups want to maintain traditional fossil fuel sources like oil, gas, and coal, while others push for renewable sources like wind and solar. Still others have considered going nuclear. Arguments range from which option is more environmentally friendly to which is more cost effective, among other questions. For this project I focus on one question: which type of power plant produces more energy?

To that end, I created the Global Power Plant Shiny App, which surveys a country's energy production over the five-year period from 2013 to 2017. It also displays a count of plants per type and a map of the global distribution of plants identified by type. The data comes from the June 2021 release of the Global Power Plant Database from the World Resources Institute. The app is divided into two pages. The first is a country-based page displaying three figures: one showing total energy produced over the five-year span, one illustrating how production by power plant type fluctuates over time, and one counting the number of plants of each type. The second is a type-based page, where users can see the locations of power plants of a given type all over the world.

Using this app, I answer the question above by comparing wind versus oil for the United States. With the figures mentioned above, I want to answer the following:

  • Which power plant type produced the most energy?
  • How does energy production fluctuate over time?
  • How many plants of each type are built?
  • Where are the wind and oil plants located in the United States?

For added speculation, I investigated where oil and wind plants are located in other countries. Even though energy production alone should not be the sole factor in choosing a power plant type, these five questions could influence where the country should focus its efforts in producing energy.


The diagram above explains the process of how each figure is made once the user makes their selection.

I. Data Clean Up

After downloading the dataset, I replaced NaN values with zeros. I then removed all power plants that reported neither actual nor estimated energy generation.
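The app itself is written in R, so the following is only a minimal Python sketch of the cleanup logic described above, not the app's code. The column names (generation_gwh_<year> and estimated_generation_gwh_<year>) and the sample rows are assumptions made for illustration.

```python
import math

YEARS = range(2013, 2018)
# Assumed column names, used here for illustration only.
ACTUAL = [f"generation_gwh_{y}" for y in YEARS]
ESTIMATED = [f"estimated_generation_gwh_{y}" for y in YEARS]


def clean(plants):
    """Replace missing values with 0.0, then drop plants with no generation data."""
    cleaned = []
    for plant in plants:
        row = dict(plant)
        for col in ACTUAL + ESTIMATED:
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                row[col] = 0.0
        # Keep the plant only if at least one actual or estimated value is non-zero.
        if any(row[col] != 0.0 for col in ACTUAL + ESTIMATED):
            cleaned.append(row)
    return cleaned


# Made-up sample rows.
sample = [
    {"name": "A", "primary_fuel": "Wind", "generation_gwh_2013": 12.0,
     "estimated_generation_gwh_2013": float("nan")},
    {"name": "B", "primary_fuel": "Oil"},  # reports nothing, so it is dropped
]
cleaned_sample = clean(sample)
```

One side effect of this approach is that a plant that genuinely reported zero generation in every year is indistinguishable from a non-reporting plant and is dropped as well.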

II. Preparing cleaned data for figures

Methods II.a through II.c require picking a country and filtering for the power plants belonging to that country. The resulting data frame can then be grouped by power plant type.

II.a) Energy Accumulated between 2013-2017

The dataset has estimated energy production for each power plant from 2013 to 2017 and actual energy production from 2013 to 2019. Even after removing non-reporting power plants, some power plants had only estimated data, others had only actual data, and the rest provided both. To keep things simple, I chose to look at 2013-2017 data, and for each year and each type I added the actual value to the estimated value and divided the sum by two. If a power plant provided both actual and estimated production, the result is the average of the two, which differs from the actual value by some factor. If the power plant provided only actual or only estimated data, the missing value counts as zero, so the result is off by a factor of 0.5. (The reader may recall a favorite STEM factor/magnitude joke at this point.)

In hindsight, all of these recalculations could have been avoided by choosing one column and filling in the other column's value whenever the chosen column's value is null. Nevertheless, this app uses the calculation described above. This app should be used for analyzing power plant energy production within a selected country and not for comparing energy values across countries.
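To make the difference between the two approaches concrete, here is a small Python sketch (the app itself is in R). The function names averaged and coalesced are hypothetical, and the numbers are made up.

```python
def averaged(actual, estimated):
    """The calculation the app uses: mean of the two columns, with missing values already set to 0."""
    return (actual + estimated) / 2


def coalesced(actual, estimated):
    """The hindsight alternative: prefer the actual value and fall back to the estimate."""
    return actual if actual is not None else estimated


print(averaged(100.0, 0.0))    # only actual reported -> 50.0, half the reported value
print(averaged(100.0, 80.0))   # both reported -> 90.0
print(coalesced(None, 80.0))   # only the estimate reported -> 80.0
print(coalesced(100.0, 80.0))  # both reported -> 100.0
```

The first call shows where the factor of 0.5 comes from: the zero that replaced the missing value is averaged in as if it were a real measurement.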

For this figure, I sum the newly calculated energy values for each type grouping within each year column. I then sum the year columns together to get the total energy produced over the five-year period for each power plant type in the selected country.
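The two summation steps can be sketched as follows. This is an illustrative Python version under assumed data shapes (each plant is a dictionary with a primary_fuel label and a per-year energy mapping), not the app's R code, and the plants are made up.

```python
from collections import defaultdict

YEARS = [2013, 2014, 2015, 2016, 2017]


def yearly_totals_by_type(plants):
    """Sum the per-plant energy values for each fuel type within each year."""
    totals = defaultdict(lambda: {year: 0.0 for year in YEARS})
    for plant in plants:
        for year in YEARS:
            totals[plant["primary_fuel"]][year] += plant["energy"].get(year, 0.0)
    return dict(totals)


def five_year_totals(yearly):
    """Collapse the year columns into one five-year total per fuel type."""
    return {fuel: sum(by_year.values()) for fuel, by_year in yearly.items()}


# Made-up plants for one country; "energy" holds the per-year values from II.a.
plants = [
    {"primary_fuel": "Wind", "energy": {2013: 10.0, 2014: 12.0}},
    {"primary_fuel": "Wind", "energy": {2013: 5.0, 2017: 8.0}},
    {"primary_fuel": "Oil", "energy": {2013: 3.0, 2016: 1.0}},
]
yearly = yearly_totals_by_type(plants)
totals = five_year_totals(yearly)
print(totals)  # {'Wind': 35.0, 'Oil': 4.0}
```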

II.b) Energy Produced Over The Years per Type

To build this figure, I followed the same procedure as Methods II.a, except that instead of adding the values over the five-year period, I converted the grouped data frame to a regular data frame, added year and type labels, and transposed the data frame so that years are columns and types are rows. Once the transposing and labeling are done, a ggplotly line plot displays how energy production evolves over the five-year span. The major benefit of ggplotly line plots is that one can click on entries in the legend to hide or show individual curves. For this wind versus oil comparison, I toggled off all other plant types and left only the oil and wind curves on.
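As a rough illustration of the transpose-and-filter idea, here is a Python sketch using nested dictionaries in place of a data frame. The starting layout (years as the outer keys), the helper name transpose, and the values are all assumptions; the app's actual R reshaping steps may differ.

```python
def transpose(by_year):
    """Turn {year: {type: energy}} into {type: {year: energy}}."""
    by_type = {}
    for year, row in by_year.items():
        for fuel, energy in row.items():
            by_type.setdefault(fuel, {})[year] = energy
    return by_type


# Made-up yearly sums per type.
by_year = {
    2013: {"Wind": 15.0, "Oil": 3.0, "Gas": 40.0},
    2014: {"Wind": 18.0, "Oil": 2.0, "Gas": 42.0},
}
by_type = transpose(by_year)

# Keeping only two types mimics toggling the other curves off in the legend.
shown = {fuel: series for fuel, series in by_type.items() if fuel in {"Wind", "Oil"}}
print(shown)  # {'Wind': {2013: 15.0, 2014: 18.0}, 'Oil': {2013: 3.0, 2014: 2.0}}
```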

II.c) Number of Plants Per Type

To get the number of plants per type, I simply count the number of plants of each type from the grouped data frame.
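Counting per type is a one-liner in most languages. A Python sketch with made-up plants (the primary_fuel key mirrors the field name used in the map parameters, and the helper name is hypothetical):

```python
from collections import Counter


def plants_per_type(plants):
    """Count how many plants there are of each fuel type."""
    return Counter(plant["primary_fuel"] for plant in plants)


plants = [
    {"primary_fuel": "Wind"},
    {"primary_fuel": "Oil"},
    {"primary_fuel": "Wind"},
]
counts = plants_per_type(plants)
print(counts["Wind"], counts["Oil"])  # 2 1
```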

II.d) Power Plant Distributions by Fuel Type

The dataset includes the latitude and longitude of each plant. After removing plants that did not contribute data, I filtered by type and plotted the coordinates on a leaflet map with the following parameters: popup = ~primary_fuel, label = ~country_long, clusterOptions = markerClusterOptions(). The last parameter is important because it gathers the power plants into clusters. Users can hover over a cluster to see a highlighted area that encompasses the power plants the cluster represents.


I. Energy Accumulated between 2013-2017


The energy accumulated by the different United States power plant types ranges from a few GWh generated by storage to several TWh generated by gas. In the oil versus wind comparison, wind outperforms oil by half an order of magnitude (i.e. wind production ~ oil production x 10^0.5).
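The 10^0.5 notation above is a multiplicative factor, and converting it to a plain ratio makes the comparison easier to read. A quick Python check follows; the helper name orders_of_magnitude is hypothetical.

```python
import math


def orders_of_magnitude(a, b):
    """Return how many powers of ten separate a from b."""
    return math.log10(a / b)


print(round(10 ** 0.5, 2))                 # 3.16, i.e. half an order of magnitude
print(round(10 ** 1.5, 1))                 # 31.6, i.e. one and a half orders
print(orders_of_magnitude(1000.0, 100.0))  # 1.0, i.e. a factor of ten
```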

II. Energy Produced Over The Years per Type

With all other power plant types hidden via ggplotly's legend toggle, it can be seen that wind energy production increased steadily over the five-year period of interest. Oil, on the other hand, experienced a 60% dip in energy production between 2014 and 2016, but then tripled within a year after 2016. Nevertheless, wind outperforms oil by over an order of magnitude and a half (i.e. wind ~ oil x 10^1.5).

III. Number of Plants Per Type

In the United States, petcoke has fewer than 10 power plants. Solar is far ahead with roughly 10^3.5, or just above 3100, plants. Looking at the oil versus wind comparison, wind slightly outnumbers oil, but they both have about 1000 plants.

IV. Power Plant Distributions by Fuel Type

In the United States, most oil plants are found in the Great Plains and the Mid-Atlantic regions.

The three largest clusters of wind plants are found around the Oklahoma Panhandle (hovering over the grouped marker shows that this region includes North Texas, almost all of Oklahoma and Nebraska, East Colorado, and East New Mexico), the Great Plains, and the Great Lakes.

Amongst global clusters, most oil clusters are found in the Americas.

There are high clusters of wind plants in the United States, Brazil, India, and China, but the largest wind cluster is found in Europe.


The plant counts show that even though there are more wind plants than oil plants, they are close in number at about 1000 plants each. However, within the five-year period wind produces about 10 times more energy than oil does. Note that oil energy production dropped by 60% over the two years before 2016 and then tripled within a year after 2016, which might be correlated with a particular political shift in the United States government. Wind energy, by contrast, increased steadily without much turbulence.

Looking at the United States map page, about half of the oil plants are in rural areas and half are in urban areas. Companies, including power plants, still run on funding from the areas they serve and on their budgets. As a thought experiment, suppose these power plant companies either fully shut down to 0% productivity or fully spool up to 100% productivity based on finances. Even if half the oil plants went down, or conversely the number of fully operational oil plants doubled, that would change oil output by only a factor of two, which does not explain why oil underperformed wind by a factor of 10. When hovering over the three wind cluster markers, the highlighted areas encompass states from across the political spectrum, which suggests bipartisan support.

Globally, plants tend to be located where the resource can be found. I was expecting more oil plants in the Middle East, though. There are larger clusters of plants where North Americans and Europeans are still hunting for untapped resources. The low number of oil plants in the Middle East could also be due to the removal of plants that did not provide actual or estimated energy production numbers. Wind, on the other hand, appears to be ubiquitous, and first-world countries, the two most populous countries, and Brazil are taking advantage of that fact.

Average Annual Wind Speeds of the United States at 30 m according to NREL

However, how ubiquitous is wind? There are areas of the country where wind is so abundant that tornadoes are a concern and other areas where one cannot fly a kite. The power generated by wind is proportional to the wind velocity cubed multiplied by the swept area of the rotor and the air density. For simplicity, we’ll use the air density at sea level, which is about 1.24 kg/m^3 according to the Engineering Toolbox. The average wind turbine blade is 116 ft, or approximately 35.36 meters, according to Utility Dive. The figure in Results II indicates that all United States wind plants together generated 210k +/- 50k MWh per year on average between 2013 and 2017. With about 1000 wind plants, this means each plant generates 210 +/- 50 MWh per year on average, which is roughly 24 +/- 6 kW per plant. Knowing the power each plant generates, the air density, and the average blade size, one can calculate that each plant would need the wind to blow at roughly 2.14 m/s, or 4.79 miles per hour. According to the NREL map above showing average annual wind speeds at an elevation of 30 m, wind turbines would do best in the Midwest and the yellowish areas east and west of the American Heartland.
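The wind speed estimate can be reproduced with a few lines of Python. This sketch assumes the standard kinetic power relation P = (1/2) * rho * A * v^3 with no turbine efficiency factor and a swept area A = pi * r^2 based on the blade length. The text states only the proportionality, so the 1/2 prefactor and the missing efficiency term are assumptions, chosen because they reproduce the quoted 2.14 m/s. The function names are hypothetical.

```python
import math

AIR_DENSITY = 1.24      # kg/m^3, sea-level value quoted in the text
BLADE_LENGTH_M = 35.36  # m, average blade length quoted in the text
HOURS_PER_YEAR = 8760   # 365 days * 24 hours
MPS_TO_MPH = 2.236936   # one metre per second in miles per hour


def average_power_watts(energy_mwh_per_year):
    """Convert yearly energy in MWh to average power in watts."""
    return energy_mwh_per_year * 1e6 / HOURS_PER_YEAR


def required_wind_speed(power_watts, blade_length_m=BLADE_LENGTH_M,
                        air_density=AIR_DENSITY):
    """Solve P = 0.5 * rho * A * v^3 for v (assumed relation, no efficiency factor)."""
    swept_area = math.pi * blade_length_m ** 2
    return (2 * power_watts / (air_density * swept_area)) ** (1 / 3)


power = average_power_watts(210)  # 210 MWh per plant per year
speed = required_wind_speed(power)
print(round(power / 1000), "kW")             # 24 kW
print(round(speed, 2), "m/s")                # 2.14 m/s
print(round(speed * MPS_TO_MPH, 2), "mph")   # 4.79 mph
```

Because speed scales with the cube root of power, the +/- 6 kW uncertainty in per-plant power moves the required speed only slightly. A real turbine converts only part of the wind's kinetic power, so the true required speed would be higher than this idealized figure.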

Conclusion and Future Works

Worldwide there is a larger number of wind plants than oil plants. In the United States, though, there are about the same number of wind-based power plants as oil-based power plants. In both the time-based line plot and the total accumulated energy bar chart, wind energy production outperforms oil energy production by roughly a factor of ten. As a reminder, this blog post is only a comparison of wind and oil in the United States, but the app can be used for different countries and different power plant fuel types.

If I were to continue this project, I would consider adding a boxplot, like the one above, describing the average energy accumulated over the five year investigation period. Also, this study focused on wind versus oil; however, visitors to the app might want to apply the same analytical thought process to coal versus solar.

I originally thought the similar number of wind and oil power plants was due to party oscillation between presidential administrations. However, further conversations with those who reviewed my work revealed that although wind is more ideal than oil, oil is more practical than wind. Those concerned about climate change have expressed that fossil fuels, like coal, are more affordable and mobile than renewable energy in both Europe and the United States, produce fewer climate change problems than electric cars, and kill fewer airborne animals than wind. In fact, nuclear energy is cleaner and more cost efficient than solar or wind.

With that said, energy output is not the only factor in deciding how to fund energy production. Funding sources and worker welfare are also major factors in deciding what plants are built, promoted, demoted, and destroyed. This is because leaders (e.g. politicians, CEOs, etc.) depend on the health and productivity of their working class and the backing of their supporters. Thus, when wondering why a region or country is funding a certain energy worldview, look at all the factors: numerical, environmental, and social.

The Global Power Plant Tracker app is published on https://ggsglobalpowerplanttracker.shinyapps.io/globalpowerplantshinyapp/


Github Link

  • https://github.com/GGSimmons1992/globalPowerPlantShinyApp

About Author

Gary Simmons

Open-minded and tenacious data scientist and machine learning programmer familiar with large dataset analysis, Angular user interface enhancement, .NET Core REST API problem solving, and relational database management. My Applied Physics BS, Physics MS, and software development background...