NYC Data Science Academy| Blog

Solar-California Renewables: predicting solar generation

Dmitri Levonian
Posted on May 18, 2020
  • Project github
  • Shiny app 
California is leading America and the world on clean energy

Over the past 10 years, California has established itself as a prominent leader in adopting and implementing ambitious clean energy policies, especially for solar energy. California matters, economically and environmentally, not only in the American but also in the global context. If California were a country, it would be the world's 5th largest economy, bigger than the UK, France, or India.

The most recent Renewables Portfolio Standard, enacted in 2018, requires that 50% of California's electricity come from renewable sources by 2026 and 60% by 2030, and sets a target of 100% zero-carbon electricity by 2045. The cumulative result of these bold policies and unprecedented private investment is encouraging. Over the last decade, California was able to steadily reduce the physical volumes of fossil fuel-derived generation by about 25% while growing the economy by 40%.

California's transition towards renewable energy comes with significant challenges on the economic, technological, and regulatory fronts. My Shiny app investigates some of these challenges and explores a machine learning approach to address one of the biggest uncertainties: the variability of solar generation.

The project has two specific objectives:

  1. To identify the most accurate forecasting method and to assess possible economic benefits,
  2. To provide a primer on using various techniques for autoregressive time series forecasting.

Current challenges: too much solar?

About 80% of California's energy is delivered by the California Independent System Operator (CAISO), a non-profit organization responsible for balancing electricity supply and demand and ensuring California's grid reliability. The source data for the Shiny app came from the 10 years of hourly electric generation published by CAISO (a sample of the daily data).

While the annual trend looks like a steady buildup of renewable generation, the monthly breakdown shows significant seasonal variation. Solar generation peaks in summer at about 3 TWh/month, with close to 14 hours of daylight (and higher intensity), and falls to about 1 TWh/month in December. Fortunately, seasonal solar production peaks in sync with surging summer demand driven by air conditioning:

Yet despite this resounding success at the big-picture level, policymakers, media, and CAISO itself describe the situation as alarming. You can stumble upon accounts of how California grapples with ever-growing amounts of renewable energy, and of what to do with the solar energy that CAISO has to curtail.

The Duck Curve for Solar Generation

The problem becomes apparent when we zoom in to the hourly level. Solar generation in California peaks at 2-4 pm, while traditional power plants are turned down to the minimum. From around 4 pm to 8 pm, solar generation declines to zero, exactly when demand is peaking, which requires large and fast power ramping from traditional sources.

This is what April 30, 2020 looked like at CAISO:

Note also that short-term power storage capacity (so far confined largely to lithium-ion technology) is too small to smooth out this daily solar cycle in a meaningful way. In other words, today, energy needs to be consumed the instant it is generated.

CAISO coined the term 'the duck curve' to describe the formidable intraday swings in the net load (total demand less solar and wind generation).

This is how the net load on the same day (April 30) has changed from 2010 to 2020:

Over the past decade, the duck's belly has got deeper and deeper, and today these swings amount to 15 GW of capacity and more in a matter of 3-4 hours. To put this into perspective, this is roughly equivalent to the total installed capacity of a medium-sized country such as Switzerland or Israel. In other words, CAISO has to turn an entire country's electric generation on and back off, every day, to smooth out the solar output. This is the greatest challenge facing renewable energy in California today.
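The net load and the evening ramp described above can be sketched in a few lines. This is a minimal illustration, not code from the project: the function names and the hourly values are made up for the example and are not CAISO data.

```python
def net_load(demand_gw, solar_gw, wind_gw):
    """Net load = total demand less solar and wind generation, hour by hour."""
    return [d - s - w for d, s, w in zip(demand_gw, solar_gw, wind_gw)]


def max_ramp(series_gw, window_hours=3):
    """Largest increase in the series over any span of `window_hours` hours."""
    return max(
        series_gw[i + window_hours] - series_gw[i]
        for i in range(len(series_gw) - window_hours)
    )


# Hypothetical hourly values in GW for 3 pm through 8 pm (illustrative only).
demand = [24.0, 25.0, 26.0, 27.0, 28.0, 27.5]
solar = [11.0, 9.0, 6.0, 3.0, 0.5, 0.0]
wind = [2.0, 2.0, 2.5, 2.5, 3.0, 3.0]

load = net_load(demand, solar, wind)
print(load)            # [11.0, 14.0, 17.5, 21.5, 24.5, 24.5]
print(max_ramp(load))  # 10.5
```

In this toy example, traditional sources would have to add 10.5 GW within three hours, even though total demand rises by only a few GW over the same period.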

Forecasting Solar Generation

And yet there is another aspect of solar that makes it even more complicated for CAISO: its poor predictability. Not only is solar generation intermittent, it is also inherently irregular, affected by cloud cover and numerous other factors such as dust, precipitation, and temperature. Within a typical month, the actual daily solar generation may fluctuate by 30-50%:

CAISO needs to forecast this volatile supply to balance the entire system and ensure grid stability. That's why predicting solar and wind generation is at the center of CAISO's attention. CAISO runs many different types of forecasts ranging from 15 minutes ahead to 1 hour ahead to 24 hours and beyond.

The goal of this project is to identify the best method of predicting the solar generation for the purely autoregressive model in the absence of any external inputs. Of the 5 forecasting methods analyzed, the best accuracy was achieved by an ensemble of a classical autoregressive SARIMA and a recurrent neural network.

On average, this ensemble improved forecasting accuracy by about 25% (0.07 GW) compared to the best non-machine learning method (Differencing):


For CAISO, this improved accuracy would translate into lower reserve capacity requirements. California's current Capacity Procurement Mechanism sets the price for extra capacity at $75 per kW-year, which would translate into annual savings of at least $5 million for CAISO.
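The arithmetic behind this estimate can be reproduced as follows. The sketch assumes, as a simplification that the post does not spell out, that each GW of reduced forecast error removes one GW of required reserve capacity; the function name is hypothetical.

```python
KW_PER_GW = 1_000_000


def annual_capacity_saving(error_reduction_gw, price_per_kw_year):
    """Yearly dollar value of needing `error_reduction_gw` less reserve capacity.

    Assumes a one-to-one link between forecast error and reserve capacity.
    """
    return error_reduction_gw * KW_PER_GW * price_per_kw_year


# 0.07 GW accuracy improvement priced at $75 per kW-year.
saving = annual_capacity_saving(0.07, 75)
print(f"${saving:,.0f} per year")
```

With these inputs the result is about $5.25 million per year, consistent with the "at least $5 million" figure.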

Conclusions

Solar generation is the centerpiece of California's bold clean energy policies. However, it is intermittent and irregular. My Shiny app shows that uncertainty in solar generation forecasts can be reduced significantly by machine learning methods, which would lead to sizable economic benefits for CAISO.

 

Autoregressive time series forecasting: A technical primer

The dataset for this project consists of 87,936 timesteps: slightly over 10 years of hourly electricity generation data for the period from April 20, 2010 to April 30, 2020, provided by CAISO.

Each forecasting model was trained on data ending on December 31, 2018. Incremental retraining was not implemented: both the autoregressive SARIMA and the RNNs took 10 to 20 minutes to train on a Tesla P100 GPU, so retraining for each new timestep was not feasible.

All forecasts are built on the actual hourly series for the 1-hour ahead horizon. The models' performance is tested and compared on 2019-2020 test data, i.e. completely out-of-sample.
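The split described above can be sketched with the standard library alone. The exact first and last hours are assumptions (midnight on April 20, 2010 and 11 pm on April 30, 2020), chosen because they reproduce the 87,936 timesteps quoted earlier; the project's own pipeline may differ.

```python
from datetime import datetime, timedelta

START = datetime(2010, 4, 20, 0)        # assumed first hourly timestep
END = datetime(2020, 4, 30, 23)         # assumed last hourly timestep
TRAIN_END = datetime(2018, 12, 31, 23)  # last hour of the training period


def hourly_range(start, end):
    """All hourly timestamps from `start` to `end`, both inclusive."""
    n_hours = int((end - start).total_seconds() // 3600) + 1
    return [start + timedelta(hours=h) for h in range(n_hours)]


timestamps = hourly_range(START, END)
train = [t for t in timestamps if t <= TRAIN_END]
test = [t for t in timestamps if t > TRAIN_END]

print(len(timestamps), len(train), len(test))  # 87936 76272 11664
```

Under these assumptions, roughly 87% of the hours fall in the training period and the remaining 16 months form the out-of-sample test set.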

Naïve Forecast of Solar Generation

In time series forecasting, the naïve prediction F(t+1)=F(t) often serves as a basic benchmark because it turns out to be surprisingly hard to beat (especially in aperiodic, low signal-to-noise environments). CAISO uses such benchmarks (also called persistence forecasts) extensively in tactical planning.
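A persistence forecast and its mean absolute error (MAE) take only a few lines. The sketch below is illustrative: the helper names and the short series are made up, not taken from the project or from CAISO data.

```python
def naive_forecast(history):
    """Persistence forecast: the next value equals the last observed value."""
    return history[-1]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def one_step_backtest(series, forecaster, warmup):
    """Forecast series[t] from series[:t] for every t >= warmup."""
    actual = series[warmup:]
    predicted = [forecaster(series[:t]) for t in range(warmup, len(series))]
    return actual, predicted


# Hypothetical hourly generation in GW over part of a day.
generation = [0.0, 1.0, 3.0, 6.0, 8.0, 7.0, 4.0, 1.0, 0.0]
actual, predicted = one_step_backtest(generation, naive_forecast, warmup=1)
print(mean_absolute_error(actual, predicted))  # 2.0
```

The same `one_step_backtest` helper works with any forecaster that maps a history to a single prediction, which makes it easy to compare benchmarks on equal terms.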

Below is a naïve forecast of the solar generation for the last 3 days of April 2020, a typical picture of how volatile the generation is. The MAE of the naïve forecast over these three days is 0.87 GW.


Differenced Forecast

A slightly less naïve approach, applicable only for periodic series, relies on smoothing the signal as compared to previous periods:

This forecast does not learn any patterns directly from the data. It is essentially a moving average MA(1) process for the once-differenced I(1) series with fixed coefficients. A grid search over the relevant parameters produced the optimal configuration, which turned out to be very simple:

In essence, the next hour is forecast as the current generation (the naïve forecast) adjusted for the change in generation between the same two adjacent timesteps 24 hours earlier. We can see from the graph that this 24-hour differencing forecast fits the data remarkably well, producing a simple and reliable benchmark.

The three-day MAE is 0.33 GW, which is less than 40% of the naïve error.
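Read literally, the verbal description of the differencing forecast corresponds to F(t+1) = y(t) + y(t-23) - y(t-24). The sketch below implements that reading; it is a reconstruction from the prose rather than the author's code, and the toy daily profile is invented for the example.

```python
PERIOD = 24  # hours in the daily cycle


def differenced_forecast(history, period=PERIOD):
    """y_hat(t+1) = y(t) + [y(t+1-period) - y(t-period)]."""
    if len(history) < period + 1:
        raise ValueError("need at least period + 1 observations")
    return history[-1] + (history[-period] - history[-period - 1])


# Toy daily profile: a triangular bump around noon (illustrative only).
shape = [float(max(0, 6 - abs(h - 12))) for h in range(24)]
day1 = shape
day2 = [v + 1.0 for v in shape]  # same shape, shifted up by 1 GW

history = day1 + day2[:13]       # observed through hour 12 of day 2
print(differenced_forecast(history))  # 6.0, the true value of day2[13]
print(history[-1])                    # 7.0, what the naïve forecast would say
```

Because the forecast borrows yesterday's hour-to-hour change, it follows the daily shape exactly whenever today's curve is a shifted copy of yesterday's, which is where the naïve forecast loses most of its accuracy.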


SARIMA Solar Forecast

My third approach was to implement a classical SARIMA autoregressive model, which was motivated by the following:

  • The strict 24-hour period warrants at least one order of differencing.
  • When there is inertia and mean-reversion in the underlying data-generating process, it is best described by a moving average MA(q) process. There is clearly inertia in cloud cover, precipitation, and temperature, which tend to affect solar generation in multi-hour stretches.  

  • The regular autoregressive part of the process is inferred from the shape of ACF/PACF correlograms and tested by a grid search.
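For readers new to correlograms, the sketch below shows one way to difference a series at lags 24 and 1 and compute a sample ACF in plain Python. The synthetic series (trend, daylight bump, and noise) and all names are assumptions made for illustration. Note that differencing pure noise itself induces negative correlation at lags 1 and 24, so the numbers printed here demonstrate the mechanics and should not be read as the CAISO result.

```python
import math
import random


def difference(series, lag):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]


rng = random.Random(0)  # fixed seed so the example is repeatable
raw = [
    0.002 * h                                                 # slow trend
    + 5.0 * max(0.0, math.sin(math.pi * (h % 24 - 6) / 12))   # daylight bump
    + rng.gauss(0.0, 0.3)                                     # random shocks
    for h in range(24 * 60)
]

# Seasonal differencing (lag 24) followed by first differencing (lag 1).
stationary = difference(difference(raw, 24), 1)
rho = acf(stationary, 24)
print(round(rho[1], 2), round(rho[24], 2))
```

On real data, the pattern of significant lags in such a correlogram is what suggests candidate orders for the grid search.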

The raw hourly generation is of course highly autocorrelated: 


The raw process is highly non-stationary, with trend and seasonality. We achieve quasi-stationarity only after double differencing by 24 and 1 periods. The ACF for this I(2) series shows strong autocorrelation for h=1 and h=24 time shifts:


We can conclude that the differenced series is probably an MA(1) process. This means that the model will adjust its predictions by some portion of the error it made in the previous time step, which may have been caused by a random shock such as cloud cover.
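The error-correction behaviour of an MA(1) term can be made concrete with a small sketch. The coefficient `theta` is fixed by hand here purely for illustration; in a fitted SARIMA it would be estimated from the data, and the function name is hypothetical.

```python
def ma1_one_step_forecasts(series, theta):
    """One-step forecasts for a zero-mean MA(1) series with known theta.

    Each forecast is theta times the previous forecast error, so a shock
    at time t is partially carried into the prediction for time t + 1.
    """
    forecasts = []
    prev_error = 0.0  # assume no shock before the first observation
    for value in series:
        forecast = theta * prev_error
        forecasts.append(forecast)
        prev_error = value - forecast
    next_forecast = theta * prev_error
    return forecasts, next_forecast


# A single shock of 1.0 followed by quiet hours (differenced series).
forecasts, next_forecast = ma1_one_step_forecasts([1.0, 0.0, 0.0], theta=0.5)
print(forecasts)      # [0.0, 0.5, -0.25]
print(next_forecast)  # 0.125
```

After the initial shock the model over-corrects slightly and then settles, with the correction halving at every step because `theta` is 0.5.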

Grid search for the best configuration produces the following compact SARIMA:

The SARIMA model works particularly well in this case because solar generation is inherently periodic and strongly autocorrelated. This is our best single-model forecast, with MAE = 0.25 GW (all metrics are out-of-sample):


Recurrent Neural Network

Finally, I deployed an RNN-based neural network that was trained on approximately 2.5 years of data, from June 2016 to December 2018, and tested on 2019-2020 data. Each training sample consisted of 48 hours of historical generation, and the network was trained to predict the following hour's generation.
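The training samples described above are sliding windows over the hourly series. A minimal sketch of how such windows could be built is shown below; the project itself uses a TensorFlow data pipeline, so this plain-Python helper is a stand-in with a hypothetical name.

```python
def make_windows(series, history_hours=48):
    """Pair each run of `history_hours` values with the value that follows."""
    inputs, targets = [], []
    for end in range(history_hours, len(series)):
        inputs.append(series[end - history_hours:end])
        targets.append(series[end])
    return inputs, targets


# Stand-in series: 60 hourly values, each equal to its own index.
series = [float(h) for h in range(60)]
X, y = make_windows(series)

print(len(X), len(X[0]))  # 12 48
print(X[0][-1], y[0])     # 47.0 48.0
```

Each input holds hours t-48 through t-1 and each target is hour t, so a series of N hours yields N-48 training samples.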

The RNN's error was worse than SARIMA's, at 0.33 GW for our 3-day sample. Apparently, the series has little non-linearity and few long-term dependencies, which are the kinds of structure RNNs are very good at capturing.

In particular, the RNN seems to miss the peaks of the cycle more than SARIMA does. If this bias proves to be systematic, it may be possible to compensate for it.


Ensemble

The best accuracy is obtained by averaging the SARIMA and RNN models. A simple 50/50 average achieves a sizable decrease in Mean Absolute Error compared to either SARIMA or RNN alone.
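The averaging step is simple to express in code. In the sketch below, the two prediction lists are invented numbers standing in for SARIMA and RNN output. They are chosen so that the two models' errors partly offset each other, which is the condition under which averaging helps; an average is not guaranteed to beat both members in general.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def ensemble(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two forecasts; 0.5 gives the simple 50/50 blend."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical values in GW (not project results).
actual = [2.0, 5.0, 8.0, 5.0]
sarima_like = [2.5, 5.5, 7.0, 5.5]
rnn_like = [1.5, 5.0, 7.5, 4.0]

blend = ensemble(sarima_like, rnn_like)
print(mean_absolute_error(actual, sarima_like))  # 0.625
print(mean_absolute_error(actual, rnn_like))     # 0.5
print(mean_absolute_error(actual, blend))        # 0.3125
```

The `weight_a` parameter leaves room for tuning the blend on validation data, although the post reports results only for the equal-weight case.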

Note that all of the models above are autoregressive. Including exogenous predictors such as weather forecasts would further improve the accuracy. However, it would imply additional operational complexity for CAISO, since cloud cover, precipitation, and temperature forecasts would need to be accounted for at each of California's 700+ solar sites.

To learn more about how to build a SARIMA model with the statsmodels API, how to deploy a time-series data pipeline in TensorFlow, or how to set up a learning rate decay schedule, please visit the project's GitHub.

About Author

Dmitri Levonian

Dmitri has managed diverse private assets in Europe for the past 15 years. He is a practitioner of deep learning and member of the TensorFlow Certificate network.