NYC Data Science Academy| Blog

Forecasting Economic Risk in the EU into 2020

Jack Yip and Chen Trilnik
Posted on Jun 27, 2017

Written by Chen Trilnik and Jack Yip. To view the original source code, visit our Github repo here.

 

I. Introduction

Business Objective

In the last decade, the European Union (EU) economy has been negatively impacted by a series of events, most notably the global financial crisis (2008), the European debt crisis (2009), and the Brexit vote for the UK to leave the European Union (2016). In this era of political instability, investors and companies alike are curious to know whether this is a time of financial opportunity or risk. This analysis aims to assess the economic state of the EU into 2020 using a data science approach.

 

Gross Domestic Product (GDP) Growth as a Proxy for Assessing Economic State

To answer our business question, we determined that forecasting GDP growth is an appropriate proxy for predicting the overall economic state of the EU. To simplify our analysis, we selected EU countries that (1) are leaders by total GDP and (2) demonstrated unstable GDP growth in the last decade. We believe this subset of countries is pivotal to the short-term economic state of the EU.

 

What is GDP?

The gross domestic product (GDP) is one of the primary indicators used to gauge the health of a country's economy. It represents the total monetary value of all goods and services produced over a specific time period.

GDP is often expressed as a comparison to the previous quarter or year. Accordingly, a year-to-year GDP growth of 3% can be read as the economy having grown 3% during the one-year period.

Forecasting GDP can provide valuable information to different groups. For example,

  • Stock investors may choose to invest more aggressively in a country where they expect strong GDP growth
  • Foreign manufacturing companies may consider opening a new factory in Europe given stable GDP growth

 

About the European Union (EU) and the Eurozone

The European Union (EU) is an economic and political partnership among 28 European countries. It was founded after World War II to foster economic cooperation, with the idea that countries that trade together are more likely to avoid going to war with each other. The Eurozone, on the other hand, comprises the 19 EU member countries that use a common currency, the euro.

 

The EU in a Snapshot

Since its establishment, the EU has grown from 6 to 28 members, added 300M to its population, and grown its GDP more than sevenfold. (Source)

II. Exploring EU Macroeconomic Data

AMECO Macroeconomic Data

AMECO is the annual macro-economic database of the European Commission. The database contains data for EU-28, the euro area, EU Member States, candidate countries and other OECD countries. In total, AMECO provides annual data points for over 460 macro indicators across 18 categories.

Total GDP & Population Per Country

Germany, France, United Kingdom, Italy, and Spain lead the EU with both the largest GDP and population in 2017.

GDP Per Capita

GDP per capita, a proxy for measuring average individual wealth in a country, varies greatly among the EU members.

Unemployment Rates in EU Countries

Unemployment rates in the EU have been at record highs in the past decade, including in Italy and Spain, which contribute a large portion of the EU's total GDP.

Government Debt As Percentage of GDP

More than half of the EU members exceeded a debt-to-GDP ratio of 60%, which is alarming for developed countries.

III. High-Level Project Workflow

 

Data Cleaning

The data was restructured to a tidy format where:

  • Each variable measured is in one column
  • Each different observation of that variable is in a different row

Additionally, non-EU countries were removed from the analysis.
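
The reshaping described above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's cleaning code: the wide-format rows, the indicator names, the numeric values, and the `EU_COUNTRIES` set are hypothetical placeholders standing in for the AMECO export.

```python
# Hypothetical wide-format rows: one row per (country, indicator),
# one column per year. Values are placeholders, not AMECO data.
wide_rows = [
    {"country": "Italy", "indicator": "gdp", "2016": 1.0, "2017": 2.0},
    {"country": "Italy", "indicator": "population", "2016": 3.0, "2017": 4.0},
    {"country": "Japan", "indicator": "gdp", "2016": 5.0, "2017": 6.0},
]

# Assumed subset of EU members, used only to demonstrate the filter.
EU_COUNTRIES = {"Italy", "Spain", "United Kingdom"}


def to_tidy(rows, eu_countries):
    """Return one record per (country, year) with one key per indicator."""
    tidy = {}
    for row in rows:
        if row["country"] not in eu_countries:
            continue  # drop non-EU countries
        for key, value in row.items():
            if key in ("country", "indicator"):
                continue  # every remaining key is a year column
            record = tidy.setdefault(
                (row["country"], int(key)),
                {"country": row["country"], "year": int(key)},
            )
            record[row["indicator"]] = value
    return [tidy[k] for k in sorted(tidy)]


tidy_rows = to_tidy(wide_rows, EU_COUNTRIES)
for record in tidy_rows:
    print(record)
```

Each resulting record holds one country-year observation, with every indicator in its own field, which is the shape the later modeling steps assume.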

 

Amazon Web Services (AWS)

Storing our data in a MySQL database on Amazon Relational Database Service (Amazon RDS), part of Amazon Web Services (AWS), facilitates more effective collaboration and allows for reproducible research.

 

Country Selection Using K-Means Clustering

As mentioned previously, this analysis focuses on EU countries that (1) are leaders by total GDP and (2) demonstrated unstable GDP growth in the last decade. While the former can easily be determined, k-means clustering was used to assess the latter.

Along with using GDP growth as an input, we determined the five most important macroeconomic features for predicting it (process discussed in the Feature Selection section). Using these six features, k-means clustering was used to group each country-year observation into one of five clusters. We interpreted these five clusters as different economic states a country has shown in a particular year, ranging from very bad to very good.

Of the top 5 EU countries by GDP in 2017, three (the UK, Italy, and Spain) demonstrated unstable economic states over the past decade (i.e., fluctuations in economic state and/or a high number of years with a poor economic state). These countries were selected for further GDP forecast analysis.
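
To make the clustering step concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the code used in the project: the farthest-point initialization is our own choice to keep the example deterministic, and the toy points use two made-up features and three clusters instead of the six features and five clusters described above.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # next centroid: the point farthest from its nearest existing centroid
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(n_iter):
        # assignment step: each point joins its closest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up, already standardized observations (one tuple per country-year).
observations = [
    (0.0, 0.0), (0.1, 0.2),
    (5.0, 5.0), (5.2, 4.9),
    (-4.0, 4.0), (-4.1, 4.2),
]
labels, centroids = kmeans(observations, k=3)
print(labels)
```

With real data, the cluster labels for a given country can then be read year by year to see how often its economic state changes.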

 

IV. Forecasting GDP Growth - ARIMA Time Series

About ARIMA Time Series

Autoregressive Integrated Moving Average (ARIMA) models are mainly used for short-term forecasting and, as a rule of thumb, require at least 40 historical data points. They work best when the data exhibit a stable or consistent pattern over time with a minimal number of outliers.

In our analysis, we have at least 50 historical data points for each of the macroeconomic indicators, which makes ARIMA time series forecasting appropriate to use.

 

Feature Engineering - Transform to Growth Rates

Prior to forecasting GDP growth, we transformed all 460+ features to growth rates. This transformation acts as a form of standardization that makes comparison between countries possible. Standardization is also necessary to mitigate unwanted bias when assessing feature importance with methods such as lasso shrinkage and when making predictions with linear regression.

The example below shows the transformation of the population feature to growth rates.
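
As a rough sketch of this transformation, the year-over-year growth rate is (x_t - x_{t-1}) / x_{t-1}. The function name and the population levels below are placeholders for illustration, not AMECO values.

```python
def growth_rates(levels):
    """Year-over-year growth: (x_t - x_{t-1}) / x_{t-1} for consecutive years."""
    return [(curr - prev) / prev for prev, curr in zip(levels, levels[1:])]


# Placeholder population levels for three consecutive years.
population = [100.0, 102.0, 105.06]
rates = growth_rates(population)
print(rates)  # approximately [0.02, 0.03], i.e. 2% and 3% growth
```

Note that the transformed series is one observation shorter than the original, because the first year has no predecessor.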

 

Feature Selection - ARIMA Time Series

To forecast GDP growth using an ARIMA model, the only feature required is the GDP growth itself.

 

Checking For Stationarity

“A stationary time series is one whose statistical properties such as mean, variance, autocorrelation, etc. are all constant over time. Most statistical forecasting methods are based on the assumption that the time series can be rendered approximately stationary.”

Prior to forecasting, both Dickey-Fuller and KPSS tests were performed to validate the stationarity of the GDP growth rates for each of the three selected countries.
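
To show what the Dickey-Fuller test measures, here is a simplified sketch of its regression, dy_t = alpha + rho * y_{t-1} + e_t, written in plain Python. It is an illustration under stated assumptions, not the test we ran: it omits the lagged-difference terms of the augmented version, it does not cover KPSS, and the input series is synthetic. In practice a statistics library should be used for both tests.

```python
import math


def dickey_fuller_stat(y):
    """t-statistic of rho in the regression dy_t = alpha + rho * y_{t-1} + e_t."""
    x = y[:-1]                               # lagged levels y_{t-1}
    dy = [b - a for a, b in zip(y, y[1:])]   # first differences
    n = len(x)
    x_mean = sum(x) / n
    dy_mean = sum(dy) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - dy_mean) for xi, di in zip(x, dy))
    rho = sxy / sxx                          # OLS slope
    alpha = dy_mean - rho * x_mean           # OLS intercept
    rss = sum((di - alpha - rho * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)      # standard error of rho
    return rho / se


# Synthetic mean-reverting series (56 points), for illustration only.
series = [((4 * t) % 11) - 5 for t in range(56)]
stat = dickey_fuller_stat(series)
print(round(stat, 2))
```

The statistic is compared with Dickey-Fuller critical values rather than the usual t table; with a constant and about 50 observations, the 5% critical value is roughly -2.9. A statistic well below that value rejects the unit-root hypothesis, which supports treating the series as stationary.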

 

Cross Validation - Select Best p & q for ARIMA Model

An ARIMA model is comprised of three components: Auto-Regressive (AR), Integration/Differencing (I), and Moving Average (MA). These components correspond to the parameters p, d, and q, respectively, when configuring an ARIMA model.

Provided that our GDP growth rates are stationary (d = 0), cross validation can be performed across a range of p and q values to identify the ARMA model that minimizes the Schwarz Bayesian Information Criterion (BIC). The model with the lowest BIC is preferred because BIC rewards goodness of fit but penalizes model complexity.
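
The selection rule can be sketched as follows, using the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The helper names, the parameter count (p + q plus a constant and a variance term), and the log-likelihood values are assumptions made for illustration; fitting the ARMA models themselves is left to a time series library.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Schwarz criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_order(log_likelihoods, n_obs):
    """Pick the (p, q) pair with the lowest BIC.

    log_likelihoods maps (p, q) to the maximized log-likelihood of that fit.
    Assumed parameter count: p AR terms + q MA terms + constant + variance.
    """
    scores = {
        (p, q): bic(ll, p + q + 2, n_obs)
        for (p, q), ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Made-up log-likelihoods for three candidate models, for illustration only.
fits = {(1, 0): -60.0, (1, 1): -59.5, (2, 2): -58.0}
best_order, scores = select_order(fits, n_obs=50)
print(best_order)
```

In this toy example the larger models fit slightly better, but not by enough to offset the ln(n) penalty on each extra parameter, so the smallest model wins.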

Choosing the best model with the lowest BIC, we forecasted the GDP growth rates for each of the three countries into 2020. The forecasts can be found toward the end of the blogpost.

 

V. Forecasting GDP Growth - Multiple Linear Regression & Logistic Regression

Limitations with Small Sample Size

When choosing appropriate machine learning models, we took into consideration the limitations of a small sample size. As we are limited to fewer than 60 observations for each feature in each country, it was important to use simpler models to avoid overfitting.

We determined that multiple linear regression and logistic regression were the most appropriate models for our circumstances. Because these models do not handle a large number of predictors well, we perform feature selection in the next section to limit the number of predictors used to forecast GDP growth.

 

Feature Selection Using Lasso Regularization & Chi-Square

Of over 460 macroeconomic indicators, the 5 most important predictors of GDP growth were chosen to conduct the forecasts of GDP growth.

Lasso regularization weeds out less important features by shrinking their coefficients to zero using the L1 penalty.

The visualization below demonstrates the shrinkage of the feature coefficients by varying the penalty factor λ.
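
The shrinkage effect can also be reproduced numerically. The sketch below is a minimal coordinate-descent lasso in plain Python, minimizing (1/2n)||y - Xb||² + λ||b||₁. It is not the implementation used in the project; it assumes centered data with no intercept, and the toy data are made up so that only the first feature is related to the target.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(X, y, lam, n_sweeps=100):
    """Minimize (1/2n) * ||y - Xb||^2 + lam * ||b||_1 by coordinate descent.

    Assumes the columns of X and the target y are already centered.
    """
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # covariance of feature j with the residual that ignores feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * coefs[m] for m in range(p) if m != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            coefs[j] = soft_threshold(rho, lam) / scale
    return coefs


# Toy data: y depends only on the first column; the second is unrelated to y.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-2.0, -2.0, 2.0, 2.0]
for lam in (0.0, 0.5, 3.0):
    print(lam, lasso(X, y, lam))
```

As λ increases, the coefficient of the relevant feature shrinks and eventually reaches zero, while the unrelated feature stays at zero throughout. Features whose coefficients survive a reasonably large penalty are the ones worth keeping.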

The chi-square test measures dependence between stochastic variables, so using this function weeds out the features that are the most likely to be independent of class and therefore irrelevant for predictions.

The screenshot below on the right side shows the feature importance as determined by the chi-square test.
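
For intuition, the statistic behind this test can be computed by hand from a contingency table of observed counts. The sketch below is a generic Pearson chi-square calculation, not the scoring function we used; it assumes the feature has been binned into categories, and the counts are made up.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts: rows = feature low/high, columns = growth class A/B.
dependent = [[18, 2], [3, 17]]
independent = [[10, 10], [10, 10]]
print(chi_square_stat(dependent), chi_square_stat(independent))
```

A feature whose table looks like `independent` scores zero and carries no information about the class, while a large statistic, as for `dependent`, marks a feature worth keeping.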

Using lasso regularization and chi-square test for feature importance, we selected the following five features for predicting GDP growth.

 

Top 5 Features for Predicting GDP Growth

  • Private final consumption expenditure at current prices
  • Consumption of fixed capital at current prices: total economy
  • Domestic demand excluding stocks at current prices
  • Final demand at current prices
  • Domestic income at current prices

 

Auto.ARIMA for Forecasting Top 5 Features

Previously, GDP growth was forecasted using itself as a predictor (i.e. ARIMA time series forecasting). In this section, GDP growth is forecasted using the top five predictors identified during the feature selection process. In order to forecast GDP growth using these predictors, we used the ‘auto.arima’ function from the R ‘forecast’ library to forecast each of these five features into 2020. These forecasted values are then used as inputs to forecast GDP growth with multiple linear regression and logistic regression.

The illustration below shows the process of forecasting each of the five selected features for predicting GDP growth.

 

 

Multiple Linear Regression

 

Cross Validation - Select Best Training Window Size

To choose the best multiple linear regression model for forecasting GDP growth, we conducted a cross validation across different training window sizes. Since the features and target variable at hand are time series, a traditional k-fold cross validation would ignore the temporal order of the observations. Instead, we implemented a moving window cross validation.

Fixing the test window size at 3 years, we conducted a cross validation with different training window sizes, from 1 to n-3, where n represents the number of observations for each feature. For each training window size, we iterated through the observations using a forward-looking strategy. This avoids predicting past data points using future information. In the end, the mean R² is evaluated across the iterations.
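
The windowing logic can be sketched as an index generator. This is a minimal illustration under our own assumptions: the window slides forward one year at a time, the training window has a fixed length and does not expand, and the function names are hypothetical.

```python
def moving_windows(n_obs, train_size, test_size=3):
    """Yield (train_indices, test_indices) pairs that slide forward in time."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test  # the test years always come after the training years
        start += 1


def r_squared(actual, predicted):
    """Coefficient of determination on a held-out window."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Eight yearly observations with a 3-year training window.
splits = list(moving_windows(n_obs=8, train_size=3))
for train, test in splits:
    print(train, test)
```

For each candidate training window size, a model would be fitted on every `train` slice, scored with `r_squared` on the matching `test` slice, and the scores averaged; the window size with the best mean score is kept.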

 

Forecast With Best Training Window Size

To forecast GDP growth into 2020, we chose the model with the training window size that provides the best mean R². The forecasts are discussed toward the end of the blogpost.

 

Logistic Regression

 

Categorizing GDP Growth

To forecast GDP growth using a logistic regression, we needed to transform GDP growth rates to categories. We did not need to transform the predictors, as only the target variable needs to be in categorical form.

In classifying the GDP growth rates, the median GDP growth of the past 12 years for each country was used as the benchmark. Each year was categorized as either “below median” or “above median.”

The illustration below shows how the GDP growth rates are transformed to categories assuming a median of 0.12.
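
A minimal sketch of this labeling step is shown here. The function name, the handling of ties (a rate equal to the median is labeled "above median"), and the sample rates are assumptions for illustration.

```python
from statistics import median


def categorize(growth_rates, window=12):
    """Label each rate relative to the median of the last `window` values."""
    benchmark = median(growth_rates[-window:])
    # Assumption: a rate exactly equal to the median counts as "above median".
    return [
        "above median" if rate >= benchmark else "below median"
        for rate in growth_rates
    ]


# Placeholder growth rates chosen so that the median is 0.12.
labels = categorize([0.05, 0.12, 0.20])
print(labels)
```

The resulting labels form the binary target for the logistic regression, while the predictors stay in their numeric form.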

 

Cross Validation - Grid Search

A grid search cross validation using L1 (lasso) and L2 (ridge) penalties is run across a range of λ values to identify the model with the best accuracy. This model is then used to forecast the GDP growth categories.

 

Summary of GDP Growth Forecasts

The table below outlines the forecasts from each of the models discussed for the United Kingdom, Italy, and Spain. The first three rows (2015 to 2017) are actual GDP growth rates based on historical data from AMECO. The following three rows (2018 to 2020) are forecasts provided by each of the three models (i.e. ARIMA, Multiple Linear Regression, and Logistic Regression).

In summary, our three models predicted positive GDP growth over the next three years for the United Kingdom, Italy, and Spain. Assuming a stable political climate in the near future, foreign investors and companies can expect these economies to grow at low single-digit rates.

 

 

About Authors

Jack Yip

Jack is passionate about using state-of-the-art data analytic techniques to help companies get ahead of the curve in monetizing their data. He combines effective storytelling and simple visualizations to translate highly technical analyses into actionable insights. Jack has...

Chen Trilnik

