Correlation between stock prices in different industrial sectors

Posted on May 1, 2016

Contributed by Joseph Wang. He is currently enrolled in the NYC Data Science Academy 12-week full-time Data Science Bootcamp, which runs from April 11th to July 1st, 2016. This post is based on his first class project, R visualization (due in the 2nd week of the program).
Motivation:

With the recent downturn in the energy industry, I was curious whether other industries, such as semiconductors and finance, might also be hit, and whether statistical inference from the data could tell. For an initial exploration, I picked two key players from each sector. For the energy industry, Exxon (XOM) and Chevron (CVX) are chosen. For the finance sector, J. P. Morgan (JPM) and Goldman Sachs (GS) are selected. AMD and Intel (INTC) are natural candidates for the semiconductor industry in the USA.

Data Exploration:

I gathered all the time series I was interested in from Yahoo Finance using the R package quantmod. The date range was chosen so that the data are complete for all the stocks: the series run from June 1, 1999 to January 1, 2016. Since Goldman Sachs's maximum price was much larger than those of the other stocks, for visualization I scaled each stock by its own maximum price over the whole period.

From FIG. 1, we observe that Chevron's scaled price almost coincides with Exxon's over the past decade. Interestingly, since the end of 2001 the energy stock prices show a regular oscillation with a period of around four years. This regular oscillation does not occur in the other sectors. However, one can sense a strong correlation among the finance and energy stocks when energy prices plummeted; this was not true for the semiconductor sector. The trends also show that the prices themselves are far from normally distributed: unless the long-term trends, oscillations and bias are filtered out, the prices are not spread symmetrically about a mean. Instead, we can study the correlation between stocks from a different perspective.
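As a minimal sketch of the max-scaling used for FIG. 1 (written in Python rather than the R/Matlab of the Appendix, with made-up toy prices and a hypothetical helper name `scale_by_max`), each series is simply divided by its own maximum so that every curve peaks at 1:

```python
def scale_by_max(prices):
    """Divide a price series by its maximum so the series peaks at 1.0.

    Assumes all prices are positive, as stock prices are.
    """
    peak = max(prices)
    return [p / peak for p in prices]


# Toy prices (not real data): two stocks with very different price levels.
toy_prices = {
    "GS": [150.0, 200.0, 250.0],
    "AMD": [10.0, 5.0, 2.5],
}
scaled = {ticker: scale_by_max(series) for ticker, series in toy_prices.items()}
print(scaled)  # {'GS': [0.6, 0.8, 1.0], 'AMD': [1.0, 0.5, 0.25]}
```

Because each series is divided by a constant, the shape of every curve is unchanged; only its vertical scale differs.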

FIG. 1: Scaled Stock Prices versus Business Days with first day and last day of the time series labeled by date

For stock trading, what is more interesting is the daily "up and down" of the stock prices, defined as the difference between the prices on adjacent days, which is easily calculated in Matlab or R. Conversely, by basic calculus, the daily prices can be recovered by integrating this difference signal over time (in discrete time, a cumulative sum starting from the first day's price). In other words, if we can learn the statistics of the difference signal, which turns out to look Gaussian as discussed below, we may be able to make predictions about future stock prices.
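The differencing and its inverse can be sketched in a few lines. This is an illustrative Python version with hypothetical function names and toy prices; in Matlab the same pair of operations is `diff` (used in the Appendix) and `cumsum`.

```python
from itertools import accumulate  # accumulate(..., initial=...) needs Python 3.8+


def daily_differences(prices):
    """First difference d[t] = p[t+1] - p[t]; the result is one element shorter."""
    return [later - earlier for earlier, later in zip(prices, prices[1:])]


def reconstruct(first_price, diffs):
    """Undo the differencing: running sum of the diffs, starting at first_price."""
    return list(accumulate(diffs, initial=first_price))


prices = [100.0, 101.5, 99.0, 102.0]  # toy closing prices, not real data
d = daily_differences(prices)
print(d)                          # [1.5, -2.5, 3.0]
print(reconstruct(prices[0], d))  # [100.0, 101.5, 99.0, 102.0]
```

The starting price is needed because differencing discards the overall level of the series.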

In FIG. 2, we show the difference signals for all the selected stocks. They look plausibly normally distributed: positive and negative values occur in comparable numbers around a mean of approximately zero, despite the highly non-normal distribution of the original stock prices. We also observe that there might be strong correlation between the signals within the same sector. Let us investigate this in more detail with histograms.
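One simple way to put numbers on the "comparable counts" observation is to count up days, down days and unchanged days and compute the mean change. The Python sketch below does this for a made-up list of daily differences; the helper name `sign_balance` is hypothetical.

```python
from statistics import fmean  # Python 3.8+


def sign_balance(diffs):
    """Return (up days, down days, unchanged days, mean daily change)."""
    up = sum(1 for x in diffs if x > 0)
    down = sum(1 for x in diffs if x < 0)
    unchanged = len(diffs) - up - down
    return up, down, unchanged, fmean(diffs)


toy_diffs = [0.5, -0.25, 0.0, 1.0, -1.25]  # made-up daily price changes
print(sign_balance(toy_diffs))  # (2, 2, 1, 0.0)
```

Balanced counts and a near-zero mean are consistent with, but do not by themselves establish, a normal distribution.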

FIG. 2: Daily Stock Price Differences versus Business Days

In FIG. 3, we show the histograms for the different sectors. The signal for each sector is the sum of the difference signals of its constituent stocks. The histograms are strikingly symmetric and bell-shaped, which gives us hope that we can base statistical inference on a Gaussian distribution.
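The sector signal and its histogram can be sketched as follows, together with the sample skewness as one numeric check of the symmetry seen in FIG. 3 (skewness is an addition here, not something computed in the original analysis). This is a Python illustration with hypothetical helper names and made-up data; it uses equal-width bins between the minimum and maximum of the data.

```python
from statistics import fmean  # Python 3.8+


def sector_signal(*stock_diffs):
    """Day-by-day sum of the difference signals of the stocks in one sector."""
    return [sum(day) for day in zip(*stock_diffs)]


def histogram_counts(values, nbins=100):
    """Counts in nbins equal-width bins spanning min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: put everything in the first bin
        else:
            # The maximum value would land at index nbins, so clamp it into the last bin
            idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    return counts


def skewness(values):
    """Sample skewness m3 / m2**1.5; close to 0 for a symmetric distribution."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


# Toy difference signals for two stocks in one sector (made-up numbers)
energy = sector_signal([1.0, -1.0, 2.0], [0.5, -0.5, -1.0])
print(energy)                                  # [1.5, -1.5, 1.0]
print(histogram_counts(energy, 4))             # [1, 0, 0, 2]
print(skewness([-2.0, -1.0, 0.0, 1.0, 2.0]))   # 0.0
```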

FIG. 3: Stock Price Difference Daily for each sector divided into 100 bins

In FIG. 4, we show scatter plots of the difference signals between sectors. We observe a stronger correlation between the finance and energy sectors, and much weaker correlations for the other combinations. Take as the null hypothesis that there is no correlation between the difference signals of different sectors. The correlation matrix Cij and the corresponding p-values can then be calculated numerically, with indices i, j = 1, 2, 3 (1: semiconductor sector; 2: finance sector; 3: energy sector). The linear correlations between sectors are the off-diagonal elements (i ≠ j): C12 = C21 = 0.3382, C13 = C31 = 0.1968, and C23 = C32 = 0.4984. The corresponding p-values are zero to within double precision, so the null hypothesis is rejected, and we can be statistically confident that there are linear correlations between the different sectors.

Because all the p-values are essentially zero, they cannot rank the pairs; the strength of each relationship is measured by the correlation coefficients instead. The larger coefficients between the semiconductor and finance sectors (0.3382) and between the finance and energy sectors (0.4984) indicate considerably stronger linear relationships than the one between the energy and semiconductor sectors (0.1968).

FIG. 4: Scatter Plots for Stock Price Difference between sectors


Conclusion and Discussion:

By studying the daily price differences rather than the prices themselves, we identify a stronger linear correlation between the finance sector and each of the other sectors. The semiconductor and energy sectors are also linearly correlated at the 95% confidence level, but the correlation is weak. To model correlations on shorter time scales, we may need to filter the stock price differences on scales shorter than a day, so that the seasonal signals and bias on the scale of days can be accounted for. For longer time scales, processing the difference signal should remove the bias and filter out the seasonal trends.

Appendix:

Importing the time series data with R:

library(quantmod)

# Download each stock from Yahoo Finance and save it as a numeric-only CSV file
# (no header row, no date column) so that Matlab's csvread can parse it.
# Columns: Open, High, Low, Close, Volume, Adjusted
tickers <- c("XOM", "CVX", "AMD", "INTC", "GS", "JPM")

for (tk in tickers) {
  data <- getSymbols(tk, src = "yahoo", from = "1999-06-01", to = "2016-01-01", auto.assign = FALSE)
  write.table(coredata(data), file = paste0(tk, ".csv"), sep = ",", row.names = FALSE, col.names = FALSE)
}

Next, we convert these CSV files into Matlab data (.mat) files to prepare the visualization (from this point on, the code is written in Matlab script .m files):

%Name each variable after its ticker, so that "load XOM" later creates XOM
XOM=csvread('XOM.csv'); save('XOM.mat','XOM');
CVX=csvread('CVX.csv'); save('CVX.mat','CVX');
AMD=csvread('AMD.csv'); save('AMD.mat','AMD');
INTC=csvread('INTC.csv'); save('INTC.mat','INTC');
GS=csvread('GS.csv'); save('GS.mat','GS');
JPM=csvread('JPM.csv'); save('JPM.mat','JPM');

Now we load the .mat files into matrix variables so that we can process the data in Matlab:

clear all
%After downloading the time series data from Yahoo Finance through the R
%package quantmod, we saved the data into .csv files and then converted them
%into Matlab .mat data files.
%The time series contain daily prices on business days.
load XOM %Exxon stock prices
load CVX %Chevron stock prices
load INTC %Intel stock prices
load AMD %AMD stock prices
load JPM %J.P. Morgan stock prices
load GS %Goldman Sachs stock prices

x=0:1:length(XOM(:,6))-1; %business-day index
%Column 6 of each data set is the adjusted closing price
plot(x,XOM(:,6),'k') %Exxon
hold on
plot(x,CVX(:,6),'b') %Chevron
plot(x,INTC(:,6),'r') %Intel
plot(x,AMD(:,6),'y') %AMD
plot(x,JPM(:,6),'m') %J.P. Morgan
plot(x,GS(:,6),'c') %Goldman Sachs
xlabel('Business Days','fontsize',14,'fontweight','b');
ylabel('Stock prices (dollars)','fontsize',14,'fontweight','b');

%To see the trends better, we renormalize each stock's prices
%by its maximum price over the selected time series
figure
plot(x,XOM(:,6)/max(XOM(:,6)),'k')
hold on
plot(x,CVX(:,6)/max(CVX(:,6)),'b')
plot(x,INTC(:,6)/max(INTC(:,6)),'r')
plot(x,AMD(:,6)/max(AMD(:,6)),'y')
plot(x,JPM(:,6)/max(JPM(:,6)),'m')
plot(x,GS(:,6)/max(GS(:,6)),'c')
xlabel('Business Days','fontsize',14,'fontweight','b');
ylabel('Scaled stock prices (fraction of maximum)','fontsize',14,'fontweight','b');

%Judging from the trends, we do not expect the raw prices to be useful
%for statistical inference because of their non-normal distribution.
%What is more interesting is the "up and down" of the stock prices,
%defined as the difference between the prices on adjacent days,
%which can be calculated with the diff function in Matlab.

figure
diff_XOM=diff(XOM(:,6));
diff_CVX=diff(CVX(:,6));
diff_INTC=diff(INTC(:,6));
diff_AMD=diff(AMD(:,6));
diff_JPM=diff(JPM(:,6));
diff_GS=diff(GS(:,6));
xx=0:1:length(XOM(:,6))-2; %differencing leaves one point fewer than x
subplot(6,1,1); plot(xx,diff_XOM,'k')
subplot(6,1,2); plot(xx,diff_CVX,'b')
subplot(6,1,3); plot(xx,diff_INTC,'r')
subplot(6,1,4); plot(xx,diff_AMD,'y')
subplot(6,1,5); plot(xx,diff_JPM,'m')
subplot(6,1,6); plot(xx,diff_GS,'c')
xlabel('Business Days','fontsize',14,'fontweight','b')
ylabel('Stock Price Difference Daily','fontsize',14,'fontweight','b')

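%Histograms of the daily price difference summed over each sector,
%drawn in a new figure so the 6x1 subplots above are not overwritten
figure
subplot(1,3,1)
histogram(diff_XOM+diff_CVX,100,'FaceColor','b')
xlabel('Stock Price Difference Daily for energy sector','fontsize',14,'fontweight','b')
ylabel('Counts in 100 bins','fontsize',14,'fontweight','b')
subplot(1,3,2)
histogram(diff_INTC+diff_AMD,100,'FaceColor','r')
xlabel('Stock Price Difference Daily for semiconductor sector','fontsize',14,'fontweight','b')
ylabel('Counts in 100 bins','fontsize',14,'fontweight','b')
subplot(1,3,3)
histogram(diff_JPM+diff_GS,100,'FaceColor','g')
xlabel('Stock Price Difference Daily for finance sector','fontsize',14,'fontweight','b')
ylabel('Counts in 100 bins','fontsize',14,'fontweight','b')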

%Scatter plots between individual companies
%figure; plot(diff_INTC,diff_XOM,'o'); xlabel('INTC'); ylabel('XOM')
%figure; plot(diff_INTC,diff_CVX,'*'); xlabel('INTC'); ylabel('CVX')
%figure; plot(diff_AMD,diff_XOM,'p'); xlabel('AMD'); ylabel('XOM')
%figure; plot(diff_AMD,diff_CVX,'+'); xlabel('AMD'); ylabel('CVX')

%Scatter plots between sectors; stock price differences
%from the same industrial sector are added together
figure
subplot(1,3,1)
plot(diff_AMD+diff_INTC,diff_JPM+diff_GS,'.')
xlabel('Semiconductor'); ylabel('Finance')
subplot(1,3,2)
plot(diff_JPM+diff_GS,diff_XOM+diff_CVX,'.')
xlabel('Finance'); ylabel('Energy')
subplot(1,3,3)
plot(diff_AMD+diff_INTC,diff_XOM+diff_CVX,'.')
xlabel('Semiconductor'); ylabel('Energy')

%Correlation between individual companies
%X=[diff_AMD diff_INTC diff_JPM diff_GS diff_XOM diff_CVX];
%[correlation_com,pval_com]=corr(X);
%Correlation matrix and p-values between sectors
%(columns 1: semiconductor, 2: finance, 3: energy; corr requires the
%Statistics and Machine Learning Toolbox)
Y=[diff_AMD+diff_INTC diff_JPM+diff_GS diff_XOM+diff_CVX];
[correlation_sec,pval_sec] = corr(Y);

%Here we only care about the correlation between the signs of the stock differences:
%what is the probability that, when the stocks in one sector go up (or down) on
%a given day, the stocks in another sector also go up (or down)?
%I found that this occurred almost 66 percent of the time.
%Y1=diff_AMD+diff_INTC; %semiconductor
%Y2=diff_JPM+diff_GS; %finance
%Y3=diff_XOM+diff_CVX; %energy
%Keep only the sign: +1 for an up day, -1 otherwise (no change counts as -1)
%Y1=2*(Y1>0)-1; Y2=2*(Y2>0)-1; Y3=2*(Y3>0)-1;
%Fraction of days on which both sectors moved in the same direction
%P1=mean(Y1.*Y2>0); %semiconductor and finance
%P2=mean(Y1.*Y3>0); %semiconductor and energy
%P3=mean(Y2.*Y3>0); %finance and energy
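For readers without Matlab, the same-direction fraction computed by the commented-out code above can be sketched in Python. As in that code, a day with no change is counted as a down day; the function name and the toy inputs are hypothetical.

```python
def same_direction_fraction(a, b):
    """Fraction of days on which two difference signals move the same way.

    Mirrors the commented-out Matlab: a change > 0 is 'up', anything else
    (including zero) is treated as 'down'.
    """
    if len(a) != len(b) or not a:
        raise ValueError("signals must be non-empty and of equal length")
    agree = sum(1 for x, y in zip(a, b) if (x > 0) == (y > 0))
    return agree / len(a)


semis = [1.0, -1.0, 0.0, 2.0]     # toy daily sector differences
energy = [3.0, -2.0, -1.0, -5.0]
print(same_direction_fraction(semis, energy))  # 0.75
```

If two sectors moved independently and each went up on about half of the days, this fraction would be close to 0.5, so a value around 0.66 points to a positive dependence between the directions of daily moves.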


About Author

Joseph Wang

Joseph Wang is a theoretical physicist with 20 years of proven research experience in modeling collective phenomena and exploring numerical simulations to make predictions in complex systems. Identifying correlations between different degrees of freedom, connecting those to the...