Analyzing Transportation Equities using R

Posted on Jul 21, 2014

Contributed by Peggy Sobolewski. Peggy took our Data Science with R - Beginner Level course with Vivian Zhang in June 2014. This post is based on her final project submission.

Project: The main purpose of this project is to analyze my investment universe to create a portfolio. Using R, I will analyze 22 equities and compare their returns, market cap, price, shares outstanding, and volume graphically. The first step is to load the required packages:

#set-up
Sys.setenv(JAVA_HOME='C:\\Program Files\\Java\\jre7')
install.packages("Rbbg", repos="http://r.findata.org")
require(PerformanceAnalytics)
require(reshape)
require(ggplot2)
library(rJava)
library(Rbbg)

The next step, after properly installing the necessary packages, is to download the data through the Bloomberg API.

conn <- blpConnect()

After establishing a connection to the Bloomberg API, we can pull the data we need for our securities. (One quick side note: to get the equities I wanted to examine, I ran an EQS (Equity Screening) function on a Bloomberg Terminal with the following screens: 1) primary securities only, 2) country of domicile: USA (I only wanted domestic companies), 3) market cap of at least 1 billion (relatively large caps only), 4) price > 1 (which eliminates very recent public companies [IPOs]), and finally 5) daily volume traded greater than .5 million (so there's available liquidity in the equity).)
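The EQS screen above runs on the Terminal, but the same filters are easy to picture as plain R subsetting. Here is a toy sketch on a made-up data.frame (the column names and tickers are illustrative assumptions, not Bloomberg fields):

```r
# Hypothetical illustration of the EQS-style screen as R filters
universe <- data.frame(
  ticker  = c("AAA", "BBB", "CCC"),
  mkt_cap = c(5e9, 4e8, 2e9),   # market cap in USD
  price   = c(25, 0.8, 40),     # last price
  volume  = c(2e6, 1e6, 6e5)    # average daily volume
)
# keep names passing all three numeric screens used in the post
screened <- subset(universe, mkt_cap >= 1e9 & price > 1 & volume > 5e5)
screened$ticker   # only AAA and CCC survive the screen
```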

#now adding those securities that came up through the screening into R through Rbbg
#defining both securities and fields for the bdh (bloomberg historical data) function
securities <- c("ALK US Equity", "DAL US Equity", "JBLU US Equity", "LUV US Equity", "SAVE US Equity",
"UAL US Equity", "CHRW US Equity", "EXPD US Equity", "FDX US Equity", "HUBG US Equity", 
"UPS US Equity", "UTIW US Equity", "XPO US Equity", "CSX US Equity", 
"KSU US Equity", "NSC US Equity", "UNP US Equity", "CAR US Equity", "CNW US Equity", 
"HTZ US Equity", "JBHT US Equity")
fields <- c("PX_LAST", "TOT_MKT_VAL", "VOLATILITY_90D", "EQY_SH_OUT", "VOLUME")
allsecurities <- bdh(conn, securities, fields, Sys.Date()-730, always.display.tickers=TRUE,include.non.trading.days=FALSE, dates.as.row.names=FALSE)
#using Sys.Date()-730 instead of fixed start and end dates so the pull stays current
#whenever the code is re-run - 730 calendar days, i.e. roughly two years of data

After getting the data into R, I want to add a returns column, computed from PX_LAST (closing price), to the original data.frame so that price comparisons are normalized.

#compute log returns per ticker so a diff never spans two different securities
maindata <- transform(allsecurities,
 returns=ave(log(PX_LAST), ticker, FUN=function(x) c(NA, diff(x))))
head(maindata) #making sure it went through - it did!
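The log-return idea can be checked on toy prices without Bloomberg: diff(log(price)) gives continuously compounded returns that are comparable across securities regardless of price level. A minimal sketch with made-up numbers:

```r
# Toy check: log returns from a short price series
prices <- c(100, 110, 99)
rets   <- c(NA, diff(log(prices)))   # NA pads the first observation
round(rets, 4)   # NA, then log(110/100), then log(99/110)
```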

Next step: let's make sure the data looks good -

head(maindata)
tail(maindata)
dim(maindata) 
#makes sense since there are 22 securities and we have data going back a little over 18 months
names(maindata)
summary(maindata) #helpful to see the low and high of all the variables (gives a good idea of where values will fall)
str(maindata)
class(maindata) #just to verify it is a data.frame
sapply(maindata, class) #everything is right except date 
#(ticker=character; PX_LAST, TOT_MKT_VAL and VOLUME are all numeric; but date appears as a character and we need it in its proper form: Date)
maindata$date <- as.Date(maindata$date) 
#lets try sapply(maindata, class) one more time to verify
sapply(maindata, class) #worked perfectly!

So now that the data is loaded and looks good, we're on to the fun part: examining it! What should we do first? Let's see if there's any correlation among the stocks by breaking them down into their GICS subindustries (for my purposes those will be: 1) air freight and logistics, 2) airlines, 3) railroads, and 4) trucking).

#freight and logistics
freight_logistics <- c("CHRW US Equity", "EXPD US Equity", "FDX US Equity", 
"HUBG US Equity", "UPS US Equity", "UTIW US Equity", "XPO US Equity")
frlo <- bdh(conn, freight_logistics, fields, Sys.Date()-730, always.display.tickers=TRUE,  include.non.trading.days=FALSE, dates.as.row.names=FALSE)
head(frlo)
#again compute log returns per ticker rather than across the stacked frame
frloreturns <- ave(log(frlo$PX_LAST), frlo$ticker, FUN=function(x) c(NA, diff(x)))
freightlogistics <- transform(frlo, returns=frloreturns)
head(freightlogistics)
fl.data <- melt(freightlogistics,id=c("ticker","date"))
head(fl.data)
unique(fl.data$variable)
rfl.data <- cast(subset(fl.data, variable=="returns"), date~ticker, sum)
summary(rfl.data)
chart.Correlation(rfl.data)
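The pairwise numbers that chart.Correlation displays are just what base R's cor() computes. A quick sketch on toy returns (the series names here are made up) shows the idea:

```r
# Toy correlation matrix: B is A plus small noise, C is unrelated
set.seed(1)
x   <- rnorm(100)
toy <- data.frame(A = x,
                  B = x + rnorm(100, sd = 0.1),  # nearly identical to A
                  C = rnorm(100))                # independent series
round(cor(toy), 2)   # A and B correlate strongly; C does not
```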

I then used the same pattern of code to create all of the correlation plots. Let's see what they produced:

Freight & Logistics Returns Correlation

Railroad Returns Correlation

Airline Returns Correlation

Let's see which equities are most closely correlated:

Freight & Logistics:

  1. HubGroup (HUBG) & UTI Worldwide (UTIW) - .97
  2. FedEx (FDX) & UPS (UPS) - .96
  3. HubGroup (HUBG) & FedEx (FDX) - .90

Airlines:

  1. Delta (DAL) & JetBlue (JBLU) - .97
  2. Delta (DAL) & United (UAL) - .95
  3. JetBlue (JBLU) & United (UAL) - .93

Railroads:

  1. Norfolk Southern (NSC) & Union Pacific (UNP) - .92
  2. CSX (CSX) & Norfolk Southern (NSC) - .78
  3. Kansas City Southern (KSU) & Union Pacific (UNP) - .74

Trucking:

  1. Hertz (HTZ) & Swift (SWFT) - .93
  2. JB Hunt (JBHT) & Swift (SWFT) - .84
  3. Con-Way (CNW) & Swift (SWFT) - .84

Alright, that showed a good amount of info, but let's dig a little deeper. Let's see which company has the highest market cap, and whether that is due to its price or its number of shares outstanding.
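The reason to read the next three plots together is the simple identity that market cap equals price times shares outstanding, so a large cap can come from either factor. A toy check with hypothetical numbers:

```r
# Market cap = price x shares outstanding (all numbers hypothetical)
price  <- 180        # assumed last price
shares <- 850e6      # assumed shares outstanding
mktcap <- price * shares
mktcap / 1e9         # market cap expressed in billions
```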

mktcap <- ggplot(data=maindata, aes(x=ticker, y=TOT_MKT_VAL, colour=ticker)) +
 geom_point() +
 theme_bw() +
 theme(panel.grid.major = element_blank(),
 panel.background = element_blank(),
 legend.background=element_rect(fill="white", colour="white") ) +
 labs(title="Total Market Cap for Each Security for the Last 3 Years", x="Ticker",
 y="Total Market Cap")
print(mktcap)
price <- ggplot(data=maindata, aes(x=ticker, y=PX_LAST, colour=ticker))+
 geom_point() +
 theme_bw() +
 theme(panel.grid.major = element_blank(),
 panel.background = element_blank(),
 legend.background=element_rect(fill="white", colour="white") ) +
 labs(title="Price for Each Security for the Last 3 Years", x="Ticker",
 y="Last Price")
print(price)
shares <- ggplot(data=maindata, aes(x=ticker, y=EQY_SH_OUT, colour=ticker))+
 geom_point() +
 theme_bw() +
 theme(panel.grid.major = element_blank(),
 panel.background = element_blank(),
 legend.background=element_rect(fill="white", colour="white") ) +
 labs(title="Amount of Shares for Each Security for the Last 3 Years", x="Ticker",
 y="Amount of Shares Outstanding")
print(shares)

Market Cap

Last Price (PX_LAST)

Shares Outstanding

So what insight can we draw from these graphs?

Market Cap: Union Pacific and UPS have the highest market caps.

Price: FedEx has the highest price and the largest spread, while Union Pacific and UPS sit in similar ranges.

Shares Outstanding: Union Pacific has issued a lot of stock (second most in the group), and CSX has the most shares outstanding but doesn't stand out in the other graphs.

And lastly, if I wanted to invest a large amount into one of these companies, would I be able to do so? And furthermore, how long would it take me, based on daily volume?
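A back-of-envelope way to answer the "how long would it take" question: divide the target position by the share of daily volume you are willing to absorb. The 10% participation rate and the position and volume figures below are assumptions for illustration, not rules from the post:

```r
# Rough days-to-build-position estimate (all inputs hypothetical)
target_shares <- 5e6      # shares we want to accumulate
avg_daily_vol <- 2.5e6    # average shares traded per day
participation <- 0.10     # fraction of daily volume we take
days_needed <- ceiling(target_shares / (avg_daily_vol * participation))
days_needed               # 20 trading days
```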

volume <- ggplot(data=maindata, aes(x=ticker, y=VOLUME, colour=ticker)) +
 geom_point() +
 theme_bw() +
 theme(panel.grid.major = element_blank(),
 panel.background = element_blank(),
 legend.background=element_rect(fill="white", colour="white") ) +
 labs(title="Volume for Each Security for the Last 3 Years", x="Ticker", 
      y="Volume per Day")
print(volume)
Volume

Wow! Delta certainly has a large outlier in this data! Let's take a further look and see when it happened. Maybe it was an earnings call or some other special event?

DAL <- subset(maindata, ticker=="DAL US Equity")
DALvol <- ggplot(data=DAL, aes(x=date, y=VOLUME, colour=ticker)) +
 geom_point() +
 theme_bw() +
 theme(panel.grid.major = element_blank(),
 panel.background = element_blank(),
 legend.background=element_rect(fill="white", colour="white")) +
 labs(title="Delta's (DAL US Equity) Daily Volume for the Last 3 Years", x="Date",
 y="Volume per Day")
print(DALvol)
Delta (DAL) Volume

Hmm... it seems the big jump in volume happened on September 10, 2013. Oh, that's right! That was the day before a news article came out saying that Delta was being re-initiated into the S&P 500!
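The spike date can also be pulled out programmatically rather than read off the plot, with which.max() on the volume column. A toy sketch (the dates and volumes below are made up to mimic the pattern):

```r
# Find the date of the largest volume spike without plotting
vol <- data.frame(
  date   = as.Date(c("2013-09-09", "2013-09-10", "2013-09-11")),
  VOLUME = c(8e6, 9.4e7, 1.2e7)
)
vol$date[which.max(vol$VOLUME)]   # "2013-09-10"
```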

All data was taken from Bloomberg LP.
