Contributed by Peggy Sobolewski. Peggy took our Data Science with R - Beginner Level with Vivian Zhang in June 2014. This post was based on her final project submission
Project: The main purpose of this project is to analyze my universe to create a portfolio. Through R I will analyze 22 equities and compare the equities' returns, market cap, price, shares outstanding and volume in a graphical representation. First step in starting this project is to download the required packages:
The next step after properly install the necessary packages is to download the data from the Bloomberg API.
conn <- blpConnect()
After establishing a connection with the Bloomberg API, we now get to get the data we require for our securities. (One quick side note - to get the equities that I want to examine I ran an EQS (Equity Screening) function on a Bloomberg Terminal using screens including: 1) only primary securities, 2) country of domicile: USA (only wanted domestic companies), 3) market cap of 1 billion (relatively large cap only), 4) price > 1 (allows to eliminate any recent public companies [IPOs]), and finally 5) a daily volume traded greater than .5 million (so theres available liquidity in the equity).
#now adding those securities that came up through the screening into R through Rbbg
#defining both securities and fields for the bdh (bloomberg historical data) function
securities <- c("ALK US Equity", "DAL US Equity", "JBLU US Equity", "LUV US Equity", "SAVE US Equity",
"UAL US Equity", "CHRW US Equity", "EXPD US Equity", "FDX US Equity", "HUBG US Equity",
"UPS US Equity", "UTIW US Equity", "XPO US Equity", "CSX US Equity",
"KSU US Equity", "NSC US Equity", "UNP US Equity", "CAR US Equity", "CNW US Equity",
"HTZ US Equity", "JBHT US Equity")
fields <- c("PX_LAST", "TOT_MKT_VAL", "VOLATILITY_90D", "EQY_SH_OUT", "VOLUME")
allsecurities <- bdh(conn, securities, fields, Sys.Date()-730, always.display.tickers=TRUE,include.non.trading.days=FALSE, dates.as.row.names=FALSE)
#using Sys.Date()-730 instead of start and end date that way it can be continous to
#whenever re-run the code (always updated) - dated three years back (730 trading days)
After getting the data into R, I want to add a column to the original data,frame of returns from PX_LAST (closing price) so that comparing the price will be normalized.
all <- c(NA,diff(log(allsecurities$PX_LAST)))
maindata <- transform(allsecurities, returns=all)
head(maindata) #making sure it went through - it did!
Next step: lets make sure the data looks good -
head(maindata)
tail(maindata)
dim(maindata)
#makes sense since there are 22 securities and we have the data going back a little over 18 #months
names(maindata)
summary(maindata) #helpful to see low and high of all the variables (gives a good idea of w #here values will fall)
str(maindata)
class(maindata) #just to verify it is a data.frame
sapply(maindata, class) #everything is right except date
#(ticker=character, PX_LAST, TOT_MKT_VAL and VOL are all numeric, but date appears as a character, we need it as its proper form = date)
maindata$date <- as.Date(maindata$date)
#lets try sapply(maindata, class) one more time to verify
sapply(maindata, class) #worked perfectly!
So now that the data is loaded and looks good - were on to the fun part! Examing it! What should we do first?? Lets see if theres any corelation in the stocks by breaking them down into their GICS subindustries (for my purpose those will be: 1) air freight and logistics, 2) airlines, 3) railroads, and 4) trucking
#freight and logistics
freight_logistics <- c("CHRW US Equity", "EXPD US Equity", "FDX US Equity",
"HUBG US Equity", "UPS US Equity", "UTIW US Equity", "XPO US Equity")
frlo <- bdh(conn, freight_logistics, fields, Sys.Date()-730, always.display.tickers=TRUE, include.non.trading.days=FALSE, dates.as.row.names=FALSE)
head(frlo)
frloreturns <- c(NA,diff(log(frlo$PX_LAST)))
freightlogistics <- transform(frlo, returns=frloreturns)head(freightlogistics)
fl.data <- melt(freightlogistics,id=c("ticker","date"))
head(fl.data)
unique(fl.data$variable)
rfl.data <- cast(subset(fl.data,variable=="returns"),date~ticker, sum)summary(rfl.data)
chart.Correlation(rfl.data)
I then proceeded to use the same format of code to create all the correlation plots. Lets see what they produced:
Lets see which equities are closely correlated: Freight & Logistics:
HubGroup (HUBG) & UTI Worldwide (UTIW) - .97
FedEx (FDX) & UPS (UPS) - .96
HubGroup (HUBG) & FedEx (FDX) - .90
Airlines:
Delta (DAL) & JetBlue (JBLU) - .97
Delta (DAL) & United (UAL) - .95
JetBlue (JBLU) & United (UAL) - .93
Railroads:
Norfolk Southern (NSC) & Union Pacific (UNP) - .92
CSX (CSX) & Norfolk Sout (NSC) - .78
Kansas City Southern (KSU) & Union Pacific (UNP) - .74
Trucking:
Hertz (HTZ) & Swift (SWFT) - .93
JB Hunt (JBHT) & Swift (SWFT) - .84
Con-Way (CNW) & Swift (SWFT) - .84
Alright that showed a good amount of info, but lets dig a little deeper. Let's see which company has the highest market cap and if it is because of their price or amount of shares outstanding.
mktcap <- ggplot(data=maindata, aes(x=ticker, y=TOT_MKT_VAL, colour=ticker)) +
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.background = element_blank(),
legend.background=element_rect(fill="white", colour="white") ) +
labs(title="Total Market Cap for Each Security for the Last 3 Years", x="Ticker",
y="Total Market Cap")
print(mktcap)
price <- ggplot(data=maindata, aes(x=ticker, y=PX_LAST, colour=ticker))+
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.background = element_blank(),
legend.background=element_rect(fill="white", colour="white") ) +
labs(title="Price for Each Security for the Last 3 Years", x="Ticker",
y="Last Price")
print(price)
shares <- ggplot(data=maindata, aes(x=ticker, y=EQY_SH_OUT, colour=ticker))+
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.background = element_blank(),
legend.background=element_rect(fill="white", colour="white") ) +
labs(title="Amount of Shares for Each Security for the Last 3 Years", x="Ticker",
y="Amount of Shares Outstanding")
print(shares)
So what insight can we tell from these graphs?
Market Cap: Union Pacific and UPS have the highest market cap
Price: FedEx has the highest and largest spread with price and Union Pacific and UPS are around similar areas Stock Outstanding: Union Pacific has issued a lot of stock (2nd most out of the group) and CSX is the largest but doesnt show any major significance in the other graphs And lastly, if I wanted to invest a large amount into one of these companies would I be able to do so? And furthermore, how long would it take me based on daily volume?
volume <- ggplot(data=maindata, aes(x=ticker, y=VOLUME, colour=ticker)) +
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.background = element_blank(),
legend.background=element_rect(fill="white", colour="white") ) +
labs(title="Volume for Each Security for the Last 3 Years", x="Ticker",
y="Volume per Day")
print(volume)
Volume
Wow! Delta certainly has a large outlier based on this data! Let's take a furter look and see when it happened? Maybe it was an earnings call or some special event?
DAL <- subset(maindata, ticker=="DAL US Equity")
DALvol <- ggplot(data=DAL, aes(x=date, y=VOLUME, colour=ticker))+
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.background = element_blank(),
legend.background=element_rect(fill="white", colour="white"))
labs(title="Delta's (DAL US Equity) Daily Volume for the Last 3 Years", x="Date",
y="Volume per Day")
print(DALvol)
Delta (DAL) Volume
hmm.. seems like the bigjump in volume happened on September 10, 2013 - oh thats right! The day before a news article came out saying that Delta was getting re-initiated into the S&P 500!