An Exploration of FX

Daniel Chen
Posted on Dec 13, 2018

Visit my project here:

News articles web-scraped from Reuters:

Jan 2007 - Dec 2018 Tick data downloaded with Tickstory Lite:


     The trading rooms are the trenches where the battle is joined, where each trader confronts the market, customers, competitors, and other players, and where each institution plays out its fundamental business strategy and sees it succeed or fail. A winning strategy and a sound battle plan are essential, and teamwork - with each trader being aware of the actions of others in the group and of developments in related markets - is of enormous importance to success.
- Cross, Sam Y. "Chapter Seven - How Dealers Conduct Foreign Exchange Operations." All about the Foreign Exchange Market in the United States. New York: Federal Reserve Bank of New York, 1998.

The battle between the bulls and the bears is a diurnal affair. Whether you are traveling abroad to Europe or purchasing a Japanese import car, you are participating in the global currency exchange market. I was a retail trader for the majority of my working life. When I was first starting out, I had the feeling that the markets had a personal agenda against me when I traded. This, among many other misconceptions that new traders have, can be better understood with facts and data science.

The data set I had been wanting to use since classes started was the minute data of nine currency pairs: Europe, Japan, Great Britain, Switzerland, Australia, Canada, New Zealand, Sweden, and Gold, each valued against the US Dollar. The data spans from Jan 1, 2007 to Dec 1, 2018. That's 4,461,334 rows of observations, 15+ GB of data just sitting on my hard drive. Now that classes are nearing completion, I have the skill set needed to examine the data. Let's dive in and hope the water isn't frozen.

The Fundamentals

     If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.
- Sun Tzu, The Art of War. 5th century BC.

While you can start trading with no prior knowledge, maybe based only on your gut feelings, understanding the fundamentals is as good a place to start as any. How do major news events and government policies affect the foreign exchange market? To provide visual assistance, I thought it suitable to have a graph of major currencies annotated with major news events. To do this I went to the Reuters archives of their foreign exchange analysis headlines (UK Reuters has a larger archive dating back to May 17, 2010).

Using Scrapy in Python I web scraped all 10,200 articles:

When you scroll through the chart, the titles of all Reuters articles published on that day are shown in a text box. On average there were four articles a day, but on Jan 15, 2015 there were 36 news articles. It was a Thursday, and the Swiss National Bank had suddenly announced that it was dropping its currency cap (floor) of 1.2 francs per euro. The value of the franc soared nearly 30 percent at one point (USDCHF fell like a knife). Brokers went bust. When your clients' accounts go negative, which isn't supposed to happen with margin calls and stop-out fail-safes in place, how are you to recoup what the clients owe when they might reside halfway across the globe? You don't. Maybe the Swiss National Bank should have made the announcement on a Sunday while markets were closed?

You can also see what happens before and after a news event is announced. Is it as they say, "Buy the rumor, sell the news"?

Nowadays there are apps which analyze real-time news events and trade based on article sentiment. Remember the time when President Donald Trump tweeted that the US Dollar was 'getting too strong' and that a strong dollar 'will hurt ultimately' the US? This was back in April 2017. I remember the markets being eerily quiet during the hours leading up to the tweet, and then EURUSD suddenly spiked upwards; people were selling the dollar and buying euros. I knew something had happened.

Announcements regarding the US dollar are generally left to the Fed!

For NLP sentiment and topic analysis I used AWS Comprehend, a service offered by Amazon Web Services. I uploaded the topics into an S3 storage bucket and the results were returned to me minutes later. This was highly effective and it only cost me $2 USD.

Currency Correlations

 It is a general truism of this world that anything long divided will surely unite, and anything long united will surely divide.
- Romance of the Three Kingdoms. 1522.

I explored the correlations between the directional movement of different currency pairs. Below is a 2007-2018 scatter plot of USDCHF and EURUSD minute bar closing prices. The value of USDCHF reads as "one US Dollar is valued at 1.2345 Swiss Francs" while EURUSD is "one Euro is worth 1.2345 US Dollars".

If this were a Rorschach test I would say, "That's a dragon." Or I'd say it's reminiscent of a Bernoulli random walk plotted on a chart. But it's not easy to tell the correlation between currency pairs over such a long time frame. They do walk in tandem though:

Looking at only the extracted 2014 values of USDCHF and EURUSD we see a real correlation. Sometimes they are more divided:
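As a side note on how such a correlation can be measured, here is a minimal sketch of a Pearson correlation between two closing-price series. The prices are toy values invented for illustration (chosen so the two series move in opposite directions), not values from my data set, and the function name pearson is just a placeholder.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy closing prices (invented for illustration only)
eurusd = [1.3700, 1.3720, 1.3690, 1.3650, 1.3610]
usdchf = [0.8900, 0.8885, 0.8910, 0.8940, 0.8975]

r = pearson(eurusd, usdchf)
print(f"correlation: {r:.3f}")
```

A value near +1 or -1 means the pairs move together or in opposition; a value near 0 means there is little linear relationship over the chosen window.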

Technical Indicator Correlations

Like a soldier, the carpenter sharpens his own tools. He carries his equipment in his tool box, and works under the direction of his foreman. He makes columns and girders with an axe, shapes floorboards and shelves with a plane, cuts fine openwork and carvings accurately, giving as excellent a finish as his skill will allow. This is the craft of the carpenters.
- Miyamoto Musashi, The Book of Five Rings. 1645.

Technical indicators are just tools, and they come in many varieties with many uses. I don't use them much, but my favorites would have to be the ones from J. Welles Wilder Jr.

Here are the correlation plots for different technical indicators calculated from EURUSD minute bar data:

This is the data that I will use to predict the pip movement (range) of EURUSD minute bars later on. Range is the distance between High and Low for each bar. Body, TopWick, and BotWick are the measurements of the minute candlestick features. Force is Volume*Range. ROC (rate of change), MOM (momentum), ATR (average true range), and BB (Bollinger Bands) are further technical indicators with possible predictive value for range. BBrng is the difference between the upper and lower Bollinger Bands.
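To make the candlestick features concrete, here is a rough sketch of how they could be computed from one bar. Only Range (High - Low) and Force (Volume*Range) follow the definitions given above; the formulas for Body, TopWick, and BotWick are my assumptions about the usual candlestick conventions, and the Bar class and sample numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float

def candle_features(bar: Bar) -> dict:
    """Per-bar features; only Range and Force follow definitions from the text."""
    rng = bar.high - bar.low                         # Range: High - Low
    body = abs(bar.close - bar.open)                 # assumed: unsigned body size
    top_wick = bar.high - max(bar.open, bar.close)   # assumed definition
    bot_wick = min(bar.open, bar.close) - bar.low    # assumed definition
    return {
        "Range": rng,
        "Body": body,
        "TopWick": top_wick,
        "BotWick": bot_wick,
        "Force": bar.volume * rng,                   # Force: Volume * Range
    }

# Hypothetical EURUSD minute bar
bar = Bar(open=1.1300, high=1.1306, low=1.1298, close=1.1304, volume=120.0)
features = candle_features(bar)
print(features)
```

With these definitions, Body, TopWick, and BotWick always add up to Range, which is a handy sanity check.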

I'd like to add that range as specified here does not distinguish between up and down bars. Across 4,161,800 minute bars, 50.05132% are Up Bars. While this is very close to the 50/50 split implied by the random walk hypothesis, the difference is statistically significant: we can reject the null hypothesis of an even split with a p-value of 0.0363 using binom.test in R:
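The same check can be approximated in Python with nothing but the standard library. Note the assumptions: binom.test in R is an exact test, while this sketch uses the normal approximation to the binomial (reasonable with millions of bars), and the Up Bar count is reconstructed from the rounded percentage rather than taken from the raw data. The result should land close to the 0.0363 reported above.

```python
from math import erfc, sqrt

n_bars = 4_161_800                 # minute bars, from the text
up_share = 0.5005132               # share of Up Bars, from the text
n_up = round(n_bars * up_share)    # assumed count, reconstructed from the share

# z-score of the observed count under H0: P(up) = 0.5
z = (n_up - 0.5 * n_bars) / (0.5 * sqrt(n_bars))

# Two-sided p-value from the standard normal distribution
p_value = erfc(abs(z) / sqrt(2))

print(f"z = {z:.3f}, p = {p_value:.4f}")
```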

Still, the edge is so small that whether to buy or sell EURUSD should be left to the trader's discretion. These variables will not be used to predict market direction. My aim is only to predict how much price will move (range) based on the variables collected. And I collected 162 variables over the nine currency pairs. For fun, here is what the correlation plot looks like:

Minute Bar Range

“A man got to have a code. - Omar Little”
- Michael Lewis, Flash Boys: A Wall Street Revolt. 2014.

Now would be a good time to take a look at our response variable Y, the minute bar range. Below is a histogram for the absolute values of EURUSD minute bar pip movement (High - Low) from 2007-2018:

If there is one takeaway from this project, for me it would be this graph. I expected the movement per minute for EURUSD to be low, within 5 pips on average, and it is. A pip (basis point) is FX terminology for a standardized unit of change in a currency pair; for EURUSD, one pip sits at the ten-thousandths place (0.0001) of a quoted price. However, I had assumed the histogram would be smooth, which it visibly is not. The spikes occur every half pip (0.00005), which shows that a minute bar's range is more likely to land exactly on one of these increments, say one pip (0.0001), than just off it, say 0.9 pips (0.00009).

This occurs across other currency pairs as well:

For USDJPY, one pip is measured by the second decimal place (0.01) in a quoted price.
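Since pip size differs between pairs, a small helper keeps the conversion straight. This is only a sketch: the pip sizes come from the text above, while the function names, the rounding to two decimals, and the tolerance are my own choices for illustration.

```python
PIP_SIZE = {"EURUSD": 0.0001, "USDJPY": 0.01}   # pip sizes as given in the text

def to_pips(price_diff: float, pair: str) -> float:
    """Convert a price difference into pips for the given pair."""
    return round(price_diff / PIP_SIZE[pair], 2)

def on_half_pip_grid(pips: float, tol: float = 1e-6) -> bool:
    """True if the value is a whole multiple of half a pip."""
    doubled = pips * 2
    return abs(doubled - round(doubled)) < tol

print(to_pips(0.00023, "EURUSD"))                      # range of a EURUSD bar in pips
print(on_half_pip_grid(to_pips(0.00005, "EURUSD")))    # half a pip
print(on_half_pip_grid(to_pips(0.00009, "EURUSD")))    # 0.9 pips
```

Counting how many bars pass on_half_pip_grid versus how many do not is one simple way to quantify the spikes seen in the histograms.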

Below we have a histogram of what I'm calling the real range (up and down movements are differentiated) of USDCHF:

I don't know the ins and outs of major financial institutions, but definitely something is causing the occurrence of perfectly incremented spikes. Algorithmic trading and HFT systems with their limit orders perhaps? Pretty amazing.

Support and Resistance

     We may distinguish six kinds of terrain, to wit: (1) Accessible ground; (2) entangling ground; (3) temporizing ground; (4) narrow passes; (5) precipitous heights; (6) positions at a great distance from the enemy. Ground which can be freely traversed by both sides is called accessible...
- Sun Tzu, "Chapter 10: Terrain". The Art of War.

Support and resistance lines are not arbitrary. Sometimes they are precise to the very pip, and sometimes they represent a price range where bulls (buyers) and bears (sellers) tussle. I calculated resistance points from the most common Minute Bar High values within a range, and support values from the most commonly occurring Minute Bar Low price values. Here is a chart just for reference:

Support becomes resistance and resistance lines become support lines. Of course, prices which go through the same price range over the years will have more occurrences of Highs and Lows being recorded within those ranges. Still, it is something to take into account when seeing prices break and drop or rocket.
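The counting idea described above can be sketched in a few lines. This is a simplified illustration, not my exact code: the function name, the optional price window, and the toy highs and lows are all hypothetical.

```python
from collections import Counter

def most_common_levels(prices, window=None, decimals=4, top=3):
    """Most frequent price levels, optionally restricted to a (low, high) window."""
    if window is not None:
        lo, hi = window
        prices = [p for p in prices if lo <= p <= hi]
    # Round to the pip so that equal quotes are counted together
    snapped = [round(p, decimals) for p in prices]
    return Counter(snapped).most_common(top)

# Toy minute-bar highs and lows (invented for illustration only)
highs = [1.1350, 1.1350, 1.1349, 1.1350, 1.1342, 1.1350]
lows = [1.1300, 1.1301, 1.1300, 1.1300, 1.1310, 1.1300]

resistance = most_common_levels(highs, window=(1.1300, 1.1400), top=1)
support = most_common_levels(lows, window=(1.1250, 1.1350), top=1)
print(resistance, support)
```

Each result is a list of (price, count) pairs, so the count doubles as a rough measure of how often price stalled at that level.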

Time Series Analysis

     TIME is the most important factor in determining market movements and by studying the past records of the averages or individual stocks you will be able to prove for yourself that history does repeat and that by knowing the past you can tell the future. There is a definite relation between TIME and PRICE. …
- WD Gann

For time series analysis I used the daily price data (price close) for 9 currency pairs from 2007-2018. I took the inverse values of EURUSD, XAUUSD, GBPUSD, AUDUSD, and NZDUSD so that every price reflects what one USD is worth. So now 1 USD is worth 0.75 EUR, etc. Using a hierarchical clustering technique in Python gave more insight into how currencies cluster. Would you have guessed that they cluster based on geography and the global trade interactions of different countries?
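The inversion step is simple but easy to get backwards, so here is a tiny sketch of it. The set of inverted pairs comes from the paragraph above; the function name and the sample quotes are made up for illustration.

```python
# Pairs quoted as "USD per unit", which the text inverts
USD_QUOTED = {"EURUSD", "XAUUSD", "GBPUSD", "AUDUSD", "NZDUSD"}

def per_usd(pair: str, price: float) -> float:
    """Express a quote as units of the other currency per one USD."""
    return 1.0 / price if pair in USD_QUOTED else price

print(per_usd("EURUSD", 4 / 3))    # hypothetical quote of about 1.3333 USD per EUR
print(per_usd("USDJPY", 110.0))    # already USD-based, left unchanged
```

Putting every series on the same "per one USD" footing is what makes the daily closes comparable before clustering.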

Furthermore, using a Markov Chain we can see the probability of tomorrow being an Up, Down, or Mixed day depending on what kind of day it was for the USD today:

So now, through time series analysis, we at least know the probability with which a certain type of future price movement occurs given the current price movement. I won't be tempted to predict direction with machine learning in the following section.
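For readers curious how such Markov Chain probabilities are estimated, here is a minimal sketch: count how often each day type follows each other day type, then normalize each row. The sequence of days is a toy example, and how a day gets labeled Up, Down, or Mixed is not covered here and is left as an assumption.

```python
from collections import Counter, defaultdict

def transition_matrix(states):
    """Estimate P(next state | current state) from an ordered sequence of states."""
    counts = defaultdict(Counter)
    for today, tomorrow in zip(states, states[1:]):
        counts[today][tomorrow] += 1
    return {
        today: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for today, row in counts.items()
    }

# Toy sequence of day types (invented for illustration only)
days = ["Up", "Up", "Down", "Mixed", "Up", "Down", "Down", "Up", "Mixed", "Up"]
probs = transition_matrix(days)

for today, row in probs.items():
    print(today, row)
```

Each row of the result sums to 1, so probs["Up"]["Down"] reads as "the estimated chance of a Down day tomorrow given an Up day today".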

Range Prediction with Machine Learning

Having access to an AWS EC2 instance helped immensely with the following computations. For the nine currency pairs I had a total of 162 variables altogether, with 4,461,334 rows of observations for each pair.

I lagged all the predictor variables by one row, and all 162 variables used were numerical. A pair such as XAUUSD (Gold) trades only 23 hours a day, as opposed to the 24 hours the global currency market is open, so I simply dropped all rows with missing data. I did not remove any outliers, but I did apply the log1p transformation to all skewed (skew > 0.75) variables. I also multiplied the response variable, the Range of EURUSD, by 10,000 so that the value reads as 2.3 pips instead of the original ten-thousandths format of 0.00023.
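The preprocessing steps above can be sketched in plain Python as follows. Treat it as an illustration under assumptions rather than my actual pipeline: it uses a simple population skewness (a library such as pandas may use a slightly different estimator), applies log1p to predictors only, and works on a tiny made-up table with a single hypothetical predictor called Volume.

```python
import math

def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def preprocess(rows, target="Range", skew_limit=0.75, pip_scale=10_000):
    """rows: list of dicts, one per minute bar, oldest first; None marks missing data."""
    predictors = [k for k in rows[0] if k != target]
    # Pair each bar's target with the previous bar's predictors (a one-row lag)
    lagged = []
    for prev, cur in zip(rows, rows[1:]):
        row = {k: prev[k] for k in predictors}
        row[target] = cur[target]
        lagged.append(row)
    # Drop any row that contains missing values
    lagged = [r for r in lagged if all(v is not None for v in r.values())]
    # log1p-transform predictors whose skewness exceeds the limit
    # (assumes those predictors are greater than -1)
    for k in predictors:
        if skewness([r[k] for r in lagged]) > skew_limit:
            for r in lagged:
                r[k] = math.log1p(r[k])
    # Express the target in pips, e.g. 0.00023 becomes 2.3
    for r in lagged:
        r[target] = r[target] * pip_scale
    return lagged

# Toy data (invented for illustration only)
rows = [
    {"Volume": 10.0, "Range": 0.00010},
    {"Volume": 12.0, "Range": 0.00023},
    {"Volume": None, "Range": 0.00015},
    {"Volume": 11.0, "Range": 0.00020},
    {"Volume": 500.0, "Range": 0.00012},
    {"Volume": 9.0, "Range": 0.00030},
]
clean = preprocess(rows)
print(clean)
```

The lag matters because it ensures each prediction only uses information that was available before the bar being predicted.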

I then trained six different ML models:
1. Random Forest Regressor to select important variables. The result was that each of the 162 variables had a positive importance so I picked the top 50 for a set I use later on. Here are the Random Forest results on the training set's Train and Test sets:

2. XGBoost model

3+4. Ridge Regression and Lasso Regression models on an "all variables" set.
5+6. Ridge Regression and Lasso Regression models on a "50 selected variables" set with the features selected by the Random Forest Regressor.

The Ridge Regression (with cross validation) returned an optimal alpha of 0 for both sets of data. In other words no penalty was applied, meaning that the best fit was just an Ordinary Least Squares Regression model!
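Why alpha = 0 collapses to Ordinary Least Squares is easiest to see in the simplest possible case: one feature and no intercept, where the ridge coefficient has the closed form sum(x*y) / (sum(x*x) + alpha). The sketch below uses that textbook formula with made-up numbers; it is not the model I trained.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form ridge coefficient for one feature and no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

# Toy data (invented for illustration only)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

ols = ridge_slope(xs, ys, alpha=0.0)       # alpha = 0: plain least squares
shrunk = ridge_slope(xs, ys, alpha=10.0)   # alpha > 0: coefficient pulled toward zero
print(ols, shrunk)
```

With alpha at zero the penalty term vanishes from the denominator, which is exactly the least squares solution; any positive alpha shrinks the coefficient.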

On the other hand, the Lasso Regularization models both returned a Prediction vs True Values plot that was within a narrow band of ~20 pips:

The Lasso Regularization calculations for the "all variables" data set took too long to return an optimal alpha. In the end I chose a small alpha manually and the results were still viable.

Even though the Random Forest Regressor and XGBoost models were more accurate, I still chose to combine all the different models to make one final prediction test.
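One common way to combine several models is a (weighted) average of their predictions. The sketch below shows that approach with made-up numbers; it is an assumption for illustration, since the exact combination scheme is not spelled out here, and the function name blend and the three prediction lists are hypothetical.

```python
def blend(predictions, weights=None):
    """Weighted average of per-model prediction lists (equal weights by default)."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)
    ]

# Toy pip-range predictions from three hypothetical models
rf = [2.0, 3.5, 1.0]
xgb = [2.4, 3.1, 1.2]
lasso = [1.6, 3.0, 1.4]

final = blend([rf, xgb, lasso])
print(final)
```

Passing weights, for example a larger weight for the more accurate tree-based models, lets the blend lean on the stronger predictors without discarding the linear ones.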

These are the final results of my Capstone Project. Let's see how well I used machine learning algorithms to deduce the minute by minute range of EURUSD:

Thanks for reading and I wish you all good fortune!

     My Boston lawyer was so successful that his currencies became little friends. He crooned to them like a witch to her cats. "My Swissies," he would call his Swiss francs. "My Swissies were thirty-five cents when I bought them. My Swissies are backed by gold, and the inflation rate in Switzerland is lower than anywhere else. Then my Swissies were forty-five cents, and then fifty-five, and then sixty, all the time everyone else was holding ugly old dollars. Come, little darlings, seventy cents is easy; you can do it, my pets, you can do a dollar by year's end."

He had a little more trouble with his Deutsche marks; that is, he had trouble getting on an intimate basis with them. Deutschers? Deutschies? He was never quite as comfortable with the diminutive there as he was with the Swissies; sometimes he had to call them "DM." But the DM were good to him, too. His Netherlands guilders were more like cousins, but against the dollar even they looked good.
- Adam Smith, Paper Money. 1981.

The actual project can be visited at

For actual code, all my work is stored at my GitHub

About Author

Daniel Chen

Daniel Chen is the founder of multiple startups including foreign exchange brokerages. Managed cross-border development with international companies from Europe and Asia. Has even more experience now with data science and coding. Born in 1987's Los Angeles, California...