NYC Data Science Academy| Blog

Studying Data to Forecast Cryptocurrency Price Trends

Jinsoo Kim, Fangye Shi and Xu Huang
Posted on Sep 24, 2018

1. Introduction

Wikipedia defines the term as follows: "A cryptocurrency is a digital asset designed to work as a medium of exchange that uses strong cryptography to secure financial transactions, control the creation of additional units, and verify the transfer of assets. Cryptocurrencies are a kind of alternative currency and digital currency. Cryptocurrencies use decentralized control as opposed to centralized digital currency and central banking systems. The decentralized control of each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database."

A cryptocurrency's price is mainly influenced by security problems in the blockchain technology, new government policies (whether restrictive or supportive), and public opinion expressed in news and forums. The plot below shows the price trend of Bitcoin (the first cryptocurrency and the one with the largest market capitalization) alongside the frequencies of keywords in the news. Note the period from late 2017 to early 2018, when the price of Bitcoin rose steeply and then suddenly collapsed.

1.1 Market Environment

Cryptocurrencies are increasingly adopted as a means of payment in everyday life, so governments have been considering regulation, each in its own way.

In the US, the government has generally sought to regulate cryptocurrencies within the framework of existing financial regulations. As several large asset managers consider investing in the cryptocurrency market, the government is trying to tighten financial supervision.

In Japan, cryptocurrency transactions and exchanges were prevalent in the early days of cryptocurrency, so the government has been creating and refining policies since 2014. Similarly, the Singapore government has refined its cryptocurrency policies. It defined cryptocurrency as "Good purchased product for purchasing goods" and, since 2014, has applied a specific tax policy to cryptocurrency transactions.

China and South Korea, on the other hand, have considered several regulations, such as prohibiting cryptocurrency exchanges, and some of these have been implemented. A complete prohibition, of course, proved impossible.

Market Capitalization

Currently, the total market capitalization of cryptocurrencies is 199 billion dollars. (For comparison, the 2017 US defense budget was 590 billion dollars.)


Cryptocurrency has both a positive and a negative side. On the positive side, it can be an alternative to the existing financial system. XRP (also known as Ripple), for example, entered the remittance business by taking advantage of the fact that there is no commission fee for overseas transactions.

On the negative side, cryptocurrency still has security problems. Bithumb, one of the largest exchanges (ranked 6th in the world), was hacked and 31 million dollars were stolen.

1.2 Data Exploration

  • Features

  • Hierarchical Clustering

  • We visualized the adjacency of the top 30+ cryptocurrencies ranked by total market capitalization. Most of the cryptocurrencies on the list have less than a year of history. We collected their daily open-high-low-close ("OHLC") prices, took the close prices at a daily lag, and transformed them into log differences. Each cryptocurrency was treated as one vector whose coordinates are its log returns over ~290 days.
  • The horizontal axis indicates the computed maximum distance between clusters: d(u, v) = max(dist(u[i], v[j])) over all points i in cluster u and j in cluster v. The hierarchical structure shows that most of the daily price movements of the selected cryptocurrencies are closely related to Bitcoin (BTC), especially WAVES, Litecoin (LTC) and Ethereum (ETH). Ripple (XRP) forms another distinct small cluster, while Bytecoin (BCN) is far away from all other cryptocurrencies.
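The distance rule above is what SciPy calls complete linkage. A minimal sketch of the procedure follows, assuming synthetic prices for five made-up coins (the names, loadings and noise levels are illustrative, not the authors' data) and a Euclidean metric, which the post does not specify:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)

# Hypothetical stand-in data: daily log close prices for 5 made-up coins.
n_days = 291
market = rng.normal(0, 0.03, n_days)        # shared "market" factor
names = ["AAA", "BBB", "CCC", "DDD", "EEE"]  # illustrative tickers
loadings = [1.0, 0.9, 0.8, 0.2, 0.0]         # how closely each coin follows the market
log_prices = np.column_stack([
    np.cumsum(b * market + rng.normal(0, 0.01, n_days)) for b in loadings
])

# Daily log returns: first difference of the log close prices.
log_returns = np.diff(log_prices, axis=0)

# One vector per coin; its coordinates are that coin's ~290 daily log returns.
vectors = log_returns.T

# Complete linkage: d(u, v) = max over i in u, j in v of dist(u[i], v[j]).
Z = linkage(vectors, method="complete", metric="euclidean")

# no_plot=True returns the tree layout without requiring matplotlib.
tree = dendrogram(Z, labels=names, orientation="right", no_plot=True)
print(tree["ivl"])  # leaf order of the dendrogram
```

Dropping `no_plot=True` draws the dendrogram with the distance on the horizontal axis, which is the orientation described above.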

Findings

These features show some interesting trends compared with the Dow 30, which forms more distinct market-movement clusters based on industry background, whereas cryptocurrencies are mostly led by BTC. Looking closer at their backgrounds, XRP is mainly used for remittance and BCN is based on anonymous exchange, so the differences in price movements among cryptocurrencies are also related to their origins.

  • K-means & PCA

  • We visualized the "Bull, Mixed & Bearish States" of 8 long-history cryptocurrencies starting from Aug 2015, with about 3000 data points. Data were collected from Yahoo Finance. The close prices were transformed into daily log returns. This time we treated each of the 8 cryptocurrencies as one feature and the days as samples. We then applied k-means (k=3) to all the days to form 3 clusters, each containing days that show similar market movement (up, down and mixed).

Then the centers (means) of the 3 clusters were computed and their relative distances were evaluated by the cosine distance, 1 - cos(θ), which takes values in [0, 2].

We can see from the heat map that each of the 3 clusters is perfectly collinear with itself (distance = 0), clusters 1 and 2 are mostly anti-collinear (anti-proportional), while cluster 0 is mixed. We then labeled the 3 clusters as D (down), M (mixed) and U (up) based on the position of the cluster center, and computed the probabilities of D/M/U for the next day given today's label.
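The three steps above (k-means on days, cosine distance between centers, next-day probabilities) can be sketched as follows. This is only an illustration on synthetic returns: the data, the random seeds, and the rule of naming clusters by the mean of their center are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical data: 1000 days x 8 coins of daily log returns.
n_days, n_coins = 1000, 8
market = rng.normal(0, 0.04, size=(n_days, 1))
returns = market + rng.normal(0, 0.01, size=(n_days, n_coins))

# Days are the samples, coins are the features.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(returns)
labels, centers = km.labels_, km.cluster_centers_

# Cosine distance between cluster centers: 1 - cos(theta), in [0, 2].
unit = centers / np.linalg.norm(centers, axis=1, keepdims=True)
cos_dist = 1.0 - unit @ unit.T

# Assumed naming rule: lowest center mean = D, middle = M, highest = U.
order = np.argsort(centers.mean(axis=1))
name_of = {int(order[0]): "D", int(order[1]): "M", int(order[2]): "U"}
states = [name_of[int(c)] for c in labels]

# Transition matrix: row = today's state, column = tomorrow's state.
idx = {"D": 0, "M": 1, "U": 2}
counts = np.zeros((3, 3))
for today, tomorrow in zip(states[:-1], states[1:]):
    counts[idx[today], idx[tomorrow]] += 1
transition = counts / counts.sum(axis=1, keepdims=True)

print(np.round(cos_dist, 2))
print(np.round(transition, 2))
```

Each row of `transition` sums to 1, so reading across a row gives the estimated probabilities of D, M and U for the next day.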

Observations

We can conclude that in each case M is the most likely prediction for the next day. We then used PCA to visualize the division of the 3 clusters, including the compositions (weights) of the first and second principal components (PCs) for each of the 8 features (cryptocurrencies). The scatter plot shows how the ~3000 points (days), colored by cluster, are placed by the 1st and 2nd PCs. The boundary is not parallel to either axis, indicating that both PCs are important.

The details of both PCs are shown in bar plots. In both cases BCN has the largest weight, while the other features are on a similar scale with either the same or the opposite sign. This is consistent with the earlier hierarchical clustering result that BCN moves distinctly from most of the rest, and it ends up as a special indicator.

However, the leading factor, BTC, shows up as the least weighted feature in both PCs. This is probably because BTC has a much longer history than the others and therefore has smaller daily log return fluctuations than coins that were only just created in the period selected for the analysis.
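The link between a feature's volatility and its PC weight can be illustrated with a small sketch. It assumes unscaled log returns go straight into PCA (the post does not say whether the features were standardized) and uses made-up coins, one of which is deliberately more volatile and independent of the rest:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

coins = [f"COIN{i}" for i in range(8)]  # hypothetical tickers
n_days = 1000

# Seven coins share a common market factor; all have modest volatility.
returns = rng.normal(0, 0.03, size=(n_days, 1)) + rng.normal(0, 0.01, size=(n_days, 8))
# The eighth coin moves independently and is far more volatile.
returns[:, 7] = rng.normal(0, 0.12, size=n_days)

pca = PCA(n_components=2).fit(returns)
scores = pca.transform(returns)   # coordinates of each day on PC1 / PC2
weights = pca.components_         # shape (2, 8): weight of each coin in each PC

for k in range(2):
    top = coins[int(np.argmax(np.abs(weights[k])))]
    print(f"PC{k + 1}: largest |weight| is {top}, "
          f"explained variance ratio = {pca.explained_variance_ratio_[k]:.2f}")
```

Because PCA follows variance, the volatile coin dominates the first component here, while low-volatility features receive small weights. Standardizing each column before fitting would remove this scale effect.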

2. Classical Time Series Modeling

2.1 SARIMA 

We chose the Bitcoin price for time series analysis because it has the longest history and is the "bellwether" of the cryptocurrency market. We first applied a classical (linear) time series model, which requires a stationary time series with constant statistical properties (mean, variance, etc.) to make good predictions. We therefore transformed the original data into log and log difference and examined stationarity via the p-value of the Dickey-Fuller test. The figure below shows the results for the monthly average Bitcoin price.


The hidden periodicity in the original and transformed data was decomposed by STL (Seasonal and Trend decomposition using Loess). The third row shows that there is a strong yearly trend in the Bitcoin price movements. The original data shows the highest Dickey-Fuller p-value.

However, further transformations did not bring the p-value below 0.05, so we could not reject the null hypothesis of non-stationarity. We therefore let the integrated part of the ARIMA (autoregressive integrated moving average) model do the differencing and integrating for us on the log-transformed data.

Examining ACF and PACF

We also examined the ACF and PACF to see how many lags would affect the current Bitcoin price. The ACF plot on the left below shows that there is still a strong correlation between the current price and the price about 30 months ago, and that there is a periodic pattern among the lags with negative correlations. Since long-term effects can be hidden in, and absorbed into, more recent correlations, the PACF on the right reveals the direct correlation: there is a surprisingly large correlation between the current price and prices from several years ago. To include the yearly trend and the long-term influence in the time series, we incorporated a "seasonal" factor into our ARIMA model, giving the model known as "SARIMAX". Besides the fundamental (p, d, q) for the ARIMA part, there is therefore also a set (P, D, Q, 12) for the seasonal part (here "12" corresponds to the yearly trend shown in the STL plot, since the data used are monthly averages).

Residuals After Applying Parameters

After a grid search we found the optimal parameters of our SARIMAX model: (1, 1, 2) × (0, 1, 1, 12). The four plots below analyze the residuals after applying the chosen parameters. There is no long- or short-term trend remaining in the ACF. However, the histogram (upper right) shows that the residuals are not perfectly normally distributed: the distribution has a long right tail.

Comparison Between Real and Predicted Bitcoin Price

The plot below compares the real and predicted Bitcoin price. The log prices in the middle panel are predicted by the SARIMA model, the upper panel shows the prices transformed back by the exponential, and the bottom panel shows the log return, whose sign indicates an increase or decrease in price. The blue shading indicates the 95% probability band, which is so wide that it effectively says almost any price is possible for the actual price in the upper panel over the next few months. Besides the absolute price values, however, we are also interested in how well the model predicts price increases and decreases. The overall accuracy of the predicted signs of the monthly return is 0.58, with more details in the table.
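The sign accuracy used above is simply the fraction of periods in which the predicted and actual log returns have the same sign. A minimal sketch, where the function name and the toy numbers are illustrative only:

```python
import numpy as np


def sign_accuracy(actual_returns, predicted_returns):
    """Fraction of periods where predicted and actual returns share a sign."""
    actual = np.sign(np.asarray(actual_returns, dtype=float))
    predicted = np.sign(np.asarray(predicted_returns, dtype=float))
    return float(np.mean(actual == predicted))


# Toy example: signs match in periods 1, 3 and 4 out of 5.
actual = [0.10, -0.05, 0.02, -0.08, 0.04]
predicted = [0.03, 0.01, 0.05, -0.02, -0.01]
print(sign_accuracy(actual, predicted))  # 0.6
```

With two outcomes, a value close to 0.5 is no better than guessing, which puts the reported 0.58 in context.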

2.2 VARMAX

The main shortcoming of the (S)ARIMA model is that its predictions are based only on the price of the cryptocurrency itself. In reality, many outside factors can have a huge impact on the Bitcoin price, such as stock indices, market volatility and metal prices. News, media exposure and people's reactions can also influence, or be reflected in, cryptocurrency prices. The VARMAX model can incorporate such outside factors to improve the predictions.

Meanwhile, since Bitcoin leads most of the other cryptocurrencies, using Bitcoin itself as an inner factor to help predict the prices of other cryptocurrencies may also be useful.
We applied the VARMAX model to predict Litecoin (LTC) and Bytecoin (BCN) prices with and without Bitcoin (BTC) as an inner factor. The outside factors we used are: BTC volume, S&P 500, Nikkei 225, Stoxx Europe 600, DXSQ, VIX Volatility Index, and the prices of gold and silver. Since the VARMAX model does not include an integrated part, we directly used weekly log returns as the time series data.

Findings

The real and predicted weekly log returns for both LTC and BCN are compared in the plots. The overall accuracy of the predicted signs of the weekly log return is shown in the table on the left. Although LTC moves very similarly to BTC while BCN does not, the prediction accuracy of both improved after using BTC as an inner factor. However, the improvement is marginal.

One possible reason is that we took log returns of all the series. Metal prices, which do not fluctuate significantly, may not need a log transformation. Both our VARMAX model and its inputs therefore need further careful optimization.

3. Recurrent Neural Network

Finally, we tried a recurrent neural network, specifically a long short-term memory (LSTM) network, to model the price movement of Bitcoin. We used both numerical data (Bitcoin price, volume, international stock index prices, commodity prices, interest rates and CDS, volatility index) and text data (news articles scraped from bitcoin.com and bitcoinist.com). For the numerical data, we took log returns to make the scales uniform.

For the text data, we used a pretrained module to embed the sentences into 128-dimensional vectors. We then trained a neural network on the sentence vectors to predict price movement, extracted its last hidden layer (a 16-dimensional vector), and added it to the numerical data to form the input of the LSTM network.

We considered two different architectures for the LSTM network. The first treated the entire Bitcoin price history as one long chain: the LSTM network would remember all the intermediate states. The second considered only rolling windows of a fixed size: the LSTM network would start over with clean states for each window. Graphic representations of the architectures can be found below:


However, our models did not provide meaningful predictions on the test set: they tended to fit the training set very well but failed to generalize to unseen data.
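For the second architecture, the data preparation amounts to cutting the history into overlapping fixed-size windows, each paired with the value that follows it. The sketch below shows only that step in NumPy. The helper name `make_windows`, the window length and the toy arrays are assumptions, and no network is trained.

```python
import numpy as np


def make_windows(features, target, window):
    """Cut a (n_steps, n_features) history into overlapping windows.

    Returns X with shape (n_steps - window, window, n_features) and
    y with shape (n_steps - window,), where y[i] is the target value
    immediately after window X[i].
    """
    features = np.asarray(features)
    target = np.asarray(target)
    starts = range(len(features) - window)
    X = np.stack([features[i:i + window] for i in starts])
    y = target[window:]
    return X, y


# Toy history: 10 time steps, 2 features per step.
features = np.arange(20).reshape(10, 2)
target = np.arange(10)

X, y = make_windows(features, target, window=3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
```

The `(samples, time steps, features)` layout is what recurrent layers in common deep learning libraries expect. Because each window is its own sample, the network starts every window from a clean state, whereas the first architecture would carry its state across the whole sequence.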

4. Conclusion and Future Work

  • For the models to work, we need better feature engineering: knowledge of the relationships among currency prices and various economic indicators is crucial. Text data such as news, forum threads and social media posts may be helpful, but incorporating it into the models requires clever manipulation.
  • Classical time series methods and unsupervised learning (clustering, PCA, etc.) provided good insights. In contrast, the deep learning models did not work: this could result from the intrinsic nature of the data, which does not follow the same distribution across the time frame. It could also result from our lack of experience in choosing predictors, models and targets.
  • If we had more time, we would research the subject more thoroughly and try a larger variety of the models suggested in the literature.

For interested readers, our code and notebooks can be found here.

About Authors

Jinsoo Kim

Jinsoo Kim has an engineering career in steel manufacturing. While working for POSCO, he participated in the Smart Factory project. He holds a bachelor's degree in mechanical engineering from Inha University, South Korea.

Fangye Shi

Fangye graduated from Indiana University at Bloomington with a PhD in mathematics. He loves solving problems and learning new things in rapidly growing fields like data science!

Xu Huang

Xu Huang received a PhD in Computational Chemistry from the University of Iowa and a B.S. in Chemistry from Peking University. Her study includes developing & testing the computational code to improve the accuracy for the modeling of battery material &...
