Forecasting Cryptocurrency Price Trends
From Wikipedia: "A cryptocurrency is a digital asset designed to work as a medium of exchange that uses strong cryptography to secure financial transactions, control the creation of additional units, and verify the transfer of assets. Cryptocurrencies are a kind of alternative currency and digital currency. Cryptocurrencies use decentralized control as opposed to centralized digital currency and central banking systems. The decentralized control of each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database."
A cryptocurrency's price is influenced mainly by security problems in the underlying blockchain technology, new government policies (whether restrictive or supportive), and public opinion expressed in news and forums. The plot below shows the price trend of Bitcoin (the first cryptocurrency and the one with the largest market capitalization) alongside the frequencies of keywords in the news. Note the period from late 2017 to early 2018, when the price of Bitcoin rose steeply and then suddenly collapsed.
Cryptocurrencies are increasingly adopted as a means of payment in real life, so governments have been considering various regulations, each in its own way.
In the US, the government has generally sought to regulate cryptocurrencies within the framework of existing financial regulations. As several large asset managers consider investing in the cryptocurrency market, the government is trying to tighten financial supervision.
In Japan, cryptocurrency transactions and exchanges were prevalent in the early days of cryptocurrency, so the government has been creating and refining policies since 2014. Similarly, the Singapore government has refined its cryptocurrency policies. It defined cryptocurrency as “Good purchased product for purchasing goods” and has imposed a specific tax policy on cryptocurrency transactions since 2014.
China and South Korea, on the other hand, have considered several restrictions, such as prohibiting the exchange of cryptocurrency, and some of these regulations have been implemented. Of course, a complete prohibition was impossible.
Cryptocurrency has both a negative and a positive side. On the positive side, it can be an alternative to the existing financial system. XRP (also known as Ripple), for example, entered the remittance business by taking advantage of the fact that there is no commission fee for overseas transactions.
On the negative side, cryptocurrency still has security problems. Bithumb, one of the largest exchanges and ranked sixth in the world, was hacked, and 31 million dollars were stolen.
1.2 Data Exploration
- Hierarchical Clustering
We visualized the adjacency of the top 30+ cryptocurrencies ranked by total market capitalization. Most of the cryptocurrencies on the list have less than a year of history. We collected their daily open-high-low-close (“OHLC”) prices, took the close prices at a daily lag, and transformed them into log differences. Each cryptocurrency was treated as one vector whose coordinates are its log returns over ~290 days. The horizontal axis indicates the computed maximum distance between clusters: d(u, v) = max(dist(u[i], v[j])) for all points i in cluster u and j in cluster v. The hierarchical structure shows that the daily price movements of most of the selected cryptocurrencies are closely related to those of Bitcoin (BTC), especially WAVES, Litecoin (LTC) and Ethereum (ETH). Ripple (XRP) forms another distinct small cluster, while Bytecoin (BCN) is far away from all other cryptocurrencies.
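The log-difference transform and the maximum (complete-linkage) distance d(u, v) described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the point-to-point metric is taken to be Euclidean (the write-up does not say which metric was used), the function names are hypothetical, and the prices are made-up toy numbers rather than real market data.

```python
import math

def log_returns(closes):
    """Daily log differences of a list of close prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def complete_linkage(cluster_u, cluster_v, dist=euclidean):
    """d(u, v) = max(dist(u[i], v[j])) over all i in u and j in v."""
    return max(dist(p, q) for p in cluster_u for q in cluster_v)

# Toy close prices (hypothetical, not real market data).
prices = {
    "A": [100.0, 102.0, 101.0, 105.0],
    "B": [50.0, 51.0, 50.5, 52.5],   # moves exactly like A, at half the price
    "C": [10.0, 9.0, 11.0, 8.0],     # moves very differently
}
vectors = {name: log_returns(p) for name, p in prices.items()}

# Distance between the single-coin clusters {A} and {B}.
d_ab = complete_linkage([vectors["A"]], [vectors["B"]])
# Distance between the merged cluster {A, B} and the cluster {C}.
d_ab_c = complete_linkage([vectors["A"], vectors["B"]], [vectors["C"]])
```

Because log returns ignore the price level, coins A and B end up at (numerically) zero distance even though their prices differ by a factor of two, which is why the clustering reflects co-movement rather than price scale.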
These features make an interesting contrast with the Dow 30, which shows more distinct clusters of market movement based on industry background, while cryptocurrencies are mostly led by BTC. Looking closer at their backgrounds, XRP is mainly used for remittance and BCN is based on anonymous exchange. The differences in price movement among cryptocurrencies are thus related to their origins, too.
- K-means & PCA
We visualized the “Bull, Mixed & Bearish States” of 8 long-history cryptocurrencies starting from Aug 2015, with about 3000 data points. Data were collected from Yahoo Finance. The close prices were used and transformed into daily log returns. This time, however, we treated each of the 8 cryptocurrencies as one feature and the days as samples. We then applied k-means (k=3) to all the days to form 3 clusters, each containing the days that show similar market movement (up, down and mixed).
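To make the "days as samples, coins as features" setup concrete, here is a minimal k-means sketch in plain Python. It is not the code used for the analysis: the farthest-point initialization is an assumption chosen to keep the example deterministic, the function name is hypothetical, and the six "days" of three-coin log returns are invented toy values.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(samples, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialization (an assumption)."""
    # Initialization: start from the first sample, then repeatedly add the
    # sample that is farthest from all centers chosen so far.
    centers = [list(samples[0])]
    while len(centers) < k:
        farthest = max(samples, key=lambda s: min(sq_dist(s, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(samples)
    for _ in range(n_iter):
        # Assignment step: each day goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(s, centers[c])) for s in samples]
        # Update step: each center becomes the mean of its member days.
        new_centers = []
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # leave an empty cluster in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels

# Each row is one day; each column is one coin's log return (toy values).
days = [
    [0.05, 0.04, 0.06],     # up day
    [-0.05, -0.04, -0.06],  # down day
    [0.01, -0.01, 0.00],    # mixed day
    [0.04, 0.05, 0.05],     # up day
    [-0.06, -0.05, -0.04],  # down day
    [-0.01, 0.01, 0.00],    # mixed day
]
centers, labels = kmeans(days, k=3)
```

On this toy input the two up days, the two down days and the two mixed days each end up sharing a label; which integer is attached to which state is arbitrary, so the clusters still have to be named by inspecting their centers.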
Then the centers (means) of the 3 clusters were computed and their relative distances were evaluated with the cosine distance 1 - cos(θ), which takes values in [0, 2]. The heat map shows that each of the 3 clusters is perfectly collinear with itself (distance = 0), that clusters 1 and 2 are mostly anti-collinear (anti-proportional), and that cluster 0 is mixed. We then labeled the 3 clusters as D (down), M (mixed) and U (up) based on the position of each cluster center, and computed the probabilities of D/M/U for the next day given today's label. We can conclude that in every case M is the most likely prediction.

We then used PCA to visualize the division of the 3 clusters as shown above, including the compositions (weights) of each of the 8 features (cryptocurrencies) in the first and second principal components (PC). The scatter plot shows how the ~3000 colored points (days) are placed by the 1st and 2nd PC. The boundary is not parallel to either axis, indicating that both PCs are important.
The details of both PCs are shown in the bar plots. In both cases BCN has the largest weight, while the other features are on a similar scale with either the same or the opposite sign. This is consistent with the earlier hierarchical clustering result that BCN moves differently from most of the other cryptocurrencies, and so it ends up as a special indicator. However, the leading factor, BTC, is the least weighted feature in both PCs. This is probably because BTC has a much longer history than the others and therefore shows less daily log return fluctuation than coins that had only just been created in the period selected for the analysis.
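Two of the computations in this section, the cosine distance 1 - cos(θ) between cluster centers and the next-day probabilities of D/M/U given today's label, are simple enough to sketch in plain Python. The function names are hypothetical, and both the centers and the label sequence are invented for illustration; they are not the values from the analysis.

```python
import math
from collections import Counter

def cosine_distance(u, v):
    """1 - cos(theta) between two vectors; lies in [0, 2]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def next_day_probabilities(labels):
    """Estimate P(tomorrow's label | today's label) from a label sequence."""
    pair_counts = Counter(zip(labels, labels[1:]))  # (today, tomorrow) pairs
    today_counts = Counter(labels[:-1])             # last day has no tomorrow
    return {
        (today, nxt): count / today_counts[today]
        for (today, nxt), count in pair_counts.items()
    }

# Hypothetical cluster centers in a 3-coin feature space.
up_center = [0.04, 0.05, 0.05]
down_center = [-0.04, -0.05, -0.05]
d_self = cosine_distance(up_center, up_center)        # collinear: about 0
d_opposite = cosine_distance(up_center, down_center)  # anti-collinear: about 2

# Hypothetical daily labels: D (down), M (mixed), U (up).
labels = list("MMUMDMMUMM")
probs = next_day_probabilities(labels)
# probs[("M", "M")] is the estimated chance that a mixed day follows a mixed day.
```

Pairs that never occur in the sequence are simply absent from the dictionary, which is worth keeping in mind when a state such as D is rare.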
2. Classical Time Series Modeling
We chose the Bitcoin price for time series analysis because it has the longest history and is the "bellwether" of the cryptocurrency market. We first applied a classical (linear) time series model, which requires a stationary time series with constant statistical properties (mean, variance, etc.) to make good predictions. We therefore transformed the original data into logs and log differences and examined stationarity using the p-value of the Dickey-Fuller test. The figure below shows the results for the monthly average Bitcoin price.
The hidden periodicity in the original and transformed data was decomposed by STL (Seasonal and Trend decomposition using Loess). The third row shows that there is a strong yearly trend in the Bitcoin price movements. The original data has the highest Dickey-Fuller p-value. However, further transformations did not bring the p-value below 0.05, so the null hypothesis could not be rejected. We therefore let the integrated part of the ARIMA (autoregressive integrated moving average) model do the differencing and integrating for us on the log-transformed data.
We also examined the ACF and PACF to see how far back past prices affect the current Bitcoin price. The ACF plot on the left below shows that there is still a strong correlation between the current price and the price about 30 months ago, and that there is some periodic pattern among the lags that have negative correlations. Since long-term effects can be hidden in and absorbed by more recent correlations, the PACF on the right reveals the direct correlation: there is a surprisingly large correlation between the current price and prices from several years ago.

To include the yearly trend and the long-term influence of the time series, we incorporated the "seasonal" factor into our ARIMA model, known as "SARIMAX". Besides the fundamental (p, d, q) for the ARIMA part, there is therefore also a set of (P, D, Q, 12) for the seasonal part (here "12" corresponds to the yearly trend shown in the STL plot, since the data used here are monthly averages). After a grid search we found the optimized parameters of our SARIMAX model: (1, 1, 2) × (0, 1, 1, 12).

The four plots below analyze the residuals after applying the chosen parameters. There is no long- or short-term trend remaining in the ACF. However, the histogram (upper right) shows that the residuals are not perfectly normally distributed: they have a long right tail.

The plot below compares the real and predicted Bitcoin prices. The log prices in the middle panel are predicted by the SARIMA model, the upper panel shows the prices transformed back by the exponential, and the bottom panel shows the log return, the sign of which indicates an increase or decrease of the price. The blue shading indicates the 95% probability band, which is so wide that it effectively says any price is possible for the actual price in the upper panel over the next few months. However, besides the absolute price values, we are also interested in how well the model predicts whether the price increases or decreases.
The overall accuracy of the predicted signs of the monthly return is 0.58, with more details included in the table.
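The sign accuracy reported above is the fraction of periods in which the predicted and actual returns point in the same direction. A minimal sketch of that metric follows; the function names are hypothetical, the return values are invented, and counting a zero return as its own "flat" sign is an assumption, since the write-up does not say how zeros were handled.

```python
def sign(x):
    """Return 1 for a positive value, -1 for a negative value, 0 for zero."""
    return (x > 0) - (x < 0)

def sign_accuracy(actual_returns, predicted_returns):
    """Fraction of periods where predicted and actual returns share a sign."""
    pairs = list(zip(actual_returns, predicted_returns))
    hits = sum(1 for a, p in pairs if sign(a) == sign(p))
    return hits / len(pairs)

# Hypothetical monthly log returns (not the values from the analysis).
actual = [0.10, -0.05, 0.02, -0.08, 0.04]
predicted = [0.03, 0.01, 0.05, -0.02, -0.01]
accuracy = sign_accuracy(actual, predicted)  # signs agree in 3 of 5 months
```

An accuracy near 0.5 on a balanced sample is what a coin flip would achieve, so a figure such as 0.58 is best read alongside the share of up and down months in the test period.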
The main shortcoming of the (S)ARIMA model is that its predictions are based only on the price of the cryptocurrency itself. In reality, many outside factors can have a huge impact on the Bitcoin price, such as stock indices, market volatility and metal prices. News, media exposure and people's reactions can also influence, or be reflected in, cryptocurrency prices. The VARMAX model can incorporate different outside factors to improve the predictions. Meanwhile, since Bitcoin leads most of the other cryptocurrencies, using Bitcoin itself as an inner factor to help predict the prices of other cryptocurrencies may also be a useful method.
We applied the VARMAX model to predict Litecoin (LTC) and Bytecoin (BCN) prices with and without Bitcoin (BTC) as an inner factor. The outside factors we used are: BTC volume, S&P 500, Nikkei 225, Stoxx Europe 600, DXSQ, VIX Volatility Index, and the prices of gold and silver. Since the VARMAX model has no integrated (differencing) part, we used weekly log returns directly as the time series data.
The real and predicted weekly log returns for both LTC and BCN are compared in the plots. The overall accuracy of the predicted signs of the weekly log returns is shown in the table on the left. Although LTC moves very similarly to BTC while BCN does not, the prediction accuracy of both improves when BTC is used as an inner factor. However, the improvement is marginal. One possible reason is that we took log returns of all the vectors, whereas metal prices, which do not fluctuate significantly, may not need a log transformation. Both our VARMAX model and its inputs therefore need further careful optimization.
3. Recurrent Neural Network
Finally, we tried a recurrent neural network, specifically a long short-term memory (LSTM) network, to model the price movement of Bitcoin. We made use of both numerical data (Bitcoin price, volume, international stock index prices, commodity prices, interest rates and CDS, volatility index) and text data (news articles scraped from bitcoin.com and bitcoinist.com). For the numerical data, we took log returns to make the scales uniform. For the text data, we used a pretrained module to embed the sentences into 128-dimensional vectors. We then trained a neural network on the sentence vectors to predict price movement, extracted its last hidden layer (a 16-dimensional vector), and added it to the numerical data to form the input of the LSTM network.
We considered two different architectures for the LSTM network. The first treated the entire Bitcoin price history as one long chain: the LSTM network would remember all the intermediate states. The second only considered rolling windows of a fixed size: the LSTM network would start over with clean states for each window. Graphic representations of the architectures are shown below:
However, our models did not provide meaningful predictions on the test set: they tended to fit the training set very well and failed to generalize to unseen data.
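The rolling-window variant described above amounts to cutting the time series into fixed-size input windows, each paired with the value that follows it. The sketch below shows one way to build such samples and to split them chronologically so the test set contains only later windows. The function names, the window size, the one-step horizon and the 80/20 split are all illustrative assumptions, not the settings used in the project.

```python
def rolling_windows(series, window, horizon=1):
    """Split a sequence of rows into (input window, target row) pairs."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]          # the rows the model sees
        y = series[start + window + horizon - 1]  # the row it should predict
        samples.append((x, y))
    return samples

def chronological_split(samples, train_fraction=0.8):
    """Keep time order: earlier windows for training, later ones for testing."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

# Stand-in for a series of daily feature rows (here just the integers 0..9).
series = list(range(10))
samples = rolling_windows(series, window=3)
train, test = chronological_split(samples)
# The first sample is ([0, 1, 2], 3) and the last is ([6, 7, 8], 9).
```

Shuffling windows before splitting would leak future information into training, so a chronological split is the more conservative choice when checking whether a model generalizes to unseen data.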
4. Conclusion and Future Work
- For the models to work, we need better feature engineering: knowledge about the relationships among currency prices and various economic indicators is crucial. Text data such as news, forum threads and social media posts may be helpful, but it requires clever manipulation to be incorporated into the models.
- Classical time series methods and unsupervised learning (clustering, PCA, etc.) provided good insights. In contrast, the deep learning models did not work: this could result from the intrinsic nature of the data, which does not follow the same distribution across the time frame. However, it could also result from our lack of experience in choosing the predictors, models and targets.
- If we had more time, we would research the subject more thoroughly and try a larger variety of the models suggested in the literature.
For interested readers, our code and notebooks can be found here.