Sprint 3 - RobertoLF/EC601Project GitHub Wiki

Data from CryptoCompare API

Below is a snapshot of the data returned by the CryptoCompare API. The columns are defined as follows:

Time: Date of the recording.
High: Highest price at which the cryptocurrency traded during the time period.
Low: Lowest price at which the cryptocurrency traded during the time period.
Open: Price at the start of the time period.
VolumeFrom: The number of crypto coins traded for US dollars during the time period.
VolumeTo: The number of US dollars traded for the crypto coin during the time period.
Close: Price at the end of the time period.
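As a rough sketch of how records with these columns could be loaded for analysis, the snippet below turns a list of per-period records into a time-indexed pandas DataFrame. The lowercase field names, the Unix-timestamp format of the time field, the helper name rows_to_frame, and the sample numbers are all assumptions made for illustration; they are not taken from the project code or from real market data.

```python
import pandas as pd

# Hypothetical records shaped like the columns described above.
# The numbers are made up for illustration only.
sample_rows = [
    {"time": 1609459200, "high": 105.0, "low": 95.0, "open": 100.0,
     "volumefrom": 10.0, "volumeto": 1000.0, "close": 102.0},
    {"time": 1609545600, "high": 110.0, "low": 101.0, "open": 102.0,
     "volumefrom": 12.0, "volumeto": 1260.0, "close": 108.0},
]

def rows_to_frame(rows):
    """Build a time-indexed DataFrame from a list of per-period records."""
    df = pd.DataFrame(rows)
    # Assumption: 'time' holds Unix timestamps in seconds.
    df["time"] = pd.to_datetime(df["time"], unit="s")
    return df.set_index("time").sort_index()

df = rows_to_frame(sample_rows)
print(df[["open", "high", "low", "close"]])
```

Indexing by time keeps the rows in chronological order, which matters later when the data is split into train and test sets without shuffling.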

Long Short-Term Memory (LSTM) Neural Network

What is LSTM?

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) capable of learning long-term dependencies. Unlike traditional feed-forward neural networks, which only pass signals from input to output, recurrent neural networks add feedback loops. These loops let information persist in the system for a short time, so earlier inputs can inform later decisions.

The downside of traditional RNNs is that only recent information can reliably be used to make the next prediction. As the gap between a relevant piece of information and the point where it is needed grows, the RNN struggles to connect the two. LSTMs are designed to solve this problem: they can retain information over long sequences.
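To make the memory mechanism concrete, here is a minimal NumPy sketch of a single LSTM time step in its standard textbook form. It is not the Keras implementation and is not part of the project code; the function name lstm_step, the weight layout, and the random weights are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """Run one LSTM time step.

    x: input vector, shape (n_in,)
    h_prev, c_prev: previous hidden and cell state, shape (n_hidden,)
    W: (4 * n_hidden, n_in), U: (4 * n_hidden, n_hidden), b: (4 * n_hidden,)
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information to store
    f = sigmoid(z[n:2 * n])        # forget gate: how much old memory to keep
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much memory to expose
    g = np.tanh(z[3 * n:4 * n])    # candidate values to write into memory
    c = f * c_prev + i * g         # updated cell state (long-term memory)
    h = o * np.tanh(c)             # updated hidden state (short-term output)
    return h, c

# Run five steps over random inputs with random (untrained) weights.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state passes through a step almost unchanged, which is how information can survive across many time steps.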

Pros

Longer information retention should allow more accurate predictions, even with large or complex neural networks and datasets.
Information is added to and dropped from memory by learned gates (input, forget, and output gates).

Cons

Take longer to train.
Require more memory to train.
LSTMs are prone to overfitting.
Dropout is harder to apply effectively, because the recurrent connections need special handling.

Building Machine Learning Model for Price Prediction

Libraries Used

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout, LSTM

Keras is a deep learning API written in Python that runs on top of TensorFlow, the open-source machine learning platform developed by Google.

Train and test data

# Splits dataset into 80% train, 20% test
def train_test_split(df, test_size=0.2):
    split_row = len(df) - int(test_size * len(df))
    train_data = df.iloc[:split_row]
    test_data = df.iloc[split_row:]
    return train_data, test_data

train, test = train_test_split(df, test_size=0.2)
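A Keras LSTM layer consumes a 3D array shaped (samples, timesteps, features), so the train and test frames need to be cut into fixed-length windows before fitting. The page does not show that step, so the following is only a sketch of one common approach. The helper name make_windows, the window length, and the choice of the first column as the prediction target are assumptions for illustration.

```python
import numpy as np

def make_windows(values, window_len=5):
    """Slice a (n_rows, n_features) array into overlapping windows.

    Returns X with shape (n_samples, window_len, n_features) and y holding
    the first feature of the row that immediately follows each window.
    """
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, None]  # treat a 1D series as a single feature
    n_samples = len(values) - window_len
    if n_samples <= 0:
        raise ValueError("need more rows than window_len")
    X = np.stack([values[i:i + window_len] for i in range(n_samples)])
    y = values[window_len:, 0]  # assumption: the target is the first column
    return X, y

prices = np.arange(10, dtype=float)  # stand-in series: 0.0, 1.0, ..., 9.0
X, y = make_windows(prices, window_len=3)
print(X.shape, y.shape)  # (7, 3, 1) (7,)
```

Each window only looks backwards in time, so applying this separately to the train and test frames keeps the chronological split intact.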

Building LSTM Model

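def build_lstm_model(input_data, output_size, neurons=100, activ_func='linear',
                     dropout=0.2, loss='mse', optimizer='adam'):
    # input_data is a 3D array shaped (samples, timesteps, features)
    model = Sequential()  # Initializes our sequential model
    model.add(LSTM(neurons, input_shape=(input_data.shape[1], input_data.shape[2])))
    model.add(Dropout(dropout))  # Randomly drops units during training to limit overfitting
    model.add(Dense(units=output_size))
    model.add(Activation(activ_func))  # Output activation; 'linear' leaves the Dense output unchanged
    model.compile(loss=loss, optimizer=optimizer)
    model.summary()
    return model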

Price Prediction Results

Mean Absolute Error = 0.03433
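For reference, the mean absolute error over n predictions is MAE = (1/n) * sum(|y_i - yhat_i|), the average size of the prediction error in the same units as the target. The sketch below computes it with NumPy on made-up numbers; the helper name and the sample values are illustrative and are not the project's results. The mean_absolute_error function imported from sklearn.metrics computes the same quantity.

```python
import numpy as np

def mean_absolute_error_np(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up values for illustration only.
y_true = [1.00, 1.10, 1.05]
y_pred = [0.98, 1.14, 1.05]
print(mean_absolute_error_np(y_true, y_pred))
```

Because MAE is expressed in the units of the target, a value such as 0.03433 can only be interpreted once it is known whether the prices were normalized before training, which this page does not state.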

Sprint 4 Objectives

Evaluate the effect of various time frames and number of data points on the model.
Add forecasting to the model so it can predict future prices that have not occurred yet.
Compare the accuracy of the predictions for various cryptocurrencies. Do certain cryptos have better accuracy? If so, why?
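One possible way to approach the forecasting objective is recursive forecasting: predict one step ahead, append the prediction to the input window, and repeat. The sketch below is an assumption about how this could be done, not the project's chosen method. The names recursive_forecast and mean_predictor are hypothetical, and the stand-in predictor simply averages the window so that the example runs without a trained model.

```python
import numpy as np

def recursive_forecast(predict_next, history, window_len, steps):
    """Forecast `steps` values ahead by feeding predictions back as inputs.

    predict_next: callable that maps a 1D window array to the next value.
    """
    history = np.asarray(history, dtype=float)
    if len(history) < window_len:
        raise ValueError("history is shorter than window_len")
    window = list(history[-window_len:])
    forecasts = []
    for _ in range(steps):
        nxt = float(predict_next(np.array(window)))
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward by one step
    return np.array(forecasts)

def mean_predictor(window):
    """Stand-in for a trained model: predicts the mean of the window."""
    return window.mean()

forecast = recursive_forecast(mean_predictor, [1.0, 2.0, 3.0, 4.0],
                              window_len=2, steps=3)
print(forecast)
```

With a trained Keras model, predict_next would wrap a call to the model on a suitably reshaped window. Errors compound as predictions are fed back in, so accuracy usually degrades as the forecast horizon grows.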