Tutorial_qNetworkDiscrete - sankhaMukherjee/RLalgos GitHub Wiki
This is the .md version of this Notebook.
cd ../src
/home/sankha/Documents/programs/ML/RLalgos/src
qNetworkDiscrete
This network is a simple sequential network that takes a 1D vector and learns a multi-valued function. It can act as a discrete Q-network: given a 1D state vector, it returns one Q-value for each discrete action. So, let's see this in action:
from lib.agents import qNetwork as qN
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
from tqdm import tqdm_notebook as tqdm
if torch.cuda.is_available():
    device = 'cuda:0'
else:
    device = 'cpu'
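To make the idea of a discrete Q-network concrete, here is a minimal sketch in plain PyTorch. The class `TinyQNet` and its arguments are hypothetical stand-ins written for illustration; this is not the implementation in `lib.agents`. It only shows the shape contract: a batch of 1D states goes in, one Q-value per discrete action comes out, and the greedy action is the argmax over that last dimension.

```python
import torch
import torch.nn as nn


class TinyQNet(nn.Module):
    """Hypothetical stand-in for a discrete Q-network (not lib.agents.qNetwork)."""

    def __init__(self, state_size, n_actions, hidden=(10, 5)):
        super().__init__()
        sizes = [state_size, *hidden]
        layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(n_in, n_out), nn.Tanh()]
        # Final linear layer: one unbounded output per discrete action
        layers.append(nn.Linear(sizes[-1], n_actions))
        self.net = nn.Sequential(*layers)

    def forward(self, state):
        return self.net(state)


torch.manual_seed(0)
q_net = TinyQNet(state_size=2, n_actions=2)

states = torch.rand(4, 2) - 0.5           # a batch of 4 two-dimensional states
q_values = q_net(states)                  # shape (4, 2): one Q-value per action
greedy_actions = q_values.argmax(dim=1)   # shape (4,): index of the best action

print(q_values.shape, greedy_actions.shape)
```

The network is untrained at this point, so the Q-values are arbitrary; only the input and output shapes are meaningful.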
First, let us create some dummy data and see whether our network is able to fit it.
X = np.random.rand(1000, 2) - 0.5
# Two target columns, each a linear combination of the two inputs
Y = np.array([
    X[:,0]*2 + X[:,1]*3,
    X[:,0]*5 + X[:,1]*6
]).T
print(X.shape, Y.shape)
Xt = torch.as_tensor(X.astype(np.float32)).to(device)
# The targets must be built from Y, not X
Yt = torch.as_tensor(Y.astype(np.float32)).to(device)
(1000, 2) (1000, 2)
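The target defined above is a linear map of the inputs, so it can also be written as a single matrix product. The short check below is a sketch using only NumPy; the names `W`, `Y_matrix`, and `Y_stacked` are introduced here for illustration and do not appear in the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 2)) - 0.5          # inputs in [-0.5, 0.5)

# Each row of W holds the coefficients of one output column
W = np.array([[2.0, 3.0],
              [5.0, 6.0]])

Y_matrix = X @ W.T                       # shape (1000, 2)

# Same construction as in the tutorial, column by column
Y_stacked = np.array([
    X[:, 0]*2 + X[:, 1]*3,
    X[:, 0]*5 + X[:, 1]*6,
]).T

print(np.allclose(Y_matrix, Y_stacked))  # True
```

Seen this way, the network is being asked to recover a 2x2 linear map, which is an easy sanity check before trying it on real Q-values.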
Let us create a Q-network and see whether we are able to represent this function.
network = qN.qNetworkDiscrete(
    2, 2,
    layers=[10, 5],
    activations=[F.tanh, F.tanh],
    batchNormalization=False,
    lr=0.01,
).to(device)
errors = []
for i in tqdm(range(1000)):
    y = network.forward(Xt)            # predictions for the whole data set
    network.step(Yt, y)                # update the network from targets and predictions
    e = ((y - Yt)**2).mean()           # mean squared error at this iteration
    errors.append(e.detach().cpu().numpy())
errors = np.array(errors)
plt.plot(errors)
plt.yscale('log')
plt.xlabel('iteration')
plt.ylabel('MSE')
plt.show()
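The loop above delegates the parameter update to `network.step`, whose internals are not shown on this page. As a rough sketch of what such a loop looks like in plain PyTorch, the example below trains an equivalent-looking network on the same kind of data. The choice of `nn.MSELoss` and `torch.optim.Adam` is an assumption made for illustration, not a statement about what `qNetworkDiscrete.step` actually uses.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)

# Same dummy data as in the tutorial (kept on the CPU for simplicity)
X = np.random.rand(1000, 2) - 0.5
Y = np.array([X[:, 0]*2 + X[:, 1]*3, X[:, 0]*5 + X[:, 1]*6]).T
Xt = torch.as_tensor(X.astype(np.float32))
Yt = torch.as_tensor(Y.astype(np.float32))

# Hypothetical stand-in network: 2 -> 10 -> 5 -> 2 with tanh activations
model = nn.Sequential(
    nn.Linear(2, 10), nn.Tanh(),
    nn.Linear(10, 5), nn.Tanh(),
    nn.Linear(5, 2),
)

# Assumed loss and optimizer; the library may use something different
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

errors = []
for _ in range(1000):
    y = model(Xt)              # forward pass on the full batch
    loss = loss_fn(y, Yt)      # mean squared error against the targets
    optimizer.zero_grad()      # clear gradients from the previous step
    loss.backward()            # backpropagate
    optimizer.step()           # update the parameters
    errors.append(loss.item())

print(errors[0], errors[-1])
```

The recorded `errors` list can be plotted on a log scale exactly as in the tutorial; the loss should fall steadily as the network fits the linear target.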