Welcome to the RQ-VAE-Recommender wiki!
## Overview
This project builds on two related techniques:
- Variational Autoencoder (VAE)
- Residual-Quantized Variational AutoEncoder (RQ-VAE)
## Variational Autoencoder (VAE)
Jim Wang's blog gives an excellent explanation of VAEs. The following content is adapted from Jim's blog.
A Variational Autoencoder (VAE) is a type of generative model in machine learning that is used to learn a compressed representation of data while also being able to generate new, synthetic data points. VAEs are a form of autoencoder, but they incorporate probabilistic inference, allowing them to generate new samples by sampling from a learned latent distribution.
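Formally, a VAE is trained by maximizing the evidence lower bound (ELBO), which decomposes into a reconstruction term and a KL regularizer that pulls the approximate posterior toward the prior $p(z) = \mathcal{N}(0, I)$:

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$

The code below minimizes the negative of this bound: a binary cross-entropy reconstruction loss plus the KL term.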
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        # Encoder: input -> hidden layers
        self.fc1 = nn.Linear(28 * 28, 400)   # Flattened input
        self.fc21 = nn.Linear(400, 20)       # Mean of the latent space
        self.fc22 = nn.Linear(400, 20)       # Log-variance of the latent space
        # Decoder: latent space -> output
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 28 * 28)   # Output size

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)  # Mean and log-variance

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)  # Standard deviation
        eps = torch.randn_like(std)    # Sample epsilon ~ N(0, I)
        return mu + eps * std          # Reparameterization trick

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 28 * 28))  # Flatten input
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar


# VAE loss function: reconstruction + KL divergence
def loss_function(recon_x, x, mu, logvar):
    BCE = F.binary_cross_entropy(recon_x, x.view(-1, 28 * 28), reduction='sum')
    # KL divergence term regularizing the latent distribution toward N(0, I)
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
```
Explanation:
- Encoder: Takes the input data (28x28 images) and passes it through fully connected layers to produce mean (mu) and log-variance (logvar) for the latent space representation.
- Reparameterization: Random noise (eps) is sampled from a standard normal and combined with the mean and standard deviation to generate a latent vector z (the reparameterization trick), which keeps the sampling step differentiable.
- Decoder: The latent vector z is passed through the decoder to reconstruct the image.
- Loss Function: The loss consists of a reconstruction loss (how well the VAE reconstructs the input) and a KL divergence term (regularizing the latent space to be close to a unit Gaussian). The KL divergence between two univariate Gaussians has the closed form

  $$D_{KL}\big(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$

  With $\mu_2 = 0$, $\sigma_2 = 1$, and a sum over latent dimensions, this reduces to the `KLD` term computed in `loss_function` above.
- Training: The model is trained using backpropagation, with Adam as the optimizer; a minimal training loop is sketched below.
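For reference, a minimal training loop might look like the following sketch. The dataset, batch size, learning rate, and epoch count are illustrative assumptions, not taken from the blog:

```python
import torch
from torch import optim
from torchvision import datasets, transforms

# Illustrative data pipeline: MNIST images scaled to [0, 1]
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

model = VAE()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

model.train()
for epoch in range(10):
    for data, _ in train_loader:
        optimizer.zero_grad()
        recon, mu, logvar = model(data)
        loss = loss_function(recon, data, mu, logvar)
        loss.backward()  # Backpropagate through decoder, sampling, and encoder
        optimizer.step()
```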
## Residual-Quantized Variational AutoEncoder (RQ-VAE)
According to Google's paper, *Recommender Systems with Generative Retrieval*, the Residual-Quantized Variational AutoEncoder (RQ-VAE) is a multi-level vector quantizer that applies quantization to residuals to generate a tuple of codewords (called Semantic IDs in the paper). The autoencoder is trained jointly, updating both the quantization codebooks and the DNN encoder-decoder parameters.
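To make the residual-quantization idea concrete, here is a minimal sketch of a multi-level residual quantizer. This is an illustrative assumption, not the repository's actual implementation; `ResidualQuantizer` and the level count, codebook size, and latent dimension are hypothetical names and values. Each level quantizes the residual left over by the previous level, and the per-level codeword indices together form the Semantic ID tuple:

```python
import torch
import torch.nn as nn


class ResidualQuantizer(nn.Module):
    """Multi-level residual quantizer (illustrative sketch, not the repo's code)."""

    def __init__(self, num_levels=3, codebook_size=256, dim=32):
        super().__init__()
        # One learnable codebook per quantization level
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_levels))

    def forward(self, z):
        # z: (batch, dim) encoder output
        residual = z
        quantized = torch.zeros_like(z)
        codes = []
        for codebook in self.codebooks:
            # Pick the nearest codeword to the current residual
            distances = torch.cdist(residual, codebook.weight)  # (batch, codebook_size)
            idx = distances.argmin(dim=-1)                      # (batch,)
            selected = codebook(idx)                            # (batch, dim)
            codes.append(idx)
            quantized = quantized + selected
            # The next level quantizes what this level failed to capture
            residual = residual - selected
        # Straight-through estimator: gradients bypass the non-differentiable argmin
        quantized = z + (quantized - z).detach()
        # The stacked indices are the tuple of codewords, i.e. the Semantic ID
        return quantized, torch.stack(codes, dim=-1)
```

In the full model, `quantized` would be fed to the decoder for reconstruction, and the reconstruction loss together with auxiliary quantization losses would train the codebooks jointly with the encoder-decoder, as the paper describes.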