[3.1] Recommendation System - FEUP-MEIC-DS-2025-26/madeinportugal.store GitHub Wiki

Recommendation System

NOTE: This is the documentation for the current Recommendation System implementation. If you are looking for the prototype documentation, please go to this page.

Development Guide

  • Instructions: You can find the instructions on how to compile, test and run the project here.
  • Documentation: After successfully running the code locally, you can open http://localhost:8080/api-docs/ to view the OpenAPI documentation in Swagger UI.

Vision

The long-term vision for the Recommendation System is to evolve into a fully integrated and intelligent personalization layer that enhances the shopping experience across the madeinportugal.store platform. Our aim is to transform recommendations into a core capability that drives engagement, significantly enhances product discoverability, and improves user satisfaction. The system will be built on a scalable, cloud-native architecture to support future growth and model complexity.

Overview

The Recommendation System is a microservices-based platform that delivers personalized product recommendations to marketplace users. It consists of three core components: a machine learning service built with the Surprise library that generates recommendations from user behavior patterns; a Node.js API backend that serves recommendations through RESTful endpoints; and a React-based micro-frontend module that displays recommended products to users. The system uses Redis to cache recommendation data and PostgreSQL to store user interaction data such as reviews and orders. It is deployed on Google Cloud Platform using containerized infrastructure, is designed to serve recommendations at scale, and falls back to popular products when personalized recommendations are unavailable.

Purpose

The purpose of the Recommendation System is to enhance the user shopping experience by providing intelligent, personalized product suggestions based on individual user behavior and preferences. By analyzing user interactions including wishlists, reviews, and seller follows, the system aims to surface products that align with each user's interests, increasing engagement and conversion rates on the marketplace. Additionally, by providing a fallback mechanism to popular products, the system ensures that even new users without interaction history receive relevant product recommendations, creating a consistent and valuable experience across the entire user base.

Key Features

  • Dual-Model Machine Learning System: Employs two complementary SVD models, one trained on explicit review ratings and another on implicit order history, which are intelligently merged with a 60/40 weighting to generate up to 50 personalized recommendations per user.
  • High-Performance Redis Caching: Pre-computes and stores personalized recommendations in Redis with per-user keys, so the API layer serves them with a single low-latency cache lookup instead of computing them on request.
  • Jumpseller API Integration: Features direct synchronization with the Jumpseller API to fetch real-time customer data, product information, reviews, and order history, maintaining up-to-date recommendation models aligned with marketplace inventory.
  • Automated Daily Refresh: Leverages Google Cloud Scheduler to trigger ML model retraining at 2:00 AM daily via Cloud Run Jobs, ensuring recommendations stay current with the latest user interactions.
  • Explainable Recommendations: Generates context-aware, confidence-scored explanations for each recommendation using collaborative filtering patterns, with varied messaging based on predicted rating strength and user history.
  • Production-Ready Frontend: Fully deployed React micro-frontend integrated with the live backend API, featuring responsive grid layouts, dynamic user routing, and real-time recommendation fetching from production Cloud Run services.
  • Cloud-Native Deployment: Fully containerized using Docker with complete Google Cloud Platform deployment via Terraform, including Cloud SQL, Memorystore Redis, Cloud Run services, and Cloud Scheduler jobs for scalable production operations.
  • Simple, Intuitive UI: Clean interface displaying personalized product recommendations with images, prices, and explanatory tooltips for transparency.

UI Description

Product Recommendation Page

The Product Recommendation Page features a clean, responsive design that displays personalized product suggestions based on the user's activity across both reviews and purchase history. Both the landing page section and the dedicated page are only visible and accessible if the user is logged in. The page uses dynamic routing with user-specific URLs to fetch and display tailored recommendations for each customer. The main content is organized in a responsive grid layout that adapts from a single column on mobile devices to up to four columns on larger screens.

Each product card displays comprehensive information including the product image, name, price, and an interactive information icon. When users hover over the icon, a tooltip appears showing the explanation for why that specific product was recommended. These explanations provide transparency into the recommendation logic and help build user trust in the system's suggestions.

The frontend is fully integrated with the production backend API deployed on Google Cloud Run, fetching live recommendation data that combines insights from both the review-based and order-based ML models. If a user has insufficient interaction history (fewer than 7 recommendations), the system gracefully handles this by linking the user to the best-selling products landing page section.
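The fallback rule above can be sketched as follows. This is a language-neutral illustration written in Python, not the actual React code; the function name `choose_view` and the returned labels are hypothetical, and only the threshold of 7 comes from the description above.

```python
# Threshold stated in the text: fewer than 7 recommendations triggers the fallback.
MIN_RECOMMENDATIONS = 7


def choose_view(recommendations: list) -> str:
    """Decide which view to show for a user.

    Returns "best_sellers" when there are too few personalized
    recommendations, otherwise "personalized". Both labels are
    hypothetical names used only for this sketch.
    """
    if len(recommendations) < MIN_RECOMMENDATIONS:
        return "best_sellers"
    return "personalized"
```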

Technologies

Web Application

The Recommendation System Page is built as a React-based micro-frontend using Rsbuild and Module Federation. React is a JavaScript library designed for building dynamic user interfaces with its Virtual DOM and component-oriented architecture, allowing dynamic updates without full page reloads. Rsbuild is a high-performance Rspack-based build tool that provides faster compilation times and optimized production builds compared to traditional bundlers like Webpack.

The frontend leverages TypeScript as its primary programming language, which enhances JavaScript with static type checking and analysis. For UI components, we use Material-UI (MUI), a comprehensive React component library that implements Google's Material Design principles, providing pre-built, accessible, and customizable components such as cards, icons, and layouts.

The most distinctive architectural decision is the adoption of Module Federation, a Webpack/Rspack feature that enables micro-frontend architecture. This allows our recommendation page component to be exposed as a federated module that can be dynamically imported and rendered by the main marketplace host application at runtime. This approach provides several advantages: independent deployment cycles, isolated development environments, and the ability to share dependencies like React across different micro-frontends efficiently.

ML Integration

The machine learning component employs a sophisticated dual-model architecture built with Python and the Surprise (Simple Python RecommendatIon System Engine) library. The system trains two separate SVD (Singular Value Decomposition) models in parallel:

  1. Reviews Model: Trained on explicit user feedback with ratings from 1-5 stars, capturing strong preference signals from users who actively review products.
  2. Orders Model: Trained on implicit feedback from purchase history, where each order is converted to an implicit rating of 5.0, capturing behavioral signals even from users who don't leave reviews.
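A minimal sketch of how order lines could be turned into implicit ratings for the Orders Model. The `OrderLine` type and the function name are hypothetical, and collapsing repeat purchases of the same product into a single rating is an assumption; only the fixed rating of 5.0 comes from the description above.

```python
from typing import Iterable, NamedTuple

IMPLICIT_RATING = 5.0  # every purchase counts as a 5.0 rating, as described above


class OrderLine(NamedTuple):
    """Hypothetical shape of one purchased item."""
    customer_id: int
    product_id: int
    quantity: int


def orders_to_implicit_ratings(lines: Iterable[OrderLine]) -> list:
    """Convert order lines into (customer_id, product_id, rating) tuples."""
    seen = set()
    ratings = []
    for line in lines:
        key = (line.customer_id, line.product_id)
        if key in seen:
            # Assumption: repeat purchases collapse into one implicit rating.
            continue
        seen.add(key)
        ratings.append((line.customer_id, line.product_id, IMPLICIT_RATING))
    return ratings
```

The resulting tuples have the same (user, item, rating) shape as explicit reviews, so both datasets can be fed to the same kind of model.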

Both models implement advanced data preprocessing, including per-user limits (up to 50 interactions) and cold-start filtering (minimum 2 interactions required). The recommendation generation process produces up to 50 predictions per user, which are then intelligently merged using a weighted combination: 60% weight for review-based predictions and 40% for order-based predictions. This hybrid approach ensures comprehensive coverage across different user segments.
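The preprocessing limits and the 60/40 merge can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, interactions are assumed to arrive newest first, and an item scored by only one model is assumed to keep that model's score unchanged.

```python
from collections import defaultdict

MAX_INTERACTIONS_PER_USER = 50   # per-user limit from the text
MIN_INTERACTIONS_PER_USER = 2    # cold-start filter from the text
REVIEW_WEIGHT, ORDER_WEIGHT = 0.6, 0.4
TOP_N = 50


def preprocess(interactions):
    """Filter (user_id, item_id, rating) rows before training.

    Assumption: rows for each user are already ordered newest first,
    so truncating keeps the most recent interactions.
    """
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)
    kept = []
    for rows in by_user.values():
        if len(rows) < MIN_INTERACTIONS_PER_USER:
            continue  # cold-start users are excluded from training
        kept.extend(rows[:MAX_INTERACTIONS_PER_USER])
    return kept


def merge_predictions(review_preds, order_preds, top_n=TOP_N):
    """Merge two {item_id: predicted_rating} dicts for one user."""
    merged = {}
    for item in review_preds.keys() | order_preds.keys():
        r = review_preds.get(item)
        o = order_preds.get(item)
        if r is not None and o is not None:
            merged[item] = REVIEW_WEIGHT * r + ORDER_WEIGHT * o
        else:
            # Assumption: a single-model prediction is used as-is.
            merged[item] = r if r is not None else o
    # Highest score first; item id breaks ties deterministically.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

For example, an item predicted at 5.0 by the reviews model and 3.0 by the orders model receives 0.6 × 5.0 + 0.4 × 3.0 = 4.2.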

The entire pipeline runs as an automated Cloud Run Job triggered daily at 2:00 AM by Google Cloud Scheduler. Upon completion, all recommendations with their explanations are serialized and stored in Redis with per-user keys for fast retrieval by the API layer.
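A minimal sketch of the serialization step, assuming JSON as the serialization format and a `recommendations:<user_id>` key layout. Both are assumptions for illustration; the text above only states that per-user keys are used. The `client` argument is any object exposing `set(key, value)`, which is the call a redis-py `Redis` client provides.

```python
import json

KEY_PREFIX = "recommendations"  # hypothetical key prefix


def cache_key(user_id) -> str:
    """Build the per-user key (layout is an assumption)."""
    return f"{KEY_PREFIX}:{user_id}"


def build_cache_entries(recs_by_user: dict) -> dict:
    """Serialize each user's recommendation list to a JSON string."""
    return {cache_key(uid): json.dumps(recs) for uid, recs in recs_by_user.items()}


def write_entries(client, entries: dict) -> None:
    """Write every entry through a client that exposes set(key, value)."""
    for key, value in entries.items():
        client.set(key, value)
```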

Microservice Communication

The recommendation service has evolved its integration strategy to support a more comprehensive data pipeline. The API backend integrates directly with the Jumpseller API to synchronize multiple data sources including customer profiles, product catalogs, review ratings, and complete order history with line items. This integration uses HTTP Basic Authentication and provides a dedicated endpoint for on-demand data synchronization, accepting credentials in the request body for flexible deployment configurations.
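To illustrate the HTTP Basic Authentication step, the sketch below builds an authenticated request without sending it. The function names, the `login`/`token` parameter names, and any URL passed in are assumptions for illustration; the actual backend is written in Node.js and its sync endpoint may be structured differently.

```python
import base64
import urllib.request


def basic_auth_header(login: str, token: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value."""
    raw = f"{login}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url: str, login: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON request."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(login, token))
    req.add_header("Accept", "application/json")
    return req
```

Because the credentials travel in the request, they should only ever be sent over HTTPS.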

The synchronization process now captures the full spectrum of user interactions: explicit feedback through the reviews table (customer_id, product_id, rating, reviewed_at) and implicit feedback through the order and order_item tables (order_id, customer_id, product_id, quantity, ordered_at). This comprehensive data collection enables the dual-model ML system to generate more accurate recommendations by analyzing both what users say (reviews) and what they do (purchases).
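The tables named above can be sketched with an in-memory SQLite database. This is only an illustration: production uses PostgreSQL, the column types are guesses, and the split of columns between the order and order_item tables is an assumption, since the text lists them together.

```python
import sqlite3

# Assumed schema; "order" is quoted because it is a reserved word in SQL.
SCHEMA = """
CREATE TABLE reviews (customer_id INTEGER, product_id INTEGER,
                      rating REAL, reviewed_at TEXT);
CREATE TABLE "order" (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                      ordered_at TEXT);
CREATE TABLE order_item (order_id INTEGER, product_id INTEGER,
                         quantity INTEGER);
"""


def load_training_rows(conn: sqlite3.Connection):
    """Return (explicit, implicit) interaction rows for the two models."""
    explicit = conn.execute(
        "SELECT customer_id, product_id, rating FROM reviews "
        "ORDER BY customer_id, product_id"
    ).fetchall()
    implicit = conn.execute(
        'SELECT DISTINCT o.customer_id, oi.product_id FROM "order" o '
        "JOIN order_item oi ON oi.order_id = o.order_id "
        "ORDER BY o.customer_id, oi.product_id"
    ).fetchall()
    return explicit, implicit
```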

For future microservice communication, the architecture is designed to support Google Cloud Pub/Sub for event-driven updates.

Infrastructure and Deployment

The system is fully deployed on Google Cloud Platform (GCP) using a comprehensive infrastructure-as-code approach with Terraform, managing all resources including networking, compute, storage, and scheduling components. The deployment architecture consists of three containerized services: the Node.js API backend, the Python ML batch job, and the React frontend, all packaged with Docker and pushed to Google Artifact Registry.

The backend API is deployed as a Cloud Run service (backend-service), providing a fully managed serverless platform that automatically scales based on incoming traffic. The service connects to Cloud SQL (PostgreSQL 16) via private IP networking through a VPC Connector, ensuring secure database access. The caching layer utilizes Google Cloud Memorystore for Redis with LRU eviction policy and 100MB memory limit, storing pre-computed recommendations for instant retrieval.

The ML batch job is deployed as a Cloud Run Job (surprise-job) and executed automatically by Google Cloud Scheduler. The scheduler triggers model retraining daily at 2:00 AM UTC (cron expression 0 2 * * *), so recommendations stay fresh with the latest user interactions. The job processes both review and order data, trains the two SVD models, merges their predictions, and updates the Redis cache, all without manual intervention.
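As a small illustration of what the `0 2 * * *` schedule means, the sketch below computes the next trigger time for a given moment. The function name `next_run` is hypothetical, and the calculation assumes the schedule is evaluated in UTC, as stated above.

```python
from datetime import datetime, time, timedelta, timezone


def next_run(now: datetime) -> datetime:
    """Return the next 02:00 UTC strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), time(2, 0), tzinfo=timezone.utc)
    if candidate <= now:
        # Today's 02:00 has already passed, so the job runs tomorrow.
        candidate += timedelta(days=1)
    return candidate
```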

Security is managed through Google Secret Manager, which stores sensitive credentials including database passwords and Redis authentication tokens. Service accounts are configured with least-privilege IAM roles (Cloud SQL Client, Secret Manager Accessor, Cloud Run Invoker) following security best practices. The entire infrastructure is version-controlled in Terraform, enabling reproducible deployments and easy environment management across development, staging, and production.

Architecture

The data flow begins with the React client, which calls the backend API (Recommendation API) running on Cloud Run. For recommendations, the API first checks the Redis cache (Memorystore). If a pre-computed list exists, it is fetched and enriched with product data from the database (Cloud SQL, accessed via a VPC Connector). The recommendations in Redis are generated and periodically updated by a separate ML service (Surprise), also on Cloud Run, ensuring the backend remains focused on request handling and data serving.
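The request path described above can be sketched as follows. This is a Python illustration of the flow, not the Node.js backend code: the function name, the `recommendations:<user_id>` key layout, the record fields, and the choice to skip products missing from the catalog are all assumptions. Plain dicts stand in for Redis and the database.

```python
import json


def get_recommendations(user_id, cache: dict, products: dict) -> list:
    """Look up cached recommendations and enrich them with product data.

    cache:    stand-in for Redis, mapping key -> JSON string
    products: stand-in for the database, mapping product_id -> details
    """
    raw = cache.get(f"recommendations:{user_id}")  # hypothetical key layout
    if raw is None:
        return []  # nothing pre-computed; the caller can fall back to popular products
    enriched = []
    for rec in json.loads(raw):
        product = products.get(rec["product_id"])
        if product is None:
            continue  # assumption: drop products no longer in the catalog
        enriched.append({**rec, **product})
    return enriched
```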

Jumpseller is used because it is the standard platform adopted across the system, allowing easy access to the product data required for generating recommendations. The data obtained from Jumpseller is then processed to prepare it for model training. This preprocessing step removes unnecessary columns and formats the information in a way that supports efficient and accurate recommendation generation.

GCP Pub/Sub is incorporated to handle incremental data updates. Instead of repeatedly fetching the full dataset from Jumpseller, Pub/Sub delivers only new or modified data, making the update process significantly more efficient.

Surprise is a Python library designed to generate product recommendations based on user interactions. Because it includes a wide range of built-in models and algorithms, it offers a more efficient solution than developing new recommendation components from scratch. This approach reduces development effort and allows attention to be directed toward other aspects of the project. The Surprise module now implements a dual-model system that trains both a reviews-based model (explicit ratings) and an orders-based model (implicit feedback), intelligently merging their predictions with a 60/40 weighting. The system includes a Cloud Run Job triggered by Cloud Scheduler that trains both models every day at 2:00 AM UTC, processes all active users, generates up to 50 recommendations per user with confidence-scored explanations, and refreshes the data in Redis automatically.

Redis functions as an in-memory cache that is periodically refreshed, providing rapid data retrieval and support for high-frequency updates. This makes caching especially valuable when handling negative reviews, which can influence existing recommendation results and may require immediate action. Because Redis is designed for key-value storage and low-latency access, it is an efficient and lightweight fit for the system's needs, directly supporting the daily refresh and the fast data retrieval that the recommendation service depends on.

Architecture Diagram

Our architecture follows the diagram below:

C4 Model Framework

The C4 model framework for our system can be found below and in the main Excalidraw Whiteboard.

Platform Integration

We have made significant progress in platform integration during Sprint 2. The system now fully integrates with the Jumpseller API and Google Cloud Pub/Sub to fetch and synchronize customer data, product information, reviews, and order history. The API provides a comprehensive endpoint that populates the local database with all necessary data for recommendation generation.

Remaining integrations with other microservices include:

  • Product Wishlist [6.4]: To incorporate wishlist data as an additional signal in the recommendation models

Project Management

Sprint 1 Overview

In Sprint 1, our team planned to work on the following issues:

During the sprint, we had to refine the board and remove issue #323, since finalizing it depended on the work of another team. Apart from #323, for which we are still waiting on the other team's input, we finished all the planned issues.

Sprint 1 Review

Key takeaways from the sprint review:

  • Well-implemented features, but still using mock data. Next focus should be on integrating with other services using Pub/Sub.
  • For Sprint 2, teams should develop and integrate from the start, as most teams now have working services.

Sprint 2 Overview

Building on Sprint 1 feedback, our team planned to work on the following issues in Sprint 2:

We were able to fully complete all of them during the sprint.

Sprint 2 Review

Key takeaways from the sprint review:

  • Features were correctly developed and integrated from the start.
  • Need to increase the number of Integration and Acceptance tests.
  • The Development Guide needs to be clearer and better organized (step-by-step instructions in the README.md).

Changelog

The changelog is available in the main repository, here

Sprint Retrospectives

The sprint retrospectives are available on this page.

Future Work

  • Wishlist Integration: Incorporate wishlist data from the Product Wishlist service [6.4] as an additional high-value signal in the recommendation models.
  • Not Interested Functionality: Implement a "Not Interested" button, so that users can better personalize their recommended products.
  • Recommendation Diversity: Implement diversity algorithms to ensure recommendations don't become too narrow or repetitive.