[7.3 Prototype] AI‐Powered Spam Malicious Reviews Detection - FEUP-MEIC-DS-2025-26/madeinportugal.store GitHub Wiki

Overview

The Automated Review Spam Detection system is a background service designed to protect seller reputation. It acts as middleware between the Jumpseller marketplace and the public storefront, intercepting reviews and analyzing them for malicious content with AI before they become publicly visible.

Purpose

Spam and malicious reviews can unfairly damage a product's reputation, mislead genuine customers, and skew ratings. Manually moderating every single review is time-consuming and inefficient.

Our objective is to build an automated system that proactively identifies and isolates suspicious reviews for moderation, ensuring that only authentic feedback is displayed publicly.

Key Features

Polling: A scheduler triggers the service to query the Jumpseller API for all reviews created or modified since the last poll.

AI-Powered Scoring: Automated calculation of a "spam score" based on content and metadata, classifying each review as Safe, Moderation, or Spam.

Admin Moderation Queue: A dedicated interface for administrators to review flagged content.
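The three-way labelling described under AI-Powered Scoring could be sketched as below. This is only an illustration: the 0.0 to 1.0 score range, the two cut-off values, and the names `Verdict` and `classify` are assumptions, since the actual thresholds are not documented here.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "Safe"
    MODERATION = "Moderation"
    SPAM = "Spam"


# Hypothetical cut-offs on a 0.0-1.0 score; the real values are not documented.
MODERATION_THRESHOLD = 0.4
SPAM_THRESHOLD = 0.8


def classify(spam_score: float) -> Verdict:
    """Map a spam score in [0, 1] to one of the three labels."""
    if not 0.0 <= spam_score <= 1.0:
        raise ValueError(f"spam score out of range: {spam_score}")
    if spam_score >= SPAM_THRESHOLD:
        return Verdict.SPAM
    if spam_score >= MODERATION_THRESHOLD:
        return Verdict.MODERATION
    return Verdict.SAFE
```

Keeping the cut-offs as named constants makes it easy to tune them once real review data is available.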

UI Description

Moderation Dashboard

While the detection happens in the background, the Moderation Queue is the visual component. It presents a list of reviews flagged by the AI, displaying the review content, the calculated spam score, and the reasoning provided by the AI.

Technologies

Backend Service

The detection service is built using Python, leveraging the Flask framework. Python was chosen for its robust ecosystem of AI and Data Science libraries, allowing for seamless integration with our chosen analysis models.

AI Integration

The system uses OpenAI, spaCy, or a scikit-learn model to analyze the review text.

Scoring Logic: The AI returns a JSON object containing a spam score and a justification string.
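As a minimal sketch of how the service might validate that JSON object before acting on it, the snippet below parses the reply into a typed record. The key names `spam_score` and `justification`, and the `SpamAssessment` and `parse_assessment` names, are assumptions made for illustration; the actual response schema is not specified on this page.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamAssessment:
    spam_score: float
    justification: str


def parse_assessment(raw: str) -> SpamAssessment:
    """Parse and validate the model's JSON reply.

    The key names "spam_score" and "justification" are assumed, not confirmed.
    Malformed JSON raises json.JSONDecodeError, a subclass of ValueError.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    score = data.get("spam_score")
    justification = data.get("justification")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("spam_score must be a number")
    if not isinstance(justification, str):
        raise ValueError("justification must be a string")
    return SpamAssessment(float(score), justification)
```

Validating the reply up front means a malformed model response fails loudly instead of silently producing a missing or non-numeric score.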

Micro-service Communication

Google Cloud Pub/Sub: Used for asynchronous communication between the Reviews Service and the Spam Detection Service. We use a Push Subscription model so that our Cloud Run container is triggered only when new data is available, which keeps latency low and avoids consuming resources while idle.
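A push subscription delivers each message as an HTTP POST whose JSON body wraps the payload in a `message` object, with the payload itself base64-encoded in `message.data`. The sketch below decodes such a body using only the standard library. The function name `decode_push_envelope` and the review fields in the usage example (`review_id`, `content`) are hypothetical, and the payload is assumed to be JSON.

```python
import base64
import json


def decode_push_envelope(envelope: dict) -> dict:
    """Extract the JSON payload from a Pub/Sub push request body."""
    message = envelope.get("message")
    if not isinstance(message, dict) or "data" not in message:
        raise ValueError("not a valid Pub/Sub push envelope")
    # Pub/Sub base64-encodes the published bytes in the "data" field.
    payload = base64.b64decode(message["data"]).decode("utf-8")
    return json.loads(payload)


# Usage with a hand-built envelope (field names in the payload are hypothetical).
body = json.dumps({"review_id": 42, "content": "Great product"}).encode("utf-8")
envelope = {
    "message": {"data": base64.b64encode(body).decode("ascii"), "messageId": "1"},
    "subscription": "projects/example/subscriptions/example",
}
review = decode_push_envelope(envelope)
```

In a Flask route, this would typically be called on the parsed request body, with the handler returning a success status (for example 204) to acknowledge the message; a non-success status causes Pub/Sub to retry delivery.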

Infrastructure

Deployed on Google Cloud Run using Docker containers.

Architecture

System Workflow

System Workflow (Event-Driven): The process from review submission to publication runs in real time on an event-driven architecture and follows these automated steps:

  1. Review Creation: A customer submits a review via the frontend (managed by the Reviews Team).

  2. Event Publishing: The Reviews Service publishes a message to the new-review-created Google Pub/Sub topic containing the review content and metadata.

  3. Push Notification: Google Pub/Sub automatically triggers a Push Subscription, sending a POST request to our Spam Detection Service running on Cloud Run.

  4. AI Analysis: For each review received, the AI model analyzes the content and metadata to calculate a "spam score."

  5. Flagging Decision:

    • Score ≤ Threshold: No action is taken.

    • Score > Threshold: The review is flagged.

  6. Manual Moderation: A system administrator reviews the flagged review in a dedicated moderation queue. They can either Approve it (making it public) or Reject it (deleting it).
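Steps 4 to 6 above can be sketched as follows. This is a simplified illustration, not the deployed implementation: the threshold value, the in-memory `ModerationQueue`, the `process_review` function, and the `id` field on the review are all assumptions, and the scorer is passed in as a plain callable so that any model can be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable

THRESHOLD = 0.5  # hypothetical; the real value is not documented


@dataclass
class ModerationQueue:
    """In-memory stand-in for the admin moderation queue."""
    pending: dict = field(default_factory=dict)

    def flag(self, review_id: int, score: float) -> None:
        self.pending[review_id] = score

    def resolve(self, review_id: int, approve: bool) -> str:
        """Remove a review from the queue and report the outcome."""
        del self.pending[review_id]  # raises KeyError if it was never flagged
        return "published" if approve else "deleted"


def process_review(
    review: dict,
    score_fn: Callable[[dict], float],
    queue: ModerationQueue,
) -> bool:
    """Score a review and flag it when the score exceeds the threshold.

    Returns True if the review was flagged, False if no action was taken.
    """
    score = score_fn(review)
    if score > THRESHOLD:
        queue.flag(review["id"], score)
        return True
    return False
```

An administrator's Approve or Reject action then maps to `queue.resolve(review_id, approve=True)` or `queue.resolve(review_id, approve=False)`.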

Sequence Diagram

Project Management

Sprint 1

This sprint focused on establishing the core logic of the system, moving from a prototype to a defined product.

Sprint Overview

In this sprint, our team focused on the following issues from our GitHub Project Board:

336 - Optimize AI integration.

287 - Integrate the spam filtering with Jumpseller.

290 - Admin Page review detection integration.

What Went Well: We improved on the prototype, resulting in a product that successfully retrieves reviews and correctly evaluates them using the AI model.

What Went Wrong:

  • Time Management: The team struggled to manage its time across the tasks.
  • Communication: The team could have communicated more.

Future Work

  • Pub/Sub integration to trigger detection when reviews are created.
  • Optimize the AI prompts to support multiple languages.