[7.2] Image Search Service - FEUP-MEIC-DS-2025-26/madeinportugal.store GitHub Wiki

Image Search Service

The Image Search Service enables users to search for visually similar products by uploading an image instead of typing keywords. This is particularly useful because users may have seen a product online or in a store and wish to find a similar item but do not know the exact name. This service provides a seamless experience by intelligently comparing the uploaded image with those in the database, returning the most relevant matches.

Vision

Our goal is to empower users with a natural and intuitive way to discover products through visual search, removing the barriers of inaccurate keyword queries or vague descriptions. By leveraging advanced AI-driven image embeddings and efficient vector similarity search, the service ensures fast, relevant, and scalable product retrieval. This enhances user engagement and satisfaction by delivering precise matches tailored to their visual input.

Key Features

Image-based Product Search: Upload any product image and retrieve visually similar products without needing exact keywords.
Update Scheduler: Continuously sync new and removed product images from the Jumpseller API, ensuring the search index is always fresh.
Multi-API Access: Support both JSON REST API and gRPC endpoints to integrate with various client applications and services.
User-friendly Frontend: Interactive Next.js React app styled with Tailwind CSS that enables easy image uploads and displays search results dynamically.

How to use

For instructions on how to start and use the service, please refer to the README file in the source code.

Page UI

The interface is designed to be simple and user-friendly, featuring an image input that supports drag-and-drop as well as traditional file selection. Users can specify the number of top results to display using a dedicated input field and initiate the search with a clear search button. Upon clicking the search button, the page enters a loading state that displays traditional Portuguese symbols as a culturally engaging visual indicator. Once the search completes, results are shown in a grid layout where each item includes a similarity score represented by a colored progress bar that transitions from green (high similarity) to red (low similarity), providing users with an immediate and intuitive sense of how closely each product matches the uploaded image.

Figure 1: Image Search Page

Architecture Overview

Image search is divided into two services: the Update Scheduler and the Search Service. The Update Scheduler fetches new, unseen images from the Jumpseller API and updates the image search database, whereas the Search Service handles the API endpoints that let the user query the system with an image and receive the closest matches. Both processes are described in detail in the following sections.

Figure 2: Architectural Diagram

The diagram shows the architecture of the prototype, with the Update Scheduler depicted in blue and the Search Service in red. Both processes access the PostgreSQL DB and the VertexAI API.

Update Scheduler

The update scheduler is responsible for monitoring changes in the Jumpseller API and ensuring that the image search PostgreSQL database is up to date. For instance, when a seller adds a new image to a product, the image search DB must reflect this change. Conversely, when a product image is removed from the website, the image search DB must also delete the entry, as it is no longer valid for search.

To this end, the scheduler periodically queries all images from the Jumpseller API and compares them to those in the image search database. When a new image is found, it uses the VertexAI API on Google Cloud to generate its vector embedding (using a multimodal AI model) and then stores the entry in the PostgreSQL DB hosted at Neon. The update frequency of this service is configurable.
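The comparison step described above can be sketched as a set difference over image IDs. This is a minimal illustration, not the scheduler's actual code: the `SyncPlan` type and the `plan_sync` function are hypothetical names, and it assumes each image can be identified by an integer ID on both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    to_embed: set[int]   # image IDs present upstream but missing from the DB
    to_delete: set[int]  # image IDs in the DB that no longer exist upstream


def plan_sync(remote_ids: set[int], stored_ids: set[int]) -> SyncPlan:
    """Compare the upstream image IDs with the ones already stored."""
    return SyncPlan(
        to_embed=remote_ids - stored_ids,
        to_delete=stored_ids - remote_ids,
    )


# Hypothetical IDs: images 1 and 5 are new, image 4 was removed upstream.
plan = plan_sync(remote_ids={1, 2, 3, 5}, stored_ids={2, 3, 4})
print(sorted(plan.to_embed))   # [1, 5]
print(sorted(plan.to_delete))  # [4]
```

Only the IDs in `to_embed` would need an embedding call, which keeps each scheduler run proportional to the number of changes rather than to the catalogue size.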

Search Service

This is the core service: it exposes JSON and gRPC endpoints that provide the image search functionality. When a request is issued to one of these endpoints with an input image, the process first computes the embedding of the image using the VertexAI API. It then queries the PostgreSQL DB to calculate the Euclidean distance between the embedding of the input image and all stored images, ultimately returning the closest matches, i.e., the most similar images.
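To make the distance step concrete, here is a small brute-force sketch of a nearest-neighbour lookup by Euclidean distance. In the service this computation happens inside PostgreSQL; the function names `euclidean` and `nearest` and the toy three-dimensional vectors below are illustrative assumptions only.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(
    query: list[float],
    stored: dict[int, list[float]],
    top_k: int = 5,
) -> list[tuple[int, float]]:
    """Return up to top_k (image_id, distance) pairs, closest first."""
    scored = [(image_id, euclidean(query, emb)) for image_id, emb in stored.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Toy embeddings keyed by a hypothetical image ID; real ones have many more dimensions.
stored = {
    10: [0.0, 0.0, 0.0],
    11: [1.0, 0.0, 0.0],
    12: [3.0, 4.0, 0.0],
}
matches = nearest([0.0, 0.0, 0.0], stored, top_k=2)
print(matches)  # [(10, 0.0), (11, 1.0)]
```

In pgvector, the `<->` operator computes this same Euclidean distance, so an equivalent query would typically order rows by it and apply a `LIMIT`. Whether the service's query is written exactly that way is not stated here.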

Class Diagram

The UML class diagram of the PostgreSQL database is depicted below. The embeddings are in pgvector form with an index to optimize similarity queries.

Figure 3: Class diagram

C4 Diagram

Our C4 diagram is in the Product Discovery section: https://excalidraw.com/#room=ea9b8e8ab91ec41aa71d,unD1AesOM0sSfVzioxfF7g

CI/CD Workflow

Continuous Integration (CI)

Runs on pushes and pull requests to the main branch.
Checks out code, sets up Python, installs dependencies.
Executes tests using pytest.
Performs linting with flake8 and pylint.
Automatically formats code with black and isort, committing changes if needed.

Continuous Deployment (CD)

Runs on pushes to the main branch.
Checks out code and creates a timestamp-based version tag.
Authenticates with Google Cloud and Docker registry.
Builds and pushes Docker images for:
- JSON API
- gRPC API
- Cron Job
- Frontend
- PubSub Worker
(All images are tagged with both latest and the version tag.)
Runs Terraform to import existing infrastructure and apply updates using the new image versions.
Deploys updated services to Cloud Run and manages related infrastructure.
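As an illustration of the tagging step, the sketch below derives a timestamp-based version tag and the two tags applied to each image. The tag format `vYYYYMMDD-HHMMSS`, the image name `json-api`, and both function names are assumptions made for the example; the actual workflow may use a different format.

```python
from datetime import datetime, timezone


def make_version_tag(now: datetime) -> str:
    """Format a timestamp as a sortable UTC tag (hypothetical format)."""
    return now.astimezone(timezone.utc).strftime("v%Y%m%d-%H%M%S")


def image_tags(image_name: str, version: str) -> list[str]:
    """Each image is pushed under both 'latest' and the version tag."""
    return [f"{image_name}:latest", f"{image_name}:{version}"]


version = make_version_tag(datetime(2025, 11, 29, 14, 10, 53, tzinfo=timezone.utc))
print(version)                          # v20251129-141053
print(image_tags("json-api", version))  # ['json-api:latest', 'json-api:v20251129-141053']
```

A timestamp tag sorts chronologically and gives Terraform a new image reference on every run, which is one way to make each deployment pick up the freshly built image.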

Figure 4: Deployment Architecture Diagram

Image Search API Documentation

This section describes the APIs for image similarity search: a JSON REST API, a gRPC API, and a Pub/Sub API. Each accepts an image or a text query and returns products ranked by similarity.

JSON API

Endpoint: `POST /search-image/`

Description:
Upload an image and receive the top-k visually similar products.

Parameters:

image (file, required): Image file (JPG, PNG, WEBP).
top_k (int, optional): Number of top results to return (default: 5).
min_similarity (float, optional): Minimum similarity threshold between 0 and 1 (default: 0).

Response:

{
  "results": [
    {
      "product": {
        "id": 123,
        "name": "string",
        "categories": [
          { "id": 1, "name": "Category Name" }
        ],
        "description": "Detailed product description"
      },
      "image": {
        "id": 456,
        "url": "string",
        "position": 0
      },
      "similarity": 0.95
    }
  ]
}
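A client might turn this response into typed objects along the following lines. This is a sketch using only the standard library: the `Match` class and `parse_matches` function are hypothetical, and the sample body simply reuses the placeholder values from the response shape above.

```python
import json
from dataclasses import dataclass


@dataclass
class Match:
    product_id: int
    product_name: str
    category_names: list[str]
    image_url: str
    similarity: float


def parse_matches(body: str) -> list[Match]:
    """Flatten the documented response shape into a list of Match objects."""
    payload = json.loads(body)
    return [
        Match(
            product_id=item["product"]["id"],
            product_name=item["product"]["name"],
            category_names=[c["name"] for c in item["product"]["categories"]],
            image_url=item["image"]["url"],
            similarity=item["similarity"],
        )
        for item in payload["results"]
    ]


# Sample body mirroring the documented response, with placeholder values.
sample = """
{"results": [{"product": {"id": 123, "name": "string",
                          "categories": [{"id": 1, "name": "Category Name"}],
                          "description": "Detailed product description"},
              "image": {"id": 456, "url": "string", "position": 0},
              "similarity": 0.95}]}
"""
found = parse_matches(sample)
print(found[0].product_id, found[0].category_names, found[0].similarity)
```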

Endpoint: `POST /search-text/`

Description:
Submit a text query and receive the top-k products whose images best match it.

Parameters:

text (string, required): Search query text.
top_k (int, optional): Number of top results to return (default: 5).
min_similarity (float, optional): Minimum similarity threshold between 0 and 1 (default: 0).

Response:

{
  "results": [
    {
      "product": {
        "id": 123,
        "name": "string",
        "categories": [
          { "id": 1, "name": "Category Name" }
        ],
        "description": "Detailed product description"
      },
      "image": {
        "id": 456,
        "url": "string",
        "position": 0
      },
      "similarity": 0.95
    }
  ]
}
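The interaction between `top_k` and `min_similarity` on both endpoints can be sketched as a filter followed by a truncation. The `select_results` function is hypothetical, and it assumes that the threshold is inclusive and is applied before the top-k cut; the service may order these steps differently.

```python
def select_results(
    scored: list[tuple[int, float]],
    top_k: int = 5,
    min_similarity: float = 0.0,
) -> list[tuple[int, float]]:
    """Keep (product_id, similarity) pairs above the threshold, best first."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if not 0.0 <= min_similarity <= 1.0:
        raise ValueError("min_similarity must be between 0 and 1")
    kept = [pair for pair in scored if pair[1] >= min_similarity]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical product IDs with similarity scores.
scored = [(1, 0.95), (2, 0.40), (3, 0.72), (4, 0.10)]
print(select_results(scored, top_k=2, min_similarity=0.5))  # [(1, 0.95), (3, 0.72)]
print(select_results(scored, top_k=3))                      # [(1, 0.95), (3, 0.72), (2, 0.4)]
```

With the defaults (`top_k=5`, `min_similarity=0`), nothing is filtered out and at most five results are returned.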

gRPC API

Service: `ImageSearchService`

Method: `SearchImage`

Request:

image_data (bytes, optional): Raw image bytes.
text (string, optional): Text query.
top_k (int32): Number of top results (default: 5).
min_similarity (float): Minimum similarity threshold (0 to 1, default: 0).

Note: Exactly one of image_data or text must be provided (the two fields form a oneof).

Response:

results (repeated SearchResult): List of search results.

Messages:

| Message | Fields |
| --- | --- |
| `SearchImageRequest` | `oneof query { image_data (bytes), text (string) }`, `top_k (int32)`, `min_similarity (float)` |
| `SearchImageResponse` | `results` (repeated `SearchResult`) |
| `SearchResult` | `image` (Image), `product` (Product), `similarity` (float) |
| `Image` | `id` (int64), `url` (string), `position` (int32) |
| `Product` | `id` (int64), `name` (string), `categories` (repeated Category), `description` (string) |
| `Category` | `id` (int64), `name` (string) |
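The oneof rule on `SearchImageRequest` can be mirrored in plain Python to show what a valid request looks like. The `SearchQuery` dataclass below is a hypothetical stand-in for the generated protobuf class, not the generated code itself, and the query string is an invented example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchQuery:
    image_data: Optional[bytes] = None
    text: Optional[str] = None
    top_k: int = 5
    min_similarity: float = 0.0

    def kind(self) -> str:
        """Return which member of the oneof is set, or raise if invalid."""
        has_image = self.image_data is not None
        has_text = self.text is not None
        if has_image == has_text:  # both set, or neither set
            raise ValueError("exactly one of image_data or text must be set")
        return "image_data" if has_image else "text"


print(SearchQuery(text="blue ceramic tile").kind())        # text
print(SearchQuery(image_data=b"\x89PNG", top_k=3).kind())  # image_data
```

With classes generated by the protobuf compiler, the equivalent check is usually `request.WhichOneof("query")`, which returns the name of the field that is set or `None` if neither is.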

Pub/Sub API

Clients send SearchImageRequest protobuf messages containing either raw image bytes or a text query, along with parameters like top_k and min_similarity, to the image-search-requests Pub/Sub topic.

The worker service processes these requests, performs similarity search, and publishes SearchImageResponse messages with ranked results to the image-search-replies topic.

Each request and response includes a request_id for correlation.

Request (`SearchImageRequest`)

syntax = "proto3";
package imagesearch;

message SearchImageRequest {
  string request_id = 1;

  oneof query {
    bytes image_data = 2;
    string text = 5;
  }

  int32 top_k = 3;
  float min_similarity = 4;
}

Response (`SearchImageResponse`)

syntax = "proto3";

message SearchImageResponse {
  string request_id = 1;
  repeated SearchResult results = 2;
}
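Because replies arrive on a separate topic, a client needs to match each `SearchImageResponse` to the request that produced it. The sketch below shows one way to do that with the `request_id` field, using an in-memory table instead of real Pub/Sub calls. The `ReplyCorrelator` class, its method names, and the use of UUIDs as identifiers are all assumptions for illustration.

```python
import uuid
from typing import Optional


class ReplyCorrelator:
    """Tracks in-flight requests so replies can be matched by request_id."""

    def __init__(self) -> None:
        self._pending: dict[str, str] = {}  # request_id -> caller-supplied label

    def register(self, label: str) -> str:
        """Create a request_id for an outgoing request and remember it."""
        request_id = str(uuid.uuid4())
        self._pending[request_id] = label
        return request_id

    def resolve(self, request_id: str) -> Optional[str]:
        """Return the label for a reply, or None if the ID is unknown."""
        return self._pending.pop(request_id, None)


correlator = ReplyCorrelator()
first = correlator.register("search for tile photo")
second = correlator.register("search for cork bag photo")

# Replies may arrive in any order; the request_id tells them apart.
print(correlator.resolve(second))        # search for cork bag photo
print(correlator.resolve(first))         # search for tile photo
print(correlator.resolve("unknown-id"))  # None
```

Removing the entry on `resolve` means a duplicated reply is ignored the second time, which matters if the subscription redelivers a message.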

Technology Stack

Languages

Python (backend)
TypeScript (frontend)

Backend

FastAPI (JSON API)
gRPC (remote procedure calls)
PostgreSQL with pgvector extension (database & vector embeddings)
Google Cloud Vertex AI API (image embedding generation)
Jumpseller Product Image API (product images)

Frontend

Next.js (React framework with TypeScript)
Tailwind CSS (styling)

Infrastructure & Tooling

Docker (containerization of backend services and jobs)
Terraform (infrastructure as code for cloud resource management)
Google Cloud Pub/Sub (asynchronous messaging for image and text search requests and responses)

Project Management

Sprint 1

Goal

Our main goal for Sprint 1 was to integrate Pub/Sub and fix input validation.

Initial Plan

#282 - Product Database Updates and System reliance improvement to support categories and description
#280 - Image Upload Validation
#292 - PubSub Integration
#281 - Image Processing and Similarity Search Improvement

What actually got done

We were able to complete all issues except for #281, which was postponed to the next sprint due to time constraints. The current search functionality is already working well, so this issue was not a priority.

Sprint 1 Retrospective

What Went Well

Code reviews showed clear improvement compared to Sprint 0.
The Pub/Sub system was successfully implemented and integrated.
Image validation was completed, resolving previous related issues.
Other prototypes integrated smoothly with our Pub/Sub system.
Communication with other teams was productive, including coordination outside our class.

What Went Wrong

Some planned tasks were not completed, such as improving the search distance metric and enhancing search relevance.
Team communication was strong in the first week but declined later due to overlapping deadlines from other courses.
There was noticeable reliance on certain team members, which created bottlenecks when they were unavailable.

What Is Still a Problem

A few technical improvements, particularly related to search relevance and system refinement, remain incomplete.
Some integration-related issues and enhancements were postponed and still require attention.
The workload distribution problem persists, with uneven task ownership continuing to pose a risk for future sprints.

Sprint 2

Goal

Our main goal for Sprint 2 was to add new arguments to the image search request, such as a minimum similarity threshold and text input.

Initial Plan

#409 - Fix Cron Job Failure Error in Image Search
#407 - Implement similarity filter
#404 - Avoid cold starts
#405 - Image search via text

What actually got done

All initially planned issues were completed.

Sprint 2 Retrospective

What Went Well

Productivity increased significantly compared to Sprint 1. The team successfully completed all planned issues, showing stronger coordination and efficiency.
Communication improved within the team and with other groups, resulting in quicker decision-making and smoother integrations.
The team handled external factors better, such as delays caused by other course deliverables or dependencies. These no longer caused major bottlenecks.
Workload distribution was more balanced, reducing reliance on specific members and improving overall progress.

What Went Wrong

Despite better handling, external academic workload still caused occasional delays and scheduling conflicts.
Some tasks required additional clarification due to evolving requirements, though they were resolved more quickly than in previous sprints.

What Is Still a Problem

External dependencies and overlapping course demands remain a potential source of disruption.
Certain system areas still need refinement, particularly regarding search performance and relevance.

Future Work

Explore and research different similarity metrics for vector search to improve search accuracy and relevance.
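As a starting point for that comparison, the sketch below contrasts Euclidean distance (the metric currently used) with cosine similarity on toy two-dimensional vectors. The vectors and function names are invented for illustration and say nothing about how real embeddings behave.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: larger means more similar."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
same_direction_far = [10.0, 0.0]   # same direction as the query, much longer
other_direction_near = [0.6, 0.8]  # different direction, same length as the query

# Euclidean distance prefers the nearby vector ...
print(round(euclidean(query, same_direction_far), 3))    # 9.0
print(round(euclidean(query, other_direction_near), 3))  # 0.894

# ... while cosine similarity prefers the vector pointing the same way.
print(round(cosine_similarity(query, same_direction_far), 3))    # 1.0
print(round(cosine_similarity(query, other_direction_near), 3))  # 0.6
```

The two metrics disagree here only because the vectors have different lengths. For unit-length vectors, the squared Euclidean distance equals 2 minus twice the cosine similarity, so both produce the same ranking. Checking whether the stored embeddings are normalised is therefore a useful first step.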
