Analytics Pipeline - G33-Moviles-2026-1/Wiki GitHub Wiki

Overview

Our analytics pipeline follows a lightweight collect → transform → serve → visualize architecture.

Flow:

Application (event collection) → FastAPI backend (event ingestion + analytics endpoints) → PostgreSQL operational tables → analytics transformations / aggregation logic → Power BI dashboards

We did not build a separate data warehouse, event bus, or scheduled ETL job.
Instead, we used the existing backend, operational database, and analytics endpoints to compute the business answers required by the project.


1. Data Collection Layer

Analytics data is collected directly from the mobile application.

Event source

The app frontend sends a POST /analytics/events request whenever a key user action occurs.

Events implemented

We instrumented analytics for the following business questions:

  • Screen browsing behavior
    • open_screen_timestamp
  • Filters usage
    • home_filters_opened
  • Bookings / booking purpose
    • booking_created
  • Schedule import funnel
    • schedule_import_step

Example tracked fields

Each analytics event contains fields such as:

  • session_id
  • device_id
  • user_email
  • event_name
  • screen
  • duration_ms
  • props_json

The props_json field stores event-specific metadata, for example:

  • schedule import method (ics, manual)
  • schedule import step (started, file_selected, completed, failed, abandoned)
  • booking purpose
  • building
  • time window
  • filter selections
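As an illustration, a booking_created event posted to POST /analytics/events might look like the following. This is a sketch only: the field names mirror the tracked fields listed above, but every value (and the exact payload shape the backend accepts) is an assumption.

```python
import json

# Illustrative payload for POST /analytics/events.
# All values below are made up; props_json carries the
# event-specific metadata described above as a JSON string.
booking_created_event = {
    "session_id": "a1b2c3d4-0000-0000-0000-000000000000",
    "device_id": "android-demo-device",
    "user_email": "student@example.edu",
    "event_name": "booking_created",
    "screen": "booking_detail",
    "duration_ms": 4200,
    "props_json": json.dumps({
        "booking_purpose": "group_study",
        "building": "ML",
        "time_window": "10:00-12:00",
    }),
}
```

Keeping event-specific metadata inside a single props_json field means new event types can be added without changing the events table schema.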

2. Storage Layer

All events are stored in the application's existing PostgreSQL database, in an analytics events table managed by the FastAPI backend.

Storage characteristics

  • We used the same operational PostgreSQL database as the application backend.
  • We did not create a separate analytics warehouse.
  • We did not use batch jobs, cron jobs, Kafka, or external streaming tools.

This means analytics is built on top of the same source of truth already used by the application.
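A minimal sketch of what such an events table could look like, with columns taken from the tracked fields listed earlier. The actual table name and column types in the project's PostgreSQL schema may differ; SQLite is used here only so the sketch is self-contained.

```python
import sqlite3

# Sketch of an analytics events table mirroring the tracked fields above.
# The real table lives in PostgreSQL; SQLite is used purely for illustration.
DDL = """
CREATE TABLE analytics_events (
    id          INTEGER PRIMARY KEY,
    session_id  TEXT NOT NULL,
    device_id   TEXT,
    user_email  TEXT,
    event_name  TEXT NOT NULL,
    screen      TEXT,
    duration_ms INTEGER,
    props_json  TEXT,          -- event-specific metadata as JSON text
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(
    "INSERT INTO analytics_events (session_id, event_name, screen, duration_ms) "
    "VALUES (?, ?, ?, ?)",
    ("s1", "open_screen_timestamp", "home", 1500),
)
row = conn.execute("SELECT event_name, screen FROM analytics_events").fetchone()
```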


3. Transformation / Aggregation Layer

Instead of exporting raw events to an external BI warehouse, we implemented the transformation logic inside the backend.

Strategy used

The backend exposes analytics-focused GET endpoints that aggregate raw events and return already processed data for visualization.

Implemented analytics endpoints

GET /analytics/screen-time-stats

Computes the total number of seconds spent on each screen.

Used to answer:

Where in the app do students spend the most time browsing?

Output example:

  • screen
  • total_seconds
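The aggregation behind this endpoint can be sketched as a simple group-by over raw events. This is a plain-Python illustration of the idea, not the project's actual implementation (which would typically express the same logic as a SQL GROUP BY):

```python
from collections import defaultdict

def screen_time_stats(events):
    """Aggregate raw events into total seconds per screen.

    Each event is a dict with at least 'screen' and 'duration_ms',
    mirroring the tracked fields listed earlier.
    """
    totals_ms = defaultdict(int)
    for event in events:
        totals_ms[event["screen"]] += event["duration_ms"]
    # Return the shape the endpoint serves: screen + total_seconds.
    return [
        {"screen": screen, "total_seconds": ms / 1000}
        for screen, ms in sorted(totals_ms.items(), key=lambda kv: -kv[1])
    ]

sample = [
    {"screen": "home", "duration_ms": 90_000},
    {"screen": "map", "duration_ms": 30_000},
    {"screen": "home", "duration_ms": 10_000},
]
stats = screen_time_stats(sample)
```

Returning pre-aggregated rows like this is what lets Power BI consume the endpoint with almost no transformation on its side.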

GET /analytics/events?event_name=...

Returns raw analytics events filtered by event type.

Used to support:

  • filters analysis
  • booking purpose analysis
  • custom Power BI transformations

Examples used in the project:

  • event_name=home_filters_opened
  • event_name=booking_created
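Conceptually, this endpoint is just a filter over the raw event stream. A plain-Python sketch of that behavior (the real endpoint would express the filter as a SQL WHERE clause on the events table):

```python
def filter_events(events, event_name=None):
    """Return raw events, optionally filtered by event_name,
    mirroring GET /analytics/events?event_name=... behavior."""
    if event_name is None:
        return list(events)
    return [e for e in events if e["event_name"] == event_name]

raw = [
    {"event_name": "home_filters_opened", "screen": "home"},
    {"event_name": "booking_created", "screen": "booking"},
    {"event_name": "home_filters_opened", "screen": "home"},
]
filters_events = filter_events(raw, event_name="home_filters_opened")
```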

GET /analytics/schedule-import-funnel

Computes the funnel for schedule import behavior.

Used to answer:

What is the most common way users upload/import their schedule (ICS/PDF file, manual), and which method has the highest drop-off by step?

Returned metrics include:

  • most_common_method
  • highest_dropoff_method
  • per-method funnel steps
  • users reached per step
  • drop-off from previous step
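The funnel metrics above can be sketched as follows. This is a simplified two-step version (started → completed) under an assumed event shape; the real flow also tracks the file_selected, failed, and abandoned steps listed earlier, and the backend's actual logic may differ.

```python
from collections import defaultdict

def schedule_import_funnel(events):
    """Compute per-method funnel metrics from schedule_import_step
    events shaped like {'user': ..., 'method': ..., 'step': ...}.

    Simplified to a two-step funnel (started -> completed) for
    illustration; user counts are distinct users per step.
    """
    reached = defaultdict(lambda: defaultdict(set))  # method -> step -> users
    for e in events:
        reached[e["method"]][e["step"]].add(e["user"])

    funnel, dropoff = {}, {}
    for method, steps in reached.items():
        started = len(steps.get("started", set()))
        completed = len(steps.get("completed", set()))
        funnel[method] = {"started": started, "completed": completed}
        dropoff[method] = 1 - completed / started if started else 0.0

    return {
        "most_common_method": max(funnel, key=lambda m: funnel[m]["started"]),
        "highest_dropoff_method": max(dropoff, key=dropoff.get),
        "funnel": funnel,
    }

sample = [
    {"user": "u1", "method": "ics", "step": "started"},
    {"user": "u1", "method": "ics", "step": "completed"},
    {"user": "u2", "method": "ics", "step": "started"},
    {"user": "u3", "method": "manual", "step": "started"},
]
result = schedule_import_funnel(sample)
```

Counting distinct users per step (rather than raw events) keeps the funnel stable even when a user retries a step several times.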

4. Serving Layer

The backend acts as the analytics serving layer.

Rather than querying raw database tables directly, Power BI consumes backend endpoints that already return structured analytics results.

Why this design was used

This approach gave us:

  • simpler Power BI integration
  • less transformation logic inside Power BI
  • reuse of backend business logic
  • consistency with the application’s architecture

5. Visualization Layer

We used Power BI as the dashboard and reporting layer.

Power BI data sources

Power BI connects to the backend API endpoints such as:

  • /analytics/screen-time-stats
  • /analytics/schedule-import-funnel
  • /analytics/events?event_name=home_filters_opened
  • /analytics/events?event_name=booking_created

Dashboards / visualizations created

A. Screen Time Dashboard

Business question:

Where in the app do students spend the most time browsing?

Visuals:

  • summary table by screen
  • bar chart of total seconds per screen

B. Filters Usage Dashboard

Business question:

Which filters (place, time, utilities, close to me) are applied most often?

Visuals:

  • filter usage summary
  • bar chart of most frequently applied filters

C. Booking Tags Dashboard

Business question:

Which are the most used tags in bookings by building and time window?

Visuals:

  • grouped bar chart by building
  • tag legend
  • time-window slicer

D. Schedule Import Funnel Dashboard

Business question:

What is the most common way users upload/import their schedule (ICS/PDF file, manual), and which method has the highest drop-off by step?

Visuals:

  • drop-off percentage by step and method
  • users started vs. completed by method
  • funnel / progression by step
  • cards for:
    • most common method
    • method with highest drop-off

6. Architecture Rationale

This pipeline was chosen because it matched the current project constraints and implementation status.

Benefits

  • fast to implement
  • easy to connect with Power BI
  • no need for a separate analytics warehouse
  • uses the existing backend and database
  • supports multiple business questions with the same stack

Trade-offs

  • not a full enterprise analytics architecture
  • no event streaming
  • no dedicated data warehouse
  • no scheduled ETL layer
  • transformations are partly embedded in backend analytics endpoints

Even with these trade-offs, the solution is valid for the project because it supports real event collection, real backend aggregation, and real BI visualization.


Analytics Pipeline Diagram