Analytics Pipeline - G33-Moviles-2026-1/Wiki GitHub Wiki

Overview

Our analytics pipeline follows a lightweight collect → transform → serve → visualize architecture.

Flow:

Application (event collection) → FastAPI backend (event ingestion + analytics endpoints) → PostgreSQL operational tables → analytics transformations / aggregation logic → Power BI dashboards

We did not build a separate data warehouse, event bus, or scheduled ETL job.
Instead, we used the existing backend, operational database, and analytics endpoints to compute the business answers required by the project.

1. Data Collection Layer

Analytics data is collected directly from the mobile application.

Event source

The App frontend sends POST /analytics/events requests whenever key user actions occur.

Events implemented

We instrumented analytics for the following business questions:

Screen browsing behavior
- open_screen_timestamp
Filters usage
- home_filters_opened
Bookings / booking purpose
- booking_created
Schedule import funnel
- schedule_import_step

Example tracked fields

Each analytics event contains fields such as:

session_id
device_id
user_email
event_name
screen
duration_ms
props_json

The props_json field stores event-specific metadata, for example:

schedule import method (ics, manual)
schedule import step (started, file_selected, completed, failed, abandoned)
booking purpose
building
time window
filter selections

2. Storage Layer

All events are stored in the existing backend database using the analytics events table managed by the FastAPI backend.

Storage characteristics

We used the same operational PostgreSQL database as the application backend.
We did not create a separate analytics warehouse.
We did not use batch jobs, cron jobs, Kafka, or external streaming tools.

This means analytics is built on top of the same source of truth already used by the application.

3. Transformation / Aggregation Layer

Instead of exporting raw events to an external BI warehouse, we implemented the transformation logic inside the backend.

Strategy used

The backend exposes analytics-focused GET endpoints that aggregate raw events and return already processed data for visualization.

Implemented analytics endpoints

`GET /analytics/screen-time-stats`

Computes the total number of seconds spent on each screen.

Used to answer:

Where in the app do students spend the most time browsing?

Output example:

screen
total_seconds

`GET /analytics/events?event_name=...`

Returns filtered raw analytics events by event type.

Used to support:

filters analysis
booking purpose analysis
custom Power BI transformations

Examples used in the project:

event_name=home_filters_opened
event_name=booking_created

`GET /analytics/schedule-import-funnel`

Computes the funnel for schedule import behavior.

Used to answer:

What is the most common way users upload/import their schedule (ICS/PDF file, manual), and which method has the highest drop-off by step?

Returned metrics include:

most_common_method
highest_dropoff_method
per-method funnel steps
users reached per step
drop-off from previous step

4. Serving Layer

The backend acts as the analytics serving layer.

Instead of letting Power BI query raw database tables directly, Power BI consumes backend endpoints that already return structured analytics results.

Why this design was used

This approach gave us:

simpler Power BI integration
less transformation logic inside Power BI
reuse of backend business logic
consistency with the application’s architecture

5. Visualization Layer

We used Power BI as the dashboard and reporting layer.

Power BI data sources

Power BI connects to the backend API endpoints such as:

/analytics/screen-time-stats
/analytics/schedule-import-funnel
/analytics/events?event_name=home_filters_opened
/analytics/events?event_name=booking_created

Dashboards / visualizations created

A. Screen Time Dashboard

Business question:

Where in the app do students spend the most time browsing?

Visuals:

summary table by screen
bar chart of total seconds per screen

B. Filters Usage Dashboard

Business question:

Which filters (place, time, utilities, close to me) are applied most often?

Visuals:

filter usage summary
bar chart of most frequently applied filters

C. Booking Tags Dashboard

Business question:

Which are the most used tags in bookings by building and time window?

Visuals:

grouped bar chart by building
tag legend
time-window slicer

D. Schedule Import Funnel Dashboard

Business question:

What is the most common way users upload/import their schedule (ICS/PDF file, manual), and which method has the highest drop-off by step?

Visuals:

drop-off percentage by step and method
users started vs. completed by method
funnel / progression by step
cards for:
- most common method
- method with highest drop-off

6. Architecture Rationale

This pipeline was chosen because it matched the current project constraints and implementation status.

Benefits

fast to implement
easy to connect with Power BI
no need for a separate analytics warehouse
uses the existing backend and database
supports multiple business questions with the same stack

Trade-offs

not a full enterprise analytics architecture
no event streaming
no dedicated data warehouse
no scheduled ETL layer
transformations are partly embedded in backend analytics endpoints

Even with these trade-offs, the solution is valid for the project because it supports real event collection, real backend aggregation, and real BI visualization.

Analytics Pipeline - G33-Moviles-2026-1/Wiki GitHub Wiki

Overview

1. Data Collection Layer

Event source

Events implemented

Example tracked fields

2. Storage Layer

Storage characteristics

3. Transformation / Aggregation Layer

Strategy used

Implemented analytics endpoints

GET /analytics/screen-time-stats

GET /analytics/events?event_name=...

GET /analytics/schedule-import-funnel

4. Serving Layer

Why this design was used

5. Visualization Layer

Power BI data sources

Dashboards / visualizations created

A. Screen Time Dashboard

B. Filters Usage Dashboard

C. Booking Tags Dashboard

D. Schedule Import Funnel Dashboard

6. Architecture Rationale

Benefits

Trade-offs

Analytics Pipeline Diagram

`GET /analytics/screen-time-stats`

`GET /analytics/events?event_name=...`

`GET /analytics/schedule-import-funnel`