Analytics Pipeline - G33-Moviles-2026-1/Wiki GitHub Wiki
Overview
Our analytics pipeline follows a lightweight collect → transform → serve → visualize architecture.
Flow:
Application (event collection) → FastAPI backend (event ingestion + analytics endpoints) → PostgreSQL operational tables → analytics transformations / aggregation logic → Power BI dashboards
We did not build a separate data warehouse, event bus, or scheduled ETL job.
Instead, we used the existing backend, operational database, and analytics endpoints to compute the business answers required by the project.
1. Data Collection Layer
Analytics data is collected directly from the mobile application.
Event source
The App frontend sends POST /analytics/events requests whenever key user actions occur.
Events implemented
We instrumented analytics for the following business questions:
- Screen browsing behavior
open_screen_timestamp
- Filters usage
home_filters_opened
- Bookings / booking purpose
booking_created
- Schedule import funnel
schedule_import_step
Example tracked fields
Each analytics event contains fields such as:
session_iddevice_iduser_emailevent_namescreenduration_msprops_json
The props_json field stores event-specific metadata, for example:
- schedule import method (
ics,manual) - schedule import step (
started,file_selected,completed,failed,abandoned) - booking purpose
- building
- time window
- filter selections
2. Storage Layer
All events are stored in the existing backend database using the analytics events table managed by the FastAPI backend.
Storage characteristics
- We used the same operational PostgreSQL database as the application backend.
- We did not create a separate analytics warehouse.
- We did not use batch jobs, cron jobs, Kafka, or external streaming tools.
This means analytics is built on top of the same source of truth already used by the application.
3. Transformation / Aggregation Layer
Instead of exporting raw events to an external BI warehouse, we implemented the transformation logic inside the backend.
Strategy used
The backend exposes analytics-focused GET endpoints that aggregate raw events and return already processed data for visualization.
Implemented analytics endpoints
GET /analytics/screen-time-stats
Computes the total number of seconds spent on each screen.
Used to answer:
Where in the app do students spend the most time browsing?
Output example:
screentotal_seconds
GET /analytics/events?event_name=...
Returns filtered raw analytics events by event type.
Used to support:
- filters analysis
- booking purpose analysis
- custom Power BI transformations
Examples used in the project:
event_name=home_filters_openedevent_name=booking_created
GET /analytics/schedule-import-funnel
Computes the funnel for schedule import behavior.
Used to answer:
What is the most common way users upload/import their schedule (ICS/PDF file, manual), and which method has the highest drop-off by step?
Returned metrics include:
most_common_methodhighest_dropoff_method- per-method funnel steps
- users reached per step
- drop-off from previous step
4. Serving Layer
The backend acts as the analytics serving layer.
Instead of letting Power BI query raw database tables directly, Power BI consumes backend endpoints that already return structured analytics results.
Why this design was used
This approach gave us:
- simpler Power BI integration
- less transformation logic inside Power BI
- reuse of backend business logic
- consistency with the application’s architecture
5. Visualization Layer
We used Power BI as the dashboard and reporting layer.
Power BI data sources
Power BI connects to the backend API endpoints such as:
/analytics/screen-time-stats/analytics/schedule-import-funnel/analytics/events?event_name=home_filters_opened/analytics/events?event_name=booking_created
Dashboards / visualizations created
A. Screen Time Dashboard
Business question:
Where in the app do students spend the most time browsing?
Visuals:
- summary table by screen
- bar chart of total seconds per screen
B. Filters Usage Dashboard
Business question:
Which filters (place, time, utilities, close to me) are applied most often?
Visuals:
- filter usage summary
- bar chart of most frequently applied filters
C. Booking Tags Dashboard
Business question:
Which are the most used tags in bookings by building and time window?
Visuals:
- grouped bar chart by building
- tag legend
- time-window slicer
D. Schedule Import Funnel Dashboard
Business question:
What is the most common way users upload/import their schedule (ICS/PDF file, manual), and which method has the highest drop-off by step?
Visuals:
- drop-off percentage by step and method
- users started vs. completed by method
- funnel / progression by step
- cards for:
- most common method
- method with highest drop-off
6. Architecture Rationale
This pipeline was chosen because it matched the current project constraints and implementation status.
Benefits
- fast to implement
- easy to connect with Power BI
- no need for a separate analytics warehouse
- uses the existing backend and database
- supports multiple business questions with the same stack
Trade-offs
- not a full enterprise analytics architecture
- no event streaming
- no dedicated data warehouse
- no scheduled ETL layer
- transformations are partly embedded in backend analytics endpoints
Even with these trade-offs, the solution is valid for the project because it supports real event collection, real backend aggregation, and real BI visualization.