Analytics Architecture - bcgov/eagle-dev-guides GitHub Wiki
The EPIC platform integrates with Penguin Analytics, a dedicated microservice for capturing user interaction events and generating insights through time-series analysis. This page covers EPIC-specific integration patterns. For complete Penguin Analytics architecture, see the Penguin Analytics wiki.
graph TB
User["User Interactions<br/>eagle-admin / eagle-public"]
Rproxy["rproxy (nginx)<br/>Routing layer"]
EagleAPI["eagle-api<br/>Port 3000<br/>MongoDB"]
Analytics["penguin-analytics-api<br/>Port 3000<br/>TimescaleDB"]
Metabase["Metabase<br/>Analytics dashboards"]
User -->|"/api/*"| Rproxy
User -->|"/analytics"| Rproxy
Rproxy -->|proxy_pass| EagleAPI
Rproxy -->|proxy_pass| Analytics
Analytics -->|SQL queries| Metabase
style User fill:#e3f2fd
style Rproxy fill:#f3e5f5
style EagleAPI fill:#fff4e6
style Analytics fill:#e8f5e9
style Metabase fill:#fce4ec
In local development, proxy.conf.js routes both /api and /analytics to eagle-api.
The eagle-api service exposes an /analytics Express route that proxies requests to penguin-analytics
(via ANALYTICS_SERVICE_URL, default http://localhost:3001):
graph LR
Browser["Browser :4200"] -->|"/api/*"| DevServer["Angular Dev Server<br/>proxy.conf.js"]
Browser -->|"/analytics"| DevServer
DevServer -->|proxy| EagleAPI["eagle-api :3000"]
EagleAPI -->|"/analytics proxy"| PA["penguin-analytics :3001"]
style Browser fill:#e3f2fd
style DevServer fill:#f3e5f5
style EagleAPI fill:#fff4e6
style PA fill:#e8f5e9
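The local routing chain above can be sketched as a pair of small lookups. This is only an illustration under stated assumptions: the object shape follows the Angular dev-server proxy format, but the actual contents of proxy.conf.js are not shown on this page, and `devServerTarget`, `finalUpstream`, and the `/analytics/track` path in the usage note are hypothetical names.

```javascript
// Default taken from the text above; overridable in a real setup via ANALYTICS_SERVICE_URL.
const ANALYTICS_SERVICE_URL = 'http://localhost:3001';

// Assumed proxy table. In a real proxy.conf.js this object would be
// exported with module.exports for the Angular dev server to load.
const proxyConfig = {
  '/api': { target: 'http://localhost:3000', secure: false },
  '/analytics': { target: 'http://localhost:3000', secure: false }
};

// Hop 1: which target does the dev server forward this path to?
function devServerTarget(path) {
  const prefix = Object.keys(proxyConfig).find(
    (p) => path === p || path.startsWith(p + '/')
  );
  return prefix ? proxyConfig[prefix].target : null;
}

// Hop 2: eagle-api forwards /analytics requests on to penguin-analytics.
function finalUpstream(path) {
  const target = devServerTarget(path);
  if (target === null) return null; // served by the dev server itself
  const isAnalytics = path === '/analytics' || path.startsWith('/analytics/');
  return isAnalytics ? ANALYTICS_SERVICE_URL : target;
}
```

With this sketch, `finalUpstream('/api/projects')` resolves to eagle-api on port 3000, while `finalUpstream('/analytics/track')` resolves to penguin-analytics on port 3001.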
1. Microservice Independence
- Penguin Analytics is a completely separate service:
- Different repository: digitalspace/penguin-analytics
- Different database: TimescaleDB (not MongoDB)
- Independent deployment lifecycle and versioning
- Independent scaling characteristics
2. No Authentication Required
- Analytics endpoints are anonymous and don't require Keycloak JWT
- eagle-public: Fully anonymous tracking (no user identification)
- eagle-admin: Tracks authenticated users but stores anonymized GUIDs
- Separating from /api clarifies that no auth headers are needed
3. Performance Isolation
- Analytics is write-heavy with high-volume event ingestion
- Time-series database optimized for inserts (not MongoDB's use case)
- Analytics failures should not impact core API operations
- Independent scaling based on event volume, not API traffic
4. Technology Choice
- TimescaleDB: Purpose-built for time-series data (auto partitioning, compression, fast aggregations)
- MongoDB: Document database optimized for EPIC project data (not time-series)
5. Same-Origin for Ad Blocker Bypass
- Using /analytics on the same domain avoids CORS preflight requests
- Same-origin requests are less likely to be blocked by ad blockers
- Browser security features work seamlessly
Prior to v2.4.1, env.js defaulted to ANALYTICS_API_URL = '/api/analytics'. This was corrected to /analytics but some clients cached the old env.js for up to 1 year. As of rproxy v1.0.5, both /analytics and /api/analytics route to penguin-analytics.
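The routing rule described above can be sketched as a longest-prefix lookup, similar to how nginx selects among prefix locations. The actual rproxy configuration is not shown on this page, so the `ROUTES` table and `resolveUpstream` function are assumptions for illustration only.

```javascript
// Assumed route table for rproxy v1.0.5 and later.
const ROUTES = [
  { prefix: '/analytics', upstream: 'penguin-analytics-api' },
  { prefix: '/api/analytics', upstream: 'penguin-analytics-api' }, // legacy path from cached env.js
  { prefix: '/api', upstream: 'eagle-api' }
];

// Returns the upstream service name for a path, or null if nothing matches.
function resolveUpstream(path) {
  const matches = ROUTES.filter(
    (r) => path === r.prefix || path.startsWith(r.prefix + '/')
  );
  if (matches.length === 0) return null;
  // The longest matching prefix wins, so /api/analytics beats /api.
  matches.sort((a, b) => b.prefix.length - a.prefix.length);
  return matches[0].upstream;
}
```

The important detail is precedence: a request to /api/analytics also matches the /api prefix, so the more specific rule has to win for cached clients to keep working.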
| Aspect | /api | /analytics |
|---|---|---|
| Routing | Direct OpenShift route | Through rproxy |
| Authentication | Keycloak JWT required | No authentication |
| Service | eagle-api (Node.js) | penguin-analytics-api (Node.js) |
| Database | MongoDB | TimescaleDB (PostgreSQL) |
| Request Pattern | Read/write CRUD operations | Write-heavy event ingestion |
| Response Time | 100-500ms (database queries) | < 50ms (fire-and-forget) |
| Data Retention | Permanent (project records) | Time-series (compressed after 30 days) |
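The fire-and-forget pattern in the table, combined with the goal that analytics failures must not affect core operations, could look roughly like the wrapper below. It is a minimal sketch: `trackSafely` and the injected `send` transport are hypothetical and do not reflect the actual analytics service code.

```javascript
// `send` is a hypothetical transport function, for example a thin wrapper around fetch.
function trackSafely(send, event) {
  try {
    // Promise.resolve lets the same code handle sync and async transports.
    Promise.resolve(send(event)).catch(() => {
      // Ignore asynchronous failures: a lost event is acceptable, a broken page is not.
    });
  } catch (err) {
    // Ignore synchronous failures for the same reason.
  }
  // Nothing is returned and nothing is awaited, so callers are never blocked.
}
```

Because the caller never awaits the result, a slow or unavailable analytics service costs the user nothing beyond a dropped event.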
Frontend configuration in src/env.js:
window.__env = {
// ... other config
ANALYTICS_API_URL: '/analytics', // Same-origin path
ANALYTICS_TRAFFIC_TRACKING: true, // UTM params, referrer, traffic channel
ANALYTICS_ENHANCED_TRACKING: false // Browser fingerprinting (prod default)
};

Traffic Source Tracking (ANALYTICS_TRAFFIC_TRACKING):
- Captures UTM parameters and referrer on first visit
- Persists in localStorage via @analytics/original-source-plugin
- Determines traffic channel: chatbot, direct, email, internal, referral, search, social, other
- Data sent with Page Viewed events: traffic_channel, traffic_source, traffic_medium, etc.
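To make the channel derivation concrete, here is one way a classifier could map UTM and referrer data onto the channels listed above. The rules and host lists are assumptions for illustration; the real logic lives in the plugin and may differ. `classifyTraffic` and `hostMatches` are hypothetical names.

```javascript
// Hypothetical host lists, not taken from the plugin.
const SEARCH_HOSTS = ['google.com', 'bing.com', 'duckduckgo.com'];
const SOCIAL_HOSTS = ['facebook.com', 'linkedin.com', 'reddit.com'];
const CHATBOT_HOSTS = ['chatgpt.com', 'perplexity.ai'];

// True when host equals a listed domain or is a subdomain of one.
function hostMatches(host, list) {
  return list.some((h) => host === h || host.endsWith('.' + h));
}

function classifyTraffic({ utmMedium, referrer, siteHost }) {
  const medium = (utmMedium || '').toLowerCase();
  // An explicit UTM medium takes priority over the referrer (assumed rule).
  if (medium === 'email') return 'email';
  if (medium === 'social') return 'social';
  if (medium === 'cpc' || medium === 'organic') return 'search';
  if (medium) return 'other'; // unrecognised UTM medium
  if (!referrer) return 'direct';
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch {
    return 'other'; // referrer could not be parsed
  }
  if (host === siteHost) return 'internal';
  if (hostMatches(host, CHATBOT_HOSTS)) return 'chatbot';
  if (hostMatches(host, SEARCH_HOSTS)) return 'search';
  if (hostMatches(host, SOCIAL_HOSTS)) return 'social';
  return 'referral';
}
```

A visit with no UTM parameters and no referrer falls through to direct, while any unlisted external referrer becomes referral.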
Analytics service usage:
// Auto-tracked events:
// - Page Viewed (route changes)
// - Button Clicked (button elements)
// - Link Clicked (anchor links)
// - User Active (30-second heartbeat)
// Manual tracking for custom events:
this.analyticsService.track('Comment Submitted', {
projectId: 'abc123',
commentLength: 250
});

Additional step for user identification after login:
// After Keycloak authentication
this.analyticsService.identify(user.guid, {
username: user.username,
roles: user.roles
});
// On logout
this.analyticsService.reset(); // Tracks "Session Ended" event

Important: eagle-admin uses Keycloak for authentication, but analytics events are still sent to the unauthenticated /analytics endpoint. User context is included in event properties, not as JWT headers.
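The point about user context travelling in the event body rather than in auth headers can be illustrated with a request builder. This is a sketch under assumptions: the `/analytics/track` path, the `user_id` property name, and `buildTrackRequest` itself are hypothetical and not confirmed by the Penguin Analytics API.

```javascript
// Builds the pieces of an analytics request without sending it.
function buildTrackRequest(eventName, properties, user) {
  const props = { ...properties };
  if (user) {
    // User context goes into the event properties (field name is an assumption).
    props.user_id = user.guid;
  }
  return {
    url: '/analytics/track', // hypothetical endpoint under the same-origin /analytics path
    method: 'POST',
    // Deliberately no Authorization header: the endpoint is unauthenticated.
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ event: eventName, properties: props })
  };
}
```

For an anonymous eagle-public visitor, `user` would simply be omitted and the body would carry only the event name and its properties.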
Standard events tracked across EPIC applications:
| Event | Triggered By | Key Properties |
|---|---|---|
| User Identified | Login (eagle-admin only) | traits.username, traits.roles[] |
| Page Viewed | Route navigation | page_name, path, url |
| Button Clicked | Button clicks | button_text, section |
| Link Clicked | Link clicks | link_url, link_text |
| User Active | 30-second heartbeat | is_active, seconds_since_activity |
| Session Ended | Logout (eagle-admin only) | session_end |

Traffic source properties (included in Page Viewed when ANALYTICS_TRAFFIC_TRACKING=true):
| Property | Description | Example |
|---|---|---|
| traffic_channel | Derived channel category | search, social, direct |
| traffic_source | UTM source or referrer | google, facebook |
| traffic_medium | UTM medium | cpc, email, organic |
| traffic_campaign | UTM campaign | spring_sale |
| traffic_content | UTM content | banner_a |
| traffic_term | UTM term | environmental assessment |

For complete event schema documentation, see Penguin Analytics Event Schema.
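As a quick sanity check during development, the standard events table above can be turned into a lookup that reports which key properties an event is missing. Treating the listed key properties as required is an assumption made for this sketch, and `missingProperties` is a hypothetical helper, not part of the analytics service.

```javascript
// Key properties per standard event, copied from the table above.
const EVENT_PROPERTIES = new Map([
  ['User Identified', ['traits']],
  ['Page Viewed', ['page_name', 'path', 'url']],
  ['Button Clicked', ['button_text', 'section']],
  ['Link Clicked', ['link_url', 'link_text']],
  ['User Active', ['is_active', 'seconds_since_activity']],
  ['Session Ended', ['session_end']]
]);

// Returns the names of expected properties that are absent from `properties`.
function missingProperties(eventName, properties) {
  const expected = EVENT_PROPERTIES.get(eventName);
  if (!expected) return []; // custom events are not checked
  return expected.filter((key) => !(key in properties));
}
```

Custom events such as Comment Submitted pass through unchecked, since only the standard events have a documented property list.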
Analytics data is visualized through Metabase dashboards configured per application:
- eagle-admin: Staff usage patterns, feature adoption, admin activity
- eagle-public: Public traffic, popular projects, search trends
Dashboards are defined in YAML configuration files:
- scripts/configs/eagle-admin.yaml
- scripts/configs/eagle-public.yaml
For dashboard configuration patterns and Metabase setup, see Penguin Analytics Metabase Configuration.
EPIC analytics uses a two-tier privacy model that gives operators granular control over data collection:
flowchart TB
subgraph Client["Client Tier (Browser)"]
User[User visits eagle-public]
Config[Fetch /api/config]
Flag1{ANALYTICS_ENHANCED_TRACKING?}
Enhanced[Send full browser context]
Minimal[Send minimal data only]
end
subgraph Server["Server Tier (penguin-analytics)"]
Receive[Receive event]
Flag2{GEO_ENRICHMENT_ENABLED?}
HasEnhanced{Has screen_width?}
Enrich[Enrich with country/city/ISP]
Store[Store in TimescaleDB]
end
User --> Config
Config --> Flag1
Flag1 -->|true| Enhanced
Flag1 -->|false| Minimal
Enhanced --> Receive
Minimal --> Receive
Receive --> Flag2
Flag2 -->|true| HasEnhanced
Flag2 -->|false| Store
HasEnhanced -->|yes| Enrich
HasEnhanced -->|no| Store
Enrich --> Store
style Client fill:#e3f2fd
style Server fill:#e8f5e9
| Tier | Flag | Service | Controls |
|---|---|---|---|
| Client | ANALYTICS_TRAFFIC_TRACKING | eagle-public | UTM params, referrer, traffic channel |
| Client | ANALYTICS_ENHANCED_TRACKING | eagle-api | Browser fingerprinting: screen, device, network, timezone |
| Server | GEO_ENRICHMENT_ENABLED | penguin-analytics | IP geolocation: country, city, ISP, ASN |

| Environment | ANALYTICS_TRAFFIC_TRACKING | ANALYTICS_ENHANCED_TRACKING | GEO_ENRICHMENT_ENABLED |
|---|---|---|---|
| Dev | true | true | true |
| Test | true | true | true |
| Prod | true | false | false |

Important: Production defaults are privacy-safe. Enabling enhanced tracking requires explicit approval.
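The two decision points in the flowchart can be sketched as two small functions, one per tier. This is an illustration only: `clientContext` and `shouldGeoEnrich` are hypothetical names, and the assumption that minimal mode keeps just a path-only URL and the page title is not a confirmed contract.

```javascript
// Client tier: decide which context fields leave the browser.
function clientContext(fullContext, enhancedTracking) {
  if (enhancedTracking) return { ...fullContext };
  // Minimal mode (assumed): strip the origin from the URL and drop everything but the title.
  return {
    url: new URL(fullContext.url).pathname,
    title: fullContext.title
  };
}

// Server tier: enrich only when the flag is on AND the event carries
// enhanced data, using screen_width as the marker shown in the flowchart.
function shouldGeoEnrich(event, geoEnrichmentEnabled) {
  return geoEnrichmentEnabled === true && 'screen_width' in event;
}
```

One consequence of the second check is that events sent in minimal mode are never geo-enriched, even in an environment where GEO_ENRICHMENT_ENABLED is true.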
When ANALYTICS_ENHANCED_TRACKING=true (client-side):
{
"url": "https://projects.eao.gov.bc.ca/p/abc123",
"title": "Project Name",
"referrer": "https://google.com",
"screen_width": 3440, "screen_height": 1440,
"viewport_width": 1720, "viewport_height": 900,
"pixel_ratio": 2, "color_depth": 24,
"platform": "Linux", "browser": "Brave", "browser_version": "120",
"mobile": false, "touch_points": 0,
"timezone": "America/Vancouver",
"language": "en-CA",
"connection_type": "4g", "connection_downlink": 10
}

When GEO_ENRICHMENT_ENABLED=true (server-side, added by penguin-analytics):
{
"country": "CA", "country_name": "Canada",
"region": "BC", "city": "Victoria",
"isp": "TELUS Communications Inc.", "asn": 852,
"client_ip_hash": "a1b2c3d4..."
}

Privacy mode (ANALYTICS_ENHANCED_TRACKING=false):
{
"url": "/p/abc123",
"title": "Project Name"
}

penguin-analytics uses MaxMind GeoLite2 databases for IP geolocation:
Architecture: InitContainer downloads databases on pod startup
- Databases: GeoLite2-City.mmdb (54MB) + GeoLite2-ASN.mmdb (11MB)
- Startup: ~60 seconds to download and extract
- Updates: Automatic monthly (1st of month at 3am UTC)
- Privacy: Raw IPs are hashed; private IPs (e.g. 192.168.x.x) are skipped
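The rule for skipping private addresses could be implemented with a check against the RFC 1918 ranges plus loopback. This is a sketch, not the penguin-analytics implementation: `isPrivateIPv4` is a hypothetical name, IPv6 is not handled, and the hashing step is omitted.

```javascript
// Returns true for IPv4 addresses that should be skipped before geolocation.
function isPrivateIPv4(ip) {
  const parts = ip.split('.');
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) return false; // not a dotted-quad IPv4 address
  const a = Number(parts[0]);
  const b = Number(parts[1]);
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    a === 127                            // loopback, 127.0.0.0/8
  );
}
```

Addresses that pass this check carry no useful location information, so skipping the lookup avoids storing meaningless geo fields.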
Database Update Workflow:
# Manual trigger
gh workflow run update-geoip-databases.yml -f environment=dev --repo digitalspace/penguin-analytics
# Or restart pods to download fresh databases
oc rollout restart deployment/penguin-analytics-api -n 6cdc9e-dev

| Application | ANALYTICS_ENHANCED_TRACKING | Behavior |
|---|---|---|
| eagle-public | Respected | Follows flag - privacy mode in prod |
| eagle-admin | Ignored | Always sends full context (staff app) |
eagle-admin always sends full browser context because it's an authenticated staff application where usage tracking is expected.
Penguin Analytics is deployed as a separate OpenShift application in the same namespace as EPIC services:
Pods:
- penguin-analytics-api (2 replicas, with GeoIP initContainer)
- penguin-analytics-database (TimescaleDB)
- penguin-analytics-metabase (Metabase)
Routes:
- /analytics → penguin-analytics-api (via rproxy)
- Metabase is accessible at a dedicated route for authorized users
For deployment procedures, see Penguin Analytics Deployment.
- API Architecture - EPIC platform architecture
- Configuration Management - ConfigService pattern
- Local Development - Local development setup
- Penguin Analytics Wiki - Complete analytics documentation