Typesense Search - bcgov/eagle-dev-guides GitHub Wiki

Typesense Search

Full-text search for eagle-public powered by Typesense. Two indexes are maintained:

| Index | Collection alias | Frontend component |
|---|---|---|
| Projects | projects | TypesenseProjectSearchComponent |
| Documents | documents | TypesenseDocumentSearchComponent |

Architecture

flowchart TB
    subgraph eagle-api["eagle-api (OpenShift)"]
        MongoDB[("MongoDB\nepic collection")]
        FullSync["typesense-sync/full-sync.js\nNightly full rebuild"]
        ChangeStream["typesense-sync/index.js\nChange stream listener"]
        Config["typesense-sync/config.js\nShared buildMongoUri()"]
    end

    subgraph Typesense["typesense (Helm chart)"]
        TSNode["Typesense node\nPort 8108"]
    end

    subgraph eagle-public["eagle-public (browser)"]
        ProjWrapper["ProjectListWrapperComponent\nHealth check + fallback"]
        DocWrapper["SearchWrapperComponent\nHealth check + fallback"]
        ProjSearch["TypesenseProjectSearchComponent"]
        DocSearch["TypesenseDocumentSearchComponent"]
        Service["TypesenseService\nShared client + stale cache"]
    end

    MongoDB --> FullSync
    MongoDB --> ChangeStream
    FullSync -->|projects + documents| TSNode
    ChangeStream -->|projects + documents| TSNode
    ProjWrapper -->|GET /search-api/health| TSNode
    DocWrapper -->|GET /search-api/health| TSNode
    ProjSearch --> Service
    DocSearch --> Service
    Service -->|search requests| TSNode

How Records Are Kept in Sync

Nightly Full Sync (typesense-sync/full-sync.js)

Runs as a Kubernetes CronJob in the same namespace. Rebuilds the Typesense projects collection from scratch each night:

  1. Creates a new collection with a timestamped name (e.g. projects_20260407_0200)
  2. Streams all Project documents from MongoDB
  3. Transforms each document via transform.js (flattens nested fields, converts centroid)
  4. Switches the projects alias to the new collection atomically
  5. Drops the old collection

Why full sync: Avoids managing partial updates for schema migrations or bulk field changes. The alias swap is zero-downtime.
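
The rebuild-then-swap sequence above can be sketched against an in-memory stand-in for the search server. Everything here is illustrative: FakeSearchServer, timestampedName and rebuildAndSwap are hypothetical names, not the functions in full-sync.js, and the name format (UTC, minute precision) is only inferred from the projects_20260407_0200 example.

```typescript
// In-memory stand-in for a search server (hypothetical; not the Typesense client API).
class FakeSearchServer {
  collections = new Map<string, object[]>();
  aliases = new Map<string, string>();
}

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds a name such as "projects_20260407_0200" (using UTC is an assumption).
function timestampedName(alias: string, now: Date): string {
  const day = `${now.getUTCFullYear()}${pad2(now.getUTCMonth() + 1)}${pad2(now.getUTCDate())}`;
  const time = `${pad2(now.getUTCHours())}${pad2(now.getUTCMinutes())}`;
  return `${alias}_${day}_${time}`;
}

function rebuildAndSwap(server: FakeSearchServer, alias: string, records: object[], now: Date): string {
  const fresh = timestampedName(alias, now);
  server.collections.set(fresh, records.slice()); // steps 1-3: create and fill the new collection
  const previous = server.aliases.get(alias);
  server.aliases.set(alias, fresh);                // step 4: repoint the alias
  if (previous !== undefined && previous !== fresh) {
    server.collections.delete(previous);           // step 5: drop the old collection
  }
  return fresh;
}
```

Because readers only ever query the alias, they see either the old collection or the complete new one, never a half-built index.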

Real-Time Change Stream (typesense-sync/index.js)

Runs as a Deployment (always-on). Consumes a MongoDB replica set change stream:

  • insert → upsert document in Typesense
  • update / replace → upsert transformed document
  • delete → remove from Typesense
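
The mapping above can be expressed as a small pure function. This is a sketch, not the code in index.js: MongoChange, SyncAction and planSyncAction are hypothetical names, and _id is simplified to a string. The "skip" branch reflects standard MongoDB behaviour: update events only carry fullDocument when the stream is opened with fullDocument: "updateLookup".

```typescript
// Subset of the MongoDB change event fields used by this sketch.
interface MongoChange {
  operationType: "insert" | "update" | "replace" | "delete";
  documentKey: { _id: string };
  fullDocument?: Record<string, unknown>;
}

type SyncAction =
  | { kind: "upsert"; id: string; doc: Record<string, unknown> }
  | { kind: "delete"; id: string }
  | { kind: "skip"; id: string };

// Decides what to do in Typesense for one change event (hypothetical helper).
function planSyncAction(
  change: MongoChange,
  transform: (doc: Record<string, unknown>) => Record<string, unknown>
): SyncAction {
  const id = change.documentKey._id;
  if (change.operationType === "delete") return { kind: "delete", id };
  // Without the full document there is nothing safe to index; skip the event.
  if (!change.fullDocument) return { kind: "skip", id };
  return { kind: "upsert", id, doc: transform(change.fullDocument) };
}
```

Keeping the decision separate from the network call makes the listener easy to unit-test without a live replica set.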

Requires MongoDB in replica set mode (enforced by the eagle-api Helm chart's mongodb-deployment.yaml).

Shared Config (typesense-sync/config.js)

buildMongoUri() is the single source of truth for constructing the MongoDB connection string. Both full-sync.js and index.js import from this module.

| Environment Variable | Default | Description |
|---|---|---|
| MONGODB_USERNAME | | MongoDB user |
| MONGODB_PASSWORD | | MongoDB password |
| MONGODB_DATABASE | epic | Database name |
| MONGODB_HOST | localhost | Hostname |
| MONGODB_PORT | 27017 | Port |
| MONGODB_AUTHSOURCE | admin | Auth source database |
| MONGODB_DIRECT | | Set true for directConnection (port-forward) |
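
To make the table concrete, here is a minimal sketch of how such a URI could be assembled from these variables. buildMongoUriSketch is a hypothetical re-implementation for illustration; the real buildMongoUri() in config.js may differ (for example in whether authSource is appended when no credentials are set).

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical re-implementation; defaults are taken from the table above.
function buildMongoUriSketch(env: Env): string {
  const host = env["MONGODB_HOST"] || "localhost";
  const port = env["MONGODB_PORT"] || "27017";
  const database = env["MONGODB_DATABASE"] || "epic";
  const authSource = env["MONGODB_AUTHSOURCE"] || "admin";
  const user = env["MONGODB_USERNAME"];
  const password = env["MONGODB_PASSWORD"];

  // Percent-encode credentials so characters like "@" or ":" cannot break the URI.
  const credentials =
    user && password ? `${encodeURIComponent(user)}:${encodeURIComponent(password)}@` : "";

  const options = [`authSource=${authSource}`];
  // directConnection=true is a standard MongoDB URI option, useful through a port-forward.
  if (env["MONGODB_DIRECT"] === "true") options.push("directConnection=true");

  return `mongodb://${credentials}${host}:${port}/${database}?${options.join("&")}`;
}
```

Passing the environment in as a parameter (rather than reading a global) keeps the function testable.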

Document Transform (typesense-sync/transform.js)

Project transform (transformProject)

transformProject(doc, listLookup) converts a MongoDB Project document to a flat Typesense record. Key transformations:

  • Reads from the legislation-year sub-object (doc.legislation_2018 or doc.legislation_2002)
  • centroid extracted via parseCentroid(c) — validates BC-bounds coordinates
  • ObjectId refs (e.g. currentPhaseName, eacDecision) resolved to display names via listLookup
  • Dates stored as Unix timestamps (int64) for numeric range filtering

Document transform (transformDocument)

transformDocument(doc, listLookup, projectLookup) converts a MongoDB Document record:

  • ObjectId refs (type, milestone, documentAuthorType, projectPhase) resolved to display names via listLookup
  • legislation (int32) stored as integer year (2002, 2018, etc.) for faceting
  • project ObjectId resolved to projectName via projectLookup
  • Unresolvable ObjectId refs (deleted List items) are suppressed — regex /^[0-9a-f]{24}$/i detects orphaned IDs and returns undefined instead of indexing raw hex strings
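
The orphaned-reference rule in the last bullet can be sketched as follows. resolveRef is a hypothetical stand-in for the resolve() helper in transform.js; only the regex is taken from the text above, and the pass-through of non-hex values is an assumption.

```typescript
const OBJECT_ID_PATTERN = /^[0-9a-f]{24}$/i;

// Returns a display name, the original label, or undefined for orphaned ids.
function resolveRef(value: string | undefined, listLookup: Map<string, string>): string | undefined {
  if (value === undefined || value === "") return undefined;
  const display = listLookup.get(value);
  if (display !== undefined) return display;
  // Not in the lookup and shaped like an ObjectId: treat as orphaned and drop it.
  if (OBJECT_ID_PATTERN.test(value)) return undefined;
  // Assumption: anything else is already a human-readable label.
  return value;
}
```

Returning undefined lets the caller omit the field entirely, so the raw hex string never becomes a facet value.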

List Lookup (buildListLookup)

const listLookup = await buildListLookup(db);
// → Map<string, string>  (ObjectId hex → display name)
// Built from all List + Organization documents in the epic collection

Both transforms accept listLookup as a parameter. The change-stream listener rebuilds it whenever a List or Organization document changes.

Safety guard: full-sync.js aborts if listLookup.size < MIN_LOOKUP_SIZE (currently 50). This prevents overwriting good data with raw IDs if MongoDB connectivity is degraded.

Project Lookup (buildProjectLookup)

const projectLookup = await buildProjectLookup(db);
// → Map<string, string>  (project ObjectId hex → project name)
// Used only by transformDocument to populate projectName

parseCentroid(c) is a named function (extracted from an IIFE in the original) for testability:

function parseCentroid(c) {
  if (!Array.isArray(c) || c.length < 2) return {};
  const lng = parseFloat(c[0]);
  const lat = parseFloat(c[1]);
  if (isNaN(lng) || isNaN(lat) || lat < 48 || lat > 60 || lng < -139 || lng > -114) return {};
  return { centroid: [lng, lat] };
}

Helm Deployment

Typesense is deployed as a separate Helm chart in helm/typesense/. It runs in the same OpenShift namespace as eagle-api.

# Deploy / upgrade Typesense
helm upgrade --install typesense ./helm/typesense \
  -f ./helm/typesense/values-{env}.yaml \
  --set secrets.apiKey=<key>

# Deploy eagle-api with Typesense env vars
helm upgrade --install eagle-api ./helm/eagle-api \
  -f ./helm/eagle-api/values-{env}.yaml \
  --set secrets.typesenseApiKey=<key>

The typesense-sync CronJob and Deployment are included in the helm/eagle-api chart and share the same secrets.

eagle-public Integration

Health Check and Fallback

Both ProjectListWrapperComponent (projects) and SearchWrapperComponent (documents) share the same pattern:

  1. Check TYPESENSE_ENABLED config flag and TYPESENSE_SEARCH_KEY
  2. GET /search-api/health with a 3 s timeout via firstValueFrom(...pipe(timeout(3000)))
  3. If healthy → render Typesense component; if not → fallback to legacy component

Both use a static cachedResult so only one health check fires per page session (survives component destroy/recreate, cleared only on page reload).

TypesenseService — Shared Client + Cache

TypesenseService (src/app/services/typesense.service.ts) is a singleton (providedIn: 'root') that:

  1. Caches the Typesense HTTP connection pool — prevents cold-start per navigation
  2. Stores the last search results per index alias for stale-while-revalidate
  3. Stores the last facet items per (index, attribute) pair
  4. Provides direct REST methods for data fetching outside of InstantSearch widgets

// Retrieve stale hits on remount (shown instantly before fresh search arrives)
const staleHits = this.typesense.getLastHits('documents');
this.hits.set(staleHits);

// After fresh search: update cache
this.typesense.setLastHits('documents', newHits);

// Facets
const cached = this.typesense.getLastFacets('documents', 'type');
// After refinement list fires:
this.typesense.setLastFacets('documents', 'type', items);
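
For readers who want to see what sits behind those four calls, here is one possible shape for the cache. StaleSearchCache is a hypothetical class, not the actual TypesenseService internals; the method names mirror the calls above, and keying facets by "index:attribute" is an assumption.

```typescript
// Hypothetical cache shape; the real TypesenseService may store this differently.
class StaleSearchCache<Hit, FacetItem> {
  private hits = new Map<string, Hit[]>();
  private facets = new Map<string, FacetItem[]>();

  getLastHits(index: string): Hit[] {
    return this.hits.get(index) || [];
  }

  setLastHits(index: string, hits: Hit[]): void {
    this.hits.set(index, hits);
  }

  getLastFacets(index: string, attribute: string): FacetItem[] {
    return this.facets.get(`${index}:${attribute}`) || [];
  }

  setLastFacets(index: string, attribute: string, items: FacetItem[]): void {
    this.facets.set(`${index}:${attribute}`, items);
  }
}
```

Returning an empty array for a cold cache means components can render the cached value unconditionally on mount.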

TypesenseService REST Methods

In addition to the InstantSearch adapter (used for the unified search and document search widgets), TypesenseService exposes direct REST methods for data that doesn't need faceting or pagination widgets. These use Angular's HttpClient via a private searchCollection() helper rather than the typesense-instantsearch-adapter.

| Method | Collection | Used by | Notes |
|---|---|---|---|
| getTopActivities(n) | activities | HomeComponent | Home page recent activities |
| getAllProjects() | projects | ProjectsComponent | Full project list for map; paginates at 250/page |
| getFeaturedDocuments(projId) | documents | (removed; MongoDB used instead) | MongoDB is authoritative for isFeatured |
| getProjectActivities(projId, page, size, sort, kw) | activities | ProjectActivitesComponent | Activities tab with keyword search |
| getProjectSuggestions(query) | projects | ProjlistFiltersComponent | Autocomplete dropdown on map filter |

When to use REST methods vs InstantSearch:

  • REST methods: Fetching data that renders as a table or list with no faceting widgets (e.g. project activities, home page feed)
  • InstantSearch adapter: Faceted search with refinement lists, pagination widgets, and highlight rendering (unified search, document search)

searchCollection private helper:

// All REST methods delegate to this — builds the Typesense URL from config
private searchCollection(collection: string, params: Record<string, string>): Observable<any> {
  const config = this.configService.config();
  const apiKey = config.TYPESENSE_SEARCH_KEY || '';
  const baseUrl = this.buildSearchUrl(); // handles path-only vs absolute URL
  return this.http.get<any>(
    `${baseUrl}/collections/${collection}/documents/search?${new URLSearchParams(params)}`,
    { headers: { 'X-TYPESENSE-API-KEY': apiKey } }
  );
}

Date fields: Typesense stores dates as Unix seconds (int64). Multiply by 1000 before passing to Angular's DatePipe or new Date().
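
A minimal pair of helpers for that conversion; the function names are hypothetical and not part of TypesenseService.

```typescript
// Typesense int64 date fields hold seconds; JavaScript Date expects milliseconds.
function typesenseSecondsToDate(seconds: number): Date {
  return new Date(seconds * 1000);
}

// Inverse conversion, e.g. for building numeric range filters.
function dateToTypesenseSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}
```

Forgetting the factor of 1000 is easy to spot in the UI: every date renders as January 1970.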

Project Name Autocomplete

getProjectSuggestions(query) drives the autocomplete dropdown on the map/list filter input. It fetches up to 250 matching project IDs for accurate marker filtering, but the dropdown shows the top 5. The matched IDs are stored in FilterStateService via updateSuggestionIds() so ProjectFilterService can filter map markers using Typesense's fuzzy match rather than a simple substring comparison.

// ProjlistFiltersComponent
this.search$.pipe(
  debounceTime(120),
  distinctUntilChanged(),
  switchMap(q => this.typesenseService.getProjectSuggestions(q).pipe(catchError(() => of([])))),
).subscribe(results => {
  this.suggestions.set(results);
  this.filterState.updateSuggestionIds(results.map(r => r.id));
});

FilterStateService.typesenseSuggestionIdsFilter() returns:

  • null — no active text search (substring fallback in use)
  • string[] — Typesense results; ProjectFilterService checks ids.includes(project._id)
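
The two cases above can be sketched as a single filter function. ProjectLike and filterByText are hypothetical names; the real ProjectFilterService may structure this differently, and the use of a Set instead of ids.includes() is just a lookup optimisation.

```typescript
interface ProjectLike {
  _id: string;
  name: string;
}

// suggestionIds is null when no Typesense result is active (substring fallback).
function filterByText(projects: ProjectLike[], suggestionIds: string[] | null, text: string): ProjectLike[] {
  if (suggestionIds === null) {
    const needle = text.trim().toLowerCase();
    if (needle === "") return projects;
    return projects.filter(p => p.name.toLowerCase().indexOf(needle) !== -1);
  }
  // Typesense already did the (fuzzy) matching; keep only the ids it returned.
  const allowed = new Set(suggestionIds);
  return projects.filter(p => allowed.has(p._id));
}
```

Note that an empty array and null mean different things: an empty array is "Typesense found nothing", so no markers are shown.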

Config-Driven Facet Pattern

Both document and project search use a FACET_DEFS array to drive all facet panels without duplication:

interface FacetDef {
  attribute: string;   // Typesense field name
  heading: string;     // Sidebar panel heading
  listType: string;    // MongoDB List._schemaName type for legislation grouping
  sorter: (a: DisplayItem, b: DisplayItem) => number;
}

const FACET_DEFS: FacetDef[] = [
  { attribute: 'type',               heading: 'Type',          listType: 'doctype',      sorter: sortByName },
  { attribute: 'milestone',          heading: 'Milestone',     listType: 'label',        sorter: sortByName },
  { attribute: 'documentAuthorType', heading: 'Author Type',   listType: 'author',       sorter: sortByName },
  { attribute: 'projectPhase',       heading: 'Project Phase', listType: 'projectPhase', sorter: sortByPhaseOrder },
];

Each FacetDef drives:

  • A WritableSignal<DisplayItem[]> for items
  • A WritableSignal<Map<string, number>> for legislation year lookups
  • A Signal<LegislationGroup[]> computed that calls groupByLegislation(..., f.sorter)
  • A connectRefinementList widget registration
  • A single @for (facet of facetDefs; ...) loop in the template

Legislation Year Grouping

Facet options are grouped by the legislation field on the corresponding MongoDB List item:

2002 Act Terms
  ├── Application Review
  ├── Evaluation
  └── ...
2018 Act Terms
  ├── Application Development and Review
  └── ...
(ungrouped — year 0 or unresolved)

groupByLegislation(items, lookup, sorter) buckets items by year using a Map<string, number> lookup built from configService.lists. Groups render in LEG_ORDER = [2002, 2018, 1996] order.
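
A sketch of the bucketing logic, under stated assumptions: the DisplayItem and LegislationGroup field names are invented for illustration, the lookup is keyed by item label, and any year missing from LEG_ORDER is folded into the ungrouped bucket (year 0). The actual groupByLegislation may differ on all three points.

```typescript
interface DisplayItem { label: string; count: number; }
interface LegislationGroup { year: number; items: DisplayItem[]; }

const LEG_ORDER = [2002, 2018, 1996];

// Hypothetical implementation of the grouping described above.
function groupByLegislationSketch(
  items: DisplayItem[],
  yearByLabel: Map<string, number>,
  sorter: (a: DisplayItem, b: DisplayItem) => number
): LegislationGroup[] {
  const buckets = new Map<number, DisplayItem[]>();
  for (const item of items) {
    const raw = yearByLabel.get(item.label) || 0;
    const year = LEG_ORDER.indexOf(raw) !== -1 ? raw : 0; // unknown years become ungrouped
    const bucket = buckets.get(year) || [];
    bucket.push(item);
    buckets.set(year, bucket);
  }
  const groups: LegislationGroup[] = [];
  // Known years first in LEG_ORDER, then the ungrouped bucket last.
  for (const year of LEG_ORDER.concat([0])) {
    const bucket = buckets.get(year);
    if (bucket) groups.push({ year, items: bucket.slice().sort(sorter) });
  }
  return groups;
}
```

Empty groups are omitted, so a facet with only 2018 terms renders a single heading.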

Project Phase Sort Order

sortByPhaseOrder sorts phases by canonical PHASE_ORDER (2002 Act phases first, then 2018), matching the mongo list listOrder field. Unknown phases fall back to alphabetical at the end. The same PHASE_ORDER array is defined in both TypesenseProjectSearchComponent and TypesenseDocumentSearchComponent.
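
The comparator behaviour described above (canonical order first, unknown phases alphabetically at the end) can be sketched as a factory that takes the order as a parameter. makePhaseSorter and Labeled are hypothetical names; the real PHASE_ORDER contents are not reproduced here.

```typescript
interface Labeled { label: string; }

// Builds a comparator from a canonical order (hypothetical helper).
function makePhaseSorter(order: string[]): (a: Labeled, b: Labeled) => number {
  const rank = new Map<string, number>();
  order.forEach((label, i) => rank.set(label, i));
  return (a, b) => {
    const ra = rank.get(a.label);
    const rb = rank.get(b.label);
    if (ra !== undefined && rb !== undefined) return ra - rb;
    if (ra !== undefined) return -1; // known phases sort before unknown ones
    if (rb !== undefined) return 1;
    return a.label.localeCompare(b.label); // both unknown: alphabetical
  };
}
```

A factory like this could live in one shared module and be imported by both components, which would remove the need to keep two copies of PHASE_ORDER in step.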

Stale-While-Revalidate in Refinement Lists

When connectRefinementList fires with an empty items array on initial mount (before the first search response), skip the update if the master map already has cached data:

connectRefinementList(renderOptions => {
  this.refineFns[f.attribute] = renderOptions.refine;
  if (renderOptions.items.length === 0 && this.facetMasters[f.attribute].size > 0) return;
  this.zone.run(() => {
    this.facetItems[f.attribute].set(mergeItems(this.facetMasters[f.attribute], renderOptions.items, f.sorter));
    this.typesense.setLastFacets(INDEX_NAME, f.attribute, renderOptions.items);
  });
})({ attribute: f.attribute, operator: 'or', limit: 100 })

Configuration

| Variable | Set In | Description |
|---|---|---|
| TYPESENSE_SEARCH_HOST | /api/config | URL/path for browser → Typesense. Use /search-api in production (routed through rproxy). Use http://localhost:8108 for local dev. |
| TYPESENSE_SEARCH_KEY | /api/config | Read-only search API key (safe to expose to browser) |
| TYPESENSE_API_KEY | eagle-api env | Admin API key for typesense-sync writes |
| TYPESENSE_HOST | eagle-api env | Internal Kubernetes hostname for typesense-sync |
| TYPESENSE_PORT | eagle-api env | Internal port (default 8108) |

Local Development

Typesense does not run in the local docker-compose.yml — it lives only on the dev cluster. Use oc port-forward to reach it locally:

# Forward dev cluster Typesense to localhost:8108
oc port-forward svc/typesense-typesense 8108:8108 -n 6cdc9e-dev

# proxy.conf.js routes /search-api → localhost:8108 (already configured)

Do not route /search-api through the dev cluster's rproxy — the rproxy enforces HTTP Basic Auth on all locations, which causes a browser login dialog.

Troubleshooting

Typesense health check returns 401

The /search-api proxy target is pointed at the dev cluster rproxy instead of a direct port-forward. Set TYPESENSE_SEARCH_HOST to /search-api in production (rproxy has no basic auth there) or use a port-forward in local dev.

Facet items show raw ObjectIds (e.g. 5e1678b5671c58bddc5cdb57)

The indexed document has a type/milestone/projectPhase/documentAuthorType value pointing to a deleted List item (orphaned reference). The resolve() function in transform.js suppresses any 24-character hex string not found in listLookup — raw IDs indicate the lookup was stale or the document was indexed before the fix.

To repair:

  1. Verify listLookup is healthy: look for "List lookup loaded: N entries" in the full-sync log; N should be ≥ 50
  2. Re-index the affected documents or run a full sync

Filter facets blank on back-navigation

TypesenseService.lastFacetsCache is populated per navigation. If blank, the port-forward may have died. Restart with oc port-forward svc/typesense-typesense 8108:8108 -n 6cdc9e-dev.

Change stream not receiving updates

The change stream requires a MongoDB replica set. Verify the replica set is initialized:

oc exec deploy/eagle-api -n 6cdc9e-{env} -- mongo --eval "rs.status()"

Force a full re-index

# Trigger the CronJob manually
oc create job typesense-fullsync-manual \
  --from=cronjob/typesense-full-sync \
  -n 6cdc9e-{env}

# Watch progress
oc logs -f job/typesense-fullsync-manual -n 6cdc9e-{env}

Search Popularity Ranking

Search results can be ranked by a 30-day rolling popularity score derived from real user interactions tracked via penguin-analytics. This keeps stale or historic documents from permanently outranking newer content.

How It Works

flowchart LR
    Browser["eagle-public\nbrowser"] -- "Search Result Clicked\nSearch Download Clicked" --> Penguin["penguin-analytics\nTimescaleDB"]
    Penguin -- "nightly SQL query\n(30-day window)" --> Script["popularity-sync.js\n3 AM CronJob"]
    Script -- "PATCH popularity field\naction: update" --> Typesense["Typesense\nprojects / documents"]
    Typesense -- "sort_by: popularity:desc" --> Browser

  1. eagle-public sends Search Result Clicked and Search Download Clicked events to penguin-analytics via the existing getanalytics.io pipeline.
  2. popularity-sync.js runs nightly at 3 AM (one hour after the full re-index at 2 AM).
  3. It queries penguin for the past 30 days of events and computes a weighted score per document/project.
  4. It batch-patches the live Typesense collections using action: update — only the popularity field is changed; all other fields remain untouched.

Scoring Weights

| Event | Weight | Rationale |
|---|---|---|
| Search Download Clicked | 3 | Strong intent: user committed to downloading |
| Search Result Clicked (project or document) | 1 | Mild interest |

Scores are the sum of weighted events over the 30-day window. A document downloaded 10 times and clicked 5 times gets popularity = 35.
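
The arithmetic can be sketched as follows. The weights come from the table above; EventCount and computePopularity are hypothetical names, and the row shape (one aggregated count per target and event) is an assumption about what the SQL query returns.

```typescript
type SearchEventName = "Search Download Clicked" | "Search Result Clicked";

const EVENT_WEIGHTS: Record<SearchEventName, number> = {
  "Search Download Clicked": 3,
  "Search Result Clicked": 1,
};

interface EventCount { targetId: string; event: SearchEventName; count: number; }

// Sums weighted event counts per target id.
function computePopularity(rows: EventCount[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const row of rows) {
    const previous = scores.get(row.targetId) || 0;
    scores.set(row.targetId, previous + EVENT_WEIGHTS[row.event] * row.count);
  }
  return scores;
}
```

For the example in the text, 10 downloads and 5 clicks give 10 × 3 + 5 × 1 = 35.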

Using Popularity in Sort

Add popularity:desc to the sort_by parameter in a Typesense search request to rank popular results higher:

// In TypesenseService or the search widget params
sort_by: 'popularity:desc,_text_match:desc,updatedDate:desc'

The field is int32, optional — documents with no popularity data sort to the bottom (treated as null < 0), so new content is not penalised.

Deployment

Prerequisites:

  • penguin-analytics must be deployed in the same OpenShift namespace (or accessible via NetworkPolicy)
  • A penguin-analytics-db secret must exist with the DB credentials

# Create the secret (per environment)
oc create secret generic penguin-analytics-db \
  --from-literal=PENGUIN_DB_HOST=penguin-analytics-database \
  --from-literal=PENGUIN_DB_PORT=5432 \
  --from-literal=PENGUIN_DB_NAME=analytics \
  --from-literal=PENGUIN_DB_USER=analytics_user \
  --from-literal=PENGUIN_DB_PASSWORD=<password> \
  -n 6cdc9e-{env}

Enable in Helm values:

# helm/typesense/values-{env}.yaml
popularity:
  enabled: true
  windowDays: 30   # Tune as needed

Run manually:

# Trigger the CronJob manually
oc create job typesense-popularity-manual \
  --from=cronjob/typesense-popularity \
  -n 6cdc9e-{env}

# Watch progress
oc logs -f job/typesense-popularity-manual -n 6cdc9e-{env}

Expected output:

Starting popularity sync: 2026-04-20T03:00:01.234Z
  Window: 30 days
Querying penguin-analytics for click scores...
  Found 142 scored documents/projects
  Projects to update: 38
  Documents to update: 104
Patching "projects"... 38 patched, 0 failed
Patching "documents"... 104 patched, 0 failed
Popularity sync complete: 2026-04-20T03:00:04.891Z

Tuning

| Variable | Default | Description |
|---|---|---|
| POPULARITY_WINDOW_DAYS | 30 | Rolling window for score aggregation |
| POPULARITY_BATCH_SIZE | 100 | Documents per Typesense update request |
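
To illustrate what the batch size controls, here is a sketch that splits patches into request bodies. It assumes the job sends JSONL (one JSON object per line) to Typesense's documents/import endpoint with action=update; whether popularity-sync.js actually does this, or uses a client library, is not confirmed here. PopularityPatch and toJsonlBatches are hypothetical names.

```typescript
interface PopularityPatch { id: string; popularity: number; }

// Splits patches into batches and renders each batch as a JSONL request body.
function toJsonlBatches(patches: PopularityPatch[], batchSize: number): string[] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const bodies: string[] = [];
  for (let i = 0; i < patches.length; i += batchSize) {
    const batch = patches.slice(i, i + batchSize);
    bodies.push(batch.map(p => JSON.stringify(p)).join("\n"));
  }
  return bodies;
}
```

With the default of 100, the 104 document patches in the sample output above would go out as two requests.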

Troubleshooting

No popularity data found: Expected on first run. Events accumulate in penguin over time; the first meaningful scores appear after 24–48 hours of real traffic.

Job fails with connection error: Verify the penguin-analytics-db secret exists and the penguin-analytics-database service is reachable from the 6cdc9e-{env} namespace. Check that NetworkPolicy allows egress to the penguin namespace.

Popularity field missing after re-index: The full re-index (2 AM) creates a fresh collection with no popularity data. The popularity job (3 AM) repopulates it. There is a ~1 hour window each night where scores are absent. If this matters, narrow the gap between the two schedules, for example by moving reindex.schedule later or the popularity job earlier, while leaving enough time for the re-index to finish.

Force re-patch without re-indexing:

oc create job typesense-popularity-manual \
  --from=cronjob/typesense-popularity \
  -n 6cdc9e-{env}