Typesense Search - bcgov/eagle-dev-guides GitHub Wiki
Full-text search for eagle-public powered by Typesense. Two indexes are maintained:
| Index | Collection alias | Frontend component |
|---|---|---|
| Projects | projects | TypesenseProjectSearchComponent |
| Documents | documents | TypesenseDocumentSearchComponent |
flowchart TB
subgraph eagle-api["eagle-api (OpenShift)"]
MongoDB[("MongoDB\nepic collection")]
FullSync["typesense-sync/full-sync.js\nNightly full rebuild"]
ChangeStream["typesense-sync/index.js\nChange stream listener"]
Config["typesense-sync/config.js\nShared buildMongoUri()"]
end
subgraph Typesense["typesense (Helm chart)"]
TSNode["Typesense node\nPort 8108"]
end
subgraph eagle-public["eagle-public (browser)"]
ProjWrapper["ProjectListWrapperComponent\nHealth check + fallback"]
DocWrapper["SearchWrapperComponent\nHealth check + fallback"]
ProjSearch["TypesenseProjectSearchComponent"]
DocSearch["TypesenseDocumentSearchComponent"]
Service["TypesenseService\nShared client + stale cache"]
end
MongoDB --> FullSync
MongoDB --> ChangeStream
FullSync -->|projects + documents| TSNode
ChangeStream -->|projects + documents| TSNode
ProjWrapper -->|GET /search-api/health| TSNode
DocWrapper -->|GET /search-api/health| TSNode
ProjSearch --> Service
DocSearch --> Service
Service -->|search requests| TSNode
typesense-sync/full-sync.js runs as a Kubernetes CronJob in the same namespace. It rebuilds the Typesense projects collection from scratch each night:
- Creates a new collection with a timestamped name (e.g. projects_20260407_0200)
- Streams all Project documents from MongoDB
- Transforms each document via transform.js (flattens nested fields, converts centroid)
- Switches the projects alias to the new collection atomically
- Drops the old collection
Why full sync: Avoids managing partial updates for schema migrations or bulk field changes. The alias swap is zero-downtime.
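The rebuild sequence above can be sketched against an in-memory stand-in for Typesense. Everything named here (FakeTypesense, timestampedName, fullSyncSketch) is hypothetical, and the UTC-based YYYYMMDD_HHMM suffix is an assumption inferred from the example name; the real full-sync.js talks to a live Typesense node.

```typescript
// Hypothetical in-memory stand-in for Typesense collections and aliases.
class FakeTypesense {
  collections = new Map<string, object[]>();
  aliases = new Map<string, string>();
}

// Builds a name such as "projects_20260407_0200" (UTC is an assumption).
function timestampedName(alias: string, now: Date): string {
  const pad = (n: number): string => (n < 10 ? '0' : '') + n;
  const date = `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}`;
  const time = `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}`;
  return `${alias}_${date}_${time}`;
}

function fullSyncSketch(ts: FakeTypesense, alias: string, docs: object[], now: Date): string {
  const next = timestampedName(alias, now);
  const previous = ts.aliases.get(alias);
  ts.collections.set(next, docs); // 1. build the new collection off to the side
  ts.aliases.set(alias, next);    // 2. repoint the alias in a single step
  if (previous !== undefined && previous !== next) {
    ts.collections.delete(previous); // 3. drop the old collection only after the swap
  }
  return next;
}
```

Because readers only ever address the alias, they see either the old collection or the new one, never a half-built index.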
typesense-sync/index.js runs as a Deployment (always-on). It consumes a MongoDB replica set change stream:
- insert → upsert document in Typesense
- update/replace → upsert transformed document
- delete → remove from Typesense
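As a rough illustration of the mapping above, the following sketch turns a change event into a sync action. The operationType, documentKey and fullDocument fields are part of the MongoDB change event format; the SyncAction type, the toSyncAction name and the string _id are assumptions made for the example, not the listener's actual code.

```typescript
// Subset of a MongoDB change event (the _id is simplified to a string here).
interface ChangeEventSketch {
  operationType: 'insert' | 'update' | 'replace' | 'delete';
  documentKey: { _id: string };
  fullDocument?: Record<string, unknown>;
}

// Hypothetical result type for this sketch.
type SyncAction =
  | { kind: 'upsert'; id: string; doc: Record<string, unknown> }
  | { kind: 'delete'; id: string }
  | { kind: 'skip'; id: string };

function toSyncAction(event: ChangeEventSketch): SyncAction {
  const id = event.documentKey._id;
  if (event.operationType === 'delete') {
    return { kind: 'delete', id };
  }
  // Update events carry fullDocument only when the stream is opened with
  // fullDocument: 'updateLookup'; without it there is nothing to transform.
  if (event.fullDocument === undefined) {
    return { kind: 'skip', id };
  }
  return { kind: 'upsert', id, doc: event.fullDocument };
}
```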
Requires MongoDB in replica set mode (enforced by the eagle-api Helm chart's
mongodb-deployment.yaml).
buildMongoUri() is the single source of truth for constructing the MongoDB connection
string. Both full-sync.js and index.js import from this module.
| Environment Variable | Default | Description |
|---|---|---|
| MONGODB_USERNAME | (none) | MongoDB user |
| MONGODB_PASSWORD | (none) | MongoDB password |
| MONGODB_DATABASE | epic | Database name |
| MONGODB_HOST | localhost | Hostname |
| MONGODB_PORT | 27017 | Port |
| MONGODB_AUTHSOURCE | admin | Auth source database |
| MONGODB_DIRECT | (none) | Set true for directConnection (port-forward) |
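A minimal sketch of how these variables could combine into a connection string. The function name buildMongoUriSketch and its env parameter are hypothetical (the real buildMongoUri() takes no arguments in the text above), and adding authSource only when credentials are present is an assumption.

```typescript
type EnvSketch = Record<string, string | undefined>;

// Hypothetical helper; defaults mirror the table above.
function buildMongoUriSketch(env: EnvSketch): string {
  const host = env.MONGODB_HOST || 'localhost';
  const port = env.MONGODB_PORT || '27017';
  const database = env.MONGODB_DATABASE || 'epic';
  const authSource = env.MONGODB_AUTHSOURCE || 'admin';
  const user = env.MONGODB_USERNAME;
  const password = env.MONGODB_PASSWORD;

  let credentials = '';
  const options: string[] = [];
  if (user && password) {
    // Percent-encode credentials so characters like "@" do not break the URI.
    credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}@`;
    options.push(`authSource=${encodeURIComponent(authSource)}`);
  }
  if (env.MONGODB_DIRECT === 'true') {
    options.push('directConnection=true');
  }
  const query = options.length > 0 ? `?${options.join('&')}` : '';
  return `mongodb://${credentials}${host}:${port}/${database}${query}`;
}
```

With an empty environment this yields mongodb://localhost:27017/epic, which matches the defaults in the table.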
transformProject(doc, listLookup) converts a MongoDB Project document to a flat Typesense record.
Key transformations:
- Reads from the legislation-year sub-object (doc.legislation_2018 or doc.legislation_2002)
- centroid extracted via parseCentroid(c), which validates BC-bounds coordinates
- ObjectId refs (e.g. currentPhaseName, eacDecision) resolved to display names via listLookup
- Dates stored as Unix timestamps (int64) for numeric range filtering
transformDocument(doc, listLookup, projectLookup) converts a MongoDB Document record:
- ObjectId refs (type, milestone, documentAuthorType, projectPhase) resolved to display names via listLookup
- legislation (int32) stored as integer year (2002, 2018, etc.) for faceting
- project ObjectId resolved to projectName via projectLookup
- Unresolvable ObjectId refs (deleted List items) are suppressed: the regex /^[0-9a-f]{24}$/i detects orphaned IDs, and undefined is returned instead of indexing raw hex strings
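The orphan suppression in the last bullet might look roughly like this. The name resolveRefSketch is hypothetical, and passing non-hex strings through unchanged is an assumption about values that are already display names; only the regex and the undefined result for unresolved IDs come from the description above.

```typescript
const OBJECT_ID_RE = /^[0-9a-f]{24}$/i;

// Hypothetical helper: maps an ObjectId hex string to its display name.
function resolveRefSketch(value: unknown, listLookup: Map<string, string>): string | undefined {
  if (value === null || value === undefined) return undefined;
  const key = String(value);
  const name = listLookup.get(key);
  if (name !== undefined) return name;
  // A 24-character hex string with no lookup entry is an orphaned reference:
  // drop it rather than index the raw ID.
  return OBJECT_ID_RE.test(key) ? undefined : key;
}
```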
const listLookup = await buildListLookup(db);
// → Map<string, string> (ObjectId hex → display name)
// Built from all List + Organization documents in the epic collection

Both transforms accept listLookup as a parameter. The change-stream listener rebuilds it whenever a List or Organization document changes.
Safety guard: full-sync.js aborts if listLookup.size < MIN_LOOKUP_SIZE (currently 50). This prevents overwriting good data with raw IDs if MongoDB connectivity is degraded.
const projectLookup = await buildProjectLookup(db);
// → Map<string, string> (project ObjectId hex → project name)
// Used only by transformDocument to populate projectName

parseCentroid(c) is a named function (extracted from an IIFE in the original) for testability:
function parseCentroid(c) {
  if (!Array.isArray(c) || c.length < 2) return {};
  const lng = parseFloat(c[0]);
  const lat = parseFloat(c[1]);
  if (isNaN(lng) || isNaN(lat) || lat < 48 || lat > 60 || lng < -139 || lng > -114) return {};
  return { centroid: [lng, lat] };
}

Typesense is deployed as a separate Helm chart in helm/typesense/. It runs in the same OpenShift namespace as eagle-api.
# Deploy / upgrade Typesense
helm upgrade --install typesense ./helm/typesense \
-f ./helm/typesense/values-{env}.yaml \
--set secrets.apiKey=<key>
# Deploy eagle-api with Typesense env vars
helm upgrade --install eagle-api ./helm/eagle-api \
-f ./helm/eagle-api/values-{env}.yaml \
--set secrets.typesenseApiKey=<key>

The typesense-sync CronJob and Deployment are included in the helm/eagle-api chart and share the same secrets.
Both ProjectListWrapperComponent (projects) and SearchWrapperComponent (documents) share the same pattern:
- Check TYPESENSE_ENABLED config flag and TYPESENSE_SEARCH_KEY
- GET /search-api/health with a 3 s timeout via firstValueFrom(...pipe(timeout(3000)))
- If healthy → render Typesense component; if not → fallback to legacy component
Both use a static cachedResult so only one health check fires per page session (survives component destroy/recreate, cleared only on page reload).
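A framework-free sketch of the "one health check per page session" idea. The real components use RxJS (firstValueFrom with timeout) and a static class field; here a module-level variable and a plain Promise stand in for both, and checkTypesenseHealth, withTimeout and the injected probe are hypothetical names.

```typescript
type HealthProbe = () => Promise<boolean>;

// Lives for the lifetime of the page; survives component destroy/recreate.
let cachedResult: Promise<boolean> | null = null;

// Rejects if the wrapped promise has not settled within ms milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('timeout')), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

function checkTypesenseHealth(probe: HealthProbe, timeoutMs = 3000): Promise<boolean> {
  if (cachedResult === null) {
    // Any failure (network error or timeout) resolves to false, which
    // selects the legacy component.
    cachedResult = withTimeout(probe(), timeoutMs).catch(() => false);
  }
  return cachedResult;
}
```

Caching the promise rather than the boolean means two wrappers mounting at the same time share a single in-flight request.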
TypesenseService (src/app/services/typesense.service.ts) is a singleton (providedIn: 'root') that:
- Caches the Typesense HTTP connection pool, which prevents a cold start on every navigation
- Stores the last search results per index alias for stale-while-revalidate
- Stores the last facet items per (index, attribute) pair
- Provides direct REST methods for data fetching outside of InstantSearch widgets
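The stale cache described in the list above could be backed by two maps. This is a sketch under assumptions: the four method names match the calls used in this page, but the class name StaleCacheSketch, the generic parameters and the "::" key separator are invented for illustration.

```typescript
class StaleCacheSketch<Hit, FacetItem> {
  private lastHits = new Map<string, Hit[]>();
  private lastFacets = new Map<string, FacetItem[]>();

  // Returns the previous hits for an index, or an empty list on first visit.
  getLastHits(index: string): Hit[] {
    return this.lastHits.get(index) ?? [];
  }

  setLastHits(index: string, hits: Hit[]): void {
    this.lastHits.set(index, hits);
  }

  // Facets are keyed per (index, attribute) pair.
  getLastFacets(index: string, attribute: string): FacetItem[] {
    return this.lastFacets.get(`${index}::${attribute}`) ?? [];
  }

  setLastFacets(index: string, attribute: string, items: FacetItem[]): void {
    this.lastFacets.set(`${index}::${attribute}`, items);
  }
}
```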
// Retrieve stale hits on remount (shown instantly before fresh search arrives)
const staleHits = this.typesense.getLastHits('documents');
this.hits.set(staleHits);
// After fresh search: update cache
this.typesense.setLastHits('documents', newHits);
// Facets
const cached = this.typesense.getLastFacets('documents', 'type');
// After refinement list fires:
this.typesense.setLastFacets('documents', 'type', items);

In addition to the InstantSearch adapter (used for the unified search and document search widgets), TypesenseService exposes direct REST methods for data that doesn't need faceting or pagination widgets. These use Angular's HttpClient via a private searchCollection() helper rather than the typesense-instantsearch-adapter.
| Method | Collection | Used by | Notes |
|---|---|---|---|
| getTopActivities(n) | activities | HomeComponent | Home page recent activities |
| getAllProjects() | projects | ProjectsComponent | Full project list for map; paginates at 250/page |
| getFeaturedDocuments(projId) | documents | (removed; MongoDB used instead) | MongoDB is authoritative for isFeatured |
| getProjectActivities(projId, page, size, sort, kw) | activities | ProjectActivitesComponent | Activities tab with keyword search |
| getProjectSuggestions(query) | projects | ProjlistFiltersComponent | Autocomplete dropdown on map filter |
When to use REST methods vs InstantSearch:
- REST methods: Fetching data that renders as a table or list with no faceting widgets (e.g. project activities, featured docs, home page feed)
- InstantSearch adapter: Faceted search with refinement lists, pagination widgets, and highlight rendering (unified search, document search)
searchCollection private helper:
// All REST methods delegate to this — builds the Typesense URL from config
private searchCollection(collection: string, params: Record<string, string>): Observable<any> {
  const config = this.configService.config();
  const apiKey = config.TYPESENSE_SEARCH_KEY || '';
  const baseUrl = this.buildSearchUrl(); // handles path-only vs absolute URL
  return this.http.get<any>(
    `${baseUrl}/collections/${collection}/documents/search?${new URLSearchParams(params)}`,
    { headers: { 'X-TYPESENSE-API-KEY': apiKey } }
  );
}

Date fields: Typesense stores dates as Unix seconds (int64). Multiply by 1000 before passing to Angular's DatePipe or new Date().
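The seconds-to-milliseconds conversion is easy to forget, so a pair of small helpers can make it explicit. Both function names are hypothetical; they are not part of TypesenseService as described here.

```typescript
// Typesense stores Unix seconds; JavaScript Date expects milliseconds.
function typesenseSecondsToDate(seconds: number): Date {
  return new Date(seconds * 1000);
}

// Inverse conversion, e.g. for building numeric range filters.
function dateToTypesenseSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}
```

Passing the raw seconds value straight to new Date() would silently produce a date in January 1970.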
getProjectSuggestions(query) drives the autocomplete dropdown on the map/list filter
input. It fetches up to 250 matching project IDs for accurate marker filtering, but
the dropdown shows the top 5. The matched IDs are stored in FilterStateService via
updateSuggestionIds() so ProjectFilterService can filter map markers using
Typesense's fuzzy match rather than a simple substring comparison.
// ProjlistFiltersComponent
this.search$.pipe(
  debounceTime(120),
  distinctUntilChanged(),
  switchMap(q => this.typesenseService.getProjectSuggestions(q).pipe(catchError(() => of([])))),
).subscribe(results => {
  this.suggestions.set(results);
  this.filterState.updateSuggestionIds(results.map(r => r.id));
});

FilterStateService.typesenseSuggestionIdsFilter() returns:
- null: no active text search (substring fallback in use)
- string[]: Typesense results; ProjectFilterService checks ids.includes(project._id)
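A sketch of how a filter could branch on the two return values described above. The ProjectSketch shape and the filterProjectsSketch name are assumptions, and a Set replaces ids.includes purely to keep lookups cheap when up to 250 IDs are present.

```typescript
interface ProjectSketch {
  _id: string;
  name: string;
}

function filterProjectsSketch(
  projects: ProjectSketch[],
  suggestionIds: string[] | null,
  text: string,
): ProjectSketch[] {
  if (suggestionIds !== null) {
    // Typesense already did the fuzzy matching; keep only the IDs it returned.
    const ids = new Set(suggestionIds);
    return projects.filter(p => ids.has(p._id));
  }
  // No Typesense result available: fall back to a plain substring match.
  const needle = text.trim().toLowerCase();
  if (needle === '') return projects;
  return projects.filter(p => p.name.toLowerCase().includes(needle));
}
```

Note that an empty array is different from null: it means Typesense ran and matched nothing, so no markers should be shown.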
Both document and project search use a FACET_DEFS array to drive all facet panels without duplication:
interface FacetDef {
  attribute: string; // Typesense field name
  heading: string;   // Sidebar panel heading
  listType: string;  // MongoDB List._schemaName type for legislation grouping
  sorter: (a: DisplayItem, b: DisplayItem) => number;
}
const FACET_DEFS: FacetDef[] = [
  { attribute: 'type', heading: 'Type', listType: 'doctype', sorter: sortByName },
  { attribute: 'milestone', heading: 'Milestone', listType: 'label', sorter: sortByName },
  { attribute: 'documentAuthorType', heading: 'Author Type', listType: 'author', sorter: sortByName },
  { attribute: 'projectPhase', heading: 'Project Phase', listType: 'projectPhase', sorter: sortByPhaseOrder },
];

Each FacetDef drives:
- A WritableSignal<DisplayItem[]> for items
- A WritableSignal<Map<string, number>> for legislation year lookups
- A Signal<LegislationGroup[]> computed that calls groupByLegislation(..., f.sorter)
- A connectRefinementList widget registration
- A single @for (facet of facetDefs; ...) loop in the template
Facet options are grouped by the legislation field on the corresponding MongoDB List item:
2002 Act Terms
├── Application Review
├── Evaluation
└── ...
2018 Act Terms
├── Application Development and Review
└── ...
(ungrouped — year 0 or unresolved)
groupByLegislation(items, lookup, sorter) buckets items by year using a Map<string, number> lookup built from configService.lists. Groups render in LEG_ORDER = [2002, 2018, 1996] order.
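The bucketing could be sketched as follows. The DisplayItemSketch and LegislationGroupSketch shapes are assumptions (the real DisplayItem and LegislationGroup types are not shown on this page), as is the choice to place years outside LEG_ORDER in the ungrouped bucket, rendered last.

```typescript
interface DisplayItemSketch {
  label: string;
}

interface LegislationGroupSketch {
  year: number; // 0 means ungrouped
  items: DisplayItemSketch[];
}

const LEG_ORDER = [2002, 2018, 1996];

function groupByLegislationSketch(
  items: DisplayItemSketch[],
  lookup: Map<string, number>,
  sorter: (a: DisplayItemSketch, b: DisplayItemSketch) => number,
): LegislationGroupSketch[] {
  const buckets = new Map<number, DisplayItemSketch[]>();
  for (const item of items) {
    const year = lookup.get(item.label) ?? 0;
    // Unknown or unresolved years fall into the ungrouped bucket (0).
    const key = LEG_ORDER.indexOf(year) >= 0 ? year : 0;
    const bucket = buckets.get(key) ?? [];
    bucket.push(item);
    buckets.set(key, bucket);
  }
  const groups: LegislationGroupSketch[] = [];
  for (const year of [...LEG_ORDER, 0]) {
    const bucket = buckets.get(year);
    if (bucket !== undefined && bucket.length > 0) {
      groups.push({ year, items: [...bucket].sort(sorter) });
    }
  }
  return groups;
}
```

Empty groups are omitted, so a facet with no 1996 terms simply renders two headings.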
sortByPhaseOrder sorts phases by canonical PHASE_ORDER (2002 Act phases first, then 2018), matching the mongo list listOrder field. Unknown phases fall back to alphabetical at the end. The same PHASE_ORDER array is defined in both TypesenseProjectSearchComponent and TypesenseDocumentSearchComponent.
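A comparator with this behaviour might be written as below. PHASE_ORDER_SKETCH contains placeholder entries borrowed from the example tree above, not the real PHASE_ORDER, and PhaseItemSketch is an assumed item shape.

```typescript
// Illustrative subset only; the real PHASE_ORDER lives in the search components.
const PHASE_ORDER_SKETCH = [
  'Application Review',
  'Evaluation',
  'Application Development and Review',
];

interface PhaseItemSketch {
  label: string;
}

function sortByPhaseOrderSketch(a: PhaseItemSketch, b: PhaseItemSketch): number {
  const ia = PHASE_ORDER_SKETCH.indexOf(a.label);
  const ib = PHASE_ORDER_SKETCH.indexOf(b.label);
  if (ia !== -1 && ib !== -1) return ia - ib; // both known: canonical order
  if (ia !== -1) return -1;                   // known phases sort before unknown
  if (ib !== -1) return 1;
  return a.label.localeCompare(b.label);      // both unknown: alphabetical
}
```

Since the array is duplicated in two components, extracting it to a shared module would keep both sorters in step.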
When connectRefinementList fires with an empty items array on initial mount (before the first search response), skip the update if the master map already has cached data:
connectRefinementList(renderOptions => {
  this.refineFns[f.attribute] = renderOptions.refine;
  if (renderOptions.items.length === 0 && this.facetMasters[f.attribute].size > 0) return;
  this.zone.run(() => {
    this.facetItems[f.attribute].set(mergeItems(this.facetMasters[f.attribute], renderOptions.items, f.sorter));
    this.typesense.setLastFacets(INDEX_NAME, f.attribute, renderOptions.items);
  });
})({ attribute: f.attribute, operator: 'or', limit: 100 })

| Variable | Set In | Description |
|---|---|---|
| TYPESENSE_SEARCH_HOST | /api/config | URL/path for browser → Typesense. Use /search-api in production (routed through rproxy). Use http://localhost:8108 for local dev. |
| TYPESENSE_SEARCH_KEY | /api/config | Read-only search API key (safe to expose to browser) |
| TYPESENSE_API_KEY | eagle-api env | Admin API key for typesense-sync writes |
| TYPESENSE_HOST | eagle-api env | Internal Kubernetes hostname for typesense-sync |
| TYPESENSE_PORT | eagle-api env | Internal port (default 8108) |
Typesense does not run in the local docker-compose.yml — it lives only on the dev cluster.
Use oc port-forward to reach it locally:
# Forward dev cluster Typesense to localhost:8108
oc port-forward svc/typesense-typesense 8108:8108 -n 6cdc9e-dev
# proxy.conf.js routes /search-api → localhost:8108 (already configured)

Do not route /search-api through the dev cluster's rproxy, because the rproxy enforces HTTP Basic Auth on all locations, which causes a browser login dialog.
A browser login dialog during local development means the /search-api proxy target is pointed at the dev cluster rproxy instead of a direct port-forward. Set TYPESENSE_SEARCH_HOST to /search-api in production (rproxy has no basic auth there) or use a port-forward in local dev.
Raw 24-character hex IDs in facets or results mean the indexed document has a type/milestone/projectPhase/documentAuthorType value pointing to a deleted List item (orphaned reference). The resolve() function in transform.js suppresses any 24-character hex string not found in listLookup, so raw IDs indicate the lookup was stale or the document was indexed before the fix.
To repair:
- Verify listLookup is healthy: check "List lookup loaded: N entries" in the full-sync log; N should be ≥ 50
- Re-index the affected documents or run a full sync
TypesenseService.lastFacetsCache is populated per navigation. If blank, the
port-forward may have died. Restart with oc port-forward svc/typesense-typesense 8108:8108 -n 6cdc9e-dev.
The change stream requires a MongoDB replica set. Verify the replica set is initialized:
oc exec deploy/eagle-api -n 6cdc9e-{env} -- mongo --eval "rs.status()"

# Trigger the CronJob manually
oc create job typesense-fullsync-manual \
--from=cronjob/typesense-full-sync \
-n 6cdc9e-{env}
# Watch progress
oc logs -f job/typesense-fullsync-manual -n 6cdc9e-{env}

Search results can be ranked by a 30-day rolling popularity score derived from real user interactions tracked via penguin-analytics. This keeps stale or historic documents from permanently outranking newer content.
flowchart LR
Browser["eagle-public\nbrowser"] -- "Search Result Clicked\nSearch Download Clicked" --> Penguin["penguin-analytics\nTimescaleDB"]
Penguin -- "nightly SQL query\n(30-day window)" --> Script["popularity-sync.js\n3 AM CronJob"]
Script -- "PATCH popularity field\naction: update" --> Typesense["Typesense\nprojects / documents"]
Typesense -- "sort_by: popularity:desc" --> Browser
- eagle-public sends Search Result Clicked and Search Download Clicked events to penguin-analytics via the existing getanalytics.io pipeline.
- popularity-sync.js runs nightly at 3 AM (one hour after the full re-index at 2 AM).
- It queries penguin for the past 30 days of events and computes a weighted score per document/project.
- It batch-patches the live Typesense collections using action: update, so only the popularity field is changed; all other fields remain untouched.
| Event | Weight | Rationale |
|---|---|---|
| Search Download Clicked | 3 | Strong intent: user committed to downloading |
| Search Result Clicked (project or document) | 1 | Mild interest |
Scores are the sum of weighted events over the 30-day window. A document
downloaded 10 times and clicked 5 times gets popularity = 35.
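The weighted sum can be expressed in a few lines. The weights come from the table above; the EventCountSketch row shape and the computePopularity name are assumptions, since the actual SQL result columns are not shown here.

```typescript
// Weights taken from the event table; unknown events contribute nothing.
const EVENT_WEIGHTS = new Map<string, number>([
  ['Search Download Clicked', 3],
  ['Search Result Clicked', 1],
]);

// Hypothetical shape of one aggregated row from the 30-day query.
interface EventCountSketch {
  targetId: string;
  event: string;
  count: number;
}

function computePopularity(rows: EventCountSketch[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const row of rows) {
    const weight = EVENT_WEIGHTS.get(row.event) ?? 0;
    scores.set(row.targetId, (scores.get(row.targetId) ?? 0) + weight * row.count);
  }
  return scores;
}
```

For the example above, 10 downloads and 5 clicks give 10 × 3 + 5 × 1 = 35.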
Add popularity:desc to the sort_by parameter in a Typesense search request to rank popular results higher:
// In TypesenseService or the search widget params
sort_by: 'popularity:desc,_text_match:desc,updatedDate:desc'

The field is int32 and optional. Documents with no popularity data sort to the bottom (treated as null < 0), so new content is not penalised.
Prerequisites:
- penguin-analytics must be deployed in the same OpenShift namespace (or accessible via NetworkPolicy)
- A penguin-analytics-db secret must exist with the DB credentials
# Create the secret (per environment)
oc create secret generic penguin-analytics-db \
--from-literal=PENGUIN_DB_HOST=penguin-analytics-database \
--from-literal=PENGUIN_DB_PORT=5432 \
--from-literal=PENGUIN_DB_NAME=analytics \
--from-literal=PENGUIN_DB_USER=analytics_user \
--from-literal=PENGUIN_DB_PASSWORD=<password> \
-n 6cdc9e-{env}

Enable in Helm values:
# helm/typesense/values-{env}.yaml
popularity:
  enabled: true
  windowDays: 30 # Tune as needed

Run manually:
# Trigger the CronJob manually
oc create job typesense-popularity-manual \
--from=cronjob/typesense-popularity \
-n 6cdc9e-{env}
# Watch progress
oc logs -f job/typesense-popularity-manual -n 6cdc9e-{env}

Expected output:
Starting popularity sync: 2026-04-20T03:00:01.234Z
Window: 30 days
Querying penguin-analytics for click scores...
Found 142 scored documents/projects
Projects to update: 38
Documents to update: 104
Patching "projects"... 38 patched, 0 failed
Patching "documents"... 104 patched, 0 failed
Popularity sync complete: 2026-04-20T03:00:04.891Z
| Variable | Default | Description |
|---|---|---|
| POPULARITY_WINDOW_DAYS | 30 | Rolling window for score aggregation |
| POPULARITY_BATCH_SIZE | 100 | Documents per Typesense update request |
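To illustrate what POPULARITY_BATCH_SIZE controls, this sketch splits scores into batches and serialises each as JSONL, the body format accepted by Typesense's documents import endpoint with action=update. Whether popularity-sync.js uses that exact endpoint is an assumption, and toImportBatches is a hypothetical name.

```typescript
interface PopularityPatch {
  id: string;
  popularity: number;
}

// Returns one JSONL string per request, each holding at most batchSize patches.
function toImportBatches(scores: Map<string, number>, batchSize = 100): string[] {
  if (batchSize < 1) throw new RangeError('batchSize must be at least 1');
  const patches: PopularityPatch[] = [];
  scores.forEach((popularity, id) => patches.push({ id, popularity }));

  const batches: string[] = [];
  for (let i = 0; i < patches.length; i += batchSize) {
    const lines = patches.slice(i, i + batchSize).map(p => JSON.stringify(p));
    batches.push(lines.join('\n'));
  }
  return batches;
}
```

Each line carries only id and popularity, which is what lets an update-style import leave every other field untouched.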
No popularity data found — Expected on first run. Events accumulate in penguin over time; the first meaningful scores appear after 24–48 hours of real traffic.
Job fails with connection error — Verify the penguin-analytics-db secret
exists and the penguin-analytics-database service is reachable from the
6cdc9e-{env} namespace. Check NetworkPolicy allows egress to the penguin namespace.
Popularity field missing after re-index — The full re-index (2 AM) creates
a fresh collection with no popularity data. The popularity job (3 AM) repopulates
it. There is a ~1 hour window each night where scores are absent. If this matters,
shorten the gap, for example by moving reindex.schedule closer to 3 AM while leaving enough time for the re-index to finish.
Force re-patch without re-indexing:
oc create job typesense-popularity-manual \
--from=cronjob/typesense-popularity \
-n 6cdc9e-{env}