edovo search issues - DE4II/advocacy-tools GitHub Wiki

Improving Edovo's Information Retrieval System: Technical Analysis & Recommendations

Introduction

Edovo’s mission is to deliver accessible educational, vocational, and personal development resources to incarcerated populations. For many users, Edovo is the best, or the only, source of information available. Unfortunately, the platform’s current Information Retrieval (IR) system suffers from significant design and implementation flaws that hinder the discoverability of content and the usability of the search experience.

This paper analyzes the primary shortcomings of Edovo’s IR system, explains their practical impact on content discovery, and offers detailed, implementable recommendations.


1. Limited Index Scope (Title & Description Only)

Current Issue:
The IR index contains references only to the title and manually entered description of each document. Neither the full text of documents nor their metadata (e.g., author, creation date, tags) is indexed.

Impact:

  • Users cannot locate relevant documents unless their search term appears in the title or description.
  • High-value content that contains relevant information in its body text remains invisible in search results.
  • Metadata such as subject area, education level, or source organization could significantly enhance retrieval precision, but is unused.

Recommendation:

  • Implement full-text indexing using tools such as Elasticsearch, Solr, or Meilisearch.
  • Include structured metadata indexing (e.g., author, topic, publication date, difficulty level) to enable faceted search and filtering.
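  • Use analyzers and tokenizers suited to the reading level and vocabulary of incarcerated learners. Examples include stemming, so that singular and plural forms such as "resume" and "resumes" match, and synonym lists that map everyday wording to the terms used in course titles.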
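To show how indexing the full text and metadata changes what a search can reach, here is a minimal in-memory sketch in Python. It assumes a hypothetical `Document` record with `title`, `description`, `body`, and `metadata` fields, and it uses illustrative per-field weights. A production deployment would delegate this work to Elasticsearch, Solr, or Meilisearch rather than a hand-rolled index.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical document record; these field names are assumptions, not Edovo's schema.
@dataclass
class Document:
    doc_id: str
    title: str
    description: str
    body: str
    metadata: dict = field(default_factory=dict)

# Illustrative weights: a title hit counts more than a body hit.
FIELD_WEIGHTS = {"title": 3.0, "description": 2.0, "body": 1.0, "metadata": 1.5}

def tokenize(text: str) -> list:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

class FieldedIndex:
    def __init__(self) -> None:
        # term -> doc_id -> weighted term frequency summed across fields
        self.postings = defaultdict(lambda: defaultdict(float))

    def add(self, doc: Document) -> None:
        fields = {
            "title": doc.title,
            "description": doc.description,
            "body": doc.body,                              # full text, not indexed today
            "metadata": " ".join(doc.metadata.values()),   # e.g. topic, publisher
        }
        for name, text in fields.items():
            for term in tokenize(text):
                self.postings[term][doc.doc_id] += FIELD_WEIGHTS[name]

    def search(self, query: str) -> list:
        scores = defaultdict(float)
        for term in tokenize(query):
            for doc_id, weight in self.postings.get(term, {}).items():
                scores[doc_id] += weight
        # Highest score first; ties broken by doc_id so output is deterministic.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

index = FieldedIndex()
index.add(Document("d1", "Job Readiness", "Prepare for work after release.",
                   "Covers resume writing and interview practice.",
                   {"topic": "vocational"}))
index.add(Document("d2", "Reading Skills", "Improve your reading.",
                   "Short stories with questions.", {"topic": "literacy"}))
print(index.search("resume"))      # [('d1', 1.0)]  reachable only via body text
print(index.search("vocational"))  # [('d1', 1.5)]  reachable only via metadata
```

Under a title-and-description-only index, neither example query would return anything. The field weights let title matches still dominate once the body is indexed.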

2. Manually Entered Descriptions

Current Issue:
All descriptions are entered by content curators or staff, with no automated enrichment or extraction from document text.

Impact:

  • High labor cost and variability in quality across descriptions.
  • Potential omission of key terms users might search for, making documents undiscoverable.
  • Inconsistent coverage for large document sets.

Recommendation:

  • Deploy automated description generation using natural language processing (NLP) to extract summaries from the document body.
  • Allow manual editing for quality assurance, but ensure every document has at least a machine-generated fallback description.
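One simple way to produce the machine-generated fallback is frequency-based extractive summarization. The Python sketch below scores each sentence by how frequent its non-stopword terms are across the whole document. It then keeps the top sentences in their original order. The stopword list, the `max_sentences` default, and the function names are illustrative assumptions. Abstractive NLP models are a heavier alternative. In this sketch, a curator-written description always takes precedence.

```python
import re
from collections import Counter
from typing import Optional

# Small illustrative stopword list; a real deployment would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "at",
             "for", "on", "with", "you", "your", "this", "that", "it", "be"}

def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def content_words(text: str) -> list:
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]

def summarize(body: str, max_sentences: int = 2) -> str:
    """Extractive summary: keep the sentences built from the document's most frequent words."""
    sentences = split_sentences(body)
    freq = Counter(content_words(body))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order for readability
    return " ".join(sentences[i] for i in chosen)

def effective_description(manual: Optional[str], body: str) -> str:
    """Prefer the curator's text; otherwise fall back to the machine summary."""
    if manual and manual.strip():
        return manual.strip()
    return summarize(body)
```

Averaging rather than summing the word scores keeps long sentences from winning just because they are long.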

3. Lack of SEO Best Practices in Descriptions

Current Issue:
Descriptions do not consistently follow search engine optimization (SEO) best practices such as keyword inclusion, readability, and scannability.

Impact:

  • Documents rank lower in Edovo’s internal search than their content warrants, because the description is one of only two fields the index contains.
  • Critical keywords may be absent, reducing recall.
  • Users may overlook relevant results due to vague or uninformative snippets.

Recommendation:

  • Establish internal SEO guidelines for descriptions, adapted for prison education contexts.
  • Integrate a description quality checker that flags missing keywords, low word counts, or vague phrasing.
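A description quality checker could start as a handful of rules. The Python sketch below flags descriptions that are too short, that contain stock phrases, or that omit expected key terms. The `MIN_WORDS` threshold and the phrase list are assumptions for illustration, not Edovo guidelines. So is the idea of passing key terms in from curator-assigned tags or the most frequent body terms.

```python
import re
from dataclasses import dataclass

# Illustrative values; real thresholds belong in Edovo's own description guidelines.
MIN_WORDS = 20
VAGUE_PHRASES = ("learn more", "educational resource", "click here", "various topics")

@dataclass
class QualityReport:
    word_count: int
    missing_keywords: list
    vague_phrases: list

    @property
    def ok(self) -> bool:
        return (self.word_count >= MIN_WORDS
                and not self.missing_keywords
                and not self.vague_phrases)

def check_description(description: str, keywords: list) -> QualityReport:
    """Flag short descriptions, stock phrases, and key terms that never appear."""
    text = description.lower()
    word_count = len(re.findall(r"[a-z0-9']+", text))
    # Whole-word (or whole-phrase) match, case-insensitive.
    missing = [k for k in keywords
               if not re.search(r"\b" + re.escape(k.lower()) + r"\b", text)]
    vague = [p for p in VAGUE_PHRASES if p in text]
    return QualityReport(word_count, missing, vague)
```

The checker only reports problems. Whether a flagged description blocks publication or just warns the curator is a policy choice.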

4. Missing, Brief, or Generic Descriptions

Current Issue:
Some documents lack descriptions entirely. Others have overly short or generic descriptions like “Learn more” or “Educational resource.”

Impact:

  • Reduces the IR system’s ability to match queries with relevant documents.
  • Diminishes user confidence in result relevance.
  • Creates friction: users need more clicks to identify useful resources.

Recommendation:

  • Require a minimum description length and specificity.
  • Use NLP-based content summarization for automatic fallbacks.
  • Include key concepts, terms, and subject matter in the description.
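"Minimum length and specificity" can be made concrete in several ways. The Python sketch below shows one assumed approach. A description triggers the automatic fallback if it is missing or below a word count. It also triggers the fallback if too few of its meaningful words appear in the document body, which is a rough proxy for "talks about this document rather than boilerplate". The thresholds and the ignored-word list are hypothetical. The fallback text itself would come from an automated summarizer like the one recommended in Section 2.

```python
import re
from typing import Optional

# Illustrative thresholds and word list; all are assumptions to be tuned.
MIN_WORDS = 15
MIN_SPECIFICITY = 0.5
IGNORED = {"a", "an", "the", "and", "or", "of", "to", "in", "at", "for", "how",
           "is", "learn", "more", "educational", "resource", "information",
           "great", "helpful"}

def words(text: str) -> list:
    return re.findall(r"[a-z0-9']+", text.lower())

def specificity(description: str, body: str) -> float:
    """Share of the description's meaningful words that also occur in the document body."""
    body_words = set(words(body))
    desc = [w for w in words(description) if w not in IGNORED]
    if not desc:
        return 0.0
    return sum(w in body_words for w in desc) / len(desc)

def needs_fallback(description: Optional[str], body: str) -> bool:
    """True if the description is missing, too short, or too generic to publish as-is."""
    if not description or not description.strip():
        return True
    if len(words(description)) < MIN_WORDS:
        return True
    return specificity(description, body) < MIN_SPECIFICITY
```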

5. Suboptimal Ranking Algorithm

Current Issue:
The IR algorithm sometimes ranks documents with near-spellings (fuzzy matches) above those with exact spellings.

Impact:

  • Users searching for a precise term may see irrelevant or less-relevant results above exact matches.
  • Increases time to discovery and can frustrate users.

Recommendation:

  • Modify the ranking function to prioritize exact matches over fuzzy matches.
  • Apply fuzzy matching only when exact matches are absent or below a certain relevance threshold.
  • Use weighted scoring to balance precision and recall.
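The three recommendations above can be combined in one scoring function. The Python sketch below gives each exact term hit a full weight. It runs Levenshtein-based fuzzy matching only when exact matching returns fewer than a threshold number of documents, and it gives fuzzy hits a much lower weight. All constants are illustrative assumptions. In Elasticsearch or Solr, the same idea is usually expressed by combining an exact clause with a lower-boosted fuzzy clause.

```python
import re

# Illustrative tuning values, not recommendations from any particular engine.
EXACT_WEIGHT = 1.0
FUZZY_WEIGHT = 0.3      # a near-spelling is worth far less than an exact hit
MAX_EDITS = 1           # tolerate one typo per query term
MIN_EXACT_RESULTS = 5   # run fuzzy matching only if exact matching finds fewer docs

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query: str, docs: dict) -> list:
    """Score docs (doc_id -> text); fuzzy hits add far less than exact hits."""
    terms = tokenize(query)
    doc_tokens = {doc_id: set(tokenize(text)) for doc_id, text in docs.items()}

    scores = {}
    for doc_id, tokens in doc_tokens.items():
        hits = sum(t in tokens for t in terms)
        if hits:
            scores[doc_id] = hits * EXACT_WEIGHT

    # Fuzzy matching is a fallback for when exact matching finds too little.
    if len(scores) < MIN_EXACT_RESULTS:
        for doc_id, tokens in doc_tokens.items():
            fuzzy_hits = sum(
                1 for t in terms
                if t not in tokens
                and any(edit_distance(t, tok) <= MAX_EDITS for tok in tokens))
            if fuzzy_hits:
                scores[doc_id] = scores.get(doc_id, 0.0) + fuzzy_hits * FUZZY_WEIGHT
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

docs = {"a": "Welding basics", "b": "Wedding planning", "c": "Advanced welding"}
print(rank("welding", docs))  # [('a', 1.0), ('c', 1.0), ('b', 0.3)]
```

The near-spelling "wedding" still appears, but only below both exact matches. For long queries, many fuzzy hits can add up, so the weights need tuning against real query logs.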

6. Minimal Result Snippets

Current Issue:
Result snippets contain only:

  • Document name (or portion thereof)
  • Image
  • Type
  • Visited status

They exclude descriptions or any preview text.

Impact:

  • Users lack context to evaluate results before clicking.
  • Increases unnecessary page loads and user frustration.
  • Reduces the perceived richness of the search experience.

Recommendation:

  • Display search term–highlighted snippets from either the description or the relevant section of the full text.
  • Include metadata indicators (e.g., topic, reading level).
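Query-highlighted snippets can be generated on the server with a small amount of code. The Python sketch below finds the first whole-word hit, cuts a window of roughly `width` characters around it, and HTML-escapes the text. It then wraps every hit in `<mark>` tags. The function name, the `<mark>` default, and the window size are assumptions. Search engines such as Elasticsearch and Solr also provide built-in highlighting.

```python
import html
import re

def highlight_snippet(text: str, query: str, width: int = 60,
                      open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Return an HTML-safe window of `text` around the first query hit, with hits wrapped."""
    terms = [re.escape(t) for t in re.findall(r"\w+", query)]
    if not terms:
        return html.escape(text[:width])
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return html.escape(text[:width])  # no hit: show the opening of the text

    # Center a window of about `width` characters on the first hit.
    start = max(0, first.start() - width // 2)
    end = min(len(text), start + width)
    window = text[start:end]

    # Escape the text between hits, then wrap each hit in highlight tags.
    parts, last = [], 0
    for m in pattern.finditer(window):
        parts.append(html.escape(window[last:m.start()]))
        parts.append(open_tag + html.escape(m.group(0)) + close_tag)
        last = m.end()
    parts.append(html.escape(window[last:]))

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(parts) + suffix
```

The code escapes each segment separately rather than the whole snippet, so the inserted `<mark>` tags survive while any markup in the document text is neutralized.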

7. No Sorting or Filtering in SERP

Current Issue:
Users cannot sort by relevance, date, or popularity, nor can they filter by topic, difficulty, or document type.

Impact:

  • Users with specific needs (e.g., “newest vocational training material”) cannot efficiently narrow results.
  • Every user must scroll through the full result list, including results that are irrelevant to their needs.

Recommendation:

  • Add sorting controls (e.g., relevance, date, alphabetical).
  • Implement filters/facets for metadata such as subject, publisher, content type, and difficulty.
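Once results carry metadata, sorting and faceting take very little code. The Python sketch below uses a hypothetical `Result` record. It maps each sort option to a key function, filters on exact attribute values, and counts results per facet value for display next to each filter option. The attribute names and sort options are assumptions drawn from the lists above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Hypothetical result record; attribute names are assumptions for illustration.
@dataclass
class Result:
    title: str
    score: float
    published: date
    content_type: str
    subject: str

# Each sort option maps to a key function for sorted().
SORT_KEYS = {
    "relevance": lambda r: -r.score,             # highest score first
    "date": lambda r: -r.published.toordinal(),  # newest first
    "alphabetical": lambda r: r.title.lower(),
}

def apply_filters(results: list, **filters) -> list:
    """Keep results whose attributes equal every supplied filter value."""
    return [r for r in results
            if all(getattr(r, name) == value for name, value in filters.items())]

def facet_counts(results: list, attribute: str) -> Counter:
    """Number of results per attribute value, shown beside each filter option."""
    return Counter(getattr(r, attribute) for r in results)

def search_page(results: list, sort: str = "relevance", **filters) -> list:
    return sorted(apply_filters(results, **filters), key=SORT_KEYS[sort])
```

Facet counts are usually computed over the query's result set before that facet's own filter is applied. This keeps the other options visible after the user selects one.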

8. Inefficient Infinite Scrolling

Current Issue:
The SERP uses a "load more" button that appends 10 results at a time to the bottom of the list, a click-driven variant of infinite scrolling.

Impact:

  • Slows browsing for users seeking many results.
  • Creates additional clicks and interrupts browsing flow.

Recommendation:

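  • Replace the "load more" button with numbered pagination, which gives predictable navigation and lets users jump directly to a later page of results.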
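Numbered pagination mostly comes down to clamping the requested page and computing one slice. In the Python sketch below, the function name and return shape are assumptions, and the default `page_size` of 10 simply mirrors the current batch size. A real backend would usually pass the offset and limit to the search engine query instead of slicing an in-memory list.

```python
import math
from typing import Sequence

def paginate(items: Sequence, page: int, page_size: int = 10) -> dict:
    """Return one page of results plus the metadata needed to render page links."""
    total_pages = max(1, math.ceil(len(items) / page_size))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "items": list(items[start:start + page_size]),
        "has_prev": page > 1,
        "has_next": page < total_pages,
    }
```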

9. Autoscroll Reset Behavior

Current Issue:
When new results are appended, the SERP scrolls back to the top, forcing users to manually scroll past already-viewed results.

Impact:

  • Disrupts cognitive flow.
  • Significantly increases time to reach new content.
  • Particularly frustrating on devices with slower scrolling.

Recommendation:

  • Maintain scroll position when appending results.
  • Ensure new results load in-place below the existing content.

10. SERP Context Loss Between Views

Current Issue:
The SERP does not retain context if the user navigates away and then returns (e.g., from a document view back to results).

Impact:

  • Users must re-run their search and scroll through results again to find their place.
  • Time-consuming and discouraging for deeper exploration.

Recommendation:

  • Persist search state (query, filters, scroll position) in the URL or local storage.
  • Implement “back to results” navigation that restores state seamlessly.
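Persisting search state in the URL amounts to a lossless round trip between the state and a query string. The Python sketch below uses only `urllib.parse` from the standard library. The parameter names (`q`, `page`, `scroll`, and one `f.<facet>` per filter) are a hypothetical scheme. In the browser, the frontend would write this URL with `history.replaceState` as the user scrolls or changes filters. On return, it would read the URL back and restore the view. That client-side code is not shown here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def encode_search_state(base_url: str, query: str, filters: dict,
                        page: int, scroll_y: int) -> str:
    """Serialize SERP state into a shareable, bookmarkable URL."""
    params = [("q", query), ("page", str(page)), ("scroll", str(scroll_y))]
    # Hypothetical naming scheme: one "f.<facet>" parameter per active filter.
    params += [(f"f.{name}", value) for name, value in sorted(filters.items())]
    return f"{base_url}?{urlencode(params)}"

def decode_search_state(url: str) -> dict:
    """Inverse of encode_search_state; missing parameters fall back to defaults."""
    qs = parse_qs(urlsplit(url).query)

    def first(key: str, default: str) -> str:
        return qs.get(key, [default])[0]

    return {
        "query": first("q", ""),
        "page": int(first("page", "1")),
        "scroll_y": int(first("scroll", "0")),
        "filters": {k[len("f."):]: v[0] for k, v in qs.items() if k.startswith("f.")},
    }
```

Storing the state in the URL rather than only in local storage also makes the browser's Back button, bookmarks, and shared links restore the same results.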

Conclusion

Edovo’s current IR system design significantly limits the discoverability of its content, reducing both the usability of the platform and its alignment with Edovo’s mission. By expanding indexing capabilities, enforcing content quality standards, enhancing ranking logic, and improving the SERP interface, Edovo can greatly improve the user experience for incarcerated learners.

The recommended changes will:

  • Increase recall and precision in search results.
  • Enhance user confidence in result relevance.
  • Reduce cognitive and navigational friction in exploring content.
  • Maximize the educational impact of Edovo’s platform in correctional environments.