Agent‐Canvas Backup Explorer - RutgersGRID/VIDAHub GitHub Wiki

Canvas Backup Explorer

Overview

The Canvas Backup Explorer is an AI-powered tool developed by Rick Anderson as part of the VIDA project ecosystem. It analyzes and searches the contents of Canvas course backups so that educators can explore and understand their course materials through concept-based search and automated content processing.

Purpose

The Canvas Backup Explorer addresses several key challenges faced by educators:

  • Course Content Analysis: Provides deep insights into course structure and content
  • Comprehensive Search: Enables concept-based searching across all course materials
  • Content Backup: Creates accessible backups of Canvas course data
  • Content Organization: Provides systematic organization and cataloging of course materials
  • Course Optimization: Helps identify content gaps, duplications, and improvement opportunities

Current Features

File Processing & Analysis

  • Multi-format Support: Processes diverse file types including PDFs, images, documents, and multimedia
  • Automatic OCR: Extracts text from images and scanned documents using DocLing
  • Content Conversion: Converts all materials to searchable markdown format
  • Intelligent Filtering: Automatically skips system files (CSS, JS, XML, ZIP) while processing educational content
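
The filtering rule above can be illustrated with a short sketch. This is not the tool's actual code: the extension sets are copied from the "Supported Formats" and "Automatically Skipped" lists in the Technical Notes on this page, and the names `classify_file` and `partition` are hypothetical.

```python
from pathlib import PurePosixPath

# Extensions copied from the Technical Notes lists on this page.
# The real tool's rules may differ; treat these sets as assumptions.
SKIPPED_EXTENSIONS = {".mp3", ".css", ".js", ".xml", ".zip"}
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".jpg", ".png", ".gif", ".html", ".txt",
}


def classify_file(path: str) -> str:
    """Return 'process', 'skip', or 'unknown' for a path inside a backup."""
    ext = PurePosixPath(path).suffix.lower()
    if ext in SKIPPED_EXTENSIONS:
        return "skip"
    if ext in SUPPORTED_EXTENSIONS:
        return "process"
    return "unknown"


def partition(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by the decision classify_file makes for each one."""
    groups: dict[str, list[str]] = {"process": [], "skip": [], "unknown": []}
    for path in paths:
        groups[classify_file(path)].append(path)
    return groups
```

Keeping an explicit "unknown" group, rather than silently dropping unrecognized files, makes it easier to see which formats a given backup contains that neither list covers.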

Advanced Search Capabilities

  • Concept-based Search: Goes beyond keyword matching to find conceptually related content
  • Multi-type Queries: Searches across text, images, and other content types simultaneously
  • Relevance Scoring: Color-coded match quality indicators (green/yellow/red)
  • Content Chunking: Breaks down large documents into searchable segments
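
As a rough illustration of content chunking and color-coded relevance, the sketch below splits markdown on blank lines and maps a score to a color. The chunk size, the score range of 0 to 1, and both thresholds are assumptions made for this example; the page does not state the values the tool uses.

```python
def chunk_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def relevance_color(score: float, green_at: float = 0.75, yellow_at: float = 0.5) -> str:
    """Map a score in [0, 1] to a color band (thresholds are assumed)."""
    if score >= green_at:
        return "green"
    if score >= yellow_at:
        return "yellow"
    return "red"
```

Splitting on paragraph boundaries instead of fixed character offsets keeps each chunk readable on its own when it appears in a result list.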

Course Analysis

  • Content Mapping: Identifies and catalogs course elements (assignments, quizzes, discussions)
  • Media Detection: Locates charts, graphs, images, and multimedia content
  • Metadata Extraction: Finds due dates, schedules, contact information, and structural elements
  • Raw Data Access: Provides direct access to the underlying course database
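
Metadata extraction of the kind listed above can be approximated with regular expressions over the converted markdown. The patterns below are illustrative assumptions, not the tool's rules: they recognize only email addresses and two date styles that follow the word "due".

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Matches phrases such as "Due: 2025-03-14" or "due March 14, 2025".
DUE_RE = re.compile(
    r"\bdue\b[:\s]+(?:on\s+)?"
    r"(\d{4}-\d{2}-\d{2}|[A-Z][a-z]+\.? \d{1,2}, \d{4})",
    re.IGNORECASE,
)


def extract_metadata(markdown: str) -> dict[str, list[str]]:
    """Collect email addresses and due dates found in a markdown string."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(markdown))),
        "due_dates": DUE_RE.findall(markdown),
    }


if __name__ == "__main__":
    sample = (
        "Essay 1 is due March 14, 2025. Quiz due: 2025-04-01.\n"
        "Contact: prof@example.edu"
    )
    print(extract_metadata(sample))
```

Real course content uses many more date formats than these two, so a production extractor would likely need a date-parsing library or a model-based step.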

Technical Architecture

Core Technology Stack

  • Retrieval-Augmented Generation (RAG): Retrieves relevant content chunks from the index to ground AI search and analysis
  • DocLing Integration: Automated document processing and OCR
  • Search Indexing: Comprehensive content indexing for fast retrieval
  • Markdown Conversion: Standardized content format for analysis
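
To make the retrieval half of this stack concrete, here is a minimal sketch of indexing chunks and ranking them against a query. It deliberately swaps in plain word-count vectors with cosine similarity. A concept-based system would normally use learned embeddings instead, and the page does not say which model or vector store the tool relies on. The class name `ChunkIndex` is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ChunkIndex:
    """In-memory index of (source, chunk) pairs with word-count vectors."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, Counter]] = []

    def add(self, source: str, chunk: str) -> None:
        self._entries.append((source, chunk, Counter(tokenize(chunk))))

    def search(self, query: str, top_k: int = 3) -> list[tuple[float, str, str]]:
        """Return up to top_k (score, source, chunk) tuples with score > 0."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), src, chunk) for src, chunk, vec in self._entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [item for item in scored[:top_k] if item[0] > 0]
```

In a RAG setup, the top-ranked chunks returned by `search` would be passed to a language model as context. That generation step is omitted here.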

Processing Pipeline

  1. Course Import: Upload Canvas course backup files
  2. Content Analysis: Process and categorize all course materials
  3. Index Building: Create searchable database of course content
  4. Search Interface: Enable queries across all processed content
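
The four steps above can be sketched end to end. Canvas course exports are zip-based archives, so the example reads one with Python's `zipfile` module. Everything else is a simplifying assumption: only `.html` and `.txt` members are read, HTML tags are not stripped, there is no OCR or markdown conversion, and the "index" is a plain word-to-path mapping. The function names are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

TEXT_EXTENSIONS = {".html", ".txt"}  # assumption: only plain-text types handled here


def import_backup(data: bytes) -> dict[str, str]:
    """Steps 1-2: read a backup archive and return {path: decoded text}."""
    documents: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if PurePosixPath(info.filename).suffix.lower() in TEXT_EXTENSIONS:
                raw = archive.read(info)
                documents[info.filename] = raw.decode("utf-8", errors="replace")
    return documents


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Step 3: map each lowercase word to the set of paths containing it."""
    index: dict[str, set[str]] = {}
    for path, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(path)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Step 4: return the paths that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))
```

A backup on disk could be passed in with `import_backup(Path("course.imscc").read_bytes())`, where the file name is a placeholder.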

Search Capabilities

Comprehensive Content Discovery

  • Assignment Analysis: Locate all assignments, rubrics, and grading criteria
  • Assessment Content: Find quizzes, exams, and assessment materials
  • Discussion Topics: Identify discussion prompts and collaborative activities
  • Multimedia Resources: Discover images, charts, graphs, and visual content
  • Course Structure: Map modules, lessons, and learning pathways
  • Administrative Elements: Find due dates, schedules, contact information, and policies

Advanced Search Features

  • Concept-based Queries: Search by educational concepts rather than just keywords
  • Multi-format Analysis: Simultaneous search across text, images, and documents
  • Relevance Scoring: Intelligent ranking of search results with quality indicators
  • Content Type Filtering: Focus searches on specific material types
  • Cross-reference Analysis: Find relationships between different course elements
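
Content type filtering can be pictured as a post-processing step over ranked results. The sketch below assumes each hit carries a type label and a score. The `SearchHit` fields and the label values are hypothetical, since the page does not describe the tool's result format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    path: str
    content_type: str  # e.g. "assignment", "quiz", "discussion" (labels assumed)
    score: float


def filter_hits(
    hits: list[SearchHit],
    content_types: set[str],
    min_score: float = 0.0,
) -> list[SearchHit]:
    """Keep hits of the requested types at or above min_score, best first."""
    kept = [
        hit for hit in hits
        if hit.content_type in content_types and hit.score >= min_score
    ]
    return sorted(kept, key=lambda hit: hit.score, reverse=True)
```

For example, `filter_hits(hits, {"quiz", "assignment"}, min_score=0.5)` would narrow a mixed result list to assessment-related material only.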

Typical Processing Capacity

  • Course Scale: Handles full course backups of 400 or more files
  • Content Analysis: Successfully processes roughly 70% of the files in a backup (see Technical Notes)
  • Format Support: Analyzes text documents, PDFs, images, and web content
  • Intelligent Filtering: Automatically excludes system files while preserving educational content

Potential Use Cases

For Faculty

  • Course Review: Comprehensive analysis of course content and structure
  • Content Discovery: Find specific elements quickly across large courses
  • Accessibility Audit: Identify content that may need accessibility improvements
  • Course Mapping: Visualize course structure and content relationships

For Instructional Designers

  • Quality Assurance: Systematic review of course materials
  • Content Gap Analysis: Identify missing or inconsistent content
  • Template Development: Extract successful patterns for reuse
  • Course Structure Analysis: Understand content organization and flow

For Administrators

  • Course Backup: Reliable backup and archival of course content
  • Content Analytics: Understand content patterns across courses
  • Resource Management: Identify commonly used resources and materials

Planned Enhancements

Short-term Development

  • Enhanced Search Accuracy: Refinement of concept-based search algorithms
  • Improved User Interface: More intuitive interface for search and analysis results
  • Content Export Features: Better tools for extracting and sharing analysis results

Future Vision

  • Chat Agent Interface: Conversational queries using Claude AI integration
  • Code Agent Capabilities: System that can write its own queries for advanced analysis
  • Course Editing: Direct editing capabilities through the course database
  • Collaborative Features: Sharing and collaboration tools for instructional teams
  • ALLY Integration: Potential future integration with accessibility analysis tools

VIDA Project Alignment

Guiding Pillars Integration

  • Faculty Empowerment & Agency: Provides powerful analysis tools without requiring technical expertise
  • Educational Pedagogy Integration: Course concept mapping supports evidence-based instructional design
  • Augmentation, Not Replacement: Enhances rather than replaces human analysis and decision-making
  • Knowledge Sharing and Community Building: Enables sharing of course analysis insights and best practices

Development Framework

  • Built using VIDA's rapid development framework
  • Follows VIDA's standardized deployment processes
  • Integrates with VIDA's AI service pipelines
  • Maintains VIDA's accessibility and usability standards

Current Status

Development Phase: Active exploration and refinement
Primary Developer: Rick Anderson
Status: Core functionality implemented, search capabilities being refined
Next Steps: Collaboration with instructional designers to define use cases and requirements

Getting Started

Prerequisites

  • Canvas course backup file
  • Access to VIDA development environment
  • Understanding of course content structure

Basic Workflow

  1. Export Course: Download Canvas course backup
  2. Upload to Explorer: Import course backup into the tool
  3. Processing: Allow the system to analyze and index the content (typically about 284 files are processed)
  4. Search & Analyze: Use search interface to explore course content
  5. Review Results: Analyze findings with color-coded relevance scoring

Integration Opportunities

Canvas LMS

  • Direct Integration: Potential for direct Canvas API integration
  • Workflow Integration: Seamless integration with Canvas course development workflow
  • Real-time Analysis: Live analysis of course content as it's developed

Future Accessibility Tools

  • ALLY Integration: Potential future integration with ALLY accessibility reports for enhanced analysis
  • Accessibility Auditing: Could support accessibility analysis workflows in future iterations
  • Compliance Reporting: Potential for automated accessibility compliance documentation

Support and Documentation

Resources

  • Technical documentation available in VIDA development wiki
  • Integration guides for ALLY and Canvas systems
  • Best practices for course analysis and optimization

Collaboration

  • Active development with input from instructional designers
  • Collaboration opportunities with Dena and the instructional design (ID) team
  • Feedback integration for continuous improvement

Contact

  • Primary Developer: Rick Anderson
  • Project Integration: VIDA Project Team
  • Instructional Design Consultation: Available through VIDA project channels

Technical Notes

File Processing Statistics

  • Supported Formats: PDF, DOC, DOCX, images (JPG, PNG, GIF), HTML, TXT
  • Automatically Skipped: MP3, CSS, JS, XML, ZIP, system files
  • Processing Success Rate: ~69% of files successfully analyzed
  • Search Index: Full-text and concept-based indexing
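
The figures quoted on this page (400+ files per backup, about 284 files processed, a success rate of ~69%) can be checked against each other with simple arithmetic. The page does not confirm that all three numbers come from the same run, so the sketch below treats that as an assumption. The function names are hypothetical.

```python
def success_rate(processed: int, total: int) -> float:
    """Fraction of files processed successfully."""
    if total <= 0:
        raise ValueError("total must be positive")
    return processed / total


def implied_total(processed: int, rate: float) -> float:
    """Total file count implied by a processed count and a success rate."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return processed / rate


if __name__ == "__main__":
    # Assumes 284 processed files and a 69% rate describe the same backup.
    total = round(implied_total(284, 0.69))
    print(total)                               # 412
    print(f"{success_rate(284, total):.0%}")   # 69%
```

Under that assumption the backup would have held about 412 files, which is consistent with the "400+ files" figure given under Typical Processing Capacity.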

Performance Characteristics

  • Processing Time: Varies based on course size and content complexity
  • Search Speed: Near-instantaneous search results
  • Storage Requirements: Efficient storage with markdown conversion
  • Scalability: Designed to handle large course collections

Last updated: June 2025
For questions or collaboration opportunities, contact the VIDA project team or Rick Anderson directly.