adaptive testing: related work

Finding mature, open-source projects that specifically combine AI-driven (LLM) question generation with adaptive testing algorithms (such as IRT) is still challenging, since this is a relatively new intersection. Here is a breakdown of relevant open-source projects and tools based on the search results:

1. Established Open Source Adaptive Testing Platforms (IRT-based, non-LLM specific):

These platforms provide the core adaptive logic (IRT-based item selection and scoring) but typically expect a pre-calibrated bank of questions rather than generating them dynamically with an LLM. You could potentially integrate LLM question generation into these; a minimal sketch of the item-selection loop they implement follows the list below.

  • Concerto Platform:
    • Developed by The Psychometrics Centre at the University of Cambridge.
    • A well-regarded open-source (GPL) platform for creating various online assessments, including sophisticated IRT-based Computerized Adaptive Tests (CATs).
    • Uses the R statistical language for backend calculations, making it flexible for implementing various psychometric models.
    • Designed to be extensible, potentially allowing integration with external question sources (like an LLM API).
  • R Packages:
    • RSCAT: An R package for CAT simulation and implementation. R is widely used in psychometrics, and other packages such as catR and mirtCAT cover similar ground.
  • OSCATS (Open Source Computerized Adaptive Testing System):
    • Mentioned by the International Association for Computerized Adaptive Testing (IACAT) as an open-source option. Finding its specific repository or current status might require further searching.
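
To make the "core adaptive logic" above concrete, here is a minimal, self-contained sketch of the kind of IRT item-selection loop these platforms implement: a 2PL model with maximum-information item selection and a crude EAP ability update. It is illustrative only; the toy item parameters, ability grid, and function names are assumptions, not code from Concerto, RSCAT, or OSCATS.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def estimate_theta(responses, items, grid=np.linspace(-4, 4, 161)):
    """Crude EAP ability estimate on a grid with a standard-normal prior."""
    prior = np.exp(-grid**2 / 2)
    like = np.ones_like(grid)
    for (a, b), correct in zip(items, responses):
        p = p_correct(grid, a, b)
        like *= p if correct else (1.0 - p)
    post = prior * like
    return float(np.sum(grid * post) / np.sum(post))

def next_item(theta, bank, administered):
    """Pick the unadministered item with maximum information at theta."""
    best, best_info = None, -1.0
    for idx, (a, b) in enumerate(bank):
        if idx in administered:
            continue
        info = item_information(theta, a, b)
        if info > best_info:
            best, best_info = idx, info
    return best

# Toy pre-calibrated bank: (discrimination a, difficulty b) pairs.
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5), (0.9, -0.5)]
theta, administered, items, responses = 0.0, set(), [], []
for _ in range(3):                      # administer three items adaptively
    idx = next_item(theta, bank, administered)
    administered.add(idx)
    correct = True                      # stand-in for the examinee's answer
    items.append(bank[idx]); responses.append(correct)
    theta = estimate_theta(responses, items)
    print(f"item {idx}, updated theta = {theta:.2f}")
```

An LLM could in principle replace the fixed bank, but each generated item would still need calibrated parameters (a, b) before the information-based selection and ability update above would be meaningful.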

2. Open Source Projects Incorporating LLMs for Assessment/Generation (Closer to your goal):

These projects leverage LLMs in the educational assessment space, though they might not be end-to-end adaptive testing systems themselves.

  • QGen Studio:
    • Described in recent research papers (April 2025) as an adaptive question-answer generation, training, and evaluation platform using LLMs.
    • Aims to enable users to create custom QA datasets with LLMs and fine-tune models.
    • The papers state it "will be open-sourced soon."
    • An IBM/qgen-studio repository exists on GitHub under an Apache-2.0 license, suggesting an open-source release is intended or in progress. Keep an eye on this project.
  • AERA Chat:
    • Described in research papers (October 2024) as the "first open-source interactive platform explicitly designed to utilize LLMs in explainable student answer scoring."
    • Focuses on automated scoring, generating rationales for grading decisions (especially for free-text answers), and providing tools for evaluating/annotating these rationales.
    • While it uses LLMs heavily for the assessment side, it is not described as a full CAT engine that adaptively selects the next question from ability estimates; it is nonetheless a significant open-source component for AI in assessment.
  • csv610/mcq_generator (GitHub):
    • A project focused specifically on generating Multiple-Choice Questions (MCQs) using LLMs (OpenAI or Ollama).
    • Allows specifying the subject, difficulty, and number of questions; provides explanations and prerequisite knowledge.
    • This could serve as the question-generation component within a larger adaptive system but isn't the adaptive engine itself; a hedged sketch of this kind of LLM-driven generation appears after this list.
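
As an illustration of how a generator like this could slot into a larger system, here is a sketch of prompting an LLM for difficulty-tagged MCQs via the OpenAI Python client. The prompt wording, model name, and JSON schema are assumptions for illustration, not the actual interface of csv610/mcq_generator.

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_mcqs(subject: str, difficulty: str, n: int = 3) -> list[dict]:
    """Ask an LLM for n multiple-choice questions as structured JSON.

    The prompt, model name, and output schema here are illustrative
    assumptions, not the interface of csv610/mcq_generator.
    """
    prompt = (
        f"Write {n} multiple-choice questions on {subject} at {difficulty} "
        "difficulty. Return a JSON object with a single key 'questions' "
        "whose value is an array; each element must have the keys "
        "'question', 'options' (4 strings), 'answer', and 'explanation'."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["questions"]

# Example: questions = generate_mcqs("binary search trees", "intermediate")
```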

3. Less Relevant (But Related) Open Source AI Projects:

  • Projects listed under "AI Testing Tools" (like Selenium, Appium, Robot Framework, Katalon) are primarily for software testing automation, not educational assessment.
  • Repositories like LLM-Testing/LLM4SoftwareTesting, codelion/adaptive-classifier, SakanaAI/self-adaptive-llms are research-focused on LLM applications in software testing, general classification, or LLM adaptation mechanisms, not specifically educational CAT.

In summary:

  • For the adaptive testing engine (IRT logic), Concerto is a strong, established open-source option.
  • For LLM-powered question generation, you might use libraries/projects like mcq_generator or build custom logic using LLM APIs.
  • For LLM-powered answer evaluation, AERA Chat provides an open-source framework.
  • A potentially integrated solution is QGen Studio, which aims to combine LLM QA generation and evaluation, although its full open-source release and features need monitoring.

You would likely need to combine elements from these different areas: for example, using Concerto for the adaptive framework and integrating your own LLM-based question-generation and evaluation modules (perhaps inspired by, or reusing parts of, QGen Studio or AERA Chat if they prove available and suitable). A toy sketch of such an integration loop follows.
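
Purely as a sketch of how the pieces might fit together, the loop below lets the IRT ability estimate drive the difficulty label passed to the LLM generator. The function names generate_mcqs and estimate_theta refer to the hypothetical helpers sketched earlier on this page, ask_examinee is a placeholder for whatever UI collects the answer, and the difficulty thresholds and provisional item parameters are arbitrary assumptions.

```python
def difficulty_label(theta: float) -> str:
    """Map an IRT ability estimate onto a coarse prompt-level difficulty.
    The thresholds are arbitrary assumptions for illustration."""
    if theta < -0.5:
        return "easy"
    if theta < 0.5:
        return "intermediate"
    return "hard"

def adaptive_llm_session(subject: str, rounds: int = 3) -> float:
    """Toy loop combining the two earlier sketches: the ability estimate
    selects the difficulty label passed to the LLM generator. Provisional
    (a, b) item parameters are hard-coded per label; in a real system,
    LLM-generated items would need calibration (or a separate difficulty
    estimation step) before they could feed an IRT update."""
    theta, items, responses = 0.0, [], []
    provisional = {"easy": (1.0, -1.0), "intermediate": (1.0, 0.0), "hard": (1.0, 1.0)}
    for _ in range(rounds):
        label = difficulty_label(theta)
        question = generate_mcqs(subject, label, n=1)[0]  # helper sketched above
        correct = ask_examinee(question)                  # hypothetical UI call
        items.append(provisional[label])
        responses.append(correct)
        theta = estimate_theta(responses, items)          # helper sketched above
    return theta
```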