Scaling Retrieval Practice with LLMs: Improving Multiple-Choice Question (MCQ) Quality through Knowledge Graphs
Teaching introductory computer science courses has become substantially harder given the prevalence of AI-based code auto-completion tools. One promising response is frequent retrieval practice during lectures. Multiple-choice questions (MCQs) provide an effective form of retrieval practice that promotes active learning. However, producing high-quality MCQs at scale remains a challenge for instructors.
Recent advances in large language models (LLMs) offer a potential solution by enabling the automatic generation of MCQs, providing a scalable approach for frequent retrieval practice during teaching. In this poster, we present two preliminary studies exploring the potential and limitations of this approach. First, we evaluated the effectiveness of LLM-generated MCQs in higher education programming courses. Students who practiced with LLM-generated MCQs achieved significantly higher scores on follow-up quizzes compared to periods without retrieval practice. Despite their effectiveness, raw LLM-generated MCQs exhibited numerous quality issues, including hallucinations, ambiguous distractors, trivial items, and inconsistent formatting.
Second, we investigated a knowledge graph (KG)-guided pipeline to improve MCQ quality. By structuring key concepts and their relationships prior to LLM prompting, the KG-based pipeline produced MCQs that were more relevant, integrative, and challenging. A preliminary evaluation of over 400 MCQs shows that KG-based MCQs outperform raw text-based MCQs across multiple quality dimensions, particularly in difficulty balance and conceptual synthesis.
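The KG-guided step described above can be illustrated with a minimal, hypothetical sketch: concepts and their relationships are stored as labeled triples, and a prompt is assembled from a concept's local neighborhood so the generated MCQ integrates related ideas. The triples, function names, and prompt wording here are illustrative assumptions, not the pipeline's actual implementation.

```python
# Hypothetical sketch of KG-guided MCQ prompting (illustrative, not the
# actual pipeline). A tiny knowledge graph is given as
# (source, relation, target) triples.
TRIPLES = [
    ("for loop", "iterates over", "list"),
    ("for loop", "contrasts with", "while loop"),
    ("list", "supports", "indexing"),
]

def neighborhood(concept, triples):
    """Return the labeled edges that touch the given concept."""
    return [t for t in triples if concept in (t[0], t[2])]

def build_prompt(concept, triples):
    """Compose an LLM prompt asking for one MCQ that integrates the
    concept with its graph neighbors, encouraging conceptual synthesis."""
    facts = "\n".join(f"- {s} {r} {t}" for s, r, t in neighborhood(concept, triples))
    return (
        f"Write one multiple-choice question about '{concept}' that also "
        f"draws on these related facts:\n{facts}\n"
        "Provide 4 options, exactly one correct, with plausible distractors."
    )

print(build_prompt("for loop", TRIPLES))
```

Prompting from a neighborhood rather than a single concept is one plausible way to obtain the more integrative, less trivial items the evaluation favors, since the model is steered toward relationships instead of isolated definitions.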
These preliminary studies open promising directions for future research, including automatic extraction of task-oriented knowledge graphs and adaptive generation of personalized MCQs tailored to student mastery levels.