Improving LLM-Generated Educational Content: A Case Study on Prototyping, Prompt Engineering, and Evaluating a Tool for Generating Programming Problems for Data Science
One key challenge for instructors is creating high-quality educational content, such as programming practice questions for introductory programming courses. While Large Language Models show promise for this task, their output quality can be inconsistent, and it is often unclear how to systematically improve their performance. In this experience report, we present the development process for ContentGen, a prototype tool that generates programming questions within the context of data science instructional materials. We describe our process of designing the tool and iteratively improving it through prompt engineering. To evaluate our changes, we constructed a dataset of examples from our courses and developed three metrics to assess the generated questions: Correctness, Contextual Fit, and Coherence. We compare three prompting strategies and find that including an automatically generated summary of the lecture content as context in the prompt substantially improves the quality of the generated questions across our metrics. A usability study with data science instructors further suggests that our final prototype is perceived as useful and effective. Our work contributes a case study of evidence-based prompt engineering for an educational tool and offers a practical approach for instructors and tool designers to evaluate and enhance LLM-based content generation.
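The best-performing strategy reported in the abstract, prepending an automatically generated summary of the lecture content to the question-generation prompt, could be sketched roughly as follows. This is a minimal illustration only: the function names, prompt wording, model choice, and the OpenAI-style client are assumptions for the sake of the example, not the authors' actual ContentGen implementation.

# Sketch of the "lecture summary as prompt context" strategy described in the
# abstract. All names and prompts below are illustrative assumptions; the paper's
# real prompts, model, and API are not specified here.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder model name; the abstract does not name a model


def summarize_lecture(lecture_text: str) -> str:
    """Automatically condense lecture material into a short summary for prompt context."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": "Summarize the key concepts, libraries, and datasets covered "
                        "in this data science lecture in under 200 words."},
            {"role": "user", "content": lecture_text},
        ],
    )
    return resp.choices[0].message.content


def generate_question(lecture_text: str, topic: str) -> str:
    """Generate a programming practice question grounded in the lecture summary."""
    summary = summarize_lecture(lecture_text)
    prompt = (
        f"Lecture summary:\n{summary}\n\n"
        f"Write one programming practice question on '{topic}' for an introductory "
        "data science course. Use only concepts and libraries covered in the summary, "
        "and include a correct reference solution."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

Generated questions would then be scored along the abstract's three metrics (Correctness, Contextual Fit, and Coherence) against the course-derived example dataset; the scoring procedure itself is not detailed in this listing.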
This program is tentative and subject to change.
Thu 19 Feb (displayed time zone: Central Time, US & Canada)
10:40 - 12:00
10:40 (20m Talk) | AI-Supported Grading and Rubric Refinement for Free Response Questions (Papers)
Victor Zhao (University of Illinois, Urbana-Champaign), Max Fowler (University of Illinois), Yael Gertner (University of Illinois Urbana-Champaign), Seth Poulsen (Utah State University), Matthew West (University of Illinois at Urbana-Champaign), Mariana Silva (University of Illinois at Urbana-Champaign)

11:00 (20m Talk) | Creating Exercises with Generative AI for Teaching Introductory Secure Programming: Are We There Yet? (Papers)

11:20 (20m Talk) | Improving LLM-Generated Educational Content: A Case Study on Prototyping, Prompt Engineering, and Evaluating a Tool for Generating Programming Problems for Data Science (Papers)
Jiaen Yu (University of California, San Diego), Ylesia Wu (UC San Diego), Gabriel Cha (University of California San Diego), Ayush Shah (University of California San Diego), Sam Lau (University of California at San Diego)

11:40 (20m Talk) | Measuring Students’ Perceptions of an Autograded Scaffolding Tool for Students Performing at All Levels in an Algorithms Class (Papers)
Yael Gertner (University of Illinois Urbana-Champaign), Brad Solomon (University of Illinois Urbana-Champaign), Hongxuan Chen (University of Illinois at Urbana-Champaign), Eliot Robson (University of Illinois Urbana-Champaign), Carl Evans (University of Illinois Urbana-Champaign), Jeff Erickson (University of Illinois Urbana-Champaign)