Benchmarking AI Tools for Software Engineering Education: Insights into Design, Implementation, and Testing
As generative AI (Gen AI) tools reshape software engineering (SE) workflows, educators are exploring how to meaningfully integrate them into computing education. This experience report presents a structured benchmarking of widely used AI tools—such as GitHub Copilot, GPT-4, Codeium, Claude 3.5, Gemini 1.5, Supermaven, TabNine, Testim, Postman, Eraser.io, and Lucidchart AI—across key SE phases: design, implementation, debugging, and testing. Tools were selected based on industry relevance, accessibility for students, and alignment with common SE tasks. Through controlled experiments conducted by five AI-experienced evaluators with matched exposure levels, we assessed tool performance using standardized prompts, counterbalanced task roles, and a range of proxy metrics—including prompt iterations, task completion time, human correction burden, hallucination frequency, output accuracy, and cross-file consistency—to capture both cognitive load and tool limitations. While AI tools accelerated tasks such as boilerplate generation and UML sketching, they exhibited challenges in test coverage quality, cross-file coherence, and reliability under complex prompts. We discuss educational implications, including managing cognitive load, aligning tools with task types, and explicitly teaching prompt refinement and verification strategies. The paper offers actionable guidance for instructors, curriculum-ready artifacts, and a roadmap for scaling AI integration in SE classrooms, while also noting key limitations to support replication and contextual adoption.
Fri 20 Feb (times shown in Central Time, US & Canada)
15:40 - 17:00
15:40 | 20m Talk | Teaching Software Documentation through an Asynchronous Module: An Experience Report | Global Papers
16:00 | 20m Talk | A Framework to Detect, Classify, and Prioritise Student Quality Defects | Global Papers | Shiman Cui (The University of Auckland), Paul Denny (The University of Auckland), Andrew Luxton-Reilly (The University of Auckland)
16:20 | 20m Talk | Turning Insight into Action: Evaluating Targeted Interventions for a Software Engineering Course Informed by Student Reflections | Papers | Sandra Wiktor (University of North Carolina at Charlotte), Mohsen Dorodchi (University of North Carolina Charlotte)
16:40 | 20m Talk | Benchmarking AI Tools for Software Engineering Education: Insights into Design, Implementation, and Testing | Papers | Nimisha Roy (Georgia Institute of Technology), Oleksandr Horielko (Georgia Institute of Technology), Fisayo Omojokun (Georgia Institute of Technology)