Executable Exams in the Era of Generative AI: Revisiting Taxonomy, Implementation, and Prospects
Executable exams, assessments in which students write code on computers in development environments with digital validation, offer a format more aligned with actual programming practice than traditional paper-based methods. Our previous work established a comprehensive taxonomy characterizing aspects of executable exams, including timing, feedback mechanisms, submission policies, resources, proctoring, and grading. However, the emergence of powerful generative AI tools such as ChatGPT and GitHub Copilot has fundamentally transformed programming education and assessment. These tools can generate complete solutions, explain code, provide debugging assistance, and offer alternative approaches from natural language descriptions, capabilities that directly challenge traditional executable exam designs. Studies demonstrate that large language models correctly solve most introductory programming problems, making conventional assessment methods particularly vulnerable. This work revisits our original taxonomy through the lens of generative AI, examining how each characteristic must adapt to this new reality. We introduce two critical new characteristics: generative AI tool usage (spanning unrestricted, limited, filtered, and restricted approaches) and problem design in the generative AI era (encompassing AI-resistant, AI-accepting, and AI-cooperative question types). Two case studies illustrate practical implementations: a hybrid course that maintains a no-AI policy with minimal changes, and an on-campus course that adopts AI-resistant problem design. Survey data from CS educators reveal that while most prohibit generative AI during exams, they embrace it for pedagogical purposes. This updated taxonomy provides educators with a framework for maintaining assessment validity while acknowledging the transformative impact of AI on programming education.