LLM-Based Explainable Detection of LLM-Generated Code in Python Programming Courses
In introductory programming courses, students increasingly submit code generated by large language models (LLMs) instead of solving problems independently. This growing reliance raises concerns about students' development of programming skills. To reduce this overreliance and promote independent problem solving, we propose an explainable detection framework that predicts whether a code submission was generated by an LLM and provides an explanation for its prediction. Such explanations are crucial in educational settings, where transparency and feedback are essential. To support this, we construct a dataset of student-written and LLM-generated code, paired with explanations automatically produced using GPT-4o. We fine-tune several code-specialized LLMs using both binary labels (student-written or LLM-generated) and the accompanying explanations. The resulting models achieve over 99% accuracy and generate informative explanations aligned with their predictions, both validated by human instructors. We also apply our detector to past programming course data, revealing a sharp increase in LLM-generated submissions: from nearly 0% in 2022A to over 40% in 2024A. We conclude with a discussion of false positives and suggest how explainable detection can be responsibly deployed in programming courses. Code and prompts will be released upon acceptance.