With the rapid surge of generative AI, many tools, such as Google’s Gemini and OpenAI’s GPT-4, have been introduced with the well-intentioned goal of supporting programmers [5, 8]. Professional programmers can use these tools to write code more efficiently and to support debugging and testing; however, we recently began to notice a growing number of novice programmers who become highly dependent on Large Language Models (LLMs) to write code for them rather than using LLMs as a learning tool [2, 10]. In our CS1 course, approximately 10-15% of the students (out of roughly 350) were cited for academic misconduct due to direct plagiarism from LLMs, and many of these students performed poorly due to an over-reliance on generative AI.
To address this, we developed a machine learning-based tool to detect AI-generated code. The tool was built on a dataset of thousands of student submissions (in C++) from introductory programming courses, paired with an equal number of AI-generated solutions produced from carefully curated prompts for the same assignments. We trained traditional ML models (Random Forest, XGBoost, etc.) on this labeled dataset, and our best-performing model achieved high precision and recall. Notably, the models remained robust even when the training data was noisy and included AI-generated samples. Our goal is to provide the community with a model that can be customized to any course or program, enabling early detection of LLM-plagiarized code and timely intervention with students.
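The text above does not specify a feature representation or training pipeline, so the following is only a minimal sketch of the general approach, under stated assumptions: each sample is raw C++ source text labeled 0 (student-written) or 1 (AI-generated), features are character n-gram TF-IDF vectors, and the classifier is scikit-learn's RandomForestClassifier. The feature choice and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a traditional-ML detector for AI-generated code.
# Assumptions (not from the source): raw C++ source strings as input,
# binary labels (0 = student-written, 1 = AI-generated), and
# character n-gram TF-IDF features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def train_detector(sources, labels):
    """Train a detector on (C++ source, label) pairs; returns the fitted pipeline."""
    X_train, X_test, y_train, y_test = train_test_split(
        sources, labels, test_size=0.2, stratify=labels, random_state=0
    )
    model = Pipeline([
        # Character n-grams are somewhat robust to identifier renaming
        # and whitespace edits, which matters for student code.
        ("tfidf", TfidfVectorizer(analyzer="char_wb",
                                  ngram_range=(3, 5),
                                  max_features=20000)),
        ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
    ])
    model.fit(X_train, y_train)
    # Report per-class precision and recall, the metrics cited in the text.
    print(classification_report(y_test, model.predict(X_test),
                                target_names=["student", "ai_generated"]))
    return model
```

Because XGBoost's `xgboost.XGBClassifier` follows the same fit/predict interface, swapping it in for the final pipeline step would cover the XGBoost variant mentioned above, and retraining the pipeline on a course's own submissions is what per-course customization would amount to under this design.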